Meaning-based search unlocks hidden collections


Meaning-based search offers a new approach to search and discovery.

Together with our partners at JISC Collections, the British Library and Autonomy, we’re developing services that use meaning-based search to unlock significant research material, including previously inaccessible digitised editions of more than 65,000 books from the British Library’s 19th century collection.

Unlocking valuable collections

As we discussed back in 2010, working with increasing levels of unstructured and unconnected data and information can be problematic. Traditionally, structured metadata has provided the foundation for search, but when that metadata is missing, or when the data is stored in separate collections, new approaches need to be found.

When we began working with JISC Collections to develop a new search for a vast collection of historic books and journals, we turned to meaning-based search and Autonomy IDOL (Intelligent Data Operating Layer) for the answer.

Mimas’ Vic Lyte explains further:

“UK Universities are uniquely placed to re-energise the UK economy by using their vast data and knowledge assets to drive innovation. Information is at the core of this, however it is not the volume of data and its growth that is the key problem – the real issue is the ability to understand and process 100% of information, structured and unstructured, in real time.

We’re using Autonomy IDOL to develop systems and explore techniques that offer ways to surface potentially hidden research material, especially content that lacks metadata. If we’re going to make this content available for researchers and educators, we need to adopt a semantic search strategy.”

The resulting services, JISC Historic Books and JISC Journal Archives, give users the opportunity to cross-search materials from major research collections. These include 600 journals in the sciences, social sciences, humanities, medicine, law and engineering, together with the original page images, full text where available, and illustrations of 360,000 books published between 1475 and 1914. Using our pioneering new search interface, researchers can search across thousands of Early English, 18th and 19th century texts, and dig deeper into the content to make connections between the materials and find undiscovered historical or thematic relationships across the documents. The 65,000 books from the British Library’s 19th century collection are also searchable for the first time, meaning that more researchers will be able to access and use this valuable resource.

A meaning-based approach to search and discovery

Our challenge has always been to develop a more meaningful search experience. Through our ongoing partnership with Autonomy, we’re deploying IDOL technology to give researchers new ways of discovering related materials that traditional keyword searching wouldn’t find. With JISC Historic Books, Autonomy IDOL forms an understanding of the unstructured historical documents and begins to recognise relationships between the information. This means that, rather than matching only a specific keyword or phrase that could have a number of definitions or interpretations, our interface aims to understand relationships between documents and information and to recognise the meaning behind the search query. Moving beyond standard keyword searching to meaning-based searching will give our users results that are based on context and allow linking to other pertinent documents.
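To make the contrast between exact keyword matching and ranking by relatedness concrete, here is a deliberately small sketch. It is not how Autonomy IDOL works: it uses plain term-overlap (cosine) scoring as a stand-in, which falls well short of semantic understanding, and the document texts and function names are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical mini-collection; the texts are invented for illustration only.
DOCS = {
    "a": "steam engine design for railway locomotives",
    "b": "locomotives and the growth of the railway network",
    "c": "a treatise on garden flowers",
}


def tokens(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def phrase_match(query, docs):
    """Exact-phrase search: keep only documents containing the query verbatim."""
    return [doc_id for doc_id, text in docs.items() if query.lower() in text.lower()]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ranked_match(query, docs):
    """Rank documents by similarity to the query, dropping unrelated ones."""
    q = Counter(tokens(query))
    scored = [(cosine(q, Counter(tokens(text))), doc_id) for doc_id, text in docs.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]


query = "railway steam locomotives"
print(phrase_match(query, DOCS))  # no document contains the exact phrase
print(ranked_match(query, DOCS))  # related documents are still surfaced, best first
```

The exact-phrase search returns nothing for this query, while the similarity ranking still surfaces the two railway documents and leaves out the unrelated one. A production system would go much further, for example by modelling concepts rather than shared words.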

Here’s what this approach makes possible:

  • clustering of search results around related conceptual themes
  • full-text indexing of documents and associated materials
  • text-mining of full-text documents
  • dynamic clustering and serendipitous browsing
  • visualisation approaches to search results

This approach offers significant opportunities for researchers. Users can search semantically across a vast range of archives and manuscripts, image collections and digitised books. In addition, the conceptual clustering of text, video and speech that Autonomy IDOL provides means intelligent tools can be developed to support qualitative analysis on a large scale.

Our work so far

We have a long-standing relationship with Autonomy and, in a first for UK higher education, we have successfully developed and deployed meaning-based computing via Autonomy IDOL into a number of key resources over the last four years.

Our initial developments explored how this powerful technology could provide researchers with a way to search and access the rich information stored in university institutional repositories across the UK. Our response was the Institutional Repository Search service, a platform that uses Autonomy IDOL to pull together the disjointed content and provide full semantic searching across 160 UK repositories.

Our work with the JISC Historic Books and the JISC Journal Archives services takes this research a step further. This is the largest deployment of its type within UK higher education, providing semantic search, concept and pattern-matching capability for full-text content and visual media.

Working with Mimas

Our work so far has revealed the possibilities of meaning-based search for higher education, and we see further potential for this technology. As we take our development forward, we’re interested in finding new partners for collaboration. Here’s some of what we can offer:

  • Consultation and advice
  • Proof-of-concept and bespoke deployments of meaning-based computing solutions using Autonomy IDOL
  • Application development, in-house or with external partners
  • An understanding of the complex needs of academic users of all levels
  • A range of fully supported deployment options, from managed in-house to full Software as a Service (SaaS) Autonomy IDOL capability

Contact us

If you’d like to find out more about this story, or if you have any comments or suggestions, please get in touch.