Humans or machines? Looking to the future of metadata generation

15 October 2009

With more and more information hitting the web every day, there's a growing need for high-quality metadata to support effective search and retrieval. But what's the best way to generate this metadata? Thanks to funding from JISC, we're working with Intute to find out through the 'Value for money in automatic metadata generation' (ViM) project.

Metadata can enrich online information by clarifying it, increasing its accessibility and defining its context. The quality of metadata is therefore vital to users' ability to assess search results and select the most relevant ones.

But creating metadata manually is expensive – so, as our demand for it increases, we need to consider new ways to produce it.

The ideal test bed

To investigate the options thoroughly, the ViM project aims to gain a clear understanding of the search and retrieval needs of Higher Education. It will assess the 'value to user' of metadata in relation to the 'cost of creation', and establish the optimum point for value for money in metadata generation.

Ultimately, this will make the online provision of learning and research resources more cost-effective.
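
As a purely illustrative sketch (not the project's actual method), the comparison of 'value to user' against 'cost of creation' could be framed as a simple ratio. The approach names, cost figures and value scores below are hypothetical placeholders chosen only to show the calculation; they are not ViM project data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approach:
    """A hypothetical way of producing one metadata record."""
    name: str
    cost_per_record: float  # assumed cost units, e.g. minutes of staff time
    value_to_user: float    # assumed score, e.g. derived from user focus groups


def value_for_money(approach: Approach) -> float:
    """Return the value delivered per unit of cost for one approach."""
    if approach.cost_per_record <= 0:
        raise ValueError("cost_per_record must be positive")
    return approach.value_to_user / approach.cost_per_record


def best_value(approaches: list[Approach]) -> Approach:
    """Return the approach with the highest value-for-money ratio."""
    return max(approaches, key=value_for_money)


# Placeholder figures for illustration only; not ViM project data.
EXAMPLES = [
    Approach("manual", cost_per_record=20.0, value_to_user=9.0),
    Approach("automatic", cost_per_record=1.0, value_to_user=0.3),
    Approach("hybrid", cost_per_record=8.0, value_to_user=8.0),
]

if __name__ == "__main__":
    for approach in EXAMPLES:
        print(f"{approach.name}: {value_for_money(approach):.2f}")
    print("best value:", best_value(EXAMPLES).name)
```

A single ratio is the simplest possible framing. A real analysis would probably need to weigh several metadata fields and subject areas separately rather than reduce each approach to one score.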

Mimas is uniquely placed to carry out this work. In delivering Intute, we already hold valuable information on the time and cost involved in manual metadata creation. Intute is a leading national service for the discovery of academic internet-based information and a respected tool for developing internet research skills, and we've been exploring ways to reduce the cost of generating its metadata for several years.

Also, Intute's current database of internet resources is the ideal test bed for the project – with more than 120,000 records, growing at a rate of 1,000 each month.

What we've done so far

We've been keeping a close eye on developments in automatic metadata generation, to learn more about its potential to improve operational efficiency. We've carried out two reviews of iVia's virtual library tools and a survey of current metadata reports.

Also, we've been working with the National Centre for Text Mining (NaCTeM) and using Autonomy IDOL software in the Intute Repository Search (IRS) project, which aims to create a content-related search facility for the UK.

On the Enhanced Tagging for Discovery (EnTag) Project, we tested tag recommender functionality in the context of politics data and are now extending this to other subjects in the PERTAINs Project. We've also carried out internal studies to define resource needs across different subject areas – and as part of this, we now have systems for collecting data on the time spent on metadata creation.

What we aim to do with the ViM project

Working in partnership with the University of Nottingham library and the Universities of Bristol and Oxford, we will:

  • conduct user focus groups to identify differences in experience associated with metadata variables
  • comparatively test automatic metadata generation tools (in a service environment)
  • analyse the cost of different approaches to metadata creation (human and automatic)

Aims and objectives

We've defined the aims and objectives of the ViM Project as follows:

Aim 1: Better understand the information search and retrieval needs of students in Higher Education, to identify opportunities to increase the effectiveness of metadata

  • Objective 1: To identify essential and desirable metadata fields for Intute resources based on user requirements analysis (for undergraduates and postgraduates)
  • Objective 2: To identify and understand disciplinary differences and tailor metadata requirements to disciplines
  • Objective 3: In line with new understanding, make Intute resources easier to access and use
  • Objective 4: Share findings with the wider community, especially in the context of institutional repositories

Aim 2: Improve the efficiency of metadata generation processes

  • Objective 5: Increase value for money for services which create metadata, to enable more to be done – or give an efficiency saving
  • Objective 6: Exploit technological opportunities, to speed up development and delivery of services
  • Objective 7: Share lessons learnt, so they can be applied to other information services, especially within Mimas and the University of Nottingham

Reaching conclusions

Ultimately, we hope to answer this question:

"What is the optimum mix of human and automatic metadata generation?"

The project is due for completion at the end of September 2010, and we're confident that by then we can reach the right conclusions – because of our experience to date in metadata management, and the expertise we have at Mimas and especially within our Intute team.

Contact us

If you'd like to find out more about this story – or if you have any comments or suggestions – please contact us or use our feedback form.

Have you got a newsworthy item about Mimas or our portfolio that you think we should publish on this website? If so, please get in touch and we'd be happy to discuss it with you.

Related information

More about Intute

Funded by

Intute's ViM project is funded by JISC.

Mimas contacts

Caroline Williams
Intute Executive Director
Deputy Director of Mimas

T: +44 (0)161 275 0587
E: caroline.williams@manchester.ac.uk

Lisa Charnock
Intute Communications Officer

T: +44 (0)161 275 0620
E: lisa.charnock@manchester.ac.uk

External contacts

Debra Hiom
Intute Social Sciences Manager
ViM Project Manager

T: +44 (0)117 331 4381
E: d.hiom@bristol.ac.uk

Related links

Mimas, powering knowledge | Exhibitor | Online Information 2009, 01-03 December 2009, Olympia Grand Hall, London, UK
We exhibited at this year's Online Information event in London on 01–03 December 2009.