One of the challenges facing information managers today is the need to inter-relate different sources and types of information, whether this involves an internet search across a range of resources with different formats, data structures and description standards or an e-commerce system that needs to exchange data between proprietary applications. Understanding the structure and architecture of the data makes this possible, and metadata is the means by which that understanding is recorded. Using metadata to record data about information sources allows an initial assessment of compatibility and provides an avenue for merging information or for exchanging information between systems. In other words, the concept of ‘interoperability’ has become a major theme for information managers and, ultimately, users. Using metadata to manage information resources is now an established part of the work of diverse groups of professionals, from web managers and librarians to IT managers and systems designers.
As well as those responsible for specifying, designing and setting up systems, the professionals operating systems increasingly need to know about metadata. They need to manage metadata. Information professionals are being asked to take on the responsibility for ensuring consistency of information so that it is accessible and can be retrieved easily. Metadata for Information Management and Retrieval1 describes some of the options available for managing metadata and illustrates this with examples from the library, information and records management domains. It describes the concepts behind metadata and complements Metadata Applications and Management2 with descriptions of metadata use in different sectors by experts in the field.
There has been some debate about the difference between creating metadata and cataloguing information resources. Many of the principles of cataloguing apply more generally to metadata. For instance, cataloguing rules ensure consistency. Standards for data encoding and the use of authority lists also improve the quality and consistency of indexing. The skills of librarians and information scientists are therefore directly relevant to metadata development, implementation and management.
Many of the metadata standards that emerged during the late 1990s have stabilised and are being adopted in libraries, museums, archives, government websites and corporate intranets. Some software suppliers are building metadata fields into the records structure and in some areas they depend on it for the management of information in data repositories (e.g. databases, library catalogues and rights management systems). The Dublin Core Metadata Element Set3 and other metadata schemes are now widely used for corporate intranets, local government geospatial information systems and electronic records management systems.
If metadata is to be adopted successfully, it is vital to have a corps of information workers who are capable of applying it, managers who understand its significance and specialists capable of extending existing metadata standards to new application areas. The library and information profession has many of the skills required to manage metadata schemes and to apply metadata to digital objects such as electronic documents, digitised images, sound recordings and data repositories.
Why metadata is important
The development of cataloguing over two millennia has provided a set of tools for describing published information. This has been drawn on by the web community. Correspondingly, the growth of the internet has focused public attention on the importance of information retrieval and management and has stimulated the development of tools to improve retrieval performance. Having a clear understanding of what metadata is and how it works also provides a means of managing information resources more effectively. In answering the question, ‘Why is metadata important?’ several arguments emerge:
- Metadata enhances retrieval performance. Metadata can improve retrieval by establishing a context for individual descriptors. For instance, the word ‘Green’ in the Creator or Author field indicates the name of an individual, whereas ‘green’ in the title of a document may be a subject retrieval term. Appropriate metadata tags around the different data elements allow search engines to seek information in a more discriminating way. The presence of a subject field (metadata element) can be used as a prompt for entering key words, or for using controlled indexing terms to describe the document. Knowing how metadata works provides information managers with a mechanism for indexing documents more precisely, and this can enhance retrieval performance.
- Metadata provides a way of managing electronic digital objects. Many software packages use metadata as a way of managing electronic resources, whether it is for records retention schedules or for digital preservation. Content management systems (CMS), for instance, use metadata to track when a digital object was last updated or verified, who was responsible for its creation and whether any special access conditions apply. Unlike paper records or printed publications, there is not a long tradition of managing digital objects, and metadata provides a focus for the establishment of standard practices. It is the metadata associated with digital objects that provides a common format for management and manipulation of resources.
- Metadata can help to determine the authenticity of data. Metadata provides an audit trail to establish ownership and authenticity of a digital object such as an electronic document or image. The history of what has happened to a document or record in its life becomes an important part of this. Metadata provides evidence about the provenance of a resource and this underpins good governance, transparency and accountability. This is increasingly important for the many organisations that depend on electronic records rather than paper files. It becomes necessary to demonstrate that the electronic document has been kept securely, is a complete record, and has not been tampered with. Metadata provides evidence for the integrity of an electronic document. This is particularly important where electronic documents or physical records may be used as evidence in legal proceedings.
- Metadata is the key to interoperability. Interoperability depends on exchange of metadata between systems to establish the nature of the data being transferred and how it should be handled. E-commerce is one example of interoperating, where several different proprietary systems may need to exchange data. Access to metadata helps to establish the protocols of exchange of data and ways in which it might be exploited. Another example is in the development of e-government initiatives around the world. The UK experience is described in Liane Broadley’s article in this issue of Update.
- Metadata is the future. An increasing number of software and systems suppliers are working to metadata standards or are creating their own standards for metadata. The growth of e-commerce depends on metadata. Many industries are developing their own infrastructure to allow software from different suppliers to work together and exchange data. Metadata generated by content management systems is seeing a renaissance on the internet after its initial use for subject description. Metadata standards are being used by portal software and to provide access to the information content of websites.
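The ‘Green’ example in the first argument above can be sketched in a few lines of Python. This is only an illustration: the two records, the field names (loosely modelled on Dublin Core elements) and the search function are all assumptions made for the example, not part of any particular system.

```python
# Hypothetical records; field names loosely follow Dublin Core elements.
RECORDS = [
    {"creator": "Green, A.", "title": "Managing electronic records",
     "subject": ["records management"]},
    {"creator": "Smith, J.", "title": "Green energy policy",
     "subject": ["energy policy"]},
]


def search(records, term, field=None):
    """Return records containing term, optionally within one metadata field."""
    needle = term.lower()
    hits = []
    for record in records:
        # With no field given, every field is searched (an untagged search).
        names = [field] if field else list(record)
        for name in names:
            value = record.get(name, "")
            text = " ".join(value) if isinstance(value, list) else value
            if needle in text.lower():
                hits.append(record)
                break
    return hits


anywhere = search(RECORDS, "green")                     # matches both records
by_creator = search(RECORDS, "green", field="creator")  # matches the author only
print(len(anywhere), len(by_creator))
```

Searching without a field returns both records, while restricting the search to the creator field returns only the record written by someone called Green. That is the sense in which metadata tags let a search engine behave in a more discriminating way.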
These five arguments for the importance of metadata provide a way of assessing the purposes to which metadata is put.
Metadata has been around since the first library catalogues were established more than 2,000 years ago. The term ‘metadata’ first appeared in the 1960s but became established in the database community in the 1970s. A useful definition sees metadata as ‘the means by which the structure and behaviour of data is recorded, controlled, and published across an organization’.4 The definition is helpful because it is not restricted to electronic resources and because it covers the key purposes of metadata.
Five purposes of metadata
Metadata can be analysed in terms of the five general purposes to which it is put. This viewpoint model is a reflection of current development in metadata and in particular the growing importance of e-commerce. This model makes a distinction between the purposes of metadata (i.e. the ways in which it is used) and the intrinsic properties of metadata elements. In doing this it becomes clear that each data element can be used in a variety of ways and fulfils more than one purpose. The five purposes of metadata are:
Resource description. This is particularly important in organisations that need to describe their information assets. For example, under the Freedom of Information Act in the UK, public authorities have to produce publication schemes, which identify all their publications and intended publications. In the US, federal agencies have to make information available via the Government Information Locator Service (GILS).5 These both depend on adequate descriptions of the data. Information asset registers compiled by public authorities — and increasingly by the corporate sector — also require descriptions of information repositories and resources.
Information retrieval. In the academic sector a lot of effort has been put into resource discovery on the internet. Some institutions and agencies have devised subject-based gateways or portals that in effect catalogue relevant, high-quality web resources in a particular subject area. This provides users with a route to authoritative sources of information. The cataloguing data usually includes a description of the resource, controlled indexing terms and classification headings. The resulting catalogue is itself a metadata resource, and a gateway may also ‘mine’ or ‘extract’ metadata directly from target websites or electronic resources.
Management of information resources. The growth of electronic document and records management (EDRM) systems has resulted from the emerging requirements of larger organisations to manage both paper and electronic documentation effectively. EDRM systems need access to ‘cataloguing’ information about individual documents in order to manage record lifecycles. Examples include authorship and (not necessarily the same thing) ownership, provenance of the document (for legal purposes) and dates of creation and modification. These and other data elements provide a basis for managing the documentation cost-effectively and consistently. CMS are also used to manage data resources, including material on intranets and websites.
Documenting ownership and authenticity of digital resources. Metadata provides a way of declaring the ownership of the intellectual content and layout of a document. It also provides a record of authenticity by providing an audit trail so that, for instance, an electronic document or a digital image will stand up in court. One of the preconditions for widespread acceptance of electronic documents as original evidence is electronic systems becoming the preferred medium for long-term storage of documents.
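As a rough illustration of how an audit trail can support claims about integrity, the Python sketch below records a checksum of a document each time an event is logged and later checks whether the content still matches. The function names, the field names in each audit entry and the sample content are assumptions made for this example. A matching checksum shows only that the content is unchanged since the last entry; on its own it would not establish authenticity in a legal sense.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 checksum of the document content."""
    return hashlib.sha256(content).hexdigest()


def log_event(trail, action, actor, content):
    """Append an audit entry (hypothetical field names) to the trail."""
    trail.append({
        "action": action,
        "actor": actor,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "sha256": fingerprint(content),
    })


def is_unaltered(trail, content):
    """True if content matches the checksum in the latest audit entry."""
    return bool(trail) and trail[-1]["sha256"] == fingerprint(content)


trail = []
original = b"Minutes of the board meeting"
log_event(trail, "created", "records.officer", original)

print(is_unaltered(trail, original))                # True
print(is_unaltered(trail, original + b" (edited)"))  # False
```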
Interoperability. Metadata acts as an enabler of information and data transfer between systems and as such is a key component in interoperability. In order to allow software applications that have been designed independently to pass data between them, a common framework for describing the data being transferred is needed so that each ‘knows’ how to handle that data in the most appropriate manner. This may be at the level of distinguishing between different languages, or understanding different data formats.
Interoperability is one of the enablers for e-commerce. When a piece of data is passed from one system to another, the accompanying (or embedded) metadata allows the new application to make sense of the data and to use it in the appropriate fashion. This can be seen in the book trade, for instance, where many suppliers using different software packages need to be able to exchange data reliably. The widely adopted ONIX6 standard allows different participants in the chain from author to reader to exchange data without the need to integrate their systems.
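The idea of a common framework can be sketched as a simple ‘crosswalk’ that maps one supplier's field names on to a shared vocabulary. The Python below is a minimal illustration only: the supplier field names, the shared field names, the stock code and the mapping itself are invented for the example and do not represent the structure of ONIX or of any real system.

```python
# Hypothetical crosswalk: supplier field name -> shared field name.
CROSSWALK = {
    "book_title": "title",
    "author_name": "creator",
    "stock_code": "identifier",
}


def to_shared(record, crosswalk):
    """Translate a supplier record into the shared vocabulary.

    Fields with no mapping are returned separately so that nothing is
    silently lost in the exchange.
    """
    shared, unmapped = {}, {}
    for field, value in record.items():
        if field in crosswalk:
            shared[crosswalk[field]] = value
        else:
            unmapped[field] = value
    return shared, unmapped


supplier_record = {
    "book_title": "Metadata for Information Management and Retrieval",
    "author_name": "Haynes, D.",
    "stock_code": "ABC-001",   # made-up identifier
    "warehouse_bin": "B7",     # local detail with no shared equivalent
}

shared, unmapped = to_shared(supplier_record, CROSSWALK)
print(shared)
print(unmapped)
```

Each participant needs only a mapping between its own fields and the shared vocabulary, rather than a separate mapping for every trading partner, which is why an agreed standard removes the need to integrate systems directly.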
This new, five-point model of metadata can be tested by revisiting the reasons for the importance of metadata described earlier. The five functions can be mapped on to the issues as shown in the diagram above.
Conclusion
The five-point model is one way of gaining an understanding of metadata. However, the model will need to develop as other purposes are discovered and become more prevalent. For instance, as new commercial systems for trading in intellectual property emerge, the metadata model will have to develop as well.
It is important that LIS professionals understand the role of metadata in information management and that they apply their skills in this context as well. Many information professionals have been involved in the development of metadata standards, such as the Dublin Core, and a good understanding of cataloguing rules and application of encoding schemes and authority lists is essential for the effective performance of systems that use metadata.
References
1 D. Haynes. Metadata for Information Management and Retrieval. Facet Publishing, 2004.
2 G. E. Gorman (ed.). International Yearbook of Library and Information Management 2003-2004: Metadata applications and management. Facet Publishing, 2004.
3 Dublin Core Metadata Element Set, version 1.1. DCMI, 2003 (http://dublincore.org/documents/dces/).
4 G. Tozer. Metadata Management for Information Control and Business Success. Artech House, 1999.
5 Government Information Locator Service. US Government Printing Office, 2004 (www.access.gpo.gov/su_docs/gils/).
6 ONIX for Books. EDItEUR, 2004 (www.editeur.org/onix.html).
David Haynes is the author of Metadata for Information Management and Retrieval. He is Head of CILIP Consultancy Services and works extensively in the public and voluntary sectors on a variety of information consultancy and research projects. He has run a number of training courses on metadata for CILIP Training and SOCITM.