TSO (The Stationery Office) has recently been appointed a registration agency for digital object identifiers. Shane O’Neill discusses why this is good news for government departments, information professionals — and citizens.

This article is from the December 2002 issue of Update.

Information workers in the public sector are constantly reminded what an exciting and challenging environment the UK is becoming for their profession, and of the opportunity for global leadership in information management and dissemination. The size of UK PLC, the fact that government covers the entire range of human activity across all disciplines and subjects, and the driving requirements of the Freedom of Information (FOI) Act, are all powerful forces shaping this new environment. So it is small wonder that you see a steady stream of experienced electronic publishing professionals moving from the private sector into departments of central government, to run websites or become CKOs or participate in the various debates on e-government. Seventy thousand public bodies are striving to cope with content management on a grand scale, develop their metadata frameworks and make their information available to the citizen by 2005.

Print and web must co-exist

This article gives an outline view of how persistent information tags, called Digital Object Identifiers (DOIs), can play their part in helping the public sector information community adapt to this demanding networked world. In particular, it shows how DOIs can help cope with two abiding realities, often ignored by the more enthusiastic advocates of the web.

First, most communication is via paper and physical print, including most of the important output, and, for the rest of our lifetimes, this ‘mixed economy’ of print and web access will be the dominant reality. Coping with this mixed economy is the hidden challenge that accompanies the supposed wholesale flight to the web. This issue is illustrated by one recent event, perhaps an early warning of the potential chaos that may ensue from not recognising this dual world. In October, the Iraq report was made available on the web, but it carried no Command Paper number or associated ISBN. The document (both in its final version and the various other versions preceding it) was only available to the public by downloading its several dozen pages to a (well-inked) printer. This download would take an inordinate amount of time for the 80 per cent or more of citizens who do not have broadband internet access. This document, potentially as important to European history as the Ems telegram, could be lost to the national archive and will almost certainly dissipate into cyberspace when No. 10 next upgrades its website and all the pointers to the document’s present location disappear.

Second, government talks of interoperability between departments, and between government and the private sector, but what about within a single department? This is another ‘mixed economy’, this time of databases, content and document management systems, and other workflow management systems.

No organisation has a single content management system that covers all its activities and solves its information management issues, as well as fulfilling the onerous requirements of the FOI Act when it comes into force. Attempts to roll out pan-governmental solutions, although attractive to HM Treasury as it surveys the vast sums of money being expended throughout government, often fail. Departments may have several web environments and many content management systems, as well as masses of internal database and documentation systems. How to bring some semblance of order to such decentralised environments, without casting a heavy centralised hand, is a challenge with which many are struggling.

A persistent and compelling solution

It is within this context that TSO (The Stationery Office) brings to bear its experience of both government and publishing, its traditional role as bibliographer of government’s published output and its recent designation by the International DOI Foundation (IDF) as a registration agency for DOIs. With colleagues in government, we are seeking to make our contribution to the management and dissemination of information and knowledge throughout government and to the private sector.

Publishers found out many decades ago that information could not flow effectively across networks without the use of identifiers, commonly agreed and resolved (i.e. managed) by a registration agency. When, in 1967, W. H. Smith introduced its first Epos system in the UK, it demanded that publishers be able to provide it with electronic records which were unambiguous. (It is important for a bookseller to be able to identify each of the umpteen editions of Charles Dickens’s A Tale of Two Cities that is in print, and the format and price, etc.) Thus, the ISBN was born. Within half a dozen years it had migrated across the globe and now it drives the entire trading environment of publishing.

The scientific world came to similar conclusions in the mid-90s with the advent of the internet. Prior to this, scientists researched the latest developments in their field via printed journals, following the citations through their libraries’ multiple subscriptions and moving from printed copy to printed copy. The internet gave them the capacity to click on a citation and move from one publisher’s environment to another’s — but only if the publishers had agreed some common protocols.

Enlightened self-interest

Publishers are not naturally co-operative; they are fiercely competitive and jealously guard their copyright. But users and enlightened publishers prevailed and the DOI was born — identifying digital objects per se with links to their physical manifestation or other associated descriptors (metadata). The DOI — maintained by a registration agency — is a fixed point, never changing; URLs are the cyberspace equivalent of bookshelves, a physical location which often changes and leads to the loss of all links to that URL. So publishers had persistence, and a means of managing their metadata at scale and reliably building applications around a fixed point. More than 100 of the largest science journal publishers use CrossRef, based in Boston, to trade and exchange the leading developments in science today over the internet.
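The contrast between a fixed identifier and a movable location can be illustrated with a minimal sketch. It is not the IDF's resolution system: the DoiResolver class, the 10.9999 prefix and the example.org addresses are all hypothetical, chosen only to show that a citation holding the DOI survives a change of URL.

```python
class DoiResolver:
    """Toy in-memory stand-in for a registration agency's lookup table."""

    def __init__(self):
        self._locations = {}  # DOI -> current URL

    def register(self, doi, url):
        # A DOI is allocated once and is never reissued.
        if doi in self._locations:
            raise ValueError(f"{doi} is already registered")
        self._locations[doi] = url

    def relocate(self, doi, new_url):
        # Only the location changes; the identifier stays the same.
        if doi not in self._locations:
            raise KeyError(doi)
        self._locations[doi] = new_url

    def resolve(self, doi):
        return self._locations[doi]


resolver = DoiResolver()
REPORT = "10.9999/example-report"  # hypothetical DOI
resolver.register(REPORT, "https://old-site.example.org/reports/report.pdf")

citation = REPORT  # what another website or a printed document would quote

# The website is redesigned and the document moves.
resolver.relocate(REPORT, "https://new-site.example.org/archive/report.pdf")

current_url = resolver.resolve(citation)
```

Anything that quoted the old URL directly would now be broken, whereas the citation above still leads to the document because only the agency's table had to change.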

Now, some of these publishers, with their colleagues in government, are contributing to the development of e-government and bringing those experiences with them. Government is just emerging from its own equivalent of dot com, its first burst of getting everything out there on the web, and is now entering the next phase of how to ‘trade’ information across networks, with appropriate levels of bibliographical identification and permissions and controls. Why would this environment be any different from networked publishing? Hasn’t FOI made government the new publishers, but on a far grander scale? (Government, anyway, has always accounted for 10 per cent of the traditional published output in the UK and is a powerful driver in the information economy that this government so much wants the UK to lead.) The concept of persistent identifiers in government information management is now with us, in the shape of DOIs — in the words of one eminent e-government policy-maker, ‘the only show in town’.

An overarching inclusive approach

DOI is just a brand name for an identifier prefix which a community of interest agrees will be its umbrella framework for holding digital objects, with their overlapping schemas of metadata and their differing requirements in terms of format and usage. Affiliation to an internationally recognised body fulfils a number of important requirements: adherence to standards, and confidence that links will be mapped to other developing metadata schemas in related communities of interest. Thus, DOI is mapped to Onix (the book industry trading standard) and to Scorm (the e-learning standard), to name but two. Technology integration and lobbying are sustained by an international body which supports development dialogue with the world’s leading technology players (witness the recent announcement of the DOI plug-in from Adobe). Registration agencies (RAs) abide by a code of practice and share application profiles with one another within a developing interoperable universe.

What does all this mean at the practical level?

What would widespread adoption of DOIs provide for the UK government information community? To start with, pointing to persistent and regulated DOIs rather than URLs would prevent the sort of ‘linkrot’ that occurs when websites are revamped. Research shows that more than 50 per cent of links do not survive two years (and this is an average — more volatile information can see links failing within months). Of more concern, though, is further research that suggests webmasters have reduced the degree to which they provide navigation outside environments they directly control. This reduces the comprehensiveness and subsequent value of the facility to the citizen user whose information needs take them over the boundaries of silo or departmental divisions. Have you tried following the trail of links across government websites over the past year where sites have undergone redesign or departments have merged or been reorganised? If so, you’ll see what I mean!

A safe bet

Furthermore, the adoption of DOIs means that this community can hedge its bets regarding who will win the wars of standards and formats. A DOI is an accommodating number (made unique by the prefix that its registration agency assigns) and as such can embrace any existing identifier, either as a suffix or as part of a related string. It may be requested and allocated in real time on the web, or allocated automatically (e.g. during retrospective conversion of documents). In the act of allocation, kernel metadata is harvested automatically and creates either a local database of metadata or a government-wide one. Yet content remains discrete and manageable. This creates the capability to manage content across the distributed environment, at either macro-governmental or departmental level. Additionally, common metadata can be held in a database (and clearly one cannot have a database without assigning an identifier!) on which you can perform checks and housekeeping, and which you can re-harvest and generally use as a route to good information management disciplines. You can also achieve FOI compliance. The alternative is to embrace environment-wide content management solutions (CMS), which is no doubt what many of the large suppliers of such ‘in-house solutions’ would like their customers to do. However, even where a discrete environment may be to some extent controlled, the need to interrelate with the wider world through networks means that the interoperability argument comes back into play most forcibly.
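As a rough illustration of the allocation step described above, the sketch below wraps an existing identifier in a DOI and records a small metadata entry at the same moment. The 10.9999 prefix, the identifiers, the field names in KernelRecord and the example.org locations are assumptions made for the example; they are not the IDF's kernel metadata set or any agency's real interface.

```python
from dataclasses import dataclass


def make_doi(prefix, existing_id):
    """Build a DOI by using an existing identifier as the suffix."""
    if not prefix.startswith("10."):
        raise ValueError("DOI prefixes begin with '10.'")
    suffix = existing_id.strip()
    if not suffix:
        raise ValueError("the suffix must not be empty")
    return f"{prefix}/{suffix}"


def split_doi(doi):
    """Split at the first slash; the suffix may itself contain slashes."""
    prefix, _, suffix = doi.partition("/")
    return prefix, suffix


@dataclass(frozen=True)
class KernelRecord:
    # Illustrative fields only, not an official schema.
    doi: str
    title: str
    department: str
    location: str  # the content itself stays where it already lives


class MetadataRegistry:
    """Allocating a DOI and recording its metadata happen in one step."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.records = {}

    def allocate(self, existing_id, title, department, location):
        doi = make_doi(self.prefix, existing_id)
        if doi in self.records:
            raise ValueError(f"{doi} has already been allocated")
        self.records[doi] = KernelRecord(doi, title, department, location)
        return doi

    def by_department(self, department):
        # A simple housekeeping query across the harvested metadata.
        return sorted(
            r.doi for r in self.records.values() if r.department == department
        )


registry = MetadataRegistry("10.9999")  # hypothetical prefix
paper = registry.allocate(
    "cm-0000", "Example Command Paper", "Dept A",
    "https://dept-a.example.org/cm-0000.pdf",
)
leaflet = registry.allocate(
    "leaflet/42", "Example leaflet", "Dept B",
    "https://dept-b.example.org/leaflets/42",
)
```

The registry holds only identifiers and descriptions, so each department keeps its documents in whatever system it already runs, while a query such as by_department works across all of them.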

What about the alternatives?

Let me tempt fate and invite you to ask your IT community what they think of the DOI concept. One reaction might be that it’s just another acronym and fad. Really? But the global publishing world and the global science world use these identifier systems to make their worlds interoperable and achieve signal practical benefits. Another reaction might be: ‘We can redirect to new URL locations through our local CMS or new and emerging DNS tools.’ Really? But you cannot control the wider environment, where your disciplines (or your CMS!) may not be in place, and you cannot link to other objects of related interest or to characteristics vitally important to the enquiry. (Which version? Who has access permission? Is there a printed version?) For example, linking a Statutory Instrument to its enabling Act, to its embryonic manifestations (green and white papers) and to the current related EC directive requires embedded and persistent links; otherwise the citizen user becomes a highly dissatisfied researcher armed with unreliable information. The use of DOIs means that the links are permanent and the metadata around the objects can reliably be built into applications and information platforms, giving real added value to the target audience.
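The Statutory Instrument example can be pictured as a small set of typed links between identifiers. The sketch below is one possible shape for such links, not a description of any TSO or IDF system: the LinkGraph class, the relation names and every DOI in it are invented for illustration.

```python
from collections import defaultdict


class LinkGraph:
    """Typed, one-way links between DOIs."""

    def __init__(self):
        self._links = defaultdict(set)  # source DOI -> {(relation, target DOI)}

    def link(self, source, relation, target):
        self._links[source].add((relation, target))

    def related(self, doi, relation=None):
        """Direct links from one object, optionally filtered by relation."""
        return sorted(
            target
            for rel, target in self._links.get(doi, ())
            if relation is None or rel == relation
        )

    def reachable(self, start):
        """Everything a researcher can reach by following links from start."""
        seen = set()
        stack = [start]
        while stack:
            doi = stack.pop()
            for _, target in self._links.get(doi, ()):
                if target != start and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return sorted(seen)


# Hypothetical identifiers and relation names throughout.
SI = "10.9999/statutory-instrument"
ACT = "10.9999/act"
WHITE = "10.9999/white-paper"
GREEN = "10.9999/green-paper"
DIRECTIVE = "10.9999/ec-directive"

graph = LinkGraph()
graph.link(SI, "enabled_by", ACT)
graph.link(SI, "implements", DIRECTIVE)
graph.link(ACT, "preceded_by", WHITE)
graph.link(WHITE, "preceded_by", GREEN)

trail = graph.reachable(SI)
```

Because every link is expressed between identifiers rather than locations, moving any of these documents to a new site leaves the trail intact.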

A natural development

TSO’s embrace of DOIs was a logical outcome of its bibliographical function within government. We need to identify for ourselves objects which are not always publications, to distinguish between versions of what in metadata terms look like the same thing, and to attribute permissions and linkages to related material. We became a registration agency for DOIs in the official and regulatory space because we identified clear benefits for both ourselves and our clients and because we satisfied the requirements of the IDF — neutral in its marketplace, a natural source for information within and about government. We were also of the size and technical competence to underpin the scalability and security of any such initiative if it became widely adopted within government.

We are evangelical about addressing the needs of the two mixed information economies, and are opening dialogues with all interested parts of government about how DOIs can help to make interoperability within departments and across government a practical reality.

Shane O’Neill is Managing Director, Advisory & Knowledge Services, TSO (The Stationery Office) (shane.oneill@tso.co.uk).

Updated: 04 August 2004