What action is being taken to achieve it? Stella Dextre Clarke reports.

This article is from the June 2002 issue of Update.

Once upon a time there was a dream that throughout government, everyone would use the same word to mean the same thing. The dream was shared among many librarians grappling with the day-to-day difficulty of finding needles in the haystack of information heaped up by government departments, executive agencies, local authorities and quangos of all sorts.

The dream was hopefully labelled the 'Pan-Government Thesaurus', and it might even have become a reality if anyone had had the responsibility and the funding to make it happen. But in the world of targeted budgets, it was a Cinderella waiting for the fairy godmother.

Then, in 2000, the Office of the e-Envoy was charged with developing policies and standards for achieving interoperability and information systems coherence across the public sector.

The idea was that Tom, Dick and Harry out there, and their grandmother, should not have to know what department or what office to apply to for information or services. They should just be able to knock at one (electronic) door and ask for what they want in simple layman's language (or by pointing at it). Clever encoding and routing in the networks behind the door should deliver the answer, whatever its source.

What interoperability is made of

To support interoperability, standards are needed in three main areas: interconnectivity, data integration and information access. Rather than developing new standards, the policy is to adopt widely available solutions such as browser-based technology, XML and the standards commonly used on the internet and world wide web.

Although a new e-Government Metadata Standard (e-GMS) has been developed, this should be seen as an extension of the existing widely used Dublin Core standard, with which it is entirely compatible. Technology details are set out in the e-Government Interoperability Framework (e-GIF), complemented by the e-Government Metadata Framework, both available on the GovTalk website[1] and soon to be combined in one framework document. The e-GMS is a separate document, also available on GovTalk. Adherence to all the standards is mandatory throughout the public sector.

So what became of Cinderella?

Work on developing the metadata standard led to a re-examination of the idea of a standard thesaurus, to be used for the metatagging (indexing as we used to call it) of all public sector resources. If every document or record or web page carries metadata with it, from its first release, whenever that item gets integrated into some other system or network it will still carry a little label saying what it is about. The labels will be standardised by the thesaurus, and so people searching for information on a given subject will find all the relevant items.
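For readers who like to see the idea in concrete form, here is a minimal sketch in Python of a resource that carries its subject labels with it. Everything in it is an assumption made for illustration: the three vocabulary terms are invented and are not taken from any real government thesaurus, and the `Resource` class and `find_by_subject` function are hypothetical names, not part of the e-GMS or any published toolkit.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary. These terms are invented for
# illustration and are not taken from any real government thesaurus.
CONTROLLED_TERMS = {"Housing", "Schools", "Road safety"}


@dataclass
class Resource:
    """A document, record or web page that carries its own metadata."""
    title: str
    publisher: str
    subjects: set = field(default_factory=set)

    def tag(self, term: str) -> None:
        """Attach a subject label, refusing anything outside the vocabulary."""
        if term not in CONTROLLED_TERMS:
            raise ValueError(f"{term!r} is not a controlled term")
        self.subjects.add(term)


def find_by_subject(resources, term):
    """Return every resource labelled with the given term, whatever its source."""
    return [r for r in resources if term in r.subjects]
```

Because the labels travel with each `Resource`, a search on one standard term picks up items from different publishers once they have been merged into a single collection. The sketch deliberately says nothing about who does the tagging or how well.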

But wait a minute, who assigns the little labels to the original items? Can we rely on them to search the thesaurus (quite a large one, for it must cover all the subjects dealt with by public sector bodies) and consistently pick out all the relevant terms? Who is going to check those guys all do a good job? The reality in many offices is that web 'publishing' is often carried out by the authors of the original documents, or by a webmaster who is hopelessly under-resourced for the meticulous task of adding metadata. Even when time is available, few people other than information professionals have been trained in how to use a thesaurus. If a significant proportion of all the resources to be searched have inconsistent subject metatags, randomly applied or omitted, then searches will yield poor-quality results and the whole system falls into disrepute. And in any case, who is going to train all the end-users in how to search with a thesaurus?

'Make it all intuitive,' comes a reply over the airwaves. 'Build it in behind the scenes. Let technology take the strain. Anyway, who needs metadata? We mostly just use Google.' There is no doubt that technology can do a lot, and we need all the help we can get. But technology is not the whole answer. And intuitive interfaces can typically manage only simple operations.

The Office of the e-Envoy organised a workshop in May 2001, with representatives from 28 public bodies, to debate the options. While universal adoption of a full-scale thesaurus was thought impractical, the following conclusions were agreed:

  • rather than a traditional thesaurus, a simple high-level taxonomy should be developed, called the UK GCL (Government Category List);
  • application of GCL terms to subject metadata should be part of the mandatory e-Government Metadata Standard;
  • guidance materials should be provided;
  • toolkits are needed to make the application of metadata as easy and error-proof as possible.

These conclusions represent a compromise between the desirable and the achievable. The GCL that has subsequently been developed may be viewed on the GovTalk website. It has around 360 'preferred terms' and more than 1,000 lead-in entries. It is designed to serve as a tool for browsing and navigation, not for precision searching. It therefore needs to be complemented by other tools and facilities.
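The relationship between preferred terms and lead-in entries can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the way the GCL is actually stored or served: the terms and lead-in entries below are invented, and `resolve` is a hypothetical helper.

```python
# Invented sample data: neither the preferred terms nor the lead-in
# entries below are taken from the real GCL.
PREFERRED_TERMS = {"Housing", "Schools", "Road safety"}

LEAD_IN = {
    "council houses": "Housing",
    "accommodation": "Housing",
    "primary education": "Schools",
    "speed limits": "Road safety",
}

# One case-insensitive lookup table covering both kinds of entry.
_LOOKUP = {term.lower(): term for term in PREFERRED_TERMS}
_LOOKUP.update({entry.lower(): term for entry, term in LEAD_IN.items()})


def resolve(query):
    """Map a user's word or phrase to a preferred term, or None if unknown."""
    key = " ".join(query.lower().split())  # normalise case and spacing
    return _LOOKUP.get(key)
```

With this sample data, `resolve("Council houses")` leads to "Housing", while an unlisted phrase returns `None`. That gap is one reason a browsing tool of this kind still needs free-text searching alongside it.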

And what of the one door to knock on?

As we know all too well, there are already tens if not hundreds of portals and gateways offering access to at least part of what's wanted. One of them, UK Online,[2] does try to cover all UK public sector resources on the web. Aided by the interconnectivity standards, it already opens an access route to staggering quantities of public information and services. But the route beyond the doorway, though steadily advancing, is often far from easy or clear. With the ambitious aim of making it easy for anyone to find anything they want, progress is bound to be slow.

The biggest challenge of all is, and always has been, access by subject: how to match the query or concern in the user's head with the resources that could address it. Words, in their infinite variety, are an inadequate vehicle for precise searching or matching. But words are the preferred means of communication, almost every time. Classification codes, however dear to the cognoscenti, do not go down well with the layman and their grandmother. Attractive graphics are great but can only guide us so far, when the scope of the maze behind them is so wide and so deep.

The GCL likewise is only one small part of the answer. As a high-level browsing tool it will help people isolate a subsector of the total resources available. But that subsector will still be immense, and within it users will still want to use words to pick out the items they seek. To complement the GCL we need all the powerful aids to free-text searching that modern technology can offer, and we should not totally give up on the idea of a 'proper' thesaurus.

A thesaurus works best within a narrow subject domain, and where unified management allows quality control at the time of input. Quite a few government agencies and departments already have a thesaurus, or are developing one, designed specifically for their own sectors. They do not have to face the battle of managing indexing throughout the public sector. It is much more feasible for them to design and apply tools that work well for their own collections of resources and their target audience. Subject-specific websites can make good use of thesaurus-controlled metadata. So we have the prospect that precision searching will steadily improve on a variety of sectoral websites, gateways and portals.

Getting back to the one door, though, what happens when all the sectors are merged in UK Online and other cross-sectoral portals? As of this moment, the sectoral thesauri do not help because each one addresses only a small island in the resource pool. In any case, even where the thesauri exist, most have been in use for such a short time that few resources have been metatagged with them. After two or three years, though, there will be a substantial body of resources tagged with both the GCL and one or other sectoral thesaurus. The challenge for portal implementers will be to find ways of exploiting both types of subject metadata. This will not be easy, and concept-matching capabilities will never be perfect. But the challenge is there, and in a few years we shall see what ingenious solutions are devised.

Making it all work

The information professionals of many public bodies are already involved in implementation of the interoperability standards, including tagging with the GCL. Some have their own vocabularies to integrate, too. One of the challenges is to get the vocabularies integrated into the metatagging interface of web publishing and document management systems. Where there is more than one vocabulary, automatic mapping procedures are needed to minimise the work involved in tagging. Guidance materials on GovTalk give advice on mapping techniques, as well as on metatagging and maintaining a controlled vocabulary.
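As a rough illustration of what automatic mapping might look like, the Python sketch below derives high-level tags from the terms an indexer has already chosen in a sectoral vocabulary, so that the item need be tagged only once. It rests on assumptions: the mapping table, the sectoral terms and the category names are all invented, `derive_gcl_tags` is a hypothetical function, and the GovTalk guidance may well recommend a different technique.

```python
# Hypothetical mapping from an invented sectoral thesaurus to invented
# high-level categories. A real mapping would be compiled and maintained
# by the owners of the sectoral vocabulary.
SECTORAL_TO_GCL = {
    "Zebra crossings": "Road safety",
    "Speed cameras": "Road safety",
    "Housing benefit": "Housing",
}


def derive_gcl_tags(sectoral_terms):
    """Derive high-level tags from sectoral terms already assigned.

    Returns two sorted lists: the derived high-level tags, and any
    sectoral terms with no mapping, which need a human decision.
    """
    mapped, unmapped = set(), set()
    for term in sectoral_terms:
        target = SECTORAL_TO_GCL.get(term)
        if target is None:
            unmapped.add(term)
        else:
            mapped.add(target)
    return sorted(mapped), sorted(unmapped)
```

Returning the unmapped terms separately, rather than dropping them silently, keeps a person in the loop wherever the mapping table has gaps.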

We have a long way to go before we can be confident that information is flowing seamlessly from source to would-be consumer. Fixing the policies and standards at government level is a necessary first step, but not the complete solution. Making it all work relies on teamwork between information professionals and their IT colleagues. It is a challenge, and an opportunity, for us all.

References

1. The GovTalk website (www.govtalk.gov.uk/) hosts discussion forums as well as full details of the interoperability programme, including:

  • e-Government Interoperability Framework
  • e-Government Metadata Framework
  • e-Government Metadata Standard
  • GCL (Government Category List) and its index for downloading in PDF format or as a zipped set of HTML files
  • GCL maintenance guide
  • Guide to metatagging with the GCL
  • Specialised vocabularies and the GCL

As an alternative to downloading the GCL, visit http://www.govtalk.gov.uk/schemasstandards/gcl.asp to navigate it directly.

2. UK Online is at www.ukonline.gov.uk/. Try the Quick Find page to see how the GCL has been implemented there.

Stella Dextre Clarke is a consultant specialising in vocabulary tools such as thesauri and taxonomies. She has been helping the Office of the e-Envoy plan the Government Category List, in discussion with other government departments.

 

Updated: 12 April 2005