Susan Whitfield explains the challenges of managing a digitisation project across continents.
In 1993 the idea of sending hundreds of thousands of high-quality images quickly over the web was an ambitious vision. The first web browser had been launched and there were 50 websites available. The pessimists warned of the expense and limitations of storage, slowness of access, intractable copyright issues, and the transience and treachery of technology. Ten years on, the International Dunhuang Project (IDP) has defied them, offering a single-stop multilingual internet site giving information about, and high-quality images of, more than 50,000 Central Asian manuscripts, paintings and artefacts.
Two millennia before the development of the web, China had invented a technology that would similarly transform world civilisation – paper-making. By the fifth century when scribes started to copy Buddhist texts in Dunhuang, a Silk Road garrison town on the western borders of China, they used fine paper made from mulberry tree and hemp fibre. It was dyed yellow with bark extract, which acted as a water-repellent and insecticide. The sheets were glued together to form scrolls – sometimes 20 metres or more long – wrapped around a wooden roller and secured with braid.

Scrolls stacked up in cave 16; a secret opening leads to the Library Cave. Copyright: British Library
Dunhuang had been established by the Chinese administration as it pushed west into Central Asia, and the town remained under Chinese control for much of the first millennium. However, Dunhuang’s position on the Silk Road meant that monks, merchants, soldiers, diplomats and others came here from the other direction – from India and Persia – and brought their own languages, scripts and book formats. And during the first millennium, the empires to its south and north – the Tibetan and the Uighur respectively – were also to be influential.
Buddhist library
Dunhuang was typical of Silk Road towns but it is now unique because a Buddhist library was preserved here. Tens of thousands of manuscripts in Chinese, Tibetan and many other languages and scripts were stored inside a small cave temple at a Buddhist monastery site south east of the town. The site – the Mogao Caves – had been founded by an itinerant monk in AD 366. Others followed and by the eighth century around a thousand caves honeycombed a mile of cliff overlooking a small river and monastic complex. The Library Cave, as it is now known, was originally built as a memorial chapel for a local monk in the mid-ninth century. Later his statue was moved out and the cave was filled from floor to ceiling with manuscripts and several hundred ceremonial paintings. Sometime after AD 1000 the cave door was plastered and then painted over and forgotten.
In 1900 a monk carrying out conservation on the caves accidentally uncovered the doorway. He informed the authorities and presented some of the finest paintings and manuscripts to local officials but no one came to investigate. When Aurel Stein, the first of several foreign archaeologists to be drawn to Dunhuang, arrived in 1907 the monk was persuaded to sell many paintings and manuscripts. A year later he sold more to Frenchman Paul Pelliot, who showed some examples to Chinese scholars in Beijing on his way home. This prompted pressure on the Chinese government to dispatch an envoy to clear the cave of its remaining contents – some 10,000 manuscripts. However, whether the guardian monk had secreted some finds away, substituted ones from elsewhere or bought some from local forgers, over the next decade Russian, Japanese and British expeditions all acquired more manuscripts, purportedly from the Library Cave. And so the contents of the cave were dispersed across continents.
From microfilm to digital
At the time, this find was unique. Few scholars had experience of cataloguing East Asian primary documents and there were no standard methods for their conservation. Although the cave had preserved them well, much of the material was in fragmentary condition, and the sheer quantity posed problems for conservators and curators. Progress was further delayed by two world wars.
Accessibility was increased in the 1950s and 1960s with microfilming of the collections, and in the 1970s Taiwanese publishers produced a pirated print copy of all the microfilms. At the same time scholars and conservators from China started a series of visits to London which culminated in a 14-volume facsimile copy of non-Buddhist manuscripts in the British Library. But these represented less than 10 per cent of the total, and traditional printing was obviously not going to be able to address the conundrum of increasing access while ensuring preservation.
A conference was convened in late 1993 bringing together representatives from all the major holders. The following spring they founded the International Dunhuang Project (IDP) specifically to collaborate on conservation and cataloguing and to increase access by creating a comprehensive online catalogue of all the material, linked to high-quality digital images.
The World Wide Web in 1994 (1)
It is only just over 30 years since the first computer network was created. Twenty years ago there were still only 1,000 hosts. In the late 1980s and early 1990s, with the development of academic networks and ftp, the number of mailing lists and digital resources grew, and these included several relating to China. One of the first ftp archives of old texts was the Asian Classics Input Project, and the first Chinese text archive was established in 1991 (2). In 1992 the World Wide Web was invented. In 1993 the first browser became available, and 50 websites could be viewed. In 1994 the ANU Asian Studies Server was the 850th web server in the world. This was the environment in which IDP was conceived.
But by that time, anyone monitoring the progress of the internet and digitisation could see that any doubts about their capabilities would be addressed in the near future. The Dunhuang manuscripts were an ideal collection for a digital internet project: dispersed internationally yet forming a single collection; too numerous to reunite in any affordable and thus easily accessible print publication; their age and fragility limiting handling; and having relevance for a large body of scholars and others worldwide. Digitisation is an expensive option: it is still not generally recognised as an archival medium (as far as I know, the National Library of the Netherlands is still the only library to accept digital preservation) and so the most compelling argument for its use remains significantly improved access (3).
A year later, the Chiang Ching-kuo Foundation (CCK) awarded the British Library (BL) a three-year grant and I was able to start work on developing a database of the manuscripts in the BL’s collections. I was confident that digital imaging technology would improve radically, that relative costs of digital storage would decrease and that delivery of large files over the internet would become easier. All these were essential to make IDP feasible.
IDP has avoided the problem of access speed by offering the user three different-sized images: a reference thumbnail; a medium image, generally the size of the original; and a large image for viewing details. In addition, large originals are shot in sections, although a stitched image of the whole is available. All the images are colour-corrected against the original. In some cases infra-red and detail shots are taken.
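The three-size scheme can be illustrated with a short sketch. The Python below is only an illustration, not IDP's actual tooling: the pixel limits in TIERS and all function names are assumptions made for the example, and the code computes target dimensions only, without resampling any image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Size:
    width: int
    height: int


# Hypothetical pixel limits for the longer side of each derivative.
TIERS = {"thumbnail": 150, "medium": 1200, "large": 4000}


def fit_within(original: Size, max_side: int) -> Size:
    """Shrink `original` so its longer side is at most `max_side`.

    The aspect ratio is kept; images already small enough are returned as-is.
    Integer arithmetic (rounding down) keeps the result deterministic.
    """
    longest = max(original.width, original.height)
    if longest <= max_side:
        return original
    return Size(
        max(1, original.width * max_side // longest),
        max(1, original.height * max_side // longest),
    )


def derivative_sizes(original: Size) -> dict:
    """Return the target size of every tier for one master image."""
    return {name: fit_within(original, limit) for name, limit in TIERS.items()}
```

A user on a slow connection would then be served the thumbnail first and would request the larger files only when detail is needed.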
It was vital at the outset to choose software which was well-established, cross-platform, multilingual and affordable to all IDP’s members. It was also apparent that without being able to adapt to constant technical developments the project would fail to reach its potential. This was a period when digital data and software soon became obsolete and when new standards were constantly being introduced.
Work on the database begins
By 1997, with the database designed (based on Master (4) but with additional fields) and containing more than 20,000 records, we were ready to start digitising. Databases of good digital images were still rare on the web but a notable exception was the Huntington Archive based at Ohio State University, which had gone online in 1995 (5). John and Susan Huntington had an archive of tens of thousands of 35mm slides taken during their visits to Asia. These were scanned and made available online and the resulting images and metadata provided a benchmark for quality and good project design.
The start of digital imaging goes back much further than the web – to the early 20th century, with medical techniques. It was developed further from as early as the 1950s for television and by Nasa for satellite images. The first filmless electronic camera appeared in 1972 but it was not until the 1990s that the first professional and affordable digital cameras were produced. In 1997 there were none that matched the quality of film. The advantage of taking film and then scanning it was that the film could be rescanned easily in the future when technology improved. Scanners were comparatively cheap and 35mm cameras were already available to most institutions. We therefore started producing digital images from slides with a grant from the UK Heritage Lottery Memorial Fund, basing many of the standards and our digital naming system on the Huntington Archive’s guidelines. We decided to start this work on the group of fragmentary manuscripts from Silk Road sites other than Dunhuang because this was little-known material.
Limited funding
Although the Dunhuang manuscripts form the core of IDP, we realised the advantage of including all Eastern Silk Road (now Chinese Central Asia) manuscripts. Although the largest single group of manuscripts came from the Library Cave, tens of thousands on wood, leather, paper, silk and birchbark were found at other Silk Road towns and temples. These are all included within IDP’s remit, not only because access and preservation were equally at issue but also because this material was part of the same story. At the same time as we started digitising we began cataloguing non-Chinese material from the British Library, including Tibetan and Tangut, and adding these to the database. In 1998, with more than 1,000 images, the IDP web server went online.
IDP has always been an externally funded project and, without access to large amounts of funding, from the first it had to be extremely efficient and to show results to attract more funding (6). There was no money for web designers, imaging consultants, or full-time programmers. To make the project sustainable we chose off-the-shelf software and multilingual staff with multiple skills.
We also relied heavily on others’ expertise. I developed the first version of the database structure myself and, since then, I have worked closely with a database consultant whom we employ as and when necessary. His input, understanding of the project and continued involvement have been essential elements of IDP’s success.
We continue to maintain the database and website ourselves and all IDP staff can troubleshoot – sorting out basic problems without expensive computer support. But this is not to deny the need for professionals, most especially in imaging. Library staff are text-literate but rarely image-literate. Yet the availability of cheap scanners and image manipulation software has deceived many into believing they have the skills to prepare professional-quality images.
Ironically I am now grateful to CCK for giving us a smaller grant than requested in 1994. This enabled us to start slowly, build secure foundations, and learn from others. By the time we received a large grant from the Andrew W. Mellon Foundation in 2001 to expand our digitisation programme significantly we had a tried and tested infrastructure. Digital scanning backs for large-format cameras were becoming available at this time and we thus switched to direct digital capture using a Photophase FX on a 4x5 camera, producing much better images (7). Image databases were just starting to take off: Mellon funded the BL and other holders of Dunhuang material in order to acquire images for its online image resource, ArtSTOR, announced in 2001 (8).
IDP was, from the start, an international collaboration. Although the foundations were laid at the BL, the other partners continued to play an important and active role in hosting conferences and continuing their own conservation and cataloguing programmes. By 2000 with the infrastructure in place and much of the BL data available online we were ready to start to work together with other IDP members to incorporate their data.
This started slowly with the addition of three Dunhuang manuscripts from the Chester Beatty Library in Dublin, providing one model of collaboration, that of IDP London acting as a host for images and data from elsewhere. The second model was developed in 2001 when an IDP digitisation, cataloguing and research centre was established at the National Library of China (NLC), generously funded by the Sino-British Fellowship Trust. Local staff were trained and then expected to work at the same rate as IDP London staff, producing work in accordance with IDP procedures and standards (published online and constantly reviewed and updated) (9).
IDP’s primary audience is the scholarly community and we realised that to give genuine access to all interested scholars we would have to offer multilingual versions of the IDP online database. In 2001 we redesigned the website in preparation for the launch of a Chinese version hosted by the NLC, and we ran up against one technical issue that, despite my hopes, remained unresolved: multilingual environments.
Multilingual environments
Unicode was developed as early as 1987 to address precisely this issue, and Unicode standard 1.0 was published in 1992. I was confident in 1994 when IDP started that within a few years there would be no problems using multiple languages across applications, including on the web. I was wrong. What is surprising is that it has taken so long to reach Unicode 4.0, a genuinely useful system. There is still some way to go before using and displaying multiple languages in all applications becomes straightforward. We were lucky when setting up the Chinese web server to have the services of a database developer who was also knowledgeable about non-Roman scripts: this is not a common combination in Europe. The web database went online in November 2002 in a Chinese version, but the issues of encoding and displaying are not all resolved.
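The encoding problem described here can be seen in a few lines of Python. This is a sketch for illustration only: the sample string and the choice of the GB2312 and Big5 legacy encodings are assumptions made for the example and say nothing about how the IDP servers were actually configured.

```python
# "Dunhuang" in Chinese characters; chosen purely as a sample string.
text = "敦煌"

# The same two characters become different byte sequences under each encoding.
utf8_bytes = text.encode("utf-8")    # Unicode (UTF-8)
gb_bytes = text.encode("gb2312")     # legacy mainland Chinese encoding
big5_bytes = text.encode("big5")     # legacy Taiwanese/Hong Kong encoding


def decode_as(data: bytes, codec: str) -> str:
    """Decode bytes with the given codec, substituting undecodable bytes."""
    return data.decode(codec, errors="replace")


# Reading bytes with the wrong codec does not fail loudly; it yields wrong text.
misread = decode_as(gb_bytes, "big5")

print(len(utf8_bytes), len(gb_bytes), len(big5_bytes))
print(misread == text)
```

A browser or database that guesses the wrong legacy encoding shows the user garbled characters in exactly this way, which is the class of problem a single shared standard such as Unicode is meant to remove.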
But IDP has the potential to reach a much wider audience than just the scholarly community and we made preliminary attempts to offer more accessible information through our education or ‘Special Topics’ pages. In 2000 we designed a map interface which enabled those without specific language skills to search for archaeological sites, historical photographs and excavated items using maps. This has proved very popular.
Subject search
In addition, although we have a catalogue search page where scholars can go directly to specific manuscripts/artefacts, we also included a subject search so that those without any prior knowledge could, for example, look at all manuscripts written in Tibetan which concern Buddhism, or in Chinese concerning camels.
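A subject search of this kind amounts to filtering catalogue records on language and subject together. The sketch below is a hypothetical illustration rather than the IDP database design: the Record fields, the sample entries and the function name are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str        # hypothetical identifier, not a real IDP pressmark
    language: str
    subjects: frozenset


# Invented sample data standing in for catalogue entries.
CATALOGUE = [
    Record("EX.1", "Tibetan", frozenset({"Buddhism"})),
    Record("EX.2", "Chinese", frozenset({"camels", "contracts"})),
    Record("EX.3", "Chinese", frozenset({"Buddhism"})),
]


def subject_search(records, language=None, subject=None):
    """Return the ids of records that match every criterion supplied.

    A criterion left as None is ignored, so a user with no prior
    knowledge can browse by subject alone or by language alone.
    """
    hits = []
    for record in records:
        if language is not None and record.language != language:
            continue
        if subject is not None and subject not in record.subjects:
            continue
        hits.append(record.record_id)
    return hits


print(subject_search(CATALOGUE, language="Chinese", subject="camels"))
```

Because every criterion is optional, the same function serves both the specialist who knows what to ask for and the newcomer who starts from a single broad topic.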
Once we had developed the model for synchronising data between servers in China and Britain, we were ready to expand our international collaboration. Russian and Japanese IDP Centres were established in 2004 at the Institute of Oriental Studies in St Petersburg and Ryukoku University in Japan. The full IDP web database will be accessible in Russian and Japanese versions in 2005. We hope to have a German version online by 2006.
At the same time IDP has expanded its remit to include paintings and artefacts from Dunhuang and the Eastern Silk Road. One of the strengths of the web is that it can cut across traditional disciplinary boundaries. The distinction between art historical studies of the Dunhuang paintings and historical studies of the manuscripts is unnecessarily restricting, and the 2002 IDP database therefore incorporated the Dunhuang paintings held in the British Museum and the Freer Gallery. We are now adding images of artefacts from the British Museum and textiles from the Victoria and Albert Museum, all found at the same sites as the manuscripts. We hope to conclude agreements with the other major holding institutions over the next few years.
From a staff of one and an idea in 1994, we now have a genuinely collaborative project with a website which received more than 15m visits from more than 83,000 distinct hosts in 2004 alone. And all this has been achieved on a total budget of less than £1.5m. At the current work rate and with the collaboration of the final few major holders, we will be able to offer access to data on 90 per cent of the Dunhuang material, with images of more than 75 per cent, as well as a good proportion of other Silk Road material, by 2010.
The pessimists are still among us, their greatest concern being the long-term preservation and migration of digital data. My belief is that there is simply too much digital data around for these issues not to be resolved in the near future, but perhaps by the military and government rather than the library sector. In addition, once the collection is digitised we can always take advantage of advances in microfilming technology to migrate the colour images directly on to microfilm thus providing an accepted library archive.
My main concern, apart from the perennial fundraising, is to ensure that IDP continues to be an adaptable, collaborative and high-quality project so that it can provide a relevant, functional and friendly service for a growing body of users.
References
1 I am indebted to Matthew Ciolek for this information, taken largely from his indispensable website on internet Asian resources based at the Australian National University (http://coombs.anu.edu.au/). A timeline is given at http://coombs.anu.edu.au/asian-studies-timeline.html
2 More than 1,500 websites are now listed in the Internet Guide for Chinese Studies (http://sun.sino.uni-heidelberg.de/igcs/).
3 See my paper ‘Navigating through uncharted territory: IDP, an international internet digitisation project’ (http://idp.bl.uk/chapters/publications/IDP_papers/advcttee.html) for a discussion of this, especially the section ‘The perils of digitisation’.
4 For an introduction to Master see http://xml.coverpages.org/masterGentintr.html. To download a pdf of the IDP database structure as of 2004 see http://idp.bl.uk/chapters/publications/IDP_papers/dbaseDesign.pdf
5 http://kaladarshan.arts.ohio-state.edu/Default.html
6 Total external funds spent on IDP in its first decade amounted to about £1.5m, but expenditure in the first five years was less than £250K. A list of supporters is given on http://idp.bl.uk/chapters/about_IDP/idpintro.html#funding.
7 ‘The FX boasts the largest capture area of any digital camera at 8.4x10cm.’ For more detail, see ‘Pro digital cameras: the high end digital market’, Shutterbug June 2001 (www.shutterbug.net/features/0601sb_pro/).
8 www.artstor.org
9 Available in shortened URL form or in full form via a downloadable pdf on http://idp.bl.uk/chapters/publications/IDP_papers/standards.html
Dr Susan Whitfield (susan.whitfield@bl.uk) is Head of the International Dunhuang Project at the British Library (http://idp.bl.uk).
Updated: 14 November 2008