Copyright exception for text and data mining

I <3 DATA MINING by Jason Cale

Today’s computers are capable of processing large quantities of data quickly and efficiently. This has focused some attention on ‘big data’, ‘open data’, and, more recently, artificial intelligence. What do these capabilities mean more fundamentally for library and information collections?

Broadly termed, ‘text and data mining’ (TDM) refers to the processing of information to ‘discover patterns, trends and other useful information that cannot be detected through usual “human” reading’. Much of this is about ‘speed reading’: allowing a computer to sift through the text of thousands of books or millions of datapoints faster than any human. Anyone who has used the Google Books Ngram viewer has seen this in practice: the viewer queries Google’s sizeable corpus of digitised books to draw out instances of selected words, showing usage trends over time.
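To give a flavour of what this kind of ‘speed reading’ looks like in practice, here is a minimal sketch in Python that counts how often a word appears in a tiny, made-up corpus, grouped by year. It is only an illustration of the general idea and not how the Ngram viewer itself is built; the function name `usage_by_year` and the sample texts are assumptions invented for this example.

```python
import re
from collections import Counter


def usage_by_year(corpus, term):
    """Return {year: share of all words in that year that match `term`}.

    `corpus` is a list of (year, plain_text) pairs. Both the function and
    the data format are hypothetical, chosen only for illustration.
    """
    hits = Counter()    # occurrences of the term, per year
    totals = Counter()  # total number of words, per year
    for year, text in corpus:
        words = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(words)
        hits[year] += words.count(term.lower())
    # Skip years with no words at all to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# A made-up corpus standing in for thousands of digitised books.
corpus = [
    (1900, "the telegraph and the railway"),
    (1900, "a railway timetable"),
    (2000, "the internet and the computer"),
]

print(usage_by_year(corpus, "railway"))  # {1900: 0.25, 2000: 0.0}
```

A real project would do the same thing across millions of pages, which is why the works first need to be available in a plain, machine-readable form.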

To enable this processing, content often needs to be transformed into a ‘mineable’ format, which may be as simple as turning it into plain text. Prior to 2014 there was no specific legal mechanism in the UK to enable copyright-protected works to be so transformed. Anyone wishing to ‘mine’ in-copyright works would almost certainly have needed to obtain explicit permission from the copyright owner(s) first. Given the scale at which text and data mining operates, this was a barrier to research that involved material from much of the past century (at least). Growing interest in securing a legal framework to enable text and data mining culminated in the recommendations of the Hargreaves review of copyright in 2011 and the resulting implementation in 2014 of a copyright exception for non-commercial computational analysis.

This post digs into the specifics of this still relatively new exception, looking in particular at how libraries and other information organisations may wish to interact with it. As a JISC report found in 2012 (when advocates were in the midst of pressing the UK for this exception in the wake of Hargreaves), ‘Text mining presents an opportunity for the UK, encouraging innovation and growth through leveraging additional value from the public research base.’ Now that the UK has a text and data mining exception, information professionals should ensure they have an understanding of how it can and should be used to benefit research and the development of new knowledge.

What the law says:

29A Copies for text and data analysis for non-commercial research

(1) The making of a copy of a work by a person who has lawful access to the work does not infringe copyright in the work provided that—
    (a) the copy is made in order that a person who has lawful access to the work may carry out a computational analysis of anything recorded in the work for the sole purpose of research for a non-commercial purpose, and
    (b) the copy is accompanied by a sufficient acknowledgement (unless this would be impossible for reasons of practicality or otherwise).
(2) Where a copy of a work has been made under this section, copyright in the work is infringed if—
    (a) the copy is transferred to any other person, except where the transfer is authorised by the copyright owner, or
    (b) the copy is used for any purpose other than that mentioned in subsection (1)(a), except where the use is authorised by the copyright owner.
(3) If a copy made under this section is subsequently dealt with—
    (a) it is to be treated as an infringing copy for the purposes of that dealing, and
    (b) if that dealing infringes copyright, it is to be treated as an infringing copy for all subsequent purposes.
(4) In subsection (3) “dealt with” means sold or let for hire, or offered or exposed for sale or hire.
(5) To the extent that a term of a contract purports to prevent or restrict the making of a copy which, by virtue of this section, would not infringe copyright, that term is unenforceable.

Breaking it down a bit

This exception allows ‘a person’ who has ‘lawful access’ to a work to make a copy of that work so that ‘a person’ who has ‘lawful access’ to that work can ‘carry out a computational analysis of anything recorded in the work’ for non-commercial research purposes.

The exception also allows for any contract term that would ‘prevent or restrict’ the making of such a copy to be ignored (ie because it is ‘unenforceable’ - see this previous post about contract override in copyright law).

The exception is subject to a number of requirements. These are:

  • the maker of the copy must have lawful access to the work,
  • the person undertaking the computational analysis must have lawful access to the work,
  • copies may only be made for enabling computational analysis for non-commercial research purposes,
  • copies must be accompanied by sufficient acknowledgement, other than where this would be impossible, and
  • copies must not be transferred to any other persons or used for other purposes (unless the copyright owner(s) give permission).
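For readers who find a checklist easier to follow in code, the requirements above can be restated as a short Python sketch. This is purely an illustrative paraphrase of the bullet points and not a legal test: the class `ProposedCopy`, its field names, and the function `unmet_requirements` are all hypothetical, and real situations will turn on facts that a handful of yes/no flags cannot capture.

```python
from dataclasses import dataclass


@dataclass
class ProposedCopy:
    """Hypothetical yes/no summary of a planned copy (illustration only)."""
    maker_has_lawful_access: bool
    analyst_has_lawful_access: bool
    for_noncommercial_computational_analysis: bool
    acknowledgement_given: bool
    acknowledgement_impossible: bool = False
    transferred_or_used_otherwise: bool = False
    owner_permission_for_that: bool = False


def unmet_requirements(copy):
    """List which of the bullet-point requirements are not satisfied."""
    problems = []
    if not copy.maker_has_lawful_access:
        problems.append("maker lacks lawful access")
    if not copy.analyst_has_lawful_access:
        problems.append("analyst lacks lawful access")
    if not copy.for_noncommercial_computational_analysis:
        problems.append("not for non-commercial computational analysis")
    # Acknowledgement is required unless it would be impossible.
    if not (copy.acknowledgement_given or copy.acknowledgement_impossible):
        problems.append("no sufficient acknowledgement")
    # Transfer or other use is only acceptable with the owner's permission.
    if copy.transferred_or_used_otherwise and not copy.owner_permission_for_that:
        problems.append("transferred or used otherwise without permission")
    return problems


# Example: a librarian copies a work for a user who also has lawful access.
library_copy = ProposedCopy(
    maker_has_lawful_access=True,
    analyst_has_lawful_access=True,
    for_noncommercial_computational_analysis=True,
    acknowledgement_given=True,
)
print(unmet_requirements(library_copy))  # []
```

An empty list here simply means none of the five bullet points has been flagged; it says nothing about how a court would assess the underlying facts.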

Maker versus analyst

This exception allows ‘a person’ to make a copy of a work so that ‘a person’ may undertake computational analysis. There is no requirement that the person who makes the copy must be the person who undertakes the analysis: the maker doesn’t necessarily need to be the analyst. The exception permits one person to make a copy so that they or another person may undertake computational analysis with that copy, so long as the other requirements of the exception - such as lawful access - are met. If the maker is intending to rely on this TDM exception to justify their copying, they do need legitimately to be creating the copy so that a person with lawful access may carry out computational analysis.

Suppose two people - Person A and Person B - both enjoy lawful access to a work - Work X. The exception permits Person A to make a copy of Work X so that Person A or Person B can undertake computational analysis of anything recorded in the work. Person A does not necessarily need to be the analyst, even if they are the maker.

To take a similar example, suppose a librarian has lawful access to Work Y, because Work Y is held in the library’s collection. The librarian can make a copy of Work Y under this exception so that a library user, who also enjoys lawful access to Work Y by virtue of being a member of the library, can undertake computational analysis of anything recorded in the work.

Copies made under this exception may not be transferred to ‘any other person’ or used for any purpose other than computational analysis for non-commercial research. In the example of Person A and Person B, the copy of Work X may be transferred by Person A to Person B, so long as the non-commercial analysis purpose remains. This can happen because both Person A and Person B have lawful access to Work X. The copy may not, however, be transferred to a person who does not have lawful access to Work X (for example, Person C) or to a person who does have lawful access but does not intend to use the copy for computational analysis for non-commercial research. Naturally, these acts may be undertaken if the copyright owner(s) give permission.

It is also worth noting that this exception relates to persons who have ‘lawful access’ to a work. The exception does not discuss ownership of the work (as, for example, s. 31A(1)(a) discusses ‘lawful possession’ in respect of creating accessible copies of works). In other words, it should be sufficient for the party or parties involved in creating and using copies under this exception merely to enjoy lawful access to the original, and it is not necessary for the work in question actually to be owned by the party(ies). Again, an obvious benefit here relates to the users of libraries and archives, who will enjoy lawful access to the works within relevant library and archive collections, but of course will not exercise ownership over those works.

To date, discussion of this exception has tended to focus on subscription materials, mainly those held by universities. However, the exception is not limited to subscription content. The wording is framed to allow use of any copyright-protected work for the specified purpose. Therefore, the exception covers material in any format, for example the content of a web page or a printed book.

Contract terms and technical protection measures

This exception benefits from the laudable ‘contract override’ clause that was applied to a number of UK copyright exceptions in 2014. This clause is hugely important, and ensures that a copyright owner cannot use contract terms to prevent or restrict a person from benefiting from an otherwise valid exception. For example, a copyright owner cannot enforce a contract clause that seeks to prohibit the purchaser of a work from creating copies for the purpose of non-commercial computational analysis.

So long as you meet the requirements of the exception (eg in respect of non-commercial research, lawful access, and no transfer of copies) you do not need to check whether you are compliant with any relevant contract terms before benefiting from this exception, save in one respect. The exception places clear limits around ‘lawful access’ to works. Making a copy of a work that you do not have lawful access to, or enabling someone to access a copy if that person does not have lawful access to the original, is not permitted and would not be a lawful use under the exception.

Therefore, you should check any relevant contract, purchase, or licence terms to clarify who has lawful access to the original work before making use of this exception. For example, contract terms that limit access to the original work only to members of staff of your organisation, and not students, would mean that you cannot allow a student to undertake computational analysis using a copy of the work under the exception, because the student does not enjoy lawful access to the original work.

Contract terms are one way of limiting use of works and exercise of the exceptions to copyright. Another way is through technical protection measures (TPMs). TPMs place technical or operational barriers on the use of a work or the exercise of some other function. CAPTCHA is a common access barrier frequently encountered online. This technology limits access to material or the exercise of functions until a series of characters, symbols, or shapes has been deciphered and typed out, with the aim of confirming that the user is a human.

The Copyright, Designs and Patents Act 1988 does contain a remedy for situations where TPMs prevent permitted acts (s. 296ZE). However, this mechanism is clunky: for a TPM issue to be resolved, a complaint must be made to the Secretary of State, through the Intellectual Property Office (IPO). In 2015 LACA made a complaint to the IPO regarding the use of CAPTCHA in respect of the exercise of the text and data mining exception. We have posted previously about the complaint here.

In response to LACA’s complaint, the IPO concluded that, because the material that was being restricted was available online under particular contract terms, the narrow wording of the remedy clause meant the material was out of scope and that the IPO was therefore unable to oblige the copyright owner to remove or curtail the technical barriers.

In short, while the contract override clause helpfully ensures that contract and legal terms cannot stymie the exercise of certain copyright exceptions where otherwise lawful, the UK’s copyright laws are insufficient in terms of TPMs where the material in question is licensed and made available online. The law fails to prevent TPMs from restricting the otherwise-valid use of exceptions. It is important to be aware that TPMs may be used, intentionally or otherwise, to prevent or restrict the making of copies under the exception for computational analysis. However, the proposed Digital Single Market Directive, which is currently being discussed in the European Parliament, may partially tackle this issue. While not removing the right of content providers to protect their content using TPMs, the proposals aim to apply pressure so that such restrictions may only be used when there are legitimate fears of unlawful activity.


The text and data mining exception offers a potentially valuable tool for UK users of copyright-protected works, as well as for information-supplying organisations, such as libraries and archives. In particular, this exception should put libraries and archives in a position to enable users to undertake computational analysis of material recorded within works held in collections. Although contract terms cannot prevent the making of copies for this purpose, contract terms may still be relevant in relation to defining who has lawful access to the original work. Likewise, TPMs may continue to present practical barriers to the proper use of this exception.


•    Copyright User provides a useful overview of this exception, in particular the role of database rights and contract law:

•    Future TDM expert reports

•    Copyright, Designs and Patents Act 1988, s.29A

Image reference: I <3 DATA MINING photographed by Jason Cale on Flickr, Attribution-ShareAlike 2.0 Generic, cropped and resized.
