
Q&A with US Government Information Librarian at Stanford University

19 June 2025  
Posted by: Rob Mackinlay

Donald Trump signs executive orders

US Government Information Librarian at Stanford University, James R Jacobs, answers questions about how librarians and researchers in the US are saving information and data from Donald Trump’s administration.

James answered several questions for an article in Information Professional (June 2025) about the impact of Trump’s executive orders on research. He provided detailed answers that also highlighted the threat to democratic society and many links to data rescue projects.

James R Jacobs

IP: What is your role as a US Government Information Librarian?

JRJ: I'm the US Government Information Librarian at Stanford University. I have a colleague who covers state, local, and international government information and we both sit within the Social Sciences Resource Group which includes librarians for other social science subjects (economics, political science, anthropology, communication, sociology, journalism, psychology).

Many academic libraries have librarians on staff that support government information, but I am rare in that my sole focus is government information. Many librarians have govinfo as a small percentage of their overall job duties (which sometimes include subject duties like political science or history, technical services and cataloguing, reference and instruction and the like).

My job, like the others in my unit, is to do research support and collection development. Historically, government information has been published primarily in paper format, and we have long received reports and documents through the Federal Depository Library Program (FDLP), which has been in existence since 1813. Stanford has been a member since its beginning in 1895.

While most of the other librarians in my unit support specific departments and research centers, I'm a "free agent" so to speak. Since govinfo/govt data touches on disciplines across the university, I receive research consult requests from a diverse set of students, researchers, and faculty, from history and political science to business, law, medicine, engineering, computer science etc. I also advise researchers with Freedom of Information Act (FOIA) requests, the process by which the public in the US can request information and records that are not publicly accessible.

IP: Could you give some details about the data saving projects that you work on and how you are involved in them?

JRJ: With the advent of the internet, much govinfo/data has shifted and is being published online via government websites and databases. So my collection development work has shifted online as well. I started the LOCKSS-USDOCS program 16 years ago to provide a collaborative preservation network focused on preserving the digital publications produced by the Government Publishing Office (GPO) - which used to be called the Govt Printing Office.

The concept is simple: just as GPO distributed paper documents to the 1100+ libraries around the country, I felt that it was important to continue doing "digital deposit". A network of libraries provides a failsafe for disappearing information and a network of access points to the public.

It's a shame that LOCKSS-USDOCS only collects content from GPO's content management system and not from across the 440+ executive branch agencies and commissions. There's a law called Title 44 of the US Code which requires agencies to send their publications to GPO for distribution to the FDLP, but with the advent of the internet, most agencies ignore that law, so you'll see that govinfo.gov includes only a small amount of content from the executive branch. If agencies would follow the law, much more content and data would be hosted on GOVINFO and preservation would be assured in a distributed and collaborative way to assure free access for the long term.

Which brings me to End of Term Archive. I began participating in EOT in 2008 and became an official partner in 2012. EOT does a broad and deep snapshot of the federal .gov/.mil web domain every 4 years.

Conceptually, EOT goes hand in hand with LOCKSS-USDOCS as it offers me a way to collect and preserve all of the web-published government information that doesn't make its way to GPO (which in depository library lingo are called "unreported" documents).

EOT is NOT a political or partisan project. We have done our work since 2008 during both Republican and Democratic presidential terms. It is done to assure the curation and preservation of federal government information.

EOT does industrial sized web harvesting of the US federal web domain (we collected about 300TB of data in 2020 and will be well above 1 PB for our 2024 efforts).

It's important to disambiguate the term "data". ALL information online is technically "data," ie zeros and ones that are machine readable. The "data" that EOT collects includes web pages and files linked off of those web pages, including PDF reports, spreadsheets of numeric data etc. But most of the coverage in the press talks about data in terms of "scientific" numeric data collected by federal agencies as part and process of the output of science. For example, this NOAA fisheries survey that I just saved in the Stanford Digital Repository yesterday at the request of a researcher.

It's all data all the way down. But I think that the media coverage often conflates and misunderstands the meaning of "data" and therefore misses the point on what's important and what each of the main projects is concerned with.

The 3 main projects focused on collecting and preserving federal government information and data are targeting different pieces of that overall information ecosystem.

EOT is collecting the context surrounding the work of the federal government (which includes web pages, official reports and publications, AND distinct data sets).

Data Rescue Project (DRP) is a group of (mostly) data librarians looking to collect and preserve distinct data sets and depositing them in datalumos.org (a project of the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan, a social sciences research data archive in existence for over 60 years).

The other main project is Public Environmental Data Partners (PEDP), a volunteer "coalition of several environmental, justice, and policy organizations, researchers across several universities, archivists, and students who rely on federal datasets and tools to support critical research, advocacy, policy, and litigation work." PEDP is organized by the Environmental Data and Governance Initiative (EDGI), which is targeting environmental data and has done some really good work extracting data from web apps (like mapping visualizations etc) and replicating them outside of the .gov domain.

There is much human overlap between these 3 groups (eg I participate in all 3). PEDP/DRP people frequently submit URLs for EOT to harvest, and different groups have listed the data that they have collected on either/both of the PEDP/DRP data trackers.

There are other groups and individuals working in this space as well, some of which is fed into the 3 main projects and some of which is just distributed over bittorrent.

Reddit channel of 840k+ members who collect/hoard big data.

Safeguarding Research and Culture, a research project run by Henrik Schöneman at Humboldt University in Berlin. Henrik also works closely with DRP. (Henrik and colleagues at SRC also answered some questions for IP June 2025.)

The Data Liberation Project, a project run by the Big Local News project out of the Stanford school of journalism.

IP: Your role has given you a good view of the impact of Donald Trump’s executive orders – what insights has this given you about the interaction between politics and information preservation?

JRJ: It's important to distinguish between my role in EOT (which is a non-political, non-partisan project running since 2008 during both Republican and Democratic administrations), and my personal political views. Federal information policy is political by default, since that's how policy is made. The library community (through groups like American Library Association (ALA), American Association of Law Libraries (AALL), Association of Research Libraries (ARL) etc) has long advocated for public policy that supports libraries and public access to government information.

Given that, almost all of Trump's executive orders are illegal on their face because they abrogate the separation of powers set out in the US Constitution. Unfortunately, this administration is "flooding the zone" and doing its illegal actions very quickly and the US legal system - which has always been slow! - is in the midst of an unprecedented political stress test.

IP: Do you have a picture of how important a role US government data plays in research globally?

JRJ: The importance of the US government's statistics/data gathering system to the world cannot be overstated. Countries, researchers, scientists, students and the general public around the world use and rely on US government information and data. So the loss of staff across all of the federal data gathering agencies will be felt around the world for many years to come.

One small anecdote to highlight this: last week, a researcher contacted me looking for data specific to Cambodia, including demographics and other variables down to the province/district level. The US Agency for International Development (USAID) had for many years collated these data. Because the Trump administration shut down USAID and fired all of its staff, the data for all countries, not just for Cambodia, have been lost. Not only that, but IGOs like the United Nations (UN) point to and rely on the USAID data on data.un.org. So it's clear that people and governments around the world rely on the information and data produced by federal agencies.

IP: And about the role of universities and university librarians in preserving data that is threatened by their own governments?

JRJ: As noted earlier, the FDLP has been around since 1813, and libraries have long had a role in preserving and giving public access to government information (publications AND data). However, the advent of the internet has eroded libraries' traditional collection and preservation role. One of the key tenets of the FDLP is to provide a preservation buffer for public information. This administration's attack on public information and the public record has spurred many librarians and others to pitch in toward the work of EOT, DRP, PEDP etc.

As a side note, my colleague and I just wrote a book, "Preserving Government Information: Past, Present, and Future" (https://freegovinfo.info/pgi), that among other things maps out and advocates for a collaborative Digital Preservation Infrastructure to continue the historic work of the FDLP.

IP: Should a government have a right to delete/edit/change data like this – or do you think Trump’s actions have highlighted a legal loophole?

JRJ: Data and information change over time. Policies are superseded, data are re-analyzed and aggregated. So as part of a government's work, that is expected. In the US context, those changes, edits, deletions are documented in any number of ways as part of the documentary process.

This administration has not found a legal loophole; they are acting illegally and in bad faith, disregarding the necessity to democracy of public access to information about what the government is doing in the public's name at any point in time.

IP: Is there transparency about the process the administration uses for selecting data that it wants to remove and who is responsible for doing it?

JRJ: This administration's actions are seemingly random, but always malicious and capricious and the exact opposite of "transparent" (we really are living in Orwell's 1984!). Yes, they are working off a list of words that this administration doesn't like or agree with. But it is causing things like the deletion from the Dept of Defense website of an image of the WWII airplane called "Enola Gay" because some automated AI web search found the term "gay." Their ineptitude would be laughable if it weren't so damaging.

IP: Does this affect all government departments and are some more zealous than others in removing data?

JRJ: I'm not a federal employee, but yes I assume federal employees and entire agencies are acting in both positive and negative ways to continue to work and exist. I have seen many agency websites with notices posted that they are in the process of analyzing their sites in order to comply with executive orders. And many agency heads are being replaced with people who are allied with the Trump administration and unquestioningly loyal to President Trump.

IP: How much data is being removed – is there any way you can describe the pace of removal that might make sense to someone who doesn’t know about these things?

JRJ: I don't know if I can quantify the data loss. There is a system-wide attack on science writ large. There have been specific datasets targeted for deletion -- for example, NOAA's database tracking Weather and Climate Disasters, USDA and the US Forest Service's data including on federal funding and loans, climate risk, forest conservation, and rural clean energy projects (several farmers groups recently won their lawsuit to restore climate-related data).

And there has been wholesale destruction of the infrastructure of data collection through the blocking and/or complete cancellation of many billions of dollars in grant funding from NIH, NSF, DoD, DOE, Dept of Education, IMLS etc (https://www.appropriations.senate.gov/trumps-funding-freeze), and the firing of federal employees across the executive branch, if not complete erasure of entire agencies and sub-agencies like USAID and CDC. It's important to note the illegality of all of this; government data gathering is generally something that is required by specific laws and regulations.

Kelly Smith, a librarian friend of mine at UC San Diego, is managing a really great "trump tracker" site https://ucsd.libguides.com/usgov/trumptrackers.

IP: Some of our members are particularly worried about PubMed – how at risk is it?

JRJ: PubMed is an indexing and abstracting service of the fields of health and medicine similar to the ERIC database of education research supported by the Dept of Education. The service points to scholarly journal literature and other materials produced by scientists and researchers within the government as well as those at non-governmental academic research labs and institutes (much of which is federal grant funded).

So the question isn't really about the availability of PubMed (there are other medical/health related indexing and abstracting databases that will continue to exist), but about the attack on scientific research that will stop funding for important medical/health research going forward.

IP: When they are deleted/removed are the data sets destroyed, or just no longer published on government websites?

JRJ: It's hard to say for sure. Some are definitely deleted. Some are archived outside the government (like the Internet Archive), some have been recreated (eg https://www.restoredcdc.org/www.cdc.gov/). The real danger is for the publishing and access of government information and data going forward.

IP: In the article you describe ‘data blackholes’ where stopping collecting data over a short period undermines the long term value of the data. Are there any good past examples that could illustrate this?

JRJ: I don't have any past examples as what this administration is doing is unprecedented. Access to government information has always been contentious in a way (see https://freegovinfo.info/less_access), but not to any degree close to what this administration is doing.

Imagine if longitudinal surveys like the 50-year-old General Social Survey (https://gss.norc.org) funded by NSF were suddenly defunded. GSS has long tracked American demographic, behavioral, and attitudinal questions, plus topics of special interest like "civil liberties, crime and violence, intergroup tolerance, morality, national spending priorities, psychological well-being, social mobility, and stress and traumatic events."

This trend data is incredibly valuable for many social science and other disciplines and so a gap of even 1 year could really affect the data and the scientific research that uses the data for its analysis. This is one of many examples of data across the government that needs ongoing collection efforts.

IP: Do you feel that the government is (or will be) making any effort to prevent organisations like yours from gathering or saving the data it deletes? Is the data you have saved now safe?

JRJ: The government information (which as I stated earlier *includes* numeric data) that EOT has collected is safe. We have replicated it in AWS and Common Crawl (a 501c3 non-profit that does large web harvesting for research purposes), and Internet Archive's data are housed in SF, Toronto, and Alexandria, Egypt.

I am not fearful of reprisals from the govt for the work that EOT and the other data projects are doing. However, there is a general fear running through academia about budget cuts and job losses as the administration has attacked and slashed grants and federal funding to academic institutions like Columbia, Harvard, Johns Hopkins and others. If my institution were to get its funding slashed, there could be job losses of 10-30%. That would be catastrophic.

IP: How well organised/coordinated and funded are the organisations like yours that are trying to save the data and tools? Are commercial organisations doing it too?

JRJ: EOT and the other data groups are very well organized and coordinated through volunteer efforts of all of the participants. Until this year, EOT had never received any funding for EOT-specific work; it's been all in-kind work of the participants with infrastructure and technology served gratis by Internet Archive. However, this time around, the Environmental Data and Governance Initiative (EDGI) -- a participant in EOT as well as coordinating the PEDP -- received a foundation grant for its work and some of that money was sent to Internet Archive for EOT work. But none of the EOT participants have received any direct $$ to work on the project.

IP: Is there anything you think researchers and librarians in places like the UK should be doing to help?

JRJ: Access to government information should adhere to the FAIR principles (Findable, Accessible, Interoperable, Reusable). So anything - however large or small - that researchers and librarians can do to further a FAIR public record would go a long way towards assuring long-term access to government information.

That could be submitting seed URLs to the U.S. Government Web & Data Archive 2025 (linked off of our site) for webpages and content in need of preservation, letting DRP and PEDP know of data sets that need to be archived and/or joining those efforts, donating to the Internet Archive and the other data projects (DRP - Help us keep our data afloat; EDGI - donate), and making sure that one's own research data are archived in open data repositories like ICPSR, UK Data Archive, Zenodo (hosted by CERN) and the like.

You might also be interested in reading at least the conclusion of FGI's book and help us advocate for an open and collaborative "digital preservation infrastructure for the comprehensive preservation of government information."


Published: 25 April 2025


