Technology analyst and author of Facet’s The AI and Data Revolution: Understanding the New Data Landscape, Martin De Saulles, explores the potential of AI in the knowledge and information professions, and the need for a human/technology/information triumvirate to ensure success.
WHETHER we like it or not, AI is going to transform many aspects of our professional lives. Despite its current flaws and limitations, the technology in its various forms is reshaping how we find, interpret, manage and deploy information. In March 2025, Gartner predicted that global spending by organisations and individuals on generative AI (GenAI) technologies and services would reach $644bn, almost doubling 2024’s figure1.
While businesses are expected to abandon 30 per cent of AI projects during 2025,2 that still leaves a majority carrying on with their deployments as they learn what works and what doesn’t.
"As the technology becomes cheaper and more accessible, most organisations will have access to the same tools so human creativity and ingenuity will become more highly valued."
As a point of clarification, I use the terms data and information interchangeably in this article to avoid some of the semantic subtleties inherent in the way AI ingests inputs and exhales outputs. However, the key point I want to make is that we are witnessing a paradigm shift in the relationship between humans, technology and information.
Going back to the invention of the metal moveable-type printing press in the 15th century and up to the rise of mass mobile computing almost 20 years ago, new technologies have incrementally distanced humans from the creation, analysis and distribution of information. The digital revolution in the second half of the twentieth century accelerated this as information could be replicated perfectly and at scale with the internet enabling cheap and mass distribution of data.
However, it is the current AI revolution that presents the largest leap in this evolution. The black box we call GenAI can now create new information in text, audio, image and video formats that previously did not exist. Autonomous agents (Agentic AI) have the potential to make decisions on our behalf behind the scenes, sourcing, interpreting and sharing information out of sight.
This article considers some of the opportunities offered by AI for information professionals as the data fuelling this revolution becomes ever-more valuable. AI is data-driven so individuals experienced in sourcing, organising and retrieving information and who understand the value that data has to achieving an organisation’s objectives are in an excellent position to succeed in this rapidly developing landscape.
For the reasons outlined above, it is inevitable that some jobs and roles will disappear but new ones will emerge. As the technology becomes cheaper and more accessible, most organisations will have access to the same tools so human creativity and ingenuity will become more highly valued.
The commodification of large language models (LLMs)
In January of this year, the Chinese startup DeepSeek released R1, its open-source reasoning model. It was claimed to offer performance similar to OpenAI’s o1 model, to run twice as quickly, and to have been developed for one tenth of the cost. The release was a wake-up call for frontier model developers at Anthropic, OpenAI and Google, and for their investors, as the barriers to entry for LLM development had been dramatically reduced. This, coupled with the release of high-performing open-source models from Meta and others, points to a commodification of LLM products as the cost of using them falls and differences in performance narrow.
The implications of this for the business models of LLM developers and their investors could be severe but for developers building applications on top of such models the future is much brighter. Ultimately, AI-powered applications need to solve real organisational problems and be designed around the needs of different industries and specific use cases.
A key differentiator offering competitive advantage over more generic models will be the data used in the pre- and post-training of LLMs, and the information then processed by AI applications that resides within organisations or that they have unique access to. As unique and proprietary data takes a more central role in the success of AI deployments, the professionals trained in managing that data will become more valuable.
Data quality is key
The old computing aphorism, ‘garbage in, garbage out’ holds true with GenAI. An LLM is only as good as the data it is trained on, and it is becoming apparent that well-structured data optimised for models to ingest makes for better outputs. Many of the foundational models offered by OpenAI, Anthropic and Google were trained on a mix of structured and unstructured data harvested from the public web and other freely available sources.
The scale and scope of this training data has allowed these LLMs to generate impressive outputs based on the patterns observed during the ingestion and training phases. However, here lies one of the core weaknesses with such predictive models: they can generate results from users’ prompts that look accurate, but which may be hallucinations based on patterns observed in the training data.
A technique that reduces the risk of hallucinations and increases the relevancy of results is retrieval-augmented generation (RAG). Rather than retraining the model itself, RAG retrieves relevant documents from an external knowledge base at query time and supplies them to the LLM alongside the user’s prompt, grounding the response in that material. These additional datasets can be added by users who have specific needs and access to information directly relevant to those needs.
RAG is widely used in customer service scenarios where a chatbot needs to call on company policies and product information to answer specific queries. It is also used in medical applications where accurate and detailed scientific answers are required, as well as in a host of other scenarios. In these cases, having well-structured data that follows defined conventions makes for faster and more accurate results. Recent research into the use of RAG based on structured data to improve analysis in the financial services sector saw the accuracy of results improve by 23 per cent while reducing response times to user prompts by over a third3.
The growing acceptance of RAG as a technique for organisations to implement GenAI initiatives focused on their specific needs presents a massive opportunity for information professionals. Using RAG alongside an LLM requires a range of technical skills on the programming side, but it also relies on well-prepared data that is appropriately structured and cleaned, a retrieval method such as semantic search, and regular data updates to ensure results are relevant and current. Knowledge and information management practitioners will be familiar with all these approaches and are well placed to implement a RAG approach to LLM projects.
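To make the retrieval step concrete, here is a minimal sketch of a RAG-style pipeline. Production systems use embedding models and vector databases for semantic search; in this illustration, simple word overlap stands in for semantic similarity, and the document snippets are invented for the example.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and extract its words as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query; return the best matches."""
    scored = sorted(
        documents,
        key=lambda doc: len(tokenize(doc) & tokenize(query)),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Combine retrieved context with the user's question into one LLM prompt."""
    joined = "\n".join(context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {query}"
    )

docs = [
    "The refund policy: refunds are available within 30 days of purchase.",
    "Our headquarters are located in Brighton.",
]
context = retrieve("What is the refund policy?", docs)
prompt = build_prompt("What is the refund policy?", context)
```

The final prompt, containing both the retrieved passage and the question, is what gets sent to the LLM, which is why the quality and structure of the underlying documents matter so much to the answer.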
Sourcing and combining data
RAG provides a practical approach to tailoring LLMs to specific organisational needs. It also offers a route to market differentiation for businesses seeking competitive advantage. As LLMs become increasingly commoditised, the data they use to generate outputs will become more important. When most organisations have access to the same GenAI tools, it will be the ways they are used and customised that will offer the greatest advantages.
Finding and preparing new data sources internally and from outside the organisation will become an increasingly important task. These might be data assets already used for other purposes or generated by a business’s routine activities such as vehicle fleet management, building management systems, payroll data and customer feedback information.
LLMs offer the potential to derive insights from datasets of all sizes at scale and speeds not previously possible. This might include sentiment analysis of customer reviews, creating knowledge bases for employees and customers from product catalogues, summarising human resources (HR) procedures for new employees or identifying bottlenecks in supply chains from inventory and delivery documentation.
The benefits of such initiatives might be improved customer service, reduced operating costs and happier employees, all factors that improve a business’s performance relative to its competitors. In many cases, data will need to be sourced from outside the organisation and combined with internal assets.
Information professionals have long played a central role in identifying, accessing and using external data sources, going back to online database providers such as Dialog and DataStar in the 1970s. These skills will become even more important as a new open standard for linking LLMs to multiple data sources, internal and external, begins to take hold. This is the Model Context Protocol (MCP), released by Anthropic in late 2024, which provides a simple, open standard for building two-way connections between data sources and GenAI tools.
In the same way that the HTTP standard powered the rise of the WWW and SMTP drove the widespread adoption of email, MCP promises to define the connections in the plumbing of a networked AI world. The standard is supported by Microsoft, Google and OpenAI with thousands of publicly available MCP servers allowing users to connect their LLMs to data from providers ranging from Spotify to Salesforce.
MCP can also be used for internal purposes where an organisation needs to connect multiple data sources to their RAG workflows.
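Under the hood, MCP clients and servers exchange JSON-RPC 2.0 messages. The sketch below shows the rough shape of a request asking an MCP server to invoke one of its tools; the tool name `search_catalogue` and its argument are invented for illustration, and the transport layer (stdio or HTTP) and the server itself are omitted.

```python
import json

def mcp_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialise an MCP-style tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Hypothetical example: ask a server to search an organisation's catalogue.
msg = mcp_tool_call(1, "search_catalogue", {"query": "refund policy"})
```

The appeal of the standard is that every data source exposed this way looks the same to the LLM, much as every website looks the same to a browser speaking HTTP.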
Advising and training users
For most organisations, ignoring the opportunities and challenges presented by the current wave of AI innovations is not a viable option. This plays well to a number of skills many information professionals have developed over the years. Helping users identify and use new AI tools, particularly the selection of appropriate LLMs for specific tasks will be an important role.
With new models from the major developers emerging every month, each designed for specific types of tasks, knowing when best to use Google’s Gemini 2.5 Pro, OpenAI’s GPT-4o or Anthropic’s Claude Opus 4 requires a high degree of knowledge about their strengths and weaknesses. This then leads into helping users structure appropriate prompts that will generate the best results.
GPT-4o, for example, can handle input prompts of up to 128,000 tokens, equivalent to approximately 96,000 words, the length of a full novel. So-called prompt engineering is a skill in itself, and expertise in it makes an enormous difference to the effectiveness and relevance of GenAI results.
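One common prompt-engineering pattern is to separate a prompt into labelled sections rather than asking a bare question. The sketch below illustrates that idea; the section labels and the HR example are illustrative conventions, not a formal standard.

```python
def structured_prompt(role: str, context: str, task: str, output_format: str) -> str:
    """Assemble a prompt from labelled sections: role, context, task, format."""
    return (
        f"Role: {role}\n"
        f"Context: {context}\n"
        f"Task: {task}\n"
        f"Output format: {output_format}"
    )

prompt = structured_prompt(
    role="You are a company knowledge-base assistant.",
    context="Use only the attached HR policy documents.",
    task="Summarise the annual leave policy for a new employee.",
    output_format="Three bullet points in plain English.",
)
```

A prompt structured this way tends to produce more consistent, on-task answers than an unstructured request, which is exactly the kind of guidance information professionals can codify for their users.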
I hope this article has provided some food for thought on the opportunities for information professionals presented by the rise of GenAI. There is undoubtedly considerable hype surrounding this new technology, much of it from companies and commentators who stand to benefit from such rapid change. However, I firmly believe that the current wave of AI innovations will fundamentally change how we work with information and how organisations will transform many of their operating processes.
The dotcom bust at the beginning of the 21st century led to many claims that the internet was just a blip in the evolution of computing, a belief that quickly subsided when it became clear that an open network for connecting the world’s computers was transformational. Whatever happens with the current wave of AI companies and their offerings over the coming several years, the technology will continue to evolve and offer advantages in data processing that should not be ignored.

To order your copy of The AI and Data Revolution: Understanding the New Data Landscape visit Martin De Saulles https://tinyurl.com/FacetAIRev. CILIP members can get a 35 per cent discount on all Facet books.
References
1 www.gartner.com/en/newsroom/press-releases/2025-03-31-gartner-forecasts-worldwide-genai-spending-to-reach-644-billion-in-2025
2 https://technologymagazine.com/ai-and-machine-learning/gartner-30-of-gen-ai-projects-to-be-abandoned-by-2025
3 Wang, J., Ding, W. and Zhu, X., 2025. Financial analysis: Intelligent financial data analysis system based on LLM-RAG. arXiv preprint arXiv:2504.06279.