Enriching Bibliographic Records with AI – Clarivate Leading an Industry Revolution

March 16, 2026 | 4 min read

High-quality bibliographic metadata has always been the backbone of discovery in libraries. Accurate, consistent, and descriptive records enable researchers to find relevant resources efficiently, and to understand them in context. Today, artificial intelligence is opening up new possibilities for enriching bibliographic records at scale, and at Clarivate, we are leading this transformation.

Broadly speaking, we are pursuing two complementary approaches to AI-driven enrichment: enriching records using portions of the full text, and enriching them using the descriptive metadata already present in MARC records. Together, these approaches allow us to enhance discovery even when full text is not available.

 

Enrichment Using Portions of Full Text

When full text, or a substantial excerpt of it, is available, AI can analyze the content itself to generate richer and more precise metadata. Using carefully designed prompts, our solution generates the following MARC fields:

  1. Summary
  2. Table of contents
  3. Subject headings
  4. Language
  5. LC Classification and Dewey Call Numbers
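
As a rough illustration, the generated values could be carried into a record along the lines of the sketch below. The tags shown (520, 505, 650, 041, 050, 082) are the standard MARC 21 fields for these kinds of data, but which tags and subfields the solution actually writes is not stated here, so the mapping, the key names, and the to_marc_fields helper are all assumptions.

```python
# Hypothetical mapping from enrichment output to MARC 21 tags.
# The tags are standard MARC 21 fields; which ones the actual solution
# populates is an assumption made for this sketch.
ENRICHMENT_TO_MARC = {
    "summary": "520",            # Summary, etc.
    "table_of_contents": "505",  # Formatted contents note
    "subject_headings": "650",   # Subject added entry, topical term
    "language": "041",           # Language code
    "lc_classification": "050",  # Library of Congress call number
    "dewey_call_number": "082",  # Dewey Decimal classification number
}


def to_marc_fields(enrichment: dict) -> list[tuple[str, str]]:
    """Return (tag, value) pairs for every non-empty enrichment value."""
    fields = []
    for key, tag in ENRICHMENT_TO_MARC.items():
        value = enrichment.get(key)
        if not value:
            continue
        # Repeatable data such as subject headings yields one field per value.
        values = value if isinstance(value, list) else [value]
        fields.extend((tag, v) for v in values)
    return fields
```

Keeping the mapping in one table makes it easy to see which enrichment outputs are repeatable and to adjust the tags to local cataloguing policy.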

This enrichment is currently applied to non-fiction resources and is designed to improve retrieval and reduce cataloguing effort. To date, we have enriched over 400,000 titles, and this number continues to grow as we expand coverage in the Community Zone, making its records richer and more discoverable for libraries worldwide.

 

Enrichment Using Existing MARC Metadata

Not all resources have full text available for analysis. In these situations, we apply AI to the descriptive metadata already present in the bibliographic record. Our approach focuses on analyzing key descriptive fields (typically the title, subtitle, author, description, and table of contents) and generating subject headings from that information.

The main challenge in this process is ensuring that the input metadata is sufficiently rich and informative to produce relevant and accurate subjects. To address this, we apply a series of rules to filter out metadata that is not meaningful enough. We also use AI at this stage to evaluate the quality and informativeness of the input and to decide whether it is suitable for subject generation.

For example, if the table of contents contains only numbered placeholders (such as "Chapter 1", "Chapter 2", etc.), we exclude it from the prompt used to generate the subject headings.
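
A rule of this kind can be sketched in a few lines of Python. This is only an illustration of the table-of-contents example above, not the production rule set: the list of generic labels, the three-word threshold, the field names, and both helper functions are assumptions, and the AI-based quality check is not shown.

```python
import re

# Hypothetical rule: a table of contents is a placeholder when, after
# removing generic labels and numbering, too few descriptive words remain.
_GENERIC = re.compile(r"\b(chapter|part|section)\b\.?", re.IGNORECASE)
_NUMBERING = re.compile(r"\b\d+\b\.?")


def is_informative_toc(toc: str, min_words: int = 3) -> bool:
    """Return True if the table of contents has enough descriptive words."""
    words = []
    # Entries are commonly separated by " -- ", semicolons, or newlines.
    for entry in re.split(r"--|;|\n", toc):
        stripped = _NUMBERING.sub(" ", _GENERIC.sub(" ", entry))
        # Count runs of two or more letters as descriptive words.
        words.extend(re.findall(r"[^\W\d_]{2,}", stripped))
    return len(words) >= min_words


def build_prompt_fields(record: dict) -> dict:
    """Keep non-empty fields, dropping an uninformative table of contents."""
    fields = {key: value for key, value in record.items() if value}
    toc = fields.get("table_of_contents")
    if toc is not None and not is_informative_toc(toc):
        del fields["table_of_contents"]
    return fields
```

With these assumptions, a record whose table of contents reads "Chapter 1 -- Chapter 2 -- Chapter 3" would be sent to subject generation with its title and other fields only, while a table of contents with real chapter titles would be kept.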

By combining librarian-curated metadata with AI-driven validation and enrichment, Clarivate enables libraries to unlock additional value from their existing records, without disrupting established workflows, while improving consistency, depth, and discovery across collections.

 

Emerging Challenge: Cataloguing AI-Generated and AI-Edited Content

As AI tools increasingly support the creation of metadata, they are simultaneously becoming creators and editors of the content itself. This shift introduces a new set of challenges for cataloguing and discovery. Across the library and research communities, several important questions are now under active discussion, including:

  • How can we reliably identify articles or books that were generated or edited by AI?
  • What methods or tools can be used to detect AI involvement?
  • What are the institutional needs and policies, and how would they affect this trend?

These questions do not yet have standardized answers, and practices are still evolving.

As these developments become increasingly common across the academic ecosystem, we are actively monitoring the evolving professional discussions on cataloguing practices for AI-generated materials and participating in community conversations to better understand emerging requirements and expectations.

 
