Smart Harvesting brings artificial intelligence and sophisticated automation to the process of creating comprehensive research repositories and updating researcher profiles, which saves time and reduces manual work. This technology makes it possible for institutions to showcase the work of affiliated researchers completely and accurately, in one place and across all disciplines.
Matching research output to the right researcher, and making sure nothing is missed, is often a complicated exercise. Smart Harvesting accomplishes this with unique machine learning algorithms that tackle two key tasks: matching scholars to their work, and populating the research information hub with that information at scale.
What the machine learns
For effective Smart Harvesting, machine learning identifies and assesses three main data properties: individual names, general information from the outputs, and semantic content.
The “name” property includes the use of initials or nicknames, name frequency, and name variants. “General” properties include researcher affiliation, country, academic discipline, years of activity, and more. The “semantic” properties consist of the frequency, absence or combination of key textual elements in abstracts, titles, journal names, and the like.
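As a rough illustration of the "name" property, the sketch below generates a handful of common variants of a researcher's name and checks whether an author string from an output record matches one of them. The function names (`normalize`, `name_variants`, `name_matches`) and the variant rules are assumptions made for this example; they are not taken from Smart Harvesting itself, which also weighs nicknames and name frequency.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so variants compare reliably."""
    return " ".join(text.lower().split())


def name_variants(full_name: str) -> set[str]:
    """Return simple variants of a 'Given [Middle ...] Family' name.

    Hypothetical rules for illustration only: real systems also handle
    nicknames, transliterations, and family-name-first conventions.
    """
    parts = full_name.split()
    if len(parts) < 2:
        return {normalize(full_name)}

    given, family = parts[:-1], parts[-1]
    initials = [p[0] for p in given]

    variants = {
        full_name,                                          # Marie Sklodowska Curie
        f"{given[0]} {family}",                             # Marie Curie
        f"{initials[0]}. {family}",                         # M. Curie
        f"{' '.join(i + '.' for i in initials)} {family}",  # M. S. Curie
        f"{family}, {given[0]}",                            # Curie, Marie
        f"{family}, {''.join(initials)}",                   # Curie, MS
    }
    return {normalize(v) for v in variants}


def name_matches(author_string: str, researcher_name: str) -> bool:
    """True if the author string equals one of the generated variants."""
    return normalize(author_string) in name_variants(researcher_name)


if __name__ == "__main__":
    print(name_matches("M. S.  Curie", "Marie Sklodowska Curie"))
    print(name_matches("P. Curie", "Marie Sklodowska Curie"))
```

A name match on its own is weak evidence, since many researchers share initials and family names. That is why the general and semantic properties are assessed alongside it.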
In assessing the semantic features, Smart Harvesting uses natural language processing to examine a large corpus of texts. The result is an intricate, multidimensional map of the relative proximity of selected words.
Smart Harvesting gets smarter over time
Presented with a large set of output records and researcher profiles, the Smart Harvesting technology can begin to identify patterns of information. It learns all likely variants of a researcher’s name, their affiliations, research domains, years of professional activity, previously known assets, and other relevant data.
From output records (e.g., scholarly indexes, a national or disciplinary repository), Smart Harvesting identifies emerging patterns in the titles, abstracts, subject matter, co-authors, journals, full texts, metadata, and author affiliations. The natural language processing technology, called “word embedding”, reveals how close to one another selected words are likely to be, and in what combinations, across all of the records examined.
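To make the idea of word proximity concrete, here is a minimal sketch that builds a vector for each word from the words it co-occurs with in a few toy titles, then compares words by cosine similarity. This count-based approach is a simplified stand-in for a trained word embedding, not the model Smart Harvesting uses; the titles and the function names `cooccurrence_vectors` and `cosine` are invented for the example.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence_vectors(texts):
    """Map each word to a Counter of the words it shares a text with."""
    vectors = defaultdict(Counter)
    for text in texts:
        words = set(text.lower().split())
        for a, b in permutations(words, 2):
            vectors[a][b] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy titles, invented purely for illustration.
titles = [
    "protein folding simulation",
    "protein structure simulation",
    "medieval trade routes",
    "medieval trade law",
]
vecs = cooccurrence_vectors(titles)

if __name__ == "__main__":
    # Words used in the same contexts end up close together.
    print(cosine(vecs["folding"], vecs["structure"]))
    # Words from unrelated contexts share no neighbors.
    print(cosine(vecs["folding"], vecs["routes"]))
```

In this toy corpus, "folding" and "structure" never appear in the same title, yet they come out as near neighbors because they keep the same company. That is the kind of signal that lets a semantic comparison link an output to a researcher's field even when the exact wording differs.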
With its machine learning underway, Smart Harvesting can quickly assess the name, general information, and semantic features of a new research output record in connection with a given researcher. Cross-checking all the data points, Smart Harvesting can then assess the relative likelihood that a given output is the product of a specific researcher. The process is rapid and gives academic institutions clear guidance for correctly matching researchers and their work.
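One common way to turn several signals into a single likelihood is a logistic model, sketched below under stated assumptions. The `MatchFeatures` fields, the weights, and the bias are hypothetical values chosen for illustration; in a trained system they would be learned from known researcher-output pairs, and nothing here describes the actual Smart Harvesting model.

```python
import math
from dataclasses import dataclass


@dataclass
class MatchFeatures:
    """Per-candidate scores, each assumed to lie between 0 and 1."""
    name: float      # how well the author string fits the researcher's name
    general: float   # agreement on affiliation, country, discipline, years
    semantic: float  # closeness of the text to the researcher's known work


# Hypothetical parameters for illustration; a real model would learn these.
WEIGHTS = {"name": 3.0, "general": 2.0, "semantic": 2.5}
BIAS = -4.0


def match_likelihood(features: MatchFeatures) -> float:
    """Combine the three scores into a value between 0 and 1."""
    z = (
        BIAS
        + WEIGHTS["name"] * features.name
        + WEIGHTS["general"] * features.general
        + WEIGHTS["semantic"] * features.semantic
    )
    return 1.0 / (1.0 + math.exp(-z))


def rank_candidates(candidates: dict[str, MatchFeatures]) -> list[tuple[str, float]]:
    """Order candidate researchers for one output, most likely first."""
    scored = [(who, match_likelihood(f)) for who, f in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    candidates = {
        "researcher_a": MatchFeatures(name=1.0, general=1.0, semantic=0.9),
        "researcher_b": MatchFeatures(name=1.0, general=0.0, semantic=0.1),
    }
    for who, score in rank_candidates(candidates):
        print(f"{who}: {score:.2f}")
```

Both candidates in the usage example share the same name score, so the general and semantic evidence decides the ranking. This mirrors the cross-checking of data points described above.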
As the Smart Harvesting process is run repeatedly, periodically checking for new outputs, it also continuously collects and analyzes additional data. The more such intelligence-gathering forays are undertaken, the more accurately and consistently the technology can identify each researcher’s work.
With its intelligent automation, the Smart Harvesting technology can also eliminate the need for manual intervention to remove null or indeterminate results. However, the institution can always set conditions for automatic or manual approval of candidate records, adding them to the research information hub based on the strength of the match ranking or any other desired variable.
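An approval rule based on match strength might look like the following sketch. The threshold values, the `Decision` names, and the `route` function are assumptions for illustration, not actual Smart Harvesting settings; an institution would choose its own cut-offs and could add further conditions.

```python
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"    # add to the hub without review
    MANUAL_REVIEW = "manual_review"  # queue for a person to confirm
    DISCARD = "discard"              # too weak to be worth reviewing


def route(score: float,
          auto_threshold: float = 0.9,
          review_threshold: float = 0.5) -> Decision:
    """Decide what happens to a candidate record given its match score.

    The default thresholds are hypothetical examples.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if review_threshold > auto_threshold:
        raise ValueError("review_threshold must not exceed auto_threshold")

    if score >= auto_threshold:
        return Decision.AUTO_APPROVE
    if score >= review_threshold:
        return Decision.MANUAL_REVIEW
    return Decision.DISCARD


if __name__ == "__main__":
    for score in (0.97, 0.72, 0.18):
        print(score, route(score).value)
```

Raising `auto_threshold` sends more records to manual review in exchange for fewer wrong automatic matches; lowering it does the opposite.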
The machine learning at the heart of Smart Harvesting discovers the relationship between an output and a researcher with progressively greater accuracy. It also builds a growing interconnected map of related scholarly information, which makes it possible to link research outputs, activities, and entities for a more comprehensive understanding of research projects and their outcomes.
Read this paper to learn more about how machine learning can help grow your institution’s research footprint.
June 8, 2021