Information (Jan 2024)

Performance of 4 Pre-Trained Sentence Transformer Models in the Semantic Query of a Systematic Review Dataset on Peri-Implantitis

  • Carlo Galli,
  • Nikolaos Donos,
  • Elena Calciolari

DOI
https://doi.org/10.3390/info15020068
Journal volume & issue
Vol. 15, no. 2
p. 68

Abstract


Systematic reviews are cumbersome yet essential to the epistemic process of medical science. Finding relevant reports, however, is a daunting task, because the sheer volume of published literature makes the manual screening of databases time-consuming. The use of Artificial Intelligence could make literature processing faster and more efficient. Sentence transformers are groundbreaking algorithms that can generate rich semantic representations of text documents and allow for semantic queries. In the present report, we compared four freely available pre-trained sentence transformer models (all-MiniLM-L6-v2, all-MiniLM-L12-v2, all-mpnet-base-v2, and all-distilroberta-v1) on a convenience sample of 6110 articles from a published systematic review. The authors of this review manually screened the dataset and identified 24 target articles that addressed the Focused Questions (FQ) of the review. We applied the four sentence transformers to the dataset and, using the FQ as a query, performed a semantic similarity search on the dataset. The models identified similarities between the FQ and the target articles to varying degrees, and, when the dataset was sorted by semantic similarity using the best-performing model (all-mpnet-base-v2), all target articles appeared within the top 700 papers of the 6110-article dataset. Our data indicate that the choice of an appropriate pre-trained model could remarkably reduce the number of articles to screen and the time to completion for systematic reviews.
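The ranking step described above can be sketched as a cosine-similarity search over embedding vectors. The toy example below uses hypothetical, hand-made embeddings purely to illustrate the mechanics; in the actual study, the vectors would come from encoding each article's text and the FQ with one of the named pre-trained models (e.g. all-mpnet-base-v2 via the sentence-transformers library).

```python
import numpy as np

def rank_by_cosine_similarity(query_vec, doc_matrix):
    """Return document indices sorted by descending cosine similarity
    to the query, along with the similarity scores."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    sims = d @ q  # cosine similarity of each document to the query
    return np.argsort(-sims), sims

# Hypothetical 4-dimensional "embeddings" for three articles and a query;
# real sentence-transformer embeddings are 384- or 768-dimensional.
docs = np.array([
    [0.9, 0.1, 0.0, 0.0],   # article 0: close to the query topic
    [0.0, 1.0, 0.0, 0.1],   # article 1: unrelated
    [0.8, 0.2, 0.1, 0.0],   # article 2: related
])
query = np.array([1.0, 0.0, 0.0, 0.0])

order, sims = rank_by_cosine_similarity(query, docs)
# Screening the dataset in `order` surfaces the on-topic articles first,
# which is how the target articles ended up in the top 700 of 6110.
```

Sorting the full dataset by these scores is what allows a reviewer to stop screening once the similarity drops off, rather than reading all 6110 abstracts.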

Keywords