Discover Artificial Intelligence (Dec 2024)

Large language models to process, analyze, and synthesize biomedical texts: a scoping review

  • Simona Emilova Doneva,
  • Sijing Qin,
  • Beate Sick,
  • Tilia Ellendorff,
  • Jean-Philippe Goldman,
  • Gerold Schneider,
  • Benjamin Victor Ineichen

DOI
https://doi.org/10.1007/s44163-024-00197-2
Journal volume & issue
Vol. 4, no. 1
pp. 1 – 21

Abstract


The advent of large language models (LLMs) such as BERT and, more recently, GPT, is transforming our approach to analyzing and understanding biomedical texts. To stay informed about the latest advancements in this area, there is a need for up-to-date summaries on the role of LLMs in Natural Language Processing (NLP) of biomedical texts. Thus, this scoping review aims to provide a detailed overview of the current state of biomedical NLP research and its applications, with a special focus on the evolving role of LLMs. We conducted a systematic search of PubMed, EMBASE, and Google Scholar for studies and conference proceedings published from 2017 to December 19, 2023, that develop or utilize LLMs for NLP tasks in biomedicine. We evaluated the risk of bias in these studies using a 3-item checklist. From 13,823 references, we selected 199 publications and conference proceedings for our review. LLMs are being applied to a wide array of tasks in the biomedical field, including knowledge management, text mining, drug discovery, and evidence synthesis. Prominent among these tasks are text classification, relation extraction, and named entity recognition. Although BERT-based models remain prevalent, the use of GPT-based models has increased substantially since 2023. We conclude that, despite offering opportunities to manage the growing volume of biomedical data, LLMs also present challenges, particularly in clinical medicine and evidence synthesis, such as lack of transparency and privacy concerns.

Keywords