Discover Artificial Intelligence (Dec 2024)

Large language models to process, analyze, and synthesize biomedical texts: a scoping review

  • Simona Emilova Doneva,
  • Sijing Qin,
  • Beate Sick,
  • Tilia Ellendorff,
  • Jean-Philippe Goldman,
  • Gerold Schneider,
  • Benjamin Victor Ineichen

DOI
https://doi.org/10.1007/s44163-024-00197-2
Journal volume & issue
Vol. 4, no. 1
pp. 1 – 21

Abstract


The advent of large language models (LLMs) such as BERT and, more recently, GPT, is transforming our approach to analyzing and understanding biomedical texts. To stay informed about the latest advancements in this area, there is a need for up-to-date summaries on the role of LLMs in Natural Language Processing (NLP) of biomedical texts. Thus, this scoping review aims to provide a detailed overview of the current state of biomedical NLP research and its applications, with a special focus on the evolving role of LLMs. We conducted a systematic search of PubMed, EMBASE, and Google Scholar for studies and conference proceedings published from 2017 to December 19, 2023, that develop or utilize LLMs for NLP tasks in biomedicine. We evaluated the risk of bias in these studies using a 3-item checklist. From 13,823 references, we selected 199 publications and conference proceedings for our review. LLMs are being applied to a wide array of tasks in the biomedical field, including knowledge management, text mining, drug discovery, and evidence synthesis. Prominent among these tasks are text classification, relation extraction, and named entity recognition. Although BERT-based models remain prevalent, the use of GPT-based models has increased substantially since 2023. We conclude that, despite offering opportunities to manage the growing volume of biomedical data, LLMs also present challenges, particularly in clinical medicine and evidence synthesis, such as lack of transparency and privacy concerns.

Keywords