Biomedical Flat and Nested Named Entity Recognition: Methods, Challenges, and Advances

Yesol Park; Gyujin Son; Mina Rho

doi:10.3390/app14209302

Applied Sciences (Oct 2024)

Affiliations

Yesol Park: Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
Gyujin Son: Department of Artificial Intelligence, Hanyang University, Seoul 04763, Republic of Korea
Mina Rho: Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea

DOI: https://doi.org/10.3390/app14209302
Journal volume & issue: Vol. 14, no. 20, p. 9302

Abstract

Biomedical named entity recognition (BioNER) aims to identify biomedical entities (e.g., diseases, chemicals, and genes) in text and classify them into predefined classes. This process serves as an important initial step in extracting biomedical information from textual sources. Depending on the structure of the entities addressed, BioNER tasks are divided into two categories: flat NER, where entities do not overlap, and nested NER, where an entity may be embedded within another. While early studies primarily addressed flat NER, recent advances in neural models have enabled more sophisticated approaches to nested NER, which is gaining relevance in the biomedical field, where entity relationships are often complex and hierarchically structured. This review therefore focuses on the latest progress in approaches based on large-scale pre-trained language models, which have significantly improved NER performance. State-of-the-art flat NER models have achieved average F1-scores of 84% on BC2GM, 89% on NCBI Disease, and 92% on BC4CHEM, while nested NER models have reached 80% on the GENIA dataset, indicating room for improvement. In addition, we discuss persistent challenges, including inconsistent annotation of named entities across corpora and the limited availability of annotated entities of various types, particularly for multi-type or nested NER. To the best of our knowledge, this paper is the first comprehensive review of pre-trained language model-based flat and nested BioNER models, providing a categorical analysis of the methods and related challenges for future research and development in the field.
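
The distinction between flat and nested NER, and the F1-score used to report results, can be made concrete with a minimal sketch. Everything in it is an illustrative assumption and not taken from the paper: the example sentence, the labels, the span representation (token offsets with an exclusive end), and the helper names `is_flat`, `nested_pairs`, and `exact_match_f1`. The F1 shown is a plain exact-match, entity-level score; individual benchmarks may define their evaluation differently.

```python
from itertools import combinations
from typing import List, Tuple

# A span is (start_token, end_token_exclusive, label). Hypothetical representation.
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def is_flat(spans: List[Span]) -> bool:
    """A flat annotation has no pair of overlapping spans."""
    return not any(overlaps(a, b) for a, b in combinations(spans, 2))


def nested_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return (outer, inner) pairs where inner lies strictly inside outer."""
    return [
        (outer, inner)
        for outer in spans
        for inner in spans
        if outer[0] <= inner[0]
        and inner[1] <= outer[1]
        and (outer[0], outer[1]) != (inner[0], inner[1])
    ]


def exact_match_f1(gold: List[Span], pred: List[Span]) -> float:
    """Entity-level F1: a prediction counts only if boundaries and label match."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Illustrative sentence (an assumption, not drawn from any corpus).
tokens = ["The", "IL-2", "promoter", "is", "active", "."]
flat_annotation = [(1, 3, "DNA")]                          # "IL-2 promoter"
nested_annotation = [(1, 3, "DNA"), (1, 2, "protein")]     # adds inner "IL-2"

print(is_flat(flat_annotation))      # True
print(is_flat(nested_annotation))    # False
print(nested_pairs(nested_annotation))
# A flat model that finds only the outer entity misses the inner one.
print(exact_match_f1(nested_annotation, flat_annotation))
```

In this sketch, a system restricted to flat output recovers only one of the two gold entities, so its recall against the nested annotation is 0.5 even though its precision is 1.0.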

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
