Biomedical named entity recognition with the combined feature attention and fully-shared multi-task learning

Zhiyu Zhang; Arbee L. P. Chen

doi:10.1186/s12859-022-04994-3

BMC Bioinformatics (Nov 2022)

Affiliations

Zhiyu Zhang: Department of Computer Science, National Tsing Hua University
Arbee L. P. Chen: Department of Computer Science, National Tsing Hua University

DOI: https://doi.org/10.1186/s12859-022-04994-3
Journal volume & issue: Vol. 23, no. 1, pp. 1–21

Abstract

Background: Biomedical named entity recognition (BioNER) is a fundamental task in biomedical text mining whose purpose is to automatically recognize and classify biomedical entities. The performance of BioNER systems directly affects downstream applications. Recently, deep neural networks, especially pre-trained language models, have made great progress on BioNER. However, because high-quality, large-scale annotated data and relevant external knowledge are lacking, the capability of BioNER systems remains limited.

Results: In this paper, we propose a novel fully-shared multi-task learning model based on BioBERT, a language model pre-trained on the biomedical domain, with a new attention module that integrates automatically processed syntactic information for the BioNER task. We conducted extensive experiments on seven benchmark BioNER datasets. Compared to the single-task BioBERT model, the best proposed multi-task model improves the F1 score by 1.03% on BC2GM, 0.91% on NCBI-disease, 0.81% on Linnaeus, 1.26% on JNLPBA, 0.82% on BC5CDR-Chemical, 0.87% on BC5CDR-Disease, and 1.10% on Species-800.

Conclusion: The results demonstrate that our model outperforms previous studies on all datasets. Further analysis and case studies are provided to show the importance of the proposed attention module and of the fully-shared multi-task learning method used in our model.
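The F1 scores reported in the abstract are the harmonic mean of precision and recall. The abstract does not state the matching criterion, so the following is only a minimal sketch that assumes entity-level exact matching over BIO tags, a common convention for BioNER benchmarks. The function names `bio_to_spans` and `entity_f1` and the tag label `Gene` are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence, e.g. ["B-Gene", "I-Gene", "O"]."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        # An "I-" tag extends the open span only if the entity type matches.
        if tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue
        if start is not None:  # any other tag closes the open span
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # Assumption: an "I-" tag with no matching open span is treated as "O".
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged exact-match F1 over a list of tagged sentences."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)   # spans predicted with correct boundaries and type
        fp += len(p - g)   # predicted spans absent from the gold annotation
        fn += len(g - p)   # gold spans the prediction missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["B-Gene", "I-Gene", "O", "B-Gene"]]
pred = [["B-Gene", "I-Gene", "O", "O"]]
score = entity_f1(gold, pred)
```

In this toy case the prediction finds one of the two gold entities and makes no false predictions, so precision is 1.0, recall is 0.5, and `score` is about 0.667.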

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
