IEEE Access (Jan 2023)

A Dynamic Optimization-Based Ensemble Learning Method for Traditional Chinese Medicine Named Entity Recognition

  • Zongyao Zhao,
  • Yue Qian,
  • Qirui Liu,
  • Jiaxu Chen,
  • Yueyun Liu

DOI
https://doi.org/10.1109/ACCESS.2023.3313608
Journal volume & issue
Vol. 11
pp. 99101 – 99110

Abstract

Named entity recognition in traditional Chinese medicine (TCM) underpins many downstream tasks and is receiving increasing attention. Deep learning-based methods have been widely used for related tasks. However, most current methods do not deal well with two common TCM entity recognition problems: an unbalanced number of entities and sparse entities. To solve these problems, we propose an ensemble learning method based on dynamic optimization. We first use bidirectional encoder representations from transformers (BERT) to extract word vectors and then extract further features with a bidirectional long short-term memory (BiLSTM) network on top of BERT. Next, we dynamically adjust the entity class and fusion weights of ensemble learning according to the entity distribution and sparsity of each batch. Finally, the prediction results are output through a conditional random field (CRF) layer. This approach allows the model to dynamically focus on difficult samples and to increase the update weights of the most beneficial learning tasks. In addition, we introduce a reduction factor that shrinks the magnitude of the parameter updates when entities are sparse, which prevents the model from being unduly disrupted by nonentity information. As a result, our model effectively reduces the negative impact of unbalanced numbers of entities and sparse entities. The experimental results show that our model achieves the best results on a publicly available TCM entity recognition dataset and converges faster than the baseline model. Compared to the baseline model BERT-BiLSTM-CRF, our method improves the F1-score by 0.56, further demonstrating its effectiveness.
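The abstract does not give the formulas behind the per-batch weighting or the reduction factor, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes BIO-style tags, inverse-frequency class weights, and a linear reduction factor. The function names `batch_class_weights` and `reduction_factor` and the parameters `smoothing` and `min_density` are hypothetical.

```python
from collections import Counter


def batch_class_weights(batch_tags, outside_tag="O", smoothing=1.0):
    """Return one weight per entity class seen in the batch.

    Assumption: weights are inversely proportional to class frequency and
    normalised to average 1.0. The paper's actual rule is not in the abstract.
    """
    counts = Counter(
        tag.split("-", 1)[-1]  # "B-HERB" -> "HERB"
        for sentence in batch_tags
        for tag in sentence
        if tag != outside_tag
    )
    if not counts:
        return {}  # no entities in this batch, nothing to reweight
    total = sum(counts.values())
    raw = {cls: total / (n + smoothing) for cls, n in counts.items()}
    norm = sum(raw.values())
    return {cls: w * len(raw) / norm for cls, w in raw.items()}


def reduction_factor(batch_tags, outside_tag="O", min_density=0.1):
    """Return a multiplier in [0, 1] for the size of the parameter update.

    Assumption: the factor grows linearly with the share of entity tokens
    and is capped at 1.0 once that share reaches min_density.
    """
    tokens = [tag for sentence in batch_tags for tag in sentence]
    if not tokens:
        return 0.0
    density = sum(tag != outside_tag for tag in tokens) / len(tokens)
    return min(1.0, density / min_density)


# Hypothetical batch: HERB appears twice, SYMPTOM once.
batch = [
    ["B-HERB", "I-HERB", "O", "O"],
    ["B-SYMPTOM", "O", "O", "O"],
]
weights = batch_class_weights(batch)   # the rarer class gets the larger weight
factor = reduction_factor(batch)       # dense enough, so no reduction

# A batch with 1 entity token in 20 has its update scaled down.
sparse_batch = [["O"] * 19 + ["B-HERB"]]
sparse_factor = reduction_factor(sparse_batch)
```

In a training loop, such weights would typically scale the per-class loss terms and the factor would scale the batch loss or learning rate. How the paper combines them with the ensemble fusion weights is not stated in the abstract.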

Keywords