IEEE Access (Jan 2024)
ANeTCM: A Novel MRC Framework for Traditional Chinese Medicine Named Entity Recognition
Abstract
Named entity recognition (NER) in traditional Chinese medicine (TCM), which supports downstream tasks, is receiving increasing attention. However, mainstream NER models applied to the TCM domain still face two challenges: a lack of domain knowledge and an imbalance between entity classes. We therefore propose ANeTCM, a model that strengthens domain knowledge and balances entity classes. Specifically, we first continually pretrain RoBERTa on a large corpus of TCM medical case records to strengthen its domain knowledge. Second, we reformulate sequence labeling as a machine reading comprehension (MRC) task and incorporate gated linear units (GLUs) to further enhance the model’s feature learning capability. Finally, we adjust sample weights using a normal distribution to mitigate the imbalance between entity classes. We conducted extensive experiments on two TCM NER datasets and compared ANeTCM with several competitive baseline models. The results demonstrate the effectiveness of our model.
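The MRC reformulation can be illustrated with a small Python sketch. It shows the general idea behind MRC-style NER: BIO-tagged training data is turned into one span-extraction example per entity type, where a natural-language query names the type and the answers are the matching spans in the context. The abstract does not describe how ANeTCM builds its queries. The entity types (`SYMPTOM`, `HERB`), the query wording, the helper names `bio_to_spans` and `to_mrc_examples`, and the example sentence are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical entity types and queries; not taken from the ANeTCM paper.
QUERIES: Dict[str, str] = {
    "SYMPTOM": "Find symptoms mentioned in the text.",
    "HERB": "Find herbs or medicines mentioned in the text.",
}


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collect (entity_type, start, end) spans from BIO tags; end is inclusive."""
    spans: List[Tuple[str, int, int]] = []
    current_type: Optional[str] = None
    start = -1
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        if tag.startswith("I-") and current_type == tag[2:]:
            continue  # still inside the current entity
        if current_type is not None:
            spans.append((current_type, start, i - 1))
            current_type = None
        if tag.startswith(("B-", "I-")):  # a stray I- is treated as a new span
            current_type, start = tag[2:], i
    return spans


def to_mrc_examples(chars: List[str], tags: List[str]) -> List[dict]:
    """Build one query/context/spans example per entity type.

    Types absent from the sentence still get an example with an empty span
    list, so the model also learns when a query has no answer.
    """
    spans = bio_to_spans(tags)
    return [
        {
            "query": query,
            "context": "".join(chars),
            "spans": [(s, e) for t, s, e in spans if t == etype],
        }
        for etype, query in QUERIES.items()
    ]


if __name__ == "__main__":
    chars = list("头痛服黄芪")  # "headache; takes huangqi" (hypothetical sample)
    tags = ["B-SYMPTOM", "I-SYMPTOM", "O", "B-HERB", "I-HERB"]
    for example in to_mrc_examples(chars, tags):
        print(example["query"], example["spans"])
```

Each query is paired with the same context, so a sentence with k entity types becomes k examples. In MRC-based NER, the encoder (here, the domain-pretrained RoBERTa) typically reads the concatenated query and context and predicts start and end positions for each answer span.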
Keywords