Journal of Harbin University of Science and Technology (Aug 2021)
Few-shot Named Entity Recognition for Medical Text
Abstract
To address the shortage of labeled data for named entity recognition in medical text, a new deep neural network for named entity recognition and a data augmentation method are proposed. First, the BERT character vector is extended with the pinyin and strokes of Chinese characters so that it carries more useful information. Then, the named entity recognition model and a word segmentation model are trained jointly to strengthen the model's ability to recognize entity boundaries. Finally, an improved data augmentation method is applied to the training data, which improves the model's recognition of named entities while avoiding overfitting. Experimental results on the electronic medical record text provided by CCKS-2019 show that the proposed method effectively improves the accuracy of named entity recognition in the small-sample setting, and that the recognition rate is maintained without a significant decrease when the training data is reduced by half.
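The first step, extending a character vector with pinyin and stroke information, can be illustrated with a minimal sketch. It assumes the extra features are simply concatenated to the base vector; the abstract does not state how they are encoded or combined, so the lookup tables, the encodings (scaled letter codes and a scaled stroke count), and the function names below are all hypothetical, and a short list stands in for the BERT output.

```python
# Minimal sketch (assumption): extend a per-character vector by concatenating
# hand-built pinyin and stroke features. Not the paper's actual encoding.

# Hypothetical toy lookup tables; a real system would use full dictionaries.
PINYIN = {"肝": "gan", "炎": "yan"}   # toneless pinyin
STROKES = {"肝": 7, "炎": 8}          # stroke counts


def pinyin_features(ch, dim=4):
    """Encode the pinyin as letter codes scaled to (0, 1], zero-padded to `dim`."""
    letters = PINYIN.get(ch, "")[:dim]
    codes = [(ord(c) - ord("a") + 1) / 26 for c in letters]
    return codes + [0.0] * (dim - len(codes))


def stroke_features(ch, max_strokes=32):
    """Encode the stroke count as one scaled value (0.0 if the character is unknown)."""
    return [STROKES.get(ch, 0) / max_strokes]


def extend_vector(ch, base_vec):
    """Concatenate the base vector with the pinyin and stroke features."""
    return list(base_vec) + pinyin_features(ch) + stroke_features(ch)


# A 3-dimensional stand-in for the BERT vector of one character.
extended = extend_vector("肝", [0.1, 0.2, 0.3])
print(len(extended))  # 3 base + 4 pinyin + 1 stroke dimensions
```

Concatenation leaves the base vector untouched, so the added dimensions can only supply extra signal to the layers that follow; learned pinyin and stroke embeddings would be a natural alternative to the fixed encodings used here.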
Keywords