Jisuanji kexue yu tansuo (Jul 2024)
Study on Entity Extraction Method for Pharmaceutical Instructions Based on Pretrained Models
Abstract
The extraction of medical entities from drug instructions provides fundamental data for the intelligent retrieval of medication information and the construction of medical knowledge graphs, with remarkable research significance and practical value. However, the heterogeneity of medical entities in drug instructions for treating different diseases poses challenges in model training, which requires a large number of annotated samples. To address this issue, a “large model + small model” design approach is used in this research. Specifically, this research proposes a part-label named entity recognition model based on a pre-trained model, which first employs a pre-trained language model fine-tuned on a small number of samples to extract partial entities from drug instructions, and then utilizes a Transformer- based part-label model to further optimize the entity extraction results. The part-label model encodes the input text, identified partial entities, and entity labels using a planar lattice structure, extracts feature representations using Transformer, and predicts entity labels through a conditional random fields (CRF) layer. To reduce the need for annotated training data, a sample data augmentation method is proposed using entity masking strategy on labeled samples to train the part-label model. Experimental results validate the feasibility of the “large model + small model” approach in medical entity extraction, with precision (P), recall (R), and F1 score of 85.0%, 86.1%, and 85.6%, respectively, demonstrating superior performance compared with other learning methods.
Keywords