Taiyuan Ligong Daxue xuebao (Jan 2024)

A Clinical Event Extraction Method Based on a High-confidence Pseudo-label Data Selection Algorithm

  • Yuanyuan LUO,
  • Chunming YANG,
  • Bo LI,
  • Hui ZHANG,
  • Xujian ZHAO

DOI
https://doi.org/10.16355/j.tyut.1007-9432.2023BD011
Journal volume & issue
Vol. 55, no. 1
pp. 204 – 213

Abstract

Purposes: Event extraction is a prerequisite for building high-quality event knowledge graphs. In clinical event extraction, the elements of an event depend on one another. Existing methods fail to identify event elements accurately and to combine them into events, and the amount of annotated clinical event data available is limited. These problems make the event extraction task highly challenging. Methods: In this research, clinical event extraction is modelled as an entity recognition task, and BERT-MCRF, a Chinese medical event extraction method incorporating multiple features, is proposed. In this method, Bidirectional Encoder Representations from Transformers (BERT) is used to build the embedding and feature extraction parts of the model, and multiple word sliding-window features are added to the Conditional Random Fields (CRF) layer. BERT-MCRF then serves as the base model for semi-supervised experiments, for which a high-confidence pseudo-labeled data selection algorithm is proposed. The algorithm is used as the condition for filtering pseudo-labeled data, yielding 300 higher-quality samples that are merged with the original data. Finally, a corpus of 1,700 samples is constructed and the model is retrained. Findings: The overall F1 value of the BERT-MCRF model on the three attribute entities reaches 80.21%, which is 15.11% higher than that of the classical Bidirectional Long Short-Term Memory-Conditional Random Fields (BiLSTM-CRF) model. After the model is retrained with the semi-supervised approach, the final F1 value reaches 81.56%, which is 1.35 percentage points higher than that of the original BERT-MCRF.
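The abstract does not spell out how the high-confidence selection works, so the following Python sketch shows only one plausible form of such a filter, not the authors' algorithm. It assumes that the base model returns a probability for each predicted tag, that a sentence is scored by its least certain token, and that sentences above a threshold are kept up to a fixed budget. The names `PseudoLabeled`, `sentence_confidence`, and `select_high_confidence`, the threshold of 0.9, and the tag strings are all hypothetical; only the budget of 300 comes from the abstract.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PseudoLabeled:
    """One unlabeled sentence plus the base model's predictions (hypothetical container)."""
    tokens: List[str]
    tags: List[str]           # tags predicted by the base model
    token_probs: List[float]  # probability the model assigns to each predicted tag


def sentence_confidence(token_probs: Sequence[float]) -> float:
    """Score a sentence by its least certain token.

    Assumption: min-pooling is used here for illustration; the paper may use
    a different criterion (for example, a sequence-level CRF probability).
    """
    return min(token_probs) if token_probs else 0.0


def select_high_confidence(
    candidates: Sequence[PseudoLabeled],
    threshold: float = 0.9,   # assumed cut-off, not taken from the paper
    max_keep: int = 300,      # the abstract reports 300 selected samples
) -> List[PseudoLabeled]:
    """Keep the most confident pseudo-labeled sentences above the threshold."""
    scored = [(sentence_confidence(c.token_probs), c) for c in candidates]
    kept = [(score, c) for score, c in scored if score >= threshold]
    # Highest confidence first, so the budget is spent on the best samples.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept[:max_keep]]


# Toy data for illustration only; tag names are placeholders.
candidates = [
    PseudoLabeled(["a", "b"], ["B-X", "I-X"], [0.98, 0.95]),
    PseudoLabeled(["c", "d"], ["B-Y", "I-Y"], [0.99, 0.60]),
    PseudoLabeled(["e", "f"], ["B-Z", "I-Z"], [0.97, 0.96]),
]
selected = select_high_confidence(candidates, threshold=0.9, max_keep=2)

# In a self-training loop, the selected samples would be merged with the
# original labeled data before retraining the base model.
original_labeled: List[PseudoLabeled] = []  # placeholder for the annotated corpus
retraining_set = original_labeled + selected
```

Scoring by the minimum token probability is a conservative choice: a single uncertain tag is enough to reject a sentence, which matters when one mislabeled element can corrupt a whole extracted event.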

Keywords