BMC Medical Informatics and Decision Making (Jun 2022)

A hybrid method based on semi-supervised learning for relation extraction in Chinese EMRs

  • Chunming Yang,
  • Dan Xiao,
  • Yuanyuan Luo,
  • Bo Li,
  • Xujian Zhao,
  • Hui Zhang

DOI
https://doi.org/10.1186/s12911-022-01908-4
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 13

Abstract

Background: Building large-scale medical knowledge graphs requires automatically extracting the relations between entities from electronic medical records (EMRs). The main challenges are the scarcity of labeled corpora and the difficulty of identifying complex semantic relations in the text of Chinese EMRs. A hybrid method based on semi-supervised learning is proposed to extract medical entity relations from small-scale, complex Chinese EMRs.

Methods: Sentence-level semantic features are extracted by a residual network, and long-range dependency information is captured by a bidirectional gated recurrent unit. An attention mechanism is then applied to each set of extracted features to assign weights, and the outputs of the two attention mechanisms are combined for relation prediction. We adjusted the training process using a small, manually annotated relation corpus and a bootstrapping semi-supervised learning algorithm, continuously expanding the dataset during training.

Results: We constructed a small relation extraction corpus of Chinese EMRs based on the EMR datasets released at the China Conference on Knowledge Graph and Semantic Computing. The experimental results show that the best F1-score of the proposed method over all relation categories reaches 89.78%, which is 13.07% higher than the baseline CNN.
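The bootstrapping step described in the Methods can be illustrated with a minimal self-training loop. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not say how newly labeled samples are selected, so the confidence threshold, the round limit, and the names `fit_centroids`, `predict`, and `bootstrap` are all hypothetical. A toy nearest-centroid classifier stands in for the residual network, bidirectional gated recurrent unit, and attention model.

```python
import math

def fit_centroids(labeled):
    """Toy stand-in for the neural model: one mean feature vector per relation label."""
    sums, counts = {}, {}
    for features, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}

def predict(centroids, features):
    """Return (label, confidence); confidence is a softmax over negative squared distances."""
    scores = {
        label: math.exp(-sum((a - b) ** 2 for a, b in zip(features, centroid)))
        for label, centroid in centroids.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())

def bootstrap(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Grow the labeled set with confident predictions (threshold is an assumption)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = fit_centroids(labeled)
        newly_labeled, remaining = [], []
        for features in pool:
            label, confidence = predict(model, features)
            if confidence >= threshold:
                newly_labeled.append((features, label))
            else:
                remaining.append(features)
        if not newly_labeled:
            break  # nothing confident enough; stop expanding the dataset
        labeled.extend(newly_labeled)
        pool = remaining
    return fit_centroids(labeled), labeled, pool

# Two hand-labeled seed samples and three unlabeled ones (made-up 1-D features).
seed = [((0.0,), "A"), ((4.0,), "B")]
model, labeled, pool = bootstrap(seed, [(0.2,), (3.9,), (2.0,)])
```

In this toy run, the two samples close to a seed are absorbed into the training set, while the ambiguous midpoint sample stays in the unlabeled pool. Leaving low-confidence samples unlabeled is one common way to limit the noise that pseudo-labels can introduce.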

Keywords