BMC Medical Informatics and Decision Making (Jul 2024)

Joint extraction of Chinese medical entities and relations based on RoBERTa and single-module global pointer

  • Dongmei Li,
  • Yu Yang,
  • Jinman Cui,
  • Xianghao Meng,
  • Jintao Qu,
  • Zhuobin Jiang,
  • Yufeng Zhao

DOI
https://doi.org/10.1186/s12911-024-02577-1
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 12

Abstract

Read online

Abstract Background Most Chinese joint entity and relation extraction tasks in medicine involve numerous nested entities, overlapping relations, and other challenging extraction issues. In response to these problems, some traditional methods decompose the joint extraction task into multiple steps or multiple modules, resulting in local dependency in the meantime. Methods To alleviate this issue, we propose a joint extraction model of Chinese medical entities and relations based on RoBERTa and single-module global pointer, namely RSGP, which formulates joint extraction as a global pointer linking problem. Considering the uniqueness of Chinese language structure, we introduce the RoBERTa-wwm pre-trained language model at the encoding layer to obtain a better embedding representation. Then, we represent the input sentence as a third-order tensor and score each position in the tensor to prepare for the subsequent process of decoding the triples. In the end, we design a novel single-module global pointer decoding approach to alleviate the generation of redundant information. Specifically, we analyze the decoding process of single character entities individually, improving the time and space performance of RSGP to some extent. Results In order to verify the effectiveness of our model in extracting Chinese medical entities and relations, we carry out the experiments on the public dataset, CMeIE. Experimental results show that RSGP performs significantly better on the joint extraction of Chinese medical entities and relations, and achieves state-of-the-art results compared with baseline models. Conclusion The proposed RSGP can effectively extract entities and relations from Chinese medical texts and help to realize the structure of Chinese medical texts, so as to provide high-quality data support for the construction of Chinese medical knowledge graphs.

Keywords