Mathematical Biosciences and Engineering (Jan 2024)

A study on pharmaceutical text relationship extraction based on heterogeneous graph neural networks

  • Shuilong Zou,
  • Zhaoyang Liu,
  • Kaiqi Wang,
  • Jun Cao,
  • Shixiong Liu,
  • Wangping Xiong,
  • Shaoyi Li

DOI
https://doi.org/10.3934/mbe.2024064
Journal volume & issue
Vol. 21, no. 1
pp. 1489 – 1507

Abstract

Effective information extraction from pharmaceutical texts is of great significance for clinical research. Ancient Chinese medicine texts have concise sentences and complex semantic relationships, and relationships may hold between heterogeneous entities. Current mainstream relationship extraction models do not take into account the associations between entities and relationships during extraction, so the semantic information they capture is insufficient to form an effective structured representation. In this paper, we propose a heterogeneous graph neural network relationship extraction model adapted to traditional Chinese medicine (TCM) text. First, the given sentence and the predefined relationships are embedded with fine-tuned Bidirectional Encoder Representations from Transformers (BERT) word embeddings as model input. Second, a heterogeneous graph network is constructed to associate word, phrase, and relationship nodes and obtain the hidden-layer representation. Then, in the decoding stage, a two-stage subject-object entity identification method is adopted: the identifier uses a binary classifier to locate the start and end positions of the TCM entities, identifies all subject and object entities in the sentence, and finally forms the TCM entity relationship groups. Experiments on the TCM relationship extraction dataset show that the heterogeneous graph neural network embedded with BERT achieves a precision of 86.99% and an F1 value of 87.40%, improvements of 8.83% and 10.21% compared with the relationship extraction models CNN, Bert-CNN, and Graph LSTM.
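
The start/end binary tagging used in the decoding stage can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `decode_spans`, the 0.5 threshold, the nearest-end pairing rule, and the probability values are all assumptions made for illustration. It only shows how per-token start and end scores from two binary classifiers could be turned into entity spans.

```python
from typing import List, Tuple


def decode_spans(
    start_probs: List[float],
    end_probs: List[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Tuple[int, int]]:
    """Pair each predicted start with the nearest predicted end at or after it.

    start_probs[i] / end_probs[i] are the binary classifiers' scores that
    token i starts / ends an entity. Returns inclusive (start, end) indices.
    """
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]

    spans = []
    for s in starts:
        # Candidate ends are those located at or after this start position.
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, following[0]))
    return spans


# Hypothetical scores for a five-token sentence (illustrative values only).
start_probs = [0.9, 0.1, 0.2, 0.8, 0.1]
end_probs = [0.1, 0.7, 0.1, 0.1, 0.9]
spans = decode_spans(start_probs, end_probs)
```

Under these assumed scores, tokens 0 and 3 are tagged as starts and tokens 1 and 4 as ends, so the sketch yields the spans (0, 1) and (3, 4). In a two-stage scheme of the kind the abstract describes, one such pass would presumably locate subjects and a second pass would locate objects.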

Keywords