Applied Sciences (Jul 2024)

Fd-CasBGRel: A Joint Entity–Relationship Extraction Model for Aquatic Disease Domains

  • Hongbao Ye,
  • Lijian Lv,
  • Chengquan Zhou,
  • Dawei Sun

DOI
https://doi.org/10.3390/app14146147
Journal volume & issue
Vol. 14, no. 14
p. 6147

Abstract

Read online

Entity–relationship extraction plays a pivotal role in the construction of domain knowledge graphs. For the aquatic disease domain, however, this relationship extraction is a formidable task because of overlapping relationships, data specialization, limited feature fusion, and imbalanced data samples, which significantly weaken the extraction’s performance. To tackle these challenges, this study leverages published books and aquatic disease websites as data sources to compile a text corpus, establish datasets, and then propose the Fd-CasBGRel model specifically tailored to the aquatic disease domain. The model uses the Casrel cascading binary tagging framework to address relationship overlap; utilizes task fine-tuning for better performance on aquatic disease data; trains on specialized aquatic disease corpora to improve adaptability; and integrates the BRC feature fusion module—which incorporates self-attention mechanisms, BiLSTM, relative position encoding, and conditional layer normalization—to leverage entity position and context for enhanced fusion. Further, it replaces the traditional cross-entropy loss function with the GHM loss function to mitigate category imbalance issues. The experimental results indicate that the F1 score of the Fd-CasBGRel on the aquatic disease dataset reached 84.71%, significantly outperforming several benchmark models. This model effectively addresses the challenges of ternary extraction’s low performance caused by high data specialization, insufficient feature integration, and data imbalances. The model achieved the highest F1 score of 86.52% on the overlapping relationship category dataset, demonstrating its robust capability in extracting overlapping data. Furthermore, We also conducted comparative experiments on the publicly available dataset WebNLG, and the model in this paper obtained the best performance metrics compared to the rest of the comparative models, indicating that the model has good generalization ability.

Keywords