Frontiers in Computational Neuroscience (Jul 2024)

Knowledge graph construction for heart failure using large language models with prompt engineering

  • Tianhan Xu,
  • Yixun Gu,
  • Mantian Xue,
  • Renjie Gu,
  • Bin Li,
  • Xiang Gu

DOI
https://doi.org/10.3389/fncom.2024.1389475
Journal volume & issue
Vol. 18

Abstract


Introduction: Constructing an accurate and comprehensive knowledge graph of a specific disease is critical for practical clinical diagnosis and treatment, reasoning and decision support, rehabilitation, and health management. Classical BERT-based methods for knowledge graph construction tasks (such as named entity recognition and relation extraction) require large amounts of training data to ensure model performance. However, real-world medical annotation data, especially disease-specific annotated samples, are very limited. In addition, existing models perform poorly at recognizing out-of-distribution entities and relations not seen during training.

Method: In this study, we present a novel and practical pipeline for constructing a heart failure knowledge graph using large language models with medical expert refinement. We apply prompt engineering to the three phases of knowledge graph construction: schema design, information extraction, and knowledge completion. The best performance is achieved by designing task-specific prompt templates combined with the TwoStepChat approach.

Results: Experiments on two datasets show that the TwoStepChat method outperforms both the vanilla prompt and fine-tuned BERT-based baselines. Moreover, our method saves 65% of the time compared with manual annotation and is better suited to extracting out-of-distribution information in the real world.
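To illustrate the two-step idea described in the Method section, here is a minimal sketch (our own assumption of the approach, not the authors' code): information extraction is split into two chat turns, where the first turn performs named entity recognition and the second extracts relations conditioned on the entities found in step one. The prompt wording, the `llm` callable, and the output formats are all illustrative placeholders.

```python
def build_ner_prompt(text):
    """Step 1: ask the model to list clinical entities in the text."""
    return (
        "You are a medical information extraction assistant.\n"
        "List all clinical entities (diseases, symptoms, drugs, tests) "
        f"mentioned in the following text, one per line:\n{text}"
    )

def build_re_prompt(text, entities):
    """Step 2: ask for relations only among the entities from step 1."""
    entity_list = ", ".join(entities)
    return (
        "Given the text below and the entities "
        f"[{entity_list}], output (head, relation, tail) triples "
        f"using only those entities:\n{text}"
    )

def two_step_chat(text, llm):
    """Run both steps with any callable `llm(prompt) -> str`.

    Conditioning the relation-extraction prompt on the entities from
    the first step constrains the model's output space, which is the
    core of the two-step scheme.
    """
    ner_output = llm(build_ner_prompt(text))
    entities = [line.strip() for line in ner_output.splitlines() if line.strip()]
    triples = llm(build_re_prompt(text, entities))
    return entities, triples
```

Any chat-model client can be plugged in as `llm`; a stub that returns canned strings is enough to exercise the control flow.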

Keywords