Automated information extraction model enhancing traditional Chinese medicine RCT evidence extraction (Evi-BERT): algorithm development and validation

Yizhen Li; Zhongzhi Luan; Yixing Liu; Heyuan Liu; Jiaxing Qi; Dongran Han

doi:10.3389/frai.2024.1454945

Frontiers in Artificial Intelligence (Aug 2024)

Affiliations

Yizhen Li: School of Computer Science and Engineering, Beihang University, Beijing, China
Zhongzhi Luan: School of Computer Science and Engineering, Beihang University, Beijing, China
Yixing Liu: School of Management, Beijing University of Chinese Medicine, Beijing, China
Heyuan Liu: School of Life and Science, Beijing University of Chinese Medicine, Beijing, China
Jiaxing Qi: School of Computer Science and Engineering, Beihang University, Beijing, China
Dongran Han: School of Life and Science, Beijing University of Chinese Medicine, Beijing, China

DOI: https://doi.org/10.3389/frai.2024.1454945
Journal volume & issue: Vol. 7

Abstract

Background: In evidence-based medicine, randomized controlled trials (RCTs) are critically important for writing clinical guidelines and guiding practicing physicians. Currently, evidence from RCTs is extracted largely by hand, which limits the breadth of data that can be covered and is inefficient.

Objectives: To expand the breadth of data and improve the efficiency of obtaining clinical evidence, we introduce an automated information extraction model for traditional Chinese medicine (TCM) RCT evidence.

Methods: We adopt the Evidence-Bidirectional Encoder Representation from Transformers (Evi-BERT) model for automated information extraction and combine it with rule extraction. A total of 48,523 research articles covering eleven disease types, drawn from the China National Knowledge Infrastructure (CNKI), WanFang Data, and VIP databases, served as the data source for extraction. We then constructed a manually annotated dataset of TCM clinical literature to train the model, comprising ten evidence elements and 24,244 data points. We used two models, BERT-CRF and BiLSTM-CRF, as baselines and compared their performance with that of Evi-BERT alone and of Evi-BERT combined with rule expression (RE).

Results: Evi-BERT combined with RE achieved the best performance (precision = 0.926, recall = 0.952, F1 score = 0.938) and was the most robust. In total, we summarized 113 rule entries during the rule extraction procedure. Our model substantially expands the amount of data that can be searched and improves efficiency without losing accuracy.

Conclusion: Our work provides an intelligent approach to extracting clinical evidence from TCM RCT data. The model can help physicians reduce the time spent reading journals and speed up the screening of clinical trial evidence, supporting the generation of accurate clinical reference guidelines. Additionally, we hope the structured clinical evidence and structured knowledge extracted in this study will help other researchers build large language models for TCM.
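
For readers who want to relate the three reported metrics, the sketch below computes the F1 score as the harmonic mean of precision and recall, using the values quoted in the abstract. The function name `f1_score` and the variable names are illustrative and do not come from the paper. The harmonic mean of the reported precision and recall comes to roughly 0.939, slightly above the reported 0.938. The abstract does not say how the scores were averaged, so attributing the small gap to rounding or to averaging over the ten evidence elements is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for Evi-BERT combined with RE.
reported_precision = 0.926
reported_recall = 0.952
reported_f1 = 0.938

# F1 implied by the reported precision and recall (illustrative check only;
# the paper's own averaging scheme is not stated in the abstract).
computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.4f}, reported F1 = {reported_f1:.3f}")
```

Because recall is higher than precision here, the model misses relatively few annotated evidence elements while admitting somewhat more false positives; the F1 score sits between the two values, closer to the lower one.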

Published in Frontiers in Artificial Intelligence

ISSN: 2624-8212 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.frontiersin.org/journals/artificial-intelligence#
