PICO entity extraction for preclinical animal literature

Qianying Wang; Jing Liao; Mirella Lapata; Malcolm Macleod

doi:10.1186/s13643-022-02074-4

Systematic Reviews (Sep 2022)

Affiliations

Qianying Wang: CCBS, Edinburgh Medical School, University of Edinburgh
Jing Liao: CCBS, Edinburgh Medical School, University of Edinburgh
Mirella Lapata: ILCC, School of Informatics, University of Edinburgh
Malcolm Macleod: CCBS, Edinburgh Medical School, University of Edinburgh

Journal volume & issue: Vol. 11, no. 1, pp. 1–12

Abstract

Background: Natural language processing could assist multiple tasks in systematic reviews and reduce workload, including the extraction of PICO elements such as study populations, interventions, comparators and outcomes. The PICO framework provides a basis for retrieving and selecting evidence relevant to a specific systematic review question, and automatic approaches to PICO extraction have been developed, particularly for reviews of clinical trial findings. Because preclinical animal studies differ from clinical trials, separate approaches are needed. Facilitating preclinical systematic reviews will inform the translation from preclinical to clinical research.

Methods: We randomly selected 400 abstracts describing in vivo animal research from the PubMed Central Open Access database and manually annotated them with PICO phrases for Species, Strain, methods of Induction of disease model, Intervention, Comparator and Outcome. We developed a two-stage workflow for preclinical PICO extraction. First, we fine-tuned BERT with different pre-trained modules for PICO sentence classification. Then, after removing the text irrelevant to PICO features, we explored LSTM-, CRF- and BERT-based models for PICO entity recognition. Because the training corpus is small, we also explored a self-training approach.

Results: For PICO sentence classification, BERT models using all pre-trained modules achieved an F1 score of over 80%, and models pre-trained on PubMed abstracts achieved the highest F1 of 85%. For PICO entity recognition, fine-tuning BERT pre-trained on PubMed abstracts achieved an overall F1 of 71% and satisfactory F1 for Species (98%), Strain (70%), Intervention (70%) and Outcome (67%). The scores for Induction and Comparator are less satisfactory, but the F1 for Comparator can be improved to 50% by applying self-training.

Conclusions: Our study indicates that, of the approaches tested, BERT pre-trained on PubMed abstracts is the best for both PICO sentence classification and PICO entity recognition in preclinical abstracts. Self-training yields better performance for identifying comparators and strains.
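
The per-entity F1 scores reported above can be illustrated with a minimal sketch. The abstract does not state how F1 was computed, so the following assumes exact-match, entity-level scoring over BIO-tagged tokens, a common convention for entity recognition. The function names (extract_spans, entity_f1) and the example tag sequences are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def extract_spans(tags):
    """Convert a BIO tag sequence into a set of (label, start, end) spans.

    `end` is exclusive. An I- tag that does not continue a span of the
    same label is ignored (treated like O) in this sketch.
    """
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((label, start, i))
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def entity_f1(gold_tags, pred_tags):
    """Per-label F1, counting a prediction as correct only on exact span match."""
    gold = extract_spans(gold_tags)
    pred = extract_spans(pred_tags)
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for span in pred:
        counts[span[0]]["tp" if span in gold else "fp"] += 1
    for span in gold - pred:
        counts[span[0]]["fn"] += 1
    # F1 = 2TP / (2TP + FP + FN); every label here has at least one count.
    return {
        label: 2 * c["tp"] / (2 * c["tp"] + c["fp"] + c["fn"])
        for label, c in counts.items()
    }


# Hypothetical example: the Intervention span is predicted one token too short.
gold = ["B-Species", "O", "B-Intervention", "I-Intervention", "O", "B-Outcome", "I-Outcome"]
pred = ["B-Species", "O", "B-Intervention", "O", "O", "B-Outcome", "I-Outcome"]
scores = entity_f1(gold, pred)
print(scores)
```

Under exact matching, a partially overlapping prediction counts as both a false positive and a false negative, which is one reason multi-word entity types such as Comparator or Induction can score much lower than short, unambiguous ones such as Species.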

Published in Systematic Reviews

ISSN: 2046-4053 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine
Website: https://systematicreviewsjournal.biomedcentral.com
