Creating efficiencies in the extraction of data from randomized trials: a prospective evaluation of a machine learning and text mining tool

Allison Gates; Michelle Gates; Shannon Sim; Sarah A. Elliott; Jennifer Pillay; Lisa Hartling

doi:10.1186/s12874-021-01354-2

BMC Medical Research Methodology (Aug 2021)

Affiliations

Allison Gates: Department of Pediatrics and the Alberta Research Centre for Health Evidence, University of Alberta, Edmonton Clinic Health Academy
Michelle Gates: Department of Pediatrics and the Alberta Research Centre for Health Evidence, University of Alberta, Edmonton Clinic Health Academy
Shannon Sim: Department of Pediatrics and the Alberta Research Centre for Health Evidence, University of Alberta, Edmonton Clinic Health Academy
Sarah A. Elliott: Department of Pediatrics and the Alberta Research Centre for Health Evidence, University of Alberta, Edmonton Clinic Health Academy
Jennifer Pillay: Department of Pediatrics and the Alberta Research Centre for Health Evidence, University of Alberta, Edmonton Clinic Health Academy
Lisa Hartling: Department of Pediatrics and the Alberta Research Centre for Health Evidence, University of Alberta, Edmonton Clinic Health Academy

DOI: https://doi.org/10.1186/s12874-021-01354-2
Journal volume & issue: Vol. 21, no. 1
pp. 1–12

Abstract

Background: Machine learning tools that semi-automate data extraction may create efficiencies in systematic review production. We evaluated the ability of a machine learning and text mining tool (ExaCT) to (a) automatically extract data elements from randomized trials, and (b) save time compared with manual extraction and verification.

Methods: For 75 randomized trials, we manually extracted and verified data for 21 data elements. We uploaded the randomized trials to the online machine learning and text mining tool, and quantified performance by evaluating its ability to identify the reporting of data elements (reported or not reported), and the relevance of the extracted sentences, fragments, and overall solutions. For each randomized trial, we measured the time to complete manual extraction and verification, and the time to review and amend the data extracted by the tool. We calculated the median (interquartile range [IQR]) time for manual and semi-automated data extraction, and overall time savings.

Results: The tool identified the reporting (reported or not reported) of data elements with median (IQR) 91% (75% to 99%) accuracy. Among the top five sentences for each data element, at least one sentence was relevant in a median (IQR) 88% (83% to 99%) of cases. The tool had highlighted pertinent fragments in a median (IQR) 90% (86% to 97%) of relevant sentences, but exact matches were unreliable (median [IQR] 52% [33% to 73%]). A median 48% of solutions were fully correct, but performance varied greatly across data elements (IQR 21% to 71%). Using ExaCT to assist the first reviewer resulted in modest time savings compared with manual extraction by a single reviewer (17.9 vs. 21.6 h total extraction time across 75 randomized trials).

Conclusions: Using ExaCT to assist with data extraction resulted in modest gains in efficiency compared with manual extraction. The tool was reliable for identifying the reporting of most data elements. The tool’s ability to identify at least one relevant sentence and highlight pertinent fragments was generally good, but changes to sentence selection and/or highlighting were often required.

Protocol: https://doi.org/10.7939/DVN/RQPJKS
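
To put the reported time totals in perspective, the arithmetic can be sketched as follows. This is an illustrative calculation only: the two totals and the trial count come from the abstract, while the function name `time_savings` and the derived figures (hours saved, percentage saved, average minutes saved per trial) are assumptions introduced here and are not reported by the authors. The per-trial figure is a simple average of the totals, not the median the study calculated.

```python
# Totals reported in the abstract (hours across 75 randomized trials).
MANUAL_HOURS = 21.6      # manual extraction by a single reviewer
ASSISTED_HOURS = 17.9    # first reviewer assisted by ExaCT
N_TRIALS = 75


def time_savings(manual_h, assisted_h, n_trials):
    """Derive illustrative savings figures from two total extraction times.

    Hypothetical helper: these derived values are not taken from the paper.
    """
    saved_h = manual_h - assisted_h
    return {
        "saved_hours": round(saved_h, 1),
        "saved_percent": round(100 * saved_h / manual_h, 1),
        # Average over trials; the study itself reports medians per trial.
        "saved_minutes_per_trial": round(60 * saved_h / n_trials, 1),
    }


summary = time_savings(MANUAL_HOURS, ASSISTED_HOURS, N_TRIALS)
print(summary)
```

Under these assumptions the difference works out to about 3.7 h in total, roughly 17% of the manual extraction time, or about 3 minutes per trial on average.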

Published in BMC Medical Research Methodology

ISSN: 1471-2288 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General)
Website: http://bmcmedresmethodol.biomedcentral.com
