STEED: A data mining tool for automated extraction of experimental parameters and risk of bias items from in vivo publications.

Wolfgang Emanuel Zurrer; Amelia Elaine Cannon; Ewoud Ewing; David Brüschweiler; Julia Bugajska; Bernard Friedrich Hild; Marianna Rosso; Daniel Salo Reich; Benjamin Victor Ineichen

doi:10.1371/journal.pone.0311358

PLoS ONE (Jan 2024)

Journal volume & issue: Vol. 19, no. 11, p. e0311358

Abstract

Background and methods: Systematic reviews, i.e., research summaries that address focused questions in a structured and reproducible manner, are a cornerstone of evidence-based medicine and research. However, certain steps in systematic reviews, such as data extraction, are labour-intensive, which hampers their feasibility, especially given the rapidly expanding body of biomedical literature. To bridge this gap, we aimed to develop a data mining tool in the R programming environment to automate data extraction from neuroscience in vivo publications. The function was trained on a literature corpus (n = 45 publications) of animal motor neuron disease studies and tested in two validation corpora (motor neuron diseases, n = 31 publications; multiple sclerosis, n = 244 publications).

Results: Our data mining tool, STEED (STructured Extraction of Experimental Data), successfully extracted key experimental parameters, such as animal models and species, as well as risk of bias items, such as randomization or blinding, from in vivo studies. Sensitivity and specificity were over 85% and 80%, respectively, for most items in both validation corpora. Accuracy and F1-score were above 90% and 0.9, respectively, for most items in the validation corpora. Time savings were above 99%.

Conclusions: Our text mining tool, STEED, can extract key experimental parameters and risk of bias items from the neuroscience in vivo literature. This enables the tool's deployment for probing a field in a research improvement context or for replacing one human reader during data extraction, resulting in substantial time savings and contributing towards the automation of systematic reviews.
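The performance measures reported in the abstract (sensitivity, specificity, accuracy, F1-score) are standard quantities derived from a confusion matrix that compares tool output with human annotation. The following Python sketch shows how they could be computed for a single binary item such as "blinding reported". It is an illustration only, not the authors' R implementation: the function name `binary_metrics` and the toy labels are hypothetical and do not come from the study.

```python
def binary_metrics(predicted, reference):
    """Compare tool output with human annotation for one binary item.

    Both arguments are equal-length sequences of booleans, one entry per
    publication (True = item reported). Names and data are hypothetical.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # true positives
    tn = sum(1 for p, r in pairs if not p and not r)  # true negatives
    fp = sum(1 for p, r in pairs if p and not r)      # false positives
    fn = sum(1 for p, r in pairs if not p and r)      # false negatives

    def ratio(numerator, denominator):
        # Return None instead of dividing by zero when a class is absent.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy example: eight publications, made-up labels for illustration only.
tool_output = [True, True, False, False, True, False, True, False]
human_labels = [True, True, False, False, False, False, True, True]
metrics = binary_metrics(tool_output, human_labels)
print(metrics)
```

Sensitivity and specificity are reported separately because risk of bias items are often imbalanced across publications, so accuracy alone can look high even when the rarer class is frequently missed.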

Published in PLoS ONE

ISSN: 1932-6203 (Online)
Publisher: Public Library of Science (PLoS)
Country of publisher: United States
LCC subjects: Medicine; Science
Website: https://journals.plos.org/plosone/
