BioReader: a text mining tool for performing classification of biomedical literature

Christian Simon; Kristian Davidsen; Christina Hansen; Emily Seymour; Mike Bogetofte Barnkob; Lars Rønn Olsen

doi:10.1186/s12859-019-2607-x

BMC Bioinformatics (Feb 2019)

Affiliations

Christian Simon: Disease Systems Biology, Novo Nordisk Center for Protein Research, University of Copenhagen
Kristian Davidsen: Department of Health Technology, Technical University of Denmark
Christina Hansen: Department of Health Technology, Technical University of Denmark
Emily Seymour: La Jolla Institute for Allergy and Immunology
Mike Bogetofte Barnkob: MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, University of Oxford
Lars Rønn Olsen: Department of Health Technology, Technical University of Denmark

Journal volume & issue: Vol. 19, no. S13, pp. 165–170

Abstract


Background: Scientific data and research results are being published at an unprecedented rate. Many database curators and researchers use data and information from the primary literature to populate databases, form hypotheses, or serve as the basis for analyses or validation of results. These efforts largely rely on manual literature surveys to collect the data. Although repositories such as PubMed enable keyword queries over the vast literature, filtering relevant articles from the query results can be a non-trivial and highly time-consuming task.

Results: We here present a tool that enables users to classify scientific literature by text mining-based classification of article abstracts. BioReader (Biomedical Research Article Distiller) is trained by uploading article corpora for two training categories (e.g., one positive and one negative for content of interest), together with one corpus of abstracts to be classified and/or a search string to query PubMed for articles. The corpora are submitted as lists of PubMed IDs. The abstracts are automatically downloaded from PubMed and preprocessed, and the unclassified corpus is classified using the best-performing classification algorithm out of ten implemented algorithms.

Conclusion: BioReader supports data and information collection by implementing text mining-based classification of primary biomedical literature in a web interface, thus enabling curators and researchers to take advantage of the vast amounts of data and information in the published literature. BioReader outperforms existing tools with similar functionalities and expands the features used for mining literature in database curation efforts. The tool is freely available as a web service at http://www.cbs.dtu.dk/services/BioReader
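
The workflow described in the abstract (two labeled training corpora, preprocessing, selection of the best-performing classifier, then classification of an unlabeled corpus) can be illustrated with a minimal sketch. This is not BioReader's implementation. The abstract does not name the ten algorithms or the selection criterion, so the two classifiers below (naive Bayes and nearest centroid) and the leave-one-out accuracy used to choose between them are stand-in assumptions. All function names are hypothetical, and abstracts are passed in as plain strings instead of being downloaded from PubMed by ID.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not BioReader's list).
STOPWORDS = {"the", "of", "and", "in", "a", "to", "is", "for", "with", "by"}

def preprocess(abstract):
    """Lowercase, keep alphabetic tokens, drop a few stop words."""
    return [t for t in re.findall(r"[a-z]+", abstract.lower()) if t not in STOPWORDS]

def train_naive_bayes(docs, labels):
    """Stand-in classifier 1: multinomial naive Bayes with add-one smoothing."""
    counts = {c: Counter() for c in set(labels)}
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)
    vocab = {t for c in counts.values() for t in c}
    priors = Counter(labels)

    def predict(tokens):
        def score(c):
            total = sum(counts[c].values()) + len(vocab)
            log_prior = math.log(priors[c] / len(labels))
            return log_prior + sum(
                math.log((counts[c][t] + 1) / total) for t in tokens if t in vocab
            )
        return max(sorted(counts), key=score)
    return predict

def train_centroid(docs, labels):
    """Stand-in classifier 2: nearest class centroid by cosine similarity."""
    centroids = {c: Counter() for c in set(labels)}
    for tokens, label in zip(docs, labels):
        centroids[label].update(tokens)

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
            sum(v * v for v in b.values())
        )
        return dot / norm if norm else 0.0

    def predict(tokens):
        vec = Counter(tokens)
        return max(sorted(centroids), key=lambda c: cosine(vec, centroids[c]))
    return predict

def loo_accuracy(trainer, docs, labels):
    """Leave-one-out accuracy of a trainer on the labeled corpora."""
    hits = 0
    for i in range(len(docs)):
        model = trainer(docs[:i] + docs[i + 1:], labels[:i] + labels[i + 1:])
        hits += model(docs[i]) == labels[i]
    return hits / len(docs)

def classify(positive, negative, unclassified,
             trainers=(train_naive_bayes, train_centroid)):
    """Pick the best trainer by leave-one-out accuracy, then label new abstracts."""
    docs = [preprocess(a) for a in positive + negative]
    labels = ["positive"] * len(positive) + ["negative"] * len(negative)
    best = max(trainers, key=lambda t: loo_accuracy(t, docs, labels))
    model = best(docs, labels)
    return best.__name__, [model(preprocess(a)) for a in unclassified]

# Toy corpora invented for illustration only.
positive = ["T cell epitope binding to MHC class I molecules",
            "Peptide epitope recognized by cytotoxic T cells",
            "MHC binding affinity of viral epitope peptides"]
negative = ["Crop yield under drought stress in maize",
            "Soil nitrogen and wheat crop yield",
            "Drought tolerance genes in rice crop"]
unclassified = ["Novel epitope peptides binding MHC molecules",
                "Maize crop drought response"]

chosen, predicted = classify(positive, negative, unclassified)
print(chosen, predicted)
```

On these toy corpora both stand-in classifiers label the first unclassified abstract as positive and the second as negative. A real deployment would need far larger training corpora and a more careful evaluation than leave-one-out accuracy on six documents.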

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
