PubMedPortable: A Framework for Supporting the Development of Text Mining Applications.

Kersten Döring; Björn A Grüning; Kiran K Telukunta; Philippe Thomas; Stefan Günther

doi:10.1371/journal.pone.0163794

PLoS ONE (Jan 2016)

DOI: https://doi.org/10.1371/journal.pone.0163794
Journal volume & issue: Vol. 11, no. 10, p. e0163794

Abstract

Information extraction from biomedical literature is continuously growing in scope and importance. Many tools exist that perform named entity recognition, e.g. of proteins, chemical compounds, and diseases. Furthermore, several approaches deal with the extraction of relations between identified entities. The BioCreative community supports these developments with yearly open challenges, which led to a standardised XML text annotation format called BioC. PubMed provides access to the largest open biomedical literature repository, but there is no unified way of connecting its data to natural language processing tools. Therefore, an appropriate data environment is needed as a basis to combine different software solutions and to develop customised text mining applications. PubMedPortable builds a relational database and a full text index on PubMed citations. It can be applied either to the complete PubMed data set or an arbitrary subset of downloaded PubMed XML files. The software provides the infrastructure to combine stand-alone applications by exporting different data formats, e.g. BioC. The presented workflows show how to use PubMedPortable to retrieve, store, and analyse a disease-specific data set. The provided use cases are well documented in the PubMedPortable wiki. The open-source software library is small, easy to use, and scalable to the user's system requirements. It is freely available for Linux on the web at https://github.com/KerstenDoering/PubMedPortable and for other operating systems as a virtual container. The approach was tested extensively and applied successfully in several projects.
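The core idea in the abstract, loading PubMed XML citations into a relational database and building a searchable text index on top of them, can be illustrated with a small sketch. This is not PubMedPortable's own code or schema: the table names `citation` and `term_index`, the functions `build_database` and `search`, and the two sample records are all hypothetical, and the standard-library `sqlite3` and `xml.etree.ElementTree` modules stand in for whatever database and indexing back ends the software actually uses. Only the XML element names (`PubmedArticle`, `MedlineCitation`, `PMID`, `ArticleTitle`, `AbstractText`) follow the PubMed XML format.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed PubMed-style records; real files carry many more fields.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>Pancreatic cancer and gemcitabine</ArticleTitle>
        <Abstract><AbstractText>Gemcitabine is used to treat pancreatic cancer.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <ArticleTitle>Protein kinases in diabetes</ArticleTitle>
        <Abstract><AbstractText>Kinase signalling is altered in diabetes.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def build_database(xml_text):
    """Parse PubMed-style XML into an in-memory SQLite database (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    # Relational part: one row per citation.
    conn.execute(
        "CREATE TABLE citation (pmid INTEGER PRIMARY KEY, title TEXT, abstract TEXT)"
    )
    # Toy full text index: one row per distinct (lower-cased token, citation) pair.
    conn.execute(
        "CREATE TABLE term_index (term TEXT, pmid INTEGER, PRIMARY KEY (term, pmid))"
    )
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        pmid = int(article.findtext("MedlineCitation/PMID"))
        title = article.findtext("MedlineCitation/Article/ArticleTitle", default="")
        # An abstract may be split into several AbstractText sections; join them.
        abstract = " ".join((node.text or "") for node in article.iter("AbstractText"))
        conn.execute("INSERT INTO citation VALUES (?, ?, ?)", (pmid, title, abstract))
        terms = set(re.findall(r"[a-z0-9]+", f"{title} {abstract}".lower()))
        conn.executemany(
            "INSERT INTO term_index VALUES (?, ?)", [(term, pmid) for term in terms]
        )
    conn.commit()
    return conn


def search(conn, term):
    """Return the PMIDs of all citations whose title or abstract contains the token."""
    rows = conn.execute(
        "SELECT pmid FROM term_index WHERE term = ? ORDER BY pmid", (term.lower(),)
    )
    return [row[0] for row in rows]
```

Calling `build_database(SAMPLE_XML)` and then `search(conn, "gemcitabine")` looks up the citation that mentions the drug. A production setup would replace the toy token table with a dedicated full text search engine and would store far more of each record, such as authors, journals, and MeSH terms.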

Published in PLoS ONE

ISSN: 1932-6203 (Online)
Publisher: Public Library of Science (PLoS)
Country of publisher: United States
LCC subjects: Medicine; Science
Website: https://journals.plos.org/plosone/
