Applying a text mining framework to the extraction of numerical parameters from scientific literature in the biotechnology domain

André SANTOS; Regina NOGUEIRA; Anália LOURENÇO

doi:10.14201/ADCAIJ20121118

Advances in Distributed Computing and Artificial Intelligence Journal (Jul 2012)

Affiliations

André SANTOS: Centre of Biological Engineering
Regina NOGUEIRA: Centre of Biological Engineering
Anália LOURENÇO: Centre of Biological Engineering

DOI: https://doi.org/10.14201/ADCAIJ20121118
Journal volume & issue: Vol. 1, no. 1
pp. 1 – 8

Abstract

Scientific publications are the main vehicle for disseminating information in the field of biotechnology for wastewater treatment. New research paradigms and the application of high-throughput technologies have considerably increased the rate of publication. As a result, manual curation has become harder, more error-prone and more time-consuming, which is likely to cause loss of information and inefficient knowledge acquisition. Research outputs therefore barely reach engineers, hampering the calibration of the mathematical models used to optimize the stability and performance of biotechnological systems. In this context, we have developed a data curation workflow, based on text mining techniques, to extract numerical parameters from scientific literature, and applied it to the biotechnology domain. The workflow processes wastewater-related articles with the main goal of identifying the physico-chemical parameters mentioned in the text. This work describes the implementation of the workflow, identifies achievements and current limitations in the overall process, and presents the results obtained for a corpus of 50 full-text documents.
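
To make the task concrete, the following is a minimal sketch of what extracting numerical parameters from a sentence can look like. It is not the workflow described in the article, whose components are not detailed in this abstract. The unit list, the pattern, and the function name `extract_parameters` are assumptions introduced purely for illustration, and a plain regular expression stands in for whatever text mining techniques the authors actually used.

```python
import re

# Hypothetical unit vocabulary, for illustration only; the article's own
# dictionaries and rules are not given in the abstract.
UNITS = r"(?:mg/L|g/L|°C|h|d|%)"

# A number (optionally with a decimal part) followed by one of the units.
# The lookbehind avoids starting inside a longer token such as "B12";
# the lookahead avoids reading "d" in "days" as a unit.
PATTERN = re.compile(
    r"(?<![\w.])(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>" + UNITS + r")(?![A-Za-z])"
)


def extract_parameters(sentence):
    """Return (value, unit) pairs found in a sentence, in order of appearance."""
    return [
        (float(m.group("value")), m.group("unit"))
        for m in PATTERN.finditer(sentence)
    ]


if __name__ == "__main__":
    text = "The reactor was operated at 35 °C with a COD of 450 mg/L and an HRT of 12 h."
    for value, unit in extract_parameters(text):
        print(value, unit)
```

A sketch like this only finds value and unit pairs; linking each value to the parameter it describes (for example, that 450 mg/L refers to COD) would need further steps that are beyond this illustration.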

Published in Advances in Distributed Computing and Artificial Intelligence Journal

ISSN: 2255-2863 (Online)
Publisher: Ediciones Universidad de Salamanca
Country of publisher: Spain
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://revistas.usal.es/index.php/2255-2863/
