Applying a text mining framework to the extraction of numerical parameters from scientific literature in the biotechnology domain

Anália LOURENÇO; Regina NOGUEIRA; André SANTOS

Advances in Distributed Computing and Artificial Intelligence Journal (Jul 2013)

Journal volume & issue: Vol. 1, no. 1, pp. 1–8

Abstract

Scientific publications are the main vehicle for disseminating information in the field of biotechnology for wastewater treatment. New research paradigms and the application of high-throughput technologies have considerably increased the rate of publication. As a result, manual curation has become harder, more error-prone and more time-consuming, which is likely to lead to loss of information and inefficient knowledge acquisition. Research outputs therefore barely reach engineers, hampering the calibration of the mathematical models used to optimize the stability and performance of biotechnological systems. In this context, we have developed a data curation workflow, based on text mining techniques, to extract numerical parameters from scientific literature, and applied it to the biotechnology domain. The workflow was built to process wastewater-related articles, with the main goal of identifying the physico-chemical parameters mentioned in the text. This work describes the implementation of the workflow, identifies achievements and current limitations of the overall process, and presents the results obtained for a corpus of 50 full-text documents.

Published in Advances in Distributed Computing and Artificial Intelligence Journal

ISSN: 2255-2863 (Online)
Publisher: Ediciones Universidad de Salamanca
Country of publisher: Spain
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://revistas.usal.es/index.php/2255-2863/
