Computer Science (Jan 2012)

Information Extraction From Chemical Patents

  • Sandra Bergmann
  • Mathilde Romberg

DOI
https://doi.org/10.7494/csci.2012.13.2.21
Journal volume & issue
Vol. 13, no. 2
p. 21

Abstract

The development of new chemicals or pharmaceuticals is preceded by an in-depth analysis of published patents in the field. When done by a human reader, this information retrieval is costly and time-consuming, yet it is mandatory for the potential success of an investment. The goal of the research project UIMA-HPC is to automate, and hence speed up, the process of mining knowledge from patents. Multi-threaded analysis engines, developed according to UIMA (Unstructured Information Management Architecture) standards, process texts and images in thousands of documents in parallel. UNICORE (UNiform Interface to COmputing Resources) workflow control structures make it possible to dynamically allocate resources for each task to achieve the best CPU-time/real-time ratios in an HPC environment.
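
The idea of running an analysis step over many documents in parallel can be illustrated with a minimal sketch. It is not the UIMA-HPC implementation and does not use the UIMA or UNICORE APIs: the functions `analyze` and `analyze_corpus`, the `FORMULA` pattern, and the sample texts are all hypothetical, and a plain Python thread pool stands in for the multi-threaded analysis engines.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an analysis engine: flags tokens that look like
# simple molecular formulas (e.g. "H2O", "C6H6"). This is not a UIMA annotator,
# and the pattern also matches ordinary all-caps words.
FORMULA = re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b")


def analyze(doc_id, text):
    """Return the document id together with the formula-like tokens in its text."""
    return doc_id, FORMULA.findall(text)


def analyze_corpus(docs, workers=4):
    """Run analyze() over all documents using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(analyze, doc_id, text) for doc_id, text in docs.items()]
        # Collect results in submission order; each result is a (doc_id, tokens) pair.
        return dict(future.result() for future in futures)


if __name__ == "__main__":
    sample = {
        "patent-1": "Dissolve NaCl in H2O.",
        "patent-2": "Benzene (C6H6) was added.",
    }
    for doc_id, tokens in analyze_corpus(sample, workers=2).items():
        print(doc_id, tokens)
```

In this sketch the `workers` argument is fixed by the caller. The abstract describes a workflow layer that allocates resources dynamically per task, which would correspond to choosing that value (and the machines behind it) at run time.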