Informatics in Medicine Unlocked (Jan 2019)

From medical records to research papers: A literature analysis pipeline for supporting medical genomic diagnosis processes

  • Fernando López Bello,
  • Hugo Naya,
  • Víctor Raggio,
  • Aiala Rosá

Journal volume & issue
Vol. 15

Abstract

In this paper, we introduce a framework for processing genetics and genomics literature, based on ontologies and lexical resources from the biomedical domain. The main objective is to support the diagnostic process carried out by medical geneticists, who extract knowledge from published works. We constructed a pipeline that gathers several genetics- and genomics-related resources and applies natural language processing techniques, including named entity recognition and relation extraction. Working on a corpus created from PubMed abstracts, we built a knowledge database that can be used for processing medical records written in Spanish. Given a medical record from a Uruguayan healthcare patient, we show how it can be mapped to the database and how graph queries can retrieve relevant knowledge paths. The framework is not an end-user application but an extensible processing structure to be leveraged by external applications, enabling software developers to streamline the incorporation of the extracted knowledge.

Keywords: Controlled vocabulary, Natural language processing, Genomics, Automated pattern recognition, Publications, Medical records