Data (Dec 2022)

Natural Language Processing to Extract Information from Portuguese-Language Medical Records

  • Naila Camila da Rocha,
  • Abner Macola Pacheco Barbosa,
  • Yaron Oliveira Schnr,
  • Juliana Machado-Rugolo,
  • Luis Gustavo Modelli de Andrade,
  • José Eduardo Corrente,
  • Liciana Vaz de Arruda Silveira

DOI
https://doi.org/10.3390/data8010011
Journal volume & issue
Vol. 8, no. 1
p. 11

Abstract

Studies that rely on medical records are often impeded because much of the information is recorded in narrative fields. However, recent studies have used artificial intelligence to extract and process secondary health data from electronic medical records. The aim of this study was to develop a neural network that uses data from unstructured medical records to capture information regarding symptoms, diagnoses, medications, conditions, exams, and treatment. Data from 30,000 medical records of patients hospitalized in the Clinical Hospital of the Botucatu Medical School (HCFMB), São Paulo, Brazil, were obtained, and a corpus of 1200 clinical texts was created. A natural language processing algorithm was used for text extraction and convolutional neural networks for pattern recognition, and the model was evaluated with goodness-of-fit indices. The results showed good accuracy, considering the complexity of the model, with an F-score of 63.9% and a precision of 72.7%. The patient condition class reached a precision of 90.3% and the medication class reached 87.5%. The proposed neural network will facilitate the detection of relationships between diseases and symptoms and of prevalence and incidence, as well as the identification of clinical conditions, disease evolution, and the effects of prescribed medications.
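
The abstract reports an overall F-score and precision but not recall. As a reading aid, the sketch below shows how the three quantities relate, assuming the reported F-score is the balanced F1 (the harmonic mean of precision and recall) and that both figures were aggregated in the same way. Neither assumption is confirmed by the abstract, and the function names are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(f1: float, precision: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given F1 and P."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("No valid recall exists for this F1/precision pair")
    return f1 * precision / denominator


# Overall figures quoted in the abstract (as fractions).
reported_f1 = 0.639
reported_precision = 0.727

# Assumption: the F-score is F1 and both metrics share one averaging scheme.
recall = implied_recall(reported_f1, reported_precision)
print(f"Implied recall: {recall:.3f}")
```

Under these assumptions the implied recall is roughly 0.57, which would mean the model misses more entities than it mislabels. If the reported values are macro-averages over classes, this identity does not hold exactly and the figure should be read only as a rough indication.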

Keywords