Cell Death and Disease (Sep 2024)

Text mining method to unravel long COVID’s clinical condition in hospitalized patients

  • Pilar Tavares Veras Florentino,
  • Vinícius de Oliveira Araújo,
  • Henrique Zatti,
  • Caio Vinícius Luis,
  • Célia Regina Santos Cavalcanti,
  • Matheus Henrique Citibaldi de Oliveira,
  • Anderson Henrique França Figueredo Leão,
  • Juracy Bertoldo Junior,
  • George G. Caique Barbosa,
  • Ernesto Ravera,
  • Alberto Cebukin,
  • Renata Bernardes David,
  • Danilo Batista Vieira de Melo,
  • Tales Mota Machado,
  • Nancy C. J. Bellei,
  • Viviane Boaventura,
  • Manoel Barral-Netto,
  • Soraya S. Smaili

DOI
https://doi.org/10.1038/s41419-024-07043-4
Journal volume & issue
Vol. 15, no. 9
pp. 1 – 9

Abstract

Long COVID is characterized by persistent symptoms that extend beyond established timeframes. Its varied presentation across different populations and healthcare systems poses significant challenges in understanding its clinical manifestations and implications. In this study, we present a novel application of a text mining technique to automatically extract unstructured data from a long COVID survey conducted at a prominent university hospital in São Paulo, Brazil. Our phonetic text clustering (PTC) method enables the exploration of unstructured Electronic Healthcare Records (EHR) data by unifying different written forms of similar terms into a single phonemic representation. We used n-gram text analysis to detect compound words and negated terms in Brazilian Portuguese (Portuguese-BR), focusing on medical conditions and symptoms related to long COVID. By leveraging text mining, we aim to contribute to a deeper understanding of this chronic condition and its implications for healthcare systems globally. The model developed in this study has the potential for scalability and applicability in other healthcare settings, thereby supporting broader research efforts and informing clinical decision-making for long COVID patients.
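
The two ideas named in the abstract, collapsing spelling variants into one phonemic form and using n-grams to catch negated terms, can be illustrated with a minimal Python sketch. It is not the authors' PTC implementation: the function names (`phonetic_key`, `cluster_terms`, `negated_terms`), the handful of substitution rules, and the negation cue list are all hypothetical choices made for illustration, and a real system would likely need a far richer set of Portuguese phonetic rules.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical negation cues (already in phonetic-key form); not from the paper.
NEGATION_CUES = {"nao", "sem", "nega"}


def phonetic_key(term: str) -> str:
    """Toy phonetic key built from a few illustrative rules (an assumption)."""
    text = unicodedata.normalize("NFC", term.lower())
    # Treat c-cedilla as an "s" sound before accents are stripped.
    text = text.replace("\u00e7", "s")
    # Decompose accented letters and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Keep plain letters only.
    text = re.sub(r"[^a-z]", "", text)
    # "ph" is pronounced like "f".
    text = text.replace("ph", "f")
    # Collapse doubled letters such as "ss" into one.
    return re.sub(r"(.)\1+", r"\1", text)


def cluster_terms(terms):
    """Group written variants that share the same toy phonetic key."""
    clusters = defaultdict(list)
    for term in terms:
        clusters[phonetic_key(term)].append(term)
    return dict(clusters)


def negated_terms(note: str):
    """Scan bigrams and return the key of each word that follows a negation cue."""
    tokens = re.findall(r"[^\W\d_]+", note.lower())
    keys = [phonetic_key(tok) for tok in tokens]
    return [second for first, second in zip(keys, keys[1:]) if first in NEGATION_CUES]


if __name__ == "__main__":
    print(cluster_terms(["cansaço", "cansasso", "dispnéia", "dispneia"]))
    print(negated_terms("Paciente nega dispnéia, refere cansasso e sem febre"))
```

In this sketch, the variants "cansaço" and "cansasso" land in the same cluster, and the bigram scan flags the words that follow "nega" and "sem" as negated. The scan looks only one word ahead and ignores punctuation, so compound terms would need longer n-grams.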