A New Natural Language Processing–Inspired Methodology (Detection, Initial Characterization, and Semantic Characterization) to Investigate Temporal Shifts (Drifts) in Health Care Data: Quantitative Study

Bruno Paiva; Marcos André Gonçalves; Leonardo Chaves Dutra da Rocha; Milena Soriano Marcolino; Fernanda Cristina Barbosa Lana; Maira Viana Rego Souza-Silva; Jussara M Almeida; Polianna Delfino Pereira; Claudio Moisés Valiense de Andrade; Angélica Gomides dos Reis Gomes; Maria Angélica Pires Ferreira; Frederico Bartolazzi; Manuela Furtado Sacioto; Ana Paula Boscato; Milton Henriques Guimarães-Júnior; Priscilla Pereira dos Reis; Felício Roberto Costa; Alzira de Oliveira Jorge; Laryssa Reis Coelho; Marcelo Carneiro; Thaís Lorenna Souza Sales; Silvia Ferreira Araújo; Daniel Vitório Silveira; Karen Brasil Ruschel; Fernanda Caldeira Veloso Santos; Evelin Paola de Almeida Cenci; Luanna Silva Monteiro Menezes; Fernando Anschau; Maria Aparecida Camargos Bicalho; Euler Roberto Fernandes Manenti; Renan Goulart Finger; Daniela Ponce; Filipe Carrilho de Aguiar; Luiza Margoto Marques; Luís César de Castro; Giovanna Grünewald Vietta; Mariana Frizzo de Godoy; Mariana do Nascimento Vilaça; Vivian Costa Morais

doi:10.2196/54246

JMIR Medical Informatics (Oct 2024)

DOI: https://doi.org/10.2196/54246
Journal volume & issue: Vol. 12, p. e54246

Abstract

Background: Proper analysis and interpretation of health care data can significantly improve patient outcomes by enhancing services and revealing the impacts of new technologies and treatments. Understanding the substantial impact of temporal shifts in these data is crucial. For example, COVID-19 vaccination initially lowered the mean age of at-risk patients and later changed the characteristics of those who died. This highlights the importance of understanding these shifts for assessing factors that affect patient outcomes.

Objective: This study aims to propose detection, initial characterization, and semantic characterization (DIS), a new methodology for analyzing changes in health outcomes and variables over time while discovering contextual changes for outcomes in large volumes of data.

Methods: The DIS methodology involves 3 steps: detection, initial characterization, and semantic characterization. Detection uses metrics such as Jensen-Shannon divergence to identify significant data drifts. Initial characterization offers a global analysis of changes in data distribution and predictive feature significance over time. Semantic characterization uses natural language processing–inspired techniques to understand the local context of these changes, helping identify factors driving changes in patient outcomes. By integrating the outcomes from these 3 steps, our results can identify specific factors (eg, interventions and modifications in health care practices) that drive changes in patient outcomes. DIS was applied to the Brazilian COVID-19 Registry and the Medical Information Mart for Intensive Care, version IV (MIMIC-IV) data sets.

Results: Our approach allowed us to (1) identify drifts effectively, especially using metrics such as the Jensen-Shannon divergence, and (2) uncover reasons for the decline in overall mortality in both the COVID-19 and MIMIC-IV data sets, as well as changes in the cooccurrence between different diseases and this particular outcome. Factors such as vaccination during the COVID-19 pandemic and reduced iatrogenic events and cancer-related deaths in MIMIC-IV were highlighted. The methodology also pinpointed shifts in patient demographics and disease patterns, providing insights into the evolving health care landscape during the study period.

Conclusions: We developed a novel methodology combining machine learning and natural language processing techniques to detect, characterize, and understand temporal shifts in health care data. This understanding can enhance predictive algorithms, improve patient outcomes, and optimize health care resource allocation, ultimately improving the effectiveness of machine learning predictive algorithms applied to health care data. Our methodology can be applied to a variety of scenarios beyond those discussed in this paper.
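
As an illustration of the detection step, the following is a minimal sketch of how Jensen-Shannon divergence can flag a drift between two time windows of a categorical variable. It is not the authors' implementation: the helper names (distribution, kl_divergence, js_divergence), the age bands, the toy counts, and the 0.05 threshold are all assumptions made for this example, and the abstract does not state how the study bins variables or chooses a cutoff.

```python
import math
from collections import Counter


def distribution(values, categories):
    """Relative frequency of each category in `values` (hypothetical helper)."""
    counts = Counter(values)
    total = len(values)
    return [counts.get(c, 0) / total for c in categories]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    # The mixture m is positive wherever p or q is, so the KL terms are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy data (NOT from the study): patient age bands in two time windows.
categories = ["<40", "40-59", "60+"]
early_window = ["60+"] * 70 + ["40-59"] * 25 + ["<40"] * 5
late_window = ["60+"] * 40 + ["40-59"] * 40 + ["<40"] * 20

p = distribution(early_window, categories)
q = distribution(late_window, categories)
drift_score = js_divergence(p, q)

# The threshold is an arbitrary, illustrative value, not one taken from the paper.
DRIFT_THRESHOLD = 0.05
print(f"JS divergence: {drift_score:.4f}, drift flagged: {drift_score > DRIFT_THRESHOLD}")
```

With a base-2 logarithm the score is 0 for identical distributions and 1 for distributions with no overlap, which makes scores comparable across variables with different numbers of categories. Continuous variables would first need to be binned, a choice that affects the score.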

Published in JMIR Medical Informatics

ISSN: 2291-9694 (Online)
Publisher: JMIR Publications
Country of publisher: Canada
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://medinform.jmir.org
