The Lancet Regional Health. Americas (Nov 2021)

Data-driven risk stratification for preterm birth in Brazil: a population-based study to develop of a machine learning risk assessment approach

  • Thiago Augusto Hernandes Rocha,
  • Erika Bárbara Abreu Fonseca de Thomaz,
  • Dante Grapiuna de Almeida,
  • Núbia Cristina da Silva,
  • Rejane Christine de Sousa Queiroz,
  • Luciano Andrade,
  • Luiz Augusto Facchini,
  • Marcos Luiggi Lemos Sartori,
  • Dalton Breno Costa,
  • Marcos Adriano Garcia Campos,
  • Antônio Augusto Moura da Silva,
  • Catherine Staton,
  • João Ricardo Nickenig Vissoci

Journal volume & issue
Vol. 3
p. 100053

Abstract

Background: Preterm birth (PTB) is a growing health issue worldwide and is currently considered the leading cause of newborn deaths. To address this challenge, the present work aims to develop an algorithm capable of accurately predicting the week of delivery, thereby supporting the identification of PTB in Brazil.
Methods: This is a population-based study analyzing data from 3,876,666 mothers with live births distributed across 3,929 Brazilian municipalities. Using indicators comprising delivery characteristics, primary care work processes, physical infrastructure, and sociodemographic data, we applied a machine learning-based approach to estimate the week of delivery at the point-of-care level. We tested six algorithms: eXtreme Gradient Boosting, Elastic Net, Quantile Ordinal Regression - LASSO, Linear Regression, Ridge Regression, and Decision Tree. We used the root-mean-square error (RMSE) as the measure of precision.
Findings: All models obtained similar RMSE values. The lowest RMSE was obtained with the eXtreme Gradient Boosting approach, which estimated the week of delivery within a window of 2.09 weeks (95% CI 2.090–2.097). The five most important variables for predicting the week of delivery were: number of previous deliveries by Cesarean section, number of prenatal consultations, age of the mother, availability of ultrasound exams in the care network, and proportion of primary care teams in the municipality registering oral care consultations.
Interpretation: Using simple data describing the prenatal care offered, as well as minimal characteristics of the pregnant woman, our approach achieved a relevant predictive performance regarding the week of delivery.
Funding: Bill and Melinda Gates Foundation and the National Council for Scientific and Technological Development, Brazil (Conselho Nacional de Desenvolvimento Científico e Tecnológico, CNPq, its acronym in Portuguese), in support of the research project "Data-Driven Risk Stratification for Preterm Birth in Brazil: Development of a Machine Learning-Based Innovation for Health Care" (Grant OPP1202186).
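
As a reading aid for the evaluation metric named in the Methods, the following minimal Python sketch shows how an RMSE on the predicted week of delivery could be computed, together with a percentile bootstrap interval. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say how the 95% CI was obtained, so the bootstrap is an assumed choice, and the function names (`rmse`, `bootstrap_ci`, `is_preterm`) are hypothetical. The 37-week cutoff is the standard definition of preterm birth and is not stated in the abstract itself.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted weeks of delivery."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the RMSE.

    Assumption: the abstract does not describe how its interval was
    computed; resampling (observed, predicted) pairs is one common option.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


def is_preterm(week, threshold=37):
    """Flag a delivery before 37 completed weeks (standard preterm cutoff)."""
    return week < threshold


if __name__ == "__main__":
    # Toy values for illustration only; they are not data from the study.
    observed = [39, 38, 40, 36, 41, 37]
    predicted = [38, 38, 42, 37, 40, 38]
    print("RMSE (weeks):", rmse(observed, predicted))
    print("95% bootstrap interval:", bootstrap_ci(observed, predicted))
    print("Predicted preterm flags:", [is_preterm(w) for w in predicted])
```

Under this reading, an RMSE of about 2 weeks means predicted and observed delivery weeks typically differ by roughly that amount, which is why a predicted week near the 37-week cutoff should be interpreted with caution when flagging a likely PTB.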

Keywords