Кардиоваскулярная терапия и профилактика (Mar 2025)

Development and validation of machine learning models predicting hospitalizations of hypertensive patients over 12 months

  • A. E. Andreychenko,
  • A. D. Ermak,
  • D. V. Gavrilov,
  • R. E. Novitsky,
  • O. M. Drapkina,
  • A. V. Gusev

DOI
https://doi.org/10.15829/1728-8800-2025-4130
Journal volume & issue
Vol. 24, no. 1

Abstract

Aim. To develop machine learning models that predict hospitalization of patients with hypertension (HTN) within 12 months, and to validate them on real-world practice data.

Material and methods. From de-identified electronic health records on the Webiomed platform, 1,165,770 records of 151,492 patients with HTN were selected. After initial selection, 43 anamnestic, constitutional, clinical, and paraclinical features were used as predictors. Automated machine learning tools were used to build the models. A wide range of algorithms was considered, including logistic regression, decision tree-based methods using gradient boosting and bagging, discriminant analysis, a neural network algorithm, and a naive Bayes classifier. Data from a single region were used for external validation.

Results. The XGBoost model performed best, achieving an area under the ROC curve (AUC) of 0.849 (95% confidence interval: 0.825-0.873) in internal testing and 0.815 (95% confidence interval: 0.797-0.835) in external validation.

Conclusion. A new, highly accurate model for predicting hospitalization of HTN patients was developed from real-world data. In external validation, the final model remained relatively robust to new data from another region, which, together with its quality metrics, supports its potential approval for use in clinical practice.
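The results above are reported as an AUC with a 95% confidence interval. The abstract does not say how the interval was computed, so the following is only a minimal sketch of one common approach, a percentile bootstrap over a rank-based AUC. The function names `roc_auc` and `bootstrap_auc_ci`, the number of resamples, and the seed are illustrative assumptions, not the authors' method.

```python
import random


def roc_auc(labels, scores):
    """Rank-based AUC: probability that a random positive case scores
    higher than a random negative case (ties count as one half)."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC requires both positive and negative cases")
    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    # Mann-Whitney U statistic of the positives, normalised to [0, 1].
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method,
    not confirmed by the source)."""
    roc_auc(labels, scores)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[k] for k in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(roc_auc(ys, [scores[k] for k in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Here `labels` would hold 1 for patients hospitalized within 12 months and 0 otherwise, and `scores` the model's predicted probabilities. Calling `bootstrap_auc_ci(labels, scores)` separately on the internal test set and the external validation set would yield intervals comparable in form to those reported.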

Keywords