Machine Learning–Based Prediction for Incident Hypertension Based on Regular Health Checkup Data: Derivation and Validation in 2 Independent Nationwide Cohorts in South Korea and Japan

Seung Ha Hwang; Hayeon Lee; Jun Hyuk Lee; Myeongcheol Lee; Ai Koyanagi; Lee Smith; Sang Youl Rhee; Dong Keon Yon; Jinseok Lee

doi:10.2196/52794

Journal of Medical Internet Research (Nov 2024)

DOI: https://doi.org/10.2196/52794
Journal volume & issue: Vol. 26, p. e52794

Abstract

Background: Worldwide, cardiovascular diseases are the primary cause of death, with hypertension as a key contributor. In 2019, cardiovascular diseases led to 17.9 million deaths, a figure predicted to reach 23 million by 2030.

Objective: This study presents a new method to predict hypertension from demographic data, using 6 machine learning models for enhanced reliability and applicability. The goal is to harness artificial intelligence for early and accurate hypertension diagnosis across diverse populations.

Methods: Data from 2 national cohort studies were used. The National Health Insurance Service-National Sample Cohort (South Korea, n=244,814), conducted between 2002 and 2013, was used to train and test machine learning models designed to predict incident hypertension within 5 years of a health checkup in adults aged ≥20 years, and the Japanese Medical Data Center cohort (Japan, n=1,296,649) was used for extra validation. An ensemble of 6 diverse machine learning models was used to identify the 5 most salient features contributing to hypertension, with a feature importance analysis confirming the contribution of each feature.

Results: The Adaptive Boosting and logistic regression ensemble showed superior performance (balanced accuracy 0.812, sensitivity 0.806, specificity 0.818, and area under the receiver operating characteristic curve 0.901). The 5 key hypertension indicators were age, diastolic blood pressure, BMI, systolic blood pressure, and fasting blood glucose. The Japanese Medical Data Center cohort dataset (extra validation set) corroborated these findings (balanced accuracy 0.741 and area under the receiver operating characteristic curve 0.824). The ensemble model was integrated into a public web portal for predicting hypertension onset based on health checkup data.

Conclusions: Comparative evaluation of our machine learning models against classical statistical models across 2 distinct studies emphasized the former's enhanced stability, generalizability, and reproducibility in predicting hypertension onset.
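For readers unfamiliar with the headline metric, balanced accuracy is conventionally defined as the mean of sensitivity and specificity. The short Python sketch below applies that standard definition to the sensitivity and specificity reported in the Results and recovers the reported balanced accuracy. The function names and the confusion-matrix counts are hypothetical, chosen only for illustration; they are not taken from the study or its code.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    return (sensitivity + specificity) / 2


def rates_from_counts(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true cases that were detected
    specificity = tn / (tn + fp)  # share of non-cases correctly ruled out
    return sensitivity, specificity


# Sensitivity and specificity as reported in the abstract (derivation cohort).
reported = balanced_accuracy(sensitivity=0.806, specificity=0.818)
print(round(reported, 3))  # 0.812, matching the reported balanced accuracy

# Hypothetical counts, for illustration only (not data from the study).
sens, spec = rates_from_counts(tp=80, fn=20, tn=90, fp=10)
print(round(balanced_accuracy(sens, spec), 3))  # 0.85
```

Because the metric weights both classes equally, it is less flattering than plain accuracy when incident cases are much rarer than non-cases, which is the usual situation in screening data.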

Published in Journal of Medical Internet Research

ISSN: 1438-8871 (Online)
Publisher: JMIR Publications
Country of publisher: Canada
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Medicine: Public aspects of medicine
Website: https://www.jmir.org

About the journal