Pitfalls in Developing Machine Learning Models for Predicting Cardiovascular Diseases: Challenge and Solutions

Yu-Qing Cai; Da-Xin Gong; Li-Ying Tang; Yue Cai; Hui-Jun Li; Tian-Ci Jing; Mengchun Gong; Wei Hu; Zhen-Wei Zhang; Xingang Zhang; Guang-Wei Zhang

doi:10.2196/47645

Journal of Medical Internet Research (Jul 2024)

Journal volume & issue: Vol. 26
p. e47645

Abstract

In recent years, artificial intelligence (AI) has developed explosively and has been widely applied in health care. Machine learning models, a typical AI technology, show great potential for predicting cardiovascular diseases: trained and optimized on large amounts of medical data, they are expected to play a crucial role in reducing the incidence and mortality rates of cardiovascular diseases. Although the field has become a research hot spot, it still contains many pitfalls that researchers need to watch for closely. These pitfalls can degrade the predictive performance, credibility, reliability, and reproducibility of the models studied, ultimately reducing the value of the research and undermining the prospects for clinical application. Identifying and avoiding them is therefore a crucial task before a study is carried out, yet a comprehensive summary of this topic is currently lacking. This viewpoint analyzes existing problems in data quality, data set characteristics, model design, statistical methods, and clinical implications, and proposes possible solutions, such as gathering objective data, improving training, repeating measurements, increasing sample size, preventing overfitting with statistical methods, using specific AI algorithms to address targeted issues, standardizing outcomes and evaluation criteria, and enhancing fairness and replicability. The goal is to offer reference and assistance to researchers, algorithm developers, policy makers, and clinical practitioners.
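The abstract lists preventing overfitting with statistical methods and enhancing replicability among its proposed solutions but does not say which method it has in mind. One common choice, assumed here rather than taken from the viewpoint, is stratified k-fold cross-validation with a fixed random seed. Each fold keeps roughly the same outcome prevalence as the full data set (useful when cardiovascular events are rare), every sample is scored exactly once by a model that did not see it during training, and reusing the seed reproduces the same split. This is a minimal sketch using only the Python standard library; the function name `stratified_kfold` and its defaults are hypothetical.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs for stratified k-fold cross-validation.

    Hypothetical helper for illustration; not a method specified by the viewpoint.
    Each test fold keeps roughly the outcome prevalence of the full data set,
    and the fixed seed makes the split reproducible across runs.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)  # private RNG so the global random state is untouched

    # Group sample indices by outcome label (e.g., 1 = CVD event, 0 = no event).
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)

    # Shuffle within each class, then deal indices round-robin into k folds.
    folds = [[] for _ in range(k)]
    for label in sorted(by_label):
        idx = by_label[label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)

    # Each fold serves once as the held-out test set; the rest form the training set.
    all_idx = set(range(len(labels)))
    for test in folds:
        train = sorted(all_idx - set(test))
        yield train, sorted(test)


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 90  # hypothetical cohort with a 10% event rate
    for train, test in stratified_kfold(labels, k=5):
        events = sum(labels[i] for i in test)
        print(len(train), len(test), events)
```

With this hypothetical cohort of 100 patients, each of the five folds has 80 training and 20 test indices, and each test fold contains exactly 2 events. In practice a maintained routine such as scikit-learn's `StratifiedKFold` (with `shuffle=True` and a fixed `random_state`) would usually be preferred; the hand-rolled version only makes the mechanics visible.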

ISSN: 1438-8871 (Online)
Publisher: JMIR Publications
Country of publisher: Canada
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Medicine: Public aspects of medicine
Website: https://www.jmir.org

About the journal