Machine learning-based risk prediction model for pertussis in children: a multicenter retrospective study

Juan Xie; Run-wei Ma; Yu-jing Feng; Yuan Qiao; Hong-yan Zhu; Xing-ping Tao; Wen-juan Chen; Cong-yun Liu; Tan Li; Kai Liu; Li-ming Cheng

doi:10.1186/s12879-025-10797-7

BMC Infectious Diseases (Mar 2025)

Affiliations

Juan Xie: Department of Anesthesiology, Kunming Children’s Hospital
Run-wei Ma: Department of Cardiac Surgery, Fuwai Yunnan Hospital, Chinese Academy of Medical Sciences/Affiliated Cardiovascular Hospital of Kunming Medical University
Yu-jing Feng: Comprehensive Pediatrics, Wenshan Maternal and Child Health Care Hospital
Yuan Qiao: Comprehensive Pediatrics and Neonatology, Chuxiong Yi Autonomous Prefecture People’s Hospital
Hong-yan Zhu: Pediatric Respiratory Department, Qujing Maternal and Child Health Hospital
Xing-ping Tao: Department of Pediatrics, Kaiyuan People’s Hospital
Wen-juan Chen: Department of Pediatrics and Emergency, Yuxi Children’s Hospital
Cong-yun Liu: Comprehensive Pediatrics & Pulmonary and Critical Care Medicine, Baoshan People’s Hospital
Tan Li: Department of Respiratory Medicine, Kunming Children’s Hospital
Kai Liu: Comprehensive Pediatrics & Pulmonary and Critical Care Medicine, Kunming Children’s Hospital
Li-ming Cheng: Department of Anesthesiology, Kunming Children’s Hospital

DOI: https://doi.org/10.1186/s12879-025-10797-7
Journal volume & issue: Vol. 25, no. 1
pp. 1 – 14

Abstract

Background: Pertussis is a highly contagious respiratory disease. Although vaccination has reduced its incidence, cases have resurfaced in certain regions due to immune escape and waning vaccine efficacy. Promptly identifying high-risk patients is imperative to mitigate transmission and avert complications. However, current diagnostic methods, including PCR and bacterial culture, are time-consuming and expensive. Some studies have attempted to develop risk prediction models based on multivariate data, but their performance can be improved. This study therefore aims to further optimize and expand the risk assessment tool to identify high-risk individuals more efficiently and to compensate for the shortcomings of existing diagnostic methods.

Objective: The aim of this study was to develop a pertussis risk prediction model that is efficient, generalizes well, and is applicable to different datasets. The model was constructed using machine learning techniques based on multicenter data and screened for key features. The performance and generalization ability of the model were evaluated by deploying it on an online platform. The study also aims to provide a rapid and accurate auxiliary diagnostic tool for clinical practice that helps identify high-risk patients in a timely manner, optimize early intervention strategies, and reduce the risk of complications and transmission, thereby improving the efficiency of public health management.

Methods: Data from 1085 suspected pertussis patients from 7 centers were collected, and ten key features were screened using lasso regression and the Boruta algorithm: PDW-MPV-RATIO, SII, white blood cells, platelet distribution width, mean platelet volume, lymphocytes, cough duration, vaccination, fever, and lytic lymphocytes. Based on these features, eight models were trained and validated to assess their performance, and their generalization ability was confirmed with external datasets. Finally, an online platform was constructed for clinicians to use the models in real time.

Results: The random forest model demonstrated excellent discrimination ability, with an AUC of 0.98 in the validation set and 0.97 in the external validation set. Calibration curve and decision curve analysis showed that the model had high accuracy in predicting low-to-medium risk patients, which could help clinicians avoid unnecessary interventions, especially in resource-limited settings. Applying this model can help optimize the early identification and management of high-risk patients and improve clinical decision-making.

Conclusion: The pertussis prediction model devised in this study was validated using multicenter data, exhibited high prediction performance, and was successfully implemented online. Future research should broaden the data sources and incorporate dynamic data to enhance the model's accuracy and applicability.
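As a rough illustration of two quantities named in the abstract, the sketch below computes the derived blood-count features and the AUC metric in plain Python. It is not the authors' code. The abstract does not define the derived features, so two assumptions are made here: PDW-MPV-RATIO is taken to be platelet distribution width divided by mean platelet volume, and SII is taken to be the commonly used systemic immune-inflammation index (platelets × neutrophils / lymphocytes). All function names and numeric values are hypothetical.

```python
from typing import Sequence


def pdw_mpv_ratio(pdw: float, mpv: float) -> float:
    """Assumed definition: platelet distribution width / mean platelet volume."""
    return pdw / mpv


def sii(platelets: float, neutrophils: float, lymphocytes: float) -> float:
    """Assumed definition: systemic immune-inflammation index,
    platelets * neutrophils / lymphocytes."""
    return platelets * neutrophils / lymphocytes


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = pertussis) and model scores, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(labels, scores))  # 8 of the 9 pairs are ranked correctly
```

Read this way, an AUC of 0.98 means that in about 98% of case/non-case pairs the model assigns the higher risk score to the true case. The pairwise loop is quadratic in sample size and is chosen for clarity; a library implementation would normally be used on real data.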

Published in BMC Infectious Diseases

ISSN: 1471-2334 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Internal medicine: Infectious and parasitic diseases
Website: https://bmcinfectdis.biomedcentral.com