International Journal of General Medicine (Aug 2025)
A Multi-Algorithm Machine Learning Model for Predicting the Risk of Preterm Birth in Patients with Early-Onset Preeclampsia
Abstract
Yanhong Xu,1,* Yizheng Zu,2,* Ying Zhang,1,* Zewei Liang,1 Xia Xu,1,3–5 Jianying Yan1,3–5
1College of Clinical Medicine for Obstetrics & Gynecology and Pediatrics, Fujian Medical University Fujian Maternity and Child Health Hospital, Fuzhou, Fujian, People’s Republic of China; 2The Affiliated Changzhou Second People’s Hospital of Nanjing Medical University, Changzhou, Jiangsu, People’s Republic of China; 3Fujian Clinical Research Center for Maternal-Fetal Medicine, Fuzhou, Fujian, People’s Republic of China; 4Laboratory of Maternal-Fetal Medicine, Fujian Maternity and Child Health Hospital, Fuzhou, Fujian, People’s Republic of China; 5National Key Obstetric Clinical Specialty Construction Institution of China, Fuzhou, Fujian, People’s Republic of China
*These authors contributed equally to this work
Correspondence: Xia Xu, College of Clinical Medicine for Obstetrics & Gynecology and Pediatrics, Fujian Medical University Fujian Maternity and Child Health Hospital, No. 18 Daoshan Road, Gulou District, Fuzhou, Fujian, People’s Republic of China, Email [email protected]; Jianying Yan, College of Clinical Medicine for Obstetrics & Gynecology and Pediatrics, Fujian Medical University Fujian Maternity and Child Health Hospital, No. 18 Daoshan Road, Gulou District, Fuzhou, Fujian, People’s Republic of China, Email [email protected]
Purpose: To analyze the risk factors for preterm birth in patients with early-onset preeclampsia (EOPE) using multi-algorithm machine learning, and to construct a predictive model and evaluate its predictive value.
Methods: A retrospective analysis was conducted on 442 EOPE patients from a single tertiary center, divided into a preterm birth group (< 37 weeks, n=358) and a term birth group (≥ 37 weeks, n=84). Univariate analysis, random forest importance assessment, and LASSO regression combined with multivariate regression analysis were used for feature evaluation. Eight machine learning models were trained on 70% of the data and validated on the remaining 30%. A Stacking ensemble model was constructed, and SHapley Additive exPlanations (SHAP) was used for feature interpretation.
Results: The areas under the receiver operating characteristic curve (AUROC) for predicting preterm birth in EOPE patients using Logistic Regression, Gaussian Naive Bayes, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine, Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT), Multi-Layer Perceptron, and Elastic Net were 0.763, 0.712, 0.821, 0.832, 0.821, 0.842, 0.784, and 0.763, respectively. The Stacking model (XGBoost+GBDT+SVM) achieved the highest AUROC (0.865). Three independent risk factors were identified: fetal growth restriction (aOR=3.50, p = 0.047), serum cystatin C (aOR=11.27, p = 0.018), and C-reactive protein (aOR=1.37, p < 0.001). SHAP analysis showed that GBDT was the top contributor to the Stacking predictions, with microalbuminuria (GBDT, XGBoost) and age (SVM) being the most influential features.
Conclusion: Machine learning models can serve as reliable assessment tools for predicting the risk of preterm birth in patients with EOPE. The ensemble prediction model demonstrated the best predictive performance and may help obstetricians identify high-risk patients and intervene early to improve perinatal outcomes.
Keywords: machine learning, preterm birth, early-onset preeclampsia, clinical prediction model