Neuropsychiatric Disease and Treatment (May 2025)
Machine Learning Based Early Diagnosis of ADHD with SHAP Value Interpretation: A Retrospective Observational Study
Xinyu Zhang,1,* Xue Xiao,1,* Yufan Luo,1 Wei Xiao,1 Yingsi Cao,1 Yuanjin Chang,1 Dongqin Wu,1 Hua Xu,1 Jinlin Zhao,1 Xianhui Deng,2 Yuanying Jiang,3 Ruijin Xie,1,4 Yueying Liu1

1 Department of Pediatrics, Affiliated Hospital of Jiangnan University, Wuxi, People’s Republic of China
2 Department of Neonatology, Jiangyin People’s Hospital of Nantong University, Wuxi, People’s Republic of China
3 Linping Campus, The Second Affiliated Hospital of Zhejiang University School of Medicine, Hangzhou, People’s Republic of China
4 Yangzhou Polytechnic College, Yangzhou, People’s Republic of China

* These authors contributed equally to this work

Correspondence: Ruijin Xie; Yueying Liu, Department of Pediatrics, Affiliated Hospital of Jiangnan University, Wuxi, People’s Republic of China, Email [email protected]; [email protected]

Abstract

Attention-Deficit/Hyperactivity Disorder (ADHD) is a common neurodevelopmental disorder in children, characterized by inattention, hyperactivity, and impulsivity. Current diagnostic methods for ADHD rely primarily on behavioral assessments, which can be challenging due to symptom overlap with other psychiatric disorders and significant inter-individual variability. Developing early diagnostic methods for ADHD is imperative to mitigate the risk of misdiagnosis and enhance the evaluation of treatment efficacy.

Methods: The study was conducted at the Department of Pediatrics, Affiliated Hospital of Jiangnan University, from November 2022 to January 2024. Clinical data, including complete blood count, liver and kidney function tests, blood glucose levels, serum electrolyte tests, and serum 25-dihydroxyvitamin D3 levels, were collected. Feature selection and model construction were performed using various machine learning algorithms.

Results: Our results indicated that the Gradient Boosting Machine (GBM) algorithm was the optimal model.

Conclusion: Our machine learning analyses suggest that the GBM model may be the optimal choice, highlighting blood beta-2 microglobulin levels, red blood cell distribution width, 25-dihydroxyvitamin D3, and the percentage of eosinophils as key predictors of ADHD risk, thereby aiding early diagnosis. Further large-scale studies are warranted to validate these findings and explore the underlying mechanisms.

Keywords: ADHD, diagnosis, biomarkers, machine learning, SHAP methods
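
To illustrate the idea behind SHAP value interpretation, the following minimal sketch computes exact Shapley values for a toy risk score over the four predictors named above. Everything in it is an assumption made for illustration: the feature names, the `toy_risk_score` function and its coefficients, and the example values are hypothetical and are not the study's fitted GBM or its data. SHAP tooling typically estimates these attributions against a background dataset, whereas this sketch uses a single baseline point and brute-force enumeration, which is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names, loosely following the predictors listed in the abstract.
FEATURES = ["beta2_microglobulin", "rdw", "vitamin_d3", "eosinophil_pct"]


def toy_risk_score(x):
    """Hypothetical score with one interaction term; NOT the study's model."""
    return (
        0.4 * x["beta2_microglobulin"]
        + 0.3 * x["rdw"]
        - 0.2 * x["vitamin_d3"]
        + 0.1 * x["eosinophil_pct"]
        + 0.05 * x["rdw"] * x["eosinophil_pct"]
    )


def shapley_values(model, instance, baseline):
    """Exact Shapley attribution of model(instance) - model(baseline).

    Features in a coalition take the instance's value; all other
    features are held at the baseline value.
    """
    n = len(FEATURES)

    def value(coalition):
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(x)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):
            # Classic Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical standardized values for one child and an all-zero baseline.
example_instance = {
    "beta2_microglobulin": 1.0,
    "rdw": 2.0,
    "vitamin_d3": -1.0,
    "eosinophil_pct": 1.0,
}
example_baseline = {f: 0.0 for f in FEATURES}

if __name__ == "__main__":
    attributions = shapley_values(toy_risk_score, example_instance, example_baseline)
    for name, contribution in attributions.items():
        print(f"{name}: {contribution:+.3f}")
```

The attributions sum to the difference between the score for the example instance and the score at the baseline, which is the property that makes per-feature contributions readable as shares of an individual prediction. The interaction term is split equally between the two features involved.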