A Prediction Model of Autism Spectrum Diagnosis from Well-Baby Electronic Data Using Machine Learning

Ayelet Ben-Sasson; Joshua Guedalia; Liat Nativ; Keren Ilan; Meirav Shaham; Lidia V. Gabis

doi:10.3390/children11040429

Children (Apr 2024)

Affiliations

Ayelet Ben-Sasson: Department of Occupational Therapy, Faculty of Social Welfare and Health Sciences, University of Haifa, Haifa 3498838, Israel
Joshua Guedalia: Department of Occupational Therapy, Faculty of Social Welfare and Health Sciences, University of Haifa, Haifa 3498838, Israel
Liat Nativ: Department of Occupational Therapy, Faculty of Social Welfare and Health Sciences, University of Haifa, Haifa 3498838, Israel
Keren Ilan: Department of Occupational Therapy, Faculty of Social Welfare and Health Sciences, University of Haifa, Haifa 3498838, Israel
Meirav Shaham: Department of Occupational Therapy, Faculty of Social Welfare and Health Sciences, University of Haifa, Haifa 3498838, Israel
Lidia V. Gabis: Maccabi Healthcare Services, Tel-Aviv 6812509, Israel

Journal volume & issue: Vol. 11, no. 4, p. 429

Abstract

Early detection of autism spectrum disorder (ASD) is crucial for timely intervention, yet diagnosis typically occurs after age three. This study aimed to develop a machine learning model to predict ASD diagnosis using infants’ electronic health records obtained through a national screening program and to evaluate its accuracy. A retrospective cohort study analyzed health records of 780,610 children, including 1163 with ASD diagnoses. Data encompassed birth parameters, growth metrics, developmental milestones, and familial and post-natal variables from routine wellness visits within the first two years. Using a gradient boosting model with 3-fold cross-validation, 100 parameters predicted ASD diagnosis with an average area under the ROC curve of 0.86 (SD < 0.002). Feature importance was quantified using the SHapley Additive exPlanations (SHAP) tool. The model identified a high-risk group with a 4.3-fold higher ASD incidence (0.006) compared to the cohort (0.001). Key predictors included failing six milestones in language, social, and fine motor domains during the second year, male gender, parental developmental concerns, non-nursing, older maternal age, lower gestational age, and atypical growth percentiles. Machine learning algorithms capitalizing on preventative care electronic health records can facilitate ASD screening by considering complex relations between familial and birth factors, post-natal growth, developmental parameters, and parent concern.
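The rounded incidence figures in the abstract (0.006 and 0.001) would by themselves suggest a ratio of 6 rather than 4.3. The short sketch below recomputes the cohort incidence from the reported counts and shows that the figures are consistent before rounding. It assumes, which the abstract does not state, that the 4.3-fold ratio was calculated from unrounded incidences; the variable names are illustrative only.

```python
# Figures quoted in the abstract.
n_children = 780_610   # children in the cohort
n_asd = 1_163          # children with an ASD diagnosis
fold_increase = 4.3    # reported ratio of high-risk incidence to cohort incidence

# Cohort incidence before rounding.
cohort_incidence = n_asd / n_children

# Implied high-risk incidence, ASSUMING the 4.3-fold ratio was
# computed from unrounded values (not stated in the abstract).
high_risk_incidence = fold_increase * cohort_incidence

print(f"cohort incidence:    {cohort_incidence:.5f}")
print(f"high-risk incidence: {high_risk_incidence:.5f}")
print(f"rounded to 3 d.p.:   {cohort_incidence:.3f} and {high_risk_incidence:.3f}")
```

Under that assumption, both values round to the three-decimal figures quoted in the abstract.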

Published in Children

ISSN: 2227-9067 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Medicine: Pediatrics
Website: http://www.mdpi.com/journal/children