Informatics in Medicine Unlocked (Jan 2019)
Investigating autism etiology and heterogeneity by decision tree algorithm
Abstract
Autism spectrum disorder (ASD) is a neurodevelopmental disorder that causes deficits in cognition, communication and social skills. ASD, however, is highly heterogeneous. This heterogeneity has made identifying the etiology of ASD particularly difficult, as patients exhibit a wide spectrum of symptoms without any unifying genetic or environmental factor to account for the disorder. To better understand ASD, it is paramount to identify potential genetic and environmental risk factors that co-occur with it. Identifying such factors is important both for determining potential causes of the disorder and for understanding its heterogeneity. Existing large-scale datasets offer computer scientists an opportunity to undertake this task, using machine learning to obtain insight into potential ASD risk factors reliably and efficiently, which would in turn help guide research in the field. In this study, decision tree algorithms were used to analyze related factors in datasets obtained from the National Database for Autism Research (NDAR), comprising nearly 3000 individuals. We identified 15 medical conditions that were highly associated with ASD diagnoses in patients; furthermore, we extended our analysis to the patients' family medical history and report six potentially hereditary medical conditions associated with ASD. The reported associations had an accuracy of 90%. Meanwhile, gender comparisons highlighted conditions that were unique to each gender as well as conditions that overlapped. These findings were validated against the academic literature, opening new directions for the use of decision tree algorithms to further understand the etiology of autism.
Keywords: Autism spectrum disorder, Decision tree, Feature selection
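The abstract does not state which tree-induction algorithm or splitting criterion was used. As a hedged illustration of how a decision tree can serve as a feature selector, the sketch below ranks binary medical-condition features by information gain, the entropy-based split criterion of ID3-style trees. It is not the authors' method: the feature names `condition_a` and `condition_b` and the four toy records are hypothetical, not NDAR data, and the label encoding (1 = ASD diagnosis, 0 = none) is an assumption.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy reduction obtained by splitting `labels` on the values of `feature`."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        # Labels of the records that fall into this branch of the split.
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def rank_features(records, labels):
    """Return (feature_name, gain) pairs sorted from most to least informative."""
    names = records[0].keys()
    gains = {
        name: information_gain([r[name] for r in records], labels)
        for name in names
    }
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: each record flags the presence (1) or absence (0)
# of a medical condition; labels mark an ASD diagnosis (1) or not (0).
records = [
    {"condition_a": 1, "condition_b": 0},
    {"condition_a": 1, "condition_b": 1},
    {"condition_a": 0, "condition_b": 1},
    {"condition_a": 0, "condition_b": 0},
]
labels = [1, 1, 0, 0]

ranking = rank_features(records, labels)
print(ranking)
```

In this toy set `condition_a` separates the two classes perfectly (gain of 1 bit) while `condition_b` carries no information (gain of 0), so a tree built greedily on this criterion would split on `condition_a` first. That ordering of splits is what lets a decision tree double as a ranking of candidate associated conditions.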