BMC Medical Research Methodology (Nov 2024)
Bayesian additive regression trees for predicting childhood asthma in the CHILD cohort study
Abstract
Background
Asthma is a heterogeneous disease that affects millions of children and adults. No objective gold-standard diagnostic test applies across all ages; instead, clinicians diagnose asthma from a cluster of signs, symptoms, and objective tests that depend on age. Yet chronic asthma symptoms carry clear morbidity. Machine learning has become a popular tool for improving asthma diagnosis and classification, but there is little literature on the use of Bayesian machine learning algorithms to predict asthma diagnosis in children. This paper develops a prediction model using Bayesian additive regression trees (BART) and compares its performance with that of other machine learning algorithms in predicting the diagnosis of childhood asthma.
Methods
Clinically relevant variables collected at or before 3 years of age from 2794 participants in the CHILD Cohort Study were used to predict physician-diagnosed asthma at age 5. BART and six other commonly used machine learning algorithms (adaptive boosting, logistic regression, decision tree, neural network, random forest, and support vector machine) were trained. Performance measures, including sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve, were calculated, with confidence intervals obtained from bootstrap samples. Important predictors and interaction effects associated with asthma were also identified using BART.
Results
BART, logistic regression, and random forest showed the highest area under the ROC curve among the algorithms compared. Based on BART, recurrent wheeze, respiratory infection, and food sensitization at 3 years of age were the most important predictors. The three most important interaction effects were those between respiratory infection and recurrent wheezing at 3 years, between maternal asthma and paternal asthma, and between maternal wheezing and the child's inhalant sensitization at 3 years.
Conclusions
BART demonstrated promising prediction performance when compared to other machine learning algorithms. Future research could validate the BART model in an external cohort to evaluate its reliability and generalizability.
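The performance measures and bootstrap confidence intervals named in the Methods can be illustrated with a minimal Python sketch. The abstract does not state which bootstrap variant, number of resamples, or software the authors used, so the percentile bootstrap, the default of 1000 resamples, and the function names (`sensitivity_specificity`, `auc`, `bootstrap_ci`) are assumptions made for illustration only. The toy labels and scores are invented and are not CHILD Cohort Study data.

```python
import random


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation:
    the fraction of (case, non-case) pairs in which the case scores higher,
    counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, scores, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(y_true, scores).

    Assumption: participants are resampled with replacement, and resamples
    that contain only one outcome class are redrawn because the metric is
    undefined for them.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:
            continue
        stats.append(metric(yt, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented toy data: 1 = asthma diagnosis, 0 = no diagnosis.
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.40, 0.55, 0.70, 0.90]

point_estimate = auc(toy_labels, toy_scores)
ci_lower, ci_upper = bootstrap_ci(toy_labels, toy_scores, auc)
print(f"AUC = {point_estimate:.3f}, 95% CI ({ci_lower:.3f}, {ci_upper:.3f})")
```

The same `bootstrap_ci` helper can wrap any metric that takes labels and predictions, for example a function returning only the sensitivity at a chosen probability threshold.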
Keywords