Frontiers in Earth Science (Mar 2023)
Regression analysis of air pollution and pediatric respiratory diseases based on interpretable machine learning
Abstract
Air pollution is highly relevant to human health. In this study, multiple machine-learning (ML) models, namely linear regression, random forest (RF), AdaBoost, and neural networks (NNs), were used to explore the potential impacts of air-pollutant concentrations on the incidence of pediatric respiratory diseases in Taizhou, China. Several explainable artificial intelligence (XAI) methods were then applied to analyze the model outputs and quantify feature importance. Our results demonstrate significant seasonal variations in both the numbers of pediatric respiratory outpatients and the concentrations of air pollutants. The concentrations of NO2, CO, and particulate matter (PM10 and PM2.5), as well as the numbers of outpatients, reach their peak values in winter. This indicates that air pollution is a major factor in pediatric respiratory diseases. The regression results show that ML methods can capture the trends and turning points of clinic visits, and that the non-linear models were superior to the linear ones. Among them, the RF model performed best. XAI analysis of the RF model found that the air quality index (AQI), O3, PM10, and the current month are the most important predictors of the number of pediatric respiratory outpatients. The number of outpatients rises with increasing AQI, especially with increasing concentrations of particulate matter. Our study indicates that ML models combined with XAI methods are promising for revealing the underlying impacts of air pollution on pediatric respiratory diseases, which can in turn support health-related decision-making.
Keywords