GIScience & Remote Sensing (Dec 2024)
An explainable AI framework for spatiotemporal risk factor analysis in public health: a case study of cardiovascular mortality in South Korea
Abstract
Analyzing environmental disease risk factors at the district level is essential for understanding regional variations in disease, and it offers a broader perspective than individual-level studies. Recently, explainable artificial intelligence (XAI) has received increasing attention in the analysis of factors affecting public health. However, previous purely data-driven XAI-based risk factor analyses struggled to capture the regional effects of environmental variables, leading to confusion regarding key spatiotemporal risk factors. Therefore, this study proposes a framework comprising two complementary XAI-based risk factor analyses that rest on two assumptions. First, environmental variables must be regionally rescaled to account for the unequal effects of environmental factors across regions, which are likely influenced by variations in adaptation capacity to weather conditions and differences in exposure-response relationships to air pollutants. Second, the district-level disease distribution highlights geographic disparity in sociodemographic vulnerability, whereas temporal variation in disease within each district underscores temporal environmental impacts. Based on these two assumptions, we rescaled environmental variables and analyzed them under two complementary schemes: one that uses the district-level disease distribution as the target variable, and another that uses the temporal residual of the disease within each district. We evaluated this framework by analyzing the association between the cardiovascular age-standardized mortality rate (CVD-ASMR) and various risk factors in South Korea from 2010 to 2019, using high-performing random forest and light gradient boosting models with Shapley additive explanations (SHAP). Compared to previous purely data-driven XAI-based analyses, the proposed schemes achieved significantly better results in capturing regional exposure-response relationships.
Across the two complementary schemes, the factor that best explained high CVD-ASMR at the district level was low education level, an indicator of sociodemographic vulnerability, whereas the factors that best explained high temporal CVD-ASMR patterns were low greenness and high air pollution levels. In addition, the two complementary schemes enabled a reasonable analysis of the interaction effect between two risk factors, namely temperature and air pollutants. Furthermore, high CVD-ASMR and high temporal variation in CVD-ASMR were observed where high sociodemographic vulnerability coincided with poor air quality. These findings provide insights for public health planning for sustainable cities and societies by pinpointing high-risk areas and tailoring strategies to address regional environmental challenges.
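The two target variables described above can be illustrated with a minimal sketch. It assumes that the district-level target is each district's mean rate over the study period and that the temporal residual is the yearly rate minus that mean. It also assumes, purely for illustration, that regional rescaling is a within-district z-score. The abstract does not specify these definitions, and the function names, district labels, and numbers below are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def split_targets(records):
    """Split (district, year, rate) records into two targets.

    Returns (district_mean, residual):
      district_mean[district]    -> mean rate over all years (assumed spatial target)
      residual[(district, year)] -> rate minus that district's mean (assumed temporal target)
    """
    by_district = defaultdict(list)
    for district, _year, rate in records:
        by_district[district].append(rate)
    district_mean = {d: mean(rates) for d, rates in by_district.items()}
    residual = {(d, y): r - district_mean[d] for d, y, r in records}
    return district_mean, residual


def rescale_within_district(values):
    """Z-score an environmental variable within each district.

    `values` maps (district, year) -> raw value. The z-score is an
    illustrative assumption, not the rescaling used in the study.
    """
    by_district = defaultdict(list)
    for (district, _year), value in values.items():
        by_district[district].append(value)
    stats = {d: (mean(v), pstdev(v)) for d, v in by_district.items()}
    rescaled = {}
    for (district, year), value in values.items():
        mu, sd = stats[district]
        # A district with no temporal variation gets 0.0 rather than a division by zero.
        rescaled[(district, year)] = 0.0 if sd == 0 else (value - mu) / sd
    return rescaled


# Hypothetical toy data: two districts, two years.
records = [("A", 2010, 50.0), ("A", 2011, 54.0),
           ("B", 2010, 30.0), ("B", 2011, 34.0)]
pm = {("A", 2010): 20.0, ("A", 2011): 30.0,
      ("B", 2010): 40.0, ("B", 2011): 60.0}

district_mean, residual = split_targets(records)
pm_rescaled = rescale_within_district(pm)
```

In this toy example, districts A and B differ in their mean rates, yet their temporal residuals and rescaled pollutant values are identical. This shows how a residual target separates within-district temporal variation from between-district level differences.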
Keywords