Explainable artificial intelligence (XAI) for exploring spatial variability of lung and bronchus cancer (LBC) mortality rates in the contiguous USA

Zia U. Ahmed; Kang Sun; Michael Shelly; Lina Mu

doi:10.1038/s41598-021-03198-8

Scientific Reports (Dec 2021)

Affiliations

Zia U. Ahmed: Research and Education in Energy, Environment and Water (RENEW) Institute, University at Buffalo
Kang Sun: Department of Civil, Structural and Environmental Engineering, University at Buffalo
Michael Shelly: Research and Education in Energy, Environment and Water (RENEW) Institute, University at Buffalo
Lina Mu: Department of Epidemiology and Environmental Health, University at Buffalo

Journal volume & issue: Vol. 11, no. 1, pp. 1–15

Abstract

Machine learning (ML) has demonstrated promise in predicting mortality; however, understanding spatial variation in risk factor contributions to mortality rate requires explainability. We applied explainable artificial intelligence (XAI) to a stack-ensemble machine learning framework to explore and visualize the spatial distribution of the contributions of known risk factors to lung and bronchus cancer (LBC) mortality rates in the conterminous United States. We used five base learners to develop the stack-ensemble models: a generalized linear model (GLM), random forest (RF), gradient boosting machine (GBM), extreme gradient boosting machine (XGBoost), and deep neural network (DNN). We then applied several model-agnostic approaches to interpret and visualize the stack-ensemble model's output at global and local (county-level) scales. The stack ensemble generally performed better than all the base learners and three spatial regression models. A permutation-based feature importance technique ranked smoking prevalence as the most important predictor, followed by poverty and elevation. However, the impact of these risk factors on LBC mortality rates varies spatially. This is the first study to use ensemble machine learning with explainable algorithms to explore and visualize the spatial heterogeneity of the relationships between LBC mortality and risk factors in the contiguous USA.
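
The permutation-based feature importance mentioned in the abstract can be illustrated with a minimal, model-agnostic sketch. This is not the authors' implementation: the toy model, its coefficients, the synthetic data, and the column labels (`smoking`, `poverty`, `noise`) are all hypothetical and chosen only to show the idea, which is that a feature matters more when shuffling its values degrades the model's predictions more.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each column of X is shuffled in turn.

    `predict` is any function mapping one row to a prediction, so the
    procedure does not depend on the kind of model being explained.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link with the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy setup: three columns standing in for
# [smoking, poverty, noise]; the model ignores the third column.
def toy_model(row):
    return 3.0 * row[0] + 1.0 * row[1]


data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [toy_model(row) for row in X]

scores = permutation_importance(toy_model, X, y)
# Column indices ordered from most to least important.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

Because the function only needs a `predict` callable, the same procedure could in principle be pointed at a fitted stack ensemble rather than the toy function used here. A column the model ignores receives an importance of zero, since shuffling it leaves every prediction unchanged.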

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/