Healthcare Analytics (Nov 2022)
The determinants of health assessment in the United States: A supervised learning approach
Abstract
In this article, we exploit a large dataset of surveys to answer a simple question: which factors drive good (or bad) health? Using a set of 14 very diverse predictors (both socioeconomic and physiological), we perform a series of supervised learning tasks to determine which variables best explain the self-assessment of health conditions. Our predictive models range from simple regressions to tabular neural networks and include random forests. All of them allow for some degree of interpretability, directly or indirectly, through either the feature importance or the conditional permutation influence of the trained models. Our results indicate that two indicators in particular emerge as strong determinants of physical well-being: income and exercise. The body mass index is the third main driver, though its role is less prominent. Importantly, for reproducibility, the dataset used in the study is openly accessible.
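Part of the interpretability described above comes from permutation-based influence measures. The sketch below illustrates only the basic idea. It makes several assumptions that are not taken from the study: it uses plain (unconditional) permutation importance, not the conditional variant; it fits an ordinary least-squares model; and it runs on purely synthetic data with three generic features, not the survey data. For each predictor, it shuffles that column and records how much the mean squared error increases. A larger increase indicates a more influential predictor.

```python
import random
import statistics


def fit_ols(X, y):
    """Least-squares fit with intercept by solving the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        for k in range(p):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]  # [intercept, b_1, ..., b_d]


def predict(beta, X):
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]


def mse(y, yhat):
    return statistics.fmean((a - b) ** 2 for a, b in zip(y, yhat))


def permutation_importance(beta, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled (unconditional version)."""
    rng = random.Random(seed)
    base = mse(y, predict(beta, X))
    scores = []
    for j in range(len(X[0])):
        deltas = []
        for _ in range(n_repeats):
            col = [r[j] for r in X]
            rng.shuffle(col)  # break the link between feature j and the target
            Xp = [r[:j] + [c] + r[j + 1:] for r, c in zip(X, col)]
            deltas.append(mse(y, predict(beta, Xp)) - base)
        scores.append(statistics.fmean(deltas))
    return scores


# Synthetic example: feature 0 is strong, feature 1 is weak, feature 2 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
y = [3.0 * r[0] + 1.0 * r[1] + data_rng.gauss(0, 0.5) for r in X]

beta = fit_ols(X, y)
scores = permutation_importance(beta, X, y)
for name, s in zip(["feature_0", "feature_1", "feature_2"], scores):
    print(f"{name}: {s:.3f}")
```

For standardized features, shuffling a column with coefficient b raises the MSE by roughly 2b², so the ranking recovers the strong, weak, and noise features in that order. The conditional variant mentioned in the abstract instead permutes each feature within groups of correlated predictors. This matters when socioeconomic and physiological variables are correlated with each other.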