Machine-learning model for predicting depression in second-hand smokers in cross-sectional data using the Korea National Health and Nutrition Examination Survey

Na Hyun Kim; Myeongju Kim; Jong Soo Han; Hyoju Sohn; Bumjo Oh; Ji Won Lee; Sumin Ahn

doi:10.1177/20552076241257046

Digital Health (May 2024)

Affiliations

Na Hyun Kim: Health Promotion Center, , Seongnam, South Korea
Myeongju Kim: Center for Artificial Intelligence in Healthcare, Seoul National University Bundang Hospital Healthcare Innovation Park, Seongnam, South Korea
Jong Soo Han: Health Promotion Center, , Seongnam, South Korea
Hyoju Sohn: Center for Artificial Intelligence in Healthcare, Seoul National University Bundang Hospital Healthcare Innovation Park, Seongnam, South Korea
Bumjo Oh: Department of Family Medicine, SMG-SNU Boramae Medical Center, Seoul, Republic of Korea
Ji Won Lee: Department of Urology, , Seongnam, South Korea
Sumin Ahn: Department of Digital Healthcare, , Seongnam, South Korea

Journal volume & issue: Vol. 10

Abstract

Objective: Depression among non-smokers at risk of second-hand smoke (SHS) exposure has been a neglected public health concern despite their vulnerability. The objective of this study was to develop high-performance machine-learning (ML) models to predict depression in non-smokers and to identify important predictors of depression in second-hand smokers.

Methods: ML algorithms were developed using demographic and clinical data from participants in the Korea National Health and Nutrition Examination Survey (KNHANES) in 2014, 2016, and 2018 (N = 11,463). Depression was defined as a Patient Health Questionnaire total score of 10 or higher. The final model was selected according to the area under the curve (AUC) or sensitivity. Shapley additive explanations (SHAP) were used to identify influential features.

Results: The light gradient boosting machine (LGBM), which had the highest positive predictive value (PPV; 0.646), was selected as the best model among the ML algorithms, whereas the support vector machine (SVM) had the highest AUC (0.900). The most influential factor identified by the LGBM was stress perception, followed by subjective health status and quality of life. Among the smoking-related features, urine cotinine level was the most important, and no linear relationship was observed between the smoking-related features and their SHAP values.

Conclusions: Compared with previously developed ML models, our LGBM models achieved excellent, and in some cases superior, performance in predicting depression among non-smokers at risk of SHS exposure, suggesting potential targets for depression-prevention interventions for non-smokers during public health crises.
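To make the outcome definition and the selection metrics named in the abstract concrete, the following is a minimal sketch, assuming a binary label derived from a Patient Health Questionnaire total score of 10 or higher and model outputs given as predicted probabilities. The function names, the 0.5 decision threshold, and the toy scores are hypothetical illustrations; they do not reproduce the authors' pipeline, data, LGBM/SVM models, or SHAP analysis.

```python
from typing import Sequence

# Cutoff stated in the abstract: PHQ total score >= 10 counts as depression.
PHQ_CUTOFF = 10


def label_depression(phq_totals: Sequence[int], cutoff: int = PHQ_CUTOFF) -> list[int]:
    """Binary outcome: 1 if the PHQ total score is at or above the cutoff, else 0."""
    return [1 if score >= cutoff else 0 for score in phq_totals]


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) counts for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TP / (TP + FN): share of depressed participants that the model flags."""
    tp, _, fn, _ = confusion(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def ppv(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TP / (TP + FP): share of flagged participants who are actually depressed."""
    tp, fp, _, _ = confusion(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive scores above a random negative.

    Ties count as one half. This pairwise form is O(n_pos * n_neg) and is meant
    for illustration; a rank-based computation scales better on large samples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from KNHANES or the study.
    phq = [3, 12, 15, 7, 10, 2]
    y = label_depression(phq)                   # [0, 1, 1, 0, 1, 0]
    scores = [0.1, 0.8, 0.3, 0.35, 0.9, 0.2]    # hypothetical predicted probabilities
    y_hat = [1 if s >= 0.5 else 0 for s in scores]
    print(sensitivity(y, y_hat), ppv(y, y_hat), roc_auc(y, scores))
```

On the toy data, the model catches two of the three positive cases (sensitivity 2/3) without any false alarms (PPV 1.0), while one positive is ranked below one negative (AUC 8/9). This illustrates why a model chosen for PPV, as the LGBM was, need not be the one with the highest AUC.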

Published in Digital Health

ISSN: 2055-2076 (Online)
Publisher: SAGE Publishing
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://journals.sagepub.com/home/dhj
