Digital Health (May 2024)
Machine-learning model for predicting depression in second-hand smokers in cross-sectional data using the Korea National Health and Nutrition Examination Survey
Abstract
Objective: Depression among non-smokers at risk of second-hand smoke (SHS) exposure has been a neglected public health concern despite their vulnerability. This study aimed to develop high-performance machine-learning (ML) models for predicting depression in non-smokers and to identify important predictors of depression in second-hand smokers.

Methods: ML models were developed using demographic and clinical data from participants in the Korea National Health and Nutrition Examination Survey (KNHANES) in 2014, 2016, and 2018 (N = 11,463). Depression was identified using the Patient Health Questionnaire, with a total score of 10 or higher indicating depression. The final model was selected according to the area under the curve (AUC) or sensitivity. Shapley additive explanations (SHAP) were used to identify influential features.

Results: The light gradient boosting machine (LGBM), which had the highest positive predictive value (PPV; 0.646), was selected as the best model among the ML algorithms, whereas the support vector machine (SVM) had the highest AUC (0.900). The most influential factor identified using the LGBM was stress perception, followed by subjective health status and quality of life. Among the smoking-related features, urine cotinine level was the most important, and the smoking-related features showed no linear relationship with their SHAP values.

Conclusions: Compared with previously developed ML models, our LGBM models achieved excellent, and even superior, performance in predicting depression among non-smokers at risk of SHS exposure, suggesting potential goals for depression-preventive interventions for non-smokers during public health crises.
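The labelling rule and the evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the 0.5 decision threshold, and the six toy records are hypothetical and chosen only for illustration. The only value taken from the abstract is the questionnaire cutoff of 10. AUC is computed here with the pairwise (Mann-Whitney) formulation, which is one standard way to obtain it.

```python
from typing import List, Sequence, Tuple

PHQ_CUTOFF = 10  # total score of 10 or higher indicates depression (from the abstract)


def label_depression(phq_totals: Sequence[int]) -> List[int]:
    """Return 1 for a total score at or above the cutoff, else 0."""
    return [1 if score >= PHQ_CUTOFF else 0 for score in phq_totals]


def sensitivity_and_ppv(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); PPV = TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from KNHANES.
phq_totals = [3, 12, 9, 15, 10, 2]
model_scores = [0.10, 0.80, 0.60, 0.40, 0.90, 0.20]  # assumed predicted probabilities

y_true = label_depression(phq_totals)
y_pred = [1 if s >= 0.5 else 0 for s in model_scores]  # assumed 0.5 threshold

sens, ppv = sensitivity_and_ppv(y_true, y_pred)
print(f"sensitivity={sens:.3f} PPV={ppv:.3f} AUC={auc(y_true, model_scores):.3f}")
```

Because PPV depends on the chosen decision threshold and on how common depression is in the sample, while AUC does not depend on a threshold, the two metrics can rank models differently, as they do for the LGBM and the SVM in the Results.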