Frontiers in Nutrition (May 2024)
Identification of cachexia in lung cancer patients with an ensemble learning approach
Abstract
Objective: Nutritional intervention prior to the occurrence of cachexia will significantly improve the survival rate of lung cancer patients. This study aimed to establish an ensemble learning model, based on anthropometry and blood indicators and without information on body weight loss, to identify the risk factors of cachexia so that nutritional support can be administered early and the occurrence of cachexia in lung cancer patients can be prevented.

Methods: This multicenter study included 4,712 lung cancer patients. The least absolute shrinkage and selection operator (LASSO) method was used to select the key indexes. Weight loss information was excluded from the candidate features. The study data were randomly divided into a training set (70%) and a test set (30%). The training set was used to select the optimal model among 18 machine learning models and to verify model performance. Each of the 18 models was evaluated for predicting the occurrence of cachexia, and performance was measured using area under the curve (AUC), accuracy, precision, recall, F1 score, and Matthews correlation coefficient (MCC).

Results: Among 4,712 patients, 1,392 (29.5%) were diagnosed with cachexia based on the framework of Fearon et al. A 17-variable gradient boosting classifier (GBC) model, including body mass index (BMI), feeding situation, tumor stage, neutrophil-to-lymphocyte ratio (NLR), and some gastrointestinal symptoms, was selected from the 18 machine learning models. The GBC model showed good performance in predicting cachexia in the training set (AUC = 0.854, accuracy = 0.819, precision = 0.771, recall = 0.574, F1 score = 0.658, MCC = 0.549, and kappa = 0.538). These values were confirmed in the test set (AUC = 0.859, accuracy = 0.818, precision = 0.801, recall = 0.550, F1 score = 0.652, MCC = 0.552, and kappa = 0.535). The learning curve, decision boundary, precision-recall (PR) curve, receiver operating characteristic (ROC) curve, classification report, and confusion matrix for the test set demonstrated good performance. The feature importance diagram showed the contribution of each feature to the model.

Conclusions: The GBC model established in this study could facilitate the identification of cancer cachexia in lung cancer patients without weight loss information, which would guide early implementation of nutritional interventions to decrease the occurrence of cachexia and improve overall survival (OS).
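As an illustration of how the threshold-based metrics reported above relate to one another, the following minimal Python sketch computes accuracy, precision, recall, F1 score, MCC, and kappa from a binary confusion matrix. It is not the authors' code: the function names and the example counts are hypothetical, and "kappa" is assumed here to mean Cohen's kappa, which the abstract does not state explicitly. AUC is omitted because it requires predicted probabilities rather than counts.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives (positive class = cachexia).
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)

    # Matthews correlation coefficient: uses all four cells of the matrix.
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    # Cohen's kappa (assumed definition): agreement beyond chance.
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
    }


def f1_from_precision_recall(precision, recall):
    """F1 score is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only; not taken from the study.
example = binary_metrics(tp=40, fp=10, fn=30, tn=120)
for name, value in example.items():
    print(f"{name}: {value:.3f}")

# Consistency check against the precision and recall reported in the abstract.
print(round(f1_from_precision_recall(0.771, 0.574), 3))  # training set -> 0.658
print(round(f1_from_precision_recall(0.801, 0.550), 3))  # test set -> 0.652
```

The last two lines reproduce the reported F1 scores from the reported precision and recall, which shows that those three values are internally consistent. The combination of high precision and moderate recall also explains why accuracy (about 0.82) is noticeably higher than F1 (about 0.65) in a cohort where 29.5% of patients have cachexia.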
Keywords