Mathematics (May 2024)
Comparison of Feature Selection Methods—Modelling COPD Outcomes
Abstract
Selecting features associated with patient-centered outcomes is of major relevance, yet the importance assigned to each feature depends on the method used. We aimed to compare stepwise selection, the least absolute shrinkage and selection operator, random forest, Boruta, extreme gradient boosting, and generalized maximum entropy estimation, and to suggest an aggregated evaluation. We also aimed to describe outcomes in people with chronic obstructive pulmonary disease (COPD). Data from 42 patients were collected at baseline and at 5 months. Acute exacerbations were, in aggregate, the most important feature in predicting the difference in handgrip muscle strength (dHMS), and the COVID-19 lockdown group had an increased dHMS of 3.08 kg (CI95 ≈ [0.04, 6.11]). Pack-years achieved the highest importance in predicting the difference in the one-minute sit-to-stand test, and no clinical change during lockdown was detected. The Charlson comorbidity index was the most important feature in predicting the difference in the COPD assessment test (dCAT), and participants with severe values are expected to have a decreased dCAT of 6.51 points (CI95 ≈ [2.52, 10.50]). Feature selection methods yield inconsistent results; in particular, extreme gradient boosting and random forest disagree with the remaining methods. Models with features ordered by median importance had a meaningful clinical interpretation. Lockdown seems to have had a negative impact on upper-limb muscle strength.
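As an illustration of what an aggregated evaluation across methods could look like, the sketch below orders features by their median rank over several selection methods. It is a minimal example under stated assumptions, not the procedure used in the study: the function names, the choice to convert each method's scores to ranks before taking the median, and the toy scores with placeholder feature names are all hypothetical.

```python
from statistics import median


def rank_features(scores):
    """Rank features within one method; rank 1 is the most important."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {feature: position for position, feature in enumerate(ordered, start=1)}


def aggregate_by_median_rank(importances):
    """Order features by their median rank across methods (assumed aggregation rule).

    `importances` maps a method name to a dict of feature -> importance score.
    Scores are converted to ranks first because raw importances from different
    methods are not on a common scale.
    """
    per_method_ranks = [rank_features(scores) for scores in importances.values()]
    features = per_method_ranks[0].keys()
    median_rank = {f: median(r[f] for r in per_method_ranks) for f in features}
    return sorted(median_rank.items(), key=lambda item: item[1])


# Toy scores, invented for illustration only (not data from the study).
toy = {
    "lasso": {"feature_a": 0.9, "feature_b": 0.1, "feature_c": 0.4},
    "random_forest": {"feature_a": 12.0, "feature_b": 30.0, "feature_c": 5.0},
    "boruta": {"feature_a": 3.2, "feature_b": 1.1, "feature_c": 2.0},
}

for feature, rank in aggregate_by_median_rank(toy):
    print(feature, rank)
```

Taking the median of ranks rather than of raw scores keeps a single method with an unusual scale or an outlying ordering from dominating the aggregated order, which matters when methods disagree.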
Keywords