Comparison of Feature Selection Methods—Modelling COPD Outcomes

Jorge Cabral; Pedro Macedo; Alda Marques; Vera Afreixo

doi:10.3390/math12091398

Mathematics (May 2024)

Affiliations

Jorge Cabral: Center for Research and Development in Mathematics and Applications (CIDMA), Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal
Pedro Macedo: Center for Research and Development in Mathematics and Applications (CIDMA), Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal
Alda Marques: Respiratory Research and Rehabilitation Laboratory (Lab3R), School of Health Sciences (ESSUA) and Institute of Biomedicine (iBiMED), University of Aveiro, 3810-193 Aveiro, Portugal
Vera Afreixo: Center for Research and Development in Mathematics and Applications (CIDMA), Department of Mathematics, University of Aveiro, 3810-193 Aveiro, Portugal

DOI: https://doi.org/10.3390/math12091398
Journal volume & issue: Vol. 12, no. 9, p. 1398

Abstract

Selecting features associated with patient-centered outcomes is of major relevance, yet the importance assigned to each feature depends on the method used. We aimed to compare stepwise selection, the least absolute shrinkage and selection operator (LASSO), random forest, Boruta, extreme gradient boosting and generalized maximum entropy estimation, and to suggest an aggregated evaluation. We also aimed to describe outcomes in people with chronic obstructive pulmonary disease (COPD). Data from 42 patients were collected at baseline and at 5 months. Acute exacerbations were, in aggregate, the most important feature in predicting the difference in handgrip muscle strength (dHMS), and the COVID-19 lockdown group had an increased dHMS of 3.08 kg (CI95 ≈ [0.04, 6.11]). Pack-years achieved the highest importance in predicting the difference in the one-minute sit-to-stand test, and no clinical change during lockdown was detected. The Charlson comorbidity index was the most important feature in predicting the difference in the COPD assessment test (dCAT), and participants with severe values are expected to have a decreased dCAT of 6.51 points (CI95 ≈ [2.52, 10.50]). Feature selection methods yield inconsistent results; in particular, extreme gradient boosting and random forest disagree with the remaining methods. Models with features ordered by median importance had a meaningful clinical interpretation. Lockdown seems to have had a negative impact on upper-limb muscle strength.
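The idea of ordering features by median importance across several selection methods can be illustrated with a minimal sketch. It assumes that each method's scores are first converted to ranks so that methods on different scales are comparable; the abstract does not specify the exact aggregation, so this is an assumption. The function names, method labels, feature labels and scores below are hypothetical placeholders, not data or code from the study.

```python
from statistics import median


def ranks_from_scores(scores):
    """Map each feature to its rank (1 = highest score).

    Ties are broken alphabetically so the result is deterministic.
    """
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return {feature: i + 1 for i, feature in enumerate(ordered)}


def median_rank_order(importances_by_method):
    """Order features by their median rank across methods.

    Assumes every method scores the same set of features.
    Returns (ordered_features, median_rank_per_feature).
    """
    per_method = [ranks_from_scores(s) for s in importances_by_method.values()]
    features = sorted(per_method[0])
    med = {f: median(r[f] for r in per_method) for f in features}
    return sorted(features, key=lambda f: (med[f], f)), med


# Hypothetical scores for three placeholder methods and three placeholder features.
toy = {
    "method_a": {"x1": 0.9, "x2": 0.5, "x3": 0.1},
    "method_b": {"x1": 0.2, "x2": 0.7, "x3": 0.4},
    "method_c": {"x1": 0.8, "x2": 0.3, "x3": 0.6},
}
order, med = median_rank_order(toy)
print(order, med)
```

Using the median rather than the mean keeps a single method that strongly disagrees with the others from dominating the aggregated ordering, which fits a setting where the methods are reported to give inconsistent results.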

Published in Mathematics

ISSN: 2227-7390 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/mathematics
