Comparative performances of machine learning algorithms in radiomics and impacting factors

Antoine Decoux; Loic Duron; Paul Habert; Victoire Roblot; Emina Arsovic; Guillaume Chassagnon; Armelle Arnoux; Laure Fournier

doi:10.1038/s41598-023-39738-7

Scientific Reports (Aug 2023)

Affiliations

Antoine Decoux: Université Paris Cité, PARCC UMRS 970, INSERM
Loic Duron: Université Paris Cité, PARCC UMRS 970, INSERM
Paul Habert: Université Paris Cité, PARCC UMRS 970, INSERM
Victoire Roblot: Université Paris Cité, PARCC UMRS 970, INSERM
Emina Arsovic: Université Paris Cité, PARCC UMRS 970, INSERM
Guillaume Chassagnon: Department of Radiology, Université Paris Cité, AP-HP, Hôpital Cochin
Armelle Arnoux: Unité de Recherche Clinique, Centre d’Investigation Clinique 1418 Épidémiologie Clinique, Université Paris Cité, AP-HP, Hôpital Européen Georges Pompidou, INSERM
Laure Fournier: Department of Radiology, Université Paris Cité, AP-HP, Hôpital Européen Georges Pompidou, PARCC UMRS 970, INSERM

DOI: https://doi.org/10.1038/s41598-023-39738-7
Journal volume & issue: Vol. 13, no. 1, pp. 1–10

Abstract

There are no current recommendations on which machine learning (ML) algorithms should be used in radiomics. The objective was to compare performances of ML algorithms in radiomics when applied to different clinical questions to determine whether some strategies could give the best and most stable performances regardless of datasets. This study compares the performances of nine feature selection algorithms combined with fourteen binary classification algorithms on ten datasets. These datasets included radiomics features and clinical diagnosis for binary clinical classifications including COVID-19 pneumonia or sarcopenia on CT, head and neck, orbital or uterine lesions on MRI. For each dataset, a train-test split was created. Each of the 126 (9 × 14) combinations of feature selection algorithms and classification algorithms was trained and tuned using a ten-fold cross-validation, then AUC was computed. This procedure was repeated three times per dataset. Best overall performances were obtained with JMI and JMIM as feature selection algorithms and random forest and linear regression models as classification algorithms. The choice of the classification algorithm was the factor explaining most of the performance variation (10% of total variance). The choice of the feature selection algorithm explained only 2% of variation, while the train-test split explained 9%.
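The experimental grid described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the abstract names only JMI and JMIM among the nine feature selection algorithms and two of the fourteen classifiers, so the other labels below are hypothetical placeholders. The `auc` helper uses the standard pairwise-ranking definition of AUC, and the function names `auc` and `experiment_plan` are invented for this sketch. It also assumes that each of the three repetitions yields one AUC per combination.

```python
from itertools import product

# Only JMI, JMIM, random forest and a linear model are named in the abstract;
# every other label is a hypothetical placeholder.
feature_selectors = ["JMI", "JMIM"] + [f"selector_{i}" for i in range(3, 10)]
classifiers = ["random_forest", "linear_model"] + [
    f"classifier_{i}" for i in range(3, 15)
]

# Every feature selector paired with every classifier: 9 x 14 combinations.
combinations = list(product(feature_selectors, classifiers))


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def experiment_plan(n_datasets=10, n_repeats=3):
    """Yield one (dataset, repeat, selector, classifier) tuple per evaluated model."""
    for dataset, repeat in product(range(n_datasets), range(n_repeats)):
        for selector, classifier in combinations:
            yield dataset, repeat, selector, classifier
```

Under these assumptions, `len(combinations)` is 126 and the plan enumerates 10 × 3 × 126 = 3780 evaluations. Each would contribute one test-set AUC to a comparison of the kind the abstract reports across classifiers, feature selectors and train-test splits.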

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/