Automated data preparation for in vivo tumor characterization with machine learning

Denis Krajnc; Clemens P. Spielvogel; Marko Grahovac; Boglarka Ecsedi; Sazan Rasul; Nina Poetsch; Tatjana Traub-Weidinger; Alexander R. Haug; Zsombor Ritter; Hussain Alizadeh; Marcus Hacker; Thomas Beyer; Laszlo Papp

doi:10.3389/fonc.2022.1017911

Frontiers in Oncology (Oct 2022)

Affiliations

Denis Krajnc: QIMP Team, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
Clemens P. Spielvogel: Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
Clemens P. Spielvogel: Christian Doppler Laboratory for Applied Metabolomics, Medical University of Vienna, Vienna, Austria
Marko Grahovac: Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
Boglarka Ecsedi: QIMP Team, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
Sazan Rasul: Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
Nina Poetsch: Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
Tatjana Traub-Weidinger: Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
Alexander R. Haug: Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
Alexander R. Haug: Christian Doppler Laboratory for Applied Metabolomics, Medical University of Vienna, Vienna, Austria
Zsombor Ritter: Department of Medical Imaging, University of Pécs, Medical School, Pécs, Hungary
Hussain Alizadeh: 1st Department of Internal Medicine, University of Pécs, Medical School, Pécs, Hungary
Marcus Hacker: Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
Thomas Beyer: QIMP Team, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
Laszlo Papp: QIMP Team, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
Laszlo Papp: Applied Quantum Computing group, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria

DOI: https://doi.org/10.3389/fonc.2022.1017911
Journal volume & issue: Vol. 12

Abstract


Background: This study proposes machine learning-driven data preparation (MLDP) for optimal data preparation (DP) prior to building prediction models for cancer cohorts.

Methods: A collection of well-established DP methods was incorporated to build DP pipelines for various clinical cohorts prior to machine learning. Evolutionary algorithm principles combined with hyperparameter optimization were employed to iteratively select the best-fitting subset of DP algorithms for the given dataset. The proposed method was validated for glioma and prostate single-center cohorts using a 100-fold Monte Carlo (MC) cross-validation scheme with an 80-20% training-validation split ratio. In addition, a dual-center diffuse large B-cell lymphoma (DLBCL) cohort was utilized, with Center 1 as the training dataset and Center 2 as the independent validation dataset, to predict cohort-specific clinical endpoints. Five machine learning (ML) classifiers were employed for building prediction models across all analyzed cohorts. Predictive performance was estimated by confusion matrix analytics over the validation sets of each cohort. The performance of each model with MLDP, without MLDP, and with manually defined DP was compared in each of the four cohorts.

Results: Sixteen of twenty established predictive models demonstrated an increase in area under the receiver operating characteristic curve (AUC) when utilizing MLDP. MLDP resulted in the highest performance increase for the random forest (RF) (+0.16 AUC) and support vector machine (SVM) (+0.13 AUC) model schemes for predicting 36-month survival in the glioma cohort. Single-center cohorts resulted in complex DP pipelines (6-7 DP steps), with a high occurrence of outlier detection, feature selection and synthetic minority oversampling technique (SMOTE). In contrast, the optimal DP pipeline for the dual-center DLBCL cohort included only the outlier detection and SMOTE DP steps.

Conclusions: This study demonstrates that data preparation prior to ML prediction model building in cancer cohorts should itself be ML-driven, yielding optimal prediction models in both single and multi-centric settings.
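
The 100-fold Monte Carlo cross-validation with an 80-20% split mentioned in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (monte_carlo_splits, auc, cross_validate), the fixed seed, and the use of plain unstratified shuffling are all hypothetical choices, since the abstract does not say how the splits were drawn or how the classifiers were wrapped.

```python
import random
from statistics import mean


def monte_carlo_splits(n_samples, n_folds=100, train_fraction=0.8, seed=0):
    """Yield (train_idx, val_idx) pairs from repeated random shuffles.

    Assumption: unstratified shuffling; the abstract does not specify this.
    """
    rng = random.Random(seed)
    n_train = int(round(n_samples * train_fraction))
    indices = list(range(n_samples))
    for _ in range(n_folds):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair. Returns None if a class is missing.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(features, labels, fit, predict, **split_kwargs):
    """Mean validation AUC over the Monte Carlo folds where AUC is defined.

    `fit(train_x, train_y)` returns a model; `predict(model, x)` returns a score.
    Both are hypothetical callables supplied by the caller.
    """
    fold_aucs = []
    for train_idx, val_idx in monte_carlo_splits(len(labels), **split_kwargs):
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        scores = [predict(model, features[i]) for i in val_idx]
        fold_auc = auc([labels[i] for i in val_idx], scores)
        if fold_auc is not None:
            fold_aucs.append(fold_auc)
    return mean(fold_aucs)
```

In a comparison like the one described above, cross_validate would be called once per candidate DP pipeline and classifier, with the pipeline fitted inside `fit` on the training indices only, so that steps such as outlier detection or SMOTE never see the validation samples.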

Published in Frontiers in Oncology

ISSN: 2234-943X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Medicine: Internal medicine: Neoplasms. Tumors. Oncology. Including cancer and carcinogens
Website: https://www.frontiersin.org/journals/oncology/
