Flow Cytometry-Based Classification in Cancer Research: A View on Feature Selection

S. Sakira Hassan; Pekka Ruusuvuori; Leena Latonen; Heikki Huttunen

doi:10.4137/CIN.S30795

Cancer Informatics (Jan 2015)

Affiliations

S. Sakira Hassan: Department of Signal Processing, Tampere University of Technology, Tampere, Finland.
Pekka Ruusuvuori: BioMediTech, University of Tampere, Tampere, Finland.
Leena Latonen: BioMediTech, University of Tampere, Tampere, Finland.
Heikki Huttunen: Department of Signal Processing, Tampere University of Technology, Tampere, Finland.

DOI: https://doi.org/10.4137/CIN.S30795
Journal volume & issue: Vol. 14s5

Abstract

In this paper, we study the problem of feature selection in cancer-related machine learning tasks. In particular, we study the accuracy and stability of different feature selection approaches within simplistic machine learning pipelines. Earlier studies have shown that, in certain cases, detection accuracy can easily reach 100% given enough training data. Here, however, we concentrate on simplifying the classification models and seek feature selection approaches that remain reliable even with extremely small sample sizes. We show that as many as 50% of the features can be discarded without compromising prediction accuracy. Moreover, we study the model selection problem along the ℓ1 regularization path of logistic regression classifiers. To this end, we compare a more traditional cross-validation approach with a recently proposed Bayesian error estimator.
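To make the idea of selecting a model along an ℓ1 regularization path concrete, the following is a minimal sketch and not the authors' pipeline. It assumes synthetic data (not flow cytometry measurements) in which only the first two of eight features carry signal, and it uses a simple proximal gradient solver chosen for illustration. The penalty grid `lams`, the helper names, and all parameter values are hypothetical. For each penalty strength it reports how many features survive and a cross-validated accuracy. The Bayesian error estimator mentioned in the abstract is not reproduced here.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    # Proximal operator of the l1 penalty: shrink towards zero by t.
    return math.copysign(max(abs(w) - t, 0.0), w)

def fit_l1_logreg(X, y, lam, step=0.1, n_iter=200):
    # Proximal gradient descent for l1-penalised logistic regression.
    # The intercept b is not penalised.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b

def accuracy(w, b, X, y):
    # Fraction of samples whose predicted class matches the label.
    hits = 0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        hits += int((score > 0) == (yi == 1))
    return hits / len(y)

def cv_accuracy(X, y, lam, k=4):
    # Plain k-fold cross-validation with interleaved folds.
    scores = []
    for f in range(k):
        test_idx = list(range(f, len(X), k))
        held = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held]
        w, b = fit_l1_logreg([X[i] for i in train_idx],
                             [y[i] for i in train_idx], lam)
        scores.append(accuracy(w, b, [X[i] for i in test_idx],
                               [y[i] for i in test_idx]))
    return sum(scores) / k

# Synthetic small-sample problem (an assumption for illustration only):
# 40 samples, 8 features in [-1, 1], label depends on features 0 and 1.
rng = random.Random(0)
X = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(40)]
y = [1 if x[0] + x[1] > 0 else 0 for x in X]

# Walk the path from a strong penalty to a weak one.
lams = [1.0, 0.1, 0.03, 0.01]
path = []
for lam in lams:
    w, _ = fit_l1_logreg(X, y, lam)
    n_selected = sum(1 for wj in w if wj != 0.0)
    path.append((lam, n_selected, cv_accuracy(X, y, lam)))

for lam, n_selected, acc in path:
    print(f"lambda={lam:<5} selected features={n_selected} CV accuracy={acc:.2f}")
```

Each entry of `path` pairs a penalty strength with the number of retained features and an estimated accuracy, so a model can be chosen by trading sparsity against the error estimate. With very few samples the cross-validation estimate itself becomes noisy, which is the kind of setting in which alternative error estimators are of interest.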

Published in Cancer Informatics

ISSN: 1176-9351 (Online)
Publisher: SAGE Publishing
Country of publisher: United Kingdom
LCC subjects: Medicine: Internal medicine: Neoplasms. Tumors. Oncology. Including cancer and carcinogens
Website: https://journals.sagepub.com/home/cix
