Merging microarray data, robust feature selection, and predicting prognosis in prostate cancer

Kevin R. Coombes; Christopher J. Logothetis; Timothy J. McDonnell; Spyros Tsavachidis; Sijin Wen; Kim Anh Do; Jing Wang

Cancer Informatics (Jan 2006)

Journal volume & issue: Vol. 2, pp. 87–97

Abstract

Motivation: Individual microarray studies searching for prognostic biomarkers often have few samples and low statistical power; however, publicly accessible data sets make it possible to combine data across studies.

Method: We present a novel approach for combining microarray data across institutions and platforms. We introduce a new algorithm, robust greedy feature selection (RGFS), to select predictive genes.

Results: We combined two prostate cancer microarray data sets, confirmed the appropriateness of the approach with the Kolmogorov-Smirnov goodness-of-fit test, and built several predictive models. The best logistic regression model with stepwise forward selection used 7 genes and had a misclassification rate of 31%. Models that combined LDA with different feature selection algorithms had misclassification rates between 19% and 33%, and the sets of genes in the models varied substantially during cross-validation. When we combined RGFS with LDA, the best model used two genes and had a misclassification rate of 15%.

Availability: Affymetrix U95Av2 array data are available at http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi. The cDNA microarray data are available through the Stanford Microarray Database (http://cmgm.stanford.edu/pbrown/). GeneLink software is freely available at http://bioinformatics.mdanderson.org/GeneLink/. DNA-Chip Analyzer software is publicly available at http://biosun1.harvard.edu/complab/dchip/.
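
Illustrative sketch (not the authors' code): the abstract mentions a Kolmogorov-Smirnov check on the merged data but does not say how the two data sets were placed on a common scale or which distributions were compared. The Python below shows one way such a check might look, under the assumption that each study's values for a gene are standardized separately and then compared with a two-sample KS statistic. The function names `standardize` and `ks_two_sample_statistic` and the toy numbers are hypothetical.

```python
from bisect import bisect_right


def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation.

    Assumption for illustration only: the abstract does not state the
    normalization actually used to merge the two platforms.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    if sd == 0:
        raise ValueError("values have zero variance")
    return [(v - mean) / sd for v in values]


def ks_two_sample_statistic(a, b):
    """Two-sample KS statistic D = max over x of |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical CDFs; their difference only changes at
    observed data points, so it is enough to evaluate it there.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d


# Toy values for one gene measured in two hypothetical studies.
study_a = [2.0, 4.0, 6.0]
study_b = [10.0, 20.0, 30.0]
d_raw = ks_two_sample_statistic(study_a, study_b)
d_std = ks_two_sample_statistic(standardize(study_a), standardize(study_b))
```

A small D after standardization suggests the two studies have similar distributions for that gene on the common scale; turning D into a p-value requires the KS null distribution, which this sketch leaves out.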

Published in Cancer Informatics

ISSN: 1176-9351 (Online)
Publisher: SAGE Publishing
Country of publisher: United Kingdom
LCC subjects: Medicine: Internal medicine: Neoplasms. Tumors. Oncology. Including cancer and carcinogens
Website: https://journals.sagepub.com/home/cix
