A machine learned classifier that uses gene expression data to accurately predict estrogen receptor status.

Meysam Bastani; Larissa Vos; Nasimeh Asgarian; Jean Deschenes; Kathryn Graham; John Mackey; Russell Greiner

doi:10.1371/journal.pone.0082144

PLoS ONE (Jan 2013)

DOI: https://doi.org/10.1371/journal.pone.0082144
Journal volume & issue: Vol. 8, no. 12, p. e82144

Abstract


BACKGROUND: Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard method for determining this status, immunohistochemical (IHC) analysis of formalin-fixed, paraffin-embedded samples, suffers from numerous technical and reproducibility issues. Assessment of ER-status based on RNA expression can provide more objective, quantitative and reproducible test results.

METHODS: To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER-status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin-fixed tumor.

RESULTS: This produced a three-gene classifier that can predict the ER-status of a novel tumor with a cross-validation accuracy of 93.17±2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier achieved over 90% accuracy on each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on IHC analysis of ER-status.

CONCLUSIONS: Our efficient and parsimonious classifier lends itself to high-throughput, highly accurate, low-cost RNA-based assessment of ER-status, suitable for routine clinical use. This analytic method provides a proof of principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions.
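A cross-validation accuracy such as the 93.17±2.44% reported above summarizes how a classifier performs on held-out folds of the training data. The abstract does not name the learning tool, the number of folds, or whether the ± term is a standard deviation, so the Python sketch below is only an illustration under stated assumptions. It uses synthetic three-gene data (not the study's microarray data), a nearest-centroid classifier chosen purely for illustration (not the authors' method), and 10-fold cross-validation, and it reports the mean ± sample standard deviation of per-fold accuracy. The helper names `make_synthetic_data`, `fit_centroids`, `predict` and `cross_validate` are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

# (expression values for a few genes, ER label: 1 = ER-positive, 0 = ER-negative)
Sample = Tuple[List[float], int]


def make_synthetic_data(n: int, n_genes: int = 3, seed: int = 0) -> List[Sample]:
    """Hypothetical synthetic data, NOT the study's microarray data.

    Labels alternate so an even n gives balanced classes; ER-positive samples
    get a higher mean expression on every gene.
    """
    rng = random.Random(seed)
    data: List[Sample] = []
    for i in range(n):
        label = i % 2
        shift = 1.5 if label == 1 else 0.0
        values = [rng.gauss(shift, 1.0) for _ in range(n_genes)]
        data.append((values, label))
    rng.shuffle(data)  # shuffle so folds are not ordered by label
    return data


def fit_centroids(train: Sequence[Sample]) -> Dict[int, List[float]]:
    """Nearest-centroid 'training': the mean expression vector of each class."""
    centroids: Dict[int, List[float]] = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return centroids


def predict(centroids: Dict[int, List[float]], x: Sequence[float]) -> int:
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_validate(data: Sequence[Sample], k: int = 10) -> Tuple[float, float, List[float]]:
    """k-fold cross-validation.

    Returns (mean accuracy, sample standard deviation across folds, per-fold accuracies).
    """
    folds = [list(data[i::k]) for i in range(k)]  # stride-based fold assignment
    accuracies: List[float] = []
    for i in range(k):
        test = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        centroids = fit_centroids(train)
        correct = sum(predict(centroids, x) == y for x, y in test)
        accuracies.append(correct / len(test))
    return statistics.mean(accuracies), statistics.stdev(accuracies), accuracies
```

Usage, with 176 synthetic samples to mirror the size of the training set described above:

```python
data = make_synthetic_data(176)
mean_acc, std_acc, per_fold = cross_validate(data, k=10)
print(f"{100 * mean_acc:.2f} ± {100 * std_acc:.2f}%")
```

Nearest-centroid classification is used here only because it is short and dependency-free; any classifier could be substituted inside `cross_validate` without changing the evaluation loop.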

Published in PLoS ONE

ISSN: 1932-6203 (Online)
Publisher: Public Library of Science (PLoS)
Country of publisher: United States
LCC subjects: Medicine; Science
Website: https://journals.plos.org/plosone/
