Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data [version 1; referees: 2 approved]

Linh Nguyen; Cuong C Dang; Pedro Ballester

doi:10.12688/f1000research.10529.1

F1000Research (Dec 2016)

Affiliations

Linh Nguyen: Cancer Research Center of Marseille UMR7258, Marseille, France
Cuong C Dang: Cancer Research Center of Marseille UMR7258, Marseille, France
Pedro Ballester: Cancer Research Center of Marseille UMR7258, Marseille, France

DOI: https://doi.org/10.12688/f1000research.10529.1
Journal volume & issue: Vol. 5

Abstract

Background: Selected gene mutations are routinely used to guide the selection of cancer drugs for a given patient tumour. Large pharmacogenomic data sets were introduced to discover more of these single-gene markers of drug sensitivity. Very recently, machine learning regression has been used to investigate how well cancer cell line sensitivity to drugs is predicted depending on the type of molecular profile. That work revealed that gene expression data is the most predictive profile in the pan-cancer setting. However, no study to date has exploited Genomics of Drug Sensitivity in Cancer (GDSC) data to systematically compare the performance of machine learning models based on multi-gene expression data against that of widely used single-gene markers based on genomics data.

Methods: Here we present this systematic comparison using Random Forest (RF) classifiers exploiting the expression levels of 13,321 genes and an average of 501 tested cell lines per drug. To account for time-dependent batch effects in IC50 measurements, we employ independent test sets generated with more recent GDSC data than that used to train the predictors, and show that this is a more realistic validation than K-fold cross-validation.

Results and Discussion: Across 127 GDSC drugs, our results show that the single-gene markers unveiled by the MANOVA analysis tend to achieve higher precision than these RF-based multi-gene models, at the cost of generally poor recall (i.e. they correctly detect only a small fraction of the cell lines sensitive to the drug). Regarding overall classification performance, about two thirds of the drugs are better predicted by multi-gene RF classifiers. The drugs with the most predictive of these models include pyrimethamine, sunitinib and 17-AAG.

Conclusions: We now know that this type of model can predict in vitro tumour response to these drugs. These models can thus be further investigated on in vivo tumour models.
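The precision/recall trade-off described in the abstract can be illustrated with a small sketch. The labels below are invented toy values for ten hypothetical cell lines, not GDSC data, and the two prediction vectors only mimic the qualitative behaviour reported (a single-gene marker that flags few lines versus a multi-gene classifier that flags more). The abstract does not name the metric behind "overall classification performance"; the Matthews correlation coefficient (MCC) is used here purely as an assumed, commonly used summary.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 = sensitive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision(tp, fp):
    """Fraction of predicted-sensitive cell lines that are truly sensitive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of truly sensitive cell lines that the predictor detects."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient (assumed summary metric)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def summarise(y_true, y_pred):
    """Return precision, recall and MCC for one predictor as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "precision": precision(tp, fp),
        "recall": recall(tp, fn),
        "mcc": mcc(tp, fp, fn, tn),
    }


# Toy, hypothetical labels: 4 sensitive and 6 resistant cell lines.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical single-gene marker: flags only one (truly sensitive) line.
single_gene_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical multi-gene classifier: flags more lines, with one false positive.
multi_gene_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

single_gene_scores = summarise(y_true, single_gene_pred)
multi_gene_scores = summarise(y_true, multi_gene_pred)

print("single-gene marker:", single_gene_scores)
print("multi-gene classifier:", multi_gene_scores)
```

In this toy case the single-gene marker is never wrong when it flags a line but misses most sensitive lines, whereas the multi-gene classifier trades some precision for much higher recall. This is the same pattern the abstract reports, shown here only as an illustration.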

Published in F1000Research

ISSN: 2046-1402 (Online)
Publisher: F1000 Research Ltd
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://f1000research.com
