BMC Bioinformatics (Mar 2021)

A machine learning-based gene signature of response to the novel alkylating agent LP-184 distinguishes its potential tumor indications

  • Umesh Kathad,
  • Aditya Kulkarni,
  • Joseph Ryan McDermott,
  • Jordan Wegner,
  • Peter Carr,
  • Neha Biyani,
  • Rama Modali,
  • Jean-Philippe Richard,
  • Panna Sharma,
  • Kishor Bhatia

DOI
https://doi.org/10.1186/s12859-021-04040-8
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 23

Abstract

Read online

Abstract Background Non-targeted cytotoxics with anticancer activity are often developed through preclinical stages using response criteria observed in cell lines and xenografts. A panel of the NCI-60 cell lines is frequently the first line to define tumor types that are optimally responsive. Open data on the gene expression of the NCI-60 cell lines, provides a unique opportunity to add another dimension to the preclinical development of such drugs by interrogating correlations with gene expression patterns. Machine learning can be used to reduce the complexity of whole genome gene expression patterns to derive manageable signatures of response. Application of machine learning in early phases of preclinical development is likely to allow a better positioning and ultimate clinical success of molecules. LP-184 is a highly potent novel alkylating agent where the preclinical development is being guided by a dedicated machine learning-derived response signature. We show the feasibility and the accuracy of such a signature of response by accurately predicting the response to LP-184 validated using wet lab derived IC50s on a panel of cell lines. Results We applied our proprietary RADR® platform to an NCI-60 discovery dataset encompassing LP-184 IC50s and publicly available gene expression data. We used multiple feature selection layers followed by the XGBoost regression model and reduced the complexity of 20,000 gene expression values to generate a 16-gene signature leading to the identification of a set of predictive candidate biomarkers which form an LP-184 response gene signature. We further validated this signature and predicted response to an additional panel of cell lines. Considering fold change differences and correlation between actual and predicted LP-184 IC50 values as validation performance measures, we obtained 86% accuracy at four-fold cut-off, and a strong (r = 0.70) and significant (p value 1.36e−06) correlation between actual and predicted LP-184 sensitivity. In agreement with the perceived mechanism of action of LP-184, PTGR1 emerged as the top weighted gene. Conclusion Integration of a machine learning-derived signature of response with in vitro assessment of LP-184 efficacy facilitated the derivation of manageable yet robust biomarkers which can be used to predict drug sensitivity with high accuracy and clinical value.

Keywords