IEEE Access (Jan 2022)

A Novel Machine Learning Model for Identifying Patient-Specific Cancer Driver Genes

  • Heewon Jung,
  • Jonghwan Choi,
  • Jiwoo Park,
  • Jaegyoon Ahn

DOI
https://doi.org/10.1109/ACCESS.2022.3176376
Journal volume & issue
Vol. 10
pp. 54245 – 54253

Abstract

Identifying patient-specific cancer driver genes is crucial for personalized cancer treatment and drug development. Several computational methods have been proposed for this task, most of which rank driver genes according to scores calculated from gene or protein network information. In this paper, we propose a machine learning model for more accurate identification of patient-specific cancer driver genes. The training data for the proposed model consist of gene vectors, which indicate the impact that one gene can have on, or receive from, all other genes. Gene vectors are patient-specific: one gene can have many gene vectors, one from each cancer patient. To construct the gene vectors, a patient-specific gene network is first built from each patient's gene expression data and a gene regulatory network; a modified PageRank is then applied to this network to produce an impact matrix, from which the gene vectors are extracted. We trained a Random Forest model on the gene vectors to learn patterns that discriminate how known driver genes affect, or are affected by, other genes. The proposed model was evaluated through cross-validation and independent tests using different sets of known cancer driver genes and six cancer types from The Cancer Genome Atlas (TCGA), and it achieved higher F-scores than existing patient-specific driver gene identification algorithms. The majority of predicted driver genes were rare, and F-scores calculated for these rare genes were higher than or comparable to those for frequently identified driver genes.
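The pipeline described in the abstract (patient-specific network, PageRank-based impact matrix, gene vectors) can be illustrated with a minimal sketch. The abstract does not specify the paper's modified PageRank or its edge weighting, so everything here is an assumption: a standard personalized PageRank (random walk with restart) stands in for the modified version, edges are weighted by the product of the two genes' expression values, and the function names `patient_network`, `personalized_pagerank`, `impact_matrix`, and `gene_vector` are hypothetical.

```python
def patient_network(edges, expression):
    """Weight each regulatory edge src -> dst by the patient's expression.

    Assumption: weight = expression[src] * expression[dst]; the paper's
    actual weighting scheme is not given in the abstract.
    """
    n = len(expression)
    adj = [[0.0] * n for _ in range(n)]
    for src, dst in edges:
        adj[src][dst] = expression[src] * expression[dst]
    return adj


def personalized_pagerank(adj, seed, damping=0.85, iters=100):
    """Standard random walk with restart from `seed` (not the paper's variant)."""
    n = len(adj)
    out_sums = [sum(row) for row in adj]
    scores = [1.0 if i == seed else 0.0 for i in range(n)]
    for _ in range(iters):
        nxt = [0.0] * n
        for i in range(n):
            if out_sums[i] == 0:
                # Dangling node: send its walking mass back to the seed.
                nxt[seed] += damping * scores[i]
                continue
            for j in range(n):
                if adj[i][j]:
                    nxt[j] += damping * scores[i] * adj[i][j] / out_sums[i]
        # Restart step: a (1 - damping) share of all mass jumps to the seed.
        nxt[seed] += (1.0 - damping) * sum(scores)
        scores = nxt
    return scores


def impact_matrix(adj):
    """Row i holds the impact of gene i on every gene."""
    return [personalized_pagerank(adj, seed) for seed in range(len(adj))]


def gene_vector(impact, gene):
    """Concatenate impact exerted (row) and impact received (column)."""
    exerted = list(impact[gene])
    received = [row[gene] for row in impact]
    return exerted + received


# Toy example: three genes, regulatory chain 0 -> 1 -> 2, one patient.
weights = patient_network([(0, 1), (1, 2)], expression=[2.0, 1.0, 0.5])
impact = impact_matrix(weights)
vec = gene_vector(impact, 1)  # feature vector for gene 1 in this patient
```

Under these assumptions, each patient yields one such vector per gene, and the vectors labeled with known driver status could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`.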

Keywords