Pan-cancer integrative analysis of whole-genome De novo somatic point mutations reveals 17 cancer types

Amin Ghareyazi; Amirreza Kazemi; Kimia Hamidieh; Hamed Dashti; Maedeh Sadat Tahaei; Hamid R. Rabiee; Hamid Alinejad-Rokny; Iman Dehzangi

doi:10.1186/s12859-022-04840-6

BMC Bioinformatics (Jul 2022)

Affiliations

Amin Ghareyazi: Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology
Amirreza Kazemi: Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology
Kimia Hamidieh: Department of Computer Science, University of Toronto
Hamed Dashti: Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology
Maedeh Sadat Tahaei: Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology
Hamid R. Rabiee: Bioinformatics and Computational Biology Lab, Department of Computer Engineering, Sharif University of Technology
Hamid Alinejad-Rokny: BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney
Iman Dehzangi: Department of Computer Science, Rutgers University

Journal volume & issue: Vol. 23, no. 1, pp. 1–21

Abstract

Background

The advent of high-throughput sequencing has enabled researchers to systematically evaluate genetic variation in cancer and to identify many cancer-associated genes. Although cancers arising in the same tissue are usually placed in the same group, their mutational profiles differ in many respects. Hence, there is no definitive treatment for most cancer types. This underscores the importance of developing new pipelines that accurately identify cancer-associated genes and re-classify patients with similar mutational profiles. Classifying cancer patients by mutational profile may help discover patient subtypes that could benefit from specific types of treatment.

Results

In this study, we propose a new machine learning pipeline that identifies protein-coding genes mutated in many samples and uses them to identify cancer subtypes. We apply our pipeline to 12,270 samples collected from the International Cancer Genome Consortium (ICGC), covering 19 cancer types. As a result, we identify 17 different cancer subtypes. Comprehensive phenotypic and genotypic analysis indicates that these subtypes have distinguishable properties, including unique cancer-related signaling pathways.

Conclusions

This new subtyping approach offers a novel opportunity for cancer drug development based on the mutational profile of patients. Additionally, we analyze the mutational signatures of the samples in each subtype, which provides important insight into their active molecular mechanisms. Some of the pathways we identified in most subtypes, including the cell cycle and axon guidance pathways, are frequently observed in cancer. Interestingly, we also identified several mutated genes and different rates of mutation in multiple cancer subtypes. In addition, our study of "gene-motif" suggests the importance of considering both the context of the mutations and the mutational processes when identifying cancer-associated genes.
The source code for our proposed clustering pipeline and analysis is publicly available at https://github.com/bcb-sut/Pan-Cancer.
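
As a rough illustration of the general idea in the Results (keep genes that are mutated in many samples, then group samples with similar mutational profiles), the following minimal Python sketch may help. It is not the authors' pipeline: the abstract does not specify the clustering method, so a simple Jaccard-similarity grouping with union-find is swapped in here as an assumption. The function names, the thresholds, and the toy sample and gene lists are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def recurrent_genes(sample_genes, min_fraction):
    """Return the genes mutated in at least min_fraction of all samples."""
    counts = Counter(g for genes in sample_genes.values() for g in set(genes))
    cutoff = min_fraction * len(sample_genes)
    return {g for g, c in counts.items() if c >= cutoff}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_samples(sample_genes, min_fraction=0.3, min_similarity=0.5):
    """Group samples whose recurrent-gene profiles are similar.

    Assumption: samples are linked when their Jaccard similarity reaches
    min_similarity, and linked samples are merged (single linkage).
    This stands in for whatever clustering the real pipeline uses.
    """
    keep = recurrent_genes(sample_genes, min_fraction)
    profiles = {s: set(g) & keep for s, g in sample_genes.items()}

    # Union-find over sample identifiers.
    parent = {s: s for s in profiles}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path compression
            s = parent[s]
        return s

    for a, b in combinations(profiles, 2):
        if jaccard(profiles[a], profiles[b]) >= min_similarity:
            parent[find(a)] = find(b)

    groups = {}
    for s in profiles:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Toy input for illustration only: sample ID -> mutated genes.
    toy = {
        "S1": ["TP53", "KRAS"],
        "S2": ["TP53", "KRAS", "TTN"],
        "S3": ["BRAF", "NRAS"],
        "S4": ["BRAF", "NRAS", "MUC16"],
    }
    print(group_samples(toy, min_fraction=0.5, min_similarity=0.5))
```

On the toy input, genes seen in only one sample are filtered out before grouping, so the groups are driven by the recurrently mutated genes alone. The repository linked above is the place to look for the actual method.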

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
