Deep Learning for Multi-Tissue Cancer Classification of Gene Expressions (GeneXNet)

Tarek Khorshed; Mohamed N. Moustafa; Ahmed Rafea

doi:10.1109/ACCESS.2020.2992907

IEEE Access (Jan 2020)

Affiliations

Tarek Khorshed: Department of Computer Science and Engineering, The American University in Cairo, New Cairo, Egypt
Mohamed N. Moustafa: Department of Computer Science and Engineering, The American University in Cairo, New Cairo, Egypt
Ahmed Rafea: Department of Computer Science and Engineering, The American University in Cairo, New Cairo, Egypt

DOI: https://doi.org/10.1109/ACCESS.2020.2992907
Journal volume & issue: Vol. 8, pp. 90615–90629

Abstract

Cancer classification from gene expression data is challenging because the data are complex and high-dimensional. Current classification methods typically rely on samples collected from a single tissue type and require gene feature selection as a preprocessing step to avoid processing the full set of genes. These methods fall short of exploiting genome-wide next-generation sequencing technologies, which provide a snapshot of the whole transcriptome rather than a predetermined subset of genes. We propose a deep learning framework for cancer diagnosis, developing a multi-tissue cancer classifier based on whole-transcriptome gene expressions collected from multiple tumor types covering multiple organ sites. We introduce a new Convolutional Neural Network architecture called Gene eXpression Network (GeneXNet), which is specifically designed to address the complex nature of gene expressions. GeneXNet can detect genetic alterations driving cancer progression by learning genomic signatures across multiple tissue types, without requiring prior gene feature selection. Our model achieves 98.9% classification accuracy on human samples representing 33 different cancer tumor types across 26 organ sites. We demonstrate how our model can be used for transfer learning to build classifiers for tumors that lack sufficient samples to be trained independently. We introduce visualization procedures that provide biological insight into how our model performs classification across multiple tumors.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
