IEEE Access (Jan 2019)

Lightweight Convolutional Neural Network for Breast Cancer Classification Using RNA-Seq Gene Expression Data

  • Murtada K. Elbashir,
  • Mohamed Ezz,
  • Mohanad Mohammed,
  • Said S. Saloum

DOI
https://doi.org/10.1109/ACCESS.2019.2960722
Journal volume & issue
Vol. 7
pp. 185338 – 185348

Abstract

Gene expression profiles are among the most widely used features in cancer classification. The available gene expression data has a small number of samples and a relatively large number of dimensions, which makes it unsuitable for deep Convolutional Neural Network (CNN) architectures, even though these exhibit state-of-the-art performance in many fields. In this paper, we propose a lightweight CNN architecture for breast cancer classification using gene expression data downloaded from the Pan-Cancer Atlas using the “Illumina HiSeq” platform. The downloaded gene expression data is preprocessed and then transformed into 2D images. Preprocessing begins by removing outlier samples, which are identified using the Array-Array Intensity Correlation (AAIC), a symmetric square matrix of Spearman correlations. We then normalize the gene expression data so that expression levels can be inferred correctly and biases in the expression measures are avoided. Finally, the data is filtered. A parameter search (model selection) strategy is used to choose the CNN hyper-parameter values that give optimal performance. Our experiments show that the proposed method achieves an accuracy of 98.76%, the highest among the competing methods.
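To make the preprocessing steps more concrete, the following is a minimal Python sketch of two of them: removing outlier samples from a Spearman correlation matrix, and reshaping each sample into a 2D image. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the outlier threshold or the image layout, so the z-score cutoff, the zero padding, and the names `spearman_matrix`, `remove_outliers`, `to_images`, `z_cutoff`, and `side` are all hypothetical. Normalization and filtering are omitted because the abstract does not specify them.

```python
import numpy as np


def spearman_matrix(expr):
    """Symmetric (n_samples, n_samples) Spearman correlation matrix.

    expr has shape (n_samples, n_genes). Spearman correlation is computed
    as the Pearson correlation of per-sample gene ranks. Assumption: ties
    are broken by position (ordinal ranks), a simplification that is only
    reasonable when tied values are rare.
    """
    ranks = expr.argsort(axis=1).argsort(axis=1).astype(float)
    return np.corrcoef(ranks)  # rows (samples) are treated as variables


def remove_outliers(expr, z_cutoff=-2.0):
    """Drop samples that correlate poorly with the rest of the cohort.

    Assumption: a sample is an outlier when the z-score of its mean
    correlation with all other samples falls at or below z_cutoff.
    The cutoff value is illustrative, not taken from the paper.
    """
    corr = spearman_matrix(expr)
    n = corr.shape[0]
    # Exclude the diagonal (self-correlation of 1) from each row mean.
    mean_corr = (corr.sum(axis=1) - 1.0) / (n - 1)
    z = (mean_corr - mean_corr.mean()) / mean_corr.std()
    keep = z > z_cutoff
    return expr[keep], keep


def to_images(expr, side):
    """Reshape each sample into a (side, side) image.

    Assumption: genes are laid out row by row and unused pixels are
    zero-padded; the paper may use a different arrangement.
    """
    n_samples, n_genes = expr.shape
    if n_genes > side * side:
        raise ValueError("side * side must be at least the number of genes")
    padded = np.zeros((n_samples, side * side), dtype=expr.dtype)
    padded[:, :n_genes] = expr
    return padded.reshape(n_samples, side, side)


if __name__ == "__main__":
    # Synthetic data: 20 samples sharing one expression profile plus noise,
    # with sample 7 replaced by unrelated values.
    rng = np.random.default_rng(0)
    profile = rng.normal(size=50)
    expr = profile + 0.1 * rng.normal(size=(20, 50))
    expr[7] = rng.normal(size=50)

    clean, keep = remove_outliers(expr)
    images = to_images(clean, side=8)
    print(clean.shape, images.shape)
```

The resulting array of images could then be passed to a CNN; in practice, a tie-aware rank correlation such as `scipy.stats.spearmanr` may be preferable for count data containing many repeated values.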

Keywords