BMC Cancer (Jan 2020)

PDAC-ANN: an artificial neural network to predict pancreatic ductal adenocarcinoma based on gene expression

  • Palloma Porto Almeida,
  • Cristina Padre Cardoso,
  • Leandro Martins de Freitas

DOI
https://doi.org/10.1186/s12885-020-6533-0
Journal volume & issue
Vol. 20, no. 1
pp. 1 – 11

Abstract

Background: Although pancreatic ductal adenocarcinoma (PDAC) has high mortality and metastatic potential, effective therapies are lacking and the survival rate is low. This scenario calls for new strategies for diagnosis, drug targets, and treatment.

Methods: We performed a meta-analysis of gene expression microarray data comparing tumor against normal tissue in order to identify differentially expressed genes (DEG) shared among all datasets, named core-genes (CG). We confirmed CG protein expression in pancreatic tissue using The Human Protein Atlas. Among the genes with protein expression confirmed in the tumor group, we selected the five with the highest area under the curve (AUC) to train an artificial neural network (ANN) to classify samples.

Results: The microarray meta-analysis included 461 tumor and 187 normal samples. We identified a CG composed of 40 genes, 39 upregulated and one downregulated. The upregulated CG included proteins and extracellular matrix receptors linked to actin cytoskeleton reorganization. With The Human Protein Atlas, we verified that fourteen genes of the CG are translated, with high or medium expression in most of the pancreatic tumor samples. To train our ANN, we selected the five genes with the best AUC based on mRNA expression (AHNAK2, KRT19, LAMB3, LAMC2, and S100P). The network classified the samples with an F1-score of 0.83 for the normal samples and 0.88 for the PDAC samples, with an average of 0.86. The PDAC-ANN classified the test samples with a sensitivity of 87.6% and a specificity of 83.1%.

Conclusion: The gene expression meta-analysis and the confirmation of protein expression allowed us to select five genes highly expressed in PDAC samples. We built a Python script to classify samples based on RNA expression. This software can be useful in PDAC diagnosis.
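The abstract names two quantitative steps: ranking genes by AUC and reporting sensitivity, specificity, and F1-score. The following Python sketch illustrates how such quantities can be computed. It is not the authors' script and does not train a neural network. The gene names (`GENE_A` and so on), the expression values, and the confusion-matrix counts are hypothetical toy data, and the function names are assumptions made for this illustration.

```python
from itertools import product


def auc(tumor_values, normal_values):
    """AUC as the probability that a tumor value exceeds a normal value.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    wins = 0.0
    for t, n in product(tumor_values, normal_values):
        if t > n:
            wins += 1.0
        elif t == n:
            wins += 0.5
    return wins / (len(tumor_values) * len(normal_values))


def rank_genes_by_auc(expression, top_k):
    """Return the top_k gene names sorted by descending AUC.

    `expression` maps gene name -> (tumor_values, normal_values).
    """
    scores = {gene: auc(t, n) for gene, (t, n) in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and F1 for the positive (tumor) class."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


# Hypothetical toy expression values, not taken from the study.
toy_expression = {
    "GENE_A": ([8.1, 7.9, 9.0, 8.4], [5.0, 5.5, 6.1, 5.2]),
    "GENE_B": ([6.0, 5.1, 7.2, 5.8], [5.9, 5.0, 6.5, 6.1]),
    "GENE_C": ([7.0, 7.5, 6.2, 8.0], [6.0, 6.4, 5.9, 6.3]),
}
top_genes = rank_genes_by_auc(toy_expression, top_k=2)

# Hypothetical confusion-matrix counts, not the reported results.
toy_metrics = classification_metrics(tp=8, fn=2, tn=9, fp=1)

print(top_genes)
print(toy_metrics)
```

The pairwise AUC is quadratic in the number of samples, which is fine for a toy example; for datasets of the size described above, a rank-based implementation from an established library would be the more practical choice.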

Keywords