Scientific Reports (Dec 2021)

Deep learning data augmentation for Raman spectroscopy cancer tissue classification

  • Man Wu,
  • Shuwen Wang,
  • Shirui Pan,
  • Andrew C. Terentis,
  • John Strasswimmer,
  • Xingquan Zhu

DOI
https://doi.org/10.1038/s41598-021-02687-0
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 13

Abstract

Read online

Abstract Recently, Raman Spectroscopy (RS) was demonstrated to be a non-destructive way of cancer diagnosis, due to the uniqueness of RS measurements in revealing molecular biochemical changes between cancerous vs. normal tissues and cells. In order to design computational approaches for cancer detection, the quality and quantity of tissue samples for RS are important for accurate prediction. In reality, however, obtaining skin cancer samples is difficult and expensive due to privacy and other constraints. With a small number of samples, the training of the classifier is difficult, and often results in overfitting. Therefore, it is important to have more samples to better train classifiers for accurate cancer tissue classification. To overcome these limitations, this paper presents a novel generative adversarial network based skin cancer tissue classification framework. Specifically, we design a data augmentation module that employs a Generative Adversarial Network (GAN) to generate synthetic RS data resembling the training data classes. The original tissue samples and the generated data are concatenated to train classification modules. Experiments on real-world RS data demonstrate that (1) data augmentation can help improve skin cancer tissue classification accuracy, and (2) generative adversarial network can be used to generate reliable synthetic Raman spectroscopic data.