IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)

Permuted KPCA and SMOTE to Guide GAN-Based Oversampling for Imbalanced HSI Classification

  • Tajul Miftahushudur,
  • Bruce Grieve,
  • Hujun Yin

DOI
https://doi.org/10.1109/JSTARS.2023.3326963
Journal volume & issue
Vol. 17
pp. 489 – 505

Abstract

Lack of sufficient and balanced data is one of the major challenges in hyperspectral image (HSI) classification. It can lead to poor classification performance, especially when detecting or classifying samples of minority classes. The most straightforward remedy is to resample the data or create synthetic samples to balance the class distributions. Generative adversarial networks (GANs), among the most advanced generative methods, have been used to generate synthetic data. However, GANs need a sufficiently large amount of minority-class data to train. In this article, we propose to leverage the synthetic minority oversampling technique (SMOTE) in GANs to create high-quality synthetic data and tackle the imbalance problem. The main idea is to train the generator of the GAN to synthesize data from pattern vectors instead of random noise vectors, so as to guide the GAN to produce data that expand the minority class on the decision boundaries. We used kernel principal component analysis (KPCA) and SMOTE to create the pattern vectors and used a silhouette score to control and prevent overlapping issues. In addition, we applied a self-attention module and an automatic data filter to further reduce potentially mislabeled or overlapping samples before they are added to the training set. Experimental results on both hyperspectral and remote sensing datasets show that the proposed technique can generate more realistic, diverse, and unambiguous synthetic data, resulting in significantly improved classification performance over existing oversampling techniques.
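
As a rough illustration of the SMOTE interpolation that the abstract names as the source of the pattern vectors, the following minimal sketch generates synthetic vectors from a handful of minority-class feature vectors. It is not the authors' implementation: the function name `smote_pattern_vectors`, the parameters `k` and `seed`, and the toy input are assumptions made for illustration, and the KPCA projection, the permutation step, the silhouette-score check, and the GAN itself are all omitted.

```python
import math
import random


def smote_pattern_vectors(minority, n_new, k=3, seed=0):
    """Create n_new synthetic vectors by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Hypothetical helper; plain SMOTE, not the paper's full pipeline."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Assumed toy input: low-dimensional minority-class features, standing in
# for what a KPCA projection of hyperspectral pixels might produce.
minority_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pattern_vectors = smote_pattern_vectors(minority_features, n_new=8)
```

Because every synthetic vector lies on a segment between two real minority samples, the output stays inside the region spanned by the minority class. This is the property that would make such vectors a more informative generator input than random noise.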

Keywords