Remote Sensing (Sep 2022)

A Hybrid Classification of Imbalanced Hyperspectral Images Using ADASYN and Enhanced Deep Subsampled Multi-Grained Cascaded Forest

  • Debaleena Datta,
  • Pradeep Kumar Mallick,
  • Annapareddy V. N. Reddy,
  • Mazin Abed Mohammed,
  • Mustafa Musa Jaber,
  • Abed Saif Alghawli,
  • Mohammed A. A. Al-qaness

DOI
https://doi.org/10.3390/rs14194853
Journal volume & issue
Vol. 14, no. 19
p. 4853

Abstract

Read online

Hyperspectral image (HSI) analysis generally suffers from issues such as high dimensionality, imbalanced sample sets for different classes, and the choice of classifiers for artificially balanced datasets. The existing conventional data imbalance removal techniques and forest classifiers lack a more efficient approach to dealing with the aforementioned issues. In this study, we propose a novel hybrid methodology ADASYN-enhanced subsampled multi-grained cascade forest (ADA-Es-gcForest) which comprises four folds: First, we extracted the most discriminative global spectral features by reducing the vast dimensions, i.e., the redundant bands using principal component analysis (PCA). Second, we applied the subsampling-based adaptive synthetic minority oversampling method (ADASYN) to augment and balance the dataset. Third, we used the subsampled multi-grained scanning (Mg-sc) to extract the minute local spatial–spectral features by adaptively creating windows of various sizes. Here, we used two different forests—a random forest (RF) and a complete random forest (CRF)—to generate the input joint-feature vectors of different dimensions. Finally, for classification, we used the enhanced deep cascaded forest (CF) that improvised in the dimension reduction of the feature vectors and increased the connectivity of the information exchange between the forests at the different levels, which elevated the classifier model’s accuracy in predicting the exact class labels. Furthermore, the experiments were accomplished by collecting the three most appropriate, publicly available his landcover datasets—the Indian Pines (IP), Salinas Valley (SV), and Pavia University (PU). The proposed method achieved 91.47%, 98.76%, and 94.19% average accuracy scores for IP, SV, and PU datasets. The validity of the proposed methodology was testified against the contemporary state-of-the-art eminent tree-based ensembled methods, namely, RF, rotation forest (RoF), bagging, AdaBoost, extreme gradient boost, and deep multi-grained cascade forest (DgcForest), by simulating it numerically. Our proposed model achieved correspondingly higher accuracies than those classifiers taken for comparison for all the HS datasets.

Keywords