Remote Sensing (Dec 2021)

Machine Learning Classification of Endangered Tree Species in a Tropical Submontane Forest Using WorldView-2 Multispectral Satellite Imagery and Imbalanced Dataset

  • Colbert M. Jackson
  • Elhadi Adam

DOI
https://doi.org/10.3390/rs13244970
Journal volume & issue
Vol. 13, no. 24
p. 4970

Abstract

Accurate maps of the spatial distribution of tropical tree species provide valuable insights for ecologists and forest managers. Discriminating tree species for economic, ecological, and technical reasons is usually necessary for achieving promising results in tree species mapping. Most datasets used in tree species mapping have some degree of class imbalance. This study aimed to assess the effects of imbalanced data on identifying and mapping threatened tree species in a selectively logged, heterogeneous submontane tropical forest using random forest (RF) and support vector machine with a radial basis function kernel (RBF-SVM) classifiers and WorldView-2 multispectral imagery. For comparison purposes, the original imbalanced dataset was balanced using three data sampling techniques implemented in R: oversampling, undersampling, and combined oversampling and undersampling. The combined technique produced the best results: F1-scores of 68.56 ± 2.6% for RF and 64.64 ± 3.4% for SVM. Balancing the dataset improved classification accuracy relative to the original imbalanced dataset. More separable classes recorded higher F1-scores. Among the species, Syzygium guineense and Zanthoxylum gilletii were the most accurately mapped, whereas Newtonia buchananii was the least accurately mapped. The most important spectral bands for detecting and distinguishing between tree species, as measured by the random forest classifier, were the Red, Red Edge, Near Infrared 1, and Near Infrared 2 bands.
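
The combined oversampling and undersampling idea, and the per-class F1-score used to compare results, can be illustrated with a minimal sketch. The study itself worked in R and the abstract does not specify the exact resampling algorithm, so the Python below is only an assumption-laden illustration: the function names `rebalance` and `f1_per_class`, the choice of the mean class size as the target, and the toy class counts are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def rebalance(samples, labels, target=None, seed=0):
    """Randomly resample every class to `target` items.

    Classes with at least `target` items are undersampled without
    replacement; smaller classes keep all items and are topped up by
    sampling with replacement. Assumption: `target` defaults to the
    rounded mean class size, which is not a setting from the study.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    if target is None:
        target = round(len(samples) / len(by_class))
    out_x, out_y = [], []
    for y in sorted(by_class):
        xs = by_class[y]
        if len(xs) >= target:
            chosen = rng.sample(xs, target)  # undersample
        else:
            chosen = xs + rng.choices(xs, k=target - len(xs))  # oversample
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


def f1_per_class(y_true, y_pred):
    """Return {class: F1}, with F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


# Toy, made-up class counts (not data from the study).
toy_labels = ["a"] * 60 + ["b"] * 30 + ["c"] * 10
toy_samples = list(range(len(toy_labels)))
bal_samples, bal_labels = rebalance(toy_samples, toy_labels)
balanced_counts = Counter(bal_labels)
```

In this sketch only the training data would be rebalanced; F1-scores are then computed per class on untouched test labels, which is why the F1-score is a more informative summary than overall accuracy when rare classes matter.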

Keywords