International Journal of Applied Earth Observations and Geoinformation (May 2024)

Class imbalance: A crucial factor affecting the performance of tea plantations mapping by machine learning

  • Yuanjun Xiao,
  • Jingfeng Huang,
  • Wei Weng,
  • Ran Huang,
  • Qi Shao,
  • Chang Zhou,
  • Shengcheng Li

Journal volume & issue
Vol. 129
p. 103849

Abstract

Because land cover types differ widely in area, class imbalance is inherent in crop mapping research and introduces uncertainty when extracting minority classes, which occupy a smaller area. In this paper, taking tea plantation mapping in Hangzhou city as an example, we created a series of training datasets with different imbalance ratios (IRs), compared the accuracy of the extraction models trained on these datasets, and analyzed the impact of class imbalance on four machine learning algorithms (Artificial Neural Network, Decision Tree, Random Forest and XGBoost), aiming to provide a feasible approach to improving the mapping accuracy of minority classes. The leave-one-out cross-validation results showed that, in most cases, the model's F2-score first increased and then decreased as the IR increased, with gains in F2-score ranging from 0.2% to 29.2%. This suggests that moderately increasing the number of other (non-tea) samples in the training dataset can improve the accuracy of tea plantation extraction. Consistent results were also obtained when the whole city's samples were used for modeling with random sampling validation. XGBoost performed best among the four algorithms, yielding the optimal tea plantation map with a producer's accuracy (PA) of 97%, a user's accuracy (UA) of 93% and an F2-score of 96% when the IR of the training dataset was 6. This UA was 19% higher than that of the model trained on a balanced dataset (IR=1) and 11% higher than that of the model trained on pseudo-balanced datasets created by oversampling. The conclusions of this study offer insights for the identification of minority classes and contribute to achieving higher accuracy in remote sensing crop mapping.
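
The two quantities the abstract relies on, the imbalance ratio of a training set and the F2-score, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the IR is the number of other (non-tea) samples divided by the number of tea samples, that PA corresponds to recall and UA to precision, and that a training set with a given IR is built by keeping every tea sample and randomly drawing other samples without replacement. The function names and the toy sample identifiers are hypothetical.

```python
import random


def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta=2 weights recall (PA) more heavily than precision (UA)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom


def subsample_to_ir(minority, majority, ir, seed=0):
    """Keep all minority samples and draw ir * len(minority) majority samples.

    Assumption: IR = (# majority samples) / (# minority samples).
    """
    n_major = int(round(ir * len(minority)))
    if n_major > len(majority):
        raise ValueError("not enough majority samples for the requested IR")
    rng = random.Random(seed)
    return list(minority), rng.sample(list(majority), n_major)


# Toy sample identifiers (hypothetical, not data from the study).
tea = [f"tea_{i}" for i in range(50)]
other = [f"other_{i}" for i in range(1000)]

tea_train, other_train = subsample_to_ir(tea, other, ir=6)

# PA = 97% and UA = 93% are the values reported in the abstract.
f2 = f_beta(precision=0.93, recall=0.97)
print(len(tea_train), len(other_train), round(f2, 2))  # 50 300 0.96
```

Under these assumptions the reported PA and UA give an F2-score of about 0.96, which agrees with the 96% stated in the abstract.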

Keywords