Applied Sciences (Jul 2020)

The Classification Performance and Mechanism of Machine Learning Algorithms in Winter Wheat Mapping Using Sentinel-2 10 m Resolution Imagery

  • Peng Fang,
  • Xiwang Zhang,
  • Panpan Wei,
  • Yuanzheng Wang,
  • Huiyi Zhang,
  • Feng Liu,
  • Jun Zhao

DOI
https://doi.org/10.3390/app10155075
Journal volume & issue
Vol. 10, no. 15
p. 5075

Abstract

Read online

Machine learning algorithms are crucial for crop identification and mapping. However, many works only focus on the identification results of these algorithms, but pay less attention to their classification performance and mechanism. In this paper, based on Google Earth Engine (GEE), Sentinel-2 10 m resolution images during a specific phenological period of winter wheat were obtained. Then, support vector machine (SVM), random forest (RF), and classification and regression tree (CART) machine learning algorithms were employed to identify and map winter wheat in a large-scale area. The hyperparameters of the three machine learning algorithms were tuned by grid search and the 5-fold cross-validation method. The classification performance of the three machine learning algorithms were compared, the results of which demonstrate that SVM achieves best performance in identifying winter wheat, and its overall accuracy (OA), user’s accuracy (UA), producer’s accuracy (PA), and kappa coefficient (Kappa) are 0.94, 0.95, 0.95, and 0.92, respectively. Moreover, 50 various combinations of training and validation sets were used to analyze the generalization ability of the algorithms, and the results show that the average OA of SVM, RF, and CART are 0.93, 0.92, and 0.88, respectively, thus indicating that SVM and RF are more robust than CART. To further explore the sensitivity of SVM, RF, and CART to variations of the algorithm parameters—namely, (C and gamma), (tree and split), and (maxD and minSP)—we employed the grid search method to iterate these parameters, respectively, and to analyze the effect of these parameters on the accuracy scores and classification residuals. It was found that with the change of (C and gamma) in (0.01~1000), SVM’s maximum variation of accuracy score is up to 0.63, and the maximum variation of residuals is 76,215 km2. We concluded that SVM is sensitive to the parameters (C and gamma) and presents a positive correlation. When the parameters (tree and split) change between (100~600) and (1~6), respectively, the RF’s maximum variation of accuracy score is 0.08, and the maximum variation of residuals is 1157 km2, indicating that RF is low in sensitivity toward the parameters (tree and split). When the parameters (maxD and minSP) are between (10~60), the maximum accuracy change value is 0.06, and the maximum variation of residuals is 6943 km2. Therefore, compared to RF, CART is sensitive to the parameters (maxD and minSP) and has poor robustness. In general, under the conditions of the hyperparameters, SVM and RF exhibit optimal classification performance, while CART has relatively inferior performance. Meanwhile, SVM, RF, and CART have different sensitivities toward the algorithm parameters; that is, SVM and CART are more sensitive to the algorithm parameters, while RF has low sensitivity toward changes in the algorithm parameters. The different parameters cause great changes in the accuracy scores and residuals, so it is necessary to determine the algorithm hyperparameters. Generally, default parameters can be used to achieve crop classification, but we recommend the enumeration method, similar to grid search, as a practical way to improve the classification performance of the algorithm if the best classification effect is expected.

Keywords