Remote Sensing (Jun 2024)
Identifying Plausible Labels from Noisy Training Data for a Land Use and Land Cover Classification Application in Amazônia Legal
Abstract
Most land use and land cover (LULC) classification studies in remote sensing rely on supervised classification, which requires a substantial amount of accurate label data. However, reliable labels are often not readily available and must be produced through time-consuming manual work. One potential solution is to use existing classification maps, which are not true ground truth and may contain noise from multiple sources. This also applies to the maps of the MapBiomas project, which provides yearly LULC maps that classify the Amazon basin into more than 24 classes based on Landsat data. In this study, we combine Sentinel-2 data, which have a higher spatial resolution, with the MapBiomas maps to evaluate a proposed noise removal method and to improve classification results. We introduce a novel noise detection method that identifies anchor points in feature space through clustering with self-organizing maps (SOM). Each pixel is then relabeled using nearest neighbor rules, or removed if its label is unknown. A challenge in this approach is quantifying the noise in such a real-world dataset. To address this, highly reliable validation sets were created manually for quantitative performance assessment. The results demonstrate a significant increase in overall accuracy, from 79.85% for the MapBiomas labels to 89.65% for the filtered labels. Additionally, we trained the L2HNet using both the MapBiomas labels and the filtered labels from our approach. The overall accuracy of this model reached 93.75% with the filtered labels, compared to the baseline of 74.31%. This highlights the significance of noise detection and filtering in remote sensing and emphasizes the need for further research in this area.
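As an illustration only, the anchor-based filtering idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation evaluated in the study: the function names `train_som` and `filter_labels`, the 1-D SOM layout, the purity threshold used to decide which SOM nodes count as anchors, and the distance threshold used to discard pixels are all hypothetical choices made for the example.

```python
import numpy as np


def train_som(X, n_nodes=4, epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Train a tiny 1-D SOM and return its node weights (n_nodes, n_features).

    Assumption: nodes are initialized on a line between the feature-wise
    minimum and maximum of X; learning rate and neighborhood width decay
    linearly over training.
    """
    rng = np.random.default_rng(seed)
    weights = np.linspace(X.min(axis=0), X.max(axis=0), n_nodes).astype(float)
    grid = np.arange(n_nodes)
    n_steps = epochs * len(X)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 1e-3)
            # Best-matching unit: the node closest to the sample.
            bmu = np.argmin(np.linalg.norm(weights - X[i], axis=1))
            # Gaussian neighborhood on the 1-D node grid.
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (X[i] - weights)
            step += 1
    return weights


def filter_labels(X, y, weights, purity=0.8, max_dist=1.0):
    """Relabel samples from their nearest anchor node; -1 marks removal.

    Assumption: a node is an anchor if at least `purity` of the samples
    mapped to it share one (noisy) label; that majority label becomes the
    anchor label. Samples farther than `max_dist` from every anchor are
    treated as unknown and removed.
    """
    dists = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    bmu = dists.argmin(axis=1)
    anchors, anchor_labels = [], []
    for k in range(len(weights)):
        members = y[bmu == k]
        if len(members) == 0:
            continue  # node attracted no samples
        values, counts = np.unique(members, return_counts=True)
        if counts.max() / len(members) >= purity:
            anchors.append(k)
            anchor_labels.append(values[counts.argmax()])
    if not anchors:
        return np.full(len(y), -1)
    anchor_dists = dists[:, anchors]
    new_y = np.array(anchor_labels)[anchor_dists.argmin(axis=1)]
    new_y[anchor_dists.min(axis=1) > max_dist] = -1
    return new_y
```

In such a sketch, `X` would hold per-pixel feature vectors (for example, spectral band values) and `y` the noisy map labels as integers; the returned array contains the relabeled classes, with `-1` for pixels that would be dropped from the training set.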
Keywords