The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (Aug 2020)

AUTOMATICALLY GENERATED TRAINING DATA FOR LAND COVER CLASSIFICATION WITH CNNS USING SENTINEL-2 IMAGES

  • M. Voelsen,
  • J. Bostelmann,
  • A. Maas,
  • F. Rottensteiner,
  • C. Heipke

DOI
https://doi.org/10.5194/isprs-archives-XLIII-B3-2020-767-2020
Journal volume & issue
Vol. XLIII-B3-2020
pp. 767 – 774

Abstract

Pixel-wise classification of remote sensing imagery is of great interest for tasks such as land cover classification and change detection. Acquiring large training data sets for these tasks is challenging, but necessary to obtain good results with deep learning algorithms such as convolutional neural networks (CNNs). In this paper we present a method for automatically generating a large amount of training data by combining satellite imagery with reference data from an available geospatial database. Because two different data sources are combined, the resulting training data contain a certain amount of incorrect labels. We evaluate the influence of this so-called label noise with respect to the time difference between the acquisition of the two data sources, the amount of training data and the class structure. We combine Sentinel-2 images with reference data from a geospatial database provided by the German Land Survey Office of Lower Saxony (LGLN). With different training sets we train a fully convolutional neural network (FCN) and classify four land cover classes (Building, Agriculture, Forest, Water). Our results show that the errors in the training samples do not have a large influence on the resulting classifiers. This is probably because the noise is randomly distributed, so that the neighbours of incorrect samples are predominantly correct. As expected, a larger amount of training data improves the results, especially for the less well represented classes. Other influences are different illumination conditions and seasonal effects during data acquisition. To better adapt the classifier to these conditions, they should also be included in the training data.
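The argument that randomly distributed label noise leaves the neighbours of incorrect samples predominantly correct can be illustrated with a small simulation. The following is a minimal sketch, not the authors' experiment: the toy reference map, the 10 % noise rate, the 4-neighbourhood and all function names are assumptions made purely for illustration. Only the four class names are taken from the abstract.

```python
import random

# Class names from the abstract; everything else below is a hypothetical toy setup.
CLASSES = ["Building", "Agriculture", "Forest", "Water"]


def make_reference(height, width):
    """Toy reference map: four rectangular quadrants, one per class index."""
    grid = []
    for r in range(height):
        row = []
        for c in range(width):
            idx = (2 if r >= height // 2 else 0) + (1 if c >= width // 2 else 0)
            row.append(idx)
        grid.append(row)
    return grid


def add_random_label_noise(labels, noise_rate, rng):
    """Independently replace each label by a different class with probability noise_rate."""
    noisy = [row[:] for row in labels]
    for r, row in enumerate(labels):
        for c, true_label in enumerate(row):
            if rng.random() < noise_rate:
                others = [k for k in range(len(CLASSES)) if k != true_label]
                noisy[r][c] = rng.choice(others)
    return noisy


def correct_neighbour_fraction(true_labels, noisy_labels):
    """Fraction of correctly labelled pixels among the 4-neighbours of mislabelled pixels."""
    height, width = len(true_labels), len(true_labels[0])
    correct = 0
    total = 0
    for r in range(height):
        for c in range(width):
            if noisy_labels[r][c] == true_labels[r][c]:
                continue  # only neighbourhoods of incorrect samples are of interest
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < height and 0 <= nc < width:
                    total += 1
                    if noisy_labels[nr][nc] == true_labels[nr][nc]:
                        correct += 1
    return correct / total if total else 1.0


rng = random.Random(0)  # fixed seed so the sketch is reproducible
reference = make_reference(100, 100)
noisy = add_random_label_noise(reference, noise_rate=0.1, rng=rng)
fraction = correct_neighbour_fraction(reference, noisy)
print(f"Correct neighbours around mislabelled pixels: {fraction:.1%}")
```

Under this independence assumption, the fraction of correct neighbours is close to one minus the noise rate, so a network with a spatial receptive field sees mostly correct context around each wrong label. Label errors caused by real changes between image and database acquisition are likely to be spatially clustered, so this toy setting is an idealisation.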