ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences (Jun 2021)

WHICH 3D DATA REPRESENTATION DOES THE CROWD LIKE BEST? CROWD-BASED ACTIVE LEARNING FOR COUPLED SEMANTIC SEGMENTATION OF POINT CLOUDS AND TEXTURED MESHES

  • M. Kölle
  • D. Laupheimer
  • V. Walter
  • N. Haala
  • U. Soergel

DOI
https://doi.org/10.5194/isprs-annals-V-2-2021-93-2021
Journal volume & issue
Vol. V-2-2021
pp. 93–100

Abstract


Semantic interpretation of multi-modal datasets is of great importance in many domains of geospatial data analysis. However, training models for automated semantic segmentation requires labeled training data, and in the case of multi-modality, for each representation of the scene. To completely avoid the time-consuming and cost-intensive involvement of an expert in the annotation procedure, we propose an Active Learning (AL) pipeline in which a Random Forest classifier selects a subset of points sufficient for training and the necessary labels are obtained from the crowd. In this AL loop, we aim at coupled semantic segmentation of an Airborne Laser Scanning (ALS) point cloud and the corresponding 3D textured mesh, generated from LiDAR data and imagery in a hybrid manner. Within this work we pursue two main objectives: i) we evaluate the performance of the AL pipeline applied to an ultra-high-resolution ALS point cloud and a derived textured mesh (both benchmark datasets are available at https://ifpwww.ifp.uni-stuttgart.de/benchmark/hessigheim/default.aspx); ii) we investigate the capabilities of the crowd in interpreting 3D geodata and observe that the crowd performs about 3 percentage points better when labeling meshes compared to point clouds. We additionally demonstrate that labels received solely from the crowd can power a machine learning system whose Overall Accuracy differs by less than 2 percentage points for the point cloud and less than 3 percentage points for the mesh, compared to using the completely labeled training pool. To derive this sparse training set, we ask the crowd to label 0.25 % of the available training points, resulting in costs of $190.
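The core of such a pipeline is a pool-based Active Learning loop: train on a small labeled seed, query the most informative unlabeled points, have annotators (here, the crowd) label them, and retrain. The sketch below is a minimal illustration of this idea with a scikit-learn Random Forest and entropy-based uncertainty sampling; it is not the authors' implementation. The synthetic features, the batch size of 20, and the perfect "crowd oracle" drawn from ground truth are all simplifying assumptions made so the example is self-contained and runnable.

```python
# Minimal sketch of a pool-based Active Learning loop with a Random Forest.
# All specifics (synthetic features, batch size, entropy criterion, perfect
# oracle) are illustrative assumptions, not the paper's actual setup.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Stand-in for per-point features of an ALS point cloud / textured mesh.
X_pool, y_pool = make_classification(n_samples=5000, n_features=16,
                                     n_informative=10, n_classes=4,
                                     random_state=42)

# Seed the loop with a small labeled set; the rest is the unlabeled pool.
labeled = rng.choice(len(X_pool), size=40, replace=False)
unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)

clf = RandomForestClassifier(n_estimators=100, random_state=42)

for iteration in range(10):
    clf.fit(X_pool[labeled], y_pool[labeled])

    # Uncertainty sampling: query the points whose predicted class posterior
    # has the highest entropy (one common AL criterion among several).
    proba = clf.predict_proba(X_pool[unlabeled])
    entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)
    query = unlabeled[np.argsort(entropy)[-20:]]  # batch of 20 queries

    # In the paper these labels come from crowdworkers; here the ground
    # truth acts as a perfect "crowd oracle" (simplifying assumption).
    labeled = np.concatenate([labeled, query])
    unlabeled = np.setdiff1d(unlabeled, query)

print(f"Final training set: {len(labeled)} points "
      f"({100 * len(labeled) / len(X_pool):.2f} % of the pool)")
```

In the actual pipeline, each query batch would be rendered as a point cloud or mesh snippet for crowdworkers to label, and quality-control mechanisms (e.g., aggregating multiple answers per point) would replace the perfect oracle used above.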