Land (Oct 2022)

A Joint Bayesian Optimization for the Classification of Fine Spatial Resolution Remotely Sensed Imagery Using Object-Based Convolutional Neural Networks

  • Omer Saud Azeez,
  • Helmi Z. M. Shafri,
  • Aidi Hizami Alias,
  • Nuzul Azam Haron

DOI
https://doi.org/10.3390/land11111905
Journal volume & issue
Vol. 11, no. 11
p. 1905

Abstract

Read online

In recent years, deep learning-based image classification has become widespread, especially in remote sensing applications, due to its automatic and strong feature extraction capability. However, as deep learning methods operate on rectangular-shaped image patches, they cannot accurately extract objects’ boundaries, especially in complex urban settings. As a result, combining deep learning and object-based image analysis (OBIA) has become a new avenue in remote sensing studies. This paper presents a novel approach for combining convolutional neural networks (CNN) with OBIA based on joint optimization of segmentation parameters and deep feature extraction. A Bayesian technique was used to find the best parameters for the multiresolution segmentation (MRS) algorithm while the CNN model learns the image features at different layers, achieving joint optimization. The proposed classification model achieved the best accuracy, with 0.96 OA, 0.95 Kappa, and 0.96 mIoU in the training area and 0.97 OA, 0.96 Kappa, and 0.97 mIoU in the test area, outperforming several benchmark methods including Patch CNN, Center OCNN, Random OCNN, and Decision Fusion. The analysis of CNN variants within the proposed classification workflow showed that the HybridSN model achieved the best results compared to 2D and 3D CNNs. The 3D CNN layers and combining 3D and 2D CNN layers (HybridSN) yielded slightly better accuracies than the 2D CNN layers regarding geometric fidelity, object boundary extraction, and separation of adjacent objects. The Bayesian optimization could find comparable optimal MRS parameters for the training and test areas, with excellent quality measured by AFI (0.046, −0.037) and QR (0.945, 0.932). In the proposed model, higher accuracies could be obtained with larger patch sizes (e.g., 9 × 9 compared to 3 × 3). Moreover, the proposed model is computationally efficient, with the longest training being fewer than 25 s considering all the subprocesses and a single training epoch. As a result, the proposed model can be used for urban and environmental applications that rely on VHR satellite images and require information about land use.

Keywords