IEEE Access (Jan 2024)

On Enhancing Crack Semantic Segmentation Using StyleGAN and Brownian Bridge Diffusion

  • Collins C. Rakowski,
  • Thirimachos Bourlai

DOI
https://doi.org/10.1109/ACCESS.2024.3368376
Journal volume & issue
Vol. 12
pp. 34769 – 34784

Abstract

Inspection for cracks is an essential yet labor-intensive aspect of maintenance for structures in active service, such as bridges. Deep learning networks, combined with an abundance of segmented image data representing various types of cracks, enable the development of a computer vision-based solution. However, segmentation data is often scarce and takes a great deal of time to annotate. This paper introduces a novel approach to structural crack detection using synthetic data generation and advanced semantic segmentation models. We employ StyleGAN3 and the Brownian Bridge Diffusion Model (BBDM) to create a diverse and realistic dataset of synthetic structural crack images, addressing the critical challenge of obtaining segmentation data for training deep learning models for crack detection. Our methodology builds on DeepLabV3+, a semantic segmentation architecture that extends DeepLabv3 with a simple yet effective decoder module to enhance segmentation results. Because the original DeepLabV3+ model is insufficient, we first perform careful hyperparameter tuning, which accounts for about a 10% increase in overall performance. Next, we generate multiple synthetic datasets through BBDM image-to-image translation and pair them with a carefully selected set of data augmentation techniques, including motion, zoom, and defocus blur, to improve crack segmentation performance. Compared with the latest state-of-the-art work on the same database, which achieved an accuracy of 61.49%, our proposed work attains a Mean Intersection over Union (MeanIoU) of 65.62% through ensemble modeling on multiple synthesized datasets, employing a majority voting strategy. We also showcase the potential of diffusion models to generate synthetic datasets that improve semantic segmentation accuracy, and we introduce blur augmentation as a viable technique for enhancing model robustness. The results indicate that our approach not only surpasses conventional methods in terms of MeanIoU but also opens a new avenue of research into diffusion-model-based synthetic image generation for improved semantic segmentation performance.
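
The abstract does not give implementation details for the ensemble step or the metric, so the following is only a minimal sketch of how pixel-wise majority voting over several models' binary crack masks and a two-class MeanIoU could be computed. The function names `majority_vote` and `mean_iou`, the binary (crack vs. background) setup, and the choice to skip classes absent from both masks are assumptions made for illustration, not the authors' code.

```python
import numpy as np


def majority_vote(masks):
    """Fuse binary masks of identical shape by pixel-wise majority vote.

    Hypothetical helper: `masks` is a list of 0/1 arrays, one per model.
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    votes = stack.sum(axis=0)
    # A pixel is labeled "crack" only when strictly more than half
    # of the models predict crack at that location.
    return votes * 2 > stack.shape[0]


def mean_iou(pred, target, num_classes=2):
    """Average the per-class intersection-over-union of two label maps.

    Assumption: classes that appear in neither map are skipped
    rather than counted as IoU = 0 or 1.
    """
    pred = np.asarray(pred).astype(int)
    target = np.asarray(target).astype(int)
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```

With an odd number of models the strict-majority rule never produces ties; with an even number, the sketch resolves ties toward background, which is one possible design choice among several.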

Keywords