IEEE Access (Jan 2023)

Sim-to-Real Transfer for Object Detection in Aerial Inspections of Transmission Towers

  • Augusto J. Peterlevitz,
  • Mateus A. Chinelatto,
  • Angelo G. Menezes,
  • Cezanne A. M. Motta,
  • Guilherme A. B. Pereira,
  • Gustavo L. Lopes,
  • Gustavo De M. Souza,
  • Juan Rodrigues,
  • Lilian C. Godoy,
  • Mario A. F. F. Koller,
  • Mateus O. Cabral,
  • Nicole E. Alves,
  • Paulo H. Silva,
  • Ricardo Cherobin,
  • Roberto A. O. Yamamoto,
  • Ricardo D. Da Silva

DOI
https://doi.org/10.1109/ACCESS.2023.3322374
Journal volume & issue
Vol. 11
pp. 110312 – 110327

Abstract

Training deep learning models for object detection usually requires a large amount of data, a condition that is rarely met in most real-world applications, especially in the context of aerial imagery. One possible solution is the use of simulators to generate synthetic data. For good generalization, the model must be able to learn from the simulated data and perform correctly on the real data, a process known as sim-to-real transfer. In this work, we analyze the generation of synthetic data to account for a data-scarce real-world scenario, which includes aerial imagery and object detection of transmission towers and their components. We evaluate the impact of image-to-image translation methods as domain adaptation techniques and explore training strategies to mitigate the domain shift between synthetic and real data. According to our experimental results, the use of domain-adapted data through image-to-image translation could slightly improve the detection performance on real test data when compared to training with raw synthetic images only or with small datasets of real data. However, a visual analysis showed that objects with small bounding boxes, such as the clamp, anchoring clamp, and ball link, could be distorted or removed entirely by the image-to-image translation methods. Additionally, when only a small subset of real data is available, training with both real and synthetic data at once led to better detection results, surpassing combinations of pre-training on synthetic data and fine-tuning on real data.

Keywords