The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (May 2022)
UNSUPERVISED HARMONIOUS IMAGE COMPOSITION FOR DISASTER VICTIM DETECTION
Abstract
Deep detection networks trained on large amounts of annotated data achieve high accuracy in detecting objects such as pedestrians, cars, and lanes, and these models have been deployed in many scenarios. A disaster victim detector is very useful when searching for victims who are partially buried by debris from an earthquake or building collapse. However, because large quantities of real images of buried victims are difficult to obtain for training, a deep detector cannot reach its full potential on this task. In this paper, we generate realistic images for training a victim detector. We first randomly cut out human body parts from an open-source human data set and paste them into background images of ruins. Then, we propose an unsupervised generative adversarial network (GAN) that harmonizes the body parts to fit the style (illumination, texture, and color characteristics) of the background. The generated images are finally used to fine-tune the YOLOv5 detection network. We evaluate the average precision (AP) at an intersection over union (IoU) threshold of 0.5 and averaged over IoU thresholds in [0.5:0.05:0.95], denoted AP@0.5 and AP@[.5:.95], respectively. In our best experimental results, YOLOv5l pre-trained on the COCO data set performs poorly at detecting victims, with an AP@[.5:.95] of only 19.5%. Fine-tuning on our composite images enables the model to detect victims effectively and increases the AP@[.5:.95] to 33.6%; the AP@0.5 increases from 32.4% to 53.4%. Our unsupervised harmonization method further improves the results by 2.1% and 6.1%, respectively.
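To make the cut-and-paste step and the evaluation thresholds concrete, the following is a minimal NumPy sketch, not the implementation used in this work. The function names `paste_patch` and `iou`, the boolean-mask representation of a body part, and the (x1, y1, x2, y2) box convention are all assumptions made for illustration. The GAN harmonization step is not shown, and the patch is assumed to fit entirely inside the background.

```python
import numpy as np


def paste_patch(background, patch, mask, top, left):
    """Paste the masked pixels of `patch` into a copy of `background`.

    background: (H, W, 3) array, e.g. a ruins image.
    patch:      (h, w, 3) array containing a cut-out body part.
    mask:       (h, w) boolean array, True where the body part is.
    top, left:  paste position (hypothetical parameters; the patch is
                assumed to lie fully inside the background).
    Returns the composite and the tight box (x1, y1, x2, y2) of the
    pasted pixels, which could serve as a detection label.
    """
    h, w = mask.shape
    composite = background.copy()
    region = composite[top:top + h, left:left + w]  # view into composite
    region[mask] = patch[mask]
    ys, xs = np.nonzero(mask)
    box = (int(left + xs.min()), int(top + ys.min()),
           int(left + xs.max()) + 1, int(top + ys.max()) + 1)
    return composite, box


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds 0.50, 0.55, ..., 0.95 behind AP@[.5:.95].
IOU_THRESHOLDS = np.linspace(0.5, 0.95, 10)
```

In a data-generation loop, `top` and `left` would typically be drawn at random for each composite, and a predicted box would count as a true positive at a given threshold `t` when `iou(pred, label) >= t`.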