ON THE ACCURACY OF YOLOV8-CNN REGARDING DETECTION OF HUMANS IN NADIR AERIAL IMAGES FOR SEARCH AND RESCUE APPLICATIONS

J. Berndt; H. Meißner; T. Kraft

doi:10.5194/isprs-archives-XLVIII-1-W2-2023-139-2023

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (Dec 2023)

Affiliations

J. Berndt: Institute of Optical Sensor Systems, German Aerospace Center, 12489 Berlin, Germany
H. Meißner: Institute of Optical Sensor Systems, German Aerospace Center, 12489 Berlin, Germany
T. Kraft: Institute of Optical Sensor Systems, German Aerospace Center, 12489 Berlin, Germany

DOI: https://doi.org/10.5194/isprs-archives-XLVIII-1-W2-2023-139-2023
Journal volume & issue: Vol. XLVIII-1-W2-2023
pp. 139 – 146

Abstract

The use of deep learning techniques, especially in conjunction with convolutional neural networks (CNN), has attracted major attention in the remote sensing community. The main use cases are object detection, image classification and image segmentation. This paper focuses on object detection, specifically the detection of humans. In search and rescue applications it is common to map larger areas with downward-facing cameras. However, many training data sets for CNNs consist of oblique images, which differ strongly from the nadir aerial images used for real-time maps. To circumvent this issue, a unique data set was created. It contains only nadir images at different ground sample distances (GSD), varying from one to five centimetres. Diversity of the training data is ensured through various flights with an unmanned aerial vehicle (UAV) at different locations. The GSD dependency is valuable prior knowledge, as it adds to the difficulty of detecting humans in aerial images. An image depicting a human at one centimetre GSD contains much more information than an image of the same human at three centimetres GSD. That is one reason why networks trained on a variety of ground sample distances may struggle to detect humans reliably at a specific GSD. The data set consists of four subsets (divided by GSD). Each subset contains 1000 manually annotated humans, augmented by rotation and colour shift, resulting in 12000 training samples used to train the newly released YoloV8 CNN. The entire training and test process is unified to ensure comparable input conditions.
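The effect of GSD on the information available per person can be illustrated with simple arithmetic. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and the 0.5 m extent of a person seen from directly above is an assumed, illustrative value.

```python
def pixels_across(object_size_m: float, gsd_cm: float) -> float:
    """Number of pixels spanned by an object of the given size at the given GSD."""
    if gsd_cm <= 0:
        raise ValueError("GSD must be positive")
    # GSD is the ground distance covered by one pixel, here in centimetres.
    return object_size_m * 100.0 / gsd_cm


def pixel_area_ratio(gsd_a_cm: float, gsd_b_cm: float) -> float:
    """How many times more pixels cover the same ground area at GSD a than at GSD b."""
    return (gsd_b_cm / gsd_a_cm) ** 2


# Assumption for illustration only: approximate extent of a person in a nadir view.
ASSUMED_PERSON_EXTENT_M = 0.5

for gsd_cm in (1, 2, 3, 5):
    width_px = pixels_across(ASSUMED_PERSON_EXTENT_M, gsd_cm)
    print(f"GSD {gsd_cm} cm: person spans about {width_px:.1f} px")

print(f"Pixel count ratio, 1 cm vs 3 cm GSD: {pixel_area_ratio(1, 3):.0f}x")
```

Under these assumptions, the same person is covered by roughly nine times as many pixels at one centimetre GSD as at three centimetres, which is consistent with the statement that the finer image contains much more information.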

Published in The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences

ISSN: 1682-1750 (Print); 2194-9034 (Online)
Publisher: Copernicus Publications
Country of publisher: Germany
LCC subjects: Technology: Engineering (General). Civil engineering (General): Applied optics. Photonics
Website: http://www.isprs.org/publications/archives.aspx
