Improved Architecture and Training Strategies of YOLOv7 for Remote Sensing Image Object Detection

Dewei Zhao; Faming Shao; Qiang Liu; Heng Zhang; Zihan Zhang; Li Yang

doi:10.3390/rs16173321

Remote Sensing (Sep 2024)

Affiliations

Dewei Zhao: College of Field Engineering, Army Engineering University of PLA, Nanjing 210007, China
Faming Shao: College of Field Engineering, Army Engineering University of PLA, Nanjing 210007, China
Qiang Liu: College of Field Engineering, Army Engineering University of PLA, Nanjing 210007, China
Heng Zhang: College of Field Engineering, Army Engineering University of PLA, Nanjing 210007, China
Zihan Zhang: College of Field Engineering, Army Engineering University of PLA, Nanjing 210007, China
Li Yang: College of Field Engineering, Army Engineering University of PLA, Nanjing 210007, China

DOI: https://doi.org/10.3390/rs16173321
Journal volume & issue: Vol. 16, no. 17, p. 3321

Abstract

Object detection in remote sensing images has extensive applications in industry and everyday life, and improving detection accuracy is a pressing need. To this end, after analyzing the requirements and difficulties of detecting objects in remote sensing images, this paper proposes a range of improvements to the widely used YOLOv7 algorithm. Specifically, we strategically remove some standard convolution and pooling modules from the bottom of the network and adopt stride-free convolution to minimize the loss of small-object information as features propagate through the network. We also introduce a new, more efficient attention module for feature extraction, which substantially enhances the network's semantic extraction capability. Furthermore, by adding multiple cross-layer connections, we make more effective use of the feature information from each layer of the backbone, thereby strengthening the network's overall feature extraction capability. During training, we introduce an auxiliary network to intensify the training of the underlying network, and we adopt a new activation function and a more efficient loss function to provide more effective gradient feedback, thereby improving network performance. In experiments, our improved network achieves mAP scores of 91.2% and 80.8% on the DIOR and DOTA v1.0 remote sensing datasets, respectively. These are improvements of 4.5% and 7.0% over the original YOLOv7 network, with especially notable gains in detecting small objects.
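The abstract does not name the exact layer used in place of strided convolution and pooling. The sketch below therefore assumes that "stride-free convolution" behaves like space-to-depth downsampling (as in SPD-Conv), where pixels are rearranged into channels before a non-strided convolution. It contrasts this with the simplest strided case, a stride-2 sampling grid (equivalent to a 1x1 kernel with stride 2). The function names `strided_subsample` and `space_to_depth` are hypothetical and are not taken from the paper.

```python
from typing import List

Grid = List[List[int]]


def strided_subsample(x: Grid, stride: int = 2) -> Grid:
    """Keep only every `stride`-th pixel per axis, like a 1x1 kernel with stride 2.

    Pixels that fall between sampling positions are discarded outright.
    """
    return [row[::stride] for row in x[::stride]]


def space_to_depth(x: Grid, block: int = 2) -> List[Grid]:
    """Split each block x block patch across block*block channels.

    Spatial size shrinks by `block` per axis, but every input pixel
    survives in some channel. (Assumed stand-in for the paper's
    stride-free downsampling; a non-strided conv would follow.)
    """
    h, w = len(x), len(x[0])
    if h % block or w % block:
        raise ValueError("height and width must be divisible by block")
    channels = []
    for dy in range(block):
        for dx in range(block):
            # Channel (dy, dx) collects pixels at offsets (dy, dx) within each patch.
            channels.append([row[dx::block] for row in x[dy::block]])
    return channels


if __name__ == "__main__":
    # A 4x4 image containing a one-pixel "small object" at row 1, column 1.
    img = [[0] * 4 for _ in range(4)]
    img[1][1] = 9

    print("strided:", strided_subsample(img))  # the object is not sampled
    for i, ch in enumerate(space_to_depth(img)):
        print(f"channel {i}:", ch)  # the object appears in channel 3
```

Under this assumption, both operations halve the spatial resolution, but only space-to-depth keeps all 16 input values (4 channels of 2x2), so a subsequent non-strided convolution can still see the small object.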

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/