Complex & Intelligent Systems (Dec 2024)
LDWLE: self-supervised driven low-light object detection framework
Abstract
Low-light object detection involves identifying and locating objects in images captured under poor lighting conditions. It plays a significant role in surveillance and security, nighttime pedestrian recognition, and autonomous driving, and has broad application prospects. Most existing object detection algorithms and datasets are designed for normal lighting conditions, leading to a significant drop in detection performance in low-light environments. To address this issue, we propose a Low-Light Detection with Low-Light Enhancement (LDWLE) framework. LDWLE is an encoder-decoder architecture: the encoder transforms the raw input into a compact, abstract representation (encoding), and the decoder gradually generates the target output from the representation produced by the encoder. Specifically, during training, low-light images are fed into the encoder, which produces feature representations that are decoded by two separate decoders: an object detection decoder and a low-light image enhancement decoder. The two decoders share the same encoder and are trained jointly, optimizing each other throughout training and guiding the low-light image enhancement toward improvements that benefit object detection. If the input image is normally lit, it is first transformed into a low-light image by a low-light image conversion module before being fed into the encoder; if it is already a low-light image, it is fed into the encoder directly. During the testing phase, the model is evaluated in the same way as a standard object detection algorithm. Compared to existing object detection algorithms, LDWLE can train a low-light-robust object detection model using standard, normally lit object detection datasets. Additionally, LDWLE is a versatile training framework that can be applied to most one-stage object detection algorithms.
These algorithms typically consist of three components: the backbone, neck, and head. In this framework, the backbone serves as the encoder, while the neck and head form the object detection decoder. Extensive experiments on the COCO, VOC, and ExDark datasets demonstrate the effectiveness of LDWLE for low-light object detection. Quantitatively, it achieves APs of 25.5 and 38.4 on the synthetic datasets COCO-d and VOC-d, respectively, and the best AP of 30.5 on the real-world dataset ExDark. Qualitatively, LDWLE accurately detects most objects on both public and self-collected real-world low-light datasets, demonstrating strong adaptability to varying lighting conditions and multi-scale objects.
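The training scheme in the abstract can be summarized as a minimal sketch. All function names, the toy gamma-style darkening, the toy loss functions, and the loss weighting below are illustrative assumptions standing in for the paper's actual modules; only the control flow (darken normally lit inputs, share one encoder, sum the two decoders' losses) follows the description above.

```python
# Toy sketch of the LDWLE joint-training scheme. Every name and formula here
# is an illustrative placeholder, not the paper's implementation; an image is
# modeled as a flat list of pixel intensities in [0, 1].

def darken(image):
    """Low-light conversion module (assumed): toy gamma curve to simulate
    a low-light version of a normally lit image."""
    return [pixel ** 2.2 for pixel in image]

def encode(image):
    """Shared encoder (the detector's backbone): image -> toy feature vector
    (here just the mean and max intensity)."""
    return [sum(image) / len(image), max(image)]

def detect(features):
    """Object detection decoder (neck + head): returns a toy detection loss."""
    return abs(features[0] - 0.5)

def enhance(features, target):
    """Low-light enhancement decoder: toy reconstruction loss against the
    normally lit target image."""
    recon = [features[0]] * len(target)
    return sum((r - t) ** 2 for r, t in zip(recon, target)) / len(target)

def training_step(image, is_low_light, enh_weight=0.5):
    """One joint step: both decoders consume the same encoder features, so
    the enhancement objective steers the encoder toward representations
    that also benefit detection. enh_weight is an assumed hyperparameter."""
    low = image if is_low_light else darken(image)   # convert only if needed
    feats = encode(low)                              # shared encoder
    return detect(feats) + enh_weight * enhance(feats, image)

# Normally lit input is darkened before encoding; a low-light input skips that.
print(training_step([0.2, 0.8, 0.5], is_low_light=False))
```

At test time only the encoder and the detection decoder are kept, which is why the trained model can be evaluated exactly like a standard one-stage detector.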
Keywords