Alexandria Engineering Journal (Apr 2025)
Object detection in real-time video surveillance using attention based transformer-YOLOv8 model
Abstract
Object detection plays a crucial role in applications such as surveillance, autonomous driving, and industrial automation, where objects must be identified accurately and promptly. This research proposes a novel framework that combines the YOLOv8 backbone network with an attention mechanism and a Transformer-based detection head to improve object detection in real-time images and video. The attention mechanism refines feature extraction from complex scenes, enabling the model to focus on relevant regions within each image. By integrating a Transformer architecture, the model captures long-range dependencies and global context, which leads to more accurate bounding box predictions. The proposed system processes real-time data effectively, achieving a precision of 96.78 % and a recall of 96.89 %. The mean average precision (mAP) reaches 89.67 %, indicating the framework's robustness across various practical scenarios. The framework is designed to address common object detection challenges, such as detecting multiple objects in crowded environments and under varying lighting conditions. The proposed model is implemented in Python. The results section compares the Attention Transformer-YOLOv8 model against established algorithms, including Faster R-CNN, YOLOv3, YOLOv5n, and SSD, using standard evaluation metrics.
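Precision and recall in object detection are usually computed by matching predicted boxes to ground-truth boxes through intersection over union (IoU). The abstract does not state the paper's exact evaluation protocol. The sketch below is therefore a minimal illustration under several assumptions: detections belong to a single class, boxes use the (x1, y1, x2, y2) corner format, the IoU threshold is 0.5, and matching is greedy, taking the highest score first. The helper names `iou` and `precision_recall` are hypothetical.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Tuple[Box, float]], gts: List[Box], iou_thr: float = 0.5
) -> Tuple[float, float]:
    """Single-class precision/recall with greedy one-to-one matching.

    Predictions are visited in descending score order. Each one is matched
    to the unmatched ground-truth box with the highest IoU >= iou_thr.
    Matched predictions count as true positives. Unmatched predictions are
    false positives, and unmatched ground truths are false negatives.
    """
    matched = set()
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            v = iou(box, gt)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp
    fn = len(gts) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [((1, 1, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)]
    # The first prediction matches the first ground truth (IoU = 0.81).
    # The second overlaps nothing, so TP = 1, FP = 1, FN = 1.
    print(precision_recall(preds, gts))  # (0.5, 0.5)
```

mAP goes further than this sketch. It derives average precision from the full precision-recall curve for each class, then averages over classes; in COCO-style evaluation it also averages over several IoU thresholds. The sketch above computes only a single precision/recall operating point.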