Object detection in real-time video surveillance using attention based transformer-YOLOv8 model

Divya Nimma; Omaia Al-Omari; Rahul Pradhan; Zoirov Ulmas; R.V.V. Krishna; Ts. Yousef A.Baker El-Ebiary; Vuda Sreenivasa Rao

Alexandria Engineering Journal (Apr 2025)

Affiliations

Divya Nimma: Computational Science, University of Southern Mississippi, UMMC, USA; Corresponding author.
Omaia Al-Omari: Information Systems Department, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia
Rahul Pradhan: Department of Computer Engineering & Applications, GLA University, Mathura, India
Zoirov Ulmas: Artificial Intelligence Department at the Tashkent State University of Economics, Uzbekistan
R.V.V. Krishna: ECE Department, Aditya University, Surampalem, AP, India
Ts. Yousef A.Baker El-Ebiary: Faculty of Informatics and Computing, UniSZA University, Malaysia
Vuda Sreenivasa Rao: Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, AP 522302, India

Journal volume & issue: Vol. 118
pp. 482 – 495

Abstract

Object detection plays a crucial role in applications such as surveillance, autonomous driving, and industrial automation, where accurate and timely identification of objects is essential. This research proposes a novel framework that combines the YOLOv8 backbone network with an attention mechanism and a Transformer-based detection head to improve object detection performance on real-time images and video. The attention mechanism refines feature extraction from complex scenes, enabling the model to focus on relevant regions within images. By integrating the Transformer architecture, the model leverages long-range dependencies and global context, leading to more accurate bounding box predictions. The proposed system processes real-time data effectively, reaching a precision of 96.78% and a recall of 96.89%. The mean average precision (mAP) is 89.67%, indicating the framework's robustness across various practical scenarios. The framework is designed to address challenges in object detection such as detecting multiple objects in crowded environments and handling varying lighting conditions. The proposed model is implemented in Python. The results section compares the Attention Transformer-YOLOv8 model against established algorithms such as Faster R-CNN, YOLOv3, YOLOv5n, and SSD using evaluation metrics.
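The abstract reports precision and recall but does not state how detections are matched to ground truth. As a rough illustration only, the sketch below computes both figures for a single image and a single class. It assumes an intersection-over-union (IoU) threshold of 0.5 and greedy matching in order of confidence. The function names, box format, threshold, and sample boxes are all hypothetical and are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Return (precision, recall) for one image and one class.

    predictions: list of (confidence, box); ground_truth: list of boxes.
    The 0.5 threshold and the greedy matching are assumptions for this sketch.
    """
    matched = set()
    true_pos = 0
    # Visit predictions from most to least confident.
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth box can be matched only once
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_pos += 1
    false_pos = len(predictions) - true_pos
    false_neg = len(ground_truth) - true_pos
    precision = true_pos / (true_pos + false_pos) if predictions else 0.0
    recall = true_pos / (true_pos + false_neg) if ground_truth else 0.0
    return precision, recall


# Hypothetical example: two ground-truth objects and three detections.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [
    (0.9, (1, 1, 10, 10)),     # overlaps the first object
    (0.8, (50, 50, 60, 60)),   # overlaps nothing: a false positive
    (0.7, (20, 20, 30, 31)),   # overlaps the second object
]
precision, recall = precision_recall(preds, gt)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Mean average precision additionally averages precision over recall levels and over classes, which this sketch leaves out.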

Published in Alexandria Engineering Journal

ISSN: 1110-0168 (Print); 2090-2670 (Online)
Publisher: Elsevier
Country of publisher: Egypt
LCC subjects: Technology: Engineering (General). Civil engineering (General)
Website: http://www.journals.elsevier.com/alexandria-engineering-journal/
