A Full Featured Configurable Accelerator for Object Detection With YOLO

Daniel Pestana; Pedro R. Miranda; Joao D. Lopes; Rui P. Duarte; Mario P. Vestias; Horacio C. Neto; Jose T. De Sousa

doi:10.1109/ACCESS.2021.3081818

IEEE Access (Jan 2021)

Affiliations

Daniel Pestana: INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
Pedro R. Miranda: INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
Joao D. Lopes: INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
Rui P. Duarte: INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
Mario P. Vestias: INESC-ID, Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, Lisboa, Portugal
Horacio C. Neto: INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
Jose T. De Sousa: INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal

DOI: https://doi.org/10.1109/ACCESS.2021.3081818
Journal volume & issue: Vol. 9
pp. 75864 – 75877

Abstract

Object detection and classification is an essential task in computer vision. A very efficient algorithm for detection and classification is YOLO (You Only Look Once). We consider hardware architectures to run YOLO in real time on embedded platforms. Designing a new dedicated accelerator for each new version of YOLO is not feasible, given how quickly new versions are released. The primary goal of this work is to design a configurable and scalable core for creating specific object detection and classification systems based on YOLO, targeting embedded platforms. The core accelerates the execution of all the algorithm steps, including pre-processing, model inference and post-processing. It uses a fixed-point format, linearised activation functions, batch-normalisation folding, and a hardware structure that exploits most of the available parallelism in CNN processing. The proposed core is configured for real-time execution of YOLOv3-Tiny and YOLOv4-Tiny, integrated into a RISC-V-based system-on-chip architecture and prototyped in an UltraScale XCKU040 FPGA (Field Programmable Gate Array). The solution achieves 32 and 31 frames per second for YOLOv3-Tiny and YOLOv4-Tiny, respectively, with a 16-bit fixed-point format. Compared to previous proposals, it improves the frame rate at a higher performance efficiency. The performance, area efficiency and configurability of the proposed core enable the fast development of real-time YOLO-based object detectors on embedded systems.
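Two of the techniques named in the abstract, batch-normalisation folding and a 16-bit fixed-point format, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the list-based weight layout, the `eps` value and the choice of 8 fractional bits are all assumptions made for illustration, and the abstract does not state how the 16 bits are split.

```python
import math


def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and bias.

    weights: one flat list of weights per output channel (assumed layout).
    The other arguments hold one value per output channel.
    """
    folded_w, folded_b = [], []
    for w_ch, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        # BN(y) = g * (y - m) / sqrt(v + eps) + bt, with y = sum(w * x) + b
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in w_ch])
        folded_b.append((b - m) * scale + bt)
    return folded_w, folded_b


def to_fixed_point(x, frac_bits=8, total_bits=16):
    """Quantise a float to a signed fixed-point integer, saturating on overflow.

    frac_bits=8 is an illustrative assumption, not a value from the paper.
    Python's round() rounds exact halves to the nearest even integer.
    """
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = int(round(x * (1 << frac_bits)))
    return max(lo, min(hi, q))


def from_fixed_point(q, frac_bits=8):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)
```

Folding removes the separate normalisation step at inference time, since a convolution with the folded weights and bias gives the same result as the original convolution followed by batch normalisation. The folded values can then be quantised with `to_fixed_point` before being loaded into a fixed-point datapath.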

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
