FPGA Implementation of a Fault-Tolerant Fused and Branched CNN Accelerator With Reconfigurable Capabilities

Rizwan Tariq Syed; Yanhua Zhao; Junchao Chen; Marko Andjelkovic; Markus Ulbricht; Milos Krstic

doi:10.1109/ACCESS.2024.3392240

IEEE Access (Jan 2024)

Affiliations

Rizwan Tariq Syed: IHP—Leibniz-Institut für Innovative Mikroelektronik, Frankfurt (Oder), Germany
Yanhua Zhao: IHP—Leibniz-Institut für Innovative Mikroelektronik, Frankfurt (Oder), Germany
Junchao Chen: IHP—Leibniz-Institut für Innovative Mikroelektronik, Frankfurt (Oder), Germany
Marko Andjelkovic: IHP—Leibniz-Institut für Innovative Mikroelektronik, Frankfurt (Oder), Germany
Markus Ulbricht: IHP—Leibniz-Institut für Innovative Mikroelektronik, Frankfurt (Oder), Germany
Milos Krstic: IHP—Leibniz-Institut für Innovative Mikroelektronik, Frankfurt (Oder), Germany

Journal volume & issue: Vol. 12, pp. 57847–57862

Abstract

The ImageNet moment was a turning point for Convolutional Neural Networks (CNNs), as it demonstrated their potential to revolutionize computer vision tasks. This triumph of CNNs has motivated solving even more complex problems involving multiple tasks from multiple data modalities. Conventionally, a single CNN accelerator has been optimized to perform just one task or multiple correlated tasks. This study presents a shared-layers approach that leverages the pattern-learning capabilities of CNNs to perform multiple uncorrelated tasks from different modalities using a single hardware accelerator. We overcame the challenge of data imbalance in multi-modal learning by synthetic data generation. We achieved an average classification accuracy above 90% on a single CNN accelerator, which would otherwise require three accelerators. Due to the reliability concerns imposed by transistor shrinking and aging, we extended the shared layers methodology and introduced a fault-tolerant CNN accelerator with reconfigurable capabilities supporting fault-tolerant (FT), high-performance (HP), and de-stress (DS) modes. FT mode provides high reliability against soft errors utilizing double/triple modular redundancy, HP mode offers peak performance of 0.979 TOPs using parallel execution, and DS mode reduces dynamic power consumption by up to 68.6% in clock-gated design and even more using a partial reconfiguration method, contributing to decelerating the aging process of the circuit. We have comprehensively evaluated two different CNN architectures (i.e., fused and branched), for three distinct tasks, in three different operating modes, based on accuracy, quantization, pruning, hardware resource utilization, power, energy, performance, and reliability.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639