IEEE Access (Jan 2024)
FPGA Implementation of a Fault-Tolerant Fused and Branched CNN Accelerator With Reconfigurable Capabilities
Abstract
The ImageNet moment was a turning point for Convolutional Neural Networks (CNNs), demonstrating their potential to revolutionize computer vision. This success has motivated applying CNNs to even more complex problems that involve multiple tasks across multiple data modalities. Conventionally, a CNN accelerator is optimized to perform a single task or several correlated tasks. This study presents a shared-layers approach that leverages the pattern-learning capabilities of CNNs to perform multiple uncorrelated tasks from different modalities on a single hardware accelerator. We addressed the data imbalance inherent in multi-modal learning by generating synthetic data. We achieved an average classification accuracy above 90% on a single CNN accelerator, where three accelerators would otherwise be required. To address the reliability concerns posed by transistor scaling and aging, we extended the shared-layers methodology into a fault-tolerant CNN accelerator with reconfigurable capabilities, supporting fault-tolerant (FT), high-performance (HP), and de-stress (DS) modes. FT mode provides high reliability against soft errors using double/triple modular redundancy; HP mode offers a peak performance of 0.979 TOPS through parallel execution; and DS mode reduces dynamic power consumption by up to 68.6% in the clock-gated design, and by an even larger margin with a partial reconfiguration method, thereby slowing the aging of the circuit. We comprehensively evaluated two CNN architectures (namely, fused and branched) on three distinct tasks in three operating modes with respect to accuracy, quantization, pruning, hardware resource utilization, power, energy, performance, and reliability.
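The FT mode relies on double/triple modular redundancy. As a minimal software sketch of that general technique (not the authors' hardware design), the code below models the two checks on integer result words: triple modular redundancy (TMR) masks an error in any single copy through a bitwise majority vote, whereas double modular redundancy (DMR) can only flag a mismatch. The function names `tmr_vote` and `dmr_check`, and the idea of voting on whole integer words, are illustrative assumptions; the granularity at which the accelerator replicates and votes is not specified here.

```python
from typing import Tuple


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant copies of a result word.

    Hypothetical helper: each output bit takes the value held by at least
    two of the three inputs, so bit flips confined to one copy are masked.
    """
    return (a & b) | (a & c) | (b & c)


def dmr_check(a: int, b: int) -> Tuple[int, bool]:
    """Compare two redundant copies of a result word.

    Hypothetical helper: DMR detects a mismatch but cannot tell which copy
    is wrong, so it returns the first copy plus an error flag.
    """
    return a, a != b


if __name__ == "__main__":
    golden = 0b10110110                # fault-free result word
    faulty = golden ^ 0b00001000       # a soft error flips bit 3 in one copy
    print(tmr_vote(golden, faulty, golden) == golden)  # the vote masks the error
    print(dmr_check(golden, faulty))                   # the mismatch is only detected
```

The trade-off mirrors the general one between the two schemes: TMR triples the resources to correct single-copy errors, while DMR doubles them and can only detect errors, leaving recovery (for example, re-execution) to the surrounding system.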
Keywords