IEEE Access (Jan 2023)

An Empirical Evaluation of Enhanced Performance Softmax Function in Deep Learning

  • Sumiran Mehra,
  • Gopal Raut,
  • Ribhu Das Purkayastha,
  • Santosh Kumar Vishvakarma,
  • Anton Biasizzo

DOI
https://doi.org/10.1109/ACCESS.2023.3265327
Journal volume & issue
Vol. 11
pp. 34912–34924

Abstract


This article presents a highly efficient, performance-enhanced Softmax Function (SF) designed for a deep neural network accelerator. The SF is an essential component of deep learning models, used primarily in the classification layer and also in the hidden layers of advanced neural networks such as Transformer and Capsule networks. The main challenge in designing an efficient hardware architecture for the SF lies in its complex exponential and division computational sub-blocks. To address this challenge, a hardware-optimized pipelined architecture based on the COordinate Rotation DIgital Computer (CORDIC) algorithm is proposed, leveraging the mutual exclusivity of CORDIC operations to improve throughput, area, and power. To maintain good accuracy in deep learning models, the proposed SF design undergoes a Pareto study that evaluates accuracy as a function of the number of pipeline stages. The proposed design is quantized to 16-bit precision, and inference accuracy is validated on different datasets. The SF is prototyped on a Xilinx Zynq FPGA operating at 685 MHz, and an ASIC implementation is carried out for the 45 nm technology node with a maximum operating frequency of 5 GHz. The design incurs a validation accuracy loss of less than 2% while reducing silicon area and Energy-Delay Product (EDP) by 12×. Post-synthesis simulation results indicate that the proposed design outperforms state-of-the-art architectures, achieving 3× better performance in terms of area, power, and logic delay.
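To illustrate the general idea behind a CORDIC-based softmax, the sketch below computes e^x with hyperbolic CORDIC rotations (shift-and-add iterations only) and then normalizes to form the softmax. This is a minimal floating-point Python/NumPy sketch of the technique named in the abstract, not the paper's 16-bit pipelined RTL; the function names, iteration count, and range-reduction step are illustrative assumptions.

```python
import numpy as np

def cordic_exp(z, iterations=16):
    """Approximate e^z via hyperbolic CORDIC in rotation mode (sketch).
    Convergence requires |z| within roughly 1.118, so callers should
    range-reduce the argument first."""
    # Build the iteration schedule; hyperbolic CORDIC must repeat
    # iterations 4, 13, 40, ... to guarantee convergence.
    idx, i, next_repeat = [], 1, 4
    while len(idx) < iterations:
        idx.append(i)
        if i == next_repeat:
            idx.append(i)                      # repeated iteration
            next_repeat = 3 * next_repeat + 1
        i += 1
    idx = idx[:iterations]

    # Pre-computed CORDIC gain for this schedule (a constant in hardware).
    K = np.prod([np.sqrt(1.0 - 2.0 ** (-2 * j)) for j in idx])

    # Rotation mode: drive the angle z -> 0; (c, s) converge to cosh, sinh.
    c, s = 1.0 / K, 0.0
    for j in idx:
        d = 1.0 if z >= 0 else -1.0
        c, s = c + d * s * 2.0 ** (-j), s + d * c * 2.0 ** (-j)
        z -= d * np.arctanh(2.0 ** (-j))
    return c + s                               # e^z = cosh(z) + sinh(z)

def exp_range_reduced(x):
    """e^x = 2^n * e^r with r in [-ln2/2, ln2/2], inside CORDIC's range."""
    n = int(np.round(x / np.log(2.0)))
    return np.ldexp(cordic_exp(x - n * np.log(2.0)), n)

def softmax_cordic(logits):
    """Softmax using the CORDIC exponential; in hardware the final
    division would itself be a CORDIC (linear-mode) divider."""
    x = np.asarray(logits, dtype=float)
    e = np.array([exp_range_reduced(v) for v in x - x.max()])
    return e / e.sum()

# Example: softmax_cordic([2.0, 1.0, 0.1]) closely matches np.exp-based softmax.
```

In a pipelined hardware realization, each CORDIC iteration maps to one pipeline stage of shift-and-add logic, which is why the paper studies accuracy against the number of stages.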

Keywords