Efficient memristor accelerator for transformer self-attention functionality

Meriem Bettayeb; Yasmin Halawani; Muhammad Umair Khan; Hani Saleh; Baker Mohammad

doi:10.1038/s41598-024-75021-z

Scientific Reports (Oct 2024)

Affiliations

Meriem Bettayeb: System-on-Chip Lab, Computer and Information Engineering, Khalifa University
Yasmin Halawani: College of Engineering and IT, University of Dubai
Muhammad Umair Khan: System-on-Chip Lab, Computer and Information Engineering, Khalifa University
Hani Saleh: System-on-Chip Lab, Computer and Information Engineering, Khalifa University
Baker Mohammad: System-on-Chip Lab, Computer and Information Engineering, Khalifa University

Journal volume & issue: Vol. 14, no. 1
pp. 1 – 15

Abstract

The adoption of transformer networks has surged across AI applications. However, their computational complexity, which stems primarily from the self-attention mechanism, constrains transformers in much the same way that convolution operations constrain the capability and speed of convolutional neural networks (CNNs). In particular, the matrix-matrix multiplication (MatMul) operations of the self-attention algorithm demand substantial memory and computation, which limits the overall performance of the transformer. This paper introduces an efficient hardware accelerator for the transformer network that leverages memristor-based in-memory computing. The design targets the memory bottleneck of the MatMul operations in the self-attention process by exploiting approximate analog computation and the high parallelism of the memristor crossbar architecture. This approach reduces the number of multiply-accumulate (MAC) operations in transformer networks by approximately 10 times while maintaining 95.47% accuracy on the MNIST dataset, as validated with a comprehensive circuit simulator employing NeuroSim 3.0. Simulation results indicate an area of 6895.7 μm², a latency of 15.52 seconds, an energy consumption of 3 mJ, and a leakage power of 59.55 μW. The methodology outlined in this paper is a substantial step towards a hardware-friendly transformer architecture for edge devices, aimed at achieving real-time performance.
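The crossbar principle referred to in the abstract can be illustrated with a minimal Python sketch. It is not the authors' design. It assumes an idealized array (no wire resistance, device variation, or ADC/DAC effects), allows signed input voltages, maps each signed weight onto a hypothetical differential pair of conductances quantized to a fixed number of levels, and forms each column current as the sum of voltage × conductance products (Ohm's law plus Kirchhoff's current law). The helper names (`program_crossbar`, `crossbar_mvm`, `crossbar_attention`), the parameters `g_max` and `levels`, and the choice to compute softmax digitally and to reprogram the array with Kᵀ and V for every input are all illustrative assumptions.

```python
import math


def quantize(g, levels, g_max):
    """Snap a conductance g in [0, g_max] onto `levels` evenly spaced states (assumed device resolution)."""
    step = g_max / (levels - 1)
    return round(min(g, g_max) / step) * step


def program_crossbar(W, g_max=1e-4, levels=16):
    """Map a signed weight matrix W (rows = input lines, columns = output lines)
    onto a hypothetical differential pair of conductance arrays, G+ and G-."""
    w_max = max(abs(w) for row in W for w in row) or 1.0
    scale = g_max / w_max  # siemens per unit of weight
    g_pos = [[quantize(max(w, 0.0) * scale, levels, g_max) for w in row] for row in W]
    g_neg = [[quantize(max(-w, 0.0) * scale, levels, g_max) for w in row] for row in W]
    return g_pos, g_neg, scale


def crossbar_mvm(v, g_pos, g_neg, scale):
    """Ideal analog read: column current I_j = sum_i V_i * G_ij (Ohm + Kirchhoff).
    Hardware forms every column sum in one step; these loops only emulate that."""
    out = []
    for j in range(len(g_pos[0])):
        i_pos = sum(v[i] * g_pos[i][j] for i in range(len(v)))
        i_neg = sum(v[i] * g_neg[i][j] for i in range(len(v)))
        out.append((i_pos - i_neg) / scale)  # differential current back to weight units
    return out


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]


def crossbar_attention(Q, K, V, levels=16):
    """Scaled dot-product attention with both MatMuls done by the idealized crossbar."""
    d = len(Q[0])
    # S = Q K^T: program K^T (d x n_k) into the array, drive each query row as voltages.
    k_t = [list(col) for col in zip(*K)]
    arr_k = program_crossbar(k_t, levels=levels)
    scores = [crossbar_mvm(q, *arr_k) for q in Q]
    probs = [softmax([s / math.sqrt(d) for s in row]) for row in scores]  # digital softmax
    # O = P V: program V (n_k x d_v), drive each probability row as voltages.
    arr_v = program_crossbar(V, levels=levels)
    return [crossbar_mvm(p, *arr_v) for p in probs]


def exact_attention(Q, K, V):
    """Floating-point reference for comparison."""
    d = len(Q[0])
    scores = [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K] for q in Q]
    probs = [softmax(r) for r in scores]
    return [[sum(p[i] * V[i][j] for i in range(len(V))) for j in range(len(V[0]))]
            for p in probs]


Q = [[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]]
K = [[0.1, 0.2, -0.3], [-0.5, 0.4, 0.2], [0.3, -0.1, 0.6]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
print(crossbar_attention(Q, K, V, levels=16))
print(exact_attention(Q, K, V))
```

Lowering `levels` coarsens the conductance grid and generally increases the deviation from `exact_attention`, which is one simple way to see the accuracy cost of approximate analog computation in this idealized model.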

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/

About the journal