Hadamard product-based in-memory computing design for floating point neural network training

Anjunyi Fan; Yihan Fu; Yaoyu Tao; Zhonghua Jin; Haiyue Han; Huiyu Liu; Yaojun Zhang; Bonan Yan; Yuchao Yang; Ru Huang

doi:10.1088/2634-4386/acbab9

Neuromorphic Computing and Engineering (Jan 2023)

Affiliations

Anjunyi Fan: Institute for Artificial Intelligence, Peking University, Beijing, People’s Republic of China; Beijing Advanced Innovation Center for Integrated Circuits, School of Integrated Circuits, Peking University, Beijing, People’s Republic of China
Yihan Fu: Institute for Artificial Intelligence, Peking University, Beijing, People’s Republic of China; Beijing Advanced Innovation Center for Integrated Circuits, School of Integrated Circuits, Peking University, Beijing, People’s Republic of China
Yaoyu Tao: Institute for Artificial Intelligence, Peking University, Beijing, People’s Republic of China; Beijing Advanced Innovation Center for Integrated Circuits, School of Integrated Circuits, Peking University, Beijing, People’s Republic of China
Zhonghua Jin: Pimchip Technology Co., Ltd, Beijing, People’s Republic of China
Haiyue Han: Pimchip Technology Co., Ltd, Beijing, People’s Republic of China
Huiyu Liu: Pimchip Technology Co., Ltd, Beijing, People’s Republic of China
Yaojun Zhang: Pimchip Technology Co., Ltd, Beijing, People’s Republic of China
Bonan Yan: Institute for Artificial Intelligence, Peking University, Beijing, People’s Republic of China; Beijing Advanced Innovation Center for Integrated Circuits, School of Integrated Circuits, Peking University, Beijing, People’s Republic of China
Yuchao Yang: Institute for Artificial Intelligence, Peking University, Beijing, People’s Republic of China; Beijing Advanced Innovation Center for Integrated Circuits, School of Integrated Circuits, Peking University, Beijing, People’s Republic of China; School of Electronic and Computer Engineering, Peking University, Shenzhen, People’s Republic of China; Center for Brain Inspired Intelligence, Chinese Institute for Brain Research (CIBR), Beijing, People’s Republic of China
Ru Huang: Institute for Artificial Intelligence, Peking University, Beijing, People’s Republic of China; Beijing Advanced Innovation Center for Integrated Circuits, School of Integrated Circuits, Peking University, Beijing, People’s Republic of China

Journal volume & issue: Vol. 3, no. 1
p. 014009

Abstract

Deep neural networks (DNNs) are one of the key areas of machine learning, and they require considerable computational resources for cognitive tasks. In-memory computing (IMC), a technology that performs computation inside or near memory units, significantly improves computing efficiency by reducing repetitive data transfer between the processing and memory units. However, prior IMC designs mainly focus on accelerating DNN inference; IMC hardware for DNN training has rarely been proposed. The challenges lie in the requirements of DNN training for high precision (e.g. floating point (FP)) and for diverse tensor operations (e.g. inner and outer products), which call for IMC designs with new features. This paper proposes a novel Hadamard product-based IMC design for FP DNN training. Our design consists of multiple compartments, which are the basic units for element-wise matrix processing. We also develop BFloat16 post-processing circuits and fused adder trees, laying the foundation for FP processing in IMC. Based on the proposed circuit scheme, we reformulate the back-propagation training algorithm for convenient and efficient IMC execution. The proposed design is implemented with commercial 28 nm technology process design kits and benchmarked with widely used neural networks. We model the influence of the circuit's structural design parameters and provide an analysis framework for design space exploration. Our simulations show that MobileNet training with the proposed IMC scheme saves $91.2\%$ of the energy and $13.9\%$ of the time compared with the same task on an NVIDIA GTX 3060 GPU. The proposed IMC design has a data density of 769.2 Kb mm$^{-2}$ with the FP processing circuits included, a 3.5× improvement over prior FP IMC designs.
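The abstract states that the inner and outer products needed by back-propagation are built on element-wise (Hadamard) processing, BFloat16 post-processing and adder trees. The NumPy sketch below illustrates that idea under our own assumptions; it is not the authors' circuit or their reformulated algorithm. The helper names (`to_bf16`, `adder_tree_sum`, `hadamard_matvec`, `hadamard_matvec_T`, `hadamard_outer`) are hypothetical, operands are assumed to be rounded to BFloat16 with round-to-nearest-even, products and sums are assumed to be kept in FP32, and NaN/Inf handling is omitted.

```python
import numpy as np


def to_bf16(x):
    """Round FP32 values to the nearest BFloat16 value (ties to even), returned as float32.

    BFloat16 keeps the FP32 sign bit, all 8 exponent bits and the top 7 mantissa bits,
    so rounding only touches the low 16 bits. NaN/Inf inputs are not handled here.
    """
    bits = np.asarray(x, dtype=np.float32).view(np.uint32).astype(np.int64)
    lsb = (bits >> 16) & 1                         # lowest kept mantissa bit, for ties-to-even
    rounded = (bits + 0x7FFF + lsb) & 0xFFFF0000   # add rounding bias, then drop the low 16 bits
    return rounded.astype(np.uint32).view(np.float32)


def adder_tree_sum(p):
    """Pairwise (binary-tree) reduction along the last axis, standing in for an adder tree."""
    p = np.asarray(p, dtype=np.float32)
    while p.shape[-1] > 1:
        if p.shape[-1] % 2:                        # pad an odd width with a zero operand
            p = np.concatenate([p, np.zeros(p.shape[:-1] + (1,), np.float32)], axis=-1)
        p = p[..., 0::2] + p[..., 1::2]            # one tree level: add adjacent pairs
    return p[..., 0]


def hadamard_matvec(W, x):
    # Forward pass, y_i = sum_j W_ij * x_j:
    # Hadamard product of W with x broadcast across rows, then a row-wise adder tree.
    return adder_tree_sum(to_bf16(W) * to_bf16(x)[None, :])


def hadamard_matvec_T(W, d):
    # Error back-propagation, (W^T d)_j = sum_i W_ij * d_i:
    # broadcast d across columns, then reduce each column of W with the adder tree.
    return adder_tree_sum((to_bf16(W) * to_bf16(d)[:, None]).T)


def hadamard_outer(d, x):
    # Weight gradient, dW_ij = d_i * x_j:
    # Hadamard product of two broadcast operands; no reduction is needed.
    return to_bf16(d)[:, None] * to_bf16(x)[None, :]
```

For example, `hadamard_matvec(np.arange(6).reshape(2, 3), [1, 1, 1])` returns `[3., 12.]`. In this sketch only the reduction step distinguishes the two inner-product cases from the outer product, which is one way to see why a single element-wise array plus adder trees could, in principle, serve the forward, backward and weight-update passes.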

Published in Neuromorphic Computing and Engineering

ISSN: 2634-4386 (Online)
Publisher: IOP Publishing
Country of publisher: United Kingdom
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://iopscience.iop.org/journal/2634-4386