IEEE Access (Jan 2019)

Acceleration of LSTM With Structured Pruning Method on FPGA

  • Shaorun Wang,
  • Peng Lin,
  • Ruihan Hu,
  • Hao Wang,
  • Jin He,
  • Qijun Huang,
  • Sheng Chang

DOI
https://doi.org/10.1109/ACCESS.2019.2917312
Journal volume & issue
Vol. 7
pp. 62930–62937

Abstract


This paper focuses on accelerating long short-term memory (LSTM), one of the most popular types of recurrent neural networks (RNNs). Because of the large number of weight memory accesses and the high computational complexity of its cascade-dependent structure, implementing LSTM efficiently on field-programmable gate arrays (FPGAs) is challenging. To speed up inference on an FPGA with limited resources, a structured pruning method is proposed that not only reduces the LSTM model's size without loss of prediction accuracy but also eliminates imbalanced computation and irregular memory accesses. In addition, a hardware architecture for the compressed LSTM is designed to pursue high performance. As a result, an LSTM language model implemented on a Stratix V GXA7 FPGA achieves 85.2 GOPS directly on the sparse LSTM network, corresponding to an effective throughput of 681.6 GOPS on the dense one, showing that the proposed structured pruning algorithm yields a 7.82x speedup when only 1/8 of the parameters are retained. We hope that our method offers an efficient way to accelerate LSTM and similar recurrent neural networks in resource-limited environments.
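The abstract does not spell out the pruning granularity, but the key idea of structured pruning is to remove whole blocks of weights rather than individual values, so that the remaining computation stays balanced across processing elements and memory accesses stay regular. The following is a minimal NumPy sketch, assuming block-wise magnitude pruning over contiguous column blocks of the stacked LSTM gate weight matrices with roughly 1/8 of the parameters retained; the block size, scoring rule, and helper names are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def structured_prune(weight, block_size=16, keep_ratio=1 / 8):
    """Block-wise magnitude pruning of an LSTM weight matrix (illustrative sketch).

    Entire column blocks are zeroed, which keeps memory accesses regular and
    the per-processing-element workload balanced on the FPGA side, unlike
    element-wise pruning that produces irregular sparsity.
    """
    rows, cols = weight.shape
    assert cols % block_size == 0, "sketch assumes cols divisible by block_size"
    n_blocks = cols // block_size

    # Score each column block by the L2 norm of all its weights.
    scores = np.linalg.norm(weight.reshape(rows, n_blocks, block_size), axis=(0, 2))

    # Keep the top `keep_ratio` fraction of blocks, zero the rest.
    n_keep = max(1, int(round(n_blocks * keep_ratio)))
    keep_idx = np.argsort(scores)[-n_keep:]
    mask = np.zeros(cols, dtype=weight.dtype)
    for b in keep_idx:
        mask[b * block_size:(b + 1) * block_size] = 1
    return weight * mask  # broadcast the column mask over all rows

# Example: the stacked gate weights (i, f, g, o) of one LSTM layer.
hidden, inp = 256, 128
W = np.random.randn(4 * hidden, inp).astype(np.float32)
W_pruned = structured_prune(W, block_size=16, keep_ratio=1 / 8)
print("nonzero fraction:", np.count_nonzero(W_pruned) / W_pruned.size)  # ~0.125
```

In practice such pruning is typically followed by retraining (fine-tuning) of the surviving weights to recover prediction accuracy before the compressed model is mapped onto the FPGA.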

Keywords