IEEE Access (Jan 2024)

An Accelerated FPGA-Based Parallel CNN-LSTM Computing Device

  • Xin Zhou,
  • Wei Xie,
  • Han Zhou,
  • Yongjing Cheng,
  • Ximing Wang,
  • Yun Ren,
  • Shandong Yuan,
  • Liuwen Li

DOI
https://doi.org/10.1109/ACCESS.2024.3437663
Journal volume & issue
Vol. 12
pp. 106579 – 106592

Abstract

Recently, combinations of convolutional neural networks (CNN) and long short-term memory (LSTM) networks have exhibited better performance than single-network architectures. Most of these studies connect the LSTM network behind the CNN, so the current CNN-LSTM design, when implemented in hardware, resembles a pipeline architecture. However, this classic structure leads to feature loss when data is sent to the LSTM, since the CNN is not good at extracting temporal features. At the same time, as network depth and scale increase, the amount of computation grows enormously, which makes hardware implementation difficult. To address this, a parallel CNN-LSTM architecture is proposed, in which the two networks extract features from the input data synchronously; it is shown to be more effective than the classical CNN-LSTM. This paper designs a parallel CNN-LSTM computing device based on FPGA. The device is divided into a control unit and an operation unit, between which control and data streams are transported to ensure proper operation. Highly parallel multi-channel convolution and pooling layers are designed to improve calculation efficiency, and a 4-stage pipeline structure is adopted to implement the LSTM part. The design makes full use of on-chip BRAM to implement a look-up table for activation function approximation, reducing resource consumption by 95% compared with the traditional polynomial approximation. Finally, we verify the device under cooperative spectrum sensing (CSS) and handwriting classification scenarios. The proposed device achieves higher accuracy and faster computation in both scenarios than the classic CNN-LSTM structure, and the overall power is kept below 2 W. The scalability and limitations of the computing device are also discussed.
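The look-up-table approach to activation functions mentioned in the abstract can be illustrated with a minimal software sketch: precompute samples of the sigmoid over a fixed input range (the table would reside in on-chip BRAM on the FPGA), then approximate any input by clamping and indexing. The range, table size, and function names below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def build_sigmoid_lut(x_min=-8.0, x_max=8.0, n_entries=1024):
    """Precompute sigmoid samples over [x_min, x_max].
    On hardware, this table would be stored in BRAM (parameters are assumed)."""
    xs = np.linspace(x_min, x_max, n_entries)
    return 1.0 / (1.0 + np.exp(-xs))

def sigmoid_lut(x, lut, x_min=-8.0, x_max=8.0):
    """Approximate sigmoid(x) by clamping x to the table range and indexing.
    Replaces the exponential with a single memory read, as a LUT would on FPGA."""
    n = len(lut)
    idx = ((np.asarray(x) - x_min) / (x_max - x_min) * (n - 1)).astype(int)
    idx = np.clip(idx, 0, n - 1)
    return lut[idx]

lut = build_sigmoid_lut()
approx = sigmoid_lut(np.array([-3.0, 0.0, 3.0]), lut)
exact = 1.0 / (1.0 + np.exp(-np.array([-3.0, 0.0, 3.0])))
print(np.max(np.abs(approx - exact)))  # small approximation error
```

Larger tables trade BRAM for accuracy; the same scheme applies to tanh in the LSTM gates.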

Keywords