Zhejiang Daxue xuebao. Lixue ban (Jul 2024)
Design of a sparse convolutional cardiac arrhythmia classification accelerator for wearable devices (针对可穿戴设备的稀疏化卷积心律失常分类加速器设计)
Abstract
In recent years, convolutional neural networks (CNNs) have attracted significant attention in electrocardiogram (ECG) signal diagnosis because of their excellent detection and recognition capabilities. However, the large parameter counts and computational requirements of CNN models make them difficult to deploy on resource-constrained wearable embedded devices. Pruning can compress the network, but the irregular distribution of weights after sparsity pruning means that the pruned model still cannot process its non-zero data efficiently. To address this issue, an efficient sparse convolutional cardiac arrhythmia classification accelerator is designed. First, a sparse compressed dataflow based on multi-batch partitioning is proposed to improve computational efficiency. Second, a sparsity-aware processing element (PE) array is designed to perform the convolution operations and to resolve the load imbalance caused by pruning. Simulation results on a field-programmable gate array (FPGA) platform show that, at a clock frequency of 200 MHz, the accelerator achieves a throughput of 53.84 giga-operations per second (GOP·s⁻¹), an average power consumption of 0.263 W, an energy efficiency of 204.72 GOP·W⁻¹, and a five-class classification accuracy of 98%. Compared with previous sparse accelerator designs, it achieves speedups of up to 2.0 to 2.4 times and energy-efficiency improvements of 4.5 to 6.1 times, making it suitable for wearable ECG classification devices.
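The two ideas named in the abstract, computing only on non-zero weights and balancing work across PEs after pruning, can be illustrated with a minimal software sketch. The abstract does not specify the compressed format, the multi-batch partitioning, or the scheduling policy, so everything here is an assumption for illustration only: a simple (value, offset) encoding of a pruned 1-D kernel, and a greedy heaviest-first assignment of kernels to PEs. The function names `compress_weights`, `sparse_conv1d`, and `balance_load` are hypothetical and do not come from the paper.

```python
import numpy as np


def compress_weights(kernel):
    """Keep only the non-zero taps of a pruned 1-D kernel.

    Assumed format (not from the paper): parallel arrays of
    non-zero values and their offsets within the kernel.
    """
    offsets = np.flatnonzero(kernel)
    return kernel[offsets], offsets


def sparse_conv1d(signal, values, offsets, kernel_len):
    """Valid-mode 1-D convolution layer (cross-correlation) that
    touches only the non-zero taps, skipping pruned weights."""
    out_len = len(signal) - kernel_len + 1
    out = np.zeros(out_len, dtype=float)
    for w, k in zip(values, offsets):
        # Each non-zero tap contributes a shifted, scaled copy of the input.
        out += w * signal[k:k + out_len]
    return out


def balance_load(nnz_per_kernel, num_pes):
    """Greedy heaviest-first assignment of kernels to PEs.

    Illustrative policy only: each kernel goes to the PE with the
    smallest accumulated non-zero count so far.
    """
    loads = [0] * num_pes
    assignment = [[] for _ in range(num_pes)]
    order = sorted(range(len(nnz_per_kernel)), key=lambda i: -nnz_per_kernel[i])
    for idx in order:
        pe = loads.index(min(loads))
        assignment[pe].append(idx)
        loads[pe] += nnz_per_kernel[idx]
    return assignment, loads


# Usage: a pruned 5-tap kernel with two surviving weights.
kernel = np.array([0.5, 0.0, 0.0, -1.0, 0.0])
signal = np.arange(10.0)
values, offsets = compress_weights(kernel)
result = sparse_conv1d(signal, values, offsets, len(kernel))
assignment, loads = balance_load([4, 1, 3, 2], num_pes=2)
```

In this toy case the sparse routine performs two multiply-accumulate passes instead of five, and the four kernels with 4, 1, 3, and 2 non-zero weights are split so that each of the two PEs handles five non-zero weights. A hardware design would realize both ideas very differently; the sketch only shows why irregular sparsity creates uneven work and why skipping zeros saves operations.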
Keywords