IEEE Access (Jan 2021)

An Efficient im2row-Based Fast Convolution Algorithm for ARM Cortex-M MCUs

  • Peng Wang,
  • Xiaoqin Wang,
  • Rui Luo,
  • Dingyi Wang,
  • Mengjie Luo,
  • Shushan Qiao,
  • Yumei Zhou

DOI
https://doi.org/10.1109/ACCESS.2021.3110827
Journal volume & issue
Vol. 9
pp. 124384 – 124395

Abstract

With the rise of IoT and edge computing, deploying neural networks (NNs) on low-power edge computing devices is attracting growing attention. In NNs, convolutional layers take up the majority of the computing cycles, especially when NNs are implemented on ARM processors. Therefore, it is necessary to optimize the convolution implementation on ARM Cortex-M MCUs. This paper proposes an efficient im2row-based fast convolution algorithm with two innovations. First, a novel im2row method that reuses the data of adjacent convolutional windows is presented. This method uses a reusable im2row buffer, significantly reducing the amount of data copied during im2row and improving efficiency. Second, in the algorithm implementation, a q7_t to q15_t data type extension technique that avoids data reordering is employed. This technique eliminates data reordering instructions, thus reducing the runtime of the algorithm. We evaluate our algorithm on separate convolutional layers and on complete NNs. The results for convolutional layers show that, compared to the baseline, the proposed algorithm speeds up the convolutional layer by an average of $1.42\times$, with a maximum speedup of $2.9\times$. Experiments on different NNs demonstrate that our algorithm can speed up the overall NN by up to $2.15\times$.
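
The data-reuse idea can be illustrated with a minimal C sketch. It is not the authors' implementation: it assumes stride 1, no padding, an HWC input layout, and weights stored in a hypothetical `[oc][kx][ky][c]` order. The function names `fill_strip`, `conv_strip`, and `conv_naive` are invented for illustration. For each output row, one "strip" buffer is filled column by column, so the im2row row for every window in that output row is a contiguous slice of the same buffer, and overlapping columns of adjacent windows are copied once instead of once per window. The q7_t to q15_t widening here is a plain cast during the copy, not the reordering-free SIMD technique described in the abstract.

```c
#include <assert.h>
#include <stdint.h>

typedef int8_t  q7_t;
typedef int16_t q15_t;

/* Fill the strip for output row oy with layout [x][ky][c], widening q7 -> q15.
 * Every input element in the K-row band is copied exactly once. */
static void fill_strip(const q7_t *in, int W, int C, int K, int oy, q15_t *strip)
{
    for (int x = 0; x < W; x++)
        for (int ky = 0; ky < K; ky++)
            for (int c = 0; c < C; c++)
                *strip++ = (q15_t)in[((oy + ky) * W + x) * C + c];
}

/* Convolution using the shared strip buffer (strip must hold W*K*C elements).
 * Output layout is [oy][ox][oc] with 32-bit accumulators. */
void conv_strip(const q7_t *in, int H, int W, int C,
                const q7_t *wt, int K, int OC,
                q15_t *strip, int32_t *out)
{
    assert(K <= H && K <= W);
    const int OH = H - K + 1, OW = W - K + 1, row_len = K * K * C;
    for (int oy = 0; oy < OH; oy++) {
        fill_strip(in, W, C, K, oy, strip);
        for (int ox = 0; ox < OW; ox++) {
            /* The im2row row of window (oy, ox) is a slice of the strip:
             * moving one window to the right advances by one column (K*C). */
            const q15_t *row = strip + ox * K * C;
            for (int oc = 0; oc < OC; oc++) {
                int32_t acc = 0;
                for (int i = 0; i < row_len; i++)
                    acc += (int32_t)row[i] * wt[oc * row_len + i];
                out[(oy * OW + ox) * OC + oc] = acc;
            }
        }
    }
}

/* Direct convolution with the same weight layout, used only as a reference. */
void conv_naive(const q7_t *in, int H, int W, int C,
                const q7_t *wt, int K, int OC, int32_t *out)
{
    const int OH = H - K + 1, OW = W - K + 1;
    for (int oy = 0; oy < OH; oy++)
        for (int ox = 0; ox < OW; ox++)
            for (int oc = 0; oc < OC; oc++) {
                int32_t acc = 0;
                for (int kx = 0; kx < K; kx++)
                    for (int ky = 0; ky < K; ky++)
                        for (int c = 0; c < C; c++)
                            acc += in[((oy + ky) * W + ox + kx) * C + c]
                                 * wt[((oc * K + kx) * K + ky) * C + c];
                out[(oy * OW + ox) * OC + oc] = acc;
            }
}
```

In this sketch, the copy cost per output row is W·K·C elements, whereas building a separate im2row row for every window would copy (W−K+1)·K·K·C elements. The price is that the weights must be stored in the matching column-major kernel order.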

Keywords