IEEE Access (Jan 2021)

p-im2col: Simple Yet Efficient Convolution Algorithm With Flexibly Controlled Memory Overhead

  • Anton V. Trusov,
  • Elena E. Limonova,
  • Dmitry P. Nikolaev,
  • Vladimir V. Arlazarov

DOI
https://doi.org/10.1109/ACCESS.2021.3135690
Journal volume & issue
Vol. 9
pp. 168162 – 168184

Abstract


Convolution is the most time-consuming operation in modern deep artificial neural networks, so its performance is crucial for fast inference. One of the standard approaches to fast convolution computation is to use GeMM-based convolution algorithms relying on efficient general matrix multiplication (GeMM) from optimized BLAS libraries. However, commonly used GeMM-based algorithms may cause significant memory overhead or avoid it only at the cost of worse performance. In this paper, we propose a novel convolution algorithm, p-im2col, based on the well-known im2col algorithm, which avoids memory overhead by splitting a single multiplication of a large matrix into several multiplications of smaller matrices. We theoretically and experimentally compare our algorithm with two other GeMM-based algorithms: im2col, which is widely used as a baseline, and the memory-efficient kn2row-aa. We measure the inference time of these algorithms on central processing units of x86, x86_64, ARM, and MIPS architectures for a large set of convolution parameters. The proposed algorithm demonstrates a speedup over im2col and kn2row-aa in a number of cases and a significant reduction in additional memory requirements compared to im2col. Based on our experiments, we present a new convolution algorithm selection scheme that considers memory restrictions, CPU architecture, and convolution parameters and provides a noticeable advantage over each particular algorithm.
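To make the idea in the abstract concrete, here is a minimal NumPy sketch of GeMM-based convolution in which the im2col patch matrix is materialized only a block of output pixels at a time, so the large GeMM becomes several smaller ones. This is an illustrative sketch of the general technique, not the authors' implementation; the function name, the `block_cols` parameter, and the loop structure are assumptions introduced for clarity.

```python
import numpy as np

def conv2d_im2col_blocked(x, w, block_cols=64):
    """2-D convolution (stride 1, no padding) via im2col + GeMM.

    Instead of building one (C_in*K_h*K_w) x (O_h*O_w) patch matrix,
    only `block_cols` columns are materialized per GeMM call,
    bounding the extra memory by the block size.
    """
    c_in, h, width = x.shape
    c_out, _, kh, kw = w.shape
    oh, ow = h - kh + 1, width - kw + 1
    n_pix = oh * ow

    w_mat = w.reshape(c_out, c_in * kh * kw)           # filters as a matrix
    y = np.empty((c_out, n_pix), dtype=x.dtype)

    for start in range(0, n_pix, block_cols):          # one strip of output pixels
        stop = min(start + block_cols, n_pix)
        cols = np.empty((c_in * kh * kw, stop - start), dtype=x.dtype)
        for j, p in enumerate(range(start, stop)):
            r, c = divmod(p, ow)                        # output pixel -> input window
            cols[:, j] = x[:, r:r + kh, c:c + kw].reshape(-1)
        y[:, start:stop] = w_mat @ cols                 # small GeMM per block
    return y.reshape(c_out, oh, ow)
```

In this sketch, the temporary buffer shrinks from `C_in*K_h*K_w*O_h*O_w` elements (plain im2col) to `C_in*K_h*K_w*block_cols`, at the cost of issuing several smaller matrix multiplications; choosing the block size is the trade-off the abstract refers to as flexibly controlled memory overhead.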

Keywords