IEEE Access (Jan 2021)
p-im2col: Simple Yet Efficient Convolution Algorithm With Flexibly Controlled Memory Overhead
Abstract
Convolution is the most time-consuming operation in modern deep artificial neural networks, so its performance is crucial for fast inference. One standard approach to fast convolution is to use GeMM-based convolution algorithms that rely on efficient general matrix multiplication (GeMM) routines from optimized BLAS libraries. However, commonly used GeMM-based algorithms may cause significant memory overhead, or avoid it only at the cost of worse performance. In this paper, we propose a novel convolution algorithm, p-im2col, based on the well-known im2col algorithm; it avoids memory overhead by splitting a single multiplication of a large matrix into several multiplications of smaller matrices. We compare our algorithm, both theoretically and experimentally, with two other GeMM-based algorithms: im2col, which is widely used as a baseline, and the memory-efficient kn2row-aa. We measure the inference time of these algorithms on central processing units of the x86, x86_64, ARM, and MIPS architectures over a large set of convolutional parameters. The proposed algorithm demonstrates a speedup over im2col and kn2row-aa in a number of cases, and it requires significantly less additional memory than im2col. Based on our experiments, we present a new convolution algorithm selection scheme that takes memory restrictions, CPU architecture, and convolutional parameters into account and provides a noticeable advantage over each individual algorithm.
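To make the splitting idea concrete, the following NumPy sketch (our illustration under stated assumptions, not the authors' reference implementation; the function and the `panel` parameter name are ours) builds the im2col matrix only a few output columns at a time, so the temporary buffer holds `C*kh*kw × panel` elements instead of the full `C*kh*kw × H_out*W_out` matrix, and each panel is handled by one small GeMM:

```python
import numpy as np

def conv2d_p_im2col(x, w, panel=4):
    """Stride-1, unpadded 2-D convolution, computed panel-by-panel.

    x: input of shape (C, H, W); w: kernels of shape (M, C, kh, kw).
    Instead of materializing the full im2col matrix of shape
    (C*kh*kw, H_out*W_out), we build and multiply `panel` output
    columns at a time, capping the additional memory.
    """
    C, H, W = x.shape
    M, _, kh, kw = w.shape
    Ho, Wo = H - kh + 1, W - kw + 1
    wm = w.reshape(M, C * kh * kw)          # kernel matrix for GeMM
    y = np.empty((M, Ho * Wo), dtype=x.dtype)
    for start in range(0, Ho * Wo, panel):  # one small GeMM per panel
        stop = min(start + panel, Ho * Wo)
        cols = np.empty((C * kh * kw, stop - start), dtype=x.dtype)
        for j, idx in enumerate(range(start, stop)):
            r, c = divmod(idx, Wo)          # output pixel for this column
            cols[:, j] = x[:, r:r + kh, c:c + kw].ravel()
        y[:, start:stop] = wm @ cols
    return y.reshape(M, Ho, Wo)
```

With `panel = H_out * W_out` this degenerates to plain im2col; smaller panels trade a little GeMM efficiency for a proportionally smaller buffer, which is the memory/performance knob the paper's title refers to.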
Keywords