IEEE Access (Jan 2024)
Optimizing RGB Convolution on Cortex-M7 MCU: Approaches for Performance Improvement
Abstract
With the advancement of technologies such as the Internet of Things (IoT), deploying Deep Neural Network (DNN) models on edge devices such as microcontrollers has become one of the most active areas of AI research. However, deploying and running these models in such severely constrained environments poses significant challenges. Despite these limitations, microcontrollers remain attractive because they can perform specific tasks with low power consumption. Deep learning models typically contain many computationally intensive convolutional layers, and convolution over the RGB input channels accounts for the highest peak memory usage and overhead. In this paper, we propose an efficient CNN kernel algorithm that, through network compression, enables deep learning models to run effectively on resource-constrained microcontroller architectures. Focusing on low-power ARM Cortex-M7-based systems, we explore how key metrics such as memory consumption, execution time, and speedup trade off against various parameters. Our proposed algorithm reduces temporary memory overhead by a factor of at least the square of the kernel size compared with im2col-based convolution. Furthermore, it achieves a 1.15x speedup for CNNs even when only a single layer is modified.
Keywords