International Journal of Information and Communication Technology Research (Mar 2023)
A Partial Method for Calculating CNN Networks Based On Loop Tiling
Abstract
Convolutional Neural Networks (CNNs) are widely deployed in artificial intelligence and computer vision applications, where the CNN is the most computationally intensive component. When such applications run on an embedded device, the embedded processor can hardly handle the workload. This paper applies loop tiling to show how a lightweight, low-power, and efficient CNN hardware accelerator can be constructed for embedded computing devices. The method breaks a large CNN engine into small CNN engines and computes them with few hardware resources. The results of the small CNN engines are then added and concatenated to construct the output of the large CNN. Using this method, a small accelerator can be configured to run a wide range of large CNNs. To evaluate the methodology, a small accelerator with one layer is designed. Initial investigations show that the resulting accelerator can run a modified version of MobileNetV1 70 times per second.
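The add-and-concatenate step can be illustrated with a minimal software sketch. It is not the accelerator design itself: it assumes that tiling is done over input and output channels, and that the convolution uses stride 1 with no padding. The names small_conv_engine, tiled_conv, tile_in, and tile_out are hypothetical and chosen only for this illustration. Partial results from different input-channel tiles are added, and results from different output-channel tiles are concatenated.

```python
def small_conv_engine(x_tile, w_tile):
    """Stride-1, unpadded convolution on one small tile (hypothetical helper).

    x_tile: nested list [c_in][H][W]; w_tile: nested list [c_out][c_in][K][K].
    """
    k = len(w_tile[0][0])
    h_out = len(x_tile[0]) - k + 1
    w_out = len(x_tile[0][0]) - k + 1
    out = [[[0.0] * w_out for _ in range(h_out)] for _ in w_tile]
    for o, w_o in enumerate(w_tile):
        for c, x_c in enumerate(x_tile):
            for y in range(h_out):
                for x in range(w_out):
                    for i in range(k):
                        for j in range(k):
                            out[o][y][x] += w_o[c][i][j] * x_c[y + i][x + j]
    return out


def tiled_conv(x, weights, tile_in, tile_out):
    """Run a large convolution as a sequence of small-engine calls.

    Assumption: tiles are taken over input and output channels only.
    """
    result = []
    for o in range(0, len(weights), tile_out):
        partial = None
        for c in range(0, len(x), tile_in):
            # Slice out the weights and inputs that one small engine handles.
            w_tile = [w_o[c:c + tile_in] for w_o in weights[o:o + tile_out]]
            r = small_conv_engine(x[c:c + tile_in], w_tile)
            if partial is None:
                partial = r
            else:
                # Add partial sums coming from different input-channel tiles.
                for a in range(len(r)):
                    for y in range(len(r[a])):
                        for xx in range(len(r[a][y])):
                            partial[a][y][xx] += r[a][y][xx]
        # Concatenate finished output-channel tiles.
        result.extend(partial)
    return result
```

Calling small_conv_engine on the full tensors gives the untiled reference result, so tiled_conv(x, weights, tile_in, tile_out) can be checked against it for any tile sizes, including ones that do not divide the channel counts evenly.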