Applied Sciences (Sep 2024)
Energy-Efficient Joint Partitioning and Offloading for Delay-Sensitive CNN Inference in Edge Computing
Abstract
With the development of deep learning foundation models, computing tasks have become more complex and now demand substantially more computing resources and memory. Since task offloading to cloud servers has long been known to suffer from drawbacks such as high communication delay and weak security, task offloading is now mostly performed on the edge servers of Internet of Things (IoT) networks. However, edge servers in IoT networks are characterized by tight resource constraints and often by dynamic data sources. How to offload deep learning foundation model services to edge servers has therefore become a new research topic. Existing task-offloading methods either cannot accommodate large CNN architectures or incur substantial communication overhead, leading to significant delay and energy consumption. In this paper, we propose a parallel partitioning method based on matrix convolution that splits large CNN inference tasks into subtasks that can be executed in parallel, satisfying the constraints of edge devices with limited hardware resources. We then model the task-offloading problem and express it mathematically. For a multi-edge-server, multi-user, multi-task edge-end system, we propose a task-offloading method that balances the tradeoff between delay and energy consumption: it uses a greedy algorithm to jointly optimize the offloading decisions and the transmission power of terminal devices so as to maximize the benefit of offloading. Finally, extensive experiments verify the effectiveness of our algorithm.
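To make the partitioning idea concrete, the following minimal sketch tiles a single 2D convolution into row bands with overlapping halo rows, so that each band yields a disjoint slice of the full output and can be computed independently (e.g., on different edge devices) and then stitched together. This is an illustrative assumption about how convolution-based partitioning can work, not the paper's implementation; the function names and shapes are hypothetical.

import numpy as np

def conv2d(x, k):
    """Naive 'valid' 2D convolution (cross-correlation) of x with kernel k."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def partition_rows(x, k, n_parts):
    """Split the input into row bands with (kh - 1)-row halos so that
    each band produces a disjoint slice of the full output."""
    kh = k.shape[0]
    oh = x.shape[0] - kh + 1  # total number of output rows
    bounds = np.linspace(0, oh, n_parts + 1, dtype=int)
    # Output rows [lo, hi) need input rows [lo, hi + kh - 1); the halo
    # overlap is the per-subtask transmission cost such a scheme trades off.
    return [x[lo:hi + kh - 1, :] for lo, hi in zip(bounds[:-1], bounds[1:])]

x = np.random.rand(32, 32)
k = np.random.rand(3, 3)
tiles = partition_rows(x, k, n_parts=4)
# The subtasks are independent; here they are mapped serially for checking.
stitched = np.vstack([conv2d(t, k) for t in tiles])
assert np.allclose(stitched, conv2d(x, k))

Because each subtask touches only its band plus a small halo, the subtasks fit devices with limited memory, at the price of transmitting the overlapping halo rows.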
Keywords