Applied Sciences (Sep 2024)
Energy-Efficient Joint Partitioning and Offloading for Delay-Sensitive CNN Inference in Edge Computing
Abstract
With the development of deep learning foundation models, computing tasks have become more complex and now demand substantially more computing resources and memory. Since task offloading to cloud servers has long been known to suffer from drawbacks such as high communication delay and weak security, task offloading is now mostly performed on the edge servers of Internet of Things (IoT) networks. However, edge servers in IoT networks are characterized by tight resource constraints and often by dynamic data sources. How to offload deep learning foundation model services to edge servers has therefore become a new research topic. Existing task-offloading methods either cannot accommodate large CNN architectures or incur substantial communication overhead, leading to significant delay and energy consumption. In this paper, we propose a parallel partitioning method based on matrix convolution that splits large CNN inference tasks into subtasks that can be executed in parallel, satisfying the constraints of edge devices with limited hardware resources. We then model the task-offloading problem and express it mathematically. For a multi-edge-server, multi-user, multi-task edge-end system, we propose a task-offloading method that balances the tradeoff between delay and energy consumption: it uses a greedy algorithm to jointly optimize the offloading decisions and the transmission power of terminal devices so as to maximize the benefit of offloading. Finally, extensive experiments verify the effectiveness of our algorithm.
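To make the partitioning idea concrete, the following minimal sketch tiles a single 2D convolution into row bands with overlapping halo rows, so that each band yields a disjoint slice of the full output and can be computed independently (e.g., on different edge devices) and then stitched together. This is an illustrative assumption about how convolution-based partitioning can work, not the paper's implementation; the function names and shapes are hypothetical.

import numpy as np

def conv2d(x, k):
    """Naive 'valid' 2D convolution (cross-correlation) of x with kernel k."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def partition_rows(x, k, n_parts):
    """Split the input into row bands with (kh - 1)-row halos so that
    each band produces a disjoint slice of the full output."""
    kh = k.shape[0]
    oh = x.shape[0] - kh + 1  # total number of output rows
    bounds = np.linspace(0, oh, n_parts + 1, dtype=int)
    # Output rows [lo, hi) need input rows [lo, hi + kh - 1); the halo
    # overlap is the per-subtask transmission cost such a scheme trades off.
    return [x[lo:hi + kh - 1, :] for lo, hi in zip(bounds[:-1], bounds[1:])]

x = np.random.rand(32, 32)
k = np.random.rand(3, 3)
tiles = partition_rows(x, k, n_parts=4)
# The subtasks are independent; here they are mapped serially for checking.
stitched = np.vstack([conv2d(t, k) for t in tiles])
assert np.allclose(stitched, conv2d(x, k))

Because each subtask touches only its band plus a small halo, the subtasks fit devices with limited memory, at the price of transmitting the overlapping halo rows.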
Keywords