IEEE Access (Jan 2024)

Mixed-Precision Neural Architecture Search and Dynamic Split Point Selection for Split Computing

  • Naoki Nagamatsu,
  • Kenshiro Ise,
  • Yuko Hara

DOI
https://doi.org/10.1109/ACCESS.2024.3455251
Journal volume & issue
Vol. 12
pp. 137439 – 137454

Abstract

Split computing (SC) is an emerging technique to perform the inference task of deep neural network (DNN) models using both mobile devices and cloud/edge servers in a hybrid manner. To improve the end-to-end inference time over the network, SC splits a single DNN model into a head model and a tail model for deployment on the mobile device and the server, respectively. A further extension of SC, referred to as dynamic SC (DSC), determines the split point dynamically depending on network conditions such as bandwidth. This article proposes a DNN optimization approach for DSC based on mixed-precision quantization. Given a vanilla DNN model, our work optimizes it in two steps. First, a DSC-aware mixed-precision layer-wise quantization is performed statically via neural architecture search to generate multiple candidate split points. Then, a bitwidth-wise DSC algorithm is applied to dynamically select one optimal split point among the candidates. Our evaluation on the EfficientNet-B0 and EfficientNet-B3 architectures demonstrates that our work provides more effective split points than existing quantization works while mitigating the degradation of inference accuracy. In terms of end-to-end inference time on the EfficientNet-B0 (B3) architecture, our work achieved relative average and maximum gains of 9.12% (4.05%) and 27.49% (12.42%), respectively, over a state-of-the-art mixed-precision quantization work while maintaining comparable accuracy.
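
To illustrate the dynamic selection step described above, the following is a minimal sketch (not the authors' implementation): given candidate split points produced by the static mixed-precision search, it picks the one minimizing estimated end-to-end latency (head compute + quantized-activation transmission + tail compute) under the current bandwidth. All names, latencies, and tensor shapes are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class SplitCandidate:
    layer_index: int        # where the model is cut into head/tail (hypothetical)
    head_time_ms: float     # assumed on-device latency of the head model
    tail_time_ms: float     # assumed server latency of the tail model
    activation_bits: int    # quantized bitwidth of the transmitted activation
    activation_elems: int   # number of elements in the intermediate tensor

def estimated_latency_ms(c: SplitCandidate, bandwidth_mbps: float) -> float:
    """End-to-end latency = head compute + transmission + tail compute."""
    payload_bits = c.activation_elems * c.activation_bits
    transfer_ms = payload_bits / (bandwidth_mbps * 1e6) * 1e3
    return c.head_time_ms + transfer_ms + c.tail_time_ms

def select_split(candidates: list[SplitCandidate], bandwidth_mbps: float) -> SplitCandidate:
    """Dynamically choose the candidate split point with the lowest estimated latency."""
    return min(candidates, key=lambda c: estimated_latency_ms(c, bandwidth_mbps))

if __name__ == "__main__":
    # Hypothetical candidates for an EfficientNet-B0-like model.
    candidates = [
        SplitCandidate(layer_index=3, head_time_ms=4.0, tail_time_ms=9.0,
                       activation_bits=4, activation_elems=112 * 112 * 24),
        SplitCandidate(layer_index=7, head_time_ms=8.5, tail_time_ms=5.5,
                       activation_bits=8, activation_elems=28 * 28 * 80),
    ]
    best = select_split(candidates, bandwidth_mbps=20.0)
    print(f"Selected split at layer {best.layer_index}")
```

As bandwidth drops, the transmission term dominates, so the selection naturally shifts toward candidates with smaller or more aggressively quantized intermediate activations.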

Keywords