IEEE Access (Jan 2024)

Mixed-Precision Neural Architecture Search and Dynamic Split Point Selection for Split Computing

  • Naoki Nagamatsu,
  • Kenshiro Ise,
  • Yuko Hara

DOI
https://doi.org/10.1109/ACCESS.2024.3455251
Journal volume & issue
Vol. 12
pp. 137439 – 137454

Abstract

Split computing (SC) is an emerging technique to perform the inference task of deep neural network (DNN) models using both mobile devices and cloud/edge servers in a hybrid manner. To improve the end-to-end inference time over the network, SC splits a single DNN model into a head model and a tail model for deployment on the mobile device and the server, respectively. A further extension of SC, referred to as dynamic SC (DSC), determines the split point dynamically depending on network conditions such as bandwidth. This article proposes a DNN optimization approach for DSC based on mixed-precision quantization. Given a vanilla DNN model, our work optimizes it in two steps. First, a DSC-aware mixed-precision layer-wise quantization is performed statically via neural architecture search to generate multiple candidate split points. Then, a bitwidth-wise DSC algorithm is applied to dynamically select one optimal split point among the candidates. Our evaluation on the EfficientNet-B0 and EfficientNet-B3 architectures demonstrates that our work provides more effective split points than existing quantization works while mitigating the degradation of inference accuracy. In terms of end-to-end inference time on the EfficientNet-B0 (B3) architecture, our work achieved relative average and maximum gains of 9.12% (4.05%) and 27.49% (12.42%), respectively, over a state-of-the-art mixed-precision quantization work while maintaining comparable accuracy.
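
To illustrate the dynamic selection step described above, the following is a minimal sketch (not the authors' implementation): given candidate split points produced by the static mixed-precision search, it picks the one minimizing estimated end-to-end latency (head compute + quantized-activation transmission + tail compute) under the current bandwidth. All names, latencies, and tensor shapes are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class SplitCandidate:
    layer_index: int        # where the model is cut into head/tail (hypothetical)
    head_time_ms: float     # assumed on-device latency of the head model
    tail_time_ms: float     # assumed server latency of the tail model
    activation_bits: int    # quantized bitwidth of the transmitted activation
    activation_elems: int   # number of elements in the intermediate tensor

def estimated_latency_ms(c: SplitCandidate, bandwidth_mbps: float) -> float:
    """End-to-end latency = head compute + transmission + tail compute."""
    payload_bits = c.activation_elems * c.activation_bits
    transfer_ms = payload_bits / (bandwidth_mbps * 1e6) * 1e3
    return c.head_time_ms + transfer_ms + c.tail_time_ms

def select_split(candidates: list[SplitCandidate], bandwidth_mbps: float) -> SplitCandidate:
    """Dynamically choose the candidate split point with the lowest estimated latency."""
    return min(candidates, key=lambda c: estimated_latency_ms(c, bandwidth_mbps))

if __name__ == "__main__":
    # Hypothetical candidates for an EfficientNet-B0-like model.
    candidates = [
        SplitCandidate(layer_index=3, head_time_ms=4.0, tail_time_ms=9.0,
                       activation_bits=4, activation_elems=112 * 112 * 24),
        SplitCandidate(layer_index=7, head_time_ms=8.5, tail_time_ms=5.5,
                       activation_bits=8, activation_elems=28 * 28 * 80),
    ]
    best = select_split(candidates, bandwidth_mbps=20.0)
    print(f"Selected split at layer {best.layer_index}")
```

As bandwidth drops, the transmission term dominates, so the selection naturally shifts toward candidates with smaller or more aggressively quantized intermediate activations.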

Keywords