MobilePrune: Neural Network Compression via ℓ0 Sparse Group Lasso on the Mobile System

Yubo Shao; Kaikai Zhao; Zhiwen Cao; Zhehao Peng; Xingang Peng; Pan Li; Yijie Wang; Jianzhu Ma

doi:10.3390/s22114081

Sensors (May 2022)

Affiliations

Yubo Shao: Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
Kaikai Zhao: Department of Computer Science, Indiana University at Bloomington, Bloomington, IN 47405, USA
Zhiwen Cao: Department of Computer Graphics, Purdue University, West Lafayette, IN 47907, USA
Zhehao Peng: Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
Xingang Peng: Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100190, China
Pan Li: Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
Yijie Wang: Department of Computer Science, Indiana University at Bloomington, Bloomington, IN 47405, USA
Jianzhu Ma: Institute for Artificial Intelligence, Peking University, Beijing 100871, China

DOI: https://doi.org/10.3390/s22114081
Journal volume & issue: Vol. 22, no. 11, p. 4081

Abstract

It is hard to deploy deep learning models directly on today’s smartphones because of the substantial computational cost introduced by millions of parameters. To compress such models, we develop an ℓ0-based sparse group lasso model called MobilePrune, which can generate extremely compact neural network models for both desktop and mobile platforms. We adopt a group lasso penalty to enforce sparsity at the group level, which benefits General Matrix Multiply (GEMM), and develop the very first algorithm that can optimize the ℓ0 norm in an exact manner and achieve a global convergence guarantee in the deep learning context. MobilePrune also allows complicated group structures (i.e., trees and overlapping groups) to be applied to the group penalty to suit DNN models with more complex architectures. Empirically, we observe a substantial reduction in compression ratio and computational cost for various popular deep learning models on multiple benchmark datasets compared to state-of-the-art methods. More importantly, the compressed models are deployed on the Android system to confirm that our approach achieves lower response delay and battery consumption on mobile phones.
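
As a rough illustration of the kind of penalty the abstract describes, the sketch below evaluates a generic ℓ0 plus group lasso regularizer and shows the two textbook proximal operators usually associated with each term (hard thresholding for the ℓ0 norm, block soft thresholding for non-overlapping groups). It is not the authors' algorithm: the function names, the `lam0`, `lam_group` and `step` parameters, and the example grouping are all hypothetical, and the paper's exact joint optimization and its tree or overlapping group structures are not reproduced here.

```python
import math


def l0_sparse_group_penalty(weights, groups, lam0, lam_group):
    """Evaluate lam0 * ||w||_0 + lam_group * sum_g ||w_g||_2.

    `groups` is a list of index lists (assumed grouping, e.g. one group
    per neuron or filter); all names here are illustrative only.
    """
    l0_term = sum(1 for w in weights if w != 0.0)
    group_term = sum(
        math.sqrt(sum(weights[i] ** 2 for i in g)) for g in groups
    )
    return lam0 * l0_term + lam_group * group_term


def hard_threshold(weights, lam0, step=1.0):
    """Proximal operator of step * lam0 * ||w||_0.

    An entry survives only if keeping it costs less than zeroing it,
    i.e. if |w| > sqrt(2 * step * lam0).
    """
    cutoff = math.sqrt(2.0 * step * lam0)
    return [w if abs(w) > cutoff else 0.0 for w in weights]


def group_soft_threshold(weights, groups, lam_group, step=1.0):
    """Proximal operator of step * lam_group * sum_g ||w_g||_2.

    Valid for non-overlapping groups only: each group is shrunk toward
    zero and removed entirely when its norm is at most step * lam_group.
    """
    out = list(weights)
    for g in groups:
        norm = math.sqrt(sum(weights[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - step * lam_group / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * weights[i]
    return out


# Toy weights split into two hypothetical groups.
w = [3.0, 4.0, 0.1, 0.0]
groups = [[0, 1], [2, 3]]
penalty = l0_sparse_group_penalty(w, groups, lam0=0.5, lam_group=1.0)
w_l0 = hard_threshold(w, lam0=0.5)
w_group = group_soft_threshold(w, groups, lam_group=1.0)
```

In this toy setting the second group has a small norm, so group-level shrinkage removes it as a whole, which is the structured sparsity that makes dense matrix kernels such as GEMM smaller after pruning.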

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
