Axioms (May 2022)

The Possibility of Combining and Implementing Deep Neural Network Compression Methods

  • Bratislav Predić,
  • Uroš Vukić,
  • Muzafer Saračević,
  • Darjan Karabašević,
  • Dragiša Stanujkić

DOI
https://doi.org/10.3390/axioms11050229
Journal volume & issue
Vol. 11, no. 5
p. 229

Abstract


This paper considers the possibility of combining deep neural network (DNN) model compression methods to achieve better compression results. To compare the advantages and disadvantages of each method, all methods were applied to a ResNet18 model pretrained on the NCT-CRC-HE-100K dataset, with CRC-VAL-HE-7K used as the validation dataset. In the proposed approach, quantization, pruning, weight clustering, quantization-aware training (QAT), cluster-preserving QAT (hereinafter PCQAT), and distillation were applied to compress ResNet18. The final evaluation of the resulting models was carried out on a Raspberry Pi 4 device using the validation dataset. The greatest on-disk compression was achieved with the PCQAT method, which reduced the size of the initial model by as much as 45 times, whereas the greatest speedup was achieved via distillation to a MobileNetV2 model. All methods reduced the initial model size, with only a slight loss in accuracy, or even an increase in accuracy in the case of QAT and weight clustering. INT8 quantization and knowledge distillation also led to a significant decrease in model execution time.
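As context for the INT8 quantization mentioned above, the following is a minimal, illustrative sketch (not the authors' code) of post-training INT8 quantization with the TensorFlow Lite converter, the kind of pipeline typically used to produce a compact model for deployment on a device such as a Raspberry Pi 4. The MobileNetV2 stand-in model, the 224x224x3 input, the 9-class output, and the random calibration data are assumptions for illustration only.

```python
# Hedged sketch of post-training INT8 quantization with TensorFlow Lite.
# All model/dataset specifics below are placeholders, not the paper's setup.
import numpy as np
import tensorflow as tf

# Assumption: a trained Keras classifier; MobileNetV2 stands in here.
model = tf.keras.applications.MobileNetV2(weights=None, classes=9)

def representative_dataset():
    # A small set of sample inputs lets the converter calibrate INT8 ranges.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer quantization so weights and activations are INT8.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)  # compact artifact suitable for on-device inference
```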

Keywords