Advances in Electrical and Computer Engineering (Nov 2024)
Comparative Analysis of the Robustness of 3-bit PoTQ and UQ and their Application in Post-training Quantization
Abstract
In this paper, we propose a 3-bit quantizer model whose step sizes are obtained by multiplying the smallest step size by successive powers of two, hence the name PoTQ (Power of Two Quantizer). Although this model is non-uniform, it is almost as simple to design as the most widely used and simplest quantizer model, the UQ (Uniform Quantizer). Motivated by these similarities, as well as by the absence of an analysis of the robustness of the SQNR (Signal-to-Quantization Noise Ratio) to changes in the variance of the data being quantized, we conduct a comparative theoretical analysis of both 3-bit quantizer models. In addition, we present experimental results on the application of both quantizer models to post-training quantization of MLP (Multilayer Perceptron) weights. To illustrate the importance of our robustness analysis, we report results for the cases with and without normalization of the MLP weights, corresponding to the matched and heavily mismatched scenarios, respectively. We show that the 3-bit PoTQ provides greater SQNR robustness than the 3-bit UQ. We also show that PoTQ outperforms UQ in preserving the accuracy of the compressed MLP model. Owing to its simple design and superior performance, we anticipate that 3-bit PoTQ will become as widely used as UQ.
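As a rough illustration of the two models compared in the paper, the sketch below implements a symmetric 3-bit UQ and a 3-bit PoTQ whose cell widths double away from zero, and measures SQNR on zero-mean Gaussian data as the input standard deviation drifts from the design assumption. The midrise UQ layout, midpoint reconstruction levels, and the parameters delta_min and x_max are illustrative assumptions, not the paper's optimized design.

import numpy as np

def potq_quantize(x, delta_min, bits=3):
    # Symmetric power-of-two quantizer (illustrative sketch). Cell widths
    # per sign are delta_min * 2**i, i = 0..2**(bits-1)-1; reconstruction
    # at cell midpoints (an assumed level placement, not the paper's).
    k = 2 ** (bits - 1)                                  # 4 cells per sign for 3 bits
    widths = delta_min * 2.0 ** np.arange(k)             # delta, 2*delta, 4*delta, 8*delta
    edges = np.concatenate(([0.0], np.cumsum(widths)))   # decision thresholds
    mids = (edges[:-1] + edges[1:]) / 2.0                # representation levels
    mag = np.clip(np.abs(x), 0.0, edges[-1])             # saturate at the support edge
    idx = np.clip(np.searchsorted(edges, mag, side="right") - 1, 0, k - 1)
    return np.sign(x) * mids[idx]

def uq_quantize(x, x_max, bits=3):
    # Symmetric midrise uniform quantizer on [-x_max, x_max] (assumed layout).
    step = 2.0 * x_max / 2 ** bits
    xc = np.clip(x, -x_max, x_max - step / 2)
    return (np.floor(xc / step) + 0.5) * step

def sqnr_db(x, xq):
    # SQNR = signal power over quantization-noise power, in dB.
    return 10.0 * np.log10(np.mean(x ** 2) / np.mean((x - xq) ** 2))

# Robustness sweep: both quantizers are designed once (here, loosely for
# unit variance), then fed data whose actual std deviates from that design.
rng = np.random.default_rng(0)
for sigma in (0.25, 0.5, 1.0, 2.0, 4.0):
    x = rng.normal(0.0, sigma, 200_000)
    print(f"sigma={sigma:4.2f}  "
          f"UQ: {sqnr_db(x, uq_quantize(x, x_max=3.0)):6.2f} dB  "
          f"PoTQ: {sqnr_db(x, potq_quantize(x, delta_min=0.2)):6.2f} dB")

Flatter SQNR across the sigma sweep indicates greater robustness to variance mismatch, which is the property the paper analyzes theoretically for both 3-bit models.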
Keywords