Applied Sciences (Aug 2024)
Advances in the Neural Network Quantization: A Comprehensive Review
Abstract
Artificial intelligence technologies based on deep convolutional neural networks and large language models have achieved significant breakthroughs in many tasks, such as image recognition, object detection, semantic segmentation, and natural language processing, but they also face a conflict between the algorithms' high computational demands and the limited resources available on deployment platforms. Quantization, which converts floating-point neural networks into low-bit-width integer networks, is an essential technique for efficient deployment and cost reduction in edge computing. This paper analyzes existing quantization methods, reports the deployment accuracy achieved by advanced techniques, and discusses future challenges and trends in this domain.
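As a minimal illustration of the conversion described in the abstract, the sketch below shows symmetric per-tensor uniform quantization of floating-point weights to int8 and the corresponding dequantization. The function names and the use of NumPy are illustrative assumptions and do not correspond to any specific scheme surveyed in the paper.

```python
# Minimal sketch of symmetric uniform int8 quantization (illustrative only;
# not the specific scheme of any method reviewed in this paper).
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map a float tensor to int8 values using a single per-tensor scale."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the integer representation."""
    return q.astype(np.float32) * scale

# Example: quantize random weights and measure the reconstruction error.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print("max abs error:", np.max(np.abs(w - w_hat)))
```

The low-bit integer tensor and a single scale factor replace the full-precision weights, which is the basic trade-off between memory/compute cost and reconstruction error that the reviewed methods refine.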
Keywords