Complex & Intelligent Systems (Aug 2024)
A novel iteration scheme with conjugate gradient for faster pruning on transformer models
Abstract
Abstract Pre-trained models based on the Transformer architecture have significantly advanced research within the domain of Natural Language Processing (NLP) due to their superior performance and extensive applicability across multiple technological sectors. Despite these advantages, there is a significant challenge in optimizing these models for more efficient deployment. To be concrete, the existing post-training pruning frameworks of transformer models suffer from inefficiencies in the crucial stage of pruning accuracy recovery, which impacts the overall pruning efficiency. To address this issue, this paper introduces a novel and efficient iteration scheme with conjugate gradient in the pruning recovery stage. By constructing a series of conjugate iterative directions, this approach ensures each optimization step is orthogonal to the previous ones, which effectively reduces redundant explorations of the search space. Consequently, each iteration progresses effectively towards the global optimum, thereby significantly enhancing search efficiency. The conjugate gradient-based faster-pruner reduces the time expenditure of the pruning process while maintaining accuracy, demonstrating a high degree of solution stability and exceptional model acceleration effects. In pruning experiments conducted on the BERTBASE and DistilBERT models, the faster-pruner exhibited outstanding performance on the GLUE benchmark dataset, achieving a reduction of up to 36.27% in pruning time and a speed increase of up to 1.45× on an RTX 3090 GPU.
Keywords