A novel iteration scheme with conjugate gradient for faster pruning on transformer models

Jun Li; Yuchen Zhu; Kexue Sun

doi:10.1007/s40747-024-01595-w

Complex & Intelligent Systems (Aug 2024)


Affiliations

Jun Li: College of Electronic and Optical Engineering and College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications
Yuchen Zhu: College of Electronic and Optical Engineering and College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications
Kexue Sun: College of Electronic and Optical Engineering and College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications

Journal volume & issue: Vol. 10, no. 6
pp. 7863 – 7875

Abstract


Pre-trained models based on the Transformer architecture have significantly advanced research in Natural Language Processing (NLP) owing to their superior performance and broad applicability across many technological sectors. Despite these advantages, optimizing these models for efficient deployment remains a significant challenge. Specifically, existing post-training pruning frameworks for transformer models are inefficient in the crucial pruning accuracy recovery stage, which limits overall pruning efficiency. To address this issue, this paper introduces a novel and efficient iteration scheme based on the conjugate gradient method for the pruning recovery stage. By constructing a sequence of mutually conjugate search directions, the approach ensures that each optimization step is conjugate (A-orthogonal) to the previous ones, which reduces redundant exploration of the search space. Consequently, each iteration makes effective progress toward the global optimum, significantly improving search efficiency. The conjugate gradient-based faster-pruner reduces the time cost of pruning while maintaining accuracy, and it shows high solution stability and strong model acceleration. In pruning experiments on the BERT-base and DistilBERT models, the faster-pruner performed strongly on the GLUE benchmark, reducing pruning time by up to 36.27% and achieving a speedup of up to 1.45× on an RTX 3090 GPU.
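The abstract describes swapping the solver in the recovery stage for a conjugate gradient (CG) iteration. The following is a minimal sketch of textbook linear CG, not the authors' implementation. It assumes that the recovery step reduces to a symmetric positive-definite system, such as the normal equations X^T X w = X^T y of a layer-wise least-squares reconstruction. The function name `conjugate_gradient` and the random matrices `X` and `y` are hypothetical stand-ins, not the paper's code or data. The sketch also records the search directions so you can check that they are mutually A-conjugate (p_i^T A p_j ≈ 0). That property is the precise sense in which CG steps avoid re-exploring earlier directions.

```python
import numpy as np


def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite matrix A.

    Returns the solution and the list of search directions used,
    so their mutual A-conjugacy can be inspected.
    """
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    r = b - A @ x            # residual = negative gradient of 0.5 x^T A x - b^T x
    p = r.copy()             # first direction is plain steepest descent
    rs_old = r @ r
    directions = []
    max_iter = n if max_iter is None else max_iter
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:    # converged: residual norm below tolerance
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)    # exact line search along direction p
        x += alpha * p
        r -= alpha * Ap
        directions.append(p.copy())
        rs_new = r @ r
        beta = rs_new / rs_old       # Fletcher-Reeves coefficient
        p = r + beta * p             # next direction, A-conjugate to earlier ones
        rs_old = rs_new
    return x, directions


# Hypothetical recovery problem: fit weights w so that X w approximates y.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 8))   # stand-in for calibration features
y = rng.standard_normal(64)        # stand-in for the reconstruction target
A = X.T @ X                        # SPD Gram matrix (normal equations)
b = X.T @ y
w, dirs = conjugate_gradient(A, b)
```

In exact arithmetic, CG solves an n×n SPD system in at most n iterations because each new direction is conjugate to all previous ones. Here n = 8, so the default `max_iter = n` is enough for this well-conditioned example.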

Published in Complex & Intelligent Systems

ISSN: 2199-4536 (Print); 2198-6053 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.springer.com/journal/40747
