IEEE Access (Jan 2024)

Applications of Pruning Methods in Natural Language Processing

  • Marva Touheed,
  • Urooj Zubair,
  • Dilshad Sabir,
  • Ali Hassan,
  • Muhammad Fasih Uddin Butt,
  • Farhan Riaz,
  • Wadood Abdul,
  • Rashid Ayub

DOI
https://doi.org/10.1109/ACCESS.2024.3411776
Journal volume & issue
Vol. 12
pp. 89418–89438

Abstract


Deep neural networks (DNNs) are in high demand because of their widespread applications in natural language processing, image processing, and many other domains. However, due to over-parameterization, computational expense, and large memory requirements, DNN applications often demand substantial model resources. Strict latency requirements and limited memory availability are hurdles to deploying these technologies on devices. A common idea, therefore, is to reduce the size of DNN-based models without performance degradation using compression techniques. During the last few years, a great deal of progress has been made in the field of Natural Language Processing (NLP) using deep learning approaches. The objective of this research is to offer a thorough overview of the various pruning methods applied in the context of NLP. In this paper, we review several recent pruning-based schemes used for converting standard networks into compact and accelerated versions. Pruning is traditionally a technique for improving latency and reducing model size and computational complexity, which makes it a viable approach to the above-mentioned challenges. In general, these techniques are divided into two main categories: structured and unstructured pruning methods. Structured pruning methods are further classified into filter, channel, layer, block, and movement pruning, whereas neuron, magnitude-based, and iterative pruning fall into the category of unstructured pruning. For each method, we discuss the related metrics and benchmarks, and then review recent work in detail, providing insightful analysis of the performance, related applications, and pros and cons. A comparative analysis is then provided to examine the differences among approaches. Finally, the paper concludes with possible future directions and some technical challenges.
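To make the taxonomy concrete, the unstructured, magnitude-based pruning the abstract mentions can be sketched in a few lines: individual weights with the smallest absolute values are zeroed until a target sparsity is reached. This is a minimal illustrative NumPy sketch, not code from the paper; the function name `magnitude_prune` and the 50% sparsity target are assumptions for demonstration.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Unstructured magnitude pruning (illustrative sketch):
    zero out the `sparsity` fraction of entries with the
    smallest absolute values, leaving the rest unchanged."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))          # stand-in for a dense layer's weights
pruned = magnitude_prune(w, 0.5)     # half of the entries are zeroed
print(np.mean(pruned == 0))
```

Structured pruning, by contrast, would remove whole rows, columns, or blocks of such a matrix (e.g., entire neurons or filters), yielding genuinely smaller dense tensors rather than a sparse mask.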

Keywords