Dianzi Jishu Yingyong (Jan 2023)
Knowledge distillation of news text classification based on BERT-CNN
Abstract
In recent years, with the arrival of the big data era, large volumes of unstructured text and other semantically complex data have entered daily life, making classification tasks increasingly difficult. Enabling computers to classify this information accurately has become an important research task. Within this field, Chinese news text classification plays a crucial role in managing national public opinion, understanding users' daily behavior, and predicting their future speech and behavior. To address the shortcomings of existing news text classification models, namely their large number of parameters and long training time, a knowledge distillation method based on BERT-CNN is proposed to reduce training time while preserving model performance as much as possible, striking a compromise between the two. Following the principles of model compression, BERT is used as the teacher model and CNN as the student model: BERT is trained first, after which the student model learns to generalize the capability of the teacher. Experimental results show that the number of parameters is compressed to about 1/82 of the original and the training time is reduced to about 1/670, with a performance loss of only about 2.09%.
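As a rough illustration of the teacher-student setup described above, the sketch below shows a standard Hinton-style soft-label distillation objective in PyTorch, in which a CNN student is trained to match the softened output distribution of a BERT teacher while also fitting the true labels. The temperature T, the weight alpha, and the function name distillation_loss are illustrative assumptions; the abstract does not specify the paper's exact loss formulation or the student CNN architecture.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hypothetical distillation objective: the CNN student imitates the
    temperature-softened class distribution of the BERT teacher while
    also being supervised by the gold labels."""
    # Soft-target term: KL divergence between the student's and the
    # (fixed) teacher's temperature-scaled class distributions,
    # rescaled by T^2 to keep gradient magnitudes comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage sketch: teacher_logits would come from the trained BERT teacher
# evaluated under torch.no_grad(); student_logits from the lightweight
# TextCNN student being optimized.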
Keywords