A Text Classification Model via Multi-Level Semantic Features

Keji Mao; Jinyu Xu; Xingda Yao; Jiefan Qiu; Kaikai Chi; Guanglin Dai

doi:10.3390/sym14091938

Symmetry (Sep 2022)

Affiliations

Keji Mao: College of Computer Science and Technology College of Software, Zhejiang University of Technology, Hangzhou 310023, China
Jinyu Xu: College of Computer Science and Technology College of Software, Zhejiang University of Technology, Hangzhou 310023, China
Xingda Yao: College of Computer Science and Technology College of Software, Zhejiang University of Technology, Hangzhou 310023, China
Jiefan Qiu: College of Computer Science and Technology College of Software, Zhejiang University of Technology, Hangzhou 310023, China
Kaikai Chi: College of Computer Science and Technology College of Software, Zhejiang University of Technology, Hangzhou 310023, China
Guanglin Dai: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

DOI: https://doi.org/10.3390/sym14091938
Journal volume & issue: Vol. 14, no. 9, p. 1938

Abstract

Text classification is a core task in NLP (Natural Language Processing) and has been a focus of research for years. News classification, a branch of text classification, involves texts with complex structure, large amounts of information, and considerable length, all of which tend to lower classification accuracy. To improve the classification accuracy of Chinese news texts, we present a text classification model based on multi-level semantic features. First, we add a category correlation coefficient to TF-IDF (Term Frequency-Inverse Document Frequency) and a frequency concentration coefficient to CHI (Chi-Square), and extract keyword semantic features with the improved algorithms. Then, we extract local semantic features with a symmetric-channel TextCNN and global semantic features with a BiLSTM with attention. Finally, we fuse the three kinds of semantic features to predict text categories. Experiments on THUCNews, LTNews and MCNews show that the presented method achieves accuracies of 98.01%, 90.95% and 94.24%, respectively. With two orders of magnitude fewer parameters than BERT, the model improves on the BERT+FC baseline by 1.27%, 1.2%, and 2.81%, respectively.
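For readers unfamiliar with the two keyword-scoring baselines named in the abstract, the following is a minimal sketch of plain TF-IDF and the standard chi-square statistic. It does not reproduce the paper's improved algorithms: the category correlation coefficient and the frequency concentration coefficient are not defined in the abstract, so they are left out. The function names `tf_idf` and `chi_square` and the toy corpus are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Plain TF-IDF (no category correlation coefficient).

    docs: list of token lists. Returns one {term: score} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def chi_square(docs, labels, term, category):
    """Standard chi-square statistic (no frequency concentration coefficient)
    between the presence of `term` and membership in `category`."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc
        in_category = label == category
        if has_term and in_category:
            a += 1  # term present, document in category
        elif has_term:
            b += 1  # term present, document in another category
        elif in_category:
            c += 1  # term absent, document in category
        else:
            d += 1  # term absent, document in another category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


# Hypothetical toy corpus, for illustration only.
docs = [
    ["goal", "match", "team"],
    ["stock", "market", "team"],
    ["goal", "score"],
    ["stock", "price"],
]
labels = ["sports", "finance", "sports", "finance"]

keyword_scores = tf_idf(docs)
goal_vs_sports = chi_square(docs, labels, "goal", "sports")
```

In this toy corpus, a term that appears only in one category ("goal") receives a high chi-square score for that category, while a term spread evenly across categories ("team") scores zero. Both baselines ignore how a term is distributed within and across categories, which appears to be the kind of weakness the added coefficients are meant to address.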

Published in Symmetry

ISSN: 2073-8994 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/symmetry/
