IEEE Access (Jan 2020)

A Hybrid Classification Method via Character Embedding in Chinese Short Text With Few Words

  • Yi Zhu,
  • Yun Li,
  • Yongzheng Yue,
  • Jipeng Qiang,
  • Yunhao Yuan

DOI
https://doi.org/10.1109/ACCESS.2020.2994450
Journal volume & issue
Vol. 8
pp. 92120–92128

Abstract

Recent decades have witnessed significant progress in short text classification research. However, most existing methods focus only on texts that contain dozens of words, such as tweets or microblog posts, and do not take short texts with few words, such as news headlines or invoice names, into consideration. Meanwhile, contemporary short text classification methods either expand the features of a short text with an external corpus or learn the feature representation from all the texts, without fully considering the differences between the words within a short text. Notably, unlike document classification or traditional short text classification, the classification of a short text with few words is usually determined by a few specific keywords. To address these problems, this paper proposes a hybrid classification method combining an Attention mechanism and Feature selection via Character embedding for Chinese short texts with few words, called AFC. More specifically, character embeddings are first computed to represent Chinese short texts with few words, which takes full advantage of the short text information without requiring an external corpus. Second, an attention-based LSTM is introduced to project the data into a weighted feature representation space, which gives the keywords that determine the class greater weight. Furthermore, the semantic similarity between the content and the class label information is calculated for feature selection, which reduces the possible negative influence of redundant information on classification. Experiments on real-world datasets demonstrate the effectiveness of our method compared with other competing methods.
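The abstract only outlines the architecture, so the following is a minimal, illustrative PyTorch sketch (not the authors' implementation) of the pipeline it describes: character-level embeddings fed to a bidirectional LSTM whose hidden states are pooled with an attention layer before classification. All names and hyperparameters here (CHAR_VOCAB, EMB_DIM, HIDDEN, NUM_CLASSES) are assumptions, and the semantic-similarity feature selection step is omitted.

# Sketch of a character-embedding + attention-LSTM classifier for short Chinese texts.
# Sizes and names are illustrative assumptions, not values from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

CHAR_VOCAB = 5000    # assumed size of the Chinese character vocabulary
EMB_DIM = 128        # assumed character-embedding dimension
HIDDEN = 64          # assumed LSTM hidden size
NUM_CLASSES = 10     # assumed number of short-text categories

class AttnCharLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(CHAR_VOCAB, EMB_DIM, padding_idx=0)
        self.lstm = nn.LSTM(EMB_DIM, HIDDEN, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * HIDDEN, 1)       # scores each character position
        self.out = nn.Linear(2 * HIDDEN, NUM_CLASSES)

    def forward(self, char_ids):                   # char_ids: (batch, seq_len)
        h, _ = self.lstm(self.embed(char_ids))     # (batch, seq_len, 2*HIDDEN)
        # Attention weights emphasize the few key characters that decide the class.
        w = F.softmax(self.attn(h).squeeze(-1), dim=1)      # (batch, seq_len)
        context = torch.bmm(w.unsqueeze(1), h).squeeze(1)   # weighted sum of states
        return self.out(context)                   # class logits

model = AttnCharLSTM()
dummy = torch.randint(1, CHAR_VOCAB, (4, 12))      # e.g. 4 headlines of 12 characters
print(model(dummy).shape)                          # torch.Size([4, 10])

Working at the character level avoids the word segmentation errors that very short Chinese texts tend to cause, and the learned attention weights give a per-character weighting in the spirit of the keyword emphasis the abstract describes.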

Keywords