IEEE Access (Jan 2023)
Enhanced Word-Unit Broad Learning System With Sememes
Abstract
High accuracy in text classification can be achieved by jointly learning multiple sources of information, such as word sequence and word importance. In this study, we propose a novel learning framework for text classification, called the Word-unit Broad Learning System (BLS). The Word-unit BLS builds on a flat neural network known as the BLS and offers three key advantages. First, it achieves higher accuracy and shorter training time than popular machine learning methods while learning sequence information and word importance simultaneously. Second, we incorporate a multi-layer perceptron with attention aggregation, along with position encoding, into the feature-mapped layer to capture the latent relationship between each word and its contextual information in a global context. Third, we introduce a novel approach that enhances word representations by employing sememes in the enhancement-node layer, thereby improving the feature distribution of each word in the vector space. The effectiveness of the proposed framework was evaluated through experiments on four datasets covering various types of text classification. The results show that the Word-unit BLS achieves 8.26% higher accuracy than Naive Bayes while requiring 1/33 of the training time. Furthermore, compared with traditional BLS models, the Word-unit BLS is better at learning sequence information. The effectiveness of sememe enhancement of word representations is also demonstrated, particularly on large-scale datasets.
Keywords