IEEE Access (Jan 2023)

A Novel Text Classification Model Combining Time Correlation Principle and Rough Set Theory

  • Dejun Zhang

DOI
https://doi.org/10.1109/ACCESS.2023.3332909
Journal volume & issue
Vol. 11
pp. 135797 – 135810

Abstract


This research designs a model for feature classification and information extraction of literary texts based on the temporal correlation principle and rough set theory. Through an in-depth study of time-series correlation algorithms and of text classification techniques based on rough set theory, we put forward a new text classification method. First, we use lexical space feature vectors as the input channel of the rough set model and extract sentence-level features of literary texts from the spatial relationships between words. Second, to address the low efficiency of the KNN text classification algorithm, we propose a KNN classification algorithm for literary texts based on rough set approximation sets, which substantially improves classification efficiency while maintaining classification accuracy. Experiments confirm the effectiveness of this algorithm, which also advances the practical application of rough set theory. In addition, we propose an improved attribute reduction algorithm for literary text classification that combines feature selection and information extraction with the correlation among the feature items generated from text descriptions and with the evaluation criteria of rough set theory itself. This algorithm produces a reduced attribute set of greater significance and thereby improves the text recognition rate. Comparative experiments show that the improved method increases the number of applicable texts by 12.86%, a clear improvement. In summary, our model combines the temporal correlation principle with rough set theory and provides a new method for feature classification and information extraction of literary texts. The results demonstrate that the method achieves better classification results when applied to collections of literary texts.
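The abstract does not spell out how rough set approximations speed up KNN or how the improved attribute reduction works. The Python sketch below shows, under stated assumptions, the standard rough set building blocks such a pipeline rests on. It assumes documents are already discretized into binary term-presence attributes. The reduct is found with the classic greedy QuickReduct heuristic driven by the dependency degree γ, a generic stand-in and not the paper's improved algorithm. `rough_knn` is a hypothetical shortcut: it labels a query directly when its equivalence class lies in the lower approximation of a single category, and otherwise falls back to Hamming-distance KNN. All function names and the toy data are illustrative assumptions.

```python
from collections import Counter, defaultdict

def partition(rows, attrs):
    """Equivalence classes of the indiscernibility relation IND(attrs): value tuple -> row indices."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return dict(blocks)

def dependency(rows, labels, attrs):
    """Dependency degree gamma = |POS_attrs(label)| / |U|.

    The positive region is the union of label-consistent classes, i.e. the
    union of the lower approximations of every category.
    """
    pos = sum(len(m) for m in partition(rows, attrs).values()
              if len({labels[i] for i in m}) == 1)
    return pos / len(rows)

def quick_reduct(rows, labels):
    """Greedy QuickReduct (generic stand-in): add the attribute that raises gamma
    the most until the subset matches the dependency of the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max((a for a in all_attrs if a not in reduct),
                   key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return reduct

def hamming(u, v):
    """Number of attribute positions where two binary vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def rough_knn(rows, labels, reduct, blocks, x, k=3):
    """Hypothetical approximation-based shortcut for KNN (not the paper's exact method)."""
    members = blocks.get(tuple(x[a] for a in reduct), [])
    member_labels = {labels[i] for i in members}
    if len(member_labels) == 1:
        # x's class lies in the lower approximation of one category: no distances needed
        return next(iter(member_labels))
    # Boundary region (mixed labels) or unseen class: plain KNN, searching only the
    # matching class when it is non-empty, otherwise the whole training set
    candidates = members if members else range(len(rows))
    nearest = sorted(candidates, key=lambda i: hamming(rows[i], x))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Toy data (hypothetical): binary presence of four terms in five documents
rows = [(1, 0, 1, 0), (1, 1, 1, 0), (0, 1, 0, 1), (0, 0, 0, 1), (1, 0, 0, 1)]
labels = ["poetry", "poetry", "prose", "prose", "prose"]

reduct = quick_reduct(rows, labels)   # -> [2]: term 2 alone separates the two classes
blocks = partition(rows, reduct)      # built once, reused for every query
print(reduct, rough_knn(rows, labels, reduct, blocks, (0, 1, 1, 1)))
```

In this sketch the efficiency gain comes from precomputing `blocks` once. Queries whose class is label-consistent are resolved by a dictionary lookup, and only boundary or unseen cases pay for distance computations.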

Keywords