Engineering and Applied Science Research (Jul 2021)

Concept-based one-class SVM classifier with supervised term weighting scheme for imbalanced sentiment classification

  • Khanista Namee,
  • Jantima Polpinij

Journal volume & issue
Vol. 48, no. 5
pp. 604 – 613

Abstract

Class imbalance is a key issue in sentiment classification. Many studies have proposed improvements for imbalanced sentiment classification, but the problem remains a major challenge. This paper proposes a method, called “concept-based one-class SVM classifier”, that addresses imbalanced sentiment classification by combining three main techniques. First, we apply the Word2Vec and PageRank algorithms to extract “concepts” and their related terms (called “members”) embedded in texts. The resulting corpus of “concepts” is then used to prepare the dataset by replacing words with their “concepts”, which reduces both term ambiguity and the size of the word vectors. Second, supervised term weighting (STW) schemes are applied to determine the importance of a word in a document of a specific class, reflecting the class-distinguishing power of each term. Finally, the one-class support vector machine (SVM) algorithm is used to build the sentiment classifier. One-class SVM has proved useful for imbalanced data classification, especially when the minority class lacks structure and is predominantly composed of small disjuncts or outliers. By combining these techniques, our proposed method may be able to identify and distinguish the characteristics of each class, especially in an imbalanced data scenario. When validated on a hotel review dataset in experiments with different imbalance ratios, the proposed method returned satisfactory recall, precision, and F1 scores. We then selected the best model generated by our method and compared its results with those of the state-of-the-art method. Our proposed method outperformed the state-of-the-art method, improving the F1 score by 3.19%. Moreover, in terms of computational processing time, our proposed method is faster than the state-of-the-art method.
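To make the supervised term weighting step concrete, the following is a minimal Python sketch of one well-known STW scheme, tf.rf (term frequency multiplied by relevance frequency). The abstract does not state which STW schemes were used, so the choice of tf.rf is an assumption made purely for illustration; the function name `tf_rf_weights` and the toy hotel-review tokens are likewise hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def tf_rf_weights(docs, labels, positive_label):
    """Weight each term of each document with tf.rf (an assumed STW scheme).

    rf(t) = log2(2 + a / max(1, c)), where a is the number of positive-class
    documents containing t and c is the number of other documents containing t.
    """
    pos_df = Counter()  # document frequency of each term in the positive class
    neg_df = Counter()  # document frequency of each term in all other documents
    for tokens, label in zip(docs, labels):
        target = pos_df if label == positive_label else neg_df
        target.update(set(tokens))  # count each term once per document

    def rf(term):
        return math.log2(2 + pos_df[term] / max(1, neg_df[term]))

    weighted = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term frequency within this document
        weighted.append({term: count * rf(term) for term, count in tf.items()})
    return weighted


# Toy, hypothetical data: three positive reviews and two negative ones.
docs = [
    ["clean", "room", "friendly", "staff"],
    ["friendly", "staff", "great", "location"],
    ["clean", "room", "great", "breakfast"],
    ["dirty", "room", "rude", "staff"],
    ["small", "room", "rude", "staff"],
]
labels = ["pos", "pos", "pos", "neg", "neg"]
weights = tf_rf_weights(docs, labels, positive_label="pos")
```

In this toy data, "friendly" occurs only in positive reviews while "room" occurs in both classes, so "friendly" receives the larger weight. This is the sense in which a supervised weight reflects the class-distinguishing power of a term.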

Keywords