Short Text Classification with Tolerance-Based Soft Computing Method

Vrushang Patel; Sheela Ramanna; Ketan Kotecha; Rahee Walambe

doi:10.3390/a15080267

Algorithms (Jul 2022)

Affiliations

Vrushang Patel: Deloitte Inc., Calgary, AB T2P 0R8, Canada
Sheela Ramanna: Department of Applied Computer Science, University of Winnipeg, Winnipeg, MB R3B 2E9, Canada
Ketan Kotecha: Symbiosis Institute of Technology (SIT), Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis International (Deemed University), Pune 412115, India
Rahee Walambe: Symbiosis Institute of Technology (SIT), Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis International (Deemed University), Pune 412115, India

DOI: https://doi.org/10.3390/a15080267
Journal volume & issue: Vol. 15, no. 8, p. 267

Abstract

Text classification aims to assign labels to textual units such as documents, sentences and paragraphs. Applications of text classification include sentiment classification and news categorization. In this paper, we present a soft computing technique-based algorithm (TSC) to classify the sentiment polarity of tweets as well as news categories from text. The TSC algorithm is a supervised learning method based on tolerance near sets. Near set theory is a more recent soft computing methodology inspired by rough sets. Instead of using the set approximation operators of rough sets to induce tolerance classes, it induces the tolerance classes directly from the feature vectors using a tolerance level parameter and a distance function. The proposed TSC algorithm takes advantage of recent advances in efficient feature extraction and vector generation from pre-trained bidirectional transformer encoders to create tolerance classes. Experiments were performed on ten well-researched datasets that include both short and long text. Both pre-trained SBERT and TF-IDF vectors were used in the experimental analysis. Results from transformer-based vectors demonstrate that TSC outperforms five well-known machine learning algorithms on four datasets and is comparable on all other datasets, based on the weighted F1, Precision and Recall scores. The highest AUC-ROC (Area Under the Receiver Operating Characteristic Curve) score was obtained on two datasets, with comparable scores on six other datasets. The highest AUC-PRC (Area Under the Precision–Recall Curve) score was obtained on one dataset, with comparable scores on four other datasets. Additionally, a Wilcoxon signed-ranks test comparing the weighted F1-scores of TSC and the other classifiers showed significant differences in most comparisons.
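
The idea of inducing tolerance relations directly from feature vectors with a tolerance level and a distance function can be illustrated with a minimal sketch. This is not the TSC algorithm from the paper, whose details the abstract does not give. It is a simplified stand-in that assumes cosine distance, uses the tolerance neighbourhood of a query vector (all training vectors within the tolerance level eps) rather than full tolerance classes, and predicts by majority vote with a nearest-neighbour fallback. The function names, the toy 2-D vectors and the eps values are all hypothetical; in practice the vectors would come from an encoder such as SBERT or from TF-IDF.

```python
from collections import Counter

import numpy as np


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v). Assumes neither vector is all zeros."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def tolerance_neighbourhood(x, vectors, eps):
    """Indices of all vectors whose distance to x is at most eps."""
    return [i for i, v in enumerate(vectors) if cosine_distance(x, v) <= eps]


def predict(x, train_vectors, train_labels, eps):
    """Majority label among training vectors within tolerance eps of x.

    Assumption for this sketch: if no training vector is within eps,
    fall back to the label of the single nearest training vector.
    """
    idx = tolerance_neighbourhood(x, train_vectors, eps)
    if not idx:
        dists = [cosine_distance(x, v) for v in train_vectors]
        return train_labels[int(np.argmin(dists))]
    return Counter(train_labels[i] for i in idx).most_common(1)[0][0]


# Toy 2-D "embeddings" (hypothetical stand-ins for SBERT or TF-IDF vectors).
train_vectors = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
train_labels = ["positive", "positive", "negative", "negative"]

query = np.array([0.95, 0.15])
print(predict(query, train_vectors, train_labels, eps=0.05))
```

A smaller eps makes the neighbourhoods tighter and the fallback more frequent, while a larger eps merges vectors from different categories into one neighbourhood, so the tolerance level would normally be tuned per dataset.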

Published in Algorithms

ISSN: 1999-4893 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/algorithms
