A Methodology Combining Cosine Similarity with Classifier for Text Classification

Kwangil Park; June Seok Hong; Wooju Kim

doi:10.1080/08839514.2020.1723868

Applied Artificial Intelligence (Apr 2020)

Affiliations

Kwangil Park: Yonsei University
June Seok Hong: Kyonggi University
Wooju Kim: Yonsei University

DOI: https://doi.org/10.1080/08839514.2020.1723868
Journal volume & issue: Vol. 34, no. 5
pp. 396 – 411

Abstract

Text classification has received significant attention in recent years because of the proliferation of digital documents, and it is widely used in applications such as filtering and recommendation. Consequently, many approaches, including those based on statistical theory, machine learning, and classifier performance improvement, have been proposed to improve text classification performance. Among these approaches, the centroid-based classifier, multinomial naïve Bayesian (MNB), support vector machines (SVM), and convolutional neural networks (CNN) are commonly used. In this paper, we introduce a cosine similarity-based methodology for improving performance. The methodology combines cosine similarity (between a test document and fixed categories) with conventional classifiers such as MNB, SVM, and CNN to improve their accuracy; we call the conventional classifiers combined with cosine similarity enhanced classifiers. We applied the enhanced classifiers to well-known datasets (20NG, R8, R52, Cade12, and WebKB) and evaluated their performance in terms of accuracy computed from the confusion matrix; the enhanced classifiers showed significant increases in accuracy. Moreover, through experiments, we identified which of the two knowledge representation techniques considered, word count and term frequency-inverse document frequency (TFIDF), is more suitable in terms of classifier performance.
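
The general idea of combining cosine similarity with a conventional classifier can be illustrated with a minimal Python sketch. The abstract does not state the exact combination rule, so everything specific here is an assumption made for illustration: each category is represented by the summed word-count vector of its training documents, and the base classifier's class probabilities are multiplied by the cosine similarity between the test document and each category. The function names (cosine, category_vectors, enhanced_predict), the toy data, and the base probabilities are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def category_vectors(docs, labels):
    """Sum the word-count vectors of the training documents in each category."""
    vectors = defaultdict(Counter)
    for text, label in zip(docs, labels):
        vectors[label].update(text.lower().split())
    return dict(vectors)


def enhanced_predict(text, class_probs, cat_vecs):
    """Rescale base-classifier probabilities by cosine similarity.

    The multiplicative rule is an assumption for illustration only;
    it is not confirmed to be the rule used in the paper.
    """
    doc_vec = Counter(text.lower().split())
    scores = {
        label: class_probs.get(label, 0.0) * cosine(doc_vec, vec)
        for label, vec in cat_vecs.items()
    }
    return max(scores, key=scores.get), scores


# Toy training data (hypothetical).
train_docs = [
    "stocks fall as markets slide",
    "team wins the final match",
    "markets rally on earnings",
    "coach praises the team",
]
train_labels = ["finance", "sports", "finance", "sports"]
cat_vecs = category_vectors(train_docs, train_labels)

# Hypothetical output of a base classifier that is nearly undecided.
base_probs = {"finance": 0.52, "sports": 0.48}
label, scores = enhanced_predict("the team wins", base_probs, cat_vecs)
print(label, scores)
```

In this toy case the base classifier slightly prefers "finance", but the test document shares no words with the finance category, so its cosine similarity is zero and the combined score selects "sports". Swapping the word-count vectors for TFIDF weights would change only how the dict vectors are built, not the combination step.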

Published in Applied Artificial Intelligence

ISSN: 0883-9514 (Print); 1087-6545 (Online)
Publisher: Taylor & Francis Group
Country of publisher: United Kingdom
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Science: Science (General): Cybernetics
Website: https://www.tandfonline.com/journals/uaai
