Machine Learning-Based Text Classification Comparison: Turkish Language Context

Yehia Ibrahim Alzoubi; Ahmet E. Topcu; Ahmed Enis Erkaya

doi:10.3390/app13169428

Applied Sciences (Aug 2023)

Affiliations

Yehia Ibrahim Alzoubi: College of Business Administration, American University of the Middle East, Egaila 54200, Kuwait
Ahmet E. Topcu: College of Engineering and Technology, American University of the Middle East, Egaila 54200, Kuwait
Ahmed Enis Erkaya: TÜBİTAK BİLGEM Software Technologies Research Institute, Ankara 06100, Türkiye

DOI: https://doi.org/10.3390/app13169428
Journal volume & issue: Vol. 13, no. 16
p. 9428

Abstract

The growth of textual data that has accompanied the increased use of online services, together with the ease of accessing these data, has led to a rise in the number of text classification studies. Text classification plays a significant role in several domains, such as news categorization, spam detection, and sentiment analysis. This work focuses on the classification of Turkish text, since only a few studies have been conducted in this context. To evaluate the proposed techniques, we use data obtained from customer inquiries submitted to an institution. Each inquiry is assigned a class specified in the institution’s internal procedures. The Support Vector Machine, Naïve Bayes, Long Short-Term Memory, Random Forest, and Logistic Regression algorithms were used to classify the data. The performance of these techniques was analyzed before and after data preparation, and the results were compared. The Long Short-Term Memory technique achieved the highest accuracy, 84%, surpassing the best result among the traditional techniques, which was 78% for the Support Vector Machine. The techniques performed better once the number of categories in the dataset was reduced. Moreover, the findings show that data preparation and the consistency between the number of classes and the amount of training data are significant factors influencing the techniques’ performance. The findings of this study and the text classification techniques used may be applied to data in languages other than Turkish.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
