Topic Modeling for Amharic User Generated Texts

Girma Neshir; Andreas Rauber; Solomon Atnafu

doi:10.3390/info12100401

Information (Sep 2021)

Affiliations

Girma Neshir: IT Doctoral Program, Addis Ababa University, Addis Ababa 28762, Ethiopia
Andreas Rauber: Institute of Information Systems Engineering, Technical University of Vienna, Favoritenstraße 9-11/194-01, A-1040 Vienna, Austria
Solomon Atnafu: Department of Computer Science, Addis Ababa University, Addis Ababa 1176, Ethiopia

DOI: https://doi.org/10.3390/info12100401
Journal volume & issue: Vol. 12, no. 10, p. 401

Abstract

Topic modeling is a statistical process that derives latent themes from large collections of text. There are three approaches to topic modeling: unsupervised, semi-supervised, and supervised. In this work, we develop a supervised topic model for an Amharic corpus. We also investigate the effect of stemming on topic detection using Term Frequency Inverse Document Frequency (TF-IDF) features, Latent Dirichlet Allocation (LDA) features, and a combination of the two feature sets, with four supervised machine learning methods: Support Vector Machine (SVM), Naive Bayes (NB), Logistic Regression (LR), and Neural Networks (NN). We evaluate our approach on an Amharic corpus of 14,751 documents in ten topic categories. Both qualitative and quantitative analyses of the results show that the proposed supervised topic detection performs best, with an accuracy of 88%, when SVM is trained on state-of-the-art TF-IDF word features with the Synthetic Minority Over-sampling Technique (SMOTE) applied and no stemming. The results also show that text features with stemming slightly improve the performance of the topic classifier over features with no stemming.
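To make two of the ingredients named in the abstract concrete, the following is a minimal sketch of TF-IDF weighting and SMOTE-style oversampling in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the toy English tokens, the labels, the function names `tfidf_vectors` and `smote`, the unsmoothed `tf * log(N / df)` weighting, and the parameters `k` and `seed` are all hypothetical choices made here. The abstract does not specify which TF-IDF variant, neighbor count, or software was used.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors); each vector holds tf * idf weights.

    Assumption: tf is the relative term frequency in the document and
    idf is log(N / df) with no smoothing. Documents must be non-empty.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[tok] / len(doc) * idf[tok] for tok in vocab])
    return vocab, vectors


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbors (SMOTE idea).

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Nearest minority neighbors of x by Euclidean distance, excluding x.
        others = [p for p in minority if p is not x]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbors)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Hypothetical toy corpus (placeholder tokens, not the Amharic corpus).
docs = [
    ["sport", "match", "goal"],
    ["sport", "team", "goal"],
    ["sport", "match", "team"],
    ["vote", "party", "election"],
    ["vote", "party", "vote"],
]
labels = ["sport", "sport", "sport", "politics", "politics"]

vocab, vectors = tfidf_vectors(docs)
minority = [v for v, y in zip(vectors, labels) if y == "politics"]
new_points = smote(minority, n_new=1)
```

In a full pipeline the synthetic points would be added to the training set only, with the label of the minority class, before fitting a classifier such as an SVM. Oversampling before the train/test split would leak information into the evaluation.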

Published in Information

ISSN: 2078-2489 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/information/
