Pre-Trained Transformer-Based Models for Text Classification Using Low-Resourced Ewe Language
Victor Kwaku Agbesi,
Wenyu Chen,
Sophyani Banaamwini Yussif,
Md Altab Hossin,
Chiagoziem C. Ukwuoma,
Noble A. Kuadey,
Colin Collinson Agbesi,
Nagwan Abdel Samee,
Mona M. Jamjoom,
Mugahed A. Al-antari
Affiliations
Victor Kwaku Agbesi
School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China
Wenyu Chen
School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China
Sophyani Banaamwini Yussif
School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China
Md Altab Hossin
School of Innovation and Entrepreneurship, Chengdu University, No. 2025 Chengluo Avenue, Chengdu 610106, China
Chiagoziem C. Ukwuoma
College of Nuclear Technology and Automation Engineering, Chengdu University of Technology, Chengdu 610059, China
Noble A. Kuadey
School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China
Colin Collinson Agbesi
Faculty of Applied Science and Technology, Koforidua Technical University, Koforidua P.O. Box KF-981, Ghana
Nagwan Abdel Samee
Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
Mona M. Jamjoom
Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
Mugahed A. Al-antari
Department of Artificial Intelligence, College of Software & Convergence Technology, Daeyang AI Center, Sejong University, Seoul 05006, Republic of Korea
Abstract
Although a few attempts have been made to automatically crawl Ewe text from online news portals and magazines, the African Ewe language remains underdeveloped in NLP despite its rich morphology and complex, unique structure. The crawled Ewe texts are of poor quality, unbalanced, and predominantly religious in nature, which makes them difficult to preprocess and unsuitable for NLP tasks with current transformer-based language models. In this study, we present a well-preprocessed Ewe dataset for low-resource text classification to the research community. Additionally, we develop Ewe-specific word embeddings to capture the semantic representation of this low-resource language. Finally, we fine-tune seven transformer-based models, namely BERT-base (cased and uncased), DistilBERT-base (cased and uncased), RoBERTa, DistilRoBERTa, and DeBERTa, on the proposed preprocessed Ewe dataset. Extensive experiments indicate that the fine-tuned BERT-base-cased model outperforms all baseline models, achieving an accuracy of 0.972, a precision of 0.969, a recall of 0.970, a loss of 0.021, and an F1-score of 0.970. This performance demonstrates that the model captures the semantics of low-resourced Ewe better than the other models, establishing the fine-tuned BERT-base-cased model as the benchmark for the proposed Ewe dataset.
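To illustrate the fine-tuning procedure summarized above, the following is a minimal sketch using the Hugging Face Transformers and Datasets libraries. It is not the authors' actual training pipeline: the file name ewe_news.csv, the "text" and "label" column names, NUM_LABELS, and all hyperparameters are illustrative assumptions.

# A minimal, illustrative sketch of fine-tuning BERT-base-cased for Ewe text
# classification with the Hugging Face Transformers and Datasets libraries.
# The file "ewe_news.csv" (with "text" and "label" columns), NUM_LABELS, and
# all hyperparameters are assumptions for illustration only.
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

NUM_LABELS = 4  # placeholder: number of classes in the Ewe dataset

# Load the (hypothetical) labeled Ewe corpus and hold out 20% for evaluation.
dataset = load_dataset("csv", data_files="ewe_news.csv")["train"]
dataset = dataset.train_test_split(test_size=0.2, seed=42)

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize(batch):
    # Truncate/pad every Ewe sentence to a fixed length of 128 subword tokens.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=NUM_LABELS)

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # Convert logits to class predictions and report accuracy.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=preds, references=labels)

args = TrainingArguments(
    output_dir="ewe-bert-base-cased",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["test"],
    compute_metrics=compute_metrics,
)

trainer.train()
print(trainer.evaluate())  # accuracy on the held-out split

The same loop applies to the other six baselines by swapping the checkpoint name (e.g., "bert-base-uncased", "roberta-base", "distilroberta-base"); only the tokenizer and model identifiers change.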