Franklin Open (Dec 2024)
A part of speech tagger for Yoruba language text using deep neural network
Abstract
The pursuit of advancing the Yoruba language in the realm of technology has underscored the necessity for an efficient foundational natural language processing (NLP) tool, notably the part-of-speech (POS) tagger. POS tagging serves as a building block for numerous NLP applications, as its capacity to recognize and assign appropriate syntactic tags to words is pivotal to the efficiency of NLP solutions. However, the existing POS taggers for Yoruba either rely on rule-based approaches, which are limited by the comprehensiveness and accuracy of the defined rules, or on stochastic approaches, which introduce considerable redundancy when generating sequences of tags. Hence, this paper advocates the use of machine learning models to develop robust and highly effective POS taggers tailored to Yoruba text. Specifically, a Feed Forward Deep Neural Network (FF-DNN) was trained on a curated Yoruba tag set sourced from Yoruba religious and dictionary texts, comprising 20,795 words alongside their corresponding POS tags. Evaluation of the model demonstrates an accuracy of 99% and a precision of 98% in predicting appropriate tags, outperforming Random Forest (RF), Logistic Regression (LR), and K-Nearest Neighbour (k-NN) machine learning models.
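To illustrate the general shape of a feed-forward POS tagger of the kind the abstract describes, the following is a minimal sketch in Python using Keras. It assumes a context-window representation of each word encoded as integer indices; the vocabulary size, window width, layer sizes, and number of tags are illustrative placeholders, not the configuration reported in the paper.

```python
# Minimal sketch of a feed-forward (FF-DNN) POS tagger.
# VOCAB_SIZE, NUM_TAGS, WINDOW, and the layer sizes are assumptions for
# illustration only; they are not the authors' reported settings.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, Flatten, Dense

VOCAB_SIZE = 5000   # assumed vocabulary size (placeholder)
NUM_TAGS = 12       # assumed number of Yoruba POS tags (placeholder)
WINDOW = 2          # words of context on each side of the target word

model = Sequential([
    # Each training example is the target word plus its context window,
    # given as integer word indices.
    Input(shape=(2 * WINDOW + 1,)),
    Embedding(input_dim=VOCAB_SIZE, output_dim=64),
    Flatten(),
    Dense(128, activation="relu"),
    Dense(64, activation="relu"),
    # Softmax over the tag set gives one POS prediction per word.
    Dense(NUM_TAGS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Dummy data standing in for the tagged corpus:
# X holds word-index windows, y holds the tag id of each centre word.
X = np.random.randint(0, VOCAB_SIZE, size=(100, 2 * WINDOW + 1))
y = np.random.randint(0, NUM_TAGS, size=(100,))
model.fit(X, y, epochs=3, batch_size=32, verbose=0)
```

In practice, the window features and dense layers would be fitted to the curated Yoruba corpus, and the same train/test split could be reused to compare against RF, LR, and k-NN baselines as in the paper's evaluation.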