Data in Brief (Apr 2024)
Lexicon dataset for the Hausa language
Abstract
This paper presents an augmented lexicon dataset for sentiment analysis in the Hausa language. The dataset was built from words and phrases drawn from a Hausa language dictionary and then expanded through data augmentation. The researchers manually annotated each phrase or sentence as positive, negative, or neutral. The dataset consists of 14,663 rows: 4,154 positive, 4,310 negative, and 6,199 neutral. It adds to the resources available for sentiment analysis in Hausa, a low-resource language, and will benefit researchers who want to develop models that analyze Hausa-language social media posts or product reviews.
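As a quick consistency check on the reported composition, the following minimal sketch recomputes the total and the share of each polarity class from the counts stated in the abstract. The variable names are illustrative assumptions and are not part of the dataset or its accompanying files.

```python
# Label counts as stated in the abstract; variable names are illustrative.
label_counts = {"positive": 4154, "negative": 4310, "neutral": 6199}

# The three classes should add up to the reported number of rows.
total_rows = sum(label_counts.values())

# Share of each polarity class in the full dataset.
label_shares = {label: count / total_rows for label, count in label_counts.items()}

for label, share in label_shares.items():
    print(f"{label}: {label_counts[label]} rows ({share:.1%})")
print(f"total: {total_rows} rows")
```

The counts sum to the reported 14,663 rows, and the neutral class is the largest, at a little over 42% of the dataset.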