Lexicon dataset for the Hausa language

Idi Mohammed; Rajesh Prasad

Data in Brief (Apr 2024)

Affiliations

Idi Mohammed (corresponding author): Computer Science Department, African University of Science and Technology, Abuja, Nigeria
Rajesh Prasad: Computer Science Department, African University of Science and Technology, Abuja, Nigeria

Journal volume & issue: Vol. 53, p. 110124

Abstract

This paper presents an augmented lexicon dataset for sentiment analysis in the Hausa language. The dataset was created by collecting words and phrases from a Hausa language dictionary and then applying data augmentation to increase its size. The researchers manually annotated each phrase or sentence with positive, negative, or neutral polarity. The dataset consists of 14,663 rows: 4,154 positive, 4,310 negative, and 6,199 neutral. It adds to the resources available for sentiment analysis in Hausa, a low-resource language, and will benefit researchers who want to develop models that analyze Hausa-language social media posts or product reviews.
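
The class balance implied by the reported counts can be checked with a few lines of Python. This is a minimal sketch that uses only the three figures stated in the abstract; the dictionary keys and the helper name `class_shares` are illustrative assumptions, not column names or functions from the published dataset.

```python
# Class counts as reported in the abstract (labels are illustrative names,
# not necessarily the label strings used in the published files).
counts = {"positive": 4154, "negative": 4310, "neutral": 6199}


def class_shares(label_counts):
    """Return each label's fraction of the total number of rows."""
    total_rows = sum(label_counts.values())
    return {label: n / total_rows for label, n in label_counts.items()}


total = sum(counts.values())
shares = class_shares(counts)

for label, share in shares.items():
    print(f"{label}: {counts[label]} rows ({share:.1%})")
print(f"total: {total} rows")
```

The three counts sum to the stated 14,663 rows, and the neutral class is the largest at roughly 42% of the data, so a model trained on this dataset may need class weighting or stratified splitting.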

Published in Data in Brief

ISSN: 2352-3409 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Science (General)
Website: http://www.journals.elsevier.com/data-in-brief/
