Dataset construction to detect human behavior with the help of emotions, sentiments and mood for Roman Urdu

Asia Samreen; Syed Asif Ali

Data in Brief (Feb 2024)

Affiliations

Asia Samreen: Department of Computer Science, Bahria University, Karachi Campus, 13 National Stadium Road, Karachi 75260, Pakistan; Department of Computer Science, Sindh Madressatul Islam University, Hasrat Mohani Road, Karachi 74000, Pakistan; Corresponding author at: Department of Computer Science, Bahria University, Karachi Campus, 13 National Stadium Road, Karachi 75260, Pakistan.
Syed Asif Ali: Department of Computer Science, Sindh Madressatul Islam University, Hasrat Mohani Road, Karachi 74000, Pakistan

Journal volume & issue: Vol. 52
p. 109906

Abstract

Roman Urdu and English are often mixed on social media, forming a hybrid language for communication. Because writers follow no standard spelling when using the English alphabet to write Urdu in text messages, interpreting emotions in such code-mixed text is challenging. This dataset contains over 14,000 emotion lexicon entries, each of which lists nine different emotions and their polarities. The NRC emotion lexicon [8], which is provided in Urdu, has been transliterated into Roman Urdu. To verify the accuracy of the provided translation, we consulted three online Urdu dictionaries. The Roman Urdu transliteration was produced with a Python script that transliterates words from Urdu to Roman Urdu. Sentiment and mood labels, derived from the emotion lexicon, are also provided. The textual data has been annotated using unigram features and distance estimation between text strings and lexicon entries. Approximately 10,000 sentences from the baseline sample have been annotated automatically.
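
The annotation step described above (unigram matching combined with string-distance estimation against the lexicon) can be illustrated with a minimal sketch. It is not the authors' script: the toy lexicon entries, the `annotate` function, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicon (not taken from the dataset):
# Roman Urdu word -> (emotion, polarity)
LEXICON = {
    "khushi": ("joy", "positive"),
    "gussa": ("anger", "negative"),
    "dar": ("fear", "negative"),
}


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def annotate(sentence: str, lexicon=LEXICON, threshold: float = 0.8):
    """Label each unigram with its closest lexicon entry, if close enough.

    The threshold is an assumed value; it tolerates spelling variants
    such as "khusi" for "khushi" while rejecting unrelated words.
    """
    labels = []
    for token in sentence.lower().split():
        best_word, best_score = None, 0.0
        for word in lexicon:
            score = similarity(token, word)
            if score > best_score:
                best_word, best_score = word, score
        if best_word is not None and best_score >= threshold:
            emotion, polarity = lexicon[best_word]
            labels.append((token, best_word, emotion, polarity))
    return labels


if __name__ == "__main__":
    print(annotate("mujhe bohat khusi hui"))
```

Matching on similarity rather than exact equality is one way to absorb the non-standard spellings that make Roman Urdu hard to process; any other edit-distance measure could be substituted.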

Published in Data in Brief

ISSN: 2352-3409 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Science (General)
Website: http://www.journals.elsevier.com/data-in-brief/
