Data in Brief (Dec 2024)
IndoWaveSentiment: Indonesian audio dataset for emotion classification
Mendeley Data
Abstract
Voice is one of the media for human communication and interaction. Emotions conveyed through voice, such as laughter or tears, can communicate messages more quickly than spoken or written language. In sentiment analysis, the emotional component is crucial for reflecting human perceptions and opinions. This paper introduces IndoWaveSentiment, a dataset of emotional voice recordings categorized into five classes: neutral, happy, surprised, disgusted, and disappointed. Data collection took place in a recording studio with ten actors, evenly split between men and women. Each actor repeated the same sentence in Bahasa Indonesia three times for each emotion class, and the recordings were saved in .wav format. Annotation was conducted manually using Audacity and validated through a questionnaire-based sampling technique that supports audio data. This dataset is valuable for researchers in Signal Processing and Artificial Intelligence, aiding the development of classification models within Machine Learning.
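As a minimal sketch of how a recording from such a dataset might be inspected before feature extraction, the snippet below lists the five emotion classes named in the abstract and reads basic properties of a .wav file with Python's standard `wave` module. The names `EMOTIONS` and `describe_wav` are hypothetical and chosen for illustration; the dataset's actual file names, folder layout, sample rate, and channel count are not stated here and are not assumed.

```python
import wave

# The five emotion classes named in the abstract.
EMOTIONS = ("neutral", "happy", "surprised", "disgusted", "disappointed")


def describe_wav(source):
    """Return basic properties of a PCM .wav file.

    `source` may be a file path (str) or an open binary file-like object.
    This helper is illustrative only and is not part of the dataset.
    """
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }
```

Calling `describe_wav` on each recording is one simple way to check that all files share the same sample rate and channel count before training a classifier; files that differ would need resampling or conversion first.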