Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) (Oct 2020)
Speech Emotion Detection on On-Demand Media Using SVM and LSTM
Abstract
To date, many speech datasets with emotion classes exist, but they rely on improvised or deliberately acted speech, in which native speakers are given a stimulus for each emotional expression. Because secretly recording natural everyday conversation still raises ethical issues, sampling voice data from movies and podcasts is the most appropriate way to gain insight from speech. Professional actors are trained, through the Stanislavski acting method, to evoke emotions that are as close to natural as possible. The speech dataset that meets this criterion is the Human voice Natural Language from On-demand media (HENLO). HENLO contains audio clips for each basic emotion, taken from films and podcasts on Media On-Demand, a video entertainment platform whose content can be played and downloaded at any time. In this paper, we describe the use of sound clips from HENLO and train classifiers on them using Support Vector Machine (SVM) and Long Short-Term Memory (LSTM). With these two methods, we found that the best strategy is to train the LSTM first and then feed the model to the SVM, using an 80:20 data split. Across five training phases, the accuracy of the final phase was more than 17% higher than that of the first. These results indicate that the two methods complement each other and that both are important for improving classification accuracy.
Keywords