Implementing machine learning techniques for continuous emotion prediction from uniformly segmented voice recordings

Hannes Diemerling; Leonie Stresemann; Tina Braun; Timo von Oertzen

doi:10.3389/fpsyg.2024.1300996

Frontiers in Psychology (Mar 2024)

Affiliations

Hannes Diemerling: Center for Lifespan Psychology, Max Planck Institute for Human Development, Berlin, Germany
Hannes Diemerling: Thomas Bayes Institute, Berlin, Germany
Hannes Diemerling: Department of Psychology, Humboldt-Universität zu Berlin, Berlin, Germany
Hannes Diemerling: Department of Psychology, University of the Bundeswehr München, Neubiberg, Germany
Leonie Stresemann: Department of Psychology, University of the Bundeswehr München, Neubiberg, Germany
Tina Braun: Department of Psychology, University of the Bundeswehr München, Neubiberg, Germany
Tina Braun: Department of Psychology, Charlotte-Fresenius University, Wiesbaden, Germany
Timo von Oertzen: Center for Lifespan Psychology, Max Planck Institute for Human Development, Berlin, Germany
Timo von Oertzen: Thomas Bayes Institute, Berlin, Germany

DOI: https://doi.org/10.3389/fpsyg.2024.1300996
Journal volume & issue: Vol. 15

Abstract

Introduction: Emotional recognition from audio recordings is a rapidly advancing field, with significant implications for artificial intelligence and human-computer interaction. This study introduces a novel method for detecting emotions from short, 1.5 s audio samples, aiming to improve accuracy and efficiency in emotion recognition technologies.

Methods: We utilized 1,510 unique audio samples from two databases in German and English to train our models. We extracted various features for emotion prediction, employing Deep Neural Networks (DNN) for general feature analysis, Convolutional Neural Networks (CNN) for spectrogram analysis, and a hybrid model combining both approaches (C-DNN). The study addressed challenges associated with dataset heterogeneity, language differences, and the complexities of audio sample trimming.

Results: Our models demonstrated accuracy significantly surpassing random guessing, aligning closely with human evaluative benchmarks. This indicates the effectiveness of our approach in recognizing emotional states from brief audio clips.

Discussion: Despite the challenges of integrating diverse datasets and managing short audio samples, our findings suggest considerable potential for this methodology in real-time emotion detection from continuous speech. This could contribute to improving the emotional intelligence of AI and its applications in various areas.
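
As an illustration of the uniform segmentation the abstract refers to, the following minimal sketch cuts a waveform into non-overlapping 1.5 s pieces. It is not the authors' code: the function name `segment_uniform`, the 16 kHz sample rate in the usage lines, and the choice to discard a trailing remainder shorter than one segment are all assumptions made for the example.

```python
def segment_uniform(samples, sample_rate, segment_seconds=1.5):
    """Split a sequence of audio samples into equal, non-overlapping segments.

    Hypothetical helper for illustration only; the abstract does not
    specify how the recordings were actually trimmed.
    """
    seg_len = int(round(segment_seconds * sample_rate))
    if seg_len <= 0:
        raise ValueError("segment length must be at least one sample")
    n_segments = len(samples) // seg_len
    # Assumption: a trailing remainder shorter than one segment is dropped.
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]


# Usage with a hypothetical 4 s recording at an assumed 16 kHz sample rate.
sample_rate = 16_000
recording = [0.0] * (4 * sample_rate)
segments = segment_uniform(recording, sample_rate)
print(len(segments), len(segments[0]))
```

Each resulting segment could then be passed to feature extraction or converted to a spectrogram; how the study handled the leftover audio at the end of a recording is not stated in the abstract.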

Published in Frontiers in Psychology

ISSN: 1664-1078 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Philosophy. Psychology. Religion: Psychology
Website: https://www.frontiersin.org/journals/psychology
