Advanced emotion analysis: harnessing facial image processing and speech recognition through deep learning

Magdalena Hałas; Michał Maj; Ewa Guz; Marcin Stencel; Tomasz Cieplak

doi:10.13166/jms/191163

Journal of Modern Science (Aug 2024)

Affiliations

Magdalena Hałas: WSEI University
Michał Maj: WSEI University
Ewa Guz: WSEI University
Marcin Stencel: WSEI University
Tomasz Cieplak: Lublin University of Technology

DOI: https://doi.org/10.13166/jms/191163
Journal volume & issue: Vol. 57, no. 3, pp. 388–401

Abstract

The human face hides many secrets and is one of the most expressive human features; it can even carry hidden information about a person's personality. Given this fundamental role, appropriate deep-learning solutions are needed to analyze human face data. Such technology is becoming increasingly common in industries such as online retail, advertising testing, and virtual makeovers. For example, facial analysis now allows online shoppers to virtually apply makeup and try on jewelry or new glasses to get an accurate picture of how these products will look. Human hearing, likewise, is a treasure trove of information about the current environment and about the location and properties of sound-producing objects. For instance, we effortlessly take in the sounds of birds singing outside the window, traffic passing in the distance, or the lyrics of a song on the radio. The human auditory system processes the intricate mix of sounds reaching our ears and builds high-level abstractions of the environment by analyzing and grouping the sensory signals it receives. Separating a received complex acoustic signal into its sources and identifying them, a process known as sound scene analysis, is a domain where the power of deep learning shines. A machine implementation of this functionality (separation and classification of sound sources) is pivotal in applications such as speech recognition in noise, automatic music transcription, search and retrieval of multimedia data, and emotion recognition in spoken utterances.

Published in Journal of Modern Science

ISSN: 1734-2031 (Print); 2391-789X (Online)
Publisher: Akademia Nauk Stosowanych WSGE im. A. De Gasperi w Józefowie
Country of publisher: Poland
LCC subjects: Social Sciences
Website: https://www.jomswsge.com/en
