Journal of Modern Science (Aug 2024)
Advanced emotion analysis: harnessing facial image processing and speech recognition through deep learning
Abstract
The human face hides many secrets and is one of the most expressive human features; facial features can even convey cues about a person's personality. Given the fundamental role of the human face, it is worthwhile to develop deep-learning solutions that analyze facial data. This technology is becoming increasingly common in industries such as online retail, advertising testing, and virtual makeovers. For example, facial analysis technology now allows online shoppers to virtually apply makeup and try on jewelry or new glasses to get an accurate picture of what these products will look like. The human sense of hearing, in turn, is a rich source of information about the current environment and the location and properties of sound-producing objects. For instance, we effortlessly absorb the sounds of birds singing outside the window, traffic passing in the distance, or the lyrics of a song on the radio. The human auditory system can process the intricate mix of sounds reaching our ears and build high-level abstractions of the environment by analyzing and grouping the measured sensory signals. The process of segregating a received complex acoustic signal and identifying its sources, known as sound scene analysis, is a domain where deep learning excels. The machine implementation of this functionality (separation and classification of sound sources) is pivotal in applications such as speech recognition in noise, automatic music transcription, multimedia search and retrieval, and recognizing emotions in spoken utterances.
Keywords