Identifying languages in a novel dataset: ASMR-whispered speech

Meishu Song; Zijiang Yang; Emilia Parada-Cabaleiro; Xin Jing; Yoshiharu Yamamoto; Björn Schuller

doi:10.3389/fnins.2023.1120311

Frontiers in Neuroscience (Jun 2023)

Affiliations

Meishu Song: Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany
Meishu Song: Educational Physiology Laboratory, The University of Tokyo, Tokyo, Japan
Zijiang Yang: Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany
Zijiang Yang: Educational Physiology Laboratory, The University of Tokyo, Tokyo, Japan
Emilia Parada-Cabaleiro: Institute of Computational Perception, Johannes Kepler University Linz, Linz, Austria
Xin Jing: Educational Physiology Laboratory, The University of Tokyo, Tokyo, Japan
Yoshiharu Yamamoto: Educational Physiology Laboratory, The University of Tokyo, Tokyo, Japan
Björn Schuller: Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany
Björn Schuller: Group on Language, Audio, & Music, Imperial College London, London, United Kingdom

DOI: https://doi.org/10.3389/fnins.2023.1120311
Journal volume & issue: Vol. 17

Abstract

Introduction: The Autonomous Sensory Meridian Response (ASMR) is a combination of sensory phenomena involving electrostatic-like tingling sensations, which emerge in response to certain stimuli. Despite the overwhelming popularity of ASMR on social media, no open-source databases of ASMR-related stimuli are yet available, which leaves this phenomenon mostly inaccessible to the research community and thus almost completely unexplored. In this regard, we present the ASMR Whispered-Speech (ASMR-WS) database.

Methods: ASMR-WS is a novel database of whispered speech, specifically tailored to promote the development of ASMR-like unvoiced Language Identification (unvoiced-LID) systems. The ASMR-WS database encompasses 38 videos, with a total duration of 10 h and 36 min, and includes seven target languages (Chinese, English, French, Italian, Japanese, Korean, and Spanish). Along with the database, we present baseline results for unvoiced-LID on the ASMR-WS database.

Results: Our best results on the seven-class problem, based on segments of 2 s length, a CNN classifier, and MFCC acoustic features, achieved an unweighted average recall of 85.74% and an accuracy of 90.83%.

Discussion: For future work, we would like to focus more deeply on the duration of speech samples, as we see varied results with the combinations applied herein. To enable further research in this area, the ASMR-WS database, as well as the partitioning considered in the presented baseline, is made accessible to the research community.
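The abstract reports both unweighted average recall (UAR) and accuracy. As a minimal sketch of how the two metrics differ, the Python below computes each on a small toy label set. The function names and the toy labels are assumptions made for illustration; they are not the authors' evaluation code and not data from ASMR-WS.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of all samples that were classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls; every class counts equally."""
    totals = defaultdict(int)  # samples per true class
    hits = defaultdict(int)    # correctly classified samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical, imbalanced toy labels (not taken from the database).
y_true = ["en"] * 8 + ["fr"] * 2
y_pred = ["en"] * 8 + ["en", "fr"]

acc = accuracy(y_true, y_pred)                   # 9 of 10 samples correct
uar = unweighted_average_recall(y_true, y_pred)  # mean of recall 1.0 and 0.5
print(f"accuracy={acc:.2f}, UAR={uar:.2f}")
```

Because UAR averages recall over classes rather than over samples, it is lower than accuracy whenever a minority class is recognised worse than a majority class, which is one plausible reading of the gap between the two figures reported above.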

Published in Frontiers in Neuroscience

ISSN: 1662-4548 (Print); 1662-453X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Medicine: Internal medicine: Neurosciences. Biological psychiatry. Neuropsychiatry
Website: http://www.frontiersin.org/neuroscience
