IEEE Open Journal of Signal Processing (Jan 2024)

Group Conversations in Noisy Environments (GiN) – Multimedia Recordings for Location-Aware Speech Enhancement

  • Emilie d'Olne,
  • Alastair H. Moore,
  • Patrick A. Naylor,
  • Jacob Donley,
  • Vladimir Tourbabin,
  • Thomas Lunner

DOI
https://doi.org/10.1109/OJSP.2023.3344379
Journal volume & issue
Vol. 5
pp. 374–382

Abstract


Recent years have seen a growing interest in the use of microphone-mounted smart glasses to solve the cocktail party problem using beamforming techniques or machine learning. Many such approaches could bring substantial advances in hearing aid or Augmented Reality (AR) research. To validate these methods, the EasyCom dataset [Donley et al., 2021] introduced high-quality multi-modal recordings of conversations in noise, including egocentric multi-channel microphone array audio, speech source pose, and headset microphone audio. While providing comprehensive data, EasyCom lacks diversity in the acoustic environments considered and in the degree of overlapping speech in conversations. This work therefore presents the Group Conversations in Noisy Environments (GiN) dataset of over 2 hours of group conversations in noisy environments, recorded using binaural microphones and a pair of glasses mounted with 5 microphones. The recordings took place in 3 rooms and feature 6 seated participants as well as a standing facilitator. The data also include close-talking microphone audio and head-pose data for each speaker, an audio channel from a fixed reference microphone, and automatically annotated speaker activity information. A baseline method is used to demonstrate the use of the data for speech enhancement. The dataset is publicly available in d'Olne et al. [2023].

Keywords