Journal of Applied Engineering and Technological Science (Dec 2024)
Crowd Speaker Identification Methodologies, Datasets And Features: Review
Abstract
Crowded speech, or overlapping speech, occurs when multiple individuals speak simultaneously and is common in real-life scenarios such as telephone conversations, meetings, and debates. The critical task in these situations is to identify all the speakers rather than just one. Overlapping speech identification is a significant research domain with applications in human-machine interaction and in criminal detection in airports, trains, and public spaces. Our work examines crowd speech identification from four perspectives: the most commonly used datasets, the most effective features for crowd speaker identification, the best-performing methodologies, and the highest results reported. This study presents a comprehensive survey of research on crowd speech identification covering the period from 2016 to the present. The survey includes ninety research papers, fifty of which are empirical studies. Initially, statistical methods were predominant, but the current trend leans towards artificial intelligence, particularly deep learning, which has demonstrated considerable efficacy in this field.
Keywords