Information (Apr 2021)

A Distributed Approach to Speaker Count Problem in an Open-Set Scenario by Clustering Pitch Features

  • Sakshi Pandey,
  • Amit Banerjee

DOI
https://doi.org/10.3390/info12040157
Journal volume & issue
Vol. 12, no. 4
p. 157

Abstract

Read online

Counting the number of speakers in an audio sample can lead to innovative applications, such as a real-time ranking system. Researchers have studied advanced machine learning approaches for solving the speaker count problem. However, these solutions are not efficient in real-time environments, as it requires pre-processing of a finite set of data samples. Another approach for solving the problem is via unsupervised learning or by using audio processing techniques. The research in this category is limited and does not consider the large-scale open set environment. In this paper, we propose a distributed clustering approach to address the speaker count problem. The separability of the speaker is computed using statistical pitch parameters. The proposed solution uses multiple microphones available in smartphones in a large geographical area to capture and extract statistical pitch features from the audio samples. These features are shared between the nodes to estimate the number of speakers in the neighborhood. One of the major challenges is to reduce the error count that arises due to the proximity of the users and multiple microphones. We evaluate the algorithm’s performance using real smartphones in a multi-group arrangement by capturing parallel conversations between the users in both indoor and outdoor scenarios. The average error count distance is 1.667 in a multi-group scenario. The average error count distances in indoor environments are 16% which is better than in the outdoor environment.

Keywords