IEEE Access (Jan 2023)

Vocal92: Audio Dataset With a Cappella Solo Singing and Speech

  • Zhuo Deng,
  • Ruohua Zhou

DOI
https://doi.org/10.1109/ACCESS.2023.3253207
Journal volume & issue
Vol. 11
pp. 140958 – 140966

Abstract

Singer recognition plays a vital role in music information retrieval systems. Most songs used in singer recognition systems are mixtures of music and voice, and there is a lack of labeled a cappella solo singing data suitable for singer recognition. Text-independent singer recognition systems encode audio features such as voice pitch, intensity, and timbre to achieve good performance, but most such systems are trained and evaluated on music with accompaniment, and the background music limits the performance of the singer recognition model. In contrast, a powerful singer identification system can be trained and evaluated using a cappella solo singing voices that are clear and cover a broad range of qualities. Labeled, clean singing data suitable for singer recognition research is therefore needed. To address this issue, we present Vocal92, a multivariate a cappella solo singing and speech audio dataset spanning around 146.73 hours, sourced from volunteers. Furthermore, we use three models to construct a singer recognition baseline system. In experiments, the singer recognition model trained on a cappella solo singing data performs well on both single-mode and cross-modal verification data, significantly improving on related work. The dataset is accessible to everyone at https://pan.baidu.com/s/1Pn62DHfal2OOZ_5JqgGBdQ with jnz5 as the validation code. For non-commercial use, the dataset is also available free of charge at IEEE DataPort (https://ieee-dataport.org/documents/vocal92-multimodal-audio-dataset-cappella-solo-singing-and-speech).

Keywords