Vocal92: Audio Dataset With a Cappella Solo Singing and Speech

Zhuo Deng; Ruohua Zhou

doi:10.1109/ACCESS.2023.3253207

IEEE Access (Jan 2023)

Affiliations

Zhuo Deng: Department of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
Ruohua Zhou: Department of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China

DOI: https://doi.org/10.1109/ACCESS.2023.3253207
Journal volume & issue: Vol. 11, pp. 140958–140966

Abstract

Singer recognition plays a vital role in music information retrieval systems. Most songs used in singer recognition systems are mixtures of music and voice, and labeled a cappella solo singing data suitable for singer recognition is lacking. Text-independent singer recognition systems successfully encode audio features such as voice pitch, intensity, and timbre to achieve good performance. Most such systems are trained and evaluated using data from music with accompaniment. However, background music limits the performance of these singer recognition models. In contrast, a powerful singer identification system can be trained and evaluated using a cappella solo singing voice with a clear and broad range of qualities. Labeled, clean singing data suitable for singer recognition research is therefore needed. To address this issue, we present Vocal92, a multivariate a cappella solo singing and speech audio dataset spanning around 146.73 hours sourced from volunteers. Furthermore, we use three models to construct the singer recognition baseline system. In experiments, the singer recognition model trained on a cappella solo singing data performs well on both single-mode and cross-modal verification data, significantly improving on related work. The dataset is accessible to everyone at https://pan.baidu.com/s/1Pn62DHfal2OOZ_5JqgGBdQ with jnz5 as the validation code. For non-commercial use, the dataset is available free of charge at the IEEE DataPort (https://ieee-dataport.org/documents/vocal92-multimodal-audio-dataset-cappella-solo-singing-and-speech).

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
