Web Radio Automation for Audio Stream Management in the Era of Big Data

Nikolaos Vryzas; Nikolaos Tsipas; Charalampos Dimoulas

doi:10.3390/info11040205

Information (Apr 2020)

Affiliations

Nikolaos Vryzas: Multidisciplinary Media & Mediated Communication Research Group (M3C), Aristotle University of Thessaloniki, 541 24 Thessaloniki, Greece
Nikolaos Tsipas: Multidisciplinary Media & Mediated Communication Research Group (M3C), Aristotle University of Thessaloniki, 541 24 Thessaloniki, Greece
Charalampos Dimoulas: Multidisciplinary Media & Mediated Communication Research Group (M3C), Aristotle University of Thessaloniki, 541 24 Thessaloniki, Greece

Journal volume & issue: Vol. 11, no. 4, p. 205

Abstract

Radio is evolving in a changing digital media ecosystem. Audio-on-demand has shaped the landscape of big unstructured audio data available online. In this paper, a framework for knowledge extraction is introduced, to improve discoverability and enrichment of the provided content. A web application for live radio production and streaming is developed. The application offers typical live mixing and broadcasting functionality, while performing real-time annotation as a background process by logging user operation events. For the needs of a typical radio station, a supervised speaker classification model is trained for the recognition of 24 known speakers. The model is based on a convolutional neural network (CNN) architecture. Since not all speakers are known in radio shows, a CNN-based speaker diarization method is also proposed. The trained model is used for the extraction of fixed-size identity d-vectors. Several clustering algorithms are evaluated, having the d-vectors as input. The supervised speaker recognition model for 24 speakers scores an accuracy of 88.34%, while unsupervised speaker diarization scores a maximum accuracy of 87.22%, as tested on an audio file with speech segments from three unknown speakers. The results are considered encouraging regarding the applicability of the proposed methodology.
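The diarization step described in the abstract (fixed-size d-vectors fed to a clustering algorithm, then scored against known speaker labels) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the clustering algorithms or define how accuracy is computed, so a simple cosine-based k-means stands in for the clustering stage, accuracy is taken as the best match over all cluster-to-speaker assignments, and the toy 3-dimensional vectors replace the CNN-derived d-vectors. The function names `spherical_kmeans` and `diarization_accuracy` are hypothetical.

```python
import math
from itertools import permutations


def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a, b):
    """Cosine similarity of two unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, k, iters=20):
    """Cluster embedding vectors by cosine similarity (stand-in algorithm)."""
    pts = [normalize(v) for v in vectors]
    # Deterministic farthest-point initialisation: start from the first
    # vector, then repeatedly add the point least similar to all centroids.
    centroids = [pts[0]]
    while len(centroids) < k:
        far = min(pts, key=lambda p: max(cosine(p, c) for c in centroids))
        centroids.append(far)
    labels = None
    for _ in range(iters):
        new_labels = [
            max(range(k), key=lambda j: cosine(p, centroids[j])) for p in pts
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels


def diarization_accuracy(true_labels, cluster_labels):
    """Fraction of segments labelled correctly under the best mapping of
    cluster ids to speaker ids. Assumes there are no more clusters than
    speakers, and few enough of both for brute-force search."""
    true_ids = sorted(set(true_labels))
    cluster_ids = sorted(set(cluster_labels))
    if len(cluster_ids) > len(true_ids):
        raise ValueError("more clusters than speakers")
    best = 0
    for perm in permutations(true_ids, len(cluster_ids)):
        mapping = dict(zip(cluster_ids, perm))
        correct = sum(mapping[c] == t for c, t in zip(cluster_labels, true_labels))
        best = max(best, correct)
    return best / len(true_labels)


# Toy stand-ins for d-vectors of six speech segments from three speakers.
toy_vectors = [
    [1.0, 0.1, 0.0], [0.9, 0.0, 0.1],   # speaker A
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],   # speaker B
    [0.0, 0.1, 1.0], [0.1, 0.0, 0.9],   # speaker C
]
toy_truth = ["A", "A", "B", "B", "C", "C"]
toy_labels = spherical_kmeans(toy_vectors, k=3)
toy_accuracy = diarization_accuracy(toy_truth, toy_labels)
```

Cluster ids produced by an unsupervised method are arbitrary, which is why the scoring function searches over assignments rather than comparing labels directly. The brute-force search is only practical for a handful of speakers, as in the three-speaker test mentioned in the abstract.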

Published in Information

ISSN: 2078-2489 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/information/
