Parallel Stacked Hourglass Network for Music Source Separation

Bhuwan Bhattarai; Yagya Raj Pandeya; Joonwhoan Lee

doi:10.1109/ACCESS.2020.3037773

IEEE Access (Jan 2020)

Affiliations

Bhuwan Bhattarai: Division of Computer Science and Engineering, Jeonbuk National University, Jeonju, South Korea
Yagya Raj Pandeya: Division of Computer Science and Engineering, Jeonbuk National University, Jeonju, South Korea
Joonwhoan Lee: Division of Computer Science and Engineering, Jeonbuk National University, Jeonju, South Korea

DOI: https://doi.org/10.1109/ACCESS.2020.3037773
Journal volume & issue: Vol. 8
pp. 206016 – 206027

Abstract

Music source separation is a long-standing and challenging problem in the music information retrieval community. Advances in deep learning have led to substantial progress in decomposing many kinds of music into their constituent components. This research uses three datasets for source separation: a Korean traditional music (Pansori) dataset, the MIR-1K dataset, and the DSD100 dataset. The DSD100 dataset includes multiple sound sources, whereas the other two datasets contain relatively few. We synthetically constructed a novel dataset for Pansori music and trained a novel parallel stacked hourglass network (PSHN) on multi-band spectrograms. Compared with a previous study, the proposed architecture achieves the best results on real-world test samples of Pansori music of any length. The network was also tested on the public DSD100 and MIR-1K datasets to compare its strength on multiple-source data, and it produced comparable quantitative and qualitative outcomes. System performance is evaluated using the median signal-to-distortion ratio (SDR), source-to-interference ratio (SIR), and source-to-artifacts ratio (SAR), measured in decibels (dB), together with a visual comparison of the predictions with the ground truth. We report better performance on the Pansori and MIR-1K datasets and perform detailed ablation studies based on architecture variations. The proposed system is better suited to separating music that contains voices and one or a few musical instruments.
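To make the evaluation protocol more concrete, the following is a minimal sketch of how a median distortion ratio in dB could be computed over a set of test clips. It assumes a simplified SDR (energy of the reference over energy of the residual error) and not the full BSS Eval decomposition into target, interference, and artifact terms that SIR and SAR require. The function names `simple_sdr` and `median_sdr` are hypothetical and do not come from the paper.

```python
import numpy as np


def simple_sdr(reference, estimate, eps=1e-10):
    """Simplified SDR in dB (an assumption, not the BSS Eval definition).

    Ratio of the reference energy to the energy of the error
    between the reference and the estimated source.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    # eps guards against division by zero and log of zero
    return 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))


def median_sdr(references, estimates):
    """Median of the per-clip simplified SDR values, in dB."""
    scores = [simple_sdr(r, e) for r, e in zip(references, estimates)]
    return float(np.median(scores))


# Toy usage: one reference tone and three estimates of varying quality.
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
ref = np.sin(2.0 * np.pi * 5.0 * t)
clips_ref = [ref, ref, ref]
clips_est = [0.9 * ref, 0.5 * ref, np.zeros_like(ref)]
print(median_sdr(clips_ref, clips_est))
```

Reporting the median instead of the mean keeps a few very poor or very good clips from dominating the summary score. A full evaluation with SIR and SAR would need a BSS Eval style implementation.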

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
