MusicTalk: A Microservice Approach for Musical Instrument Recognition

Yi-Bing Lin; Chang-Chieh Cheng; Shih-Chuan Chiu

doi:10.1109/OJCS.2024.3476416

IEEE Open Journal of the Computer Society (Jan 2024)

Affiliations

Yi-Bing Lin: Department of Biomedical Informatics, China Medical University, Taichung City, Taiwan
Chang-Chieh Cheng: Information Technology Service Center, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Shih-Chuan Chiu: Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan

Journal volume & issue: Vol. 5
pp. 612 – 623

Abstract

Musical instrument recognition is the process of using machine learning or audio signal processing to identify and classify different musical instruments in an audio recording. This capability enables more precise analysis of musical pieces, aiding tasks such as transcription, music recommendation, and automated composition. The challenges include (1) recognition models that are not accurate enough, (2) the need to retrain the entire model when a new instrument is added, and (3) differences in audio formats that prevent direct use. To address these challenges, this article introduces MusicTalk, a microservice-based musical instrument (MI) detection system, with several key contributions. First, MusicTalk introduces a novel patchout mechanism for the Vision Transformer (ViT), named Brightness Characteristic Based Patchout, which improves MI detection accuracy over existing solutions. Second, MusicTalk integrates individual MI detectors as microservices, enabling efficient interaction with other microservices. Third, MusicTalk incorporates an audio shaper that unifies diverse open music datasets such as AudioSet, OpenMIC-2018, MedleyDB, URMP, and INSTDB. By applying Grad-CAM analysis to mel-spectrograms, we elucidate the characteristics of the MI detection model. This analysis allows us to optimize ensemble combinations of ViT with patchout and CNNs within MusicTalk, resulting in high accuracy. For instance, the system achieves a precision of 96.17% and a recall of 95.77% for violin detection, the highest when compared with previous approaches. An additional advantage of MusicTalk lies in its microservice-driven visualization capabilities: by integrating MI detectors as microservices, MusicTalk enables seamless visualization of songs using animated avatars. In a case study featuring “Peter and the Wolf,” we demonstrate that improved MI detection accuracy enhances the visual storytelling impact of music. For this song, MusicTalk improves the overall F1-score by up to 12% over previous approaches.
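The abstract reports violin-detection precision and recall separately and summarizes the case study with an F1-score. F1 is the harmonic mean of precision and recall, so the two violin figures imply an F1 of roughly 95.97%. That value is our own arithmetic on the reported rates, not a number stated in the abstract. The following minimal Python sketch shows the calculation; the function name `f1_score` is illustrative and is not taken from the paper or from any particular library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        # Convention: F1 is 0 when the detector has neither precision nor recall.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Violin-detection precision and recall as reported in the abstract.
violin_f1 = f1_score(0.9617, 0.9577)
print(f"Violin F1 ~ {violin_f1:.2%}")  # prints "Violin F1 ~ 95.97%"
```

Because precision and recall are close to each other here, the harmonic mean lands very near their arithmetic mean. F1 drops sharply only when one of the two rates is much lower than the other.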

Published in IEEE Open Journal of the Computer Society

ISSN: 2644-1268 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=8782664
