Singer identification model using data augmentation and enhanced feature conversion with hybrid feature vector and machine learning

Serhat Hizlisoy; Recep Sinan Arslan; Emel Çolakoğlu

doi:10.1186/s13636-024-00336-8

EURASIP Journal on Audio, Speech, and Music Processing (Feb 2024)

Affiliations

Serhat Hizlisoy: Department of Computer Engineering, Faculty of Engineering, Architecture and Design, Kayseri University
Recep Sinan Arslan: Department of Computer Engineering, Faculty of Engineering, Architecture and Design, Kayseri University
Emel Çolakoğlu: Computational Sciences and Engineering, Graduate School of Education, Kayseri University

DOI: https://doi.org/10.1186/s13636-024-00336-8
Journal volume & issue: Vol. 2024, no. 1, pp. 1–13

Abstract

Song analysis is an active research problem that supports various operations on music access platforms. Foremost among its tasks is identifying the singer who performs a song. This study proposes a singer identification system built on a dataset of Turkish singers and designed for Turkish-language songs. Mel-spectrogram and octave-based spectral contrast values are extracted from the songs and combined into a hybrid feature vector. This design addresses problem-specific challenges such as distinguishing between singers' voices and reducing the effect of year and album differences on the result. Tests and systematic evaluations show that the system identifies the singer of a song with a measurable level of success and remains stable against changes in singing style and song structure. The results were evaluated on a database of 9 singers and 180 songs. An accuracy of 89.4% was obtained by reducing the feature vector with PCA, normalizing the data, and classifying with Extra Trees. Precision, recall, and F-score were 89.9%, 89.4%, and 89.5%, respectively.
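
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It uses scikit-learn, replaces real audio with random stand-in matrices, and assumes details the abstract does not state: mean and standard-deviation pooling over time, 128 mel bands, 7 contrast rows, 20 songs per singer, normalization before PCA, 20 principal components, 200 trees, a 75/25 split, and weighted averaging of the metrics. The helper names `hybrid_vector` and `synthetic_song` are hypothetical. In practice the two input matrices could come from an audio library such as librosa (`librosa.feature.melspectrogram` and `librosa.feature.spectral_contrast`).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def hybrid_vector(mel_db, contrast):
    """Pool two (bands x frames) matrices over time and concatenate them.

    Mean/std pooling is an assumption; the abstract only says the two
    feature types are combined into one hybrid vector.
    """
    parts = [mel_db.mean(axis=1), mel_db.std(axis=1),
             contrast.mean(axis=1), contrast.std(axis=1)]
    return np.concatenate(parts)


def synthetic_song(singer, rng, n_mels=128, n_contrast=7, n_frames=200):
    """Random stand-in for one song's feature matrices (NOT real audio)."""
    mel_db = rng.normal(-40.0 + 0.5 * singer, 5.0, size=(n_mels, n_frames))
    contrast = rng.normal(20.0 + singer, 3.0, size=(n_contrast, n_frames))
    return mel_db, contrast


rng = np.random.default_rng(0)
n_singers, songs_per_singer = 9, 20  # 180 songs; the equal split is assumed

# One hybrid vector per song, grouped singer by singer.
X = np.stack([hybrid_vector(*synthetic_song(s, rng))
              for s in range(n_singers)
              for _ in range(songs_per_singer)])
y = np.repeat(np.arange(n_singers), songs_per_singer)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Normalization -> PCA -> Extra Trees; the step order is an assumption.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=20, random_state=0),
    ExtraTreesClassifier(n_estimators=200, random_state=0),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision, recall, f_score, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted", zero_division=0)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f-score={f_score:.3f}")
```

The scores printed here reflect only the synthetic data and say nothing about the reported results. With weighted averaging, recall equals accuracy by construction, which is consistent with the matching 89.4% figures above, although the abstract does not say which averaging was used.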

Published in EURASIP Journal on Audio, Speech, and Music Processing

ISSN: 1687-4722 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Science: Physics: Acoustics. Sound; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://asmp-eurasipjournals.springeropen.com
