KritiSamhita: A machine learning dataset of South Indian classical music audio clips with tonic classification

Samhita Konduri; Kriti V. Pendyala; Vishnu S. Pendyala

Data in Brief (Aug 2024)

Affiliations

Samhita Konduri: Palo Alto High School, 50 Embarcadero Road, Palo Alto, CA 94301, USA
Kriti V. Pendyala: University Preparatory Academy, 2315 Canoas Garden Ave, San Jose, CA 95125, USA
Vishnu S. Pendyala: Department of Applied Data Science, San Jose State University, One Washington Square, San Jose, CA 95192, USA; Corresponding author.

Journal volume & issue: Vol. 55, p. 110730

Abstract

Few Indian classical music (ICM) datasets are currently available, and fewer still are large enough and carry the annotations, particularly subtler ones such as the tonic, needed to train classification or prediction models. The dataset described in this paper was created with tonic annotations to fill this gap. The tonic pitch, or base pitch, plays such an important role in music that it is sometimes called the keynote. The vocalists and the accompanying instrumental ensemble tune to this keynote to render the composition. The first and second authors of this paper, who are vocalists themselves, recorded songs in four different tonics: F#, G, G#, and A. Using the Python library pydub, each song, three minutes or longer, was segmented into 20-second snippets, with any remainder kept as a separate snippet. The raw audio snippets are available in folders separated by tonic, and a directory lists each snippet's file path and tonic. This dataset can be reused for future tonic classification work, as well as for training other automated systems that target higher-level attributes of ICM, such as melodic framework, since the tonic can be the basis for them all.
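
The segmentation step described in the abstract could look roughly like the sketch below. It is not the authors' script: the function names (snippet_bounds, split_song), the output naming scheme, and the WAV export format are assumptions made for illustration. Only the 20-second snippet length, the separate remainder snippet, the per-tonic folders, and the use of pydub come from the abstract.

```python
from pathlib import Path

SNIPPET_MS = 20_000  # 20-second snippets, as stated in the abstract


def snippet_bounds(total_ms, snippet_ms=SNIPPET_MS):
    """Return (start, end) millisecond pairs covering a whole recording.

    The last pair holds the remainder and may be shorter than snippet_ms.
    """
    if total_ms < 0 or snippet_ms <= 0:
        raise ValueError("total_ms must be >= 0 and snippet_ms must be > 0")
    return [
        (start, min(start + snippet_ms, total_ms))
        for start in range(0, total_ms, snippet_ms)
    ]


def split_song(song_path, out_dir, tonic):
    """Export one song's snippets to out_dir/<tonic>/ (hypothetical layout).

    Returns (file_path, tonic) rows that could feed a path/tonic listing.
    """
    # Imported here so the boundary logic above works without pydub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(song_path)
    target = Path(out_dir) / tonic
    target.mkdir(parents=True, exist_ok=True)

    rows = []
    # len(audio) is the duration in milliseconds; slicing is also in ms.
    for i, (start, end) in enumerate(snippet_bounds(len(audio))):
        out_path = target / f"{Path(song_path).stem}_{i:03d}.wav"
        audio[start:end].export(str(out_path), format="wav")
        rows.append((str(out_path), tonic))
    return rows
```

Under these assumptions, a song of 3 minutes 10 seconds would yield nine full 20-second snippets plus one 10-second remainder snippet.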

Published in Data in Brief

ISSN: 2352-3409 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Science (General)
Website: http://www.journals.elsevier.com/data-in-brief/
