Spoken Language Identification in Unseen Target Domain Using Centroid Similarity Loss With Adaptive Gradient Blending

Muralikrishna H; Sujeet Kumar; Dileep Aroor Dinesh; Veena Thenkanidiyoor

doi:10.1109/ACCESS.2024.3422380

IEEE Access (Jan 2024)

Affiliations

Muralikrishna H: Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
Sujeet Kumar: MANAS Laboratory, Indian Institute of Technology Mandi, Suran, Himachal Pradesh, India
Dileep Aroor Dinesh: Department of Computer Science and Engineering, Indian Institute of Technology Dharwad, Dharwad, Karnataka, India
Veena Thenkanidiyoor: Department of Computer Science and Engineering, National Institute of Technology Goa, Ponda, India

DOI: https://doi.org/10.1109/ACCESS.2024.3422380
Journal volume & issue: Vol. 12
pp. 95959 – 95971

Abstract

In this paper, we propose a centroid similarity loss (CSL) with adaptive gradient blending (AGB) strategy, denoted CSL-with-AGB, to improve the generalization of a spoken language identification (LID) system to unseen target domain conditions. Unlike most existing approaches, the proposed CSL-with-AGB can improve generalization even when the training dataset lacks domain diversity. Specifically, the LID network first analyses the input at two different temporal resolutions using a set of two embedding extractors, which allows the network to generalize better by encoding complementary content. We then propose to use the CSL to further improve the generalization of the network by encouraging the embedding extractors to learn discriminative and domain-invariant embeddings. However, applying an auxiliary loss such as CSL can sometimes force the two embedding extractors to learn in an unbalanced way, diminishing their ability to encode complementary content in the input. To overcome this issue, we propose to include the AGB strategy with the CSL. With the help of two auxiliary classifiers attached to the two embedding extractors, the AGB monitors the extractors and guides them towards balanced learning, leading to enhanced performance in unseen target domain conditions.
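
The abstract does not give formulas for the CSL or the AGB, so the following Python sketch is only an illustration of the general idea, not the authors' definition. It assumes a hypothetical centroid-based loss (cross-entropy over cosine similarities between each embedding and the per-class centroids of the batch) and a hypothetical balancing rule that weights each embedding extractor by its auxiliary classifier loss. The function names, the `temperature` parameter, and the weighting rule are all assumptions introduced here.

```python
import numpy as np

def centroid_similarity_loss(emb, labels, num_classes, temperature=0.1):
    """Hypothetical centroid-based loss (not the paper's formula).

    Pulls each embedding towards the centroid of its own language class
    and away from the other centroids, using cosine similarity.
    Assumes every class appears at least once in the batch.
    """
    eps = 1e-12
    emb = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + eps)
    centroids = np.stack([emb[labels == c].mean(axis=0) for c in range(num_classes)])
    centroids = centroids / (np.linalg.norm(centroids, axis=1, keepdims=True) + eps)
    logits = emb @ centroids.T / temperature           # (batch, num_classes)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()

def blend_weights(aux_loss_a, aux_loss_b):
    """Hypothetical balancing rule (an assumption, not the paper's AGB).

    Gives the extractor whose auxiliary classifier currently has the
    higher loss a larger share of the training signal.
    """
    total = aux_loss_a + aux_loss_b
    return aux_loss_a / total, aux_loss_b / total

# Toy usage: 8 utterance embeddings, 4 languages, 2 utterances each.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
random_emb = rng.normal(size=(8, 4))
loss = centroid_similarity_loss(random_emb, labels, num_classes=4)
w_a, w_b = blend_weights(2.0, 1.0)
```

In this sketch the two weights sum to one, so the overall scale of the combined objective stays fixed while the balance between the two extractors shifts during training.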

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
