Dementia Speech Dataset Creation and Analysis in Indic Languages—A Pilot Study

Susmitha Vekkot; Nagulapati Naga Venkata Sai Prakash; Thirupati Sai Eswar Reddy; Satwik Reddy Sripathi; S. Lalitha; Deepa Gupta; Mohammed Zakariah; Yousef Ajami Alotaibi

doi:10.1109/ACCESS.2023.3334790

IEEE Access (Jan 2023)

Affiliations

Susmitha Vekkot: Department of Electronics and Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India
Nagulapati Naga Venkata Sai Prakash: Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bengaluru, India
Thirupati Sai Eswar Reddy: Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bengaluru, India
Satwik Reddy Sripathi: Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bengaluru, India
S. Lalitha: Department of Electronics and Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India
Deepa Gupta: Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bengaluru, India
Mohammed Zakariah: Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Yousef Ajami Alotaibi: Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia

DOI: https://doi.org/10.1109/ACCESS.2023.3334790
Journal volume & issue: Vol. 11, pp. 130697–130718

Abstract

The paper describes the creation, analysis and validation of a multilingual dementia speech dataset for Indic languages. Three popular Indian languages, viz. Telugu, Tamil and Hindi, are considered for the pilot study. Dementia and the associated Alzheimer's disease affect a large section of the Asian population. Though there are promising studies in dementia detection focussed on Western ethnicity, the absence of a clinical dementia dataset for Indian languages forms the primary motivation for this study. This pilot study aims to overcome the challenges associated with data collection and validation in a clinical setting and to deal with situations wherein clinical data is not readily available. The Indic dementia dataset is an enacted, non-clinical dataset created from manual translations of the benchmark clinical English DementiaBank dataset. The dataset is validated using features extracted from the benchmark. The feature evaluation revealed a similarity of 92.6% for silences, 92% for mean pitch (Hz), 84.7% for jitter and 90.3% for shimmer. Subjective evaluation was also conducted based on the clarity of utterances and their similarity to DementiaBank data. An average MOS of 3.9 for clarity of speech and 3.76 for similarity with respect to DementiaBank was obtained across all three languages. A baseline classification using a state-of-the-art deep network architecture gave a maximum of 78% accuracy in dementia detection using the Indic dementia dataset. The pilot experimentation in this work gives promising insights into the development of a multilingual dataset for analysis of clinical speech patterns in early dementia in the Indian population.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
