Acoustic Analysis and Prediction of Type 2 Diabetes Mellitus Using Smartphone-Recorded Voice Segments

Jaycee M. Kaufman, MSc; Anirudh Thommandram, MASc; Yan Fossat, MSc

Mayo Clinic Proceedings: Digital Health (Dec 2023)

Affiliations

Jaycee M. Kaufman, MSc: Klick Applied Sciences, Klick Inc, Toronto, Canada; Correspondence: Jaycee M. Kaufman, MSc, 175 Bloor St E Suite 300, Toronto, ON, Canada M4W 3R8.
Anirudh Thommandram, MASc: Klick Applied Sciences, Klick Inc, Toronto, Canada
Yan Fossat, MSc: Klick Applied Sciences, Klick Inc, Toronto, Canada; Faculty of Science, Ontario Tech University, Oshawa, Canada

Journal volume & issue: Vol. 1, no. 4, pp. 534–544

Abstract

Objective: To investigate the potential of voice analysis as a prescreening or monitoring tool for type 2 diabetes mellitus (T2DM) by examining the differences in voice recordings between nondiabetic and T2DM individuals.

Patients and Methods: A total of 267 participants diagnosed as nondiabetic (79 women and 113 men) or T2DM (18 women and 57 men) on the basis of American Diabetes Association guidelines were recruited in India between August 30, 2021, and June 30, 2022. Using a smartphone application, participants recorded a fixed phrase up to 6 times daily for 2 weeks, resulting in 18,465 recordings. Fourteen acoustic features were extracted from each recording to analyze differences between nondiabetic and T2DM individuals and to create a prediction methodology for T2DM status.

Results: Significant differences were found between voice recordings of nondiabetic and T2DM men and women, both in the entire dataset and in a sample matched for age and body mass index (BMI, calculated as the weight in kilograms divided by the height in meters squared). The highest predictive accuracy was achieved by pitch (P<.0001), pitch SD (P<.0001), and relative average perturbation jitter (P=.02) for women, and by intensity (P<.0001) and 11-point amplitude perturbation quotient shimmer (apq11, P<.0001) for men. Incorporating these features with age and BMI, the optimal prediction models achieved accuracies of 0.75±0.22 for women and 0.70±0.10 for men through 5-fold cross-validation in the age-matched and BMI-matched sample.

Conclusion: Overall, vocal changes occur in individuals with T2DM compared with those without T2DM. Voice analysis shows potential as a prescreening or monitoring tool for T2DM, particularly when combined with other risk factors associated with the condition.

Trial Registration: clinicaltrials.gov Identifier: CTRI/2021/08/035957
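
For readers unfamiliar with the jitter feature named in the abstract, the following is a minimal illustrative sketch of how relative average perturbation (RAP) jitter is commonly defined: the mean absolute deviation of each glottal period from the average of itself and its two neighbors, divided by the mean period. The abstract does not state which extraction software or exact formula the authors used, so this common definition, the function name `rap_jitter`, and the example period values are all assumptions made for illustration.

```python
from statistics import mean


def rap_jitter(periods):
    """Relative average perturbation of consecutive glottal periods (seconds).

    Hypothetical helper using a commonly cited definition of RAP jitter;
    it is not taken from the paper itself.
    """
    if len(periods) < 3:
        raise ValueError("need at least three periods")
    # Absolute deviation of each interior period from its 3-point local average
    deviations = [
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3)
        for i in range(1, len(periods) - 1)
    ]
    # Normalize by the mean period so the result is dimensionless
    return mean(deviations) / mean(periods)


# Made-up periods of roughly 10 ms (about 100 Hz) with small cycle-to-cycle variation
example_periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
print(f"RAP jitter: {rap_jitter(example_periods):.4f}")
```

A perfectly periodic voice would give a value of 0 under this definition; larger values indicate more cycle-to-cycle variation in pitch period.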

Published in Mayo Clinic Proceedings: Digital Health

ISSN: 2949-7612 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.sciencedirect.com/journal/mayo-clinic-proceedings-digital-health
