Machine-Learning-Based Closed-Set Text-Independent Speaker Identification Using Speech Recorded During 25 Hours of Prolonged Wakefulness

Youngsun Kong; Hugo F. Posada-Quintero; Matthew S. Daley; Jeffrey Bolkhovsky; Ki H. Chon

doi:10.1109/ACCESS.2021.3094175

IEEE Access (Jan 2021)

Affiliations

Youngsun Kong: Department of Biomedical Engineering, University of Connecticut, Storrs, CT, USA
Hugo F. Posada-Quintero: Department of Biomedical Engineering, University of Connecticut, Storrs, CT, USA
Matthew S. Daley: Naval Submarine Medical Research Laboratory, Groton, CT, USA
Jeffrey Bolkhovsky: Naval Submarine Medical Research Laboratory, Groton, CT, USA
Ki H. Chon: Department of Biomedical Engineering, University of Connecticut, Storrs, CT, USA

Journal volume & issue: Vol. 9
pp. 96890 – 96897

Abstract

We applied machine learning to text-independent speaker identification using speech recorded during the day, evening, and night from subjects undergoing 25 hours of prolonged wakefulness. Subjects answered casual questions for approximately 3 minutes and described pictures presented to them for 0.5 minutes. We extracted 12,515 vocal features using the openSMILE software. To evaluate how well the training scheme generalizes, we split the 20 subjects into two sets of 10 and repeated the evaluation four times with different subsets: one set of 10 subjects was used to select the best feature sets and the optimal machine-learning method, and the other set of 10 subjects was used to test the trained model. When the models were trained on three speech sessions recorded throughout the day, balanced accuracies were 95% and 98.8% for daytime and evening speech, respectively, but only 84.2% for nighttime test speech. When training data from all times of day (daytime, evening, and nighttime) were used, balanced accuracies were 97.5%, 98.8%, and 98.1% for daytime, evening, and nighttime test speech, respectively, with an overall accuracy of 98.1%. Prolonged wakefulness degrades the performance of machine-learning-based speaker identification. This work suggests that such systems should be trained on speech from both daytime and nighttime sessions for better overall accuracy. Machine learning could potentially identify a speaker's voice even when it is affected by tiredness and fatigue, which are frequently encountered in settings such as emergency rooms and long-duration, repetitive task operations.
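The abstract reports both balanced accuracy and overall accuracy for closed-set speaker identification. The sketch below assumes the common definition of balanced accuracy as the mean of per-speaker recall (the unadjusted form used by scikit-learn's `balanced_accuracy_score`); the abstract does not state the exact formula the authors used, and the speaker labels in the usage example are hypothetical. It shows why the two metrics can differ when some speakers contribute more test segments than others.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean per-speaker recall (macro-averaged recall), an assumed definition."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    correct = defaultdict(int)  # correctly identified segments per true speaker
    total = defaultdict(int)    # all test segments per true speaker
    for true_spk, pred_spk in zip(y_true, y_pred):
        total[true_spk] += 1
        if true_spk == pred_spk:
            correct[true_spk] += 1
    recalls = [correct[spk] / total[spk] for spk in total]
    return sum(recalls) / len(recalls)


def overall_accuracy(y_true, y_pred):
    """Fraction of all test segments assigned to the correct speaker."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical test segments: speaker "A" has four, speaker "B" has two.
    truth = ["A", "A", "A", "A", "B", "B"]
    preds = ["A", "A", "A", "A", "A", "B"]
    print(overall_accuracy(truth, preds))   # 5 of 6 segments correct
    print(balanced_accuracy(truth, preds))  # (1.0 + 0.5) / 2 = 0.75
```

Because each speaker's recall is weighted equally, a speaker with few test segments who is often misidentified pulls balanced accuracy below overall accuracy, which is why the two metrics are reported separately.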

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639