User Identity Protection in Automatic Emotion Recognition through Disguised Speech

Fasih Haider; Pierre Albert; Saturnino Luz

doi:10.3390/ai2040038

AI (Nov 2021)

Affiliations

Fasih Haider: Usher Institute, Edinburgh Medical School, The University of Edinburgh, Edinburgh EH16 4UX, UK
Pierre Albert: Usher Institute, Edinburgh Medical School, The University of Edinburgh, Edinburgh EH16 4UX, UK
Saturnino Luz: Usher Institute, Edinburgh Medical School, The University of Edinburgh, Edinburgh EH16 4UX, UK

DOI: https://doi.org/10.3390/ai2040038
Journal volume & issue: Vol. 2, no. 4
pp. 636 – 649

Abstract

Ambient Assisted Living (AAL) technologies are being developed to help elderly people live healthy and active lives. These technologies have been used to monitor people’s daily exercise, calorie consumption and sleep patterns, and to provide coaching interventions that foster positive behaviour. Speech and audio processing can complement such AAL technologies and inform interventions for healthy ageing by analysing speech data captured in the user’s home. However, collecting data in home settings presents challenges, one of the most pressing being how to manage privacy and data protection. To address this issue, we proposed a low-cost system for recording disguised speech signals, which protects user identity by means of pitch shifting. The disguised speech so recorded can then be used to train machine learning models for affective behaviour monitoring. Affective behaviour could provide an indicator of the onset of mental health issues such as depression and cognitive impairment, and help develop clinical tools for automatically detecting and monitoring disease progression. In this article, acoustic features extracted from non-disguised and disguised speech are evaluated in an affect recognition task using six different machine learning classification methods. The results of transfer learning from non-disguised to disguised speech are also demonstrated. We have identified sets of acoustic features which are not affected by the pitch shifting algorithm and have also evaluated them in affect recognition. We found that, while the non-disguised speech signal gives the best Unweighted Average Recall (UAR) of 80.01%, the disguised speech signal causes only a slight degradation of performance, reaching 76.29%. Transfer learning from non-disguised to disguised speech reduces the UAR to 65.13%; however, feature selection improves it to 68.32%. This approach forms part of a large project which includes health and wellbeing monitoring and coaching.
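
For readers unfamiliar with the metric, the following is a minimal sketch of how Unweighted Average Recall is commonly computed: the recall of each class is calculated separately and the per-class recalls are averaged with equal weight, so a rare class counts as much as a frequent one. The function name, the toy labels ("neutral", "angry") and the predictions are hypothetical illustrations and are not taken from the article or its data.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, prediction in zip(y_true, y_pred):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy example: 8 "neutral" and 2 "angry" utterances.
y_true = ["neutral"] * 8 + ["angry"] * 2
y_pred = ["neutral"] * 8 + ["neutral", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.2f}, accuracy = {accuracy:.2f}")
```

In this toy case plain accuracy looks higher than UAR because the majority class dominates it, which is one reason UAR is often preferred for imbalanced emotion classes.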

Published in AI

ISSN: 2673-2688 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/ai
