AI (Nov 2021)

User Identity Protection in Automatic Emotion Recognition through Disguised Speech

  • Fasih Haider,
  • Pierre Albert,
  • Saturnino Luz

DOI
https://doi.org/10.3390/ai2040038
Journal volume & issue
Vol. 2, no. 4
pp. 636 – 649

Abstract

Ambient Assisted Living (AAL) technologies are being developed that could help elderly people live healthy and active lives. These technologies have been used to monitor people’s daily exercise, calorie consumption and sleep patterns, and to provide coaching interventions that foster positive behaviour. Speech and audio processing can complement such AAL technologies by analysing speech data captured in the user’s home to inform interventions for healthy ageing. However, collecting data in home settings presents challenges, one of the most pressing being how to manage privacy and data protection. To address this issue, we propose a low-cost system for recording disguised speech signals, which protects user identity by means of pitch shifting. The disguised speech so recorded can then be used to train machine learning models for affective behaviour monitoring. Affective behaviour could provide an indicator of the onset of mental health issues such as depression and cognitive impairment, and help develop clinical tools for automatically detecting and monitoring disease progression. In this article, acoustic features extracted from non-disguised and disguised speech are evaluated in an affect recognition task using six different machine learning classification methods. The results of transfer learning from non-disguised to disguised speech are also demonstrated. We have identified sets of acoustic features that are not affected by the pitch shifting algorithm and evaluated them in affect recognition as well. We found that, while the non-disguised speech signal gives the best Unweighted Average Recall (UAR) of 80.01%, the disguised speech signal causes only a slight degradation in performance, reaching a UAR of 76.29%. Transfer learning from non-disguised to disguised speech results in a lower UAR (65.13%); feature selection, however, improves the UAR to 68.32%. This approach forms part of a large project which includes health and wellbeing monitoring and coaching.
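
For readers unfamiliar with the metric reported above, UAR is the mean of the per-class recalls (also known as macro-averaged recall), so each affect class counts equally regardless of how many samples it has. The following is a minimal sketch for illustration only, not the authors' evaluation code; the function name, the class labels and the toy predictions are hypothetical.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (every class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    totals = defaultdict(int)  # number of samples per true class
    hits = defaultdict(int)    # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1

    # Recall of a class = correct predictions / samples of that class.
    recalls = [hits[label] / totals[label] for label in totals]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy data: 8 "neutral" and 2 "anger" samples.
y_true = ["neutral"] * 8 + ["anger"] * 2
y_pred = ["neutral"] * 8 + ["neutral", "anger"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.2f}, accuracy = {accuracy:.2f}")
```

In this toy case the recalls are 1.0 for "neutral" and 0.5 for "anger", so the UAR is lower than the plain accuracy. This is why UAR is often preferred when emotion classes are unevenly represented.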

Keywords