Scientific Reports (Jan 2025)

Artificial intelligence empowered voice generation for amyotrophic lateral sclerosis patients

  • Stefano Regondi,
  • Giordana Donvito,
  • Emanuele Frontoni,
  • Milutin Kostovic,
  • Fabio Minazzi,
  • Sébastien Bratières,
  • Massimiliano Filosto,
  • Raffaele Pugliese

DOI
https://doi.org/10.1038/s41598-024-84728-y
Journal volume & issue
Vol. 15, no. 1
pp. 1 – 12

Abstract

Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative disease that can result in a progressive loss of speech due to bulbar dysfunction, which can have a significant negative impact on the patient’s mental well-being. Alternative Augmentative Communication (AAC) strategies based on synthetic voices have been shown to assist patients in maintaining communication and improving their Quality of Life (QoL). However, such synthetic voices are often perceived as impersonal and fail to capture the unique voice and identity of the patient. To tackle this issue, combining voice banking (VB) and artificial intelligence (AI) has emerged as a more natural communication strategy, enabling individuals to preserve their voice for use with AAC devices as needed. This involves recording speech samples to generate a synthetic voice closely resembling the individual’s own. Despite the increasing interest in VB, there is a lack of clear strategies for its effective implementation in rapidly progressing diseases like ALS. Additionally, the perceptual quality of VB for patients with preserved speech, especially when offered early in the disease, remains poorly understood. In light of these challenges, this study aims to assess the effectiveness and the perceptual impact of AI-generated voices on ALS patients with preserved speech, utilizing a personalized voice synthesis system based on machine learning. The AI-generated patient-specific voice is obtained by recording the patient’s voice and then fine-tuning a Generative Adversarial Network for Efficient and High Fidelity Speech Synthesis (HiFi-GAN), resulting in a model capable of producing speech highly similar to the patient’s own voice, with exceptional expressive and audio quality. By addressing these aspects, this study intends to offer valuable insights into the potential benefits and challenges of combining VB with AI voices to enhance communication support for ALS patients.

Keywords