Current Directions in Biomedical Engineering (Sep 2024)
Optimising speech recognition using LLMs: an application in the surgical domain
Abstract
Automatic speech recognition (ASR), powered by deep learning techniques, is crucial for enhancing human-computer interaction. However, its full potential remains unrealized in diverse real-world environments, where challenges such as dialects, accents, and domain-specific jargon persist, particularly in fields like surgery. Here, we investigate the potential of large language models (LLMs) as error-correction modules for ASR. We leverage Whisper-medium or ASR-LibriSpeech for speech recognition, and GPT-3.5 or GPT-4 for error correction. We employ various prompting methods, from zero-shot to few-shot with leading questions and sample medical terms, to correct erroneous transcriptions. Results, measured by word error rate (WER), reveal Whisper’s superior transcription accuracy over ASR-LibriSpeech, with a WER of 11.93% compared to 32.09%. GPT-3.5, using the few-shot prompting method with medical terms, further enhances performance, achieving WER reductions of 64.29% and 37.83% for Whisper and ASR-LibriSpeech, respectively. Additionally, Whisper exhibits faster execution speed. Substituting GPT-3.5 with GPT-4 further enhances transcription accuracy. Despite some remaining challenges, our approach demonstrates the potential of leveraging domain-specific knowledge through LLM prompting for accurate transcription, particularly in specialized domains like surgery.
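As a rough illustration of the pipeline the abstract describes (not the authors' exact implementation), the sketch below pairs Whisper transcription with a few-shot GPT correction prompt and scores both outputs by WER. The prompt wording, example medical terms, file names, and the use of the jiwer library are assumptions made for illustration.

```python
# Illustrative sketch of the ASR + LLM error-correction pipeline.
# Model choices follow the abstract; all other details are assumed.
import whisper            # openai-whisper package
import jiwer              # common WER implementation (assumed choice of metric library)
from openai import OpenAI

client = OpenAI()

# Step 1: transcribe surgical speech with the Whisper-medium checkpoint.
asr_model = whisper.load_model("medium")
hypothesis = asr_model.transcribe("surgical_audio.wav")["text"]

# Step 2: few-shot correction prompt with sample medical terms
# (prompt text and term list are hypothetical).
FEW_SHOT = (
    "Correct the ASR transcript of surgical speech. Domain terms include: "
    "trocar, laparoscope, electrocautery, anastomosis.\n"
    "Transcript: 'please hand me the trow car' -> 'please hand me the trocar'\n"
)
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You fix ASR errors in surgical transcripts."},
        {"role": "user", "content": FEW_SHOT + f"Transcript: '{hypothesis}' ->"},
    ],
)
corrected = response.choices[0].message.content.strip()

# Step 3: word error rate against a reference transcript.
reference = "please hand me the trocar"   # ground truth for this clip (assumed)
print("WER before correction:", jiwer.wer(reference, hypothesis))
print("WER after correction: ", jiwer.wer(reference, corrected))
```

Swapping "gpt-3.5-turbo" for a GPT-4 model name in the call above mirrors the substitution the abstract reports as yielding a further accuracy gain.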
Keywords