Miscommunication handling in spoken dialog systems based on error-aware dialog state detection

Chung-Hsien Wu; Ming-Hsiang Su; Wei-Bin Liang

doi:10.1186/s13636-017-0107-3

EURASIP Journal on Audio, Speech, and Music Processing (May 2017)

Affiliations

Chung-Hsien Wu: Department of Computer Science and Information Engineering, National Cheng Kung University
Ming-Hsiang Su: Department of Computer Science and Information Engineering, National Cheng Kung University
Wei-Bin Liang: Department of Computer Science and Information Engineering, National Cheng Kung University

DOI: https://doi.org/10.1186/s13636-017-0107-3
Journal volume & issue: Vol. 2017, no. 1
pp. 1 – 17

Abstract

With the exponential growth in computing power and progress in speech recognition technology, spoken dialog systems (SDSs), with which a user interacts through natural speech, have been widely used in human-computer interaction. However, error-prone automatic speech recognition (ASR) results often lead to inappropriate semantic interpretation, so miscommunication happens easily. This paper presents an approach to error-aware dialog state (DS) detection for robust miscommunication handling in an SDS. Two types of miscommunication are considered in this study: non-understanding (Non-U) and misunderstanding (Mis-U). First, understanding evidence (UE), derived from the recognition confidence, is adopted for Non-U detection, followed by Non-U recovery. For Mis-U, where the recognized sentence contains uncertain recognized words, the partial sentences obtained by removing potentially misrecognized words from the input utterance are organized, based on regular expressions, as a tree structure for Mis-U DS modeling; this structure tolerates the deletion or rejection of keywords resulting from misrecognition. Latent semantic analysis is then employed to consider the verified words and their n-grams for DS detection, covering both Mis-U and predefined Base DSs. Historical information-based n-grams are employed to find the most likely DS for the SDS. Several experiments were performed with a dialog corpus for the restaurant reservation task. The experimental results show that the proposed approach achieved promising performance for Non-U recovery and Mis-U repair, as well as a satisfactory task success rate.
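
The partial-sentence idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the confidence thresholds, and the use of mean word confidence as a stand-in for understanding evidence are all assumptions made for illustration, and the regular-expression tree and latent semantic analysis steps are not reproduced here.

```python
from itertools import combinations


def is_non_understanding(confidences, ue_threshold=0.3):
    """Flag a Non-U turn when the evidence score falls below a threshold.

    Assumption: mean word confidence stands in for the paper's
    understanding evidence (UE); the real UE definition is not given
    in the abstract.
    """
    if not confidences:
        return True  # nothing was recognized at all
    return sum(confidences) / len(confidences) < ue_threshold


def partial_sentences(words, confidences, word_threshold=0.5):
    """List the partial sentences left after dropping uncertain words.

    Every non-empty subset of the low-confidence positions is removed
    in turn, smallest subsets first. The threshold is hypothetical.
    """
    uncertain = [i for i, c in enumerate(confidences) if c < word_threshold]
    results = []
    for k in range(1, len(uncertain) + 1):
        for drop in combinations(uncertain, k):
            kept = [w for i, w in enumerate(words) if i not in drop]
            results.append(" ".join(kept))
    return results


words = ["book", "a", "table", "for", "two"]
confidences = [0.9, 0.8, 0.4, 0.9, 0.3]
candidates = partial_sentences(words, confidences)
# candidates == ["book a for two", "book a table for", "book a for"]
```

In this sketch the number of candidates grows exponentially with the number of uncertain words, which is one reason a shared structure such as the tree mentioned in the abstract is attractive.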

Published in EURASIP Journal on Audio, Speech, and Music Processing

ISSN: 1687-4722 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Science: Physics: Acoustics. Sound; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://asmp-eurasipjournals.springeropen.com
