Sensors (Mar 2022)
Context-Aware Automatic Sign Language Video Transcription in Psychiatric Interviews
Abstract
Sign language (SL) translation constitutes an extremely challenging task when undertaken in a general unconstrained setup, especially in the absence of vast training datasets that enable the use of end-to-end solutions employing deep architectures. In such cases, the ability to incorporate prior information can yield a significant improvement in the translation results by greatly restricting the search space of the potential solutions. In this work, we treat the translation problem in the limited confines of psychiatric interviews involving doctor-patient diagnostic sessions for deaf and hard of hearing patients with mental health problems.To overcome the lack of extensive training data and be able to improve the obtained translation performance, we follow a domain-specific approach combining data-driven feature extraction with the incorporation of prior information drawn from the available domain knowledge. This knowledge enables us to model the context of the interviews by using an appropriately defined hierarchical ontology for the contained dialogue, allowing for the classification of the current state of the interview, based on the doctor’s question. Utilizing this information, video transcription is treated as a sentence retrieval problem. The goal is predicting the patient’s sentence that has been signed in the SL video based on the available pool of possible responses, given the context of the current exchange. Our experimental evaluation using simulated scenarios of psychiatric interviews demonstrate the significant gains of incorporating context awareness in the system’s decisions.
Keywords