Audio-visual modelling in a clinical setting

Jianbo Jiao; Mohammad Alsharid; Lior Drukker; Aris T. Papageorghiou; Andrew Zisserman; J. Alison Noble

doi:10.1038/s41598-024-66160-4

Scientific Reports (Jul 2024)

Affiliations

Jianbo Jiao: Department of Engineering Science, University of Oxford
Mohammad Alsharid: Department of Engineering Science, University of Oxford
Lior Drukker: Nuffield Department of Women’s and Reproductive Health, University of Oxford
Aris T. Papageorghiou: Nuffield Department of Women’s and Reproductive Health, University of Oxford
Andrew Zisserman: Department of Engineering Science, University of Oxford
J. Alison Noble: Department of Engineering Science, University of Oxford

Journal volume & issue: Vol. 14, no. 1
pp. 1–13

Abstract

Auditory and visual signals are two primary perception modalities that usually occur together and correlate with each other, not only in natural environments but also in clinical settings. However, audio-visual modelling in the latter case can be more challenging, due to the different sources of the audio and video signals and the noise (at both the signal level and the semantic level) in the auditory signals, which are usually speech audio. In this study, we consider audio-visual modelling in a clinical setting, providing a way to learn medical representations that benefit various clinical tasks without relying on dense supervisory annotations from human experts for model training. A simple yet effective multi-modal self-supervised learning framework is presented for this purpose. The proposed approach helps to find standard anatomical planes, predict where the sonographer's eyes are focused, and localise anatomical regions of interest during ultrasound imaging. Experimental analysis on a large-scale clinical multi-modal ultrasound video dataset shows that the proposed representation learning method provides transferable anatomical representations that boost the performance of automated downstream clinical tasks, even outperforming fully-supervised solutions. Being able to learn such medical representations in a self-supervised manner will contribute to several areas, including a better understanding of obstetric imaging, the training of new sonographers, more effective assistive tools for human experts, and enhancement of the clinical workflow.
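The abstract does not spell out the training objective. One common way to learn from paired video and speech without labels is a contrastive audio-visual correspondence loss. The sketch below shows a generic symmetric InfoNCE version in NumPy. It is an illustrative assumption, not the authors' method: the function names, the temperature of 0.07 and the embedding shapes are hypothetical, and the video and audio encoders that would produce the embeddings are omitted.

```python
import numpy as np


def l2_normalise(x, eps=1e-8):
    # Scale each row (one embedding per clip) to unit length.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def audio_visual_infonce(video_emb, audio_emb, temperature=0.07):
    """Hypothetical symmetric InfoNCE loss for a batch of paired clips.

    Row i of video_emb and row i of audio_emb are assumed to come from
    the same time window (the positive pair); every other row in the
    batch serves as a negative.
    """
    v = l2_normalise(video_emb)
    a = l2_normalise(audio_emb)
    logits = v @ a.T / temperature  # (B, B) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    # Video -> audio: each video row should select its own audio clip.
    loss_v2a = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Audio -> video: each audio column should select its own video clip.
    loss_a2v = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_v2a + loss_a2v)


rng = np.random.default_rng(0)
video = rng.normal(size=(8, 128))
audio = video + 0.1 * rng.normal(size=(8, 128))  # synthetic correlated pairs
aligned_loss = audio_visual_infonce(video, audio)
misaligned_loss = audio_visual_infonce(video, np.roll(audio, 1, axis=0))
```

Minimising this loss pulls embeddings of co-occurring video and audio together and pushes mismatched pairs apart, so correctly aligned pairs give a much lower loss than pairs shifted by one clip. With identical embeddings for every clip, the loss reduces to log(B), the chance level for a batch of size B.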

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
