A single latent channel is sufficient for biomedical glottis segmentation

Andreas M. Kist; Katharina Breininger; Marion Dörrich; Stephan Dürr; Anne Schützenberger; Marion Semmler

doi:10.1038/s41598-022-17764-1

Scientific Reports (Aug 2022)

A single latent channel is sufficient for biomedical glottis segmentation

Andreas M. Kist,
Katharina Breininger,
Marion Dörrich,
Stephan Dürr,
Anne Schützenberger,
Marion Semmler

Affiliations

Andreas M. Kist: Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg
Katharina Breininger: Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg
Marion Dörrich: Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg
Stephan Dürr: Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg
Anne Schützenberger: Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg
Marion Semmler: Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg

DOI: https://doi.org/10.1038/s41598-022-17764-1
Journal volume & issue: Vol. 12, no. 1
pp. 1 – 10

Abstract

Read online

Abstract Glottis segmentation is a crucial step to quantify endoscopic footage in laryngeal high-speed videoendoscopy. Recent advances in deep neural networks for glottis segmentation allow for a fully automatic workflow. However, exact knowledge of integral parts of these deep segmentation networks remains unknown, and understanding the inner workings is crucial for acceptance in clinical practice. Here, we show that a single latent channel as a bottleneck layer is sufficient for glottal area segmentation using systematic ablations. We further demonstrate that the latent space is an abstraction of the glottal area segmentation relying on three spatially defined pixel subtypes allowing for a transparent interpretation. We further provide evidence that the latent space is highly correlated with the glottal area waveform, can be encoded with four bits, and decoded using lean decoders while maintaining a high reconstruction accuracy. Our findings suggest that glottis segmentation is a task that can be highly optimized to gain very efficient and explainable deep neural networks, important for application in the clinic. In the future, we believe that online deep learning-assisted monitoring is a game-changer in laryngeal examinations.

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/

About the journal