Heliyon (Mar 2023)
3D VOSNet: Segmentation of endoscopic images of the larynx with subsequent generation of indicators
Abstract
Video laryngoscopy is available for visualizing the motion of the vocal cords and aids in the preliminary assessment of larynx-related lesions. Laryngeal electromyography (EMG) must be performed to diagnose the causes of vocal cord paralysis, which may cause patient discomfort. Otolaryngology departments therefore lack credible larynx indicators for evaluating larynx-related diseases. This paper proposes a 3D VOSNet model that performs sequence segmentation to extract time-series features from video laryngoscopy. The 3D VOSNet model retains the time-series features of the three images before and after a given image to achieve translation and occlusion invariance; that is, the model can segment and classify each item in a laryngoscopy video without being affected by extrinsic factors such as shaking or occlusion during the examination. Numerical results show that the testing accuracy rates for the glottis, the right vocal cord, and the left vocal cord are 89.91%, 94.63%, and 93.48%, respectively. The proposed model can thus segment the glottis and vocal cords from a laryngoscopy sequence. Finally, the proposed algorithm computes six larynx indicators: the area of the glottis, the area of the vocal cords, the length of the vocal cords, the deviation of the length of the vocal cords, and the symmetry of the vocal cords. To help otolaryngologists remain credible and objective when making diagnostic decisions, and to help them explain clinical conditions of the larynx such as vocal cord paralysis to patients after diagnosis, the proposed algorithm provides explainable indicators (X-indicators).
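As a rough illustration of two ideas in the abstract, the sketch below gathers the three frames before and after a given frame into a seven-frame clip and counts pixels per segmented region as a simple area measure. It is not the authors' implementation: the function names `temporal_window` and `region_areas`, the label encoding in `LABELS`, the clamping of indices at the ends of the video, and the use of raw pixel counts as "area" are all assumptions made for the example.

```python
import numpy as np

# Hypothetical label encoding; the abstract does not specify one.
LABELS = {"glottis": 1, "right_vocal_cord": 2, "left_vocal_cord": 3}


def temporal_window(frames, t, radius=3):
    """Return the 2 * radius + 1 frames centred on frame index t.

    Indices that fall outside the video are clamped to the first or
    last frame. This edge handling is an assumption, not a detail
    taken from the paper.
    """
    n = frames.shape[0]
    idx = np.clip(np.arange(t - radius, t + radius + 1), 0, n - 1)
    return frames[idx]


def region_areas(mask):
    """Count the pixels of each labelled region in one segmentation mask."""
    return {name: int(np.count_nonzero(mask == value))
            for name, value in LABELS.items()}


# Toy video: 10 frames of 4 x 4 pixels, frame i filled with the value i.
video = np.stack([np.full((4, 4), i, dtype=np.uint8) for i in range(10)])
clip = temporal_window(video, t=1)

# Toy segmentation mask using the hypothetical labels above.
mask = np.array([
    [0, 1, 1, 0],
    [2, 1, 1, 3],
    [2, 2, 3, 3],
    [0, 0, 0, 3],
])
areas = region_areas(mask)
```

Here `clip` has shape `(7, 4, 4)`, and `areas` maps each region name to its pixel count. Converting pixel counts to physical units, or deriving the length, deviation, and symmetry indicators, would require definitions that the abstract does not give.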