Predictions for Three-Month Postoperative Vocal Recovery after Thyroid Surgery from Spectrograms with Deep Neural Network

Jeong Hoon Lee; Chang Yoon Lee; Jin Seop Eom; Mingun Pak; Hee Seok Jeong; Hee Young Son

doi:10.3390/s22176387

Sensors (Aug 2022)

Affiliations

Jeong Hoon Lee: Division of Biomedical Informatics, Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul 110799, Korea
Chang Yoon Lee: Department of Otolaryngology, Thyroid/Head & Neck Cancer Center, The Dongnam Institute of Radiological & Medical Sciences (DIRAMS), Busan 46033, Korea
Jin Seop Eom: Samsung Electronics Co., Ltd., 130 Samsung-ro, Yeongtong-gu, Suwon-si 16677, Korea
Mingun Pak: Microsoft, Redmond, WA 98052, USA
Hee Seok Jeong: Department of Radiology, Pusan National University Yangsan Hospital, Yangsan 50612, Korea
Hee Young Son: Department of Otolaryngology, Thyroid/Head & Neck Cancer Center, The Dongnam Institute of Radiological & Medical Sciences (DIRAMS), Busan 46033, Korea

DOI: https://doi.org/10.3390/s22176387
Journal volume & issue: Vol. 22, no. 17, p. 6387

Abstract

Patients commonly experience voice problems after thyroid surgery, even when laryngeal endoscopy shows no abnormal findings. This study aimed to predict a patient’s vocal recovery at 3 months after surgery from preoperative and postoperative voice spectrograms. We retrospectively collected voice recordings and GRBAS scores from 114 patients who underwent surgery for thyroid cancer. Data for each patient were taken at three time points: before surgery, and 2 weeks and 3 months after surgery. Using a model pretrained to predict GRBAS scores as the backbone, we trained an EfficientNet-based deep-learning model with long short-term memory (LSTM) on the preoperative and 2-week postoperative voice spectrograms to predict the voice at 3 months after surgery. In the correlation analysis, the correlations of the predicted results for the grade, breathiness, and asthenia scores were 0.741, 0.766, and 0.433, respectively. Based on the scaled prediction results, the areas under the receiver operating characteristic curve (AUC) for the binarized grade, breathiness, and asthenia scores were 0.894, 0.918, and 0.735, respectively. In a follow-up test of 12 patients after 6 months, the mean AUC across the five scores was 0.822. This study showed the feasibility of predicting vocal recovery after 3 months using spectrograms. We expect that this model could help relieve patients’ psychological anxiety and encourage them to participate actively in speech rehabilitation.
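
The two evaluation measures named in the abstract, a correlation between predicted and observed scores and an AUC on binarized scores, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the abstract does not state which correlation coefficient was used (Pearson is assumed), the binarization threshold (a score of 1 or higher is treated as positive), or any patient data (the values below are made up). It is not the authors' evaluation code.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed; the abstract does not name the type)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical observed grade scores (0-3 scale) and model outputs; not study data.
true_grade = [0, 0, 1, 2, 1, 0, 3, 1]
pred_grade = [0.2, 0.4, 0.9, 1.7, 0.3, 0.1, 2.5, 1.2]

# Hypothetical binarization rule: any score of 1 or higher counts as abnormal.
binary_grade = [1 if g >= 1 else 0 for g in true_grade]

r = pearson(true_grade, pred_grade)
auc = roc_auc(binary_grade, pred_grade)
print(f"correlation = {r:.3f}, AUC = {auc:.3f}")
```

The pairwise form of the AUC is used here only because it needs no external library; with larger datasets a rank-based or library implementation would be the more usual choice.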

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors