Machine learning-based genetic diagnosis models for hereditary hearing loss by the GJB2, SLC26A4 and MT-RNR1 variants
Xiaomei Luo,
Fengmei Li,
Wenchang Xu,
Kaicheng Hong,
Tao Yang,
Jiansheng Chen,
Xiaohe Chen,
Hao Wu
Affiliations
Xiaomei Luo
University of Science and Technology of China, No.96 Jinzhai Road, Hefei, Anhui 230026, China; Department of Medical Electronics, Chinese Academy of Sciences, Suzhou Institute of Biomedical Engineering and Technology, No. 88, Keling Road, Suzhou, Jiangsu 215163, China
Fengmei Li
Department of Medical Electronics, Chinese Academy of Sciences, Suzhou Institute of Biomedical Engineering and Technology, No. 88, Keling Road, Suzhou, Jiangsu 215163, China
Wenchang Xu
Department of Medical Electronics, Chinese Academy of Sciences, Suzhou Institute of Biomedical Engineering and Technology, No. 88, Keling Road, Suzhou, Jiangsu 215163, China
Kaicheng Hong
Department of Medical Electronics, Chinese Academy of Sciences, Suzhou Institute of Biomedical Engineering and Technology, No. 88, Keling Road, Suzhou, Jiangsu 215163, China
Tao Yang
Department of Otorhinolaryngology-Head and Neck Surgery, Shanghai Ninth People's Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China; Ear Institute, Shanghai Jiaotong University School of Medicine, Shanghai, China; Shanghai Key Laboratory of Translational Medicine on Ear and Nose Diseases, Shanghai, China
Jiansheng Chen
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; Beijing National Research Center for Information Science and Technology, Beijing 100084, China; University of Science and Technology Beijing, Beijing 100083, China
Xiaohe Chen
University of Science and Technology of China, No.96 Jinzhai Road, Hefei, Anhui 230026, China; Department of Medical Electronics, Chinese Academy of Sciences, Suzhou Institute of Biomedical Engineering and Technology, No. 88, Keling Road, Suzhou, Jiangsu 215163, China; Corresponding author at: University of Science and Technology of China, No.96 Jinzhai Road, Hefei, Anhui 230026, China.
Hao Wu
Department of Otorhinolaryngology-Head and Neck Surgery, Shanghai Ninth People's Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China; Ear Institute, Shanghai Jiaotong University School of Medicine, Shanghai, China; Shanghai Key Laboratory of Translational Medicine on Ear and Nose Diseases, Shanghai, China; Corresponding author at: Department of Otorhinolaryngology-Head and Neck Surgery, Shanghai Ninth People's Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China.
Background: Hereditary hearing loss (HHL) is the most common sensory deficit in humans. As gene sequencing technology advances, more variants will be identified to support genetic diagnosis, but the growing number of variants is difficult for human experts to interpret. This study aims to develop a machine learning-based genetic diagnosis model for HHL-related variants of GJB2, SLC26A4 and MT-RNR1.
Methods: This case-control study included 1898 subjects, of whom 1354 were HHL patients and 544 were carriers. Six machine learning (ML) models were built for risk assessment, based on variants at 144 sites in the three HHL-related genes. We compared the ML models with the genetic risk score (GRS) and expert interpretation (EI) to verify clinical performance.
Findings: Among the six ML models, the support vector machine (SVM) showed the best performance. For the prediction of HHL-related gene sites in subjects with variants, the area under the receiver operating characteristic curve (AUC) of the SVM model was 0.803 (0.680–0.814) in 10-fold stratified cross-validation and 0.751 (0.635–0.779) in external validation. The SVM predictions outperformed both EI and GRS. Furthermore, 11 sites were identified as the smallest feature set that still allowed accurate prediction.
Interpretation: The developed SVM model has great potential to be an efficient and effective tool for HHL prediction when high-throughput sequencing data are available.
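The evaluation terms used in the abstract (a genetic risk score, stratified k-fold splitting, and AUC) can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline: the abstract does not give the GRS formula, so a plain weighted sum of variant allele counts is assumed here, and the function names, weights, and toy genotypes are all hypothetical.

```python
import random
from collections import defaultdict


def genetic_risk_score(genotype, weights):
    """Assumed GRS form: weighted sum of variant allele counts (0, 1 or 2 per site)."""
    return sum(w * g for w, g in zip(weights, genotype))


def auc(scores, labels):
    """AUC as the probability that a random positive scores higher than a
    random negative, with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds so each fold keeps roughly the
    same class proportions as the full data set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)  # deal each class round-robin across folds
    return folds


# Hypothetical toy data: 6 subjects, 3 variant sites, label 1 = patient, 0 = carrier.
genotypes = [[2, 0, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 0]]
labels = [1, 1, 1, 0, 0, 0]
weights = [1.0, 1.0, 1.0]  # unweighted GRS, purely for illustration

scores = [genetic_risk_score(g, weights) for g in genotypes]
folds = stratified_folds(labels, k=3)
print("AUC of toy GRS:", round(auc(scores, labels), 3))
print("folds:", folds)
```

In a real evaluation, a classifier such as an SVM would be trained on all folds but one and scored on the held-out fold, with the AUC computed from the held-out predictions; the toy data here only show how the pieces fit together.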