DeepFace: Deep-learning-based framework to contextualize orofacial-cleft-related variants during human embryonic craniofacial development
Yulin Dai,
Toshiyuki Itai,
Guangsheng Pei,
Fangfang Yan,
Yan Chu,
Xiaoqian Jiang,
Seth M. Weinberg,
Nandita Mukhopadhyay,
Mary L. Marazita,
Lukas M. Simon,
Peilin Jia,
Zhongming Zhao
Affiliations
Yulin Dai
Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
Toshiyuki Itai
Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
Guangsheng Pei
Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
Fangfang Yan
Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
Yan Chu
Center for Secure Artificial Intelligence for Healthcare, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
Xiaoqian Jiang
Center for Secure Artificial Intelligence for Healthcare, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
Seth M. Weinberg
Department of Oral and Craniofacial Sciences, School of Dental Medicine, Center for Craniofacial and Dental Genetics, University of Pittsburgh, Pittsburgh, PA 15213, USA; Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA
Nandita Mukhopadhyay
Department of Oral and Craniofacial Sciences, School of Dental Medicine, Center for Craniofacial and Dental Genetics, University of Pittsburgh, Pittsburgh, PA 15213, USA
Mary L. Marazita
Department of Oral and Craniofacial Sciences, School of Dental Medicine, Center for Craniofacial and Dental Genetics, University of Pittsburgh, Pittsburgh, PA 15213, USA; Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA; Clinical and Translational Science Institute, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
Lukas M. Simon
Therapeutic Innovation Center, Baylor College of Medicine, Houston, TX 77030, USA
Peilin Jia
Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
Zhongming Zhao
Center for Precision Health, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA; Corresponding author
Summary: Orofacial clefts (OFCs) are among the most common human congenital birth defects. Previous multiethnic studies have identified dozens of associated loci for both cleft lip with or without cleft palate (CL/P) and cleft palate alone (CP). Although several nearby genes have been highlighted, the “causal” variants remain largely unknown. Here, we developed DeepFace, a convolutional neural network model, to assess the functional impact of variants using SNP activity difference (SAD) scores. The DeepFace model was trained on 204 epigenomic assays from crucial stages of human embryonic craniofacial development, spanning post-conception week (pcw) 4 to pcw 10. The median Pearson correlation coefficients between predicted and observed values for 12 epigenetic features ranged from 0.50 to 0.83. Our model revealed that SNPs significantly associated with OFCs tended to have higher SAD scores than less related groups across various variant categories, indicating a context-specific impact of OFC-related SNPs. Notably, we identified six SNPs whose SAD scores showed a significant linear relationship with developmental progression, suggesting that these SNPs could play a temporal regulatory role. Furthermore, our cell-type specificity analysis pinpointed trophoblast cells as having the highest enrichment of OFC-associated risk signals. Overall, DeepFace can harness distal regulatory signals from extensive epigenomic assays, offering new perspectives for prioritizing OFC variants using contextualized functional genomic features. We expect DeepFace to be instrumental in assessing and predicting the regulatory roles of OFC-associated variants, and the model can be extended to study other complex diseases or traits.
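The summary names SAD scores without defining them. One common definition, used in the Basenji framework, sums over the model's output bins the difference between predictions for the alternative and reference alleles, giving one score per epigenomic feature. The sketch below assumes that definition; it is not DeepFace's code. The trained CNN is replaced by a hypothetical toy linear model (`toy_model`), and all helper names (`one_hot`, `sad_score`, `stage_trend`) are illustrative. It also shows one way to summarize a linear trend of a SNP's SAD scores across pcw stages.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """One-hot encode a DNA sequence as an (L, 4) float array."""
    idx = np.array([BASES.index(b) for b in seq])
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), idx] = 1.0
    return out

def toy_model(x: np.ndarray, weights: np.ndarray, bin_size: int = 4) -> np.ndarray:
    """Hypothetical stand-in for a trained CNN: per-base linear scores
    summed within fixed-size bins, returning (n_bins, n_features)."""
    per_base = x @ weights  # (L, n_features)
    n_bins = x.shape[0] // bin_size
    return per_base[: n_bins * bin_size].reshape(n_bins, bin_size, -1).sum(axis=1)

def sad_score(seq: str, pos: int, alt: str, predict) -> np.ndarray:
    """Assumed SAD definition: sum over output bins of (alt - ref) predictions,
    one value per predicted feature."""
    assert seq[pos] != alt, "alt allele must differ from the reference base"
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    ref_pred = predict(one_hot(seq))
    alt_pred = predict(one_hot(alt_seq))
    return (alt_pred - ref_pred).sum(axis=0)

def stage_trend(stages, sad_values):
    """Least-squares slope of SAD versus developmental stage, plus Pearson r."""
    stages = np.asarray(stages, dtype=float)
    sad_values = np.asarray(sad_values, dtype=float)
    slope, _intercept = np.polyfit(stages, sad_values, 1)
    r = np.corrcoef(stages, sad_values)[0, 1]
    return slope, r

# Usage with toy inputs (illustrative only, not values from the study).
weights = np.arange(8, dtype=float).reshape(4, 2)  # 4 bases x 2 features
seq = "ACGTACGTACGTACGT"
print(sad_score(seq, 5, "T", lambda x: toy_model(x, weights)))
```

Because the toy model is linear, the SAD for a single substitution reduces to the weight difference between the two alleles (here C to T gives `[4., 4.]`), which makes the definition easy to check. For a significance test of the stage trend, `scipy.stats.linregress` returns a p-value alongside the slope; the numpy-only `stage_trend` reports only the slope and Pearson r.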