Large language models assisted multi-effect variants mining on cerebral cavernous malformation familial whole genome sequencing

Yiqi Wang; Jinmei Zuo; Chao Duan; Hao Peng; Jia Huang; Liang Zhao; Li Zhang; Zhiqiang Dong

Computational and Structural Biotechnology Journal (Dec 2024)

Large language models assisted multi-effect variants mining on cerebral cavernous malformation familial whole genome sequencing

Yiqi Wang,
Jinmei Zuo,
Chao Duan,
Hao Peng,
Jia Huang,
Liang Zhao,
Li Zhang,
Zhiqiang Dong

Affiliations

Yiqi Wang: College of Biomedicine and Health, College of Life Science and Technology, Huazhong Agricultural University, No.1, Shizishan Street, Wuhan 430070, Hubei, China; Center for Neurological Disease Research, Taihe Hospital, Hubei University of Medicine, No.32, Renmin South Road, Shiyan 442000, Hubei, China; Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, No. 32, Renmin South Road, Shiyan 442000, Hubei, China
Jinmei Zuo: Physical Examination Center, Taihe Hospital, Hubei University of Medicine, No. 32, Renmin South Road, Shiyan 442000, Hubei, China
Chao Duan: College of Biomedicine and Health, College of Life Science and Technology, Huazhong Agricultural University, No.1, Shizishan Street, Wuhan 430070, Hubei, China; Center for Neurological Disease Research, Taihe Hospital, Hubei University of Medicine, No.32, Renmin South Road, Shiyan 442000, Hubei, China
Hao Peng: Center for Neurological Disease Research, Taihe Hospital, Hubei University of Medicine, No.32, Renmin South Road, Shiyan 442000, Hubei, China; Department of Neurosurgery, Taihe Hospital, Hubei University of Medicine, No.32, Renmin South Road, Shiyan 442000, Hubei, China
Jia Huang: The Second Clinical Medical College, Lanzhou University, No. 222, South Tianshui Road, Lanzhou 730030, Gansu, China
Liang Zhao: Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, No. 32, Renmin South Road, Shiyan 442000, Hubei, China; Corresponding author.
Li Zhang: Center for Neurological Disease Research, Taihe Hospital, Hubei University of Medicine, No.32, Renmin South Road, Shiyan 442000, Hubei, China; Department of Neurosurgery, Taihe Hospital, Hubei University of Medicine, No.32, Renmin South Road, Shiyan 442000, Hubei, China; Corresponding author at: Center for Neurological Disease Research, Taihe Hospital, Hubei University of Medicine, No.32, Renmin South Road, Shiyan 442000, Hubei, China.
Zhiqiang Dong: College of Biomedicine and Health, College of Life Science and Technology, Huazhong Agricultural University, No.1, Shizishan Street, Wuhan 430070, Hubei, China; Center for Neurological Disease Research, Taihe Hospital, Hubei University of Medicine, No.32, Renmin South Road, Shiyan 442000, Hubei, China; Corresponding author at: College of Biomedicine and Health, College of Life Science and Technology, Huazhong Agricultural University, No.1, Shizishan Street, Wuhan 430070, Hubei, China.

Journal volume & issue: Vol. 23
pp. 843 – 858

Abstract

Read online

Cerebral cavernous malformation (CCM) is a polygenic disease with intricate genetic interactions contributing to quantitative pathogenesis across multiple factors. The principal pathogenic genes of CCM, specifically KRIT1, CCM2, and PDCD10, have been reported, accompanied by a growing wealth of genetic data related to mutations. Furthermore, numerous other molecules associated with CCM have been unearthed. However, tackling such massive volumes of unstructured data remains challenging until the advent of advanced large language models. In this study, we developed an automated analytical pipeline specialized in single nucleotide variants (SNVs) related biomedical text analysis called BRLM. To facilitate this, BioBERT was employed to vectorize the rich information of SNVs, while a deep residue network was used to discriminate the classes of the SNVs. BRLM was initially constructed on mutations from 12 different types of TCGA cancers, achieving an accuracy exceeding 99%. It was further examined for CCM mutations in familial sequencing data analysis, highlighting an upstream master regulator gene fibroblast growth factor 1 (FGF1). With multi-omics characterization and validation in biological function, FGF1 demonstrated to play a significant role in the development of CCMs, which proved the effectiveness of our model. The BRLM web server is available at http://1.117.230.196.

Published in Computational and Structural Biotechnology Journal

ISSN: 2001-0370 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Technology: Chemical technology: Biotechnology
Website: https://www.journals.elsevier.com/computational-and-structural-biotechnology-journal

About the journal

Abstract

Keywords