DisVar: an R library for identifying variants associated with diseases using large-scale personal genetic information

Khunanon Chanasongkhram; Kasikrit Damkliang; Unitsa Sangket

doi:10.7717/peerj.16086

PeerJ (Sep 2023)

Affiliations

Khunanon Chanasongkhram: Division of Biological Science, Faculty of Science, Prince of Songkla University, Hat Yai, Songkhla, Thailand
Kasikrit Damkliang: Division of Computational Science, Faculty of Science, Prince of Songkla University, Hat Yai, Songkhla, Thailand
Unitsa Sangket: Division of Biological Science, Faculty of Science, Prince of Songkla University, Hat Yai, Songkhla, Thailand

Journal volume & issue: Vol. 11, p. e16086

Abstract

Background. Genetic variants may contribute to the development of diseases. Several genetic disease databases are used in medical research and diagnosis, but the web applications used to search these databases for disease-associated variants have limitations: they may be unable to search large sets of genetic variants, their results may be difficult to interpret, and variants mapped to the latest reference genome (GRCh38/hg38) may not be supported.

Methods. In this study, we developed a novel R library called “DisVar” to identify disease-associated genetic variants in large-scale individual genomic data. The library is compatible with variants mapped to the latest reference genome version and uses five databases of disease-associated variants. Over 100 million variants can be searched simultaneously for associated diseases.

Results. The package was evaluated using 24 Variant Call Format (VCF) files (215,054 to 11,346,899 sites each) from the 1000 Genomes Project. Across all the VCF files, 298,227 disease-associated hits were detected, with a total run time of 63.58 min. The package was also tested on ClinVar’s VCF file (2,120,558 variants), in which 20,657 disease-associated hits were identified with an estimated elapsed time of 45.98 s.

Conclusions. DisVar can overcome the limitations of existing tools and is a fast and effective diagnostic and preventive tool that identifies disease-associated variants in large-scale genetic data mapped to the latest reference genome.
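The kind of lookup the abstract describes, matching variants from a VCF file against a table of disease-associated variants, can be illustrated with a small sketch. This is not DisVar's implementation or API (DisVar is an R library, and its internals are not described here). It is a generic Python illustration that assumes variants are matched on chromosome, position, reference allele and alternate allele. The function names `variant_key`, `build_index` and `find_hits`, and the toy records, are hypothetical.

```python
from collections import defaultdict


def variant_key(chrom, pos, ref, alt):
    """Normalise a variant to a hashable key, dropping an optional 'chr' prefix."""
    chrom = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return (chrom, int(pos), ref.upper(), alt.upper())


def build_index(records):
    """Index (chrom, pos, ref, alt, disease) records by variant key."""
    index = defaultdict(set)
    for chrom, pos, ref, alt, disease in records:
        index[variant_key(chrom, pos, ref, alt)].add(disease)
    return index


def find_hits(vcf_lines, index):
    """Yield (key, sorted diseases) for every VCF variant found in the index."""
    for line in vcf_lines:
        # Skip blank lines and VCF meta/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        # A VCF record may list several comma-separated ALT alleles.
        for alt in alts.split(","):
            key = variant_key(chrom, pos, ref, alt)
            if key in index:
                yield key, sorted(index[key])


# Toy, invented data for illustration only (not real associations).
toy_db = [
    ("1", 1000, "A", "G", "example disease X"),
    ("chr2", 2000, "C", "T", "example disease Y"),
]
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t1000\t.\tA\tG,T",
    "2\t2000\t.\tC\tT",
    "3\t5\t.\tG\tA",
]

for key, diseases in find_hits(toy_vcf, build_index(toy_db)):
    print(key, diseases)
```

Because the index is a hash map, each variant lookup takes roughly constant time, which is one common way to keep searches over millions of variants practical. Whether DisVar uses a comparable strategy is not stated in the abstract.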

Published in PeerJ

ISSN: 2167-8359 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Medicine; Science: Biology (General)
Website: https://peerj.com/
