Journal of Bioinformatics and Genomics (Aug 2023)
ALIGNMENT OF PSEUDOREADS OBTAINED FROM HOMOLOGOUS SEQUENCES IN IDENTIFYING POTENTIALLY TOLERATED GENOMIC VARIANTS
Abstract
The use of Next-Generation Sequencing (NGS) has proven to be clinically beneficial, but it has also revealed a large number of variants whose pathogenicity cannot yet be accurately defined or categorized. These variants, known as variants of uncertain significance (VUS), are detected en masse in each NGS run. Unlike amino acid substitutions and splice site mutations, common variants in non-coding regions have not been extensively studied and are still mostly classified as VUS. In this paper, we propose a new concept for identifying potentially tolerated variants, including variants in non-coding regions, based on the Genetic Alignment of “Pseudoreads” from Homologs (GAPH) method. We identified a total of 5,859,205 variants, the majority of which have never been documented in the largest population database, gnomAD, and only 0.0015% (88 variants) of which were classified as pathogenic according to the ClinVar database. Overall, the results of this study demonstrate the efficacy of our new method for refining variant tolerability; many aspects of the method could be further adjusted to optimize the results.
Keywords