Scientific Reports (May 2023)

Targeted adaptive long-read sequencing for discovery of complex phased variants in inherited retinal disease patients

  • Kenji Nakamichi,
  • Russell N. Van Gelder,
  • Jennifer R. Chao,
  • Debarshi Mustafi

DOI
https://doi.org/10.1038/s41598-023-35791-4
Journal volume & issue
Vol. 13, no. 1
pp. 1 – 9

Abstract

Inherited retinal degenerations (IRDs) are a heterogeneous group of predominantly monogenic disorders with over 300 causative genes identified. Short-read exome sequencing is commonly used to genotypically diagnose patients with clinical features of IRDs; however, in up to 30% of patients with autosomal recessive IRDs, one or no disease-causing variants are identified. Furthermore, chromosomal maps cannot be reconstructed for allelic variant discovery with short reads. Long-read genome sequencing can provide complete coverage of disease loci, and a targeted approach can focus sequencing bandwidth on a genomic region of interest to provide increased depth and haplotype reconstruction to uncover cases of missing heritability. We demonstrate that targeted adaptive long-read sequencing on the Oxford Nanopore Technologies (ONT) platform of the USH2A gene from three probands in a family with the most common cause of the syndromic IRD, Usher Syndrome, resulted in greater than 12-fold target gene sequencing enrichment on average. This focused depth of sequencing allowed for haplotype reconstruction and phased variant identification. We further show that variants obtained from the haplotype-aware genotyping pipeline can be heuristically ranked to focus on potential pathogenic candidates without a priori knowledge of the disease-causing variants. Moreover, when variants unique to targeted long-read sequencing and not covered by short-read technology were considered, long-read sequencing demonstrated higher precision and F1 scores for variant discovery. This work establishes that targeted adaptive long-read sequencing can generate targeted, chromosome-phased data sets for identification of coding and non-coding disease-causing alleles in IRDs and may be applicable to other Mendelian diseases.
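
For readers unfamiliar with the metrics named above, the following is a minimal sketch of how fold enrichment, precision, and F1 are conventionally computed. It is not the authors' pipeline. The abstract does not define enrichment, so treating it as the ratio of mean on-target depth to mean background depth is an assumption, and the function names and all numbers in the usage lines are hypothetical.

```python
def fold_enrichment(target_depth: float, background_depth: float) -> float:
    """Ratio of mean read depth over the target region to mean background depth.

    Assumption: enrichment is defined as this simple depth ratio; the
    abstract itself does not state the exact definition used.
    """
    if background_depth <= 0:
        raise ValueError("background_depth must be positive")
    return target_depth / background_depth


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard precision, recall, and F1 from variant-call counts.

    tp: called variants that are in the truth set
    fp: called variants that are not in the truth set
    fn: truth-set variants that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical illustration only; these numbers are not from the study.
enrichment = fold_enrichment(target_depth=36.0, background_depth=3.0)
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"enrichment={enrichment:.1f}x precision={precision:.2f} "
      f"recall={recall:.2f} F1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, it rises only when both improve, which is why it is often reported alongside precision when comparing variant callers.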