Scientific Reports (Apr 2024)

Targeted phasing of 2–200 kilobase DNA fragments with a short-read sequencer and a single-tube linked-read library method

  • Veronika Mikhaylova,
  • Madison Rzepka,
  • Tetsuya Kawamura,
  • Yu Xia,
  • Peter L. Chang,
  • Shiguo Zhou,
  • Amber Paasch,
  • Long Pham,
  • Naisarg Modi,
  • Likun Yao,
  • Adrian Perez-Agustin,
  • Sara Pagans,
  • T. Christian Boles,
  • Ming Lei,
  • Yong Wang,
  • Ivan Garcia-Bassets,
  • Zhoutao Chen

DOI
https://doi.org/10.1038/s41598-024-58733-0
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 17

Abstract

In the human genome, heterozygous sites are genomic positions at which the maternal and paternal chromosomes carry different alleles (nucleotide variants). Resolving these allelic differences by chromosomal copy, known as phasing, is achievable on a short-read sequencer when the library preparation method captures long-range genomic information. TELL-Seq is a library preparation method that captures this information with the aid of molecular identifiers (barcodes). Reads derived from the same long DNA fragment, up to 200 kilobases (kb) in length, are tagged with the same barcode, generating linked-reads. This strategy can be used to phase an entire genome. Here, we introduce a TELL-Seq protocol developed for targeted applications, enabling the phasing of enriched loci of varying sizes, purity levels, and heterozygosity. To validate this protocol, we phased 2–200 kb loci enriched with different methods: CRISPR/Cas9-mediated excision coupled with pulsed-field electrophoresis for the longest fragments, CRISPR/Cas9-mediated protection from exonuclease digestion for mid-size fragments, and long PCR for the shortest fragments. All selected loci have known clinical relevance: BRCA1, BRCA2, MLH1, MSH2, MSH6, APC, PMS2, SCN5A-SCN10A, and PIK3CA. Collectively, the analyses show that TELL-Seq can accurately phase 2–200 kb targets using a short-read sequencer.
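
The linked-read idea described in the abstract can be illustrated with a toy example. The sketch below is not the authors' pipeline; it only shows, under simplifying assumptions, how reads sharing a barcode can connect alleles at neighbouring heterozygous sites. The function names, the (barcode, position, allele) tuple layout, and the 0/1 allele coding are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def group_by_barcode(reads):
    """Collect the alleles seen under each barcode.

    reads: iterable of (barcode, position, allele) tuples, with allele
    coded as 0 or 1 (hypothetical layout, for illustration only).
    Each barcode is assumed to represent one long DNA fragment.
    """
    molecules = defaultdict(dict)
    for barcode, pos, allele in reads:
        molecules[barcode][pos] = allele
    return molecules


def phase_adjacent(reads, sites):
    """Greedy phasing of neighbouring heterozygous sites.

    sites: heterozygous positions in genomic order.
    Returns a list of phase blocks; each block maps position -> allele
    carried by one haplotype (the other haplotype is the complement).
    """
    if not sites:
        return []
    molecules = group_by_barcode(reads)
    blocks = []
    current = {sites[0]: 0}  # arbitrarily anchor the first site at allele 0
    for prev, cur in zip(sites, sites[1:]):
        same = diff = 0
        for alleles in molecules.values():
            if prev in alleles and cur in alleles:
                if alleles[prev] == alleles[cur]:
                    same += 1
                else:
                    diff += 1
        if same == diff:
            # No linking barcode, or a tie: start a new phase block.
            blocks.append(current)
            current = {cur: 0}
        elif same > diff:
            current[cur] = current[prev]
        else:
            current[cur] = 1 - current[prev]
    blocks.append(current)
    return blocks


reads = [
    ("BC1", 100, 0), ("BC1", 5000, 1),
    ("BC2", 100, 1), ("BC2", 5000, 0),
    ("BC3", 5000, 0), ("BC3", 90000, 0),
    ("BC4", 5000, 1), ("BC4", 90000, 1),
]
blocks = phase_adjacent(reads, [100, 5000, 90000])
print(blocks)
```

In this toy input, no single barcode spans all three sites, yet the overlapping barcodes chain them into one phase block. A real phasing tool would additionally handle sequencing errors, barcode collisions, and base qualities, none of which are modelled here.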