Genetic polyploid phasing from low-depth progeny samples
Sven Schrinner,
Rebecca Serra Mari,
Richard Finkers,
Paul Arens,
Björn Usadel,
Tobias Marschall,
Gunnar W. Klau
Affiliations
Sven Schrinner
Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf, Germany; Corresponding author
Rebecca Serra Mari
Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Richard Finkers
Plant Breeding, Wageningen University & Research, Wageningen, the Netherlands; Gennovation B.V., Agro Business Park 10, 6708 PW, Wageningen, the Netherlands
Paul Arens
Plant Breeding, Wageningen University & Research, Wageningen, the Netherlands
Björn Usadel
Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Düsseldorf, Germany; Forschungszentrum Jülich, Institute of Bio and Geosciences, Bioinformatics (IBG-4), Jülich, Germany; Bioeconomy Science Center, c/o Forschungszentrum, Jülich, Germany; Biological Data Science, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Tobias Marschall
Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Gunnar W. Klau
Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf, Germany; Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Düsseldorf, Germany
Summary: An important challenge in genome assembly is haplotype phasing, that is, reconstructing the distinct haplotype sequences of an individual genome. Phasing becomes considerably more difficult with increasing ploidy, which makes polyploid phasing a notoriously hard computational problem. We present a novel genetic phasing method for plant breeding that phases two deep-sequenced parental samples with the help of a large number of progeny samples sequenced at low depth. The key ideas underlying our approach are to (i) integrate the individually weak Mendelian progeny signals with a Bayesian log-likelihood model, (ii) cluster alleles according to their likelihood of co-occurrence, and (iii) assign the resulting clusters to haplotypes via an interval scheduling approach. We show on two deep-sequenced parental and 193 low-depth progeny potato samples that our approach computes high-quality sparse phasings and that it scales to whole genomes.
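To make step (iii) more concrete, the following is a minimal sketch of textbook interval scheduling on k tracks. It is a generic illustration and not the method described in this paper. It assumes, hypothetically, that each allele cluster can be summarized as a closed interval of variant positions and that each of the k haplotypes may hold at most one cluster at any position. The function name `assign_to_tracks` and the input format are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def assign_to_tracks(
    intervals: List[Tuple[int, int]], k: int
) -> List[Optional[int]]:
    """Greedy first-fit assignment of closed intervals to k tracks.

    Hypothetical setting: each interval stands for one allele cluster and
    each track for one haplotype. Intervals on the same track never overlap.
    Returns the track index for every interval, or None if all k tracks
    are occupied at the interval's start position.
    """
    # Process intervals in order of their start position.
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    track_end: List[Optional[int]] = [None] * k  # last occupied position per track
    assignment: List[Optional[int]] = [None] * len(intervals)

    for i in order:
        start, end = intervals[i]
        for t in range(k):
            # A track is free if it is empty or its last interval ended earlier.
            if track_end[t] is None or track_end[t] < start:
                track_end[t] = end
                assignment[i] = t
                break
    return assignment


# Four hypothetical clusters on a tetraploid-like toy example with k = 2 tracks.
clusters = [(0, 5), (2, 8), (6, 10), (9, 12)]
two_tracks = assign_to_tracks(clusters, k=2)
one_track = assign_to_tracks(clusters, k=1)
```

Processing intervals by start position and placing each on the first free track uses as few tracks as the maximum number of simultaneously overlapping intervals, so an interval is left unassigned only when more than k intervals overlap at one position. A real phasing pipeline would additionally need to weigh competing clusters by their likelihood, which this sketch does not attempt.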