Nanopore sequencing-based genome assembly and evolutionary genomics of circum-basmati rice

Jae Young Choi; Zoe N. Lye; Simon C. Groen; Xiaoguang Dai; Priyesh Rughani; Sophie Zaaijer; Eoghan D. Harrington; Sissel Juul; Michael D. Purugganan

doi:10.1186/s13059-020-1938-2

Genome Biology (Feb 2020)

Affiliations

Jae Young Choi: Center for Genomics and Systems Biology, Department of Biology, New York University
Zoe N. Lye: Center for Genomics and Systems Biology, Department of Biology, New York University
Simon C. Groen: Center for Genomics and Systems Biology, Department of Biology, New York University
Xiaoguang Dai: Oxford Nanopore Technologies
Priyesh Rughani: Oxford Nanopore Technologies
Sophie Zaaijer: New York Genome Center
Eoghan D. Harrington: Oxford Nanopore Technologies
Sissel Juul: Oxford Nanopore Technologies
Michael D. Purugganan: Center for Genomics and Systems Biology, Department of Biology, New York University

DOI: https://doi.org/10.1186/s13059-020-1938-2
Journal volume & issue: Vol. 21, no. 1, pp. 1–27

Abstract

Background

The circum-basmati group of cultivated Asian rice (Oryza sativa) contains many iconic varieties and is widespread in the Indian subcontinent. Despite the group's economic and cultural importance, it currently lacks a high-quality reference genome, and its evolutionary history is not fully resolved. To address these gaps, we use long-read nanopore sequencing to assemble the genomes of two circum-basmati rice varieties.

Results

We generate two high-quality, chromosome-level reference genomes that represent the 12 chromosomes of Oryza. The assemblies have a contig N50 of 6.32 Mb for Basmati 334 and 10.53 Mb for Dom Sufid. Using these highly contiguous assemblies, we characterize structural variation segregating across circum-basmati genomes. We discover repeat expansions not observed in japonica (the rice group most closely related to circum-basmati), as well as presence/absence variants covering over 20 Mb, one of which is a circum-basmati-specific deletion of a gene regulating awn length. We further detect strong evidence of admixture between the circum-basmati and circum-aus groups. This gene flow has its greatest effect on chromosome 10, causing both structural variation and single-nucleotide polymorphism to deviate from the genome-wide history. Lastly, population genomic analysis of 78 circum-basmati varieties reveals three major geographically structured genetic groups: Bhutan/Nepal, India/Bangladesh/Myanmar, and Iran/Pakistan.

Conclusion

The availability of high-quality reference genomes enables functional and evolutionary genomic analyses that provide genome-wide evidence for gene flow between circum-aus and circum-basmati, describe the nature of circum-basmati structural variation, and reveal presence/absence variation in this important and iconic rice variety group.

Published in Genome Biology

ISSN: 1474-760X (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Science: Biology (General): Genetics
Website: https://genomebiology.biomedcentral.com/
