CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Matosinhos, Portugal, Faculty of Sciences, University of Porto, Porto, Portugal
CIBIO/InBIO - Research Center in Biodiversity and Genetic Resources, University of Porto, Vairão, Portugal, IUCN SSC Mollusc Specialist Group, c/o IUCN, David Attenborough Building, Pembroke St., Cambridge, England
CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Matosinhos, Portugal
Thomas Forest
Éco-anthropologie, Muséum National d’Histoire Naturelle, CNRS UMR 7206, Paris, France, SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS UMR 7241, INSERM U 1050, Paris, France, Institut de Systématique Evolution Biodiversité, CNRS MNHN SU EPHE, CP 51, 55 rue Buffon, 75005, Paris, France
Guillaume Achaz
Éco-anthropologie, Muséum National d’Histoire Naturelle, CNRS UMR 7206, Paris, France, SMILE Group, Center for Interdisciplinary Research in Biology (CIRB), Collège de France, CNRS UMR 7241, INSERM U 1050, Paris, France
Amílcar Teixeira
Centro de Investigação de Montanha (CIMO), Instituto Politécnico de Bragança, Bragança, Portugal
Vincent Prié
CIBIO/InBIO - Research Center in Biodiversity and Genetic Resources, University of Porto, Vairão, Portugal, IUCN SSC Mollusc Specialist Group, c/o IUCN, David Attenborough Building, Pembroke St., Cambridge, England
CIIMAR/CIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Matosinhos, Portugal, Faculty of Sciences, University of Porto, Porto, Portugal
Contiguous assemblies are fundamental to deciphering the composition of extant genomes. In molluscs, obtaining them is particularly challenging owing to large genome sizes, heterozygosity, and widespread repetitive content. Consequently, long-read sequencing technologies are essential for achieving high contiguity and quality. The first genome assembly of Margaritifera margaritifera (Linnaeus, 1758) (Mollusca: Bivalvia: Unionida), a culturally relevant, widespread, and highly threatened freshwater mussel species, was recently generated. However, that assembly is highly fragmented because it relied on short-read approaches. Here, an improved reference genome assembly was generated using a combination of PacBio CLR long reads and Illumina paired-end short reads. This genome assembly is 2.4 Gbp long and organized into 1,700 scaffolds, with a contig N50 length of 3.4 Mbp. Ab initio gene prediction yielded 48,314 protein-coding genes. Our new assembly is a substantial improvement and an essential resource for studying this species’ unique biological and evolutionary features, helping to promote its conservation.