McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA, Pacific Biosciences, 1305 O’Brien Dr., Menlo Park, CA 94025, USA
Amanda Markee
McGuire Center for Lepidoptera and Biodiversity, Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA, School of Natural Resources and the Environment, University of Florida, Gainesville, FL 32611, USA
LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt 60325, Germany, Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt 60325, Germany
Ashlyn Powell
Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT 84602, USA
LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt 60325, Germany, Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt 60325, Germany, Institute for Insect Biotechnology, Justus-Liebig-University, Gießen 35390, Germany
Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT 84602, USA, Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC 20002, USA
Insect silk is a versatile biomaterial. Lepidoptera and Trichoptera display some of the most diverse uses of silk, producing silks that vary in strength, adhesiveness, and elasticity. Silk fibroin genes are long (>20 kbp) and contain many repetitive motifs that make them challenging to sequence. Most research thus far has focused on the conserved N- and C-terminal regions of fibroin genes because a full comparison of the repetitive regions across taxa has not been possible. Using the PacBio Sequel II system and SMRT sequencing, we generated high-fidelity (HiFi) long-read genomic and transcriptomic sequences for the Indianmeal moth (Plodia interpunctella) and genomic sequences for the caddisfly Eubasilissa regina. Both genomes were highly contiguous (P. interpunctella/E. regina: N50 = 9.7 Mbp/32.4 Mbp, L50 = 13/11) and complete (BUSCO complete = 99.3%/95.2%), and the silk heavy fibroin gene sequences were recovered completely and contiguously in both. We show that HiFi long-read sequencing is valuable for resolving genes with long, repetitive regions.
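The contiguity figures above use the standard assembly metrics: N50 is the length of the shortest contig or scaffold in the smallest set of the longest sequences that together cover at least half of the total assembly length, and L50 is the number of sequences in that set. The function below is a minimal sketch of that definition in Python. It is not the authors' pipeline, and the function name `n50_l50` and the input lengths are hypothetical toy values used only for illustration.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig/scaffold lengths.

    Hypothetical helper illustrating the standard definition:
    sort lengths from longest to shortest, accumulate them, and stop at
    the first sequence where the running total reaches half the assembly.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # count is 1-based so it equals L50 when the threshold is reached
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # compare 2 * running to total to avoid floating-point halves
        if running * 2 >= total:
            return length, count


# Toy example: total = 30, and the two longest sequences (10 + 8 = 18)
# are the first to cover at least 15, so N50 = 8 and L50 = 2.
print(n50_l50([10, 8, 6, 4, 2]))
```

A higher N50 and a lower L50 both indicate that the assembly is concentrated in fewer, longer sequences.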