Genome Biology (Jul 2022)
A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis
- Runxuan Zhang,
- Richard Kuo,
- Max Coulter,
- Cristiane P. G. Calixto,
- Juan Carlos Entizne,
- Wenbin Guo,
- Yamile Marquez,
- Linda Milne,
- Stefan Riegler,
- Akihiro Matsui,
- Maho Tanaka,
- Sarah Harvey,
- Yubang Gao,
- Theresa Wießner-Kroh,
- Alejandro Paniagua,
- Martin Crespi,
- Katherine Denby,
- Asa ben Hur,
- Enamul Huq,
- Michael Jantsch,
- Artur Jarmolowski,
- Tino Koester,
- Sascha Laubinger,
- Qingshun Quinn Li,
- Lianfeng Gu,
- Motoaki Seki,
- Dorothee Staiger,
- Ramanjulu Sunkar,
- Zofia Szweykowska-Kulinska,
- Shih-Long Tu,
- Andreas Wachter,
- Robbie Waugh,
- Liming Xiong,
- Xiao-Ning Zhang,
- Ana Conesa,
- Anireddy S. N. Reddy,
- Andrea Barta,
- Maria Kalyna,
- John W. S. Brown
Affiliations
- Runxuan Zhang
- Information and Computational Sciences, James Hutton Institute
- Richard Kuo
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh
- Max Coulter
- Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute
- Cristiane P. G. Calixto
- Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute
- Juan Carlos Entizne
- Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute
- Wenbin Guo
- Information and Computational Sciences, James Hutton Institute
- Yamile Marquez
- Centre for Genomic Regulation
- Linda Milne
- Information and Computational Sciences, James Hutton Institute
- Stefan Riegler
- Institute of Molecular Plant Biology, Department of Applied Genetics and Cell Biology, University of Natural Resources and Life Sciences (BOKU)
- Akihiro Matsui
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science
- Maho Tanaka
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science
- Sarah Harvey
- Centre for Novel Agricultural Products (CNAP), Department of Biology, University of York Wentworth Way
- Yubang Gao
- College of Forestry, Fujian Agriculture and Forestry University
- Theresa Wießner-Kroh
- Center for Plant Molecular Biology (ZMBP), University of Tübingen
- Alejandro Paniagua
- Institute for Integrative Systems Biology (CSIC-UV), Spanish National Research Council, Paterna
- Martin Crespi
- French National Centre for Scientific Research | CNRS INRAE-Universities of Paris Saclay and Paris, Institute of Plant Sciences Paris Saclay IPS2
- Katherine Denby
- Centre for Novel Agricultural Products (CNAP), Department of Biology, University of York Wentworth Way
- Asa ben Hur
- Department of Computer Science, Colorado State University
- Enamul Huq
- Department of Molecular Biosciences, University of Texas at Austin
- Michael Jantsch
- Department of Cell and Developmental Biology, Center for Anatomy and Cell Biology, Medical University of Vienna
- Artur Jarmolowski
- Department of Gene Expression, Adam Mickiewicz University
- Tino Koester
- RNA Biology and Molecular Physiology, Faculty for Biology, Bielefeld University
- Sascha Laubinger
- Institut für Biologie und Umweltwissenschaften (IBU), Carl von Ossietzky Universität Oldenburg
- Qingshun Quinn Li
- Graduate College of Biomedical Sciences, Western University of Health Sciences
- Lianfeng Gu
- College of Forestry, Fujian Agriculture and Forestry University
- Motoaki Seki
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science
- Dorothee Staiger
- RNA Biology and Molecular Physiology, Faculty for Biology, Bielefeld University
- Ramanjulu Sunkar
- Department of Biochemistry and Molecular Biology, Oklahoma State University
- Zofia Szweykowska-Kulinska
- Department of Gene Expression, Adam Mickiewicz University
- Shih-Long Tu
- Institute of Plant and Microbial Biology, Academia Sinica
- Andreas Wachter
- Center for Plant Molecular Biology (ZMBP), University of Tübingen
- Robbie Waugh
- Cell and Molecular Sciences, James Hutton Institute
- Liming Xiong
- Department of Biology, Hong Kong Baptist University
- Xiao-Ning Zhang
- Biology Department, School of Arts and Sciences, St. Bonaventure University
- Ana Conesa
- Institute for Integrative Systems Biology (CSIC-UV), Spanish National Research Council, Paterna
- Anireddy S. N. Reddy
- Department of Biology and Program in Cell and Molecular Biology, Colorado State University
- Andrea Barta
- Max F. Perutz Laboratories, Medical University of Vienna, Center of Medical Biochemistry
- Maria Kalyna
- Institute of Molecular Plant Biology, Department of Applied Genetics and Cell Biology, University of Natural Resources and Life Sciences (BOKU)
- John W. S. Brown
- Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute
- DOI
- https://doi.org/10.1186/s13059-022-02711-0
- Journal volume & issue
- Vol. 23, no. 1, pp. 1 – 37
Abstract
Background: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies provide improved integrity of transcript structures including alternative splicing, and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation, or incomplete cDNA synthesis.
Results: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts, twice as many as the best current Arabidopsis transcriptome, and includes over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provides higher resolution of transcript expression profiling and identifies cold-induced differential transcription start and polyadenylation site usage.
Conclusions: AtRTD3 is currently the most comprehensive Arabidopsis transcriptome. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analysis from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis from any species.
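The idea of using mismatch profiles around splice junctions to separate correct from false junctions can be illustrated with a minimal sketch. This is not the authors' implementation: the input representation (a junction identifier paired with mismatch offsets relative to the junction), the function names, the 10-nt window, and the minimum read count are all assumptions chosen for illustration. A junction is kept only if at least one read spans it with no mismatches close to the junction.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical input record: (junction_id, mismatch offsets relative to the junction).
# Offset -1 is the last exonic base before the junction, +1 the first base after it.
ReadJunction = Tuple[str, List[int]]


def has_clean_flank(mismatch_offsets: List[int], window: int) -> bool:
    """True if the read has no mismatch within `window` bases of the junction."""
    return all(abs(offset) > window for offset in mismatch_offsets)


def supported_junctions(
    read_junctions: Iterable[ReadJunction],
    window: int = 10,          # assumed flank size, not taken from the abstract
    min_clean_reads: int = 1,  # assumed support threshold
) -> Set[str]:
    """Return junctions backed by enough reads with mismatch-free flanks."""
    clean_counts: Dict[str, int] = {}
    for junction_id, offsets in read_junctions:
        if has_clean_flank(offsets, window):
            clean_counts[junction_id] = clean_counts.get(junction_id, 0) + 1
    return {j for j, n in clean_counts.items() if n >= min_clean_reads}


if __name__ == "__main__":
    reads = [
        ("chr1:100-200", [-3]),      # mismatch 3 nt upstream: not clean
        ("chr1:100-200", [25]),      # mismatch far from junction: clean
        ("chr1:300-400", [2, -40]),  # mismatch 2 nt downstream: not clean
    ]
    print(sorted(supported_junctions(reads)))
```

In practice the offsets would be derived from read alignments (for example from CIGAR and MD information), and the window and threshold would need to be tuned against the error profile of the sequencing data.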
Keywords
- Arabidopsis
- Iso-seq
- Reference transcript dataset
- Splice junction
- Transcription start and end sites
- Alternative splicing