Detection of single nucleotide polymorphisms in virus genomes assembled from high-throughput sequencing data: large-scale performance testing of sequence analysis strategies
Johan Rollin,
Rachelle Bester,
Yves Brostaux,
Kadriye Caglayan,
Kris De Jonghe,
Ales Eichmeier,
Yoika Foucart,
Annelies Haegeman,
Igor Koloniuk,
Petr Kominek,
Hans Maree,
Serkan Onder,
Susana Posada Céspedes,
Vahid Roumi,
Dana Šafářová,
Olivier Schumpp,
Cigdem Ulubas Serce,
Merike Sõmera,
Lucie Tamisier,
Eeva Vainio,
Rene AA van der Vlugt,
Sebastien Massart
Affiliations
Johan Rollin
Laboratory of Plant Pathology—TERRA—Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium
Rachelle Bester
Citrus Research International, Matieland, South Africa
Yves Brostaux
Laboratory of Statistics, Computer Science and Modelling Applied to Bioengineering, TERRA Teaching and Research Centre, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium
Kadriye Caglayan
Plant Protection Department, Agricultural Faculty, Hatay Mustafa Kemal University, Hatay, Turkey
Kris De Jonghe
Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Plant Sciences Unit, Merelbeke, Belgium
Ales Eichmeier
Mendeleum—Institute of Genetics, Faculty of Horticulture, Mendel University in Brno, Lednice, Czech Republic
Yoika Foucart
Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Plant Sciences Unit, Merelbeke, Belgium
Annelies Haegeman
Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Plant Sciences Unit, Merelbeke, Belgium
Igor Koloniuk
Biology Centre CAS, Ceske Budejovice, Czech Republic
Petr Kominek
Crop Research Institute, Praha, Czech Republic
Hans Maree
Citrus Research International, Matieland, South Africa
Serkan Onder
Department of Plant Protection, Faculty of Agriculture, Eskişehir Osmangazi University, Eskişehir, Turkey
Susana Posada Céspedes
Department of Biosystems Science and Engineering, ETH Zurich, Basel, 4058, Switzerland
Vahid Roumi
Plant Protection Department, Faculty of Agriculture, University of Maragheh, Maragheh, Iran
Dana Šafářová
Department of Cell Biology and Genetics, Faculty of Science, Palacký University Olomouc, Olomouc, Czech Republic
Cigdem Ulubas Serce
Plant Production and Technologies Department, Ayhan Şahenk Faculty of Agricultural Science and Technologies, Niğde Ömer Halisdemir University, Niğde, Turkey
Merike Sõmera
Department of Chemistry and Biotechnology, Tallinn University of Technology, Tallinn, Estonia
Lucie Tamisier
Pathologie Végétale, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Montfavet, France
Eeva Vainio
Natural Resources Institute Finland, Helsinki, Finland
Rene AA van der Vlugt
Wageningen University & Research, Wageningen, The Netherlands
Sebastien Massart
Laboratory of Plant Pathology—TERRA—Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium
Recent developments in high-throughput sequencing (HTS) technologies and bioinformatics have drastically changed research in virology, especially virus discovery. Proper monitoring of a viral population requires information on the different isolates circulating in the studied area, and HTS has greatly facilitated sequencing the genomes of detected viruses and comparing them. However, the bioinformatics analyses used to reconstruct genome sequences and detect single nucleotide polymorphisms (SNPs) can introduce bias, and this issue has not been widely addressed so far. More knowledge is therefore required on the limitations of predicting SNPs from HTS-generated sequence data. To address this issue, we compared the ability of 14 plant virology laboratories, each employing a different bioinformatics pipeline, to detect 21 variants of pepino mosaic virus (PepMV) in three samples through large-scale performance testing (PT) using three artificially designed datasets. To evaluate the impact of the bioinformatics analyses, we divided them into three key steps: read pre-processing, virus-isolate identification, and variant calling. Each step was evaluated independently through an original PT design that included discussion and validation among participants at each step. Overall, this work highlights key parameters influencing SNP detection and proposes recommendations for reliable variant calling in plant viruses. The identification of the closest reference, the mapping parameters, and manual validation of the detections were recognized as the analysis steps with the greatest impact on successful SNP detection. Strategies to improve SNP prediction are also discussed.
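To make the last two analysis steps more concrete, the sketch below chains a naive closest-reference choice with a threshold-based SNP call. It is a toy illustration under stated assumptions, not the method of the study or of any participating laboratory. Reads are assumed to be already aligned without indels and given as (0-based start, sequence) tuples. The closest reference is chosen by the fraction of shared k-mers, a simplification swapped in for real isolate identification. The names `closest_reference` and `call_snps` and the thresholds `k=5`, `min_depth=10` and `min_alt_freq=0.2` are hypothetical. Production variant callers also weigh base quality, mapping quality and strand bias.

```python
from collections import Counter


def kmers(seq, k):
    """Set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def closest_reference(references, reads, k=5):
    """Hypothetical helper: pick the reference sharing the most read k-mers.

    `references` maps a name to a genome sequence. This is a simplified
    stand-in for virus-isolate identification, not a real tool.
    """
    read_kmers = set()
    for _, seq in reads:
        read_kmers |= kmers(seq, k)

    def shared_fraction(name):
        return len(read_kmers & kmers(references[name], k)) / max(len(read_kmers), 1)

    return max(references, key=shared_fraction)


def pileup(reference, reads):
    """Count the bases observed at each reference position.

    Assumes reads are pre-aligned without indels (hypothetical input format).
    """
    counts = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                counts[pos][base] += 1
    return counts


def call_snps(reference, reads, min_depth=10, min_alt_freq=0.2):
    """Return (position, ref_base, alt_base, alt_freq) for candidate SNPs.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    calls = []
    for pos, base_counts in enumerate(pileup(reference, reads)):
        depth = sum(base_counts.values())
        if depth < min_depth:
            continue  # too little coverage to call a variant reliably
        ref_base = reference[pos]
        for alt_base, n in base_counts.most_common():
            if alt_base == ref_base:
                continue
            freq = n / depth
            if freq >= min_alt_freq:
                calls.append((pos, ref_base, alt_base, round(freq, 3)))
    return calls


# Usage: 8 of 12 reads carry A>G at position 4 of isolate_A.
references = {"isolate_A": "ACGTACGTAC", "isolate_B": "TTTTGGGGCC"}
alt_read = references["isolate_A"][:4] + "G" + references["isolate_A"][5:]
reads = [(0, alt_read)] * 8 + [(0, references["isolate_A"])] * 4

best = closest_reference(references, reads)
print(best, call_snps(references[best], reads))
```

Raising `min_depth` or `min_alt_freq` suppresses low-coverage or low-frequency calls at the cost of missing minority variants. Choosing a distant reference, by contrast, inflates the number of apparent SNPs. This toy model mirrors, in a very reduced form, why reference choice, mapping and calling parameters matter.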