Naïve Bayes Classifiers and accompanying dataset for Pseudomonas syringae isolate characterization

Chad Fautt; Estelle Couradeau; Kevin L. Hockett

doi:10.1038/s41597-024-03003-x

Scientific Data (Feb 2024)

Affiliations

Chad Fautt: Department of Plant Pathology and Environmental Microbiology, Pennsylvania State University
Estelle Couradeau: Department of Ecosystem Science and Management, Pennsylvania State University
Kevin L. Hockett: Department of Plant Pathology and Environmental Microbiology, Pennsylvania State University

Journal volume & issue: Vol. 11, no. 1, pp. 1–8

Abstract

The Pseudomonas syringae species complex (PSSC) is a diverse group of plant pathogens whose collective host range encompasses almost every food crop grown today. Because the PSSC threatens global food security, rapid detection and characterization of epidemic and emerging pathogenic lineages is essential. However, phylogenetic identification is often complicated by an unresolved and ever-changing taxonomy, which makes it difficult both to use available databases in practice and to train classifiers properly. As a result, although amplicon sequencing is a common method for routine identification of PSSC isolates, there is no efficient method for accurately classifying isolates from these data. Here we present a suite of five Naïve Bayes classifiers for PCR primer sets widely used for PSSC identification, trained on in silico amplicon data from 2,161 published PSSC genomes, using the life identification number (LIN) hierarchical clustering algorithm in place of traditional Linnaean taxonomy. Additionally, we include a dataset for translating classification results back into traditional taxonomic nomenclature (i.e., species, phylogroup, pathovar) and for predicting virulence factor repertoires.
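The abstract does not describe the classifiers' internals. The following is therefore only a minimal sketch, assuming an RDP-style multinomial Naïve Bayes over overlapping k-mers. This is a common approach for amplicon classifiers but is not necessarily the authors' implementation. The sketch shows how amplicons could be classified against LIN-style labels, and how a predicted LIN could be mapped back to a traditional name by longest-prefix lookup. Everything in it is hypothetical and for illustration only: the class `KmerNaiveBayes`, the function `translate_lin`, the `_`-separated LIN strings, the toy sequences and the name table.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k):
    """Return all overlapping k-mers of an amplicon sequence (uppercased)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Hypothetical multinomial Naive Bayes over k-mer counts; labels are LIN strings."""

    def __init__(self, k=8, alpha=1.0):
        self.k = k          # k-mer length (the RDP classifier uses 8-mers)
        self.alpha = alpha  # additive smoothing pseudocount for unseen k-mers

    def fit(self, seqs, labels):
        self.kmer_counts = defaultdict(Counter)  # label -> k-mer counts
        self.kmer_totals = Counter()              # label -> total k-mers seen
        self.class_sizes = Counter(labels)        # label -> number of training amplicons
        vocab = set()
        for seq, label in zip(seqs, labels):
            words = kmers(seq, self.k)
            self.kmer_counts[label].update(words)
            self.kmer_totals[label] += len(words)
            vocab.update(words)
        self.vocab_size = len(vocab)
        self.n_train = len(labels)
        return self

    def log_scores(self, seq):
        """Unnormalized log posterior for each label (log prior + k-mer log likelihoods)."""
        query = Counter(kmers(seq, self.k))
        scores = {}
        for label, size in self.class_sizes.items():
            score = math.log(size / self.n_train)  # log prior
            denom = self.kmer_totals[label] + self.alpha * self.vocab_size
            for word, count in query.items():
                score += count * math.log((self.kmer_counts[label][word] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)


def translate_lin(lin, table):
    """Map a LIN to a name using the longest prefix present in `table` (hypothetical)."""
    positions = lin.split("_")
    for end in range(len(positions), 0, -1):
        prefix = "_".join(positions[:end])
        if prefix in table:
            return table[prefix]
    return None  # no matching prefix in the table


# Toy, made-up training data: three "genomes" with hypothetical LIN labels.
train_seqs = ["ACGTACGTACGTACGT", "TTTTGGGGTTTTGGGG", "CCCCAAAACCCCAAAA"]
train_lins = ["0_0_0", "0_0_1", "0_1_0"]
# Hypothetical LIN-prefix -> nomenclature table (not real PSSC assignments).
name_table = {"0_0": "Group 1", "0_1": "Group 2"}

clf = KmerNaiveBayes(k=4).fit(train_seqs, train_lins)
predicted = clf.predict("ACGTACGTAC")
print(predicted, translate_lin(predicted, name_table))
```

LIN positions are hierarchical, so matching the longest shared prefix lets a prediction fall back to a coarser group when no finer-grained entry exists. A real translation table would be built from the accompanying dataset, whose actual format may differ from this sketch.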

Published in Scientific Data

ISSN: 2052-4463 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science
Website: https://www.nature.com/sdata/

About the journal