Frontiers in Plant Science (Dec 2022)

Machine learning and analysis of genomic diversity of “Candidatus Liberibacter asiaticus” strains from 20 citrus production states in Mexico

  • Jiaquan Huang,
  • Iobana Alanís-Martínez,
  • Lucita Kumagai,
  • Zehan Dai,
  • Zheng Zheng,
  • Adalberto A. Perez de Leon,
  • Jianchi Chen,
  • Xiaoling Deng

DOI
https://doi.org/10.3389/fpls.2022.1052680
Journal volume & issue
Vol. 13

Abstract

Background

Huanglongbing (HLB, yellow shoot disease) is a highly destructive citrus disease associated with a nonculturable bacterium, “Candidatus Liberibacter asiaticus” (CLas), which is transmitted by the Asian citrus psyllid (ACP, Diaphorina citri). In Mexico, HLB was first reported in Tizimin, Yucatán, in 2009 and is now endemic in 351 municipalities of 25 states. Understanding the population diversity of CLas is critical for HLB management. Current CLas diversity research is based exclusively on analysis of the bacterial genome, which is composed of two regions: the chromosome (> 1,000 genes) and the prophage (about 40 genes).

Methods and results

In this study, 40 CLas-infected ACP samples were collected from 20 states in Mexico. CLas was detected and confirmed by PCR assays. A prophage gene (terL)-based typing system (TTS) divided the Mexican CLas strains into two groups: Term-G, comprising four strains from Yucatán and Chiapas as well as strain psy62 from Florida, USA; and Term-A, comprising all other 36 Mexican strains as well as strain AHCA1 from California, USA. CLas diversity was further evaluated across all chromosomal and prophage genes, using machine learning (ML) tools to handle the multidimensional data. A Term-G strain (YTMX) and a Term-A strain (BCSMX) were sequenced and analyzed. The two Mexican genome sequences were studied together with the CLas genome sequences available in GenBank. Unsupervised ML was implemented through principal component analysis (PCA) on average nucleotide identities (ANIs) of CLas whole-genome sequences, and supervised ML was implemented through sparse partial least squares discriminant analysis (sPLS-DA) on single nucleotide polymorphisms (SNPs) of CLas coding genes, guided by the TTS.
Two CLas Geno-groups were established: Geno-group 1, which extended Term-A, and Geno-group 2, which extended Term-G.

Conclusions

This study concluded that: 1) there were at least two different introductions of CLas into Mexico; 2) CLas strains from Mexico and the USA are closely related; and 3) the two Geno-groups provide a basis for future research on CLas subspecies.
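
The unsupervised step described in the abstract, PCA on ANI values, can be illustrated with a short sketch. The abstract does not say how the ANI data were arranged, scaled, or computed, so this is a minimal sketch under stated assumptions and not the authors' pipeline: it assumes a square pairwise ANI matrix (computed beforehand by a separate tool, not shown) in which row i serves as the feature vector of genome i, averages the two directions of each pairwise value, centers each column, and obtains principal component scores with NumPy's SVD. The function name `pca_on_ani`, the genome labels, and the toy matrix are hypothetical; the toy values are invented for illustration and are not CLas data.

```python
import numpy as np


def pca_on_ani(ani, n_components=2):
    """Return PCA scores and explained-variance ratios for a pairwise ANI matrix.

    Assumption (hypothetical setup): `ani` is a square genomes-by-genomes matrix
    of ANI percentages, and row i is used as the feature vector of genome i.
    """
    ani = np.asarray(ani, dtype=float)
    if ani.ndim != 2 or ani.shape[0] != ani.shape[1]:
        raise ValueError("ANI matrix must be square")
    # ANI(A, B) and ANI(B, A) can differ slightly when a tool reports
    # directional values, so average the two directions.
    sym = (ani + ani.T) / 2.0
    # Center each column so the components describe variation around the mean.
    centered = sym - sym.mean(axis=0)
    # Thin SVD of the centered data; U * S gives the principal component scores.
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    k = min(n_components, s.size)
    scores = u[:, :k] * s[:k]
    variance = s ** 2
    total = variance.sum()
    ratio = variance[:k] / total if total > 0 else np.zeros(k)
    return scores, ratio


# Toy matrix (invented values): g1/g2 form one pair, g3/g4 another.
labels = ["g1", "g2", "g3", "g4"]
toy_ani = np.array([
    [100.00, 99.99, 99.90, 99.91],
    [99.99, 100.00, 99.91, 99.90],
    [99.90, 99.91, 100.00, 99.99],
    [99.91, 99.90, 99.99, 100.00],
])

scores, ratio = pca_on_ani(toy_ani)
for name, (pc1, pc2) in zip(labels, scores):
    print(f"{name}: PC1={pc1:+.4f}  PC2={pc2:+.4f}")
print("explained variance ratio:", np.round(ratio, 3))
```

In the toy matrix, g1 and g2 are more similar to each other than to g3 and g4, so PC1 places the two pairs on opposite sides of zero (the sign of each component is arbitrary in SVD). Clusters of this kind in a score plot are the pattern an ANI-based PCA is meant to expose; the supervised sPLS-DA step on SNPs is not reproduced here.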

Keywords