Annals of Clinical Microbiology and Antimicrobials (Nov 2022)

A Facile machine learning multi-classification model for Streptococcus agalactiae clonal complexes

  • Jingxian Liu,
  • Jing Zhao,
  • Chencui Huang,
  • Jingxu Xu,
  • Wei Liu,
  • Jiajia Yu,
  • Hongyan Guan,
  • Ying Liu,
  • Lisong Shen

DOI
https://doi.org/10.1186/s12941-022-00541-3
Journal volume & issue
Vol. 21, no. 1
pp. 1 – 8

Abstract


Background: The clinical significance of group B streptococcus (GBS) differs among clonal complexes (CCs), so accurate strain typing of GBS would facilitate clinical prognostic evaluation, epidemiological investigation and infection control. The aim of this study was to construct a practical and facile CC prediction model for S. agalactiae.

Methods: A total of 325 non-duplicated GBS strains were collected from clinical samples at Xinhua Hospital, Shanghai, China. Multilocus sequence typing (MLST) was used for molecular classification, and the results were analyzed with BioNumerics 8.0 software to derive CCs. Antibiotic susceptibility testing was performed using the Vitek-2 Compact system combined with the Kirby-Bauer (K-B) method. Multiplex PCR was used for serotype identification. A total of 45 virulence genes associated with adhesion, invasion and immune evasion were detected by PCR and electrophoresis. Three types of features, namely antibiotic susceptibility (A), serotype (S) and virulence gene (V) test results, were used with the XGBoost algorithm to develop multi-class CC identification models. The performance of the proposed models was evaluated by receiver operating characteristic (ROC) curve analysis.

Results: The 325 GBS strains were assigned to 47 sequence types (STs), which were grouped into 7 major CCs: CC1, CC10, CC12, CC17, CC19, CC23 and CC24. A total of 18 features across the three kinds of tests (A, S, V) differed significantly among the CCs. The model based on all features (S&A&V) performed best, with an AUC of 0.9536. The model based on serotype and antibiotic resistance (S&A) used only 5 weighted features, performed well in predicting CCs with a mean AUC of 0.9212, and did not differ significantly from the S&A&V model in predicting CC10, CC12, CC17, CC19, CC23 and CC24 (all p > 0.05).

Conclusions: The S&A model requires the fewest parameters while maintaining high accuracy and predictive power for CCs. The established model could be used as a promising tool to classify GBS molecular types, and could substantially improve clinical application and epidemiological surveillance in GBS phenotyping.
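
The abstract reports a mean AUC for a multi-class model but does not state how the per-class ROC curves were combined. One common reading is a one-vs-rest AUC for each CC, averaged with equal weight (macro average). The sketch below illustrates that calculation under this assumption; the function names, the three-class subset and the probability values are hypothetical toy inputs, not data or code from the study.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative (Mann-Whitney U statistic); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, proba, classes):
    """One-vs-rest AUC per class plus their unweighted mean.

    proba[i][k] is the predicted probability that sample i belongs to
    classes[k], e.g. the output of a multi-class classifier.
    """
    per_class = {}
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]
        scores = [row[k] for row in proba]
        per_class[cls] = binary_auc(labels, scores)
    mean_auc = sum(per_class.values()) / len(per_class)
    return per_class, mean_auc


# Toy illustration only: invented labels and probabilities for three CCs.
classes = ["CC10", "CC17", "CC19"]
y_true = ["CC10", "CC10", "CC17", "CC17", "CC19", "CC19"]
proba = [
    [0.70, 0.20, 0.10],
    [0.40, 0.50, 0.10],
    [0.30, 0.45, 0.25],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
    [0.10, 0.20, 0.70],
]
per_class, mean_auc = macro_ovr_auc(y_true, proba, classes)
print(per_class, round(mean_auc, 4))
```

In practice the probability matrix would come from the fitted classifier on held-out strains. A prevalence-weighted average is an equally plausible choice and would give a different mean when CC sizes are unbalanced.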

Keywords