T1SEstacker: A Tri-Layer Stacking Model Effectively Predicts Bacterial Type 1 Secreted Proteins Based on C-Terminal Non-repeats-in-Toxin-Motif Sequence Features

Zewei Chen; Ziyi Zhao; Xinjie Hui; Junya Zhang; Yixue Hu; Runhong Chen; Xuxia Cai; Yueming Hu; Yejun Wang

doi:10.3389/fmicb.2021.813094

Frontiers in Microbiology (Feb 2022)

Affiliations

Zewei Chen: Youth Innovation Team of Medical Bioinformatics, Shenzhen University Health Science Center, Shenzhen, China
Ziyi Zhao: Youth Innovation Team of Medical Bioinformatics, Shenzhen University Health Science Center, Shenzhen, China
Xinjie Hui: Department of Respiratory Medicine, Xuanwu Hospital, Capital Medical University, Beijing, China
Junya Zhang: Youth Innovation Team of Medical Bioinformatics, Shenzhen University Health Science Center, Shenzhen, China
Yixue Hu: Youth Innovation Team of Medical Bioinformatics, Shenzhen University Health Science Center, Shenzhen, China
Runhong Chen: Youth Innovation Team of Medical Bioinformatics, Shenzhen University Health Science Center, Shenzhen, China
Xuxia Cai: Youth Innovation Team of Medical Bioinformatics, Shenzhen University Health Science Center, Shenzhen, China
Yueming Hu: Youth Innovation Team of Medical Bioinformatics, Shenzhen University Health Science Center, Shenzhen, China
Yejun Wang: Youth Innovation Team of Medical Bioinformatics, Shenzhen University Health Science Center, Shenzhen, China

DOI: https://doi.org/10.3389/fmicb.2021.813094
Journal volume & issue: Vol. 12

Abstract

Type 1 secretion systems play important roles in the pathogenicity of Gram-negative bacteria. However, the substrate secretion mechanism remains largely unknown. In this study, we examined the sequence features of repeats-in-toxin (RTX) proteins, a major class of type 1 secreted effectors (T1SEs). We found striking non-RTX-motif amino acid composition patterns at the C termini, most typically exemplified by the enrichment of “[FLI][VAI]” at the two most C-terminal positions. Machine-learning models, including deep-learning ones, were trained on these sequence-based non-RTX-motif features and then combined into a tri-layer stacking model, T1SEstacker, which predicted the RTX proteins accurately, with a fivefold cross-validated sensitivity of ∼0.89 at a specificity of ∼0.94. Besides substrates with RTX motifs, T1SEstacker can also distinguish non-RTX-motif T1SEs well, further suggesting that common secretion signals may exist. T1SEstacker was applied to predict T1SEs from the genomes of representative Salmonella strains, and we found that both the number and the composition of T1SEs varied among strains. The number of T1SEs is estimated to reach 100 or more in each strain, far more than we expected. In summary, we performed a comprehensive sequence analysis of the type 1 secreted RTX proteins, identified common sequence-based features at the C termini, and developed a stacking model that can predict type 1 secreted proteins accurately.
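To make the “[FLI][VAI]” pattern concrete, the following is a minimal Python sketch of how one might test whether a protein sequence ends with that two-residue pattern. It is not part of T1SEstacker and is not a T1SE predictor: the abstract describes the pattern only as an enriched feature, and the function names and the example sequences here are hypothetical, made up purely for illustration.

```python
import re

# Pattern taken from the abstract: F, L or I at the second-to-last position,
# followed by V, A or I at the last position of the protein.
C_TERMINAL_MOTIF = re.compile(r"[FLI][VAI]$")


def has_c_terminal_motif(sequence: str) -> bool:
    """Return True if the last two residues match [FLI][VAI].

    Assumption: the input is a one-letter amino acid string; whitespace
    and a trailing '*' stop symbol are tolerated and removed first.
    """
    cleaned = sequence.strip().upper().rstrip("*")
    return C_TERMINAL_MOTIF.search(cleaned) is not None


def motif_fraction(sequences) -> float:
    """Fraction of sequences whose C terminus matches the pattern."""
    seqs = list(sequences)
    if not seqs:
        return 0.0
    return sum(has_c_terminal_motif(s) for s in seqs) / len(seqs)


# Hypothetical toy sequences, not real proteins.
examples = ["MKTAYLA", "MKTAYGK", "MSSIV*"]
flags = [has_c_terminal_motif(s) for s in examples]
```

Comparing `motif_fraction` between a set of known secreted proteins and a background set would be one simple way to look at enrichment of this kind, although the model described in the abstract uses a broader set of sequence-based features than this single pattern.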

Published in Frontiers in Microbiology

ISSN: 1664-302X (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Microbiology
Website: http://www.frontiersin.org/journals/microbiology
