Lost and Found: Re-searching and Re-scoring Proteomics Data Aids Genome Annotation and Improves Proteome Coverage

Patrick Willems; Igor Fijalkowski; Petra Van Damme

doi:10.1128/mSystems.00833-20

mSystems (Oct 2020)

Affiliations

Patrick Willems: Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
Igor Fijalkowski: Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium
Petra Van Damme: Department of Biochemistry and Microbiology, Ghent University, Ghent, Belgium

DOI: https://doi.org/10.1128/mSystems.00833-20
Journal volume & issue: Vol. 5, no. 5

Abstract

ABSTRACT Prokaryotic genome annotation depends heavily on automated gene annotation pipelines that are prone to propagating errors and underestimating genome complexity. We describe an optimized proteogenomic workflow that uses ribosome profiling (ribo-seq) and proteomic data for Salmonella enterica serovar Typhimurium to identify unannotated proteins or alternative protein forms. The data analysis includes searching for cofragmenting peptides and postprocessing with extended peptide-to-spectrum quality features, including comparison to predicted fragment ion intensities. Applying this strategy yields greater proteome depth as well as greater confidence in unannotated peptide hits. We demonstrate the general applicability of our pipeline by reanalyzing public Deinococcus radiodurans data sets. Taken together, our results show that systematic reanalysis of available prokaryotic (proteome) data sets holds great promise for assisting experimentally based genome annotation.

IMPORTANCE Delineation of open reading frames (ORFs) causes persistent inconsistencies in prokaryote genome annotation. We demonstrate that advanced (re)analysis of omics data achieves higher proteome coverage and sensitive detection of unannotated ORFs. This can be exploited for conditional bacterial genome (re)annotation, which is especially relevant given the wealth of prokaryotic genomes sequenced in recent years.
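To make the re-scoring idea concrete, the sketch below shows one way an observed fragment ion intensity pattern could be compared to a predicted one to produce extra peptide-to-spectrum quality features. The abstract does not state which similarity measures the authors used, so the two measures here (Pearson correlation and a normalized spectral angle), the function names, and the toy intensities are all assumptions for illustration, not the published pipeline.

```python
import math


def pearson(observed, predicted):
    """Pearson correlation between matched fragment ion intensities.

    Hypothetical feature: both lists must hold intensities for the same
    fragment ions in the same order.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        return 0.0  # a flat intensity pattern carries no correlation signal
    return cov / math.sqrt(var_o * var_p)


def spectral_angle(observed, predicted):
    """Normalized spectral angle: 1 for identical patterns, 0 for orthogonal ones."""
    if len(observed) != len(predicted):
        raise ValueError("lists must have equal length")
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0 or norm_p == 0:
        return 0.0
    # Clamp to guard against floating-point drift outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_o * norm_p)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


# Toy example (made-up intensities for four matched fragment ions).
observed = [0.10, 0.45, 0.30, 0.15]
predicted = [0.12, 0.40, 0.33, 0.15]
features = {
    "pearson": pearson(observed, predicted),
    "spectral_angle": spectral_angle(observed, predicted),
}
```

Features of this kind would typically be appended to the search engine's own scores before a postprocessing step re-ranks the peptide-to-spectrum matches; how the authors combined them is not described in this record.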

Published in mSystems

ISSN: 2379-5077 (Online)
Publisher: American Society for Microbiology
Country of publisher: United States
LCC subjects: Science: Microbiology
Website: https://journals.asm.org/journal/msystems
