Frontiers in Physiology (Sep 2019)

A Computational Pipeline for the Extraction of Actionable Biological Information From NGS-Phage Display Experiments

  • Antonios Vekris,
  • Eleftherios Pilalis,
  • Aristotelis Chatziioannou,
  • Klaus G. Petry

DOI
https://doi.org/10.3389/fphys.2019.01160
Journal volume & issue
Vol. 10

Abstract

Phage Display is a powerful method for identifying peptides that bind to targets and tissues of varying complexity, from single molecules to the internal surfaces of vessels in living organisms. Particularly for in vivo screenings, the resulting repertoires can be very complex and difficult to study with traditional approaches. Next Generation Sequencing (NGS) has made it possible to acquire high-resolution overviews of such repertoires, thus facilitating the identification of binders of interest. Additionally, the ever-increasing amount of available genome and proteome information is now sufficient for identifying putatively mimicked proteins, because partial sequence homology can be assessed on a large scale. However, the resulting massive data volumes call for high-performance computational approaches that can perform standardized and insightful molecular network analysis. Systems-level analysis is essential for efficiently resolving the underlying molecular complexity and for extracting actionable interpretation, in terms of the systemic biological processes and pathways that are systematically perturbed. In this work we introduce PepSimili, an integrated workflow tool that maps massive peptide repertoires onto whole proteomes and delivers a streamlined, systems-level biological interpretation. The tool employs modules that model and filter the background noise caused by random mappings, and it amplifies the biologically meaningful signal through coupling with BioInfoMiner, a systems interpretation tool that uses graph-theoretic methods to prioritize systemic processes and their corresponding driver genes. The current implementation runs in the Galaxy environment and is available online. A case study using public data is presented, with and without a control selection.
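
The idea of filtering random mappings can be illustrated with a small sketch. It is not PepSimili's algorithm, which the abstract does not specify. It assumes a toy similarity criterion (a peptide "maps" to a protein when they share at least one exact k-mer) and a background model built by shuffling each peptide's residues. The function names `count_hits` and `empirical_p_values`, the parameters `k` and `n_rounds`, and the example sequences are all hypothetical.

```python
import random
from collections import Counter


def count_hits(peptides, proteome, k=4):
    """Count, per protein, how many peptides share at least one exact k-mer with it.

    Toy mapping criterion (an assumption for illustration only).
    """
    hits = Counter()
    for pep in peptides:
        kmers = {pep[i:i + k] for i in range(len(pep) - k + 1)}
        for name, seq in proteome.items():
            if any(kmer in seq for kmer in kmers):
                hits[name] += 1
    return hits


def empirical_p_values(peptides, proteome, k=4, n_rounds=200, seed=0):
    """Estimate, per protein, how often shuffled peptides map at least as often
    as the observed repertoire does.

    Shuffling keeps each peptide's length and composition but destroys its
    order, giving a simple model of random mappings.
    """
    rng = random.Random(seed)
    observed = count_hits(peptides, proteome, k)
    exceed = Counter()
    for _ in range(n_rounds):
        shuffled = ["".join(rng.sample(p, len(p))) for p in peptides]
        background = count_hits(shuffled, proteome, k)
        for name in proteome:
            if background[name] >= observed[name]:
                exceed[name] += 1
    # Add-one correction so that no p-value is exactly zero.
    return {name: (exceed[name] + 1) / (n_rounds + 1) for name in proteome}


if __name__ == "__main__":
    proteome = {"P1": "MKACDEFGHIKLMN", "P2": "WWWWWWWWWWWW"}  # made-up sequences
    peptides = ["ACDEFGH", "CDEFGHI"]                          # made-up peptides
    for name, p in empirical_p_values(peptides, proteome).items():
        print(name, round(p, 4))
```

Proteins whose empirical p-value stays below a chosen threshold would be retained as candidate mimicked proteins and passed on to a downstream enrichment step. A realistic pipeline would replace the exact k-mer match with a proper sequence similarity search and a background model suited to the library design.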

Keywords