Informatics in Medicine Unlocked (Jan 2021)

Predicting human–pathogen protein–protein interactions using Natural Language Processing methods

  • Nikhil Mathews,
  • Tuan Tran,
  • Banafsheh Rekabdar,
  • Chinwe Ekenna

Journal volume & issue
Vol. 26
p. 100738

Abstract


In this paper, we predict protein–protein interactions between humans and Yersinia pestis from amino acid sequences. We combine multiple Natural Language Processing (NLP) methods from deep learning in a novel configuration and obtain promising results. Our model achieves a cross-validation AUC of 0.92, which is comparable to other work that uses extensive biochemical properties, i.e., network and sequence features in combination. We achieve this by integrating advanced tools from neural machine translation into an end-to-end deep learning framework, together with preprocessing methods that are novel to the field of bioinformatics. We also show that the proposed approach is robust across different host–pathogen protein–protein interaction datasets.
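The abstract does not spell out how the amino acid sequences are fed to NLP models. The following is only a minimal sketch of one common approach, not the authors' pipeline. It assumes that each sequence is split into overlapping k-mers that serve as "words", and that each k-mer is mapped to an integer id so that a host–pathogen pair becomes two fixed-length token sequences an embedding layer could consume. The function names, the choice of k = 3, and the zero-padding scheme are all hypothetical, and the toy sequences are invented for illustration.

```python
from typing import Dict, List, Tuple


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split an amino acid sequence into overlapping k-mers ("words").

    Hypothetical preprocessing step; k=3 is an assumption, not from the paper.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(sequences: List[str], k: int = 3) -> Dict[str, int]:
    """Assign each distinct k-mer an integer id in order of first appearance.

    Id 0 is reserved for padding.
    """
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for tok in kmer_tokens(seq, k):
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def encode(seq: str, vocab: Dict[str, int], max_len: int, k: int = 3) -> List[int]:
    """Turn a sequence into exactly max_len k-mer ids.

    K-mers missing from the vocabulary are skipped; the result is
    truncated or right-padded with zeros to max_len.
    """
    ids = [vocab[t] for t in kmer_tokens(seq, k) if t in vocab][:max_len]
    return ids + [0] * (max_len - len(ids))


def encode_pair(host: str, pathogen: str, vocab: Dict[str, int],
                max_len: int, k: int = 3) -> Tuple[List[int], List[int]]:
    """Encode a (host protein, pathogen protein) pair as two id sequences."""
    return encode(host, vocab, max_len, k), encode(pathogen, vocab, max_len, k)


if __name__ == "__main__":
    host_seq = "MKTAYIAK"      # hypothetical toy fragment, not real data
    pathogen_seq = "MKTLLV"    # hypothetical toy fragment, not real data
    vocab = build_vocab([host_seq, pathogen_seq])
    print(kmer_tokens(host_seq))
    # → ['MKT', 'KTA', 'TAY', 'AYI', 'YIA', 'IAK']
    print(encode_pair(host_seq, pathogen_seq, vocab, max_len=8))
    # → ([1, 2, 3, 4, 5, 6, 0, 0], [1, 7, 8, 9, 0, 0, 0, 0])
```

Overlapping k-mers keep local residue context while giving a finite vocabulary, which is why this kind of tokenization is often paired with embedding layers borrowed from NLP. A real pipeline would build the vocabulary on training folds only, so that cross-validation scores are not inflated by leakage.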

Keywords