PLoS ONE (Jan 2017)

Distributed smoothed tree kernel for protein-protein interaction extraction from the biomedical literature.

  • Gurusamy Murugesan,
  • Sabenabanu Abdulkadhar,
  • Jeyakumar Natarajan

DOI
https://doi.org/10.1371/journal.pone.0187379
Journal volume & issue
Vol. 12, no. 11
p. e0187379

Abstract

Automatic extraction of protein-protein interaction (PPI) pairs from the biomedical literature is a widely studied task in biological information extraction. Many kernel-based approaches, such as linear kernels, tree kernels, graph kernels, and combinations of multiple kernels, have achieved promising results on the PPI task. However, most of these kernel methods fail to capture the semantic relation between the two entities. In this paper, we present a special type of tree kernel for PPI extraction, the Distributed Smoothed Tree Kernel (DSTK), which exploits both syntactic (structural) information and semantic vector information. DSTK comprises distributed trees, which carry syntactic information, together with distributional semantic vectors, which represent the semantic information of sentences or phrases. To build a robust machine learning model, a feature-based kernel and DSTK were combined using an ensemble support vector machine (SVM). Five corpora (AIMed, BioInfer, HPRD50, IEPA, and LLL) were used to evaluate the performance of our system. Experimental results show that our system achieves a better F-score on the five corpora than other state-of-the-art systems.
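
The idea of combining a feature-based kernel with a tree-based kernel can be illustrated with a minimal sketch. The abstract does not specify how the two kernels are combined inside the ensemble SVM, so the weighted sum below is an assumption chosen for illustration and not the authors' scheme. The dense "tree vectors" are hypothetical stand-ins for DSTK's distributed tree representations, and the names `dot`, `combined_kernel`, `gram_matrix`, and `weight` are likewise invented for this example.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product, used here as a linear kernel."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def combined_kernel(
    feat_a: Vector,
    tree_a: Vector,
    feat_b: Vector,
    tree_b: Vector,
    weight: float = 0.5,
) -> float:
    """Weighted sum of a feature kernel and a tree-vector kernel.

    Assumption: a convex combination of two linear kernels. A non-negative
    weighted sum of valid kernels is itself a valid kernel.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    feature_part = dot(feat_a, feat_b)
    tree_part = dot(tree_a, tree_b)
    return weight * feature_part + (1.0 - weight) * tree_part


def gram_matrix(
    feats: Sequence[Vector],
    trees: Sequence[Vector],
    weight: float = 0.5,
) -> List[List[float]]:
    """Pairwise combined-kernel values for a list of candidate sentences."""
    if len(feats) != len(trees):
        raise ValueError("feats and trees must describe the same examples")
    n = len(feats)
    return [
        [
            combined_kernel(feats[i], trees[i], feats[j], trees[j], weight)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy data: two candidate PPI sentences, each with a sparse feature vector
# and a (hypothetical) dense vector encoding its parse tree.
toy_feats = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
toy_trees = [[0.5, 0.5], [1.0, 0.0]]
K = gram_matrix(toy_feats, toy_trees, weight=0.5)
```

A Gram matrix of this form could be passed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.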