Applied Sciences (Jun 2020)

Paraphrase Identification with Lexical, Syntactic and Sentential Encodings

  • Sheng Xu,
  • Xingfa Shen,
  • Fumiyo Fukumoto,
  • Jiyi Li,
  • Yoshimi Suzuki,
  • Hiromitsu Nishizaki

DOI
https://doi.org/10.3390/app10124144
Journal volume & issue
Vol. 10, no. 12
p. 4144

Abstract

Paraphrase identification has long been a major topic in Natural Language Processing (NLP). However, how to interpret the diverse contextual cues within a sentence, such as lexical and semantic information, as relevant features remains an open problem. This paper addresses this problem and presents an approach for leveraging contextual features with a neural learning model. Our Lexical, Syntactic, and Sentential Encodings (LSSE) learning model incorporates Relational Graph Convolutional Networks (R-GCNs) to exploit different features from local contexts, i.e., word encoding, position encoding, and full dependency structures. By combining the hidden states obtained by the R-GCNs with lexical and sentential encodings from Bidirectional Encoder Representations from Transformers (BERT), our model learns the contextual similarity between sentences effectively. Experimental results on two benchmark datasets, the Microsoft Research Paraphrase Corpus (MRPC) and Quora Question Pairs (QQP), show an improvement over the baseline BERT sentential-encoding model of 1.7% F1-score on MRPC and 1.0% F1-score on QQP. Moreover, we verified that the combination of position encoding and syntactic features contributes to the performance improvement.
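To make the R-GCN component concrete: a relational GCN extends an ordinary GCN by giving each edge relation (here, each dependency label) its own weight matrix, so a token's hidden state aggregates relation-specific messages from its dependency neighbours. The following pure-Python sketch shows one R-GCN layer in this spirit; all names, dimensions, and matrices are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of one R-GCN layer over a dependency graph (pure Python).
# Update rule: h_i' = ReLU(W_self h_i + sum_r sum_{j in N_r(i)} (1/c_{i,r}) W_r h_j),
# where c_{i,r} is the number of r-labelled neighbours of node i.
# All names and values here are illustrative, not from the paper's code.

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]

def relu(v):
    return [max(0.0, x) for x in v]

def rgcn_layer(h, edges, W_rel, W_self):
    """h: list of per-token feature vectors; edges: (src, dst, relation)
    dependency arcs; W_rel: dict relation -> weight matrix; W_self:
    self-loop weight matrix."""
    n = len(h)
    # self-loop contribution W_self h_i for every token
    out = [matvec(W_self, h[i]) for i in range(n)]
    # normalisation constants c_{i,r}: count r-labelled in-edges per node
    counts = {}
    for src, dst, rel in edges:
        counts[(dst, rel)] = counts.get((dst, rel), 0) + 1
    # relation-specific messages, averaged per (node, relation)
    for src, dst, rel in edges:
        msg = matvec(W_rel[rel], h[src])
        c = counts[(dst, rel)]
        out[dst] = vec_add(out[dst], [m / c for m in msg])
    return [relu(v) for v in out]

# Toy example: 3 tokens, two dependency arcs into token 2 (the head),
# identity weight matrices so the arithmetic is easy to follow.
I = [[1.0, 0.0], [0.0, 1.0]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 2, "nsubj"), (1, 2, "obj")]
out = rgcn_layer(h, edges, {"nsubj": I, "obj": I}, I)
# Token 2 receives its own state plus one message per relation:
# [1+1+0, 1+0+1] = [2.0, 2.0]
```

In the LSSE model these per-token hidden states are then combined with BERT's lexical and sentential encodings before the similarity between the two sentences is scored.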

Keywords