Applied Sciences (Jun 2020)

Paraphrase Identification with Lexical, Syntactic and Sentential Encodings

  • Sheng Xu,
  • Xingfa Shen,
  • Fumiyo Fukumoto,
  • Jiyi Li,
  • Yoshimi Suzuki,
  • Hiromitsu Nishizaki

DOI
https://doi.org/10.3390/app10124144
Journal volume & issue
Vol. 10, no. 12
p. 4144

Abstract

Paraphrase identification has long been a major topic in Natural Language Processing (NLP). However, how to interpret the diverse contextual cues within a sentence, such as lexical and semantic information, as relevant features remains an open problem. This paper addresses this problem and presents an approach for leveraging contextual features with a neural learning model. Our Lexical, Syntactic, and Sentential Encodings (LSSE) learning model incorporates Relational Graph Convolutional Networks (R-GCNs) to exploit different features from local contexts, i.e., word encoding, position encoding, and full dependency structures. By combining the hidden states obtained by the R-GCNs with lexical and sentential encodings from Bidirectional Encoder Representations from Transformers (BERT), our model learns the contextual similarity between sentences effectively. Experimental results on two benchmark datasets, the Microsoft Research Paraphrase Corpus (MRPC) and Quora Question Pairs (QQP), show an improvement over the baseline BERT sentential-encoding model of 1.7% F1-score on MRPC and 1.0% F1-score on QQP. Moreover, we verified that the combination of position encoding and syntactic features contributes to the performance improvement.
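To make the R-GCN component concrete: a relational GCN extends an ordinary GCN by giving each edge relation (here, each dependency label) its own weight matrix, so a token's hidden state aggregates relation-specific messages from its dependency neighbours. The following pure-Python sketch shows one R-GCN layer in this spirit; all names, dimensions, and matrices are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of one R-GCN layer over a dependency graph (pure Python).
# Update rule: h_i' = ReLU(W_self h_i + sum_r sum_{j in N_r(i)} (1/c_{i,r}) W_r h_j),
# where c_{i,r} is the number of r-labelled neighbours of node i.
# All names and values here are illustrative, not from the paper's code.

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]

def relu(v):
    return [max(0.0, x) for x in v]

def rgcn_layer(h, edges, W_rel, W_self):
    """h: list of per-token feature vectors; edges: (src, dst, relation)
    dependency arcs; W_rel: dict relation -> weight matrix; W_self:
    self-loop weight matrix."""
    n = len(h)
    # self-loop contribution W_self h_i for every token
    out = [matvec(W_self, h[i]) for i in range(n)]
    # normalisation constants c_{i,r}: count r-labelled in-edges per node
    counts = {}
    for src, dst, rel in edges:
        counts[(dst, rel)] = counts.get((dst, rel), 0) + 1
    # relation-specific messages, averaged per (node, relation)
    for src, dst, rel in edges:
        msg = matvec(W_rel[rel], h[src])
        c = counts[(dst, rel)]
        out[dst] = vec_add(out[dst], [m / c for m in msg])
    return [relu(v) for v in out]

# Toy example: 3 tokens, two dependency arcs into token 2 (the head),
# identity weight matrices so the arithmetic is easy to follow.
I = [[1.0, 0.0], [0.0, 1.0]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 2, "nsubj"), (1, 2, "obj")]
out = rgcn_layer(h, edges, {"nsubj": I, "obj": I}, I)
# Token 2 receives its own state plus one message per relation:
# [1+1+0, 1+0+1] = [2.0, 2.0]
```

In the LSSE model these per-token hidden states are then combined with BERT's lexical and sentential encodings before the similarity between the two sentences is scored.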

Keywords