Benchmarking Natural Language Inference and Semantic Textual Similarity for Portuguese

Pedro Fialho; Luísa Coheur; Paulo Quaresma

doi:10.3390/info11100484

Information (Oct 2020)


Affiliations

Pedro Fialho: INESC-ID, Rua Alves Redol 9, 1000-029 Lisboa, Portugal
Luísa Coheur: INESC-ID, Rua Alves Redol 9, 1000-029 Lisboa, Portugal
Paulo Quaresma: INESC-ID, Rua Alves Redol 9, 1000-029 Lisboa, Portugal

DOI: https://doi.org/10.3390/info11100484
Journal volume & issue: Vol. 11, no. 10, p. 484

Abstract


Two sentences can be related in many different ways, and distinct natural language processing tasks aim to identify different semantic relations between them. We developed several models for natural language inference and semantic textual similarity for Portuguese. We took advantage of pre-trained models (BERT) and, additionally, studied the role of lexical features. We tested our models on several datasets (ASSIN, SICK-BR and ASSIN2), and the best results were usually achieved with ptBERT-Large, pre-trained on a Brazilian Portuguese corpus and fine-tuned on the latter datasets. Besides obtaining state-of-the-art results, this is, to the best of our knowledge, the most comprehensive study of natural language inference and semantic textual similarity for Portuguese.
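The abstract mentions lexical features without naming them. As a purely illustrative sketch, and not the authors' feature set, one common lexical feature for a sentence pair is the Jaccard overlap of their word sets. The tokenizer and function names below (tokens, jaccard_overlap) are hypothetical choices made for this example.

```python
import re


def tokens(sentence):
    # Hypothetical tokenizer: lowercase, keep runs of word characters.
    # In Python 3, \w on str patterns is Unicode-aware, so accented
    # Portuguese letters (e.g. "sofá") stay inside one token.
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard_overlap(premise, hypothesis):
    # Illustrative lexical feature: |A ∩ B| / |A ∪ B| over word sets.
    a, b = tokens(premise), tokens(hypothesis)
    if not a and not b:
        return 0.0  # avoid division by zero for two empty inputs
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    score = jaccard_overlap("O gato dorme no sofá.", "O gato está dormindo no sofá.")
    print(round(score, 3))
```

For the pair above, four words are shared out of seven distinct words, so the score is 4/7 (about 0.571). A feature like this would typically be combined with, or compared against, the sentence-pair representations of a fine-tuned BERT model rather than used alone.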

Published in Information

ISSN: 2078-2489 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/information/
