Exploring content selection strategies for Multilingual Multi-Document Summarization based on the Universal Network Language (UNL)

Matheus Rigobelo Chaud; Ariani Di Felippo

doi:10.17851/2237-2083.26.1.45-71

Revista de Estudos da Linguagem (Nov 2017)

Affiliations

Matheus Rigobelo Chaud: Universidade de São Paulo
Ariani Di Felippo: Universidade Federal de São Carlos

DOI: https://doi.org/10.17851/2237-2083.26.1.45-71
Journal volume & issue: Vol. 26, no. 1, pp. 45–71

Abstract

Multilingual Multi-Document Summarization aims at ranking the sentences of a cluster of at least two news texts (one in the user’s language and one in a foreign language) and selecting the top-ranked sentences for a summary in the user’s language. We explored three concept-based statistical measures and one superficial strategy for sentence ranking. We used a bilingual corpus (Brazilian Portuguese-English) encoded in UNL (Universal Network Language), in which source and summary sentences are aligned based on content overlap. Our experiment shows that, among the concept-based measures, “concept frequency normalized by the number of concepts in the sentence” best ranks the sentences selected by humans. However, it does not outperform the superficial strategy based on the position of the sentences in the texts. This indicates that the most frequent concepts are not always contained in the first sentences of the texts, which humans usually select to build summaries because they convey the main information of the collection.

Keywords: content selection; concept; statistical measure; multilingual corpus; multi-document summarization.
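The two kinds of ranking contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the sketch assumes that a sentence's concept score is the sum of the cluster-wide frequencies of its concepts divided by the number of concepts in the sentence, and that the position baseline simply prefers sentences that occur earlier in their source text. The toy cluster, the plain-string concept labels (stand-ins for UNL concepts), and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy cluster: (text_id, position_in_text, concepts).
# Plain strings stand in for language-independent UNL concepts.
CLUSTER = [
    ("pt", 0, ["earthquake", "hit", "city"]),
    ("pt", 1, ["earthquake", "rescue", "team"]),
    ("en", 0, ["earthquake", "city", "damage"]),
    ("en", 1, ["government", "send", "rescue", "team"]),
]


def concept_frequency_scores(cluster):
    """Assumed measure: summed cluster frequency of a sentence's concepts,
    normalized by the number of concepts in that sentence."""
    freq = Counter(c for _, _, concepts in cluster for c in concepts)
    return [
        sum(freq[c] for c in concepts) / len(concepts) if concepts else 0.0
        for _, _, concepts in cluster
    ]


def position_scores(cluster):
    """Assumed superficial baseline: earlier sentences score higher."""
    return [1.0 / (position + 1) for _, position, _ in cluster]


def rank(scores):
    """Return sentence indices sorted by descending score.
    Ties are broken by original order in the cluster."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))


if __name__ == "__main__":
    print("concept-based ranking:", rank(concept_frequency_scores(CLUSTER)))
    print("position-based ranking:", rank(position_scores(CLUSTER)))
```

In this toy cluster the two rankings disagree: the second sentence of the first text contains the most frequent concepts, while the position baseline still prefers the opening sentence of each text. This is the kind of mismatch the abstract describes.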

Published in Revista de Estudos da Linguagem

ISSN: 0104-0588 (Print); 2237-2083 (Online)
Publisher: Universidade Federal de Minas Gerais
Country of publisher: Brazil
LCC subjects: Language and Literature: Philology. Linguistics
Website: http://periodicos.letras.ufmg.br/index.php/relin
