Improving Data-to-Text Generation via Preserving High-Frequency Phrases and Fact-Checking

Ethan Joseph; Julian Lioanag; Mei Si

doi:10.4000/ijcol.909

IJCoL (Dec 2021)

DOI: https://doi.org/10.4000/ijcol.909
Journal volume & issue: Vol. 7, pp. 223–244

Abstract

Transforming numerical data into natural language descriptions (data-to-text) requires presenting the data in the correct context, supplementing plausible details, and creating an overall coherent and non-conflicting narrative. In this work, we propose a generate-extract-correct pipeline for the task. We use transfer learning with an auxiliary task of keeping high-frequency word sequences from the training data for text generation. We then apply information extraction to the generated text to check its accuracy, followed by correction, and thus ensure the coherence of the generated narrative. We demonstrate the effectiveness of this approach with both objective and subjective evaluations. Using an empirical evaluation, we show that people rated our system’s outputs similarly to human-written text regarding their coherence, conciseness, and grammar.

Published in IJCoL

ISSN: 2499-4553 (Online)
Publisher: Accademia University Press
Country of publisher: Italy
LCC subjects: Social Sciences; Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing
Website: https://journals.openedition.org/ijcol
