IEEE Access (Jan 2024)

Evaluating Text Generation Model Performance by Combining Semantic Meaning and Word Order

  • Erik Novak,
  • Luka Bizjak,
  • Dunja Mladenic,
  • Marko Grobelnik

DOI
https://doi.org/10.1109/ACCESS.2024.3426082
Journal volume & issue
Vol. 12
pp. 95265 – 95277

Abstract


Modern text generation metrics use semantic representations of words to assess the quality of a text generation model without considering the fluency of the generated text. This paper proposes a novel text generation metric that combines adequacy and fluency to measure the quality of the generated text. When computing the final score using optimal transport, the metric considers both semantic meaning and word order. We evaluate the metric on translation data sets covering 20 language pairs from various language families and scripts. We analyze its adequacy-based performance using Pearson's r and Kendall's τ correlation coefficients and, using a novel statistic for measuring word order sensitivity, its sensitivity to fluency-related modifications. Results show that the proposed metric is the most sensitive to fluency-related changes among all top-performing embedding-based metrics, which were found to be relatively invariant to variations in word order. The proposed metric's overall adequacy-based performance is lower than that of the best embedding-based metric but higher than that of the n-gram matching metrics. Our code is publicly available on GitHub (https://github.com/eriknovak/metric-OPWScore) under the BSD-2-Clause license.
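The abstract says the metric uses optimal transport to account for both semantic meaning and word order. The sketch below illustrates only that general idea. It computes an entropy-regularized (Sinkhorn) transport cost between hypothesis and reference word embeddings, where the cost of moving mass between two words is their cosine distance plus a penalty on the difference in their relative positions. The helper names (`cost_matrix`, `sinkhorn_distance`), the squared relative-position penalty, the `order_weight` parameter, and the Sinkhorn settings are assumptions made for illustration. This is not the OPWScore formulation; the authors' implementation is in the linked repository.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def cosine_distance(x: Vector, y: Vector) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)


def cost_matrix(hyp: Sequence[Vector], ref: Sequence[Vector], order_weight: float) -> Matrix:
    """Hypothetical cost: semantic distance plus a squared relative-position penalty."""
    n, m = len(hyp), len(ref)
    cost = []
    for i, x in enumerate(hyp):
        row = []
        for j, y in enumerate(ref):
            # Relative positions in [0, 1]; single-token sequences map to 0.
            pi = i / (n - 1) if n > 1 else 0.0
            pj = j / (m - 1) if m > 1 else 0.0
            row.append(cosine_distance(x, y) + order_weight * (pi - pj) ** 2)
        cost.append(row)
    return cost


def sinkhorn_distance(cost: Matrix, eps: float = 0.1, iters: int = 200) -> float:
    """Entropy-regularized OT cost with uniform word weights on both sides."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on hypothesis words
    b = [1.0 / m] * m  # uniform mass on reference words
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    # Alternate scaling so the plan diag(u) K diag(v) matches both marginals.
    for _ in range(iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Total cost of the transport plan.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


if __name__ == "__main__":
    # Toy one-hot "embeddings" for a three-word sentence and its reversal.
    ref = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    reversed_hyp = list(reversed(ref))
    for w in (0.0, 1.0):
        same = sinkhorn_distance(cost_matrix(ref, ref, w))
        rev = sinkhorn_distance(cost_matrix(reversed_hyp, ref, w))
        print(f"order_weight={w}: identical={same:.4f} reversed={rev:.4f}")
```

With `order_weight=0.0`, reversing the hypothesis leaves the transport cost unchanged, which mirrors the word-order invariance the abstract reports for embedding-based metrics; with a positive `order_weight`, the reversed hypothesis receives a clearly higher cost.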

Keywords