Readability Metrics for Machine Translation in Dutch: Google vs. Azure & IBM

Chaïm van Toledo; Marijn Schraagen; Friso van Dijk; Matthieu Brinkhuis; Marco Spruit

doi:10.3390/app13074444

Applied Sciences (Mar 2023)

Affiliations

Chaïm van Toledo: Department of Information and Computing Sciences, Utrecht University, Princetonplein 5, 3584 CC Utrecht, The Netherlands
Marijn Schraagen: Department of Information and Computing Sciences, Utrecht University, Princetonplein 5, 3584 CC Utrecht, The Netherlands
Friso van Dijk: Department of Information and Computing Sciences, Utrecht University, Princetonplein 5, 3584 CC Utrecht, The Netherlands
Matthieu Brinkhuis: Department of Information and Computing Sciences, Utrecht University, Princetonplein 5, 3584 CC Utrecht, The Netherlands
Marco Spruit: Department of Public Health and Primary Care, Leiden University Medical Center (LUMC), Albinusdreef 2, 2333 ZA Leiden, The Netherlands

DOI: https://doi.org/10.3390/app13074444
Journal volume & issue: Vol. 13, no. 7, p. 4444

Abstract

This paper introduces a novel method to predict when a Google translation is better than other machine translations (MT) in Dutch. Instead of considering fidelity, the approach uses fluency and readability indicators to identify when Google ranked best, and thereby explores an alternative approach in the field of quality estimation. The paper also contributes a published dataset of sentences translated from English to Dutch, with human-made classifications on a best-worst scale. Logistic regression shows a correlation between T-Scan output, such as readability measurements like lemma frequencies, and the cases in which the Google translation was better than those of Azure and IBM. The last part of the results section explores prediction, first with logistic regression and second with a model generated by automated machine learning; these reach accuracies of 0.59 and 0.61, respectively.
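
To make the prediction setup in the abstract concrete, the following is a minimal sketch of a binary logistic regression that predicts whether one system's translation ranked best, given readability-style features. It is not the authors' code and does not use the published dataset: the two features (`lemma_freq`, `sent_len`), the synthetic data generator, and all function names are assumptions introduced purely for illustration, and the printed accuracy says nothing about the 0.59 and 0.61 reported above.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Label 1 ("this system ranked best") when the probability is >= 0.5."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


def accuracy(y_true, y_pred):
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


def make_toy_data(n=200, seed=0):
    """Synthetic stand-in data; NOT the dataset published with the paper."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        lemma_freq = rng.gauss(0, 1)  # hypothetical standardized lemma frequency
        sent_len = rng.gauss(0, 1)    # hypothetical standardized sentence length
        logit = 1.5 * lemma_freq - 0.5 * sent_len  # arbitrary toy relationship
        X.append([lemma_freq, sent_len])
        y.append(1 if rng.random() < sigmoid(logit) else 0)
    return X, y


X, y = make_toy_data()
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

w, b = fit_logistic(X_train, y_train)
acc = accuracy(y_test, predict(X_test, w, b))
print(f"toy held-out accuracy: {acc:.2f}")
```

In this sketch the sign of each fitted weight plays the role of the correlation the abstract describes: a positive weight on a feature means higher values of that feature make the "ranked best" label more likely under the model.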

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
