Tehnički Vjesnik (Jan 2017)
Document similarity in repeatedly translated corpora
Abstract
The paper analyses the changes in the relationships between documents in a textual corpus that occur when the corpus is translated into another language. The authors analysed the similarities between documents in the original corpus, in Croatian, and compared them with the similarities between the corresponding documents in the translated corpus, in English. The changes were analysed using two measures: the P-value of the chi-square test and a newly proposed measure, the correction coefficient.
Keywords