Information (Aug 2019)

Study on Unknown Term Translation Mining from Google Snippets

  • Bin Li
  • Jianmin Yao

DOI
https://doi.org/10.3390/info10090267
Journal volume & issue
Vol. 10, no. 9
p. 267

Abstract

Bilingual web pages are widely used to mine translations of unknown terms. This study addresses three parts of that task: obtaining relevant web pages, extracting translations with correct lexical boundaries, and ranking the translation candidates. Co-occurrence information is used to identify subject terms, and the source query is then expanded with the translations of those subject terms to collect useful bilingual search engine snippets. Valid candidates are then extracted from these small, noisy bilingual corpora using an improved frequency change measurement that incorporates adjacent information. To select an appropriate translation, the proposed method combines surface patterns, frequency-distance, and phonetic features. The experimental results show that the proposed method performs well at mining translations of unknown terms.
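
The ranking step can be illustrated with a minimal sketch of one of the signals the abstract names, frequency-distance. This is not the authors' formula, which the abstract does not give. It assumes, purely for illustration, that a candidate earns 1 / (1 + gap) for each snippet in which it co-occurs with the source term, where gap is the number of characters between the two. The function name `rank_candidates`, the scoring rule, and the sample snippets are all hypothetical.

```python
import re
from collections import defaultdict


def rank_candidates(snippets, source_term, candidates):
    """Rank candidates by an assumed frequency-distance score.

    For every snippet containing the source term, each candidate that
    also appears adds 1 / (1 + gap) to its score, where gap is the
    character distance between the first occurrence of the source term
    and the nearest occurrence of the candidate. Candidates that appear
    often and close to the source term therefore score highest.
    """
    scores = defaultdict(float)
    for snippet in snippets:
        src_start = snippet.find(source_term)
        if src_start < 0:
            continue  # the source term is absent, so no evidence here
        src_end = src_start + len(source_term)
        for cand in candidates:
            best_gap = None
            for match in re.finditer(re.escape(cand), snippet):
                if match.start() >= src_end:
                    gap = match.start() - src_end   # candidate follows the term
                elif match.end() <= src_start:
                    gap = src_start - match.end()   # candidate precedes the term
                else:
                    gap = 0                         # the two spans overlap
                if best_gap is None or gap < best_gap:
                    best_gap = gap
            if best_gap is not None:
                scores[cand] += 1.0 / (1.0 + best_gap)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical snippets, standing in for search engine results.
snippets = [
    "machine translation (机器翻译) is studied",
    "机器翻译 machine translation",
    "machine translation on a 计算机",
]
ranked = rank_candidates(snippets, "machine translation", ["机器翻译", "计算机"])
print(ranked[0][0])  # the candidate that is both more frequent and closer
```

A complete system along the lines the abstract describes would combine a score like this with surface-pattern and phonetic evidence. This sketch covers only the frequency-distance part, under the assumptions stated above.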

Keywords