Applied Sciences (Mar 2023)

The Role of Transliterated Words in Linking Bilingual News Articles in an Archive

  • Muzammil Khan,
  • Sarwar Shah Khan,
  • Yasser Alharbi,
  • Ali Alferaidi,
  • Talal Saad Alharbi,
  • Kusum Yadav

DOI
https://doi.org/10.3390/app13074435
Journal volume & issue
Vol. 13, no. 7
p. 4435

Abstract


Retrieving a specific digital information object from huge, evolving, multilingual news archives in response to a user query is challenging and complicated. Processing, understanding, and analysis become even more difficult when the archive includes low-resourced, morphologically complex languages written in Urdu and Arabic scripts. Computing similarity against a query, and among news articles, in huge and evolving collections at run time may be inaccurate and time-consuming. This paper introduces a Similarity Measure based on Transliteration Words (SMTW), that is, English words transliterated into Urdu script, for linking news articles extracted from multiple online sources during the preservation process. The SMTW links Urdu news articles to English ones using an upgraded Urdu-to-English lexicon that includes transliterated words. The SMTW was evaluated exhaustively on datasets of different sizes to assess its effectiveness, and the results were compared with the Common Ratio Measure for Dual Language (CRMDL). The experimental results show that the SMTW was more effective than the CRMDL for linking Urdu-to-English news articles: precision improved from 50% to 60%, recall improved from 67% to 82%, and the impact of common terms also improved.
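To illustrate why transliterated entries can matter, here is a minimal sketch, assuming a lexicon-based common-term approach. The abstract does not give the SMTW formula, so the scoring used here (the overlap coefficient: shared terms divided by the size of the smaller term set), the toy lexicons, and the helper names `translate_terms` and `common_term_ratio` are all hypothetical choices for illustration, not the authors' method. Urdu tokens are mapped to English through the lexicon, and the mapped set is then compared with the English article's terms.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy lexicons (illustrative only, not taken from the paper).
BASE_LEXICON: Dict[str, str] = {"حکومت": "government"}
# English words written in Urdu script (transliterations).
TRANSLIT_ENTRIES: Dict[str, str] = {"کرکٹ": "cricket", "میچ": "match", "ٹیم": "team"}
# "Upgraded" lexicon: base entries plus transliterated words.
UPGRADED_LEXICON: Dict[str, str] = {**BASE_LEXICON, **TRANSLIT_ENTRIES}


def translate_terms(urdu_tokens: Iterable[str], lexicon: Dict[str, str]) -> Set[str]:
    """Map Urdu tokens to English terms; tokens without a lexicon entry are dropped."""
    return {lexicon[t] for t in urdu_tokens if t in lexicon}


def common_term_ratio(urdu_tokens: Iterable[str],
                      english_tokens: Iterable[str],
                      lexicon: Dict[str, str]) -> float:
    """Overlap coefficient between mapped Urdu terms and English terms (assumed scoring)."""
    mapped = translate_terms(urdu_tokens, lexicon)
    english = {t.lower() for t in english_tokens}
    if not mapped or not english:
        return 0.0
    common = mapped & english
    return len(common) / min(len(mapped), len(english))


if __name__ == "__main__":
    urdu_article = ["کرکٹ", "میچ", "ٹیم", "حکومت"]
    english_article = ["Cricket", "match", "team", "won"]
    print(common_term_ratio(urdu_article, english_article, BASE_LEXICON))      # 0.0
    print(common_term_ratio(urdu_article, english_article, UPGRADED_LEXICON))  # 0.75
```

In this toy case the base lexicon finds no shared terms, while adding the three transliterated entries yields three shared terms out of four. This mirrors, in a very simplified form, the intuition that transliterated words supply cross-language matches a conventional lexicon would miss.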

Keywords