IEEE Access (Jan 2022)

Urdu Wikification and Its Application in Urdu News Recommendation System

  • Safia Kanwal,
  • Muhammad Kamran Malik,
  • Zubair Nawaz,
  • Khawar Mehmood

DOI
https://doi.org/10.1109/ACCESS.2022.3208666
Journal volume & issue
Vol. 10
pp. 103655 – 103668

Abstract

Wikification is the process of linking entities found in a text to their corresponding Wikipedia or Wikidata pages. Many natural language processing applications, including question answering, information retrieval, fraud detection, and recommendation systems (RS), can benefit from this information extraction technique. Considerable effort has gone into entity linking (EL) for both Asian and Western languages, producing several datasets and numerous methods. Despite millions of Urdu speakers worldwide, relatively little EL research has been done for Urdu. First, this work proposes an Urdu EL pipeline that identifies named entities in text and links them to Wikidata. Second, a dataset of 550 Urdu news titles linked to their respective Wiki-IDs is prepared for evaluation. Third, the proposed EL pipeline is used to annotate 16,738 news articles from the first-ever Urdu news RS dataset, which covers 100 users. Fourth, a sub-knowledge graph (KG) of 8,439 entities and 23,080 relationship tuples is retrieved from Wikidata. The TransE algorithm is then used to create KG embeddings so that the extracted KG can be used in an Urdu news RS. The final accuracy of the Urdu news RS is 60.8%.
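
As an illustration of the embedding step, the sketch below shows the general idea behind TransE: a triple (head, relation, tail) is considered plausible when head + relation lies close to tail, and training pushes true triples closer than corrupted ones by a margin. This is a minimal sketch of the standard TransE scoring and margin loss, not the authors' implementation. The function names, the two-dimensional vectors, and the example entities are all hypothetical and chosen by hand; the paper's actual dimensions, margin, and training setup are not stated in the abstract.

```python
import math

def transe_distance(head, relation, tail):
    """L2 distance ||head + relation - tail||; lower means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def margin_loss(positive, negative, margin=1.0):
    """Margin ranking loss for one (true triple, corrupted triple) pair."""
    return max(0.0, margin + transe_distance(*positive) - transe_distance(*negative))

# Hypothetical 2-d embeddings, set by hand for illustration only.
lahore = [1.0, 0.0]
pakistan = [1.0, 1.0]
india = [4.0, 5.0]
country = [0.0, 1.0]  # relation vector standing in for a "country" property

true_triple = (lahore, country, pakistan)
corrupted_triple = (lahore, country, india)  # tail replaced by a wrong entity

print(transe_distance(*true_triple))               # distance of the true triple
print(transe_distance(*corrupted_triple))          # distance of the corrupted triple
print(margin_loss(true_triple, corrupted_triple))  # zero once the margin is satisfied
```

In a recommender, the learned entity vectors can then serve as features for the entities linked in each news article.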

Keywords