University of Sindh Journal of Information and Communication Technology (Sep 2022)

Named Entity Recognition for Urdu Language: The UNER System, A Hybrid Approach

  • Saba Rani,
  • Hira Fatima Naqvi,
  • Fida Hussain Khoso,
  • Attia Agha,
  • Dil Nawaz Hakro

Journal volume & issue
Vol. 6, no. 3
pp. 108 – 114

Abstract

Named entity recognition (NER) is a natural language processing (NLP) technique that classifies parts of parsed text into predefined named-entity categories. In NLP, named entity recognition is used to identify nouns that appear in bulk text data and place them into predefined groups, such as names of people, places, times, dates, and organizations. Because the material and data available online are large and fragmented, researchers are working on several languages (e.g., Sindhi and English), using approaches and techniques suited to their regions, to improve the accessibility of filtered information for online users. NER enhances the quality of NLP applications including automated summarization, semantic web search, information extraction and retrieval, machine translation, question answering, and chatbots. This study designs an efficient framework to extract noun entities in Urdu using a hybrid approach. The UNER system extracts entities not only by searching through a list of names, but also by recognizing phrases in a given text. It is designed to recognize Urdu noun entities in predefined categories: places, personal names, titled personal names, organizations, object names, trade names, abbreviations, dates and times, measurements, and text names.
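The hybrid idea described in the abstract, list lookup combined with phrase recognition, can be illustrated with a minimal Python sketch. This is not the UNER implementation: the gazetteer entries, the title list, the label names, the whitespace tokenizer, and the function `tag_entities` are all assumptions made for illustration, and the single "title followed by a name" rule stands in for whatever phrase patterns the authors actually use.

```python
# Minimal sketch of a hybrid (gazetteer + rule) tagger.
# All data and names below are hypothetical, not taken from the UNER system.

# Hypothetical gazetteer: known names mapped to a category.
GAZETTEER = {
    "لاہور": "PLACE",
    "کراچی": "PLACE",
}

# Hypothetical title words that signal a titled personal name.
TITLES = {"ڈاکٹر", "جناب"}


def tag_entities(text):
    """Return (entity, label) pairs found in whitespace-tokenized text."""
    tokens = text.split()  # naive tokenizer; real Urdu text needs more care
    entities = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in TITLES and i + 1 < len(tokens):
            # Rule-based step: a title followed by any token is a titled name.
            entities.append((token + " " + tokens[i + 1], "TITLED_PERSON"))
            i += 2
        elif token in GAZETTEER:
            # List-based step: exact match against the gazetteer.
            entities.append((token, GAZETTEER[token]))
            i += 1
        else:
            i += 1
    return entities


if __name__ == "__main__":
    for entity, label in tag_entities("ڈاکٹر اقبال لاہور میں رہتے تھے"):
        print(label, entity)
```

In this sketch the rule is checked before the list, so a name that follows a title is labelled as a titled personal name even if it is absent from the gazetteer; this ordering is a design choice of the example, not something the abstract specifies.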

Keywords