Journal of ICT in Education (Dec 2015)

Malay Named Entity Recognition: A Review

  • Farid Morsidi,
  • Sulaiman Sarkawi,
  • Suliana Sulaiman,
  • Siti Asma Mohammad,
  • Rohaizah Abdul Wahid

Journal volume & issue
Vol. 2
pp. 1 – 14

Abstract

The field of Named Entity Recognition (NER) has been thriving for more than 15 years. NER can be defined as the process of recognizing named entities, such as the names of persons, organizations, and locations, as well as expressions of time and quantity. Research in NER generally emphasizes the extraction and classification of mentions of rigid designators from text, such as proper names, biological species, and temporal expressions. NER has been used in many areas, from inquiries to morphological syntax, as well as in information extraction. However, most of the work has been confined to limited domains and textual genres, such as news articles and web pages. Techniques used to process English text cannot be used to process Malay terminology, because each language has its own morphology. Finding co-references and aliases in a text can be reduced to the problem of finding all occurrences of an entity in a document. This paper presents approaches that have been applied to NER in Malay, or that are partially related to it, in order to detect proper nouns within Malay documents. It also discusses the various studies undertaken to produce high-quality training data for Malay corpora using appropriate NER algorithms and methods, and highlights the key points needed to improve current NER studies.

Keywords