IEEE Access (Jan 2021)

An Ontological Framework for Information Extraction From Diverse Scientific Sources

  • Gohar Zaman,
  • Hairulnizam Mahdin,
  • Khalid Hussain,
  • Atta-Ur-Rahman,
  • Jemal Abawajy,
  • Salama A. Mostafa

DOI
https://doi.org/10.1109/ACCESS.2021.3063181
Journal volume & issue
Vol. 9
pp. 42111 – 42124

Abstract

Automatic information extraction from scientific documents published online is useful in various applications such as tagging, web indexing, and search engine optimization. As a result, automatic information extraction has become one of the most active areas of research in text mining. Although various information extraction techniques have been proposed in the literature, their efficiency depends on domain-specific documents with a static, well-defined format. Furthermore, their accuracy suffers from even a slight modification to the format. To overcome these issues, a novel ontological framework for information extraction (OFIE) using a fuzzy rule base (FRB) and word sense disambiguation (WSD) is proposed. The proposed approach is validated on a significantly wider range of document domains sourced from well-known publishing services such as IEEE, ACM, Elsevier, and Springer. We have also compared the proposed information extraction approach against state-of-the-art techniques. The experimental results show that the proposed approach is less sensitive to changes in the document format and achieves a significantly better average accuracy of 89.14% and an F-score of 89%.

Keywords