Autonomous schema markups based on intelligent computing for search engine optimization

Burhan Ud Din Abbasi; Iram Fatima; Hamid Mukhtar; Sharifullah Khan; Abdulaziz Alhumam; Hafiz Farooq Ahmad

doi:10.7717/peerj-cs.1163

PeerJ Computer Science (Dec 2022)

Affiliations

Burhan Ud Din Abbasi: Department of Computer Science, Bahria University, Islamabad, Pakistan
Iram Fatima: Schema App-Hunch Manifest Inc, Guelph, Canada
Hamid Mukhtar: Department of Computer Science, College of Engineering and Physical Sciences (EPS), University of Birmingham Dubai, Dubai, United Arab Emirates
Sharifullah Khan: PAF-Institute of Applied Sciences and Technology, Haripur, Pakistan
Abdulaziz Alhumam: Computer Science Department, College of Computer Sciences and Information Technology (CCSIT), King Faisal University, Al-Ahsa, Saudi Arabia
Hafiz Farooq Ahmad: Computer Science Department, College of Computer Sciences and Information Technology (CCSIT), King Faisal University, Al-Ahsa, Saudi Arabia

DOI: https://doi.org/10.7717/peerj-cs.1163
Journal volume & issue: Vol. 8, p. e1163

Abstract

With advances in artificial intelligence and semantic technology, search engines are integrating semantics to handle complex search queries and improve their results. This requires identifying well-known concepts or entities and their relationships from web page content. However, the growth of complex unstructured data on web pages has made concept identification considerably harder. Existing research focuses on entity recognition from the perspective of linguistic structures such as complete sentences and paragraphs, whereas a large part of the data on web pages exists as unstructured text fragments enclosed in HTML tags. Ontologies provide schemas to structure the data on the web. However, including them in web pages requires additional resources and expertise from organizations or webmasters, which is a major hindrance to their large-scale adoption. We propose an approach for autonomous identification of entities from short text present in web pages to populate semantic models based on a specific ontology model. The proposed approach has been applied to a public dataset containing academic web pages. We employ a long short-term memory (LSTM) deep learning network and the random forest machine learning algorithm to predict entities. The proposed methodology achieves an overall accuracy of 0.94 on the test dataset, indicating potential for automated prediction even when only a limited number of training samples is available for various entities, thus significantly reducing the manual workload required in practical applications.
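
The flow described in the abstract (collect short text fragments enclosed in HTML tags, predict an entity for each, populate a schema markup) can be sketched roughly as follows. This is an illustration only, not the authors' implementation: a hypothetical keyword rule, predict_property, stands in for the trained LSTM and random forest models, and the sample HTML, the class and function names, and the choice of schema.org Person properties are all assumptions made for this example.

```python
import json
from html.parser import HTMLParser


class FragmentExtractor(HTMLParser):
    """Collect short text fragments together with the tag that encloses them."""

    def __init__(self):
        super().__init__()
        self._stack = []      # tags that are currently open
        self.fragments = []   # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self._stack:
            while self._stack and self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if text and self._stack:
            self.fragments.append((self._stack[-1], text))


def predict_property(text):
    """Hypothetical stand-in for a trained model (assumption, not the paper's).

    Maps a text fragment to a schema.org Person property, or None.
    """
    lowered = text.lower()
    if "@" in text:
        return "email"
    if lowered.startswith(("dr.", "prof.")):
        return "name"
    if "professor" in lowered or "lecturer" in lowered:
        return "jobTitle"
    return None


def build_markup(html, entity_type="Person"):
    """Turn HTML into a JSON-LD style dictionary using the predicted properties."""
    parser = FragmentExtractor()
    parser.feed(html)
    parser.close()
    markup = {"@context": "https://schema.org", "@type": entity_type}
    for _tag, text in parser.fragments:
        prop = predict_property(text)
        if prop and prop not in markup:  # keep the first match per property
            markup[prop] = text
    return markup


# Invented sample page fragment, used only for this illustration.
SAMPLE_HTML = """
<div class="profile">
  <h2>Dr. Jane Doe</h2>
  <span>Associate Professor</span>
  <a href="mailto:jane.doe@example.edu">jane.doe@example.edu</a>
  <p>Office hours by appointment</p>
</div>
"""

if __name__ == "__main__":
    print(json.dumps(build_markup(SAMPLE_HTML), indent=2))
```

In a real system the rule-based function would be replaced by a classifier trained on labelled fragments, and fragments with no confident prediction (such as the office-hours line here) would simply be left out of the markup.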

Published in PeerJ Computer Science

ISSN: 2376-5992 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://peerj.com/computer-science/
