Semi-Supervised Bootstrapped Syntax-Semantics-Based Approach for Agriculture Relation Extraction for Knowledge Graph Creation and Reasoning

G. Veena; Deepa Gupta; Vani Kanjirangat

doi:10.1109/access.2023.3339552

IEEE Access (Jan 2023)

Affiliations

G. Veena: Department of Computer Science and Applications, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, India
Deepa Gupta: Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Bengaluru, India
Vani Kanjirangat: Istituto Dalle Molle di Studi sull’Intelligenza Artificiale (IDSIA USI/SUPSI), Viganello, Switzerland

DOI: https://doi.org/10.1109/access.2023.3339552
Journal volume & issue: Vol. 11
pp. 138375 – 138398

Abstract

We propose a novel approach that uses semi-supervised learning to extract triplets from domain-specific texts and create a Knowledge Graph (KG), with a focus on the agricultural domain. Building domain-specific knowledge graphs can be challenging due to several factors, such as domain-specific vocabulary, data integration challenges, dynamic data, and the need for domain expertise. Our approach primarily focuses on triplet extraction for the creation of a knowledge graph. We employ dependency parsing techniques to extract relationships between entities, and utilize an extended version of BERT combined with Latent Dirichlet Allocation (LDA) for Named Entity Recognition (NER). The proposed agriculture knowledge graph covers significant areas of the agricultural domain by focusing on six major entities (soil, place, disease, pathogen, pesticide, and crops) along with their intra- and inter-relationships. There is no benchmark dataset in the agriculture domain that encompasses all of these major entities. Hence, we created our own corpus comprising 30k sentences sourced from reputable agriculture websites. To evaluate the effectiveness of our triplet extraction model, we used a test corpus containing 3500 agriculture triplets. In our experiments, the model achieved an average macro F-score of 87% for relation extraction, indicating the efficacy of our approach. Additionally, we created an agriculture knowledge graph from a triplet corpus of 6236 triplets. We also analyzed various knowledge graph reasoning models that improve the discovery of implicit knowledge that is not explicitly represented in the knowledge graph. Experimental results indicate that our approach is effective in creating triplets and in reasoning over knowledge graphs for the agricultural domain.
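
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro-averaged F-score over relation labels can be computed. It is not the authors' evaluation code: the function name macro_f1 and the relation labels in the example are hypothetical, chosen only for illustration, and the sketch assumes each test triplet carries exactly one gold and one predicted relation label.

```python
from collections import defaultdict


def macro_f1(gold, predicted):
    """Macro-averaged F1: the unweighted mean of per-label F1 scores.

    Illustrative sketch only; assumes one gold and one predicted
    relation label per test item.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    labels = set(gold) | set(predicted)
    if not labels:
        return 0.0

    tp = defaultdict(int)  # correct predictions per label
    fp = defaultdict(int)  # label predicted but gold was different
    fn = defaultdict(int)  # label was gold but something else predicted
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1

    scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        scores.append(2 * precision * recall / pr_sum if pr_sum else 0.0)
    return sum(scores) / len(scores)


# Hypothetical relation labels, not taken from the paper's corpus.
gold = ["affects", "affects", "grows_in", "controls"]
pred = ["affects", "grows_in", "grows_in", "controls"]
score = macro_f1(gold, pred)
```

Because every label contributes equally to the mean, a macro score does not let frequent relation types mask weak performance on rare ones.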

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
