IEEE Access (Jan 2023)
Semi-Supervised Bootstrapped Syntax-Semantics-Based Approach for Agriculture Relation Extraction for Knowledge Graph Creation and Reasoning
Abstract
We propose a novel approach that uses semi-supervised learning to extract triplets from domain-specific texts and create a Knowledge Graph (KG), with a focus on the agricultural domain. Building domain-specific knowledge graphs is challenging because of factors such as specialized vocabulary, data integration, dynamic data, and the need for domain expertise. Our approach focuses primarily on triplet extraction for knowledge graph creation. We employ dependency parsing to extract relationships between entities, and use an extended version of BERT combined with Latent Dirichlet Allocation (LDA) for Named Entity Recognition (NER). The proposed agriculture knowledge graph covers significant areas of the agricultural domain by focusing on six major entities (soil, place, disease, pathogen, pesticide, and crops) along with their intra- and inter-relationships. Because no benchmark dataset in the agriculture domain encompasses all of these major entities, we created our own corpus of 30k sentences sourced from reputable agriculture websites. To evaluate our triplet extraction model, we used a test corpus containing 3500 agriculture triplets; the model achieved an average macro F-score of 87% for relation extraction, indicating the efficacy of our approach. Additionally, we created an agriculture knowledge graph from a corpus of 6236 triplets. We also analyzed various knowledge graph reasoning models for discovering implicit knowledge that is not explicitly represented in the knowledge graph. Experimental results indicate that our approach is effective for creating triplets and for reasoning over knowledge graphs in the agricultural domain.
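To illustrate the idea of extracting triplets from a dependency parse, here is a minimal sketch. It is not the authors' pattern set; it assumes a sentence that has already been parsed with spaCy-style (ClearNLP) labels such as `nsubj`, `dobj`, `prep`, and `pobj`. The parse is hand-coded so the example runs without a parser model. A subject linked to a verb yields one triplet per direct object, and a prepositional object yields a relation named `verb_preposition`. The `ENTITY_TYPES` lexicon is a hypothetical stand-in for the BERT + LDA entity recognizer. The `Token` class, the relation-naming scheme, and the example sentence are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Token:
    text: str
    lemma: str
    pos: str   # coarse part of speech, e.g. "VERB", "NOUN"
    dep: str   # dependency label, e.g. "nsubj", "dobj"
    head: int  # index of the syntactic head (the ROOT points to itself)

# Hypothetical lexicon standing in for the BERT + LDA entity recognizer.
ENTITY_TYPES: Dict[str, str] = {
    "phytophthora infestans": "pathogen",
    "late blight": "disease",
    "potato": "crop",
}

def children(sent: List[Token], i: int) -> List[int]:
    """Indices of tokens whose head is token i (excluding i itself)."""
    return [j for j, t in enumerate(sent) if t.head == i and j != i]

def noun_phrase(sent: List[Token], i: int) -> str:
    """Head noun plus its compound/amod modifiers, in sentence order."""
    idx = [j for j in children(sent, i) if sent[j].dep in ("compound", "amod")]
    idx.append(i)
    return " ".join(sent[j].text for j in sorted(idx))

def extract_triplets(sent: List[Token]) -> List[Tuple[str, str, str]]:
    """Subject-verb-object and subject-verb-prep-object patterns (assumed rules)."""
    triplets = []
    for v, tok in enumerate(sent):
        if tok.pos != "VERB":
            continue
        kids = children(sent, v)
        subjects = [noun_phrase(sent, j) for j in kids if sent[j].dep == "nsubj"]
        objects = []
        for j in kids:
            if sent[j].dep == "dobj":
                objects.append((tok.lemma, noun_phrase(sent, j)))
            elif sent[j].dep == "prep":
                for k in children(sent, j):
                    if sent[k].dep == "pobj":
                        objects.append((f"{tok.lemma}_{sent[j].lemma}", noun_phrase(sent, k)))
        for s in subjects:
            for rel, o in objects:
                triplets.append((s, rel, o))
    return triplets

def typed_triplets(triplets, lexicon: Dict[str, str]):
    """Keep only triplets whose arguments are both recognized entities."""
    out = []
    for s, r, o in triplets:
        st: Optional[str] = lexicon.get(s.lower())
        ot: Optional[str] = lexicon.get(o.lower())
        if st and ot:
            out.append(((s, st), r, (o, ot)))
    return out

# Hand-coded parse of "Phytophthora infestans causes late blight in potato."
SENT = [
    Token("Phytophthora", "Phytophthora", "PROPN", "compound", 1),
    Token("infestans", "infestans", "PROPN", "nsubj", 2),
    Token("causes", "cause", "VERB", "ROOT", 2),
    Token("late", "late", "ADJ", "amod", 4),
    Token("blight", "blight", "NOUN", "dobj", 2),
    Token("in", "in", "ADP", "prep", 2),
    Token("potato", "potato", "NOUN", "pobj", 5),
    Token(".", ".", "PUNCT", "punct", 2),
]

if __name__ == "__main__":
    for t in typed_triplets(extract_triplets(SENT), ENTITY_TYPES):
        print(t)
```

Running the script prints the pathogen-disease triplet `(('Phytophthora infestans', 'pathogen'), 'cause', ('late blight', 'disease'))` and the pathogen-crop triplet `(('Phytophthora infestans', 'pathogen'), 'cause_in', ('potato', 'crop'))`. Filtering by entity type after parsing keeps syntax (which arguments a verb connects) separate from semantics (which spans are agricultural entities). This mirrors the syntax-semantics split named in the title, although the actual rules and models in the paper may differ.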
Keywords