Smart Agricultural Technology (Aug 2024)

A unified approach to publish semantic annotations of agricultural documents as knowledge graphs

  • Nadia Yacoubi Ayadi,
  • Stephan Bernard,
  • Robert Bossy,
  • Marine Courtin,
  • Bill Gates Happi Happi,
  • Pierre Larmande,
  • Franck Michel,
  • Claire Nédellec,
  • Catherine Roussey,
  • Catherine Faron

Journal volume & issue
Vol. 8
p. 100484

Abstract

Read online

The research results presented in this paper were obtained as part of the D2KAB project (Data to Knowledge in Agriculture and Biodiversity) which aims to develop semantic web-based tools to describe and make agronomical data actionable and accessible following the FAIR principles. We focus on constructing domain-specific Knowledge Graphs (KGs) from textual data sources, using Natural Language Processing (NLP) techniques to extract and structure relevant entities. Our approach is based on the formalization of a semantic data model using common linked open vocabularies such as the Web Annotation Ontology (OA) and the Provenance Ontology (PROV). The model was developed by formulating motivating scenarios and competency questions from domain experts. This model has been used to construct three different KGs from three distinct corpora: PubMed scientific publications on wheat and rice genetics and phenotyping, and French agricultural alert bulletins. The named entities to be recognized include genes, phenotypes, traits, genetic markers, taxa and phenological stages normalized using semantic resources such as the Wheat Trait and Phenotype Ontology (WTO), the French Crop Usage (FCU) thesaurus and the Plant Phenological Description Ontology (PPDO). Named entities were extracted using different NLP approaches and tools. The relevance of the semantic model was validated by implementing experts questions as SPARQL queries to be answered on the constructed RDF knowledge graphs. Our work demonstrates how domain-specific vocabularies and systematic querying of KGs can reveal hidden interactions and support agronomists in navigating vast amounts of data. The resources and transformation pipelines developed are publicly available in Git repositories.

Keywords