LLMs in Action: Robust Metrics for Evaluating Automated Ontology Annotation Systems

Ali Noori; Pratik Devkota; Somya D. Mohanty; Prashanti Manda

doi:10.3390/info16030225

Information (Mar 2025)

Affiliations

Ali Noori: Informatics and Analytics, University of North Carolina, Greensboro, NC 27412, USA
Pratik Devkota: Fractal Analytics, New York, NY 10006, USA
Somya D. Mohanty: United Health Group, Minnetonka, MN 55343, USA
Prashanti Manda: Department of Computer Science, University of Nebraska, Omaha, NE 68182, USA

DOI: https://doi.org/10.3390/info16030225
Journal volume & issue: Vol. 16, no. 3, p. 225

Abstract

Ontologies are critical for organizing and interpreting complex domain-specific knowledge, with applications in data integration, functional prediction, and knowledge discovery. As the manual curation of ontology annotations becomes increasingly infeasible due to the exponential growth of biomedical and genomic data, natural language processing (NLP)-based systems have emerged as scalable alternatives. Evaluating these systems requires robust semantic similarity metrics that account for hierarchical and partially correct relationships often present in ontology annotations. This study explores the integration of graph-based and language-based embeddings to enhance the performance of semantic similarity metrics. Combining embeddings generated via Node2Vec and large language models (LLMs) with traditional semantic similarity metrics, we demonstrate that hybrid approaches effectively capture both structural and semantic relationships within ontologies. Our results show that combined similarity metrics outperform individual metrics, achieving high accuracy in distinguishing child–parent pairs from random pairs. This work underscores the importance of robust semantic similarity metrics for evaluating and optimizing NLP-based ontology annotation systems. Future research should explore the real-time integration of these metrics and advanced neural architectures to further enhance scalability and accuracy, advancing ontology-driven analyses in biomedical research and beyond.
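
The idea of a combined metric can be illustrated with a minimal sketch. It is not the authors' implementation: the toy ontology, the hand-written 3-dimensional vectors (stand-ins for Node2Vec or LLM embeddings), the choice of Jaccard similarity over ancestor sets as the traditional metric, and the equal weighting are all assumptions made here for illustration. The abstract does not specify which metrics or weights the study uses.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "root": [],
    "anatomical_structure": ["root"],
    "limb": ["anatomical_structure"],
    "forelimb": ["limb"],
    "process": ["root"],
    "growth": ["process"],
}

# Made-up vectors standing in for learned graph or language embeddings.
EMBEDDINGS = {
    "root": [0.5, 0.5, 0.5],
    "anatomical_structure": [0.9, 0.2, 0.1],
    "limb": [0.8, 0.3, 0.1],
    "forelimb": [0.7, 0.4, 0.1],
    "process": [0.1, 0.2, 0.9],
    "growth": [0.1, 0.3, 0.8],
}


def ancestors(term):
    """Return the term itself plus every ancestor reachable via PARENTS."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a, b):
    """Structural similarity: overlap of the two terms' ancestor sets."""
    set_a, set_b = ancestors(a), ancestors(b)
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def combined(a, b, weight=0.5):
    """Weighted average of the structural and embedding similarities."""
    return weight * jaccard(a, b) + (1 - weight) * cosine(EMBEDDINGS[a], EMBEDDINGS[b])


# A child-parent pair versus an unrelated pair drawn from another branch.
child_parent = combined("forelimb", "limb")
random_pair = combined("forelimb", "growth")
print(child_parent > random_pair)
```

In this toy setting the child-parent pair scores higher than the unrelated pair on both components, so the combined score separates them as well. With real embeddings the two components can disagree, which is the case where a combination would be expected to matter.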

Published in Information

ISSN: 2078-2489 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/information/
