Text Semantic Annotation: A Distributed Methodology Based on Community Coherence

Christos Makris; Georgios Pispirigos; Michael Angelos Simos

doi:10.3390/a13070160

Algorithms (Jul 2020)

Affiliations

Christos Makris: Department of Computer Engineering and Informatics, University of Patras, 26504 Patras, Greece
Georgios Pispirigos: Department of Computer Engineering and Informatics, University of Patras, 26504 Patras, Greece
Michael Angelos Simos: Department of Computer Engineering and Informatics, University of Patras, 26504 Patras, Greece

Journal volume & issue: Vol. 13, no. 7
p. 160

Abstract

Text annotation is the process of identifying the sense of a textual segment within a given context and linking it to the corresponding entity in a concept ontology. As the limitations of the bag-of-words paradigm become increasingly evident in modern applications, several information retrieval and artificial intelligence tasks are shifting to semantic representations to address the polysemy and homonymy inherent in natural language. With extensive applications in a broad range of scientific fields, such as digital marketing, bioinformatics, chemical engineering, neuroscience, and the social sciences, community detection has attracted great scientific interest. In linguistics, where it is used to identify densely interconnected subgroups within semantic ontologies, community detection has proven beneficial for improving disambiguation and enhancing ontologies. In this paper, we introduce a novel distributed, supervised, knowledge-based methodology that employs community detection algorithms to annotate text with Wikipedia entities, and we establish the concept of community coherence as a metric of local contextual coherence compatibility. Our experimental evaluation revealed that deeper inference of relatedness and of local entity community coherence in the Wikipedia graph yields substantial overall improvements, chiefly by improving accuracy on less common annotations. The proposed methodology attains robust disambiguation performance and is promising for wider adoption.
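The abstract does not define how community coherence is computed. As a rough illustration only, the Python sketch below assumes that some community detection algorithm has already partitioned the Wikipedia entity graph (the `community` mapping is hypothetical toy data). It scores a candidate entity by the fraction of the other mentions in the same context that have at least one candidate in the same community, then keeps the best-scoring candidate per mention. This scoring rule, and the names `community_coherence` and `annotate`, are assumptions made for illustration, not the authors' metric or implementation.

```python
def community_coherence(candidate, mention, candidates, community):
    """Hypothetical coherence score: fraction of the *other* mentions that
    have at least one candidate entity in the same community as `candidate`.
    Not the paper's definition; an illustrative assumption only."""
    target = community.get(candidate)
    others = [m for m in candidates if m != mention]
    if target is None or not others:
        return 0.0
    hits = sum(
        1 for m in others
        if any(community.get(c) == target for c in candidates[m])
    )
    return hits / len(others)


def annotate(candidates, community):
    """For each mention, keep the candidate with the highest coherence score.
    Python's max() returns the first maximal item, so ties favor the
    first-listed candidate."""
    return {
        mention: max(
            cands,
            key=lambda c: community_coherence(c, mention, candidates, community),
        )
        for mention, cands in candidates.items()
    }


# Toy context: mention -> candidate Wikipedia entities (hypothetical data).
candidates = {
    "Jaguar": ["Jaguar_(animal)", "Jaguar_Cars"],
    "engine": ["Engine", "Search_engine"],
    "Coventry": ["Coventry"],
}
# Entity -> community id, as if produced by a community detection algorithm.
community = {
    "Jaguar_(animal)": 0,
    "Jaguar_Cars": 1,
    "Engine": 1,
    "Search_engine": 2,
    "Coventry": 1,
}

print(annotate(candidates, community))
```

On this toy input, the automotive senses win because they share a community with the other mentions' candidates, while the animal and search-engine senses do not.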

Published in Algorithms

ISSN: 1999-4893 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/algorithms