Maximizing Relation Extraction Potential: A Data-Centric Study to Unveil Challenges and Opportunities

Anushka Swarup; Avanti Bhandarkar; Olivia P. Dizon-Paradis; Ronald Wilson; Damon L. Woodard

doi:10.1109/ACCESS.2024.3494737

IEEE Access (Jan 2024)

Affiliations

Anushka Swarup: Florida Institute for National Security (FINS), University of Florida, Gainesville, FL, USA
Avanti Bhandarkar: Florida Institute for National Security (FINS), University of Florida, Gainesville, FL, USA
Olivia P. Dizon-Paradis: Florida Institute for National Security (FINS), University of Florida, Gainesville, FL, USA
Ronald Wilson: Florida Institute for National Security (FINS), University of Florida, Gainesville, FL, USA
Damon L. Woodard: Florida Institute for National Security (FINS), University of Florida, Gainesville, FL, USA

DOI: https://doi.org/10.1109/ACCESS.2024.3494737
Journal volume & issue: Vol. 12, pp. 167655–167682

Abstract

Relation extraction is a Natural Language Processing task that aims to extract relationships from textual data. It is a critical step for information extraction. Due to its wide-scale applicability, research in relation extraction has rapidly scaled to using highly advanced neural networks. Despite their computational superiority, modern relation extractors fail to handle complicated extraction scenarios. However, a comprehensive performance analysis of state-of-the-art extractors that compiles these challenges has been missing from the literature, and this paper aims to bridge that gap. The goal is to investigate the data-centric characteristics that may impede neural relation extraction. Based on extensive experiments conducted using 15 state-of-the-art relation extraction algorithms, ranging from recurrent architectures to large language models, and seven large-scale datasets, this research suggests that modern relation extractors are not robust to complex data and relation characteristics. It emphasizes pivotal issues, such as contextual ambiguity, correlating relations, long-tail data, and fine-grained relation distributions. In addition, it sets a marker for future directions to alleviate these issues, thereby proving to be a critical resource for novice and advanced researchers. Efficient handling of the challenges described can have significant implications for the field of information extraction, which is a critical part of popular systems such as search engines and chatbots. Data and relevant code can be found at https://aaig.ece.ufl.edu/projects/relation-extraction.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
