A Systematic Review on Semantic Role Labeling for Information Extraction in Low-Resource Data

Amelia Devi Putri Ariyanto; Diana Purwitasari; Chastine Fatichah

doi:10.1109/ACCESS.2024.3392370

IEEE Access (Jan 2024)

Affiliations

Amelia Devi Putri Ariyanto: Department of Informatics, Faculty of Intelligent Electrical and Informatics Technology, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
Diana Purwitasari: Department of Informatics, Faculty of Intelligent Electrical and Informatics Technology, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
Chastine Fatichah: Department of Informatics, Faculty of Intelligent Electrical and Informatics Technology, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia

DOI: https://doi.org/10.1109/ACCESS.2024.3392370
Journal volume & issue: Vol. 12
pp. 57917 – 57946

Abstract

Challenges in the big data phenomenon arise from unstructured text data, which is very large, comes from various sources, has various formats, and contains much noise. The complexity of unstructured text data makes it difficult to extract useful information. Therefore, a process is needed to transform it into structured data that can be processed further. The Information Extraction (IE) process helps to extract relationships, entities, semantic roles, and events from unstructured text data by converting them into structured output. One of IE’s tasks is Semantic Role Labeling (SRL), which identifies the semantic roles in a sentence and thereby enriches the understanding of the text. However, much of SRL development focuses on high-resource data, especially in English. The limited development of SRL for specific low-resource languages or domains is a complex challenge. This research aims to conduct a systematic study of the development of SRL for low-resource data, in both low-resource language and domain-specific contexts. The review process was carried out systematically using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) model, and 54 quality papers (from 2018 to 2023) were obtained from the filtering process. We review several essential points, including (1) datasets that are often used for SRL tasks and their labeling strategies for low-resource data, (2) methods that have currently been developed for SRL tasks and learning scenarios when dealing with low-resource data, (3) evaluation metrics, and (4) applications of SRL tasks. This review is complemented by a discussion of issues and potential solutions for developing SRL on low-resource data, to help researchers develop SRL more effectively when facing the challenges of low-resource data.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
