Negation Detection on Mexican Spanish Tweets: The T-MexNeg Corpus

Gemma Bel-Enguix; Helena Gómez-Adorno; Alejandro Pimentel; Sergio-Luis Ojeda-Trueba; Brian Aguilar-Vizuet

doi:10.3390/app11093880

Applied Sciences (Apr 2021)

Affiliations

Gemma Bel-Enguix: Instituto de Ingeniería, Universidad Nacional Autónoma de México, Circuito Escolar, Ingeniería S/N, C.U., Coyoacán, 04510 Ciudad de México, Mexico
Helena Gómez-Adorno: Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Circuito Escolar 3000, C.U., Coyoacán, 04510 Ciudad de México, Mexico
Alejandro Pimentel: Instituto de Ingeniería, Universidad Nacional Autónoma de México, Circuito Escolar, Ingeniería S/N, C.U., Coyoacán, 04510 Ciudad de México, Mexico
Sergio-Luis Ojeda-Trueba: Instituto de Ingeniería, Universidad Nacional Autónoma de México, Circuito Escolar, Ingeniería S/N, C.U., Coyoacán, 04510 Ciudad de México, Mexico
Brian Aguilar-Vizuet: Facultad de Ciencias, Universidad Nacional Autónoma de México, Circuito Exterior, S/N, C.U., Coyoacán, 04510 Ciudad de México, Mexico

DOI: https://doi.org/10.3390/app11093880
Journal volume & issue: Vol. 11, no. 9, p. 3880

Abstract

In this paper, we introduce the T-MexNeg corpus of Tweets written in Mexican Spanish. It consists of 13,704 Tweets, of which 4,895 contain negation structures. We analyze negation statements as they occur in the language used on social media. The paper presents the annotation guidelines together with a novel resource for the negation detection task. The corpus was manually annotated with labels for negation cue, scope, and event. We report the inter-annotator agreement for all components of the negation structure. This resource is freely available. Furthermore, we performed various experiments on automatic negation identification, using the T-MexNeg corpus and the SFU ReviewSP-NEG corpus to train a machine learning algorithm. Comparing two methodologies, one based on a dictionary and the other on the Conditional Random Fields algorithm, we found that negation identification results on Twitter are lower when the model is trained on the SFU ReviewSP-NEG corpus. This shows the importance of having resources built specifically for social media language.
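
To make the dictionary-based methodology mentioned in the abstract more concrete, the following is a minimal sketch of negation cue detection by lexicon lookup. It is not the authors' implementation: the cue list `NEGATION_CUES`, the tokenizer, and the function names `find_negation_cues` and `contains_negation` are all assumptions introduced here for illustration, and the sketch covers only single-word cues, not scope or event.

```python
import re

# Hypothetical cue lexicon for illustration only; it is NOT the
# dictionary used in the paper, and it omits multi-word cues.
NEGATION_CUES = {
    "no", "ni", "nunca", "nada", "nadie", "jamás",
    "tampoco", "sin", "ningún", "ninguna", "ninguno",
}

# Simple word tokenizer; \w matches accented letters for Python 3 strings.
TOKEN_RE = re.compile(r"\w+")


def find_negation_cues(tweet):
    """Return (token, start, end) for every token found in the cue lexicon."""
    return [
        (match.group(0), match.start(), match.end())
        for match in TOKEN_RE.finditer(tweet)
        if match.group(0).lower() in NEGATION_CUES
    ]


def contains_negation(tweet):
    """Return True if at least one lexicon cue occurs in the tweet."""
    return bool(find_negation_cues(tweet))


if __name__ == "__main__":
    for text in ["No me gusta nada", "Me gusta el café"]:
        print(text, find_negation_cues(text))
```

A lookup of this kind needs no training data, which is why it is a natural baseline to set against a sequence model such as Conditional Random Fields, which learns from annotated examples.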

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
