An Eclectic Approach for Enhancing Language Models Through Rich Embedding Features

Edwin Aldana-Bobadilla; Victor Jesus Sosa-Sosa; Alejandro Molina-Villegas; Karina Gazca-Hernandez; Jose Angel Olivas

doi:10.1109/ACCESS.2024.3422971

IEEE Access (Jan 2024)

Affiliations

Edwin Aldana-Bobadilla: CONAHCYT, Mexico City, Mexico
Victor Jesus Sosa-Sosa: Cinvestav, Unidad Tamaulipas, Ciudad Victoria, Tamaulipas, Mexico
Alejandro Molina-Villegas: CONAHCYT, Mexico City, Mexico
Karina Gazca-Hernandez: Cinvestav, Unidad Tamaulipas, Ciudad Victoria, Tamaulipas, Mexico
Jose Angel Olivas: Grupo SMILe, Universidad de Castilla-La Mancha, Ciudad Real, Spain

DOI: https://doi.org/10.1109/ACCESS.2024.3422971
Journal volume & issue: Vol. 12, pp. 100921 – 100938

Abstract

Text processing is a fundamental aspect of Natural Language Processing (NLP), crucial for applications in fields such as artificial intelligence, data science, and information retrieval, and it plays a core role in language models. Most text-processing approaches describe and synthesize, to a greater or lesser degree, the lexical, syntactic, and semantic properties of text as numerical vectors. These vectors induce a metric space in which underlying patterns and structures related to the original text can be found. Because each approach has its own strengths and weaknesses, it is hard to find a single one that perfectly extracts representative text properties for every task and application domain. This paper proposes a novel approach that synthesizes information from heterogeneous state-of-the-art text-processing approaches into a unified representation. The results are encouraging: using this representation in popular machine-learning tasks not only leads to superior performance but also offers notable advantages in memory efficiency and in preserving the underlying information of the distinct sources involved.
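
To make the idea of a unified representation concrete, the following is a minimal sketch and not the authors' method, which the abstract does not detail. It assumes that each text-processing approach yields one embedding matrix per corpus, and it fuses them with a generic baseline: per-source centering and scaling, concatenation, and a PCA projection computed by SVD. The function name `fuse_embeddings`, the parameter `out_dim`, and the random stand-in data are all hypothetical.

```python
import numpy as np

def fuse_embeddings(sources, out_dim):
    """Fuse several (n_docs, d_i) embedding matrices into one (n_docs, k) matrix.

    Hypothetical baseline: center and scale each source, concatenate,
    then project onto the top principal directions. k is out_dim capped
    by the number of available components.
    """
    blocks = []
    for X in sources:
        X = np.asarray(X, dtype=float)
        X = X - X.mean(axis=0)          # center every feature of this source
        norm = np.linalg.norm(X)        # Frobenius norm of the centered block
        # Scale so that no single source dominates the concatenation.
        blocks.append(X / norm if norm > 0 else X)
    Z = np.hstack(blocks)               # shape: (n_docs, sum of d_i)
    # PCA via SVD; Z is already centered because each block is centered.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    k = min(out_dim, Vt.shape[0])
    return Z @ Vt[:k].T                 # shape: (n_docs, k)

# Stand-in data: two random "sources" describing the same 6 documents.
rng = np.random.default_rng(0)
lexical = rng.normal(size=(6, 10))      # e.g. a lexical representation
semantic = rng.normal(size=(6, 8))      # e.g. a semantic representation
fused = fuse_embeddings([lexical, semantic], out_dim=3)
print(fused.shape)
```

The fused vectors are shorter than the concatenation of the sources, which loosely illustrates the memory-efficiency motivation mentioned in the abstract. How much information from each source survives depends on the fusion technique actually used.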

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
