Revisiting the Dissimilarity Representation in the Context of Regression

Vicente Garcia; J. Salvador Sanchez; Rafael Martinez-Pelaez; Luis C. Mendez-Gonzalez

doi:10.1109/access.2021.3130127

IEEE Access (Jan 2021)

Affiliations

Vicente Garcia: Department of Electrical and Computer Engineering, Universidad Autónoma de Ciudad Juárez, Ciudad Juárez, México
J. Salvador Sanchez: Department of Computer Languages and Systems, Institute of New Imaging Technologies, Universitat Jaume I, Castelló de la Plana, Spain
Rafael Martinez-Pelaez: Faculty of Information Technologies, Universidad de La Salle Bajío, León, México
Luis C. Mendez-Gonzalez: Department of Industrial Engineering and Manufacturing, Universidad Autónoma de Ciudad Juárez, Ciudad Juárez, México

DOI: https://doi.org/10.1109/access.2021.3130127
Journal volume & issue: Vol. 9
pp. 157043 – 157051

Abstract

In machine learning, a natural way to represent an instance is by using a feature vector. However, several studies have shown that this representation may not accurately characterize an object. For classification problems, the dissimilarity paradigm has been proposed as an alternative to the standard feature-based approach. Encoding each object by its pairwise dissimilarities to other objects has been shown to improve data quality because it mitigates complexities such as class overlap, small disjuncts, and small sample size. However, its suitability and performance in regression problems have not been fully explored. This study redefines the dissimilarity representation for regression. To this end, we have carried out an extensive experimental evaluation on 34 datasets using two linear regression models. The results show that the dissimilarity approach decreases the error rates of both traditional linear regression and the linear model with elastic net regularization, and that it also reduces the complexity of most regression datasets.
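To make the idea of encoding each object by pairwise dissimilarities concrete, the following is a minimal sketch and not the authors' implementation. It assumes Euclidean distance as the dissimilarity measure and uses the whole training set as the set of reference objects; the abstract specifies neither choice, and the names `euclidean` and `dissimilarity_representation` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity_representation(objects, prototypes, dist=euclidean):
    """Map each object to its vector of dissimilarities to every prototype.

    Row i of the result holds dist(objects[i], p) for each p in prototypes,
    so the new feature space has one dimension per prototype.
    """
    return [[dist(obj, p) for p in prototypes] for obj in objects]


# Toy data: three training objects described by two features each.
X_train = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]

# Assumption: the training set itself serves as the set of prototypes.
D_train = dissimilarity_representation(X_train, X_train)

# An unseen object is encoded against the same prototypes.
X_new = [[3.0, 0.0]]
D_new = dissimilarity_representation(X_new, X_train)
```

A regression model such as ordinary least squares or an elastic net would then be fitted on `D_train` and the original targets in place of `X_train`, and applied to `D_new` at prediction time.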

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
