Sparse Logistic Regression: Comparison of Regularization and Bayesian Implementations

Mattia Zanon; Giuliano Zambonin; Gian Antonio Susto; Seán McLoone

doi:10.3390/a13060137

Algorithms (Jun 2020)

Affiliations

Mattia Zanon: Department of Electronic Engineering, National University of Ireland, Maynooth (NUIM), W23 F2K8 Maynooth, Co. Kildare, Ireland
Giuliano Zambonin: Electrolux Italy S.P.A., 33080 Porcia, PN, Italy
Gian Antonio Susto: Department of Information Engineering, University of Padova, 35131 Padova, PD, Italy
Seán McLoone: School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, 16A Malone Road, Belfast BT9 5BN, UK

DOI: https://doi.org/10.3390/a13060137
Journal volume & issue: Vol. 13, no. 6, p. 137

Abstract

In knowledge-based systems, besides obtaining good output prediction accuracy, it is crucial to understand the subset of input variables that have the most influence on the output, with the goal of gaining deeper insight into the underlying process. These requirements call for logistic model estimation techniques that provide a sparse solution, i.e., where coefficients associated with non-important variables are set to zero. In this work we compare the performance of two methods: the first is based on the well-known Least Absolute Shrinkage and Selection Operator (LASSO), which involves regularization with an ℓ1 norm; the second is the Relevance Vector Machine (RVM), which is based on a Bayesian implementation of the linear logistic model. The two methods are extensively compared in this paper on real and simulated datasets. Results show that, in general, the two approaches are comparable in terms of prediction performance. RVM outperforms the LASSO both in terms of structure recovery (estimation of the correct non-zero model coefficients) and prediction accuracy as the dimensionality of the data increases. However, LASSO shows comparable performance to RVM when the dimensionality of the data is much higher than the number of samples, that is, p ≫ n.
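
To make the notion of a sparse logistic solution concrete, the following is a minimal sketch of ℓ1-regularized (LASSO-style) logistic regression on simulated data. It is not the authors' implementation: the solver is a plain proximal gradient loop (ISTA) chosen here for simplicity, the intercept is left unpenalized by assumption, and all names (`simulate`, `fit_l1_logistic`, `soft_threshold`, `lam`, `step`) as well as the data sizes are hypothetical. The RVM side of the comparison is not shown.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrinks v toward zero and
    # returns exactly zero whenever |v| <= t (the source of sparsity).
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def simulate(n, true_w, seed=0):
    # Hypothetical simulated dataset: Gaussian inputs, Bernoulli labels
    # drawn from a linear logistic model with coefficients true_w.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        xi = [rng.gauss(0.0, 1.0) for _ in true_w]
        prob = sigmoid(sum(w * x for w, x in zip(true_w, xi)))
        X.append(xi)
        y.append(1 if rng.random() < prob else 0)
    return X, y


def fit_l1_logistic(X, y, lam, step=0.5, iters=300):
    # Minimizes mean logistic loss + lam * ||w||_1 by proximal gradient.
    # The intercept b is not penalized (an assumption of this sketch).
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r
            for j in range(p):
                grad_w[j] += r * xi[j]
        b -= step * grad_b / n
        # Gradient step on the smooth loss, then soft-thresholding.
        w = [soft_threshold(wj - step * gj / n, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


# Two informative variables followed by eight irrelevant ones.
true_w = [2.0, -1.5] + [0.0] * 8
X, y = simulate(200, true_w)
w_hat, b_hat = fit_l1_logistic(X, y, lam=0.05)
print("estimated coefficients:", [round(w, 3) for w in w_hat])
print("selected variables:", [j for j, w in enumerate(w_hat) if w != 0.0])
```

Structure recovery in the sense of the abstract corresponds to comparing the printed set of selected variables with the positions of the non-zero entries of `true_w`. Larger values of `lam` drive more coefficients to exactly zero, at the cost of shrinking the retained ones.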

Published in Algorithms

ISSN: 1999-4893 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/algorithms
