So you think you can PLS-DA?

Daniel Ruiz-Perez; Haibin Guan; Purnima Madhivanan; Kalai Mathee; Giri Narasimhan

doi:10.1186/s12859-019-3310-7

BMC Bioinformatics (Dec 2020)

Affiliations

Daniel Ruiz-Perez: Bioinformatics Research Group (BioRG), Florida International University
Haibin Guan: Bioinformatics Research Group (BioRG), Florida International University
Purnima Madhivanan: Department of Epidemiology, Florida International University
Kalai Mathee: Herbert Wertheim College of Medicine, Florida International University
Giri Narasimhan: Bioinformatics Research Group (BioRG), Florida International University

Journal volume & issue: Vol. 21, no. S1
pp. 1 – 10

Abstract

Background: Partial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining increasing attention as a useful feature selector and classifier. To understand its strengths and weaknesses, we performed a series of experiments with synthetic data and compared its performance to that of Principal Component Analysis (PCA), the close relative from which it was originally derived.

Results: We demonstrate that even though PCA ignores the class labels of the samples, this unsupervised tool can be remarkably effective as a feature selector. In some cases, it outperforms PLS-DA, which is given the class labels as part of its input. Our experiments range from examining the signal-to-noise ratio in the feature selection task to considering many practical distributions and models encountered when analyzing bioinformatics and clinical data. Other methods were also evaluated. Finally, we analyzed an interesting data set of 396 vaginal microbiome samples for which the ground truth for feature selection was available. All the 3D figures shown in this paper, as well as the supplementary ones, can be viewed interactively at http://biorg.cs.fiu.edu/plsda.

Conclusions: Our results highlighted the strengths and weaknesses of PLS-DA in comparison with PCA for different underlying data models.

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
