Applied Sciences (May 2022)

A Comparison of PCA-LDA and PLS-DA Techniques for Classification of Vibrational Spectra

  • Maria Lasalvia,
  • Vito Capozzi,
  • Giuseppe Perna

DOI
https://doi.org/10.3390/app12115345
Journal volume & issue
Vol. 12, no. 11
p. 5345

Abstract

Vibrational spectroscopies provide information about the biochemical and structural environment of molecular functional groups inside samples. Over the past few decades, Raman and infrared-absorption-based techniques have been extensively used to investigate biological materials under different pathological conditions. On the strength of the results obtained, these techniques have been proposed for clinical diagnostic use, as complementary tools to conventional cytological and histological techniques. In most cases, the differences between vibrational spectra measured for healthy and diseased samples are small, although these small differences may contain diagnostically useful information. Therefore, interpreting the results requires analysis techniques able to highlight the minimal spectral variations that distinguish a dataset of measurements acquired on healthy samples from a dataset of measurements on samples in which a pathology occurs. Multivariate analysis techniques, which can handle large datasets and explore spectral information simultaneously, are suitable for this purpose. In the present study, two multivariate statistical techniques, principal component analysis-linear discriminant analysis (PCA-LDA) and partial least squares-discriminant analysis (PLS-DA), were used to analyse three different datasets of vibrational spectra, each one including spectra of two different classes: (i) a simulated dataset comprising control-like and exposed-like spectra, (ii) a dataset of Raman spectra measured for control and proton beam-exposed MCF10A breast cells and (iii) a dataset of FTIR spectra measured for malignant non-metastatic MCF7 and metastatic MDA-MB-231 breast cancer cells. Both PCA-LDA and PLS-DA were first used to build a discrimination model from calibration sets of spectra extracted from the three datasets. Then, the classification performance was assessed on test sets of spectra not used for calibration. The results show that the classification models were able to distinguish the different spectra types with accuracy between 93% and 100%, sensitivity between 86% and 100% and specificity between 90% and 100%. The present study confirms that vibrational spectroscopy combined with multivariate analysis techniques has considerable potential for establishing reliable diagnostic models.
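
The three performance figures quoted in the abstract (accuracy, sensitivity, specificity) follow from counting correct and incorrect predictions on a test set. The sketch below uses only the standard definitions of these quantities. The function name, the label encoding (1 for the exposed or diseased class, 0 for the control class) and the example labels are illustrative assumptions and are not taken from the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for a two-class problem.

    y_true and y_pred are equal-length sequences of class labels.
    `positive` marks the class treated as "positive" (an assumption here:
    1 = exposed/diseased, any other label = control).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # positive spectrum correctly classified
            else:
                fn += 1  # positive spectrum missed
        else:
            if pred == positive:
                fp += 1  # control spectrum wrongly flagged
            else:
                tn += 1  # control spectrum correctly classified

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical test-set labels, for illustration only (not data from the paper).
example_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
example_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
acc, sens, spec = classification_metrics(example_true, example_pred)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

With these made-up labels, three of the four positive spectra and five of the six control spectra are classified correctly, giving an accuracy of 0.80, a sensitivity of 0.75 and a specificity of about 0.83.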

Keywords