Cells (Apr 2021)

The Impact of Preprocessing Methods for a Successful Prostate Cell Lines Discrimination Using Partial Least Squares Regression and Discriminant Analysis Based on Fourier Transform Infrared Imaging

  • Danuta Liberda,
  • Ewa Pięta,
  • Katarzyna Pogoda,
  • Natalia Piergies,
  • Maciej Roman,
  • Paulina Koziol,
  • Tomasz P. Wrobel,
  • Czeslawa Paluszkiewicz,
  • Wojciech M. Kwiatek

DOI
https://doi.org/10.3390/cells10040953
Journal volume & issue
Vol. 10, no. 4
p. 953

Abstract

Fourier transform infrared (FT-IR) spectroscopy is widely used to analyze the chemical composition of biological materials and has the potential to reveal new aspects of the molecular basis of diseases, including different types of cancer. In cancer research, its potential lies in its ability to monitor the biochemical status of cells undergoing malignant transformation and to identify, with appropriate mathematical approaches, the spectral features that differentiate normal cells from cancerous ones. Such examination can be performed with chemometric tools, such as partial least squares discriminant analysis (PLS-DA) classification and partial least squares regression (PLSR); the proper application of preprocessing methods, in the correct sequence, is crucial for success. Here, we compared several state-of-the-art methods commonly used in infrared biospectroscopy (denoising, baseline correction, and normalization) together with methods not previously used in infrared biospectroscopy classification problems: Mie extinction extended multiplicative signal correction, Eilers' smoothing, and probabilistic quotient normalization. We assessed the effect of all these approaches on data structure and on classification and regression capability using experimental FT-IR spectra collected from five prostate cell lines, both normal and cancerous. Additionally, we tested the influence of added spectral noise. Overall, for the data analyzed here, baseline correction had the biggest impact on data structure and on the performance of PLS-DA and PLSR; therefore, particular attention should be given to this step of data preprocessing.
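
As an illustration of one of the normalization steps named above, probabilistic quotient normalization can be sketched in a few lines of Python. This is a minimal sketch of the general technique, not the authors' implementation: the function name `pqn_normalize`, the choice of the pointwise median spectrum as the reference, and the toy input values are all assumptions made for the example.

```python
from statistics import median


def pqn_normalize(spectra, reference=None):
    """Probabilistic quotient normalization (illustrative sketch).

    spectra: list of spectra, each a list of floats of equal length.
    reference: optional reference spectrum; assumed here to default to
    the pointwise median of all spectra.
    """
    if reference is None:
        # Assumption: the median spectrum serves as the reference.
        reference = [median(column) for column in zip(*spectra)]

    normalized = []
    for spectrum in spectra:
        # Quotient of each point to the reference, skipping zero values.
        quotients = [
            s / r for s, r in zip(spectrum, reference) if s != 0 and r != 0
        ]
        # The median quotient estimates the overall dilution/scale factor.
        factor = median(quotients)
        normalized.append([s / factor for s in spectrum])
    return normalized


# Toy data: the same spectral shape at three different overall scales.
toy_spectra = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [4.0, 8.0, 12.0]]
result = pqn_normalize(toy_spectra)
```

In this toy case all three spectra differ only by a scale factor, so after normalization they coincide with the median spectrum. In a preprocessing chain of the kind compared in the paper, a step like this would typically follow denoising and baseline correction.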

Keywords