Photonics (Mar 2024)
Classifying Raman Spectra of Colon Cells Based on Machine Learning Algorithms
Abstract
Colorectal cancer is very widespread in developed countries. Its diagnosis depends in part on the pathologists' experience and on their laboratories' instrumentation, which introduces uncertainty. Spectroscopic techniques that are sensitive to the cellular biochemical environment could help make the diagnosis more reliable. We therefore combined Raman micro-spectroscopy with machine learning analysis of the spectra to build classification models that identify colon cancer in cell samples, with the aim of supporting such methods as complementary diagnostic tools. Raman spectra were acquired by focusing the laser beam onto the nucleus and cytoplasm regions of single FHC and CaCo-2 cells (modelling healthy and cancerous samples, respectively) grown on glass coverslips, and were analyzed in the 980–1800 cm⁻¹ range. A comparison of the Raman intensities of several spectral peaks, together with Principal Component Analysis, highlighted small biochemical differences between healthy and cancerous cells. These differences are mainly due to the larger relative lipid content of the healthy cells and the larger relative amount of nucleic acid components in the cancerous cells. We considered four classification algorithms (logistic regression, support vector machine, k-nearest neighbors, and a neural network) to assign unknown Raman spectra to the cell type to which they belong. The resulting models achieved median classification accuracy ranging from 95.5% to 97.1%, median sensitivity ranging from 95.5% to 100%, and median specificity ranging from 93.9% to 97.1%. For a testing set of unknown spectra, the corresponding median values ranged from 93.1% to 100% for accuracy and from 92.9% to 100% for both sensitivity and specificity.
A comparison of the four methods showed that k-nearest neighbors performed best on the nucleus spectra and the neural network performed best on the cytoplasm spectra. These findings are a further step towards the clinical translation of Raman spectroscopy assisted by multivariate analysis as a support to the standard cytological and immunohistochemical methods used for diagnosis.
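For readers less familiar with the figures of merit quoted above, the following minimal Python sketch shows how accuracy, sensitivity, and specificity are usually computed for a two-class problem and then summarized by their medians. It is an illustration only, not the authors' code: the function name `binary_metrics`, the choice of CaCo-2 as the positive class, the idea that the medians are taken over several repeated evaluations, and the toy labels are all assumptions introduced here.

```python
from statistics import median


def binary_metrics(y_true, y_pred, positive="CaCo-2"):
    """Return (accuracy, sensitivity, specificity) for two-class labels.

    Assumes each evaluation contains at least one positive and one
    negative sample, otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # fraction of cancerous spectra detected
    specificity = tn / (tn + fp)  # fraction of healthy spectra recognized
    return accuracy, sensitivity, specificity


# Hypothetical (true, predicted) labels for three repeated evaluations.
# These are toy values, not data from the study.
folds = [
    (["CaCo-2", "CaCo-2", "FHC", "FHC"], ["CaCo-2", "CaCo-2", "FHC", "FHC"]),
    (["CaCo-2", "CaCo-2", "FHC", "FHC"], ["CaCo-2", "FHC", "FHC", "FHC"]),
    (["CaCo-2", "CaCo-2", "FHC", "FHC"], ["CaCo-2", "CaCo-2", "CaCo-2", "FHC"]),
]

per_fold = [binary_metrics(t, p) for t, p in folds]
# Transpose so each metric is summarized by its own median.
median_acc, median_sens, median_spec = (median(m) for m in zip(*per_fold))
print(median_acc, median_sens, median_spec)
```

Reporting the median rather than the mean makes the summary less sensitive to a single unusually poor or unusually good evaluation, which may be why median values are quoted here.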
Keywords