PLoS ONE (Jan 2020)

Machine learning utilising spectral derivative data improves cellular health classification through hyperspectral infra-red spectroscopy.

  • Ben O L Mellors,
  • Abigail M Spear,
  • Christopher R Howle,
  • Kelly Curtis,
  • Sara Macildowie,
  • Hamid Dehghani

DOI
https://doi.org/10.1371/journal.pone.0238647
Journal volume & issue
Vol. 15, no. 9
p. e0238647

Abstract

The objective differentiation of facets of cellular metabolism is important for several clinical applications, including accurate definition of tumour boundaries and targeted wound debridement. To this end, spectral biomarkers that differentiate live cells from necrotic/apoptotic cells have been defined using in vitro methods. Delineating different cellular states with spectroscopic methods is difficult because of the complex nature of these biological processes, so sophisticated, objective classification methods will be important for such differentiation. In this study, spectral data were collected from healthy and traumatised cell samples by hyperspectral imaging between 2500 and 3500 nm using a portable prototype device. Machine learning algorithms, in the form of clustering, were applied to several pre-processed data types: 'raw' unprocessed, smoothed and resampled, background subtracted, and spectral derivative. The resulting clusters were used as a diagnostic tool for assessing cellular health, and the analysis methods were compared using both sensitivity and specificity. The raw data exhibited differences for one of the three trauma types applied, but the traumatised samples could not all be clustered accurately because of signal contamination from the chemical insult. The background subtracted and smoothed data sets reduced the accuracy further, apparently because key spectral features that indicate cellular health were removed. However, the spectral derivative data types significantly improved clustering accuracy compared with the other data types, with both sensitivity and specificity for the background subtracted data set exceeding 94%, highlighting the utility of spectral derivatives in accounting for unknown signal contamination while maintaining important cellular spectral features.
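
The pipeline described in the abstract (derivative pre-processing, clustering, then scoring by sensitivity and specificity) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the spectra are synthetic (one Gaussian band at 3000 nm plus a flat offset standing in for unknown signal contamination), the derivative is a plain central difference, the clustering is a two-cluster k-means with a deterministic start, and sample 0 is treated as a known healthy reference so that clusters can be named. The abstract does not state which clustering algorithm or derivative method the authors used.

```python
import math

# Assumed sampling grid: 2500-3500 nm in 50 nm steps (21 points).
WAVELENGTHS_NM = list(range(2500, 3501, 50))
STEP_NM = 50.0

def synthetic_spectrum(band_depth, offset):
    """Hypothetical spectrum: one Gaussian band at 3000 nm plus a flat offset."""
    return [offset + band_depth * math.exp(-((w - 3000.0) / 150.0) ** 2)
            for w in WAVELENGTHS_NM]

def first_derivative(spectrum, step):
    """Central-difference first derivative; a constant offset cancels out."""
    return [(spectrum[i + 1] - spectrum[i - 1]) / (2.0 * step)
            for i in range(1, len(spectrum) - 1)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(rows, iterations=20):
    """Two-cluster k-means, seeded with row 0 and the row farthest from it."""
    first = rows[0]
    farthest = max(rows, key=lambda r: squared_distance(r, first))
    centroids = [list(first), list(farthest)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centroid.
        labels = [0 if squared_distance(r, centroids[0])
                  <= squared_distance(r, centroids[1]) else 1
                  for r in rows]
        # Update step: each centroid becomes the mean of its members.
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def sensitivity_specificity(is_trauma, flagged):
    """Traumatised is the positive class."""
    tp = sum(1 for t, f in zip(is_trauma, flagged) if t and f)
    fn = sum(1 for t, f in zip(is_trauma, flagged) if t and not f)
    tn = sum(1 for t, f in zip(is_trauma, flagged) if not t and not f)
    fp = sum(1 for t, f in zip(is_trauma, flagged) if not t and f)
    return tp / (tp + fn), tn / (tn + fp)

def evaluate(rows, is_trauma):
    labels = two_means(rows)
    healthy_cluster = labels[0]  # assumption: sample 0 is a healthy reference
    flagged = [lab != healthy_cluster for lab in labels]
    return sensitivity_specificity(is_trauma, flagged)

# Synthetic data: the band depth carries the health signal,
# while the offsets mimic contamination unrelated to cellular health.
healthy = [synthetic_spectrum(1.0, off) for off in (0.0, 0.5, 2.0, 3.0)]
trauma = [synthetic_spectrum(0.4, off) for off in (0.1, 1.5, 2.5, 0.2)]
spectra = healthy + trauma
is_trauma = [False] * 4 + [True] * 4

raw_result = evaluate(spectra, is_trauma)
derivative_result = evaluate(
    [first_derivative(s, STEP_NM) for s in spectra], is_trauma)
print("raw (sensitivity, specificity):", raw_result)
print("derivative (sensitivity, specificity):", derivative_result)
```

In this toy setting the raw spectra cluster mainly by offset, whereas the derivative removes the flat offset and leaves only the band shape, which is the kind of effect the abstract attributes to derivative data. Real measurements would likely need a noise-tolerant derivative (for example, a smoothing filter with a derivative option) instead of a bare finite difference.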