Applying machine learning for multi-individual Raman spectroscopic data to identify different stages of proliferating human hepatocytes
Bihan Shen,
Chen Ma,
Lili Tang,
Zhitao Wu,
Zhaoliang Peng,
Guoyu Pan,
Hong Li
Affiliations
Bihan Shen
Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China; University of Chinese Academy of Sciences, Beijing 100049, China
Chen Ma
Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
Lili Tang
School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
Zhitao Wu
Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China
Zhaoliang Peng
Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China
Guoyu Pan
Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China; University of Chinese Academy of Sciences, Beijing 100049, China; School of Chinese Materia Medica, Nanjing University of Chinese Medicine, Nanjing 210023, China; Corresponding author
Hong Li
Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China; University of Chinese Academy of Sciences, Beijing 100049, China; Corresponding author
Summary: Cell therapy using proliferating human hepatocytes (ProliHHs) is an effective treatment approach for advanced liver diseases. However, rapid and accurate identification of high-quality ProliHHs from different donors is challenging because of heterogeneity among individuals. Here, we developed a machine learning framework that integrates single-cell Raman spectroscopy data from multiple donors to identify different stages of ProliHHs. We generated a repository of more than 14,000 Raman spectra, comprising primary human hepatocytes (PHHs) and different passages of ProliHHs from six donors. Using a sliding window algorithm, we identified potential biomarkers that distinguish the cell stages through differential analysis. Machine learning models classified cell stages accurately in both within-donor and cross-donor prediction tasks. Furthermore, we assessed how the numbers of donors and cells relate to each other and affect prediction accuracy, which can inform quality control design. A similar workflow could also be extended to other cell types.
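The cross-donor prediction task mentioned in the summary can be illustrated with a leave-one-donor-out split, in which every spectrum from one donor is held out for testing while the model is trained on the remaining donors. The sketch below is a minimal illustration under stated assumptions and not the authors' pipeline: the summary does not name the classifier, so a nearest-centroid rule stands in for it; the function names (`leave_one_donor_out`, `fit_centroids`, `predict`, `cross_donor_accuracy`) are hypothetical; and the two-point "spectra" are synthetic toy values, not data from the repository.

```python
from collections import defaultdict


def leave_one_donor_out(donors):
    """Yield (held_out, train_idx, test_idx), holding out each donor once."""
    for held_out in sorted(set(donors)):
        train = [i for i, d in enumerate(donors) if d != held_out]
        test = [i for i, d in enumerate(donors) if d == held_out]
        yield held_out, train, test


def fit_centroids(spectra, labels):
    """Return the mean spectrum per class (stand-in classifier, an assumption)."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(spectra, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest in squared distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist2(centroids[y]))


def cross_donor_accuracy(spectra, labels, donors):
    """Accuracy on each held-out donor after training on all other donors."""
    results = {}
    for held_out, train, test in leave_one_donor_out(donors):
        centroids = fit_centroids([spectra[i] for i in train],
                                  [labels[i] for i in train])
        correct = sum(predict(centroids, spectra[i]) == labels[i] for i in test)
        results[held_out] = correct / len(test)
    return results


# Synthetic toy data: two intensity values per "spectrum", three donors.
spectra = [
    [1.0, 0.1], [0.1, 1.0],   # donor A
    [0.9, 0.2], [0.2, 0.9],   # donor B
    [1.1, 0.0], [0.0, 1.1],   # donor C
]
labels = ["PHH", "ProliHH"] * 3
donors = ["A", "A", "B", "B", "C", "C"]

accuracy = cross_donor_accuracy(spectra, labels, donors)
print(accuracy)
```

Splitting by donor rather than by individual spectrum keeps spectra from the same person out of both the training and test sets, so the reported accuracy reflects generalization to an unseen donor. A within-donor task would instead split the spectra of a single donor.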