npj Microgravity (Jun 2024)

Harmonizing heterogeneous transcriptomics datasets for machine learning-based analysis to identify spaceflown murine liver-specific changes

  • Hari Ilangovan,
  • Prachi Kothiyal,
  • Katherine A. Hoadley,
  • Robin Elgart,
  • Greg Eley,
  • Parastou Eslami

DOI
https://doi.org/10.1038/s41526-024-00379-3
Journal volume & issue
Vol. 10, no. 1
pp. 1 – 11

Abstract

NASA has employed high-throughput molecular assays to identify sub-cellular changes impacting human physiology during spaceflight. Machine learning (ML) methods hold promise for improving our ability to identify important signals within high-dimensional molecular data. However, the inherently small number of study subjects within a single spaceflight mission limits the utility of ML approaches. To overcome this limitation in statistical power, data from multiple spaceflight missions must be aggregated while appropriately addressing intra- and inter-study variability. Here we describe an approach to log-transform, scale, and normalize data from six heterogeneous, mouse liver-derived transcriptomics datasets (total n = 137), which enabled ML methods to classify spaceflown vs. ground control animals (AUC ≥ 0.87) while mitigating the variability from mission of origin. Concordance was found between the liver-specific biological processes identified by the harmonized ML-based analysis and those identified by study-by-study classical omics analysis. This work demonstrates the feasibility of applying ML methods to integrated, heterogeneous datasets of small sample size.
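
The harmonization idea summarized in the abstract (log-transform, then scale and normalize so that mission-of-origin effects are reduced before classification) can be illustrated with a minimal sketch. The abstract does not specify the exact scaling or normalization steps, so the per-study, per-gene z-scoring used here is an assumption made for illustration and is not the authors' pipeline. The function names `harmonize` and `auc`, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np


def harmonize(counts, study_ids):
    """Log2-transform counts, then z-score each gene within each study.

    Assumption: per-study standardization stands in for the scaling and
    normalization steps, which the abstract does not spell out.
    """
    logged = np.log2(counts + 1.0)  # pseudocount of 1 avoids log(0)
    out = np.empty_like(logged)
    for study in np.unique(study_ids):
        rows = study_ids == study
        block = logged[rows]
        mean = block.mean(axis=0)
        std = block.std(axis=0)
        std[std == 0] = 1.0  # leave constant genes at zero instead of dividing by 0
        out[rows] = (block - mean) / std
    return out


def auc(scores, labels):
    """AUC as the probability that a positive sample outranks a negative one.

    Ties contribute 0.5 (the Mann-Whitney U formulation).
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


# Synthetic toy data (hypothetical): 6 studies, 20 animals each, 50 genes.
rng = np.random.default_rng(0)
n_studies, per_study, n_genes = 6, 20, 50
study_ids = np.repeat(np.arange(n_studies), per_study)
labels = np.tile(np.repeat([0, 1], per_study // 2), n_studies)  # 0 = ground, 1 = flight

# Each study gets its own per-gene offset, mimicking mission-of-origin variability.
study_shift = rng.normal(0.0, 2.0, size=(n_studies, n_genes))[study_ids]
signal = np.zeros((labels.size, n_genes))
signal[:, :5] = labels[:, None] * 1.0  # the first 5 genes respond to flight
noise = rng.normal(0.0, 0.5, size=(labels.size, n_genes))
counts = np.round(2.0 ** (8.0 + study_shift + signal + noise))

harmonized = harmonize(counts, study_ids)

# Score each animal by the mean of the 5 responsive genes, before and after.
raw_auc = auc(np.log2(counts + 1.0)[:, :5].mean(axis=1), labels)
harmonized_auc = auc(harmonized[:, :5].mean(axis=1), labels)
print(f"AUC, log-transformed only: {raw_auc:.2f}")
print(f"AUC, after per-study scaling: {harmonized_auc:.2f}")
```

In this toy setting, within-study standardization removes each study's offset, so a single score can be compared across pooled studies. Real multi-mission data would likely need more care, for example when flight and ground groups are unbalanced within a study, since within-study centering then shifts the two groups unevenly.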