Frontiers in Molecular Biosciences (Nov 2022)

An anchored experimental design and meta-analysis approach to address batch effects in large-scale metabolomics

  • Amanda O. Shaver,
  • Brianna M. Garcia,
  • Goncalo J. Gouveia,
  • Alison M. Morse,
  • Zihao Liu,
  • Carter K. Asef,
  • Ricardo M. Borges,
  • Franklin E. Leach,
  • Erik C. Andersen,
  • I. Jonathan Amster,
  • Facundo M. Fernández,
  • Arthur S. Edison,
  • Lauren M. McIntyre

DOI
https://doi.org/10.3389/fmolb.2022.930204
Journal volume & issue
Vol. 9

Abstract

Untargeted metabolomics studies are unbiased, but identifying the same feature across studies is complicated by environmental variation, batch effects, and instrument variability. Ideally, several studies that assay the same set of metabolic features would be used to select recurring features to pursue for identification. Here, we developed an anchored experimental design. This generalizable approach enabled us to integrate three genetic studies consisting of 14 test strains of Caenorhabditis elegans prior to the compound identification process. An anchor strain, PD1074, was included in every sample collection, resulting in a large set of biological replicates of a genetically identical strain that anchored each study. This design enables us to estimate treatment effects within each batch and to combine those effects across batches with straightforward meta-analytic approaches, without the need to estimate batch effects or apply complex normalization strategies. We collected 104 test samples for three genetic studies across six batches to produce five analytical datasets from two complementary technologies commonly used in untargeted metabolomics. Using the model system C. elegans, we demonstrate that an augmented design combined with experimental blocks and other metabolomic QC approaches can be used to anchor studies and enable comparisons of stable spectral features across time without the need for compound identification. This approach is generalizable to systems where the same genotype can be assayed in multiple environments, and it provides biologically relevant features for downstream compound identification efforts. All methods are included in the newest release of the publicly available SECIMTools based on the open-source Galaxy platform.
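
The core idea of the abstract, estimating a test-versus-anchor effect inside each batch and then pooling those effects, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy log-intensity values, and the choice of a raw mean difference with fixed-effect inverse-variance weighting. It is not taken from SECIMTools and may differ from the effect size and model the authors actually used.

```python
import math
from statistics import mean, variance


def within_batch_effect(test, anchor):
    """Return (difference in means, variance of that difference) for one batch."""
    diff = mean(test) - mean(anchor)
    # Variance of a difference of two independent sample means.
    var_diff = variance(test) / len(test) + variance(anchor) / len(anchor)
    return diff, var_diff


def pool_fixed_effect(effects):
    """Inverse-variance (fixed-effect) pooling of (effect, variance) pairs."""
    weights = [1.0 / v for _, v in effects]
    pooled = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se


# Hypothetical log-intensities of one spectral feature.
# Batch 2 carries an additive shift of +2 that affects anchor and test alike.
batches = [
    {"anchor": [10.0, 10.2, 9.8], "test": [11.0, 11.2, 10.8]},
    {"anchor": [12.0, 12.2, 11.8], "test": [13.0, 13.2, 12.8]},
]

effects = [within_batch_effect(b["test"], b["anchor"]) for b in batches]
pooled, se = pool_fixed_effect(effects)
print(f"pooled effect = {pooled:.3f}, standard error = {se:.3f}")
```

Because the anchor strain is measured in the same batch as the test strain, the additive batch shift cancels in each within-batch difference, so the pooled effect in this toy example stays at about 1.0 without any explicit batch correction.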

Keywords