Genome Biology (Sep 2023)

Correcting batch effects in large-scale multiomics studies using a reference-material-based ratio method

  • Ying Yu,
  • Naixin Zhang,
  • Yuanbang Mai,
  • Luyao Ren,
  • Qiaochu Chen,
  • Zehui Cao,
  • Qingwang Chen,
  • Yaqing Liu,
  • Wanwan Hou,
  • Jingcheng Yang,
  • Huixiao Hong,
  • Joshua Xu,
  • Weida Tong,
  • Lianhua Dong,
  • Leming Shi,
  • Xiang Fang,
  • Yuanting Zheng

DOI
https://doi.org/10.1186/s13059-023-03047-z
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 26

Abstract

Background: Batch effects are notoriously common technical variations in multiomics data and may result in misleading outcomes if uncorrected or over-corrected. A plethora of batch-effect correction algorithms have been proposed to facilitate data integration. However, their respective advantages and limitations have not been adequately assessed in terms of omics types, performance metrics, and application scenarios.

Results: As part of the Quartet Project for quality control and data integration of multiomics profiling, we comprehensively assess the performance of seven batch-effect correction algorithms based on different performance metrics of clinical relevance, i.e., the accuracy of identifying differentially expressed features, the robustness of predictive models, and the ability to accurately cluster cross-batch samples into their own donors. The ratio-based method, i.e., scaling absolute feature values of study samples relative to those of concurrently profiled reference material(s), is found to be much more effective and broadly applicable than the others, especially when batch effects are completely confounded with biological factors of study interest. We further provide practical guidelines for implementing the ratio-based approach in increasingly large-scale multiomics studies.

Conclusions: Multiomics measurements are prone to batch effects, which can be effectively corrected using ratio-based scaling of the multiomics data. Our study lays the foundation for eliminating batch effects at a ratio scale.
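The ratio-based scaling named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `ratio_scale`, the input layout (a list of dicts with `batch`, `is_reference`, and `values` keys), the log2 transform, and the choice to average reference replicates within each batch are all assumptions made here for illustration. The idea shown is only that each sample's feature values are expressed relative to the reference material profiled in the same batch, so a batch-wide multiplicative shift cancels out.

```python
import math
from collections import defaultdict


def ratio_scale(samples):
    """Return log2 ratios of each sample to its batch's reference profile.

    Hypothetical input layout (an assumption of this sketch): each sample is a
    dict with 'batch' (label), 'is_reference' (bool), and 'values' (positive
    floats, one per feature, same feature order in every sample).
    """
    # Gather log2 profiles of the reference replicates, grouped by batch.
    ref_logs = defaultdict(list)
    for s in samples:
        if s["is_reference"]:
            ref_logs[s["batch"]].append([math.log2(v) for v in s["values"]])

    # Per batch, average the reference replicates feature by feature.
    ref_mean = {
        batch: [sum(col) / len(col) for col in zip(*rows)]
        for batch, rows in ref_logs.items()
    }

    scaled = []
    for s in samples:
        if s["batch"] not in ref_mean:
            raise ValueError(f"batch {s['batch']!r} has no reference sample")
        ref = ref_mean[s["batch"]]
        # log2(sample) - log2(reference) equals log2(sample / reference).
        scaled.append([math.log2(v) - r for v, r in zip(s["values"], ref)])
    return scaled


# Toy data: batch "B" measures everything 4x higher than batch "A".
toy = [
    {"batch": "A", "is_reference": True, "values": [2.0, 8.0]},
    {"batch": "A", "is_reference": False, "values": [4.0, 8.0]},
    {"batch": "B", "is_reference": True, "values": [8.0, 32.0]},
    {"batch": "B", "is_reference": False, "values": [16.0, 32.0]},
]
ratios = ratio_scale(toy)
```

In this toy input the two study samples differ fourfold on the absolute scale, yet their ratio-scale profiles coincide, because the same shift affects the reference material in each batch. Real data would additionally need handling of zeros, missing values, and the omics-specific preprocessing that this sketch leaves out.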

Keywords