Genome Biology (Jul 2019)

CONFINED: distinguishing biological from technical sources of variation by leveraging multiple methylation datasets

  • Mike Thompson,
  • Zeyuan Johnson Chen,
  • Elior Rahmani,
  • Eran Halperin

DOI
https://doi.org/10.1186/s13059-019-1743-y
Journal volume & issue
Vol. 20, no. 1
pp. 1 – 15

Abstract

Methylation datasets are affected by innumerable sources of variability, both biological (cell-type composition, genetics) and technical (batch effects). Here, we propose a reference-free method based on sparse canonical correlation analysis to separate the biological from technical sources of variability. We show through simulations and real data that our method, CONFINED, is not only more accurate than the state-of-the-art reference-free methods for capturing known, replicable biological variability, but it is also considerably more robust to dataset-specific technical variability than previous approaches. CONFINED is available as an R package as detailed at https://github.com/cozygene/CONFINED.
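
To illustrate the general idea of using canonical correlation analysis across two methylation datasets, here is a minimal sketch in Python. It is not the CONFINED R package or its API, and it uses plain CCA rather than the sparse variant the abstract describes. The function names (`cca_across_sites`, `simulate`), the simulated data, and the choice to treat methylation sites as observations and individuals as variables are all assumptions made for this example.

```python
import numpy as np


def cca_across_sites(X1, X2):
    """Plain (non-sparse) CCA between two sites-by-individuals matrices.

    Rows are methylation sites shared by both datasets; columns are the
    individuals of each dataset. Assumes more sites than individuals and
    full column rank. Returns the canonical correlations and the
    site-level canonical variates of each dataset.
    """
    # Center each individual's profile across sites.
    X1c = X1 - X1.mean(axis=0)
    X2c = X2 - X2.mean(axis=0)
    # Orthonormal bases for the two column spaces.
    Q1, _ = np.linalg.qr(X1c)
    Q2, _ = np.linalg.qr(X2c)
    # Singular values of Q1^T Q2 are the canonical correlations.
    U, s, Vt = np.linalg.svd(Q1.T @ Q2, full_matrices=False)
    return s, Q1 @ U, Q2 @ Vt.T


def simulate(m=500, n1=20, n2=25, seed=0):
    """Hypothetical toy data: one site-level signal shared by both datasets,
    plus one dataset-specific ("technical") signal and noise in each."""
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=m)  # site loadings common to both datasets

    def one_dataset(n):
        biological = np.outer(shared, rng.normal(size=n))
        technical = np.outer(rng.normal(size=m), rng.normal(size=n))
        noise = 0.1 * rng.normal(size=(m, n))
        return biological + technical + noise

    return one_dataset(n1), one_dataset(n2), shared


X1, X2, shared = simulate()
corrs, V1, V2 = cca_across_sites(X1, X2)
print("top canonical correlations:", np.round(corrs[:3], 3))
print("|corr| of first variate with shared signal:",
      round(abs(np.corrcoef(V1[:, 0], shared)[0, 1]), 3))
```

In this toy setting the leading canonical variate tracks the signal present in both datasets, while the dataset-specific components do not correlate across datasets and so are not picked up by the leading pair. How CONFINED selects sites and enforces sparsity is described in the paper and the package documentation, not in this sketch.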