Epigenetics (Dec 2022)

An evaluation of the genome-wide false positive rates of common methods for identifying differentially methylated regions using illumina methylation arrays

  • Yuanchao Zheng,
  • Kathryn L. Lunetta,
  • Chunyu Liu,
  • Seyma Katrinli,
  • Alicia K. Smith,
  • Mark W. Miller,
  • Mark W. Logue

DOI
https://doi.org/10.1080/15592294.2022.2115600
Journal volume & issue
Vol. 17, no. 13
pp. 2241 – 2258

Abstract

Read online

Differentially methylated regions (DMRs) are genomic regions with specific methylation patterns across multiple loci that are associated with a phenotype. We examined the genome-wide false positive (GFP) rates of five widely used DMR methods: comb-p, Bumphunter, DMRcate, mCSEA and coMethDMR using both Illumina HumanMethylation450 (450 K) and MethylationEPIC (EPIC) data and simulated continuous and dichotomous null phenotypes (i.e., generated independently of methylation data). coMethDMR provided well-controlled GFP rates (~5%) except when analysing skewed continuous phenotypes. DMRcate generally had well-controlled GFP rates when applied to 450 K data except for the skewed continuous phenotype and EPIC data only for the normally distributed continuous phenotype. GFP rates for mCSEA were at least 0.096 and comb-p yielded GFP rates above 0.34. Bumphunter had high GFP rates of at least 0.35 across conditions, reaching as high as 0.95. Analysis of the performance of these methods in specific regions of the genome found that regions with higher correlation across loci had higher regional false positive rates on average across methods. Based on the false positive rates, coMethDMR is the most recommended analysis method, and DMRcate had acceptable performance when analysing 450 K data. However, as both could display higher levels of FPs for skewed continuous distributions, a normalizing transformation of skewed continuous phenotypes is suggested. This study highlights the importance of genome-wide simulations when evaluating the performance of DMR-analysis methods.

Keywords