PLoS ONE (Jan 2013)

Quality evaluation of methyl binding domain based kits for enrichment DNA-methylation sequencing.

  • Tim De Meyer,
  • Evi Mampaey,
  • Michaël Vlemmix,
  • Simon Denil,
  • Geert Trooskens,
  • Jean-Pierre Renard,
  • Sarah De Keulenaer,
  • Pierre Dehan,
  • Gerben Menschaert,
  • Wim Van Criekinge

DOI
https://doi.org/10.1371/journal.pone.0059068
Journal volume & issue
Vol. 8, no. 3
p. e59068

Abstract

DNA-methylation is an important epigenetic feature in health and disease. Methylated sequence capturing by Methyl Binding Domain (MBD) based enrichment followed by second-generation sequencing provides the best combination of sensitivity and cost-efficiency for genome-wide DNA-methylation profiling. However, existing implementations are numerous, and quality control and optimization require expensive external validation. Therefore, this study has two aims: 1) to identify the best-performing kit for MBD-based enrichment using independent validation data, and 2) to evaluate whether quality evaluation can also be performed based solely on the characteristics of the generated sequences. Five commercially available kits for MBD enrichment were combined with Illumina GAIIx sequencing for three cell lines (HCT15, DU145, PC3). Reduced representation bisulfite sequencing data (all three cell lines) and publicly available Illumina Infinium BeadChip data (DU145 and PC3) were used for benchmarking. Consistent large-scale differences in yield, sensitivity and specificity were identified between the kits, with Diagenode's MethylCap kit performing best overall under the tested conditions. This kit could also be identified with the Fragment CpG-plot, which summarizes the CpG content of the captured fragments, implying that the plot can be used as a tool to monitor data quality. In conclusion, there are major quality differences between kits for MBD-based capturing of methylated DNA, with the MethylCap kit performing best under the settings used. The Fragment CpG-plot monitors data quality based on inherent sequence data characteristics, and is therefore a cost-efficient tool both for experimental optimization and for monitoring quality throughout routine applications.
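
The abstract describes the Fragment CpG-plot only as a summary of the CpG content of the captured fragments, so the exact construction used in the paper is not stated here. As a rough illustration of the idea, the following minimal Python sketch assumes that fragment sequences are already available as plain strings and tabulates the fraction of fragments at each CpG count. The function names `cpg_count` and `fragment_cpg_distribution` are hypothetical and not taken from the paper.

```python
from collections import Counter


def cpg_count(fragment: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G) in one fragment."""
    return fragment.upper().count("CG")


def fragment_cpg_distribution(fragments):
    """Return {CpG count: fraction of fragments with that count}.

    Assumption: this is one plausible way to summarize fragment CpG content;
    the paper's own plot may bin or normalize differently.
    """
    counts = Counter(cpg_count(f) for f in fragments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: counts[k] / total for k in sorted(counts)}


if __name__ == "__main__":
    # Toy sequences for illustration only, not real sequencing data.
    toy_fragments = ["ACGT", "CGCG", "AAAA", "TTCGAA"]
    for n_cpg, fraction in fragment_cpg_distribution(toy_fragments).items():
        print(n_cpg, fraction)
```

Under this assumption, a successful enrichment would be expected to shift the distribution toward fragments with more CpGs, since MBD-based capture targets methylated CpG-containing DNA; a distribution dominated by CpG-poor fragments would point to weak enrichment.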