Computational and Structural Biotechnology Journal (Jan 2022)

Benchmarking DNA methylation analysis of 14 alignment algorithms for whole genome bisulfite sequencing in mammals

  • Wentao Gong,
  • Xiangchun Pan,
  • Dantong Xu,
  • Guanyu Ji,
  • Yifei Wang,
  • Yuhan Tian,
  • Jiali Cai,
  • Jiaqi Li,
  • Zhe Zhang,
  • Xiaolong Yuan

Journal volume & issue
Vol. 20
pp. 4704 – 4716

Abstract

Whole genome bisulfite sequencing (WGBS) is an essential technique for methylome studies. Although a series of tools have been developed to overcome the mapping challenges caused by bisulfite treatment, the latest available tools have not been evaluated for read-mapping performance or for their effect on biological insights in multiple mammals. Herein, based on real and simulated WGBS data totaling 14.77 billion reads, we undertook 936 mappings to benchmark and evaluate 14 widely utilized alignment algorithms, from read mapping to biological interpretation, in humans, cattle and pigs: Bwa-meth, BSBolt, BSMAP, Walt, Abismal, Batmeth2, Hisat_3n, Hisat_3n_repeat, Bismark-bwt2-e2e, Bismark-his2, BSSeeker2-bwt, BSSeeker2-soap2, BSSeeker2-bwt2-e2e and BSSeeker2-bwt2-local. Specifically, Bwa-meth, BSBolt, BSMAP, Bismark-bwt2-e2e and Walt exhibited more uniquely mapped reads and higher mapping precision, recall and F1 score than the other nine alignment algorithms, and the influence of the alignment algorithm on the methylomes varied considerably in the numbers and methylation levels of CpG sites and in the calling of differentially methylated CpGs (DMCs) and regions (DMRs). Moreover, BSMAP showed the highest accuracy in the detection of CpG coordinates and methylation levels and in the calling of DMCs, DMRs, DMR-related genes and signaling pathways. These results suggest that careful selection of algorithms for profiling genome-wide DNA methylation is required, and our work provides investigators with useful information on the choice of alignment algorithms to improve DNA methylation detection accuracy in mammals.
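
For readers unfamiliar with the mapping metrics named above, the following minimal sketch shows one common way precision, recall and F1 score are computed for simulated reads whose true genomic origin is known. The definitions used here (precision as correctly mapped reads over mapped reads, recall as correctly mapped reads over all simulated reads) are an assumption for illustration and may differ from the exact definitions used in the study; the function name `mapping_scores` and its parameters are hypothetical.

```python
def mapping_scores(n_total, n_mapped, n_correct):
    """Return (precision, recall, F1) for one aligner on simulated reads.

    Assumed definitions (illustrative, not taken from the paper):
      precision = correctly mapped reads / reads the aligner mapped
      recall    = correctly mapped reads / all simulated reads
      F1        = harmonic mean of precision and recall
    """
    if not 0 <= n_correct <= n_mapped <= n_total:
        raise ValueError("expected 0 <= n_correct <= n_mapped <= n_total")

    # Guard against division by zero when nothing was mapped or simulated.
    precision = n_correct / n_mapped if n_mapped else 0.0
    recall = n_correct / n_total if n_total else 0.0

    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, not results from the study.
precision, recall, f1 = mapping_scores(n_total=1000, n_mapped=900, n_correct=810)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Under these assumed definitions, an aligner that maps few reads but places them accurately scores high on precision and low on recall, which is why the F1 score is reported alongside both.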

Keywords