Frontiers in Genetics (Sep 2021)

Systematic Evaluation of DNA Sequence Variations on in vivo Transcription Factor Binding Affinity

  • Yutong Jin,
  • Jiahui Jiang,
  • Ruixuan Wang,
  • Zhaohui S. Qin

DOI: https://doi.org/10.3389/fgene.2021.667866
Journal volume & issue: Vol. 12

Abstract


The majority of the single nucleotide variants (SNVs) identified by genome-wide association studies (GWAS) fall outside protein-coding regions. Elucidating the functional implications of these variants has been a major challenge. One possible mechanism for functional non-coding variants is that they disrupt canonical transcription factor (TF) binding sites and thereby affect in vivo binding of the TF. However, their impact varies, because many positions within a TF binding motif are not well conserved. Simply annotating all variants located in putative TF binding sites may therefore overestimate the functional impact of these SNVs. We conducted a comprehensive survey of the effect of SNVs on TF binding affinity. A sequence-based machine learning method was used to estimate the change in binding affinity for each SNV located inside a putative motif site. Across the 18 TF binding motifs examined, an SNV's impact on TF binding affinity varied substantially. We found that only about 20% of SNVs located inside putative TF binding sites are likely to have a significant impact on TF-DNA binding.
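The argument that variants at poorly conserved motif positions matter less can be illustrated with a minimal sketch. Note that this is not the authors' sequence-based machine learning method: it swaps in a simpler, classical technique, a position weight matrix (PWM) log-odds score, and compares the score of a reference site with that of the site carrying one substitution. The count matrix, pseudocount, uniform background, and function names below are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical position frequency matrix (counts) for a 4-bp motif.
# Illustrative values only, NOT taken from the paper or any TF database.
# Positions 0 and 1 are well conserved; positions 2 and 3 are degenerate.
PFM = [
    {"A": 18, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 0, "G": 20, "T": 0},
    {"A": 5, "C": 5, "G": 5, "T": 5},
    {"A": 6, "C": 4, "G": 5, "T": 5},
]
BACKGROUND = {base: 0.25 for base in "ACGT"}  # assumed uniform background


def log_odds_pwm(pfm, pseudocount=0.5):
    """Convert per-position counts into a log2-odds weight matrix."""
    pwm = []
    for column in pfm:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((count + pseudocount) / total / BACKGROUND[base])
            for base, count in column.items()
        })
    return pwm


def site_score(pwm, seq):
    """Sum the per-position log-odds scores of a site of motif length."""
    return sum(column[base] for column, base in zip(pwm, seq))


def snv_delta(pwm, ref_site, pos, alt):
    """Score change (alt minus ref) caused by one substitution at `pos`."""
    alt_site = ref_site[:pos] + alt + ref_site[pos + 1:]
    return site_score(pwm, alt_site) - site_score(pwm, ref_site)


pwm = log_odds_pwm(PFM)
ref = "AGCA"
# Same kind of substitution, very different effect depending on position.
print(f"conserved position 1, G>T:  {snv_delta(pwm, ref, 1, 'T'):+.2f}")
print(f"degenerate position 3, A>T: {snv_delta(pwm, ref, 3, 'T'):+.2f}")
```

With these made-up counts, the substitution at the conserved position lowers the score by more than 5 bits, while the one at the degenerate position changes it by well under 1 bit. This mirrors, in a much cruder model, why counting every SNV inside a putative site as disruptive would overstate the number of functional variants.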

Keywords