In silico discovery of repetitive elements as key sequence determinants of 3D genome folding
Laura M. Gunsalus, Michael J. Keiser, Katherine S. Pollard
Affiliations
Laura M. Gunsalus
Gladstone Institutes, San Francisco, CA, USA; Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA
Michael J. Keiser
Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA; Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA; Kavli Institute for Fundamental Neuroscience, University of California, San Francisco, San Francisco, CA, USA; Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA
Katherine S. Pollard
Gladstone Institutes, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA; Chan Zuckerberg Biohub, San Francisco, CA, USA; Department of Epidemiology & Biostatistics, University of California, San Francisco, San Francisco, CA, USA (corresponding author)
Summary: Natural and experimental genetic variants can modify DNA loops and insulating boundaries to tune transcription, but it is unknown how sequence perturbations affect chromatin organization genome-wide. We developed a deep-learning strategy to quantify the effect of any insertion, deletion, or substitution on chromatin contacts and used it to systematically score millions of synthetic variants. While most genetic manipulations have little impact, regions with CTCF motifs and active transcription are highly sensitive, as expected. Our unbiased screen and subsequent targeted experiments also point to noncoding RNA genes and several families of repetitive elements as CTCF-motif-free DNA sequences with particularly large effects on nearby chromatin interactions; these effects sometimes exceed those of CTCF sites and help explain interactions that lack CTCF. We anticipate that our disruption tracks, as a measure of 3D genome sensitivity, may be broadly useful, and that our computational strategies may serve as a template for biological inquiry with deep learning.
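The summary does not spell out how a variant's effect on chromatin contacts is scored. As a minimal sketch, assuming a trained model that maps a fixed-length DNA window to a predicted contact map, one plausible approach is to apply the insertion, deletion, or substitution, restore the window to its original length, and compare the reference and variant predictions by mean squared difference. Everything here is illustrative: `toy_contact_model` is a hypothetical stand-in for a trained network (not the authors' model), and the crop/pad-with-N rule, the MSE metric, and the sliding-deletion `deletion_track` are assumptions rather than the published procedure.

```python
from typing import Callable, List

Matrix = List[List[float]]


def apply_variant(seq: str, pos: int, ref_len: int, alt: str) -> str:
    """Replace seq[pos:pos+ref_len] with alt.
    ref_len == 0 -> insertion; alt == "" -> deletion; equal lengths -> substitution."""
    return seq[:pos] + alt + seq[pos + ref_len:]


def fit_length(seq: str, length: int, pad: str = "N") -> str:
    """Assumed convention: crop extra bases from the right end, or pad with 'N',
    so the model always sees a window of the original length."""
    if len(seq) >= length:
        return seq[:length]
    return seq + pad * (length - len(seq))


def toy_contact_model(seq: str, bin_size: int = 4) -> Matrix:
    """Hypothetical stand-in for a trained network: contact(i, j) is the product
    of the GC fractions of bins i and j, decayed by their distance in bins."""
    chunks = [seq[k:k + bin_size] for k in range(0, len(seq), bin_size)]
    gc = [sum(base in "GC" for base in chunk) / bin_size for chunk in chunks]
    n = len(gc)
    return [[gc[i] * gc[j] / (1 + abs(i - j)) for j in range(n)] for i in range(n)]


def disruption_score(model: Callable[[str], Matrix], ref_seq: str,
                     pos: int, ref_len: int, alt: str) -> float:
    """Mean squared difference between predicted maps for reference and variant."""
    ref_map = model(ref_seq)
    alt_seq = fit_length(apply_variant(ref_seq, pos, ref_len, alt), len(ref_seq))
    alt_map = model(alt_seq)
    n = len(ref_map)
    total = sum((ref_map[i][j] - alt_map[i][j]) ** 2
                for i in range(n) for j in range(n))
    return total / (n * n)


def deletion_track(model: Callable[[str], Matrix], seq: str, width: int) -> List[float]:
    """Score a deletion of `width` bases at every start position (a per-position track)."""
    return [disruption_score(model, seq, p, width, "")
            for p in range(len(seq) - width + 1)]


if __name__ == "__main__":
    ref = "ATATGCGCATATGCGC"  # 16 bp window, 4 bins of 4 bp
    print(disruption_score(toy_contact_model, ref, 0, 1, "A"))  # no-op substitution
    print(disruption_score(toy_contact_model, ref, 4, 4, ""))   # delete the first GCGC
    print(deletion_track(toy_contact_model, ref, 4))
```

A substitution that leaves the base unchanged scores zero, while deleting a GC-rich bin shifts the downstream bins and yields a positive score. Scanning such scores along a sequence produces a per-position track in the spirit of the disruption tracks described above.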