Molecular Systems Biology (May 2018)
High‐throughput discovery of functional disordered regions: investigation of transactivation domains
Abstract
Over 40% of proteins in any eukaryotic genome contain intrinsically disordered regions (IDRs) that do not adopt defined tertiary structures. Certain IDRs perform critical functions, but discovering them is non‐trivial as the biological context determines their function. We present IDR‐Screen, a framework to discover functional IDRs in a high‐throughput manner by simultaneously assaying large numbers of DNA sequences that code for short disordered sequences. Functionality‐conferring patterns in their protein sequence are inferred through statistical learning. Using a yeast HSF1 transcription factor‐based assay, we discovered IDRs that function as transactivation domains (TADs) by screening a random sequence library and a designed library consisting of variants of 13 diverse TADs. Using machine learning, we find that segments devoid of positively charged residues but with redundant short sequence patterns of negatively charged and aromatic residues are a generic feature of TAD functionality. We anticipate that investigating defined sequence libraries using IDR‐Screen for specific functions can facilitate the discovery of novel functional regions of the disordered proteome and help in understanding the impact of natural and disease variants in disordered segments.
Keywords