mBio (Oct 2014)
Origin Replication Complex Binding, Nucleosome Depletion Patterns, and a Primary Sequence Motif Can Predict Origins of Replication in a Genome with Epigenetic Centromeres
Abstract
Origins of DNA replication are key genetic elements, yet their identification remains elusive in most organisms. In previous work, we found that centromeres contain origins of replication (ORIs) that are determined epigenetically in the pathogenic yeast Candida albicans. In this study, we used origin recognition complex (ORC) binding and nucleosome occupancy patterns in Saccharomyces cerevisiae and Kluyveromyces lactis to train a machine learning algorithm to predict the positions of active arm (noncentromeric) origins in the C. albicans genome. The model identified bona fide active origins, as determined by the presence of replication intermediates on nondenaturing two-dimensional (2D) gels. Importantly, these origins function both at their native chromosomal loci and as autonomously replicating sequences (ARSs) on a linear plasmid. A “mini-ARS screen” identified at least one, and often two, ARS regions of ≥100 bp within each bona fide origin. Furthermore, a 15-bp AC-rich consensus motif was associated with the predicted origins and conferred autonomous replicating activity on the mini-ARSs. Thus, while centromeres and their associated origins are epigenetic, arm origins depend on critical DNA features, such as an ORC binding site and a propensity for nucleosome exclusion.
IMPORTANCE
DNA replication machinery is highly conserved, yet exactly what specifies a replication origin differs among species. Here, we used computational genomics to predict origin locations in Candida albicans by combining the locations of binding sites for the conserved origin recognition complex, which is necessary for replication initiation, with chromatin organization patterns. We identified predicted sequences that exhibited bona fide origin function and developed a linear plasmid assay to delimit the DNA fragments necessary for origin function. Additionally, we found that a short AC-rich motif, which is enriched in predicted origins, is required for origin function. Thus, we demonstrated a new machine learning paradigm for identifying potential origins in a genome with no prior information. Furthermore, this work suggests that C. albicans has two different types of origins: “hard-wired” arm origins that rely on specific sequence motifs and “epigenetic” centromeric origins that are recruited to kinetochores in a sequence-independent manner.
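The abstract does not name the learning algorithm used to predict arm origins from ORC binding and nucleosome occupancy. As a minimal illustrative sketch only, assume each genomic window is summarized by two hypothetical features: an ORC-binding score and a nucleosome-occupancy value. A logistic-regression classifier, a swapped-in stand-in and not necessarily the authors' method, can then be trained on windows labeled in a species with mapped origins and applied to windows from a new genome. The function names and all feature values below are invented for illustration and are not data from the study.

```python
import math
from typing import List, Tuple

# One training example: ((orc_score, nucleosome_occupancy), label), where
# label 1 = origin window and 0 = non-origin window.
# Both features are hypothetical per-window values; real inputs would come
# from ORC-binding and nucleosome-occupancy maps in the training species.
Example = Tuple[Tuple[float, float], int]


def sigmoid(z: float) -> float:
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(data: List[Example], lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit weights [bias, w_orc, w_nuc] by batch gradient descent on log-loss."""
    w = [0.0, 0.0, 0.0]
    n = len(data)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (orc, nuc), y in data:
            p = sigmoid(w[0] + w[1] * orc + w[2] * nuc)
            err = p - y  # derivative of log-loss w.r.t. the linear score
            grad[0] += err
            grad[1] += err * orc
            grad[2] += err * nuc
        # Average the gradient over the batch and take one descent step.
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def predict(w: List[float], orc: float, nuc: float) -> float:
    """Estimated probability that a window with these features is an origin."""
    return sigmoid(w[0] + w[1] * orc + w[2] * nuc)


# Toy training set (invented numbers, illustration only): origin windows
# have strong ORC signal and low nucleosome occupancy.
train = [
    ((0.9, 0.1), 1), ((0.8, 0.2), 1), ((0.7, 0.15), 1),
    ((0.2, 0.8), 0), ((0.1, 0.9), 0), ((0.3, 0.7), 0),
]
weights = train_logistic(train)

# Score two hypothetical windows from a "new" genome.
print(round(predict(weights, 0.8, 0.15), 3))  # ORC-bound, nucleosome-depleted
print(round(predict(weights, 0.2, 0.8), 3))   # weak ORC signal, nucleosome-rich
```

In this sketch the learned ORC weight is positive and the nucleosome weight negative, so windows combining strong ORC binding with nucleosome depletion score as likely origins. A real predictor would use many more features and windows, held-out validation, and experimental confirmation (such as the 2D gels described above).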