BioTechniques (Apr 2008)
Deep cap analysis gene expression (CAGE): genome-wide identification of promoters, quantification of their expression, and network inference
Abstract
In cap analysis gene expression (CAGE), short (∼20 nucleotide) sequence tags originating from the 5′ end of full-length mRNAs are sequenced to identify transcription events on a genome-wide scale. The rapid increase in the throughput of present-day sequencers allows much deeper CAGE tag sequencing, in which CAGE tags are found multiple times for each mRNA in a given experiment. CAGE tag counts can then be used to reliably estimate the cellular concentration of the corresponding mRNA. In contrast to microarray and SAGE expression profiling, CAGE identifies the location of each transcription start site in addition to its expression level. This makes it possible to infer a genome-wide network of transcriptional regulation by searching the promoter region surrounding each CAGE-defined transcription start site for potential transcription factor binding sites. Deep CAGE is therefore a unique tool for constructing a promoter-based network of transcriptional regulation. CAGE-based expression profiling also allows us to identify dynamic promoter usage in time-course experiments, and to pinpoint the specific promoter regulated by a given transcription factor in disruption experiments. The sheer size of the short-tag datasets produced by next-generation sequencers calls for new software to handle them. In addition, new visualization methods will be needed to represent a promoter-based transcriptional network.
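The two computational steps described above can be sketched as follows: counting tags per transcription start site (TSS) and normalizing by library size to estimate expression, then extracting the promoter region around each TSS and scanning it for a candidate binding-site pattern. This is a minimal sketch under stated assumptions, not the authors' pipeline. Tags are assumed to be already mapped and reduced to hypothetical `(chromosome, strand, 5′ position)` tuples. Tags sharing an exact 5′ end are treated as one TSS, which is a simplification, since many analyses group nearby tag starts into clusters. The function names, the tags-per-million normalization, the default -300/+100 window, and the TATA-like regex in the usage note are all illustrative choices.

```python
import re
from collections import Counter

# Hypothetical representation: one mapped CAGE tag = (chromosome, strand,
# 0-based genomic coordinate of its 5' end). Tag mapping is assumed done.

def count_tss(tags):
    """Count tags sharing an identical 5' end; each distinct site is a candidate TSS."""
    return Counter(tags)

def tags_per_million(counts):
    """Scale raw tag counts by library size so libraries of different depth are comparable."""
    total = sum(counts.values())
    return {site: 1e6 * n / total for site, n in counts.items()}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def promoter_window(genome, chrom, strand, tss, upstream=300, downstream=100):
    """Return `upstream` bases 5' of the TSS plus `downstream` bases starting at the TSS,
    oriented 5'->3' along the transcript. Returns None if the window runs off the
    chromosome. The default -300/+100 window is an illustrative assumption."""
    seq = genome[chrom]
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    else:
        # On the minus strand the transcript runs toward lower coordinates,
        # so "upstream" lies at higher forward-strand coordinates.
        start, end = tss - downstream + 1, tss + upstream + 1
    if start < 0 or end > len(seq):
        return None
    window = seq[start:end]
    return window if strand == "+" else reverse_complement(window)

def scan_motif(window, pattern, upstream):
    """Return start positions of (possibly overlapping) regex matches, relative to
    the TSS (negative values are upstream of the TSS)."""
    return [m.start() - upstream for m in re.finditer(f"(?=(?:{pattern}))", window)]
```

For example, with `genome = {"chr1": "GGGGTATAAAGCCGCCCCCC"}` and a plus-strand TSS at position 10, `scan_motif(promoter_window(genome, "chr1", "+", 10, 10, 5), "TATAAA", 10)` reports a hit 6 bases upstream of the TSS. Networks of the kind described above would come from repeating such scans with many binding-site models and relating the hits to the expression of each TSS.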