Transcription in eukaryotic cells is highly dynamic and sophisticated, and its initiation typically involves recognition between a transcription factor (TF) and a cis element in the upstream region of a gene. Knowing the precise sequence of the cis element for its associated TF is therefore critical for understanding how a given gene is transcribed. Our understanding of this coupling, however, is unbalanced: for many well-characterized trans-acting TFs, the cis elements have not been clarified.
Both technical and inherent difficulties appear to contribute to this situation. In previous efforts at cis identification, a widely adopted strategy was segment dissection. Typically, the 5’ promoter region of a gene was cut into segments, the effect of each manipulation on the expression of the gene was examined, and an inference was then made as to which segment was likely responsible for transcription initiation. The protocol can be laborious and frequently results in only rough estimates of cis elements in simple cases, or in inconclusive results in cases involving multiple TFs (such as those in combinatorial regulation). Even when the correct segments are identified, their lengths can be inconsistent between reports owing to uncertainty about the exact binding sites.
The situation may be inherently more complicated in some cases. DNA binding by a TF may not require the full participation of all nucleotides within the binding region on a promoter (previously referred to as a gapped or degenerate cis element). The sequence content of the site therefore varies to some degree, which makes the relevant cis element harder to identify. The nucleotides bordering a cis element may also influence TF-DNA binding without being part of the TF-DNA complex. More subtly, versions of the same TF family member in different species may recognize somewhat different cis elements owing to evolution of the TF-DNA interaction.
It seems that the lack of precise identification of cis elements cannot be overcome with the existing approaches. A recent survey of MYB cis elements shows that rich experimental data have been collected on MYBs across kingdoms and that many cis elements have been reported, yet MYB-DNA interactions remain vague. In plants, most MYBs have R2 and R3 domains, and the cis elements reported for R2R3 MYBs range in length from 4 nucleotides (AACA), for rice OSMYB5 by DNase I footprinting analyses, to 14 nucleotides (TAT AAC GGT TTT TT), for soybean GmMYBs by yeast one-hybrid assays. Although these reports do not agree on the length of the R2R3 binding region, there is no reason to believe that the binding sites for plant R2R3 MYBs should vary greatly in length. In contrast, the R2R3 domains of mouse c-MYB have been shown by nuclear magnetic resonance to bind to AACNG, and the crystal structure of the protozoan tv MYB1-DNA complex suggested the binding sequence to be a/gACGAT. In other words, the extent of R2R3 MYB binding sites has been poorly defined in planta, which causes problems for further analysis. A more focused identification method is clearly needed for pinpointing the cis element for a given TF.
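To make the notion of a degenerate site concrete, the following minimal Python sketch scans a sequence on both strands for a motif that contains ambiguous positions, such as AACNG or a/gACGAT. It assumes that the motifs are written in IUPAC codes (so that a/gACGAT becomes RACGAT); the function names `reverse_complement`, `motif_to_regex` and `scan` are hypothetical, and the sketch is an illustration only, not part of the method reported here.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence or IUPAC motif."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Compile an IUPAC motif; the lookahead lets overlapping hits be found."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def scan(seq, motif):
    """Return sorted (start, strand, site) hits; start is 0-based on the given strand."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1))
            for m in motif_to_regex(motif).finditer(seq)]
    rc_motif = reverse_complement(motif)
    if rc_motif != motif.upper():  # skip palindromic motifs to avoid duplicates
        hits += [(m.start(), "-", reverse_complement(m.group(1)))
                 for m in motif_to_regex(rc_motif).finditer(seq)]
    return sorted(hits)
```

For example, `scan("GGAACAGCC", "AACNG")` reports a single forward-strand hit at position 2. A short motif with an ambiguous position matches many sequences by chance, which is one reason why a poorly delimited site is difficult to pin down by sequence scanning alone.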
In an analysis of combinatorial regulation in the flavonoid network, we have developed a strategy that is highly effective in identifying cis elements, particularly in plant systems. The strategy features bioinformatically generated cis candidates and their validation by experimental means based on existing protocols. The initial step puts a bioinformatic analysis at the forefront, mining candidate cis sites species-wide and locus-wide (assuming a tractable cis element). It takes advantage of partially known cis information for one of the TFs involved in combinatorial regulation in order to predict the likely cis element for another TF of interest. For instance, the binding sites of some members of the bHLH family are known. Both mouse c-myc and Brassica bHLH recognize CACGTG, which was initially known as the core of the G-box (-TCTTACACGTGGCAYY-) on the promoter of a gene for a small subunit of ribulose 1,5-bisphosphate carboxylase. Since regulation of the anthocyanin pathway requires both MYB and bHLH (and there is evidence that the bHLH binds to CACGTG), this attribute of bHLH may serve as an anchor for obtaining the cis sites for MYB via our bioinformatic approaches.
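The anchoring idea can be illustrated with a short Python sketch. It locates the bHLH core CACGTG in each promoter, collects the k-mers that flank it, and counts how many promoters share each k-mer. The window size, the k-mer length and the function names `flanking_kmers` and `rank_candidates` are assumptions made for illustration; this is a simplified stand-in and should not be read as the bioinformatic procedure used in this work.

```python
from collections import Counter

ANCHOR = "CACGTG"  # bHLH core site named in the text


def flanking_kmers(promoter, k=6, window=30):
    """Set of k-mers lying within `window` nt on either side of any anchor copy.

    `k` and `window` are illustrative defaults, not values from the study.
    """
    seq = promoter.upper()
    kmers = set()
    start = seq.find(ANCHOR)
    while start != -1:
        end = start + len(ANCHOR)
        left = seq[max(0, start - window):start]
        right = seq[end:end + window]
        for flank in (left, right):
            for i in range(len(flank) - k + 1):
                kmers.add(flank[i:i + k])
        start = seq.find(ANCHOR, start + 1)
    return kmers


def rank_candidates(promoters, k=6, window=30):
    """Count, for each flanking k-mer, the number of promoters that contain it.

    `promoters` maps a gene name to its promoter sequence; promoters without
    the anchor contribute nothing. Returns (k-mer, count) pairs, most shared first.
    """
    counts = Counter()
    for seq in promoters.values():
        counts.update(flanking_kmers(seq, k, window))
    return counts.most_common()
```

K-mers that recur next to the anchor across co-regulated promoters are only candidates. A realistic analysis would also need to consider both strands, degenerate positions and background frequencies, and each candidate would still require experimental validation.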
Combinatorial regulation by TFs features in many biological processes. Here we focus on the better-known MYB, bHLH and WD-repeat protein (WDR) families in plants. These three kinds of TFs may form a complex that influences trichome formation and proanthocyanidin synthesis, in addition to anthocyanin synthesis. As part of the flavonoid network, the components of the anthocyanin pathway are relatively well defined, consisting mainly of chalcone synthase (CHS), chalcone isomerase (CHI), flavanone 3-dioxygenase (F3H), flavonoid 3’-monooxygenase (F3’H), dihydroflavonol 4-reductase (DFR), anthocyanidin synthase (ANS), and UDP-glucose flavonoid 3-O-glucosyltransferase (3GT). These putative regulons and their TFs have been identified in several species. Known MYB-bHLH-WDR complexes include C1-B/R-PAC1 in maize[20, 21], AN2-AN1-AN11 in petunia[22, 23], and MYB1-bHLH2-WDR1 in Ipomoea[24–26]. Meanwhile, MYB and bHLH (but not WDR) are known to interact with promoters in order to fulfill their roles in gene regulation. The anthocyanin pathway is thus an ideal system for examining TF-DNA interactions.
We take CHS as an example here, showing that once candidate cis motifs are generated by the bioinformatic analysis, they can be validated effectively through site-directed mutagenesis and experiments targeting specific TF-DNA interactions. Numerous tests have suggested that electrophoretic mobility shift assays (EMSAs)[27, 28] and transient expression assays using living cells[29, 30] are highly effective in TF-DNA interaction analysis. For EMSA, we have obtained the best resolution with the commercial fluorescent dyes SYBR® Green and SYPRO® Ruby (Molecular Probes/Life Technologies), which bind to nucleic acids and proteins, respectively. Applying these dyes sequentially to the same gel and imaging the gel under different light conditions yields unambiguous signals of DNA-protein interactions. For dual-luciferase transient expression assays, we have made promoter constructs carrying the desired site mutations, and used particle bombardment and transient gene expression in living cells to analyze candidate motifs. The complete working pipeline is detailed below for precise cis identification on genes under combinatorial regulation.
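As a small illustration of the mutagenesis design step only, the Python sketch below builds the sequence of a mutant promoter in which every exact copy of a candidate motif is replaced by its transversion (A to C, C to A, G to T, T to G), leaving the promoter length unchanged. The transversion scheme and the function name `mutate_site` are assumptions made for this example; the substitutions used in an actual construct would be chosen case by case.

```python
# Transversion map (purine <-> pyrimidine); an assumed substitution scheme.
TRANSVERSION = str.maketrans("ACGT", "CATG")


def mutate_site(promoter, motif):
    """Return (mutant_sequence, positions) with each motif copy transverted.

    Positions are 0-based starts of the non-overlapping motif copies found
    left to right, which is also how str.replace substitutes them.
    """
    seq = promoter.upper()
    motif = motif.upper()
    mutated = motif.translate(TRANSVERSION)
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + len(motif))
    return seq.replace(motif, mutated), positions
```

Before a construct is made, it is worth checking that the substituted bases do not accidentally create another known binding site, since that would confound the comparison between wild-type and mutant promoters.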