High throughput procedure
A high throughput method was developed to simplify cloning and transformation relative to traditional gene-by-gene methods such as that used previously. It comprises a Perl script that designs Gateway-compatible primers for cloning the promoter regions of candidate genes; PCR amplification of the promoters, Gateway BP cloning, and Gateway LR cloning, all in 96-well format; Agrobacterium transformation by triparental mating; and floral dipping in 50 ml Falcon tubes (see MATERIALS AND METHODS). The 5' ("left") primer is located at least 2000 bp upstream of the start of the coding sequence of each candidate gene, since many studies have shown that the region 1-2 kb upstream of the translation start site determines the specificity of gene expression [39–44]. The 3' ("right") primers were all located between 50 bp and 150 bp downstream of the translation start site of the target genes, allowing an in-frame fusion with the GAL4-VP16 component of the reporter construct; this fusion contained at least 16 amino acids of the studied gene's coding sequence as well as the 8 amino acid linker from the Gateway system. The script made primer design simple, reliable and consistent.
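The coordinate rules described above can be illustrated with a minimal Python sketch. This is not the Perl script used in the study: the function name, its parameters, and the default of 60 coding bases are hypothetical, and the requirement that the included coding bases be a multiple of 3 is an assumption about how the in-frame fusion is kept.

```python
from dataclasses import dataclass


@dataclass
class PromoterWindow:
    left: int       # coordinate of the 5' ("left") end, in gene orientation
    right: int      # coordinate of the 3' ("right") end, in gene orientation
    coding_bp: int  # coding bases included downstream of the start codon


def promoter_window(atg_pos, strand, upstream_bp=2000, coding_bp=60):
    """Return amplicon bounds around a translation start site.

    atg_pos is the 1-based coordinate of the first base of the start codon.
    Chromosome ends and neighbouring genes are ignored in this sketch.
    """
    if upstream_bp < 2000:
        raise ValueError("left primer must lie at least 2000 bp upstream")
    if not 50 <= coding_bp <= 150:
        raise ValueError("right primer must lie 50-150 bp downstream")
    if coding_bp % 3 != 0:
        # Assumption: a whole number of codons keeps the fusion in frame.
        raise ValueError("coding_bp must be a multiple of 3")
    if strand == "+":
        return PromoterWindow(atg_pos - upstream_bp, atg_pos + coding_bp - 1, coding_bp)
    if strand == "-":
        return PromoterWindow(atg_pos + upstream_bp, atg_pos - coding_bp + 1, coding_bp)
    raise ValueError("strand must be '+' or '-'")
```

With the defaults, a plus-strand gene whose start codon begins at position 10000 yields a window from 8000 to 10059, which includes 20 codons of the gene's coding sequence.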
Positive BP clones were picked and sequenced to confirm the cloned promoter sequences. Residual BP DNA from the sequence confirmation step was used for the Gateway LR cloning reaction in 96-well plate format. The triparental mating method was chosen to transfer the Gateway expression clones from E. coli to Agrobacterium [33, 45], eliminating the need to isolate DNA from the selected clones and to perform subsequent high throughput electroporation. For plant transformation, we cultured only 50 ml of Agrobacterium, which allowed all steps (precipitation and resuspension of Agrobacterium cells and floral dipping of Arabidopsis plants) to be carried out in single 50 ml Falcon tubes. Recently, Davis et al. successfully transformed Arabidopsis by dipping plants directly into Agrobacterium cultures supplemented with surfactant, eliminating the need for media exchange to a buffered solution and further simplifying the transformation process.
An important feature of this pipeline was the creation of a project-specific laboratory information management system (LIMS) in a MySQL relational database to track all stages of the pipeline: candidate genes, primers, the stages of clone construction, transgenic lines, GFP checks at different stages, PCR and sequencing results, and annotation of all the GFP images.
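As a rough illustration of how such tracking can be organized relationally, the sketch below builds a three-table schema. It is not the schema used in this project: the table names, column names, and stage labels are all hypothetical, and Python's built-in sqlite3 module stands in for MySQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical, heavily reduced schema; the project LIMS tracked more entities.
SCHEMA = """
CREATE TABLE gene (
    gene_id TEXT PRIMARY KEY
);
CREATE TABLE construct (
    construct_id INTEGER PRIMARY KEY,
    gene_id TEXT NOT NULL REFERENCES gene(gene_id),
    stage TEXT NOT NULL          -- e.g. 'bp_entry_clone', 'transgenic_line'
);
CREATE TABLE gfp_observation (
    observation_id INTEGER PRIMARY KEY,
    construct_id INTEGER NOT NULL REFERENCES construct(construct_id),
    plant_stage TEXT NOT NULL,   -- developmental stage at screening
    tissue TEXT                  -- annotated site of GFP expression
);
"""


def build_demo_db():
    """Create an in-memory database holding one illustrative record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO gene VALUES (?)", ("AT2G24370",))
    cur = conn.execute(
        "INSERT INTO construct (gene_id, stage) VALUES (?, ?)",
        ("AT2G24370", "transgenic_line"),
    )
    # The plant_stage value here is illustrative only.
    conn.execute(
        "INSERT INTO gfp_observation (construct_id, plant_stage, tissue) VALUES (?, ?, ?)",
        (cur.lastrowid, "flowering", "pollen"),
    )
    return conn


def constructs_at_stage(conn, stage):
    """List gene IDs whose construct has reached the given pipeline stage."""
    rows = conn.execute(
        "SELECT gene_id FROM construct WHERE stage = ? ORDER BY gene_id", (stage,)
    )
    return [row[0] for row in rows]
```

Keeping each pipeline stage as a queryable value is what makes it straightforward to ask which constructs are waiting at a given step.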
This high throughput cloning and data tracking pipeline made our project management efficient and robust. However, out of 627 targeted promoters, only 266 constructs (42.4%) were ultimately transferred into Arabidopsis plants. This overall success rate reflects some failure at each experimental step. For example, the success rate was 74.8% for obtaining entry clones by BP cloning, 94.4% for obtaining expression clones from entry clones by LR cloning, 79.7% for obtaining Agrobacterium clones by triparental mating, and 75.4% for obtaining kanamycin-resistant transgenic plants by floral dip. Because of the high throughput nature of this project, we have not yet repeated any experimental step. The overall success rate should increase if the unsuccessful clones at each step are reprocessed.
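The relationship between the per-step rates and the overall yield can be checked with a short calculation. The sketch below assumes that each reported rate is conditional on success at the preceding step, so that the rates multiply; the variable and function names are illustrative.

```python
from math import prod

# Per-step success rates as reported in the text.
STEP_SUCCESS = {
    "BP cloning (entry clones)": 0.748,
    "LR cloning (expression clones)": 0.944,
    "triparental mating (Agrobacterium clones)": 0.797,
    "floral dip (kanamycin-resistant plants)": 0.754,
}


def cumulative_success(rates):
    """Multiply step-wise success rates into an overall rate."""
    return prod(rates)


overall = cumulative_success(STEP_SUCCESS.values())
observed = 266 / 627  # constructs in plants / promoters targeted

print(f"product of step rates: {overall:.1%}")   # 42.4%
print(f"observed 266/627:      {observed:.1%}")  # 42.4%
```

Under that assumption, the product of the four step rates agrees with the observed 266 of 627 to within rounding, so these four steps account for essentially all of the attrition.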
Expression Pattern Analysis
To confirm that the expression patterns derive from the intended cloned promoters, vector-based primers flanking the cloning site were used to amplify the cloned promoters from transgenic plants showing GFP expression, and the PCR products were sequenced. For all 27 different constructs tested that showed GFP expression, the promoter was verified as correct. However, since not all lines were tested, researchers may wish to perform their own confirmation before using our lines. A total of 112 promoter-reporter constructs show the same expression in more than one plant. For 79 of these, the same GFP expression patterns were observed in transgenic lines derived from independent floral dips; the other 33 produced the same GFP expression patterns in plants from separate seeds borne by a single dipped plant. These 112 constructs are thus the most reliable data set. It has been shown that female reproductive tissues are the primary target of Agrobacterium-mediated transformation and that the transformants derived from the same seed pod contain independent T-DNA integration events.
The validity of the specific patterns of GFP expression from a representative set of promoters was confirmed by quantitative real-time PCR (qRT-PCR) on RNA samples from multiple tissues (additional file 4: Table S4). In every case, the tissue showing the highest expression (lowest Ct) by qRT-PCR was the one in which GFP expression was observed, and in almost every case this expression value was many times higher than in any of the other tissues examined.
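Because Ct is a logarithmic measure, a modest Ct difference corresponds to a large fold difference in transcript abundance. The sketch below shows the conversion under two simplifying assumptions: perfect doubling per PCR cycle, and no normalization to a reference gene. The function name and the Ct values in the usage note are hypothetical and are not taken from Table S4.

```python
def fold_vs_highest(ct_by_tissue, efficiency=2.0):
    """Express each tissue relative to the tissue with the lowest Ct.

    Assumes the template amount is multiplied by `efficiency` in every
    cycle, so a difference of n cycles is an efficiency**n fold difference.
    """
    min_ct = min(ct_by_tissue.values())
    return {
        tissue: efficiency ** -(ct - min_ct)
        for tissue, ct in ct_by_tissue.items()
    }
```

For illustrative Ct values of 22.0, 27.0 and 30.0 in three tissues, the relative levels are 1, 1/32 and 1/256, so the first tissue would have 32-fold and 256-fold more transcript than the other two.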
In this study, we checked GFP expression at 4 different stages: on the selection plate around 10 days after germination, at the rosette stage in soil, just before flowering, and at the flowering stage. These stages were chosen to cover several developmental stages, to make the large amount of GFP screening practical, and to minimize stress on the T0 transgenic plants (e.g. by checking root GFP on the selection plates and during transplanting to soil). If the kanamycin-selected transgenic plants did not show GFP expression at any of the stages examined, PCR with GFP-specific primers was used to confirm the presence of the reporter transgene. Of the 256 plants, representing 89 promoter-reporter constructs without GFP expression, that we tested by leaf PCR, all were positive with GFP primers. There are several possible reasons for the lack of detectable GFP expression in these lines. The promoter might be active only under conditions or at specific developmental stages not examined in this study. Alternatively, in contrast to the localized expression seen with many of the promoters, those without visible GFP expression may in fact be expressed in the plant but at levels too low to be detected by this method. It is also possible that some of the promoter-reporter constructs were truncated or rearranged during T-DNA integration, or that gene silencing occurred. In addition, the inconsistent GFP expression patterns detected among different transgenic lines of 5 promoter constructs may be due to position effects, to truncation or rearrangement of the constructs during transformation, or to human error.
The goal of this project was to use the expression of promoter-reporter constructs in transgenic plants to infer the function of these genes with no or low detected expression. Promoters from 35 of the genes tested drove GFP expression in hydathodes, secretory structures on leaf margins. An example of hydathode expression from the promoter-reporter construct of gene AT02EUG13430 is shown in Figure 3K. Studies have shown that some genes expressed in hydathodes are related to plant tolerance to toxicity. For example, the Bot1 gene in barley is responsible for boron-toxicity tolerance, the MTP11 gene in Arabidopsis is associated with plant tolerance to manganese, and AtHMA3, a P1B-ATPase protein, plays a role in the detoxification of heavy metals. AtCML9, a calmodulin-like protein from Arabidopsis thaliana, can alter plant responses to abiotic stress and abscisic acid, and the expression of its promoter-reporter construct also included hydathodes. In addition, hydathodes are one of the expression locations of a promoter-reporter construct from ECA3, a Golgi-localized P2A-type ATPase that plays a crucial role in manganese nutrition in Arabidopsis. Thus, it is possible that some of the genes of unknown function analyzed in this study that show hydathode expression are also involved in tolerance or detoxification pathways, suggesting a direction for further study. A motif search with Multiple Em for Motif Elicitation (MEME) across all promoter sequences with hydathode expression found the motif CTTAAGA (P = 8.67e-09). However, its function and specificity will require experimental verification.
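Once a candidate motif has been reported, a simple way to see where it occurs in a given promoter is an exact-match scan of both strands. The sketch below does only that; it is not the MEME algorithm, which fits probabilistic motif models rather than matching a fixed string, and the function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def motif_hits(seq, motif):
    """Return (0-based position, strand) for each exact motif occurrence.

    A "-" hit means the reverse complement of the motif was found at that
    position on the given strand. Palindromic motifs are reported as "+".
    """
    seq = seq.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits
```

For example, `motif_hits("GGCTTAAGACC", "CTTAAGA")` returns `[(2, "+")]`, and a sequence containing TCTTAAG is reported as a minus-strand hit.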
Twenty-six promoter-reporter constructs expressed GFP in the abscission zones of siliques, flowers and leaves, including expression around the flower abscission zone from the construct of gene AT4G18395 (Figure 3J). Abscission is a physiological process that involves the programmed separation of entire organs, such as leaves, petals, flowers, and fruit; it allows plants to discard nonfunctional or infected organs and promotes dispersal of progeny. Promoter-reporter constructs from a number of confirmed abscission-related genes, including BOP1, BFN1 [55, 56], HAE, HSL2, MKK4 and MKK5, and AtZFP2, show similar expression at abscission zones. MEME identified the sequence TAACCACTCA as the most significant motif among the promoters analyzed in this study.
Thirty-six promoter-reporter constructs were expressed in trichomes or in the socket cells that surround and support a trichome, suggesting a possible function of the corresponding genes in trichome development, expansion and branching. Many promoter constructs in our study were expressed in specific floral organs, including sepal, petal, filament, anther, carpel, and pollen. For example, pollen-specific GFP expression was detected from the construct of gene AT2G24370 (Figure 3I, L). In addition to providing clues to gene function, these expression patterns may also provide novel promoters for plant genetic engineering. For example, it has been shown that completely sterile Arabidopsis plants can be generated by engineering genes with carpel- and stamen-specific expression. Use of the Ory s1 promoter (a pollen-specific promoter) with antisense Lol p5A cDNA led to the production of hypoallergenic rye grass (Lolium perenne).
Overall, in our study, positive transgenic plants were obtained from 266 promoter constructs derived from our intergenic and non- or low-expressing genes of unknown function. Among them, about 56% of constructs showed GFP expression in Arabidopsis. Thus the in vivo expression data from the promoter-reporter constructs generated in this study provide insights into the possible functions of many genes that previously lacked both expression data and functional annotation, as well as a valuable gene expression resource for the research community.