A community resource for high-throughput quantitative RT-PCR analysis of transcription factor gene expression in Medicago truncatula

Background Medicago truncatula is a model legume species that is currently the focus of an international genome sequencing effort. Although several different oligonucleotide and cDNA arrays have been produced for genome-wide transcript analysis of this species, intrinsic limitations in the sensitivity of hybridization-based technologies mean that transcripts of genes expressed at low-levels cannot be measured accurately with these tools. Amongst such genes are many encoding transcription factors (TFs), which are arguably the most important class of regulatory proteins. Quantitative reverse transcription-polymerase chain reaction (qRT-PCR) is the most sensitive method currently available for transcript quantification, and one that can be scaled up to analyze transcripts of thousands of genes in parallel. Thus, qRT-PCR is an ideal method to tackle the problem of TF transcript quantification in Medicago and other plants. Results We established a bioinformatics pipeline to identify putative TF genes in Medicago truncatula and to design gene-specific oligonucleotide primers for qRT-PCR analysis of TF transcripts. We validated the efficacy and gene-specificity of over 1000 TF primer pairs and utilized these to identify sets of organ-enhanced TF genes that may play important roles in organ development or differentiation in this species. This community resource will be developed further as more genome sequence becomes available, with the ultimate goal of producing validated, gene-specific primers for all Medicago TF genes. Conclusion High-throughput qRT-PCR using a 384-well plate format enables rapid, flexible, and sensitive quantification of all predicted Medicago transcription factor mRNAs. This resource has been utilized recently by several groups in Europe, Australia, and the USA, and we expect that it will become the 'gold-standard' for TF transcript profiling in Medicago truncatula.


Background
Legumes are second only to grasses in agricultural importance [1]. They are a mainstay of sustainable agricultural systems because of their ability to reduce atmospheric nitrogen (N 2 ) to ammonia via a symbiosis with bacteria called rhizobia. This provides legumes and subsequent crops with a free and renewable source of nitrogen in lieu of expensive, environmentally-unfriendly fertilizers. Development and differentiation of root nodules, the organ that accommodates nitrogen-fixing rhizobia in legumes, is orchestrated by transcription factors [2][3][4][5][6][7][8][9]. Transcription factors are DNA-binding proteins that regulate the transcription of most, if not all genes [10]. As a result, TFs play central roles in all aspects of plant biology, including development and differentiation of organs and adaptive responses to changes in the environment [11]. Transcription factors as a whole are an important target of plant research because they are a key to understanding the regulation of important plant processes as well as potential tools to optimize these processes for agriculture.
The importance of TFs in plant biology is reflected by the fact that approximately 5% of all plant genes encode such proteins [10]. Thus, even species with relatively small genomes, such as Arabidopsis thaliana contain thousands of TF genes [10]. This presents a real challenge for systematic approaches to decipher the function of TF genes in plants. Classical, 'forward' genetics has uncovered the roles of perhaps a hundred TF genes in Arabidopsis [12] and far fewer in other species [11]. Reverse-genetic approaches, using T-DNA insertion mutants for instance [13], provide a means to decipher in a systematic and relatively rapid manner the function of TF genes/proteins, although gene-redundancy often stymies this enterprise [12]. Another stumbling-block is that phenotypes associated with non-redundant TFs may be subtle in nature.
Transcript profiling can help to uncover the functions of TF genes/proteins by revealing where and when in a plant TF genes are expressed. This information can help direct our attention to particular organs, developmental stages, or conditions under which aberrant phenotypes might become apparent in a TF mutant of interest.
Medicago truncatula is a model legume species that is currently the focus of an international genome sequencing effort [14]. Several generations of cDNA [15] and oligonucleotide arrays [16] have been developed for transcriptome analysis of Medicago truncatula, including mostrecently an Affymetrix GeneChip that contains 51,000 probe-sets representing a large proportion of all the genes in this species [17]. While these tools now provide a means to measure the transcriptional output of a large proportion of genes in Medicago, inherent limitations in the sensitivity of hybridization-based technologies [18] mean that transcripts of a substantial number of genes cannot be detected even when probes for these transcripts are present on the array/chip. Furthermore, expansion of arrays to encompass novel genes uncovered by genome sequencing is not a trivial task. An alternative to arrays that is 2-3 orders of magnitude more sensitive and more flexible in terms of expansion to encompass novel genes is quantitative reverse transcription-polymerase chain reaction (qRT-PCR). Platforms for qRT-PCR analysis of thousands of Arabidopsis and rice TF genes have been developed by us and others [19,20], and utilized to identify TF genes involved in Arabidopsis responses to nutrient stress and pathogen attack [21][22][23][24]. Here we describe a bioinformatics pipeline to identify putative TF genes in Medicago truncatula and to design gene-specific oligonucleotide primers for qRT-PCR analysis of all predicted TF transcripts. Over 1000 TF primer pairs were tested and used to identify sets of organ-enhanced TF genes that may play important roles in organ development or differentiation in this species.

Results and Discussion
Identification of putative transcription factors TF protein families are generally defined by the type(s) of DNA-binding domain they contain and putative TF genes are often identified on the basis of DNA sequences that encode known DNA-binding domains [10,11,25,26]. We utilized this approach to identify putative TFs of Medicago amongst the set of proteins predicted from genomic sequence by the International Medicago Genome Annotation Group (IMGAG). Proteins of IMGAG release 1, which contained over 40,000 predicted proteins, were screened for the presence of known or presumed DNA-binding domains (Table 1), using INTERPRO [27]. Medicago proteins containing putative DNA binding domains and other domains associated with TFs were then used as query sequences in WU-BLASTX [28] which included searches of both the non-redundant DNA database of NCBI [29] and the well-curated protein database, UniProt [30] to check annotations of related proteins in support of tentative Medicago TF assignments. This process resulted in a list of 1045 putative TF genes (see Additional file 1). We utilized genomic sequences rather than the large collection of partial cDNA sequences present in Expressed Sequence Tag (EST) databases for Medicago as the starting point for TF gene discovery because protein sequences derived from genomic sequence are more complete and the set of IMGAG proteins essentially contains no redundancy. Although identification of the 'complete' set of Medicago TFs from IMGAG-annotated proteins will only be possible upon completion of genome sequencing, we expect little or no redundancy in the protein set targeted by our primer set. This approach avoids wasting money on redundant primer sets and re-organization of primers when redundancy is detected, both of which would have

PCR primer design
To ensure maximum specificity and efficiency during PCR amplification of TF cDNA under a standard set of reaction conditions, a stringent set of criteria was used for primer design. This included predicted melting temperatures (T m ) of 58°C to 61°C, limited self-complementarity and poly-X, and PCR amplicon lengths of 100-150 base pairs (bp). Secondary hits were minimized by aligning primer candidates to all known Medicago sequences via WU-BLAST [28] and eliminating primer pairs with multiple potential hits.

PCR primer testing: gene-specificity and amplification efficiency
PCR primers were tested on Medicago cDNA free of genomic DNA contamination as follows. First, total RNA was extracted from various organs using Trizol reagent (Invitrogen GmbH, Karlsruhe, Germany), which yielded high quality RNA as judged by gel electrophoresis and by Agilent 2100 BioAnalyser using RNA 6000 Nano Chips (Agilient Technologies, Waldbronn, Germany). Typical RNA yields ranged from 0.5-1.0 μg RNA/mg fresh mass for nodules and leaves, respectively. Isolated RNA was treated with DNAse I (Ambion, product number 1907) to remove all contaminating genomic DNA, and this was always confirmed by PCR using primers to non-coding regions of the Ubiquitin gene (TC102473; AC137828-19.4). After inactivation of DNAse I, RNA was reverse transcribed using SuperScript III reverse transcriptase (Invitrogen GmbH, Karlsruhe, Germany) and oligo-dT [12][13][14][15][16][17][18] to prime the reaction.
Specificity of PCR primers was assessed in three ways: by melting curve analysis of PCR reaction products; by separating the products of all reactions via electrophoresis in 3% agarose gels; and by sequencing a sub-set of PCR reaction products ( Figure 1). 94.5% (998/1045) of primer pairs gave unique PCR products of the expected size. Only 3.3% (34/1045) of primer pairs yielded no product and 2.2% (23/1045) gave non-specific products. (see Additional file 1). Sequencing was performed on 178 randomly-chosen PCR products amplified from a 1:1 mixture of leaf and root cDNA. In the vast majority of cases (92.7% or 165/178), the sequence of the PCR product was identical to that of the intended target gene. In 5.1% of cases, the amplicon sequence matched multiple related genes, including the target gene, while in only 2.2% of cases the amplicon sequence did not match the target gene sequence.   [19,20].

Selection of reference genes
Reference genes with stable expression/transcript levels throughout development and in the face of environmental challenge are crucial for the normalization of expression data of other genes. Potentially useful reference genes Specificity of transcription factor PCR primers Figure 1 Specificity of transcription factor PCR primers. Specificity was confirmed by dissociation curves with a single peak (A) while double peaks (B) indicated off-target ampification. The derivative of fluorescence intensity is shown on the y-axis. Separation of PCR products on 3% (v/w) agarose gels following electrophoresis (C) confirmed the presence of unique amplicons of the expected size for most reactions. Few reactions yielded no products (indicated by arrow).

Identification of organ-enhanced TFs of Medicago
To get an overview of TF gene expression in Medicago truncatula and to identify TFs induced in specific organs, we used the real-time RT-PCR platform described above. Transcript profiling was performed on six different organs of Medicago (leaves, stems, flowers, pods, roots, and nodules) with three independent biological replicates for each (see Additional file 2). The fraction of genes for which transcripts were detected within 40 cycles ranged from 77.2% in leaves to 90.8% in pods. Transcripts from nearly all putative TF genes (96.8% or 1011/1045) were detected in at least one organ. Genes were called detected if they were expressed in at least two biological replicates with a C T < 40. Approximately half of all TF genes exhibited differential expression during plant development, based on significant differences (p ≤ 0.05) in transcript levels between organs. Few TF genes (1.19% or 12/1011) were Amplification efficiency of transcription factor-specific primer pairs Figure 2 Amplification efficiency of transcription factor-specific primer pairs. Typical real-time RT-PCR amplification plots of 384 TF genes (left) and distribution of PCR efficiencies for all 1045 TF primer pairs (right). expressed exclusively in vegetative organs (leaves, stems, roots or nodules), and even fewer (0.5% or 5/1011) were expressed only in reproductive organs (flowers or pods). A relatively small number of TF genes exhibited greater than ten-fold ratios in expression level in one organ compared to any other organ (Table 3). For comparison, we have included gene expression ratios derived from Affymetrix array data from the same RNA samples. While there is reasonable qualitative agreement between gene expression ratios obtained using the two methods, the lack of quantitative agreement is likely due to the limited sensitivity and low signal to noise ratio near the detection limit of Affymetrix arrays [19]. The genes listed in Table 3 may control development and/or differentiation in Medicago and are interesting targets for future research.

Conclusion
We have established a flexible platform for high-throughput qRT-PCR analysis of Medicago TF gene expression that is based on gene-specific primers arrayed in 384-well plates and SYBR ® Green detection of gene-specific PCR amplicons. Currently, the platform has primer pairs for 1045 TF genes and we have plans to extend this to all pre-dicted Medicago TF genes as genome sequencing progresses. At this stage, the resource has been utilized by several groups in Europe, Australia, and the USA, and we expect it will become the 'gold-standard' for TF transcript profiling in Medicago truncatula.

Plant material and growth conditions
Medicago truncatula cv. Jemalong, line A17 wild type plants were vernalized for 3 days in the dark at 4°C on sterile, wet filter paper. Germinated seedlings were transferred to pots containing Turface (BWI Texarcana, Texarcana, TX). Plants were grown in growth chambers under a 16 h day and 8 h night regime, at 200 μE light intensity, 24°C and 40% relative humidity.
Vegetative organs (leaves, stems, roots, and nodules) were harvested 28 days after planting. Leaf material did not include petioles and stems did not include buds. Roots consisted of the entire root system with laterals. Several plants grown at the same time were pooled for each of the three biological replicates. Biological replicates were planted on separate days. Nodules were harvested from Mean PCR efficiency (E) was determined from three biological replicates of each of six organs, using LinRegPCR [31], which also yielded mean R 2 . N, no corresponding GenBank accession number.
Specificity, efficiency, and reproducibility of PCR primers designed to amplify reference gene transcripts Figure 3 Specificity, efficiency, and reproducibility of PCR primers designed to amplify reference gene transcripts. Specificity of primers was confirmed by the presence of unique amplicons of the expected size following electrophoresis on 3% (v/ w) agarose gels (A) and by dissociation curves with a single peak (B to D). Typical real-time RT-PCR amplification plots of three reference gene transcripts (E to G).
plants inoculated with Sinorhizobium meliloti strain 1021 one and seven days after sowing. Reproductive organs were harvested from plants that were vernalized for two weeks to decrease the time to flowering. Flowers were harvested on the day of opening. Pods were harvested from 1 to 21 days after the appearance of the floral bud to cover a wide range of developmental stages. Harvested plant material was frozen in liquid nitrogen before storage at -80°C.

PCR primer design
The primer design pipeline was implemented in objectoriented PERL modules supported by a MySQL relational database. Primers iterated through three phases before approval: design, specificity, and selection.
The design phase interrogated TF genes with a sliding window 250 bp across that stepped 50 bp along the entire target sequence, generating primer candidates at each window. Experimental conditions, as outlined in the Results section above, were enforced by the following MIT  Primer3  parameters:  PRIMER_MIN_TM  58,  PRIMER_OPT_TM  60,  PRIMER_MAX_TM  61,  PRIMER_SELF_ANY  6,  PRIMER_SELF_END  2,  PRIMER_MAX_POLY_X 3, and PRIMER_PRODUCT_SIZE_RANGE '100-150' [37]. The specificity phase aligned primer candidates via WU-Blast to a database of all known Medicago sequences. The selection phase sorted primer candidates by the number of possible secondary hits, self-complementarity, and poly-X characteristics. Secondary hits were defined as specificity alignments that contained at least one of the terminal ends of the primer and achieved 80% or greater identity over the length of the primer. The sequences of each primer pair are given in Supplementary Material (see Additional file 1).

Real-time PCR conditions and analysis
PCR reactions were carried out in an ABI PRISM ® 7900 HT Sequence Detection System (Applied Biosystems, Foster City, CA, USA). SYBR ® Green was used to quantify dsDNA synthesis. Reactions (5 μl total volume) were performed in an optical 384-well plate containing 2.5 μl 2 × SYBR ® Green Power Master Mix reagent (Applied Biosystems, Warringen, UK), 5 ng cDNA and 200 nM of each gene-specific primer. Primer pairs were aliquoted using a pipetting robot (Evolution P3 liquid handling system, Perkin Elmer, MA, USA) to minimize pipetting errors. cDNA was aliquoted as a master mix of cDNA and 2 × SYBR ® Green Ranking of 8 reference genes in M. truncatula Figure 4 Ranking of 8 reference genes in M. truncatula. Transcript levels of all 8 genes were measured by qRT-PCR, using 18 independent cDNA preparations from six different organs with three replicate measurements of each cDNA preparation. A low value for the average expression stability M, as calculated by geNORM software, indicates more stable expression throughout the various organs. A TF gene was considered organ-enhanced if transcript levels for that gene were more than 10-fold higher in one organ than in any other organ. Transcript ratios were calculated using the mean of three biological replicates for each organ. Data from qRT-PCR are compared with data for the same RNA samples obtained from Affymetrix Gene chips [38]. Affymetrix data were normalized using the Robust Multiarray Average (RMA) method, as described by [38], prior to calculation of ratios. Data in bold represent the lowest transcript ratio of the corresponding gene across all organs. n = not present on Affymetrix chip; a = Ct > 40 in two or three biological replicates of denominator organ; b = transcript called 'absent' by Affymetrix software in two or three biological replicates of denominator organ. L = leaf; S = stem; P = pod; F = flower; R = root; N = nodule. reagent, using an electronic Eppendorf multipipette. Reaction plates were sealed with a transparent adhesive cover before proceeding (Applied Biosystems, Foster City, CA, USA). All templates were amplified using the following standard PCR protocol: 50°C for 2 min; 95°C for 10 min; 40 cycles of 95°C for 15 sec and 60°C for 1 min, and SYBR ® Green fluorescence was measured continuously. Melting curves were generated after 40 cycles by heating the sample up to 95°C for 15 sec followed by cooling down to 60°C for 15 s and heating the samples to 95°C for 15 sec.
Data analysis was performed with the SDS 2.2.1 software (Applied Biosystems). To determine the threshold cycle value (C T ) for each PCR reaction, the threshold (ΔR n ) was set within the logarithmic amplification phase. All amplification plots were analyzed with an ΔR n of 0.2. PCR efficiency (E) was estimated using LinReg software with data obtained from the exponential phase of each individual amplification plot and the equation (1+E) = 10 slope [31].
To compare data from different PCR runs and different cDNA samples, C T values were normalized against the geometric mean of four reference genes (Ubquitin, PPRep, PDF2, and PTB), whose transcript levels were most stable across the biological samples analyzed. The average of the geometric mean of these four genes for all 18 samples was C T 21.23 ± SD1.15. For normalization, the mean reference gene C T value was substracted from the C T value of the TF gene of interest, yielding a ΔC T value. The expression ratios for the identification of organ-enhanced genes were obtained using the following formula on all 30 organ combinations: , where ΔΔC T was calculated by ΔC TA minus ΔC Tb , A and B are averages of three biological replicates of the two organs being compared, and E is the PCR efficiency. Dissociation curves were analysed using SDS 2.2.1 software (Applied Biosystems). RT-PCR products were resolved on 3% (w/v) agarose gels (LE Agarose, Biozym, Oldendorf, Germany) run at 4 V cm -1 in TAE Tris-Acetate-EDTA buffer, along with a 200-bp DNA-standard ladder (Promega GmbH). A subset of 178 RT-PCR products was sequenced at the JC Venter Institute (Rockville, MD, USA).