ModuleFinder and CoReg: alternative tools for linking gene expression modules with promoter sequences motifs to uncover gene regulation mechanisms in plants
© Holt et al; licensee BioMed Central Ltd. 2006
Received: 05 November 2005
Accepted: 11 April 2006
Published: 11 April 2006
Uncovering the key sequence elements in gene promoters that regulate the expression of plant genomes is a huge task that will require a series of complementary methods for prediction, substantial innovations in experimental validation and a much greater understanding of the role of combinatorial control in the regulation of plant gene expression.
To add to this larger process and to provide alternatives to existing prediction methods, we have developed several tools in the statistical package R. ModuleFinder identifies sets of genes and treatments that we have found to form valuable sets for analysis of the mechanisms underlying gene co-expression. CoReg then links the hierarchical clustering of these co-expressed sets with frequency tables of promoter elements. These promoter elements can be drawn from known elements or all possible combinations of nucleotides in an element of various lengths. These sets of promoter elements represent putative cis-acting regulatory elements common to sets of co-expressed genes and can be prioritised for experimental testing. We have used these new tools to analyze the response of transcripts for nuclear genes encoding mitochondrial proteins in Arabidopsis to a range of chemical stresses. ModuleFinder provided a subset of co-expressed gene modules that are more logically related to biological functions than did subsets derived from traditional hierarchical clustering techniques. Importantly ModuleFinder linked responses in transcripts for electron transport chain components, carbon metabolism enzymes and solute transporter proteins. CoReg identified several promoter motifs that helped to explain the patterns of expression observed.
ModuleFinder identifies sets of genes and treatments that form useful sets for analysis of the mechanisms behind co-expression. CoReg links the clustering tree of expression-based relationships in these sets with frequency tables of promoter elements. These sets of promoter elements represent putative cis-acting regulatory elements for sets of genes, and can then be tested experimentally. We consider that these tools, both built on an open source software product, provide valuable alternatives for the prioritisation of promoter elements for experimental analysis.
The regulation of gene expression is one of the most intensively studied areas of biology. The regulation of transcription, the first committed step in gene expression, is achieved via the interaction of transcription factors with cis-acting regulatory elements (CAREs). A complete understanding of the interaction between transcription factors and regulatory sequences will ultimately lead to a picture of the regulatory networks operating in a biological system. Genome-wide studies on the expression of transcription factors are currently underway in attempts to gain data that can be used to understand the complex nature of gene regulation that exists to coordinate cellular functions [2–4]. The structure of such regulatory networks (multi-component regulatory factors that have overlapping but also discrete activities) for a plant can begin to be hypothesized by considering how the ~1,500 transcription factors in Arabidopsis act in a combinatorial manner to regulate the 28,000 or more genes [5–7].
The completion of the Arabidopsis nuclear genome sequence means that the analysis of plant gene expression has changed from probing the expression of a single or few genes at a time to simultaneous analysis of the expression of virtually every gene. This change in the amount of data available represents a considerable challenge for biologists to extract knowledge from these data and use it in a productive manner to investigate the mechanisms underlying gene regulation, i.e. the further dissection of a complex network of combinatorial control.
The analysis of Arabidopsis microarray expression data sets can be carried out at any scale from single genes to whole genomes. At a single gene level many researchers can simply look up how their gene or genes of interest are changing under a large number of conditions. This approach has been facilitated by the use of tools such as Genevestigator, which enables complex array data to be easily interrogated for a gene of interest. At a wider genome level, hierarchical clustering has been applied to complete genome transcriptomic data during growth and development [10–13], following various biotic and abiotic treatments [14–16] and after alterations in transcript abundances due to changes in nutrient availability. Development of analysis packages such as MAPMAN has allowed plant biologists to visualize transcriptomic data on metabolic pathways, which should lead to a greater understanding and use of transcriptomic data.
Even though large-scale analyses like those above can identify, and have identified, novel associations of biological significance, the clustering methods used also tend to split or miss relationships in such data. The transcripts from a group of genes may respond to a number of parameters in a similar manner, but in additional treatments their response may differ. In a hierarchical cluster analysis of all these treatments the relationship between these genes will often be masked and they will be separated to different parts of the clustering tree. This loss of association is further compounded by the fact that clustering of gene expression data is often carried out with the intent to identify co-expressed genes and then to use these data to elucidate the regulation of these genes, i.e. to identify CAREs and the transcription factors that bind them. As transcription factor binding sites are small in size (6 to 10 bp) compared to the large number of DNA bases in promoter regions, there is a significant challenge in identifying these regions of important sequence. Direct experimental confirmation requires considerable effort, so computational efforts to identify the most likely putative CAREs are essential. The identification of similar CAREs in co-expressed genes thus becomes crucial as it will determine the quality of input for such analysis.
An alternative approach to hierarchical clustering for analysing array expression data is to define associations based on similarities in transcript abundance in a subset of treatments. Such two-way clustering, or biclustering, uses iterative approaches to define relationships between subsets of genes and subsets of treatments. This approach has been most widely used in the analysis of transcript datasets from cancer samples [19–25]. Various approaches such as the progressive iterative signature algorithm (PISA), gene expression mining server (GEMS), coupled two-way clustering (CTWC) and X-Motifs use this principle to search for relationships that go largely undetected using hierarchical clustering.
We have taken a biclustering approach to identify co-expressed genes and to predict their CAREs. Firstly, we have reduced the number of genes analyzed by using only a subset, in this example those that encode proteins located in mitochondria [29, 30]. Secondly, we have identified genes that are co-expressed in response to subsets of treatments using a novel approach via a tool we have developed and named ModuleFinder. The pattern of co-expressed genes produced in ModuleFinder can be exported to visualize functional groups in tools such as MAPMAN. To predict CAREs we have used the hierarchical clustering produced in ModuleFinder and the assumption that the resulting hierarchical tree structure of the expression data is a reflection of patterns of CAREs in promoter regions. Thus the hierarchical relationships identified based on the expression data can be used to identify these promoter elements. We have developed a tool named CoReg to undertake this CARE prediction.
Results and discussion
Existing approaches are not well suited to identifying shared responses among numerous non-linear-related treatments
We have thus developed ModuleFinder in R with the aim of identifying gene expression modules in a way that facilitates the subsequent interpretation of results. The method was designed to allow easy visualization not only of the expression patterns of discrete modules, but also of the relationships between the modules. The aim of ModuleFinder is to identify gene expression responses that are shared among subsets of treatments and genes; the approach is to first identify gene clusters that are co-expressed in a small subset (often a pair) of treatments, then look for other treatments in which these gene clusters are expressed in a similar co-ordinated manner. This approach ignores the differences in treatment effects and focuses on the shared effects on gene expression, which are expected to be related to the activation of common gene regulatory pathways.
The ModuleFinder algorithm
ModuleFinder takes as its input a matrix of expression data from a set of experiments, for example the set of average log expression ratios for genes from a range of experimental treatments compared to a control. It also requires a matrix of p-values associated with each data point, providing an assessment of how likely it would be to observe the gene expression values if there was really no change in experimental compared to control conditions. P-values may be calculated from the original expression measures via an appropriate statistical method, e.g. t-tests.
The ModuleFinder algorithm can be run in either a supervised or unsupervised fashion. In an unsupervised run, the algorithm first searches for pairs of experiments in which gene expression was similar (i.e. highly correlated), then builds gene expression modules based on these correlated pairs. On the other hand, the user can identify a particular subset of experiments they are interested in, and run the algorithm in a supervised manner by specifying the names of the experiments to provide an initial subset. This initial set will then be added to by iterative additions of related experiments.
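The unsupervised starting step described above can be illustrated with a short sketch. This is not the authors' R code: it is a Python illustration that assumes Pearson correlation as the similarity measure (the text says only "highly correlated"), and the function names and toy values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_seed_pair(log_ratios):
    """Return (correlation, experiment_a, experiment_b) for the most
    highly correlated pair of experiments.

    log_ratios maps experiment name -> list of log expression ratios,
    one per gene, with genes in the same order for every experiment.
    """
    scored = [
        (pearson(log_ratios[a], log_ratios[b]), a, b)
        for a, b in combinations(sorted(log_ratios), 2)
    ]
    return max(scored)


# Hypothetical toy data: four genes measured in three treatments.
toy = {
    "rotenone": [1.2, 0.9, -1.1, -0.8],
    "salicylic_acid": [1.0, 1.1, -0.9, -1.0],
    "chitin": [-0.2, 0.8, 0.3, -0.6],
}
r, exp_a, exp_b = best_seed_pair(toy)
```

In a supervised run this search would be skipped and the user-named experiments would form the initial subset directly.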
The main output of ModuleFinder is a PDF file containing clustering trees and expression heat maps of the modules produced after the addition of each new experiment. It also includes pie charts displaying the breakdown of each module according to the functional categories of its member genes (Figure 2B). In addition, cluster files are written at each stage for easy viewing of clusters, heat maps and gene annotations in tree viewing programs compatible with TreeView and its Java versions, which can run on any platform. Excel-compatible, comma-separated files containing the expression data for the subsets of genes and experiments are also saved at each stage (Figure 2A).
Using ModuleFinder to identify modules within the expression of a set of nuclear genes encoding mitochondrial proteins (NGMP)
Visualization of ModuleFinder sets in MAPMAN
Building a framework for understanding the biological implications of the gene regulation observed
Combining the MAPMAN overview with a more detailed analysis using the wider literature provided an even deeper view of the biological response to rotenone and salicylic acid, showing this process was helpful for a biologist's interpretation of the dataset. Rotenone is an inhibitor of complex I function, thus preventing matrix-located NADH from the TCA cycle entering the classical respiratory chain. Salicylic acid can have a similar effect, as it appears that along with its defence signalling functions this compound can inhibit the respiratory chain in plants. This effect appears to be through inhibition of the dehydrogenases of the mitochondrial electron transport chain. Induction of the Aox and NADH dehydrogenase is the clearest direct response to this targeted inhibition of mitochondrial function evident from both types of cluster analysis (Figure 3). Using the classical cluster analysis it appeared that the up-regulation of gene expression in response to respiratory poisons was split, in clusters 2, 4, 5, 15 and 16, and down-regulation split into 1, 3, 10 and 14 (Fig 3A, Supplementary Figure 1A). Many of the genes in cluster 9 and 15 are involved in protein synthesis or mitochondrial biogenesis (Supplementary Figure 1B). We have previously reported that changes in protein import into mitochondria and a general up-regulation of genes encoding components involved in mitochondrial biogenesis occur as a result of chemical and environmental stresses [36, 37].
Using ModuleFinder a larger picture of the effects of these chemical stresses on the expression of mitochondrial components becomes evident. In the defined subset of co-expressed genes the induction of the alternative transport chain components is coupled to the induction of transcripts encoding eight different substrate dehydrogenases, providing new avenues for NADH generation, or in the case of the electron transfer flavoprotein (At1g50940), provision of electrons to ubiquinone. Significantly, the new carbon substrates for these NADH generating pathways, while including the organic acids of the TCA cycle, are likely to be generated by catabolism of amino acids. Enzymes involved in valine, isoleucine, cysteine, tyrosine, alanine and glutamate catabolism are induced. Concomitant with this change in substrate for energy generation is the up-regulation of transcripts for four mitochondrial carrier proteins, most of unknown function. Down-regulation is observed for components of the classical electron transport chain complexes I and III, a separate set of five mitochondrial substrate carriers (most of unknown function) and lipid biosynthesis pathways for phosphatidylglycerol and phosphatidylethanolamine. Interestingly, both genes for NAD-malic enzyme (At4g00570, At2g13560) are down-regulated. This protein normally bridges the TCA cycle to allow the anaplerotic removal of organic acids for functions elsewhere in the cell. Together the insights from this analysis suggest that these simple chemical inhibitors initiate the signals for a complicated re-organisation of mitochondrial function within the plant cell that can now be investigated independently.
Searching for common regulatory elements in the promoters of co-expressed genes
Genes whose transcription is co-ordinately regulated may exhibit co-ordinated expression patterns. Thus co-expression of a group of genes may be indicative of co-regulation at the transcriptional level. To determine whether this is the case for a given cluster of co-expressed genes, such as those shown above, the promoter regions of the genes need to be analyzed. Transcription factors (TFs) bind to specific DNA sequences, which are usually only 6 to 10 base pairs long. These short sequences are often referred to as promoter motifs or sequence elements. Transcriptional regulation in eukaryotes most often occurs through the combinatorial action of multiple TFs [1, 39, 40]. For example, the induction and repression of Arabidopsis genes in response to red and blue light or abscisic acid (ABA) is dependent on combinations of multiple light-responsive or ABA-responsive promoter elements [41, 42]. It is therefore expected that the promoter regions of co-expressed genes may share numerous TF binding sites, including some that are also present in the promoter regions of genes whose expression patterns are quite different. A limitation of this type of approach is that genes may be regulated by the same transcription factor(s) but display different patterns of transcript abundance because the post-transcriptional processes that affect their transcript stability may differ.
Aims of promoter analysis
to identify promoter sequence elements (possible TF binding sites) that are common to genes within a module,
to identify promoter sequence elements that are common to up-regulated genes or downregulated genes but not both,
to identify combinations of promoter sequence elements that are common within a module but not shared by other modules, and
to use the identified promoter motifs to construct testable hypothetical models of gene regulation that explain observed expression patterns in terms of patterns of regulatory elements.
Various motif recognition tools are available which can identify promoter sequence elements that are common among a group of genes, many of them available as web-based programs [43, 44]. However this becomes difficult when there are large numbers of large groups to be analyzed, as the processing times for these programs generally increase exponentially with the number of sequences taken as input data. Assuming such programs could be employed, it would be possible to build up a model of the regulatory network responsible for observed patterns of gene expression by applying these tools repeatedly to gene clusters defined by cluster analysis or gene modules defined by ModuleFinder analysis. Unfortunately such a process would be time consuming and error-prone. The identification of motifs conserved in multiple sequences is a complicated computing task and can consume significant processing time. To achieve the aims outlined above, this task must be repeated for each module and subset of modules and each potential motif would then have to be searched against all the other promoter sequences. Keeping track of module memberships and relationships, promoter sequences and motifs is a complicated task in itself. If this involves using the current web-based tools it requires considerable uploading, copying and pasting of gene lists and sequences, which can also introduce errors. A more attractive alternative is to try to identify sequence elements whose presence in gene promoter regions can be correlated with observed gene expression levels. This approach was implemented using clustering-based methods in a novel tool called CoReg (Co-Regulation of Co-Expressed Genes) to undertake promoter analysis by deducing models of gene co-regulation to explain observed patterns of gene co-expression.
The CoReg algorithm
The frequency of each of the identified sequence elements in each of the gene groups is then calculated, and displayed as a greyscale heatmap (dubbed frequency map) in which black corresponds to a frequency of 80–100%, shades of grey intermediate values and white 0–20% (Figure 5B–C). The gene groups are then clustered according to the frequencies of the identified elements in the promoter regions of their member genes (Figure 5C). At this point, the algorithm has done the bulk of its work, and it is up to the user to drive the selection of a final subset of the identified sequence elements. The user can choose to try random subsets of sequence elements, chosen by CoReg using random sampling methods, or can select their own subsets to try. For each subset of sequence elements, the image window is updated to display a frequency map for the subset of elements, and a hierarchical tree showing the gene groups clustered according to these frequencies. The aim here is to try to find a subset of sequence elements such that, when the gene groups are clustered according to the frequencies of the elements in the promoter regions of their member genes, the resulting tree has the same structure as the expression-based hierarchical clustering tree. It can then be proposed that the selected sequence elements capture the structure of the observed gene expression patterns, and it can be hypothesised that the sequences correspond to regulatory elements that are responsible for these patterns of gene expression. Experiments may then be designed to test these hypotheses in the laboratory.
While the criterion of tree matching provides a good visual cue to spot relationships between gene expression and the occurrence of sequence elements, it is up to the user to decide when they have found a set of sequence elements that might explain the observed expression patterns. The frequency maps themselves provide visual cues, helping the user to spot other patterns that may be useful. Thus, rather than providing the user with a definitive list of promoter elements that might be regulatory, CoReg is a tool for the user-driven exploration of patterns relating gene co-expression and co-regulation. CoReg scans for the specific elements present and thus will not identify degenerate elements.
Using CoReg to identify putative sequence elements in subsets of co-expressed nuclear genes encoding mitochondrial proteins
We have used CoReg to analyze the gene expression modules identified by ModuleFinder analysis as described in the example above (Figure 3B). To do this, the file containing expression data for the 51 genes in the module, created during the ModuleFinder run, was loaded into CoReg along with the promoter sequences for these genes. The built-in list of hexamers was taken as the list of sequence elements for the search. The hierarchical clustering tree was broken down into eight groups – four up-regulated (Group 1 to 4) and four down-regulated (Group 5 to 8) in response to the various treatments. The resulting tree is shown in Figure 6A. The maximum frequency tolerance was set to 0.35 and the characteristic frequency tolerance to 0.1, meaning that at each split in the tree, any sequence element present in promoters of >65% of the genes in one group but <35% of the other group would be identified as interesting, as would any sequence elements with a frequency of >90% in one group but <10% in all other groups. A subset of 6 of these elements was identified which resulted in a clustering of the gene groups that was quite similar to the expression-based clustering (Figure 6A). This suggests that although the element-based tree did not precisely match the expression-based tree, the uniqueness of each group's expression pattern is reflected in the uniqueness of its promoter composition, relative to the other groups. Therefore the high frequency of the elements TTCTGC and ATGTAC correlates with the down regulation of modules SR 5 to 8, while the high frequency of AAAAGC, TTCCAG and AACTAT correlates with the up regulation of modules SR 1 to 4. GATGAC is present in all except the most highly downregulated module SR5.
These correlation patterns can then be used to model gene regulatory networks that can be prioritised for experimental testing (Figure 6B). Of the six elements chosen to define the expression patterns obtained from the microarray analysis two have been previously identified to be involved in regulation of gene expression. The motif GATGAC, identified in CoReg analysis as a regulatory element present in all except the most highly downregulated module SR5, is part of two regulatory elements documented in the PlantCARE database: the As-1-box of tobacco (PlantCARE ID: NT~as-1-box) and OCS-element of Arabidopsis (PlantCARE ID: AT~ocs-element). These were both identified as being involved in the induction of gene expression in response to salicylic acid, auxin and oxidative stress [50–54]. The alternative oxidase gene (Aox1a) is a member of SR4, contains this GATGAC element and transcript abundance of Aox is known to be induced by salicylic acid in several species [35, 55, 56].
Using a large number of plant microarray analyses to help pinpoint the mechanisms of gene regulation is limited by the range of tools currently available. We have developed ModuleFinder to identify sets of genes and treatments that in our hands contain more biologically related functions for analysis of the mechanisms behind co-expression in non-linear-related sets. We then developed CoReg to link the clustering tree of expression-based relationships in these gene sets with frequency tables of promoter elements. These sets of promoter elements represent putative CAREs for sets of genes, and can then be tested experimentally. We consider that these tools, both built on an open source software product, provide a valuable alternative to those widely available for the prioritisation of promoter elements for experimental analysis.
Data sources and processing
The changes in gene expression in response to the addition of various compounds to Arabidopsis suspension cells were measured as outlined previously. Data for the addition of chitin to 50 mg/mL (Sigma, Sydney) and flagellin22 peptide to 1 μM (Auspep, Parkville, Victoria) are included here and arrays were carried out as described in Clifton et al. 2005. Average gene expression levels were calculated across replicate chips; in each case, a minimum of two replicates was available. For each experimental variable or time point, the log ratio of expression under experimental conditions to appropriate control conditions was determined for each gene. These log ratios formed the input for ModuleFinder and CoReg analysis. Only a subset of the >22,000 genes on the Affymetrix gene chips was analyzed in the examples presented here. This gene subset comprised 374 genes, derived from a set of proteins identified in isolated Arabidopsis mitochondria by liquid chromatography-tandem mass spectrometry. For CoReg analysis, promoter sequences were taken as the 3000 base-pair sequences upstream of each gene, retrieved from TAIR.
Programming in R
ModuleFinder and CoReg were developed in R, a computer language and environment for statistical computing. An advantage of R is that it is available as free software and runs on a wide variety of UNIX platforms and similar systems, Windows and MacOS. Most importantly, R provides a variety of built-in statistical and graphical techniques, including a variety of cluster analysis methods and facilities for displaying cluster trees and heat maps, while also allowing users to extend R's capabilities by defining their own functions.
Statistical methods used in ModuleFinder
ModuleFinder filters out genes whose expression did not change significantly in every experiment of the initial subset. This is done by considering a matrix of p-values provided by the user, which reflects the results of a test for differential expression (including correction for multiple testing if appropriate), and filtering out all genes whose p-values are above a user-defined cut-off in any of the experiments in the subset. The default p-value cut-off is 0.05, but can be set by the user to any value between zero and one. In the examples presented here, the p-values used were derived from two-sided t-tests comparing the robust multiarray analysis-processed expression measures from replicates of control and experimental conditions. In each case, a minimum of two replicates was available.
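The filtering rule above can be sketched as follows. This is a minimal Python illustration rather than the R implementation; the function name, the dictionary layout and the gene identifiers are hypothetical.

```python
def filter_genes(p_values, experiments, cutoff=0.05):
    """Keep genes whose p-value is at or below the cut-off in every
    experiment of the subset; a gene above the cut-off in any one
    experiment is dropped.

    p_values maps gene id -> {experiment name: p-value}.
    """
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must lie between zero and one")
    return [
        gene
        for gene, by_experiment in p_values.items()
        if all(by_experiment[exp] <= cutoff for exp in experiments)
    ]


# Hypothetical p-values for three genes in two experiments.
toy_p = {
    "gene_a": {"rotenone": 0.001, "salicylic_acid": 0.020},
    "gene_b": {"rotenone": 0.010, "salicylic_acid": 0.300},
    "gene_c": {"rotenone": 0.040, "salicylic_acid": 0.049},
}
kept = filter_genes(toy_p, ["rotenone", "salicylic_acid"])
```

Here `gene_b` is removed because it exceeds the cut-off in one of the two experiments, even though it passes in the other.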
ModuleFinder uses R's hclust function for hierarchical clustering of genes based on the expression values provided in the input expression data file. The default clustering method uses a Euclidean distance measure and the Ward linkage method, but can be set by the user to any of the hierarchical clustering methods available in R. (These include Minkowski, Canberra, maximum, minimum and Manhattan distances, and the complete, single, average, centroid and McQuitty methods of linkage.)
Having defined modules containing genes that are co-ordinately expressed in response to a subset of experiments, ModuleFinder searches for further experiments in which these modules also display co-ordinated expression responses. For each experiment not already in the module, the variance of the gene expression measures within each module is calculated using the var function in R ($\mathrm{var}(x_1, \ldots, x_n) = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n-1)$). A small within-module variance can be interpreted as a high level of co-expression among the genes in the module. The sum of these within-module variance measures is calculated as an overall measure of how well gene expression in the experiment fits the set of modules. A measure of between-module variance is also calculated for each experiment (between-module variance $= \sum_{i} (\bar{x}_{\text{module } i} - \bar{x}_{\text{all modules}})^2$). Large values here indicate that the modules had distinct expression patterns in the experiment. The experiment that most closely 'fits' the module structure will display co-ordinated gene expression within modules and, ideally, distinct patterns of gene expression between modules. That is, it will have small within-module variances and a large between-module variance. The algorithm thus looks for the experiment with the highest ratio of between-module variance to sum of within-module variances.
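The scoring of candidate experiments can be sketched in a few lines. This Python illustration is not the authors' R code and makes two labelled assumptions: the "mean of all modules" is taken as the mean of the module means, and every module contains at least two genes with non-identical values (otherwise the summed within-module variance could be zero). Function names and toy values are hypothetical.

```python
from statistics import mean, variance


def fit_score(expression, modules):
    """Ratio of between-module variance to summed within-module variance
    for one candidate experiment.

    expression maps gene id -> log ratio in the candidate experiment.
    modules is a list of gene-id lists, each with at least two genes.
    """
    # Sample variance (n - 1 denominator), as computed by R's var().
    within = sum(variance([expression[g] for g in module]) for module in modules)
    module_means = [mean(expression[g] for g in module) for module in modules]
    # Assumption: the overall mean is the mean of the module means.
    grand_mean = mean(module_means)
    between = sum((m - grand_mean) ** 2 for m in module_means)
    return between / within


def best_experiment(candidates, modules):
    """Name of the candidate experiment with the highest fit score."""
    return max(candidates, key=lambda name: fit_score(candidates[name], modules))


# Hypothetical data: two modules of two genes, two candidate experiments.
toy_modules = [["g1", "g2"], ["g3", "g4"]]
toy_candidates = {
    "treatment_x": {"g1": 1.0, "g2": 1.2, "g3": -1.0, "g4": -0.8},
    "treatment_y": {"g1": 1.0, "g2": -1.0, "g3": 0.5, "g4": -0.5},
}
chosen = best_experiment(toy_candidates, toy_modules)
```

In the toy data `treatment_x` is tight within each module and well separated between modules, so it scores far higher than `treatment_y`, where the genes of each module move in opposite directions.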
The primary data set used by CoReg is a table representing the incidence of each of a list of potential sequence elements (e.g. hexamers, known motifs) in a list of gene promoters. This is a table of sequence elements on the horizontal axis, gene names on the vertical axis and values of TRUE or FALSE indicating whether or not the element was found in a search of the gene's promoter sequence. String matching is used to search for sequence elements in promoter sequences. This table can be prepared independently, or CoReg can build one from a list of sequence elements and a file containing gene promoter sequences in FASTA format input by the user.
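A table of this kind can be sketched as follows. This is a Python illustration under stated assumptions, not CoReg itself: it matches exact strings on the given strand only (the text does not say whether the reverse complement is searched), it takes promoter sequences already loaded into a dictionary rather than parsing FASTA, and the function names, gene identifiers and sequences are hypothetical.

```python
from itertools import product


def all_kmers(k=6, alphabet="ACGT"):
    """Every possible element of length k, e.g. all 4096 hexamers."""
    return ["".join(letters) for letters in product(alphabet, repeat=k)]


def incidence_table(promoters, elements):
    """Return gene id -> {element: True/False}, where True means the
    element occurs as an exact substring of the gene's promoter.

    promoters maps gene id -> promoter sequence (upper or lower case).
    """
    return {
        gene: {element: element in sequence.upper() for element in elements}
        for gene, sequence in promoters.items()
    }


# Hypothetical promoter fragments for two genes.
toy_promoters = {
    "gene_a": "ccGATGACtt",
    "gene_b": "AAAAGCTTTT",
}
table = incidence_table(toy_promoters, ["GATGAC", "AAAAGC"])
```

Passing `all_kmers(6)` as the element list would give the exhaustive hexamer search; a list of known motifs can be supplied in the same way.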
The user is then asked to input a table of expression data. The genes in this table must appear in the incidence table, and must be labelled in the same way (e.g. AGI locus identifier). CoReg, like ModuleFinder, uses R's hclust function for hierarchical clustering of genes based on the expression values in this table. Distance and linkage methods can be set by the user to any of those available in R (see above). The user is also asked to indicate branches defined by the tree that they consider to be gene expression clusters.
The resulting hierarchical clustering tree is split into two branches, separating the genes into two discrete groups (say A and B). The incidence table is then used to determine, for each sequence element in the table, the proportion of genes in each group that contain that element. This is dubbed the 'frequency' of the element in those two groups (say $f_{i,A}$ and $f_{i,B}$, where $i$ denotes sequence element $i$). These frequencies are then compared to a user-defined tolerance level, $f$. Any sequence element that occurs with frequencies $f_{i,A} < f$ and $f_{i,B} > 1-f$ is recorded in a list of sequence elements that may be able to explain the difference in expression patterns of the two groups. The same process (splitting into two branches and searching for elements whose frequencies are different in the two groups defined by the split) is repeated for each of the two branches in an iterative procedure, stopping when the final user-defined clusters are reached.
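The test applied at a single split can be sketched as below. This Python illustration is not the CoReg source. It assumes the comparison is applied in both directions (rare in A and common in B, or the reverse), which is how the worked example with a tolerance of 0.35 is described elsewhere in the text, and it covers one split only rather than the full descent of the tree. Names and the toy incidence table are hypothetical.

```python
def element_frequency(table, genes, element):
    """Proportion of the given genes whose promoter contains the element."""
    return sum(table[gene][element] for gene in genes) / len(genes)


def discriminating_elements(table, group_a, group_b, f=0.35):
    """Elements whose frequency is below f in one group and above 1 - f
    in the other (checked in both directions, an assumption)."""
    elements = list(next(iter(table.values())))
    hits = []
    for element in elements:
        freq_a = element_frequency(table, group_a, element)
        freq_b = element_frequency(table, group_b, element)
        rare_in_a = freq_a < f and freq_b > 1 - f
        rare_in_b = freq_b < f and freq_a > 1 - f
        if rare_in_a or rare_in_b:
            hits.append(element)
    return hits


# Hypothetical incidence table: gene id -> {element: present?}.
toy_table = {
    "a1": {"TTCTGC": False, "AAAAGC": True, "GATGAC": True},
    "a2": {"TTCTGC": False, "AAAAGC": True, "GATGAC": True},
    "a3": {"TTCTGC": False, "AAAAGC": True, "GATGAC": False},
    "b1": {"TTCTGC": True, "AAAAGC": False, "GATGAC": True},
    "b2": {"TTCTGC": True, "AAAAGC": False, "GATGAC": False},
    "b3": {"TTCTGC": True, "AAAAGC": False, "GATGAC": True},
}
hits = discriminating_elements(toy_table, ["a1", "a2", "a3"], ["b1", "b2", "b3"])
```

With f = 0.35 an element must be present in fewer than 35% of one group and more than 65% of the other, so an element found in two of three genes on both sides is not recorded.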
In addition, the frequency of each sequence element in each of the user-defined clusters is compared to a second user-defined tolerance level $g$. Any sequence element whose frequency is below $g$ or above $1-g$ in exactly one of these clusters is added to the list of interesting sequence elements.
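One reading of this second test is sketched below in Python; it is not the CoReg source. The rule is interpreted as "below g in exactly one cluster, or above 1 - g in exactly one cluster", which is an assumption, and the function name, cluster labels and frequencies are hypothetical.

```python
def characteristic_elements(frequencies, g=0.1):
    """Elements that are nearly absent (< g) in exactly one cluster, or
    nearly universal (> 1 - g) in exactly one cluster.

    frequencies maps element -> {cluster name: frequency between 0 and 1}.
    """
    hits = []
    for element, by_cluster in frequencies.items():
        low = [c for c, freq in by_cluster.items() if freq < g]
        high = [c for c, freq in by_cluster.items() if freq > 1 - g]
        if len(low) == 1 or len(high) == 1:
            hits.append(element)
    return hits


# Hypothetical per-cluster frequencies for four elements.
toy_frequencies = {
    "GATGAC": {"c1": 0.95, "c2": 0.92, "c3": 0.05},  # low in one cluster only
    "TTCCAG": {"c1": 0.50, "c2": 0.40, "c3": 0.60},  # never extreme
    "ATGTAC": {"c1": 0.95, "c2": 0.05, "c3": 0.02},  # high in one cluster only
    "AACTAT": {"c1": 0.95, "c2": 0.96, "c3": 0.50},  # high in two clusters
}
interesting = characteristic_elements(toy_frequencies)
```

Elements picked up here are added to those found at the tree splits before the clusters are re-clustered on element frequencies.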
The gene expression clusters defined by the user are then themselves clustered, according to the frequencies of all the recorded sequence elements. The same method chosen for expression-based hierarchical clustering is used at this step. The user is then given the opportunity to select subsets of the recorded sequence elements and cluster according to those, the aim being to isolate a subset of sequence elements leading to a hierarchical structure similar to that defined by the expression-based hierarchical clustering tree.
ModuleFinder and CoReg are available for downloading from . Alternatively a package will be emailed on request containing program files, instruction files and examples files. We request that users cite this manuscript if using these programs.
cis-acting regulatory element(s)
alternative NAD(P)H dehydrogenases
nuclear genes encoding mitochondrial proteins
translocase of the outer mitochondrial membrane
translocase of the inner mitochondrial membrane
This work was supported by funding to JW and AHM through the Australian Research Council (ARC) Centres of Excellence Program. AHM is also funded as an ARC Queen Elizabeth II Research Fellow.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.