A tiling microarray for global analysis of chloroplast genome expression in cucumber and other plants

Plastids are small organelles equipped with their own genomes (plastomes). Although these organelles are involved in numerous plant metabolic pathways, current knowledge about the transcriptional activity of plastomes is limited. To solve this problem, we constructed a plastid tiling microarray (PlasTi-microarray) consisting of 1629 oligonucleotide probes. The oligonucleotides were designed based on the cucumber chloroplast genomic sequence and targeted both strands of the plastome in a non-contiguous arrangement. Up to 4 specific probes were designed for each gene/exon, and the intergenic regions were covered regularly, with 70-nt intervals. We also developed a protocol for direct chemical labeling and hybridization of as little as 2 micrograms of chloroplast RNA. We used this protocol for profiling the expression of the cucumber chloroplast plastome on the PlasTi-microarray. Owing to the high sequence similarity of plant plastomes, the newly constructed microarray can be used to study plants other than cucumber. Comparative hybridization of chloroplast transcriptomes from cucumber, Arabidopsis, tomato and spinach showed that the PlasTi-microarray is highly versatile.


Background
Plastids form a large family of cellular organelles that occur in plants and algae. The most prominent members of the plastid family are chloroplasts. Chloroplasts use light energy to convert carbon dioxide into organic compounds in a process called photosynthesis. Depending on tissue localization and environmental conditions, other types of plastids may develop. Plastids are also involved in various aspects of plant cell metabolism, e.g., they can store starch, lipids or proteins. Certain factors can induce mature plastids to transform from one type to another, as well as to revert back [1]. The process of plastid biogenesis and interconversion is coupled with large structural and biochemical changes. This huge transformation potential of plastids is partly a result of the presence of their own genetic material (plastome) and inherent transcriptional and translation machinery. The first complete sequences of plastid genomes (from Nicotiana tabacum and Marchantia polymorpha) were determined in 1986. Currently, more than 200 plastome sequences are available in GenBank. Most of them (more than 170) are derived from flowering plants. The majority of plastomes were sequenced after 2006, when high throughput sequencing methods became more widely available and less expensive [2,3]. The sequences of plastid genomes and their organization are highly conserved. Plastomes range in length from 120 to 200 Mbp. They usually contain two large inverted repeats (IR), namely IRA and IRB, separated by single copy regions. However, in some plants, such as Medicago truncatula, the plastomes lack one IR region. Genes encoded in the plastome can be divided into two categories: protein coding (about 70-100 genes, mostly coding for proteins related to the lightphase of photosynthesis or coding for ribosomal proteins), and RNA coding (about 30-50 rRNA and tRNA genes). There are also some conserved open reading frames (conserved ORFs), which have undefined or poorly defined functions. Some plastid genes overlap one another, and many genes are organized into operons, indicative of their prokaryotic origin. The latter are transcribed into polycistronic preRNAs, which are further processed into individual RNA species. The transcripts undergo extensive post-transcriptional modifications, including trans-splicing and RNA editing [4][5][6][7].
Plastids do not operate independently of nuclear genetic information. A large number of photosynthesis-related chloroplast proteins are encoded in the nucleus. Similarly, many proteins that are essential for post-transcriptional processing and stabilization of plastid transcripts are encoded in the nucleus and transported to plastids after their synthesis in the cytoplasm [8]. For example, sigma factors are proteins of nuclear origin that confer promoter specificity of plastid-encoded RNA polymerase (PEP) core subunits. This specificity is one of the regulation mechanisms that modulates gene expression under changing environmental conditions [7,9,10]. Apart from PEP, nucleus-encoded phage-type RNA polymerases (NEPs) are also engaged in transcription in plastids [11,12]. It has recently been shown that genes transcribed by PEP are down-regulated and genes transcribed by NEP are upregulated in tobacco ΔpsaA and ΔpsbA deletion mutants, which lack genes that code for core components of photosystem I and photosystem II, respectively. These mutations, located in the chloroplast genome, also affect the expression of nuclear genes. Genes related to photosynthesis were down-regulated, and stress-responsive genes were up-regulated [13]. This and many other works demonstrate that plastid genes act in concert with nuclear genome products, allowing plants to adapt quickly and flexibly to changing environmental and developmental conditions. However, although the overall structure and function of plastids are quite well known already, and individual plastid genes have often been subjected to intensive studies, few plastome-scale expression studies have been published so far [6,9,10,[13][14][15][16][17][18][19][20][21][22]. Moreover, most reported experiments focus on the gene-coding regions, but there is growing evidence that the so-called non-coding parts of genomes may play important regulatory roles in prokaryotes and in eukaryotic organelles [15,[23][24][25][26][27]. Therefore, based on cucumber plastid genome sequence, we constructed an oligonucleotide tiling microarray (PlasTimicroarray). Although the probes on the PlasTi-microarray do not overlap nor they are contiguous, this array has the highest resolution of the plastid arrays reported so far and covers both coding and non-coding regions on both strands of the plastome. This array is an excellent versatile tool for global functional studies of plastid genomes. In this paper, we present the microarray design, as well as detailed protocols for chloroplast RNA (cpRNA) sample preparation and hybridization. We also propose general procedures that can be used for PlasTi-microarray data normalization and analysis. We demonstrate that the PlasTi-microarray can be used for analyzing the plastome transcriptome in cucumber and other flowering plants.

Construction of the PlasTi-microarray
The cucumber plastid genome is 155,293 bp long and contains a pair of large inverted repeats, IRA and IRB (25,191 bp each). These repeats are separated by two single-copy regions -small (SSC, 18,222 bp) and large (LSC, 86,688 bp) [28]. The genome comprises 89 protein-coding genes, 8 rRNA genes and 37 tRNA genes. Some of these genes contain introns [29]. Our aim was to obtain a microarray probe set allowing for both expression studies of known plastid genes or conserved ORFs, as well as discovery of new RNAs transcribed from non-protein coding regions of the cucumber chloroplast genome. To construct such a universal microarray, we employed a tiling strategy. This strategy assumes that probes uniformly cover the whole target sequence, in a regular manner. If the tiling array resolution (the average distance between the central positions of adjacent probes) is equal or lower than the probe length, it is possible to obtain full genome coverage. Here, 50% coverage of the non-coding part of the genome was obtained by designing 70-nucleotide probes with 140nucleotide resolution. Uniform tiling of both strands of the chloroplast genome with such a density required the synthesis of over 2200 probes. Considering the specific organization of plastid genomes, we were able to reduce this number. A set of common probes was designed to cover IRA and IRB regions, which are almost (99%) identical. Also, the number of probes in the coding regions was limited to a maximum of four per gene/exon. Gene introns and conserved ORFs of undefined functions were tiled with a density similar to other non-coding regions. To avoid ambiguity in the results, probes located at the borders of coding and intergenic regions were not used. To meet this requirement, 15 probes targeting very short regions (predominantly short exons of tRNA-coding genes) were extended to 70 nt by adding adaptors that were not complementary to the cucumber plastid genome. Other criteria applied to the PlasTi-microarray probe design considered oligonucleotide properties (rule C), their secondary structure (rule H) and probe specificity within the plastid genome (rules D and S) (see Table 1 for details). The limits imposed on specific parameters were based on the criteria used to design the Array-Ready Oligo Set™ for the Arabidopsis thaliana Genome (Version 3.0) but were slightly modified to better reflect the nature of the plastome [30]. For example, GC content limits were set to 20% -60%, as the mean GC content in the plastid DNA sequence is small (about 37% in cucumber) [28]. Also, the specificity rule D was made more stringent. To this end, the minimum Hamming distance to non-target parts of the cucumber plastid genome was > 25. Application of this rule resulted in the exclusion of probes that could potentially cross-hybridize to non-target sequences displaying high (≥ 64.3%) identity to the target region (up to 70% identity was allowed in case of the Operon probes) [30]. Cross-hybridization analysis was made only for the plastid genome sequence because i) the cucumber nuclear genome sequence was unavailable, and ii) the procedure of RNA sample preparation involved chloroplast separation; thus, most transcripts of nuclear origin were removed [see Methods]. In rare cases, some design rules were relaxed when a probe that met all established criteria could not be found ( Table 1). As a result, 1629 oligonucleotide probes (70-mers) were designed and synthesized. A total of 315 probes targeted coding regions of the plastome, and 1314 probes targeted the non-coding parts of the plastid genome (introns and intergenic regions) and conserved ORFs. The design rules were relaxed for 239 oligonucleotides; for more than 73% of those "imperfect probes", the rule did not meet the only "Nucleotide composition" criterion ( Figure 1). Such a situation is an obvious result of a compromise between the need for a regular probe distribution and the capacity of the probes to hybridize. It should be emphasized that only nineteen probes did not comply with the rules S and/or D, (Substring and Hamming distance, respectively, see (Table 1) for the description of those parameters) which could slightly affect their specificity. The lower density of probe coverage in the coding regions permitted better optimization. Accordingly, the parameters for 91.4% of coding sequence-specific probes met all of the established criteria ( Figure 1).
A total of 306 probes were common to both the IRA and IRB regions. Sixteen of these probes were not perfectly complementary to one of the IRs because of minimal differences in their nucleotide sequences. However, these discrepancies are minor and would have little impact on the hybridization results (Table 2). Detailed information on probe sequences and target regions is presented as additional data [Additional Files 1 and 2]. Probe names have a uniform format P/MxxxxxxC/N, where the first symbol is a letter indicating the genome strand (P for plus, M for minus), followed by the six-digit number for the genome target coordinates, with the last letter indicating the region type (C for coding, N for non-coding). Probes common to the IR regions are named according to the IRB target coordinates.

Evaluation of experimental procedure
All microarray experiments described in this paper were performed using a two-color hybridization approach. Cucumber PlasTi-microarrays were produced with a SpotArray 24 instrument (PerkinElmer). All probes were printed in duplicate on the epoxide-coated glass slides (Corning) in sixteen 17 × 36 print-tip groups, together with Stratagene's SpotReport™ Alien™ cDNA Array Validation System oligonucleotide set and control buffer spots. The probes are complementary to the target sequences (they are also complementary to transcripts, not to cDNA). As a result, a method of direct chemical RNA labeling was chosen. For all experiments described in this paper, the Micromax ASAP RNA labeling kit (Per-kinElmer) was used. In the original manufacturer's procedure, the total RNA is chemically modified with either Cy3 or Cy5 during a short incubation, and the labeled mRNA is further purified with Oligotex™ RNA kit (QIA-GEN). Here, the total cpRNA was subjected to analysis, so the purification procedure was limited to the removal of non-incorporated dye, without fractionating the labeled RNA. Therefore, the miRNeasy Mini Kit (QIA-GEN) was used for purification. This kit preserves shorter RNA molecules (tRNA transcripts, highly abundant in chloroplasts and presumptive regulatory RNAs) from washing out. Also, the amount of input RNA was lowered [see Methods for detailed labeling protocol]. Alternatively, the Micromax ASAP RNA labeling kit can be replaced by the Arcturus ® Turbo Labeling™ Kit with Cy3/Cy5 (Applied Biosystems) (data not shown).

Data normalization and gene expression analysis
The PlasTi-microarray is a combination of two array types, a typical gene expression array and a tiling array, for which different normalization methods are usually applied [for review see [31]]. Accordingly, we needed to determine which approach would be more useful in the case of the PlasTi-microarray. Considering that our microarray contains relatively long probes and that none of them overlaps with the coding and intergenic regions (which have different GC content), we assumed that the effect of sequence-dependent hybridization can be omitted. As a result, the standard (median-or loess-based) normalization methods, commonly used for gene expression arrays, seemed appropriate. However, the application of those methods could be called into question, as they can be used if at least half of the spots on the array produce measured signals, which is usually not the case with the high-density genome tiling arrays. To verify whether such a standard normalization can be applied to the PlasTi-microarray, we calculated average numbers of probes that produced signals higher than the background intensity in microarrays from five separate experiments (A-E, Table 3). Each experiment addressed a specific biological problem and was designed and performed independently of the other ones (see Additional file 3 for more information). All microarrays hybridized within one experiment were normalized together. After data quantification, an average log-2 expression value across all channels and all microarrays from one experiment was calculated for each probe (A probe ). In addition, a mean signal from all probes (A mean ) was established. The signals produced by negative controls were used to calculate the average level of background intensity (Aneg mean ) ( Table 3). The controls included printing buffer, an alien cDNA oligonucleotide set and "empty" spots. They generated comparable signals, with those from the empty spots being slightly less intense than those from the buffer or the alien cDNA spots. The Aneg mean value was then used to calculate the threshold intensity value (Athr = Aneg mean + 2 SD), above which the spot was acknowledged as "detected" (Table 3). At least two-thirds of the cucumber probes generate signals higher than the threshold value in all five experiments. From this fact, we concluded that the methods developed for standard expression arrays can be used for the normalization of results generated with the PlasTi-microarray. In subsequent analysis, the "print-tip loess" normalization method adopted in the Limma package of the R/Bioconductor project was applied. Microarray normalization performed independently for each experiment resulted in comparable patterns of probe signal intensities on both genome strands, regardless of the experiment type. The observed differences in the absolute values of the signal intensities thus resulted from the varying methods of background correction applied ( Figure 2). As an example illustrating the application of the PlasTimicroarray to gene expression analysis, we present selected data from experiment E (other results will be presented elsewhere). Comparison of the plastid transcriptome isolated from female flowers to that isolated from the leaves revealed global up-regulation of genes Discrepancy occurs when the probe targets IRA and IRB in regions that are not perfectly identical. In this case, the probe is designed to match one IR and is not 100% complementary to the other one. Level of identity of the probe to the second IR region is presented as number of probe bases perfectly aligning to target region/total length of aligning probe sequence. Probe 5' or 3' ends that do not match the second IR, do not add up to the reported total alignment length. A mean is the mean intensity of all probes on all microarrays within the experiment. Aneg mean is the mean intensity of all negative control spots on all microarrays within the experiment. Norm.min and Norm.max are normalized minimal and maximal intensity values. Athr is a threshold intensity value qualifying the probe as "detected" and was calculated from the formula Athr = Aneg mean + 2 SD. Experiments A-E are briefly described in the Additional file 3.

Number of internal mismatches and gaps in alignment is also shown
engaged in transcription and translation in the flower plastids ( Figure 3A). Thirteen ribosomal protein genes were significantly up-regulated (six with p < 0.01 and the remaining with p < 0.05). A significant increase in transcript accumulation was also observed for rpoA (which codes for the PEP polymerase subunit), clpP (which codes for ATP-dependent protease) and three conserved ORFs, ycf1, ycf2 and ycf15. The first two of them had previously been shown to be functional genes that are essential for cell survival, suggesting that they are involved in basic plastid metabolism [32]. By contrast, in flowers, photosynthesis-related genes were down-regulated overall. The most notable examples are the significantly lower transcript levels of eight psb genes (coding for elements of the photosystem II complex). Many genes coding for components of the ATP synthase, photosystem I, cytochrome b 6 /f and NADH dehydrogenase were also down-regulated. The ndhH gene was the only significantly up-regulated photosynthesis-related gene.
To permit better visualization of the expression patterns of functionally related genes, a graphic representation of the expression data was created with the MapMan software ( Figure 3B). MapMan is a user-driven tool that superimposes large data sets, for example microarray data, on diagrams of metabolic pathways [33]. For our dataset, a ChloroPlast_CustomArray pathway diagram [20] was modified to display all genes represented on a PlasTi-microarray, and specific mapping files were prepared. Those files were added to MapMan Store. They can be downloaded from the project website and used to visualize any gene expression data produced with PlasTi-microarrays [34].

Transcriptional activity of non-coding regions
As was mentioned earlier, intergenic sequences were covered by PlasTi-microarray probes in a regular manner. This coverage made it possible to survey the transcriptional activity of the regions classified as noncoding. In the analysis of high density tiling arrays, sliding window methods are usually adopted to search for probe intensity peaks [35]. However, those methods may not perform well in this case, owing to the irregular distribution of probes in the coding and intergenic regions and the moderate array resolution (one 70 nt probe per 140 bases). At such a resolution, even a single high intensity signal may be significant, but such a signal could likely be ignored in the automatic search. Examination of the signals generated by the "non-coding" probes in microarray experiment E showed that their substantial numbers had an intensity higher than A mean and were comparable to the intensities of the "coding" . For each probe in each experiment, A norm value was obtained by dividing probe intensity (average from two duplicates) by the corresponding A mean (defined in Table 2). This enabled signal intensity comparison across all five experiments. All microarrays were analyzed with no background correction, apart from experiment D, for which local background subtraction was applied, due to high background intensity on most microarrays in this set.
probes. Therefore, for "non-coding" probe analysis, we propose a simple approach, based on establishing a cutoff value that is higher than A mean , followed by subjecting all of the signals above this threshold to a more detailed investigation. As the total number of "non-coding" probes on the array is 1314 and the majority of them are expected to target regions transcribed at low levels, the number of candidate probes should be reasonably low. In the experiment E, setting the cut-off at 10.5 (A mean = 10.27) resulted in 336 candidate probes. Information on the array probe sequence and localization [see Additional Files 1 and 2] will then be helpful to distinguish high intensity signals occurring in the proximity of gene coding regions or in introns from signals generated by antisense transcripts or those that are apparently unrelated to known genes ( Figure 4).

PlasTi-microarray in cross-species hybridization experiments
DNA microarrays have been widely adopted for crossspecies hybridization (CSH) in situations where speciesspecific microarrays were unavailable. Several earlier studies showed that the value of biological data obtained in this way depends on the sequence similarity between  cross-hybridized species [36,37]. As the plastomes of higher plants display high levels of sequence similarity, we attempted to determine whether the microarray designed for the cucumber plastid genome can be used to study the chloroplast transcriptomes of other plants.
First, each of the 1629 cucumber probe sequences was used as a query in a blastn search against the plastid genomes of nine flowering plants: Arabidopsis thaliana, tobacco, tomato, spinach, lettuce, Medicago truncatula, Lotus japonicus, poplar and barley. The highest similarity was observed for tobacco and poplar plastid genomes, with over 1100 probes (> 67%) matching with E ≤ 10 -10 and more than 1250 (> 76%) matching with E < 10 -3 , where E is the expected number of high-scoring segment pairs (Table 4). Barley and spinach plastomes showed the lowest similarity to cucumber probes of all of the species under investigation. Even in those two cases, however, more than 800 queries (~50% of all microarray probes) matched the genome with E ≤ 10 -10 .
We also observed that probes designed for coding regions of the genome, which are more conserved than non-coding regions, turned out to be the most suitable for CSH studies. More than 98% of the probes designed for the IR coding regions and more than 90% of the probes designed for the SC coding regions have enough similarity to serve as microarray probes in seven of nine  analyzed species. In the cases of spinach and barley, the number of matching probes was > 92% and > 81% for IR and SC coding regions, respectively. As whole IR regions are highly conserved in plants, IR "non coding" probes also match all analyzed genomes very well (73-78% of probes in the case of barley, spinach and Medicago, and more than 94% for the remaining plants). Non-coding parts of SC regions are less conserved, and so the number of SC "non-coding" probes matching the examined plastid genomes with E < 10 -3 was much smaller. Still, it was about 44% -45% in spinach and barley and 59% -70% in the remaining plants.
More detailed examination of aligning sequences (selected by blastn search) made for Arabidopsis and tobacco further confirms the high applicability of the designed probes in CSH experiments ( Figure 5). More than 90% of blastn-selected probes targeting SC coding and whole IR regions of the Arabidopsis and tobacco plastid genomes, align across at least 60 bases with ≥ 80% identity, which indicates that there are not many gaps or mismatches within the alignment. This level of similarity is enough to allow successful hybridization of the cucumber microarray probes to plastomes of other species. To verify in practice whether the PlasTi-microarray is truly well suited for CSH studies, a trial set of hybridization experiments was performed. Chloroplast RNA from Arabidopsis, tomato and spinach leaves was extracted, labeled and hybridized to the PlasTi-microarray with the cucumber samples labeled with the second dye as a control. For each species, one biological sample and one technical replicate (labeled in a dye-swap orientation) were analyzed, resulting in two microarray hybridizations per species and six microarrays (a-f) in total (Table 5). After image acquisition, signal quantification and quality control, data from each channel on each microarray were analyzed separately. All control and negative probes were excluded from further calculations, as were probes with at least one spot replicate flagged as "bad". The signal-tonoise ratios (SNR) of the remaining probes were then estimated by averaging SNR values from two spot replicates. Subsequently, the number of "coding" probes with average SNR ≥ 3 was estimated as a percentage of all "coding" probes. Additionally, the fraction of probes with average SNR ≥ 3 was calculated separately for "coding" probes matching and "coding" probes not matching the analyzed genome ( Figure 6A, C, E). Similar calculations were performed for the "non-coding" probes ( Figure 6B, D, F). Matching probes were selected on the basis of blastn similarity search results, independently for each species (see Table 4 and the text above). To allow results comparisons, the "matching" probe set for cucumber was always identical to the "matching" set of co-hybridized species; therefore, it varied among arrays. As one might expect, in cases of species-specific hybridization (cucumber), the number of probes (both "coding" and "non-coding") with SNR ≥ 3 was always higher than in cross-species hybridization. In all examined species, the SNR values of "non-coding" probes were lower compared with the SNR values of the "coding" probes. Those "non-coding" probes by definition target regions that are unlikely to be intensively transcribed. Additionally, all those regions are less conserved compared to coding regions. Consequently, for Arabidopsis, spinach and tomato complementarity between probes and target sequences is reduced. In the case of "coding" probes, more than 96% of them had a SNR above the set threshold in cucumber. In the case of cross-species hybridization, the calculated SNR ratios were above the threshold for 70-83% of the "coding" probes. In each case, probe filtering (based on the results of blastn searches) increased the percentage of probes with SNR ≥ 3. This relationship was observed for both "coding" and "non-coding" probes. It was also true for both green and red channels, regardless of the dye effect. In our hands, the Cy3 signals gave overall higher absolute SNR values compared to Cy5, even when the total intensity of the Cy5 channel was twice as high as the Cy3 channel (for details compare Figure 6 and Table 5). Therefore, the initial filtering out of probes with low sequence similarity to the target plastid genome is highly recommended.

Discussion
The past twenty years were remarkable for the development of high throughput methods in biology. This development resulted in researchers' focusing their interests on complex systems (e.g., genomes, transcriptomes and proteomes) instead of on individual genes or pathways. For more than a decade, DNA microarrays have been the leading tools for analyzing global gene expression in plants and animals. These tools are still attractive and offer a relatively cheap alternative to next generation sequencing. However, so far microarrays have been rarely used for the global analysis of plastome expression. Plastid gene expression has been assessed with whole-genome expression arrays, such as GeneChip Arabidopsis ATH1 Genome Arrays (Affymetrix). These arrays contain probe sets for almost 24,000 genes, both nuclear and organellar. The use of such arrays is beneficial for providing information about possible co-regulation of nuclear and organelle genes [13,22,38]. However, in cases in which such data are unnecessary, the genome arrays are too expensive and too complex for studying the plastome. This issue has raised a need for constructing plastome-specific microarrays. So far, only about ten plastid genome-specific macro-or microarrays have been developed (Table 6). Almost all of them cover only gene coding regions, and all but one are based on PCR-generated cDNA probes. The sole oligonucleotide-based plastome microarray reported so far has been designed as a tool for gene expression analysis in the Solanaceae species (tobacco, potato and tomato). This microarray contains 128 probes, printed in duplicate, each representing one gene, or conserved ORF, plus five additional probes to compensate for sequence differences between tobacco and two other Solanaceae species [20]. The Solanaceae plastid microarray was used in gene expression studies, including the analysis of tomato fruit development and chloroplast-to-chromoplast conversion or potato tuber amyloplasts vs leaf chloroplasts comparisons [6,20]. For now, this microarray can be considered the tool of choice for the analysis of plastome protein coding gene expression in the Solanaceae family. However, the Solanaceae microarray does not enable analysis of the non-coding regions. Only two arrays presented in Table 6 include probes for non-coding regions. The first one, constructed by Schmitz-Linneweber et al., was employed for searching maize chloroplast RNAs that bind to pentatricopeptide proteins (PPRs), CRP1, PPR4 and PPR5 using the RIP-chip (RNA immunoprecipitation and chip hybridization) method. For this purpose, cDNA microarrays were generated, consisting of 248 probes that covered the whole plastome of maize [39]. The probes varied in length, from 73 to 1653 bp and overlapped each other. Similarly, Nakamura et al. used 220 PCR amplicons (71 -2373 bp), each corresponding to a single known gene or an intergenic region, to build a genome microarray for the tobacco plastome [15]. Resolution of both maize and tobacco cDNA microarrays is far too low to enable precise analysis of transcriptional activity of non-coding regions. Moreover, the PCR-generated DNA probes do not distinguish between the plus and minus strands, which is why the transcriptional activity of the antisense regions is masked by signals from the sense transcripts. All of those pitfalls are omitted in the PlasTi-microarray. The two major factors that ensure the high specificity of the PlasTi-microarray are the algorithm applied for the oligonucleotide probe design and the method used for cpRNA separation (this method prevents cross-hybridization with nuclear transcripts). In addition, the application of antisense probes, which are untypical for microarrays, allows for direct labeling of cpRNA, using chemical dye coupling methods. It is easier and faster than performing reverse transcription with random primers. Moreover, direct labeling eliminates numerous problems that can arise during reverse transcription, e.g., false-positive results caused by RNA selfpriming [40]. The PlasTi-microarray also has proved to be highly sensitive. This microarray allowed us to detect as low as 2 μg cucumber cpRNA, without the need for material amplification. The PlasTi-microarray tiles the whole plastid genome with a resolution that is much higher than any plastid array reported so far. Still, this resolution is not comparable to that of regular high-density tiling arrays, where probes often overlap each other. We show that this moderate resolution can be beneficial for the simplicity of the data analysis pipeline. The small size of the plastid genomes and the perfect separation of the probes that target coding and intergenic regions allowed us to successfully adopt normalization methods dedicated to expression arrays. The use of these methods is further justified by recent findings that standard (sequenceindependent) normalization methods perform equally to sequence-dependent methods even for high-density tiling arrays with short oligonucleotide probes [41]. Also, the analysis of the non-coding regions can simply be based on the probe intensity threshold and can be performed with Excel or similar software. The interactive map that shows the localization of the probes on the cucumber genome sequence is presented as supplementary material [Additional File 2]. This map provides the possibility of distinguishing between the signals located near or opposite to gene coding regions and the signals that apparently are not linked to those parts of the genome. This kind of information can provide clues to the significance of transcripts detected in the intergenic regions.

A. thaliana -IR probes
Using the PlasTi-microarray, we compared the patterns of gene expression in the cucumber plastids isolated from female flowers and from mature leaves. We observed that profiles of functionally related gene expression are congruent, e.g., in flowers; the expression of plastid genetic machinery-related genes is enhanced and the expression of photosynthesis-related genes is weakened. Also, the cotranscribed genes, for example genes of the rpl23-rpoA operon, often shared a similar expression pattern. These results are consistent with observations made by Cho and coworkers during their DNA macroarray-based studies of plastid gene expression in Arabidopsis flowers and leaves [22]. They reported significant down-regulation of many photosynthesis-related genes and up-regulation of the plastid genetic system-related genes. The relative fold changes were higher in Cho's Arabidopsis studies than in our study. However, the PlasTi-microarrays are sensitive enough to detect even minor changes in gene expression.  Figure 6 Proportion of PlasTi-microarray probes with SNR ≥ 3 in CSH-based microarray experiments. Graphs present percentages of probes with SNR ≥ 3 in CSH-based microarray experiments, separately for "coding" (panels A, C, E) and "non-coding" probes (panels B, D, F). Each bar represents the value obtained from one channel of one microarray. Only probes with both replicate spots flagged as good were considered, and the SNR values were averaged between the replicates. a-f -arrays, as indicated in (Table 5). M -probes matching analyzed genome sequence with E ≤ 10 -3 , NM -probes not matching the genome sequence with E ≤ 10 -3 . Colors represent array channels: green -Cy3 channel, red -Cy5 channel. Cyanidioschyzon merolae (red algae) cDNA microarray; 193 PCR probes for protein coding genes and orfs role of nuclear-encoded sigma factors in plastid transcriptome changes during the shift from dark to light [10] wheat (used also for barley and rice in CSH studies) nylon macroarray; 67 PCR products (200 bp -1259 bp) representing 60 wheat plastid genes (excluding tRNAs) and 7 nuclear genes related to plastid metabolism germinating seeds and seedlings at three different stages of development [18] Maize cDNA microarray; PCR probes for 887 nuclear, 62 chloroplast, and 27 mitochondrial transcripts comparison of chloroplasts and etioplasts in stage 2 semi-emerged leaf blades of one month-old plant [19] tobacco, potato, tomato oligonucleotide microarray; 128 probes (68-71 bases) representing tobacco genes, ycfs and orfs (+ 5 probes designed for potato and tomato due to insufficient homology of tobacco probes) tomato fruit development and chloroplast-to-chromoplast conversion; potato tuber amyloplasts vs leaf chloroplasts comparison [6,20]  The expression profiles of functionally related genes generated with our microarray were more congruent than those obtained by Cho et al. (except for tRNA genes, see Figure 3). In Cho's experiments, the expression profiles of genes coding for particular components of protein complexes were often contradictory. For example, of the genes encoding the ATP-synthase protein complex, two (atpA and atpF) were up-regulated, and the other two (atpH and atpI) were down-regulated. Similarly, among the Photosystem I complex coding genes, psaA and psaB were up-regulated, whereas psaC, psaI and psaJ were down-regulated [22].
The high sequence similarity of plant plastomes allows the application of the PlasTi-microarray in cross-species hybridization studies. The probe length (70 nt) ensures high specificity and low sensitivity to single nucleotide mismatches, which frequently occur during inter-species hybridization. Accordingly, for Arabidopsis, spinach and tomato, we were able to obtain a high number of signals with SNR ≥ 3, the value widely used as a threshold for flagging spots of bad quality. Moreover, we showed that filtering the microarray probe set based on the homology of the probes to the analyzed genome can further increase the ratio of probes with high SNR values. Homology-based probe filtering is a common practice in the inter-species microarray studies [cited in [36]]. As the number of published plastid genome sequences is large and still growing, initial in silico evaluation of the PlasTi-microarray probe specificity can easily be performed before cross-species hybridization experiments.

Conclusions
We have presented in detail the PlasTi-microarray design and hybridization procedure. We show that this procedure can provide high-resolution data without the need for sophisticated analysis. The PlasTi-microarray allows the expression of both coding and non-coding plastome regions to be surveyed and generates data with at a level of resolution that has not been reported for previous plastid arrays. The microarrays are available from the authors on a cost-reimbursement basis, to academic and nonprofit institutions.

Plant growing and material collection
All cucumber plants were in var. Borszczagowski background. Seeds were imbibed in water for several hours and sterilized in a regular bleaching reagent (Domestos, 25% v/v sol., 10 min.), followed by extensive washing with the sterile water. Seeds were grown for 4 days on plates with damp blotting-paper. After that seedlings were transferred to watered peat Jiffy pellets (Agrowit, Poland) and grown as follows: (i) experiments A, B and C -wt plants were grown in versatile chamber MLR-350H, Sanyo at with 16 h light at 25°C/8 h dark at 23°C cycles (120-160 μmol*m 2 *s -1 light), with Jiffy pellets 2/3 immersed in water. 1st and 2nd leaves of 3-4 week old plants were collected; (ii) experiment D -plants (wt, msc16 and tch03 lines) were grown in phytotron at 25°C/8 h dark at 23°C cycles (160-200 μmol*m 2 *s -1 light). Plants were transferred to 8-cm pots after 2 weeks. Five fully developed leaves were collected from each of 6-week old plants; (iii) experiment E -wt plants were grown in greenhouse (natural light, supported by sodium lamps (16 h light/8 h dark cycles, 100-300 μmol*m 2 *s -1 ). Plants were transferred to 30-cm pots after ten days. Material (young and mature leaves, female flowers, growing points, young fruits) was collected from 9-week old plants. For etiolated seedlings (experiment E), seeds were grown for 4 days at dark, at 28°C, before seedling collection.
Whenever necessary (experiments B and C), plants were stressed before collection (see Additional file 3 for description, details will be described elsewhere). All material was roughly grounded in liquid nitrogen and stored at -80°C.
For CSH studies, Arabidopsis, tomato and spinach seedlings were grown as described above. Arabidopsis and spinach plants were further grown in peat Jiffy pellets, in a phytotron at 20°C with 8 h light/16 h dark cycles (160-200 μmol×m 2 ×s -1 light). Mature leaves were collected from 6-week-old plants. Tomato plants were cultured in a greenhouse in mineral wool. Fully developed leaves were collected from 12-week-old fruiting plants. The material was roughly ground in liquid nitrogen and stored at -78°C.

cpRNA isolation and labeling
Chloroplasts were isolated according to [42] with the following modifications: the sorbitol concentration was increased to 0.4 M in buffers SGB (Sorbitol Grinding Buffer) and SDB (Sorbitol Dilution Buffer). Chloroplast isolation was conducted on ice, and samples were centrifuged at 4°C. Briefly, plant tissue was soaked and ground using a mortar and pestle in homogenization buffer (SGB; 0.4 M sorbitol, 50 mM HEPES, 2 mM EDTA, 1 mM MgCl 2 , 0.1% bovine serum albumin, 1% PVP; pH 7.4). The homogenate was filtered through Miracloth and centrifuged at 3,000 × g for 5 min. The pellet was resuspended and washed twice in sorbitol dilution buffer (SDB; 0.4 M sorbitol, 20 mM HEPES, 2 mM EDTA, 1 mM MgCl 2 , 0.1% bovine serum albumin, pH 7.4). The chloroplasts were loaded onto step gradients (30%:70% Percoll in SDB) and centrifuged for 30 min at 4°C at 1,500 × g. The chloroplasts were recovered from the Percoll interface and washed twice in SDB buffer. Only intact chloroplasts were used for cpRNA isolation.
The cpRNA was extracted from the isolated chloroplasts with the use of an miRNeasy Mini Kit (Qiagen) following the manufacturer's instructions. Potential DNA contamination was removed with TURBO DNA-free™ (Ambion) according to the manufacturer's instructions with the following modifications: i) 0.5 U of enzyme was used per 100 μl reaction, and ii) the time of incubation was shortened to 10 minutes. The cpRNA was precipitated with ethanol and redissolved in water to achieve a final concentration of~500 ng/μl. Then, 2 μg of cpRNA was used per 20 μl labeling reaction with MICROMAX ASAP RNA Labeling Kit (PerkinElmer). The reaction mixture included 2 μl of Cy3 or Cy5 dye and 10 μl of labeling buffer. After a 15 min incubation at 85°C in a TC-3000 thermocycler (Techne), the mixture was rapidly cooled to 4°C. A 5 μl amount of Stop Solution was then added, and the samples were cleaned up, as described below.
For removing the unbound dye, Cy3 and Cy5 corresponding reactions were combined and cleaned up with an miRNeasy Mini Kit (Qiagen), following the isolation protocol (starting with the ethanol addition step). RNA was eluted twice with 35 μl water that had been preheated to 55°C. Labeling efficiency was controlled on Nanodrop 1000. Samples were then concentrated in a SpeedVac, to < 10 μl volume, denatured at 68°C for 5 min and mixed with 115 μl of SlideHyb #3 Buffer (Ambion) that had been preheated to 68°C. All samples were incubated at 43°C (for 5 -30 min) prior to hybridization.

Oligonucleotide probe design and selection
The oligonucleotide probe design and selection algorithm was written in C++ Builder. The cucumber plastid genome sequence, deposited in GenBank under accession number AJ970307, was used as a template. At the beginning of the design procedure, all possible 70-mer nucleotides (with a 1-nucleotide shift) were generated for each strand. We analyzed their nucleotide composition, melting temperature (Tm, calculated from the equation 64.9+(41 × (GC-16.4))/N, where N is probe length and GC is the sum of G and C occurrences in the probe) and secondary structure (determined by the hairpin detection algorithm, followed by mFold analysis of best candidates). To evaluate oligonucleotide specificity, the Hamming distance and substring length in comparison to all other possible 70-mers within the genome were calculated for each candidate sequence. Oligonucleotides not satisfying the threshold criteria, described in Table 1 of the Results section, were penalized: S -20 points, H -8 points, T -5 points and C -2 points. The best set of probes was chosen within the desired locations (contiguous coding/non coding regions), with an assumed distance between neighboring probes of 140 +/-10 bases (and up to 4 probes per coding region) by selection of probes with the highest D parameters and 0 penalty points. If there were no oligonucleotides satisfying those criteria, probes with i) the lowest number of penalty points and ii) the highest D parameter were chosen.

Microarray data analysis
For quantitative analysis filtering, GenePix Pro v. 6.1 (Axon Instruments) was used. Spots with more than 10% pixels with signal saturation in both channels, as well as visually bad or contaminated spots, were flagged out. Further data analysis was performed with R/Bioconductor packages. Hybridization quality was assessed with Array-Quality [43] and ArrayQualityMetrix [44]. Limma was used for normalization and assessing gene differential expression [45]. Data filtering and other calculations were made in Microsoft Office Excel 2003 and 2007. Microarray data have been deposited to ArrayExpress Database and assigned accessions E-MEXP-3227 (analysis of organ-specific gene expression in cucumber) and E-MEXP-3220 (CSH validation experiment).

Homology searches
The PlasTi-microarray probes in fasta format were used as queries for homology searches via the NCBI web page [46], using the blastn algorithm, with the following parameters: word size = 11; Expect threshold = 0.001; Match/Mismatch scores = 1,-1; Gap costs: Existence 2; Extension 1. Search results were further analyzed in Excel 2003.