Universal endogenous gene controls for bisulphite conversion in analysis of plant DNA methylation

Accurate analysis of DNA methylation by bisulphite sequencing depends on the complete conversion of all cytosines into uracil. Until now there has been no standard or universal gene identified as an endogenous control to monitor the conversion frequency in plants. Here, we report the development of PCR based assays for one nuclear gene IND (INDEHISCENT) and two mitochondrial genes, NAD (NICOTINAMIDE ADENINE DINUCLEOTIDE) and ATP1 (ATPase SUBUNIT 1). We demonstrated their efficacy as bisulphite conversion controls in Brassica and other plant taxa. The target regions amplified by four primer pairs were found to be consistently free from DNA methylation. Primer pairs for IND.a and NAD were effective within Brassica species, whereas two primer pairs for ATP1 provided reliable controls across a representative range of dicot and monocot angiosperm species. These primer sets may therefore be adopted as controls in plant methylation analysis for a wide range of studies.


Background
Methylation of cytosine plays an important role in epigenetic gene regulation in vertebrates and higher plants [1]. In contrast to animals, where methylated cytosine residues are primarily observed within the symmetrical CpG dinucleotide, plants display cytosine methylation in any DNA context, including symmetric CG and CHG (where H = A, T or C) and asymmetric CHH [2]. Over the past few decades, four major approaches have been used for distinguishing the epigenetic mark 5-methylcytosine (5 m C) from unmethylated cytosine. These include methods based on isochizimer restriction endonucleases, bisulphite conversion of DNA, immunoprecipitation and mass spectrometry [3,4]. Bisulphite conversion of DNA, originally developed by Frommer et al. [5], involves treatment of DNA with sodium bisulphite, where under optimized conditions unmethylated cytosine is converted to uracil, whilst methylated cytosine (both 5 m C and 5-hydroxymethylcytosine) remains unchanged. DNA sequence changes resulting from bisulphite conversion can then be detected by a variety of methods, including PCR amplification, followed by DNA sequencing where in the original uracil residues are reported as thymine. The primary advantage of this technique is that it provides base-pair resolution of methylation patterns, which is particularly useful in plants for distinguishing between the different cytosine sequence contexts [6]. Following a number of substantial improvements based on the original protocol, bisulphite sequencing is now accepted as the gold standard for detecting changes in DNA methylation [3]. The combination of bisulphite conversion and next-generation high-throughput sequencing has recently provided powerful tools for revealing DNA methylation patterns on a genome-wide scale [7][8][9][10].
Although bisulphite-based methods are reasonably accurate and reproducible in comparison with other methods, successful detection is dependent on the complete bisulphite conversion of all unmethylated cytosine into uracil [11]. Incomplete conversion complicates downstream data analysis, especially in plants where larger and more complex genomes are likely to contain a high level of 5 m C. False-positive 5-methylcytosines (cytosine read as 5-methylcytosine) are common, since it is often difficult to determine whether an unconverted cytosine represents true methylation or incomplete treatment. Both incomplete DNA denaturation prior to bisulphite treatment and reannealing during treatment can lead to incomplete bisulphite conversion, since bisulphite converts single-stranded but not doublestranded DNA [5]. As a result, repeated denaturation cycles during the bisulphite treatment are required to ensure complete conversion [4], which is now a standard feature of protocols recommended for commercial bisulphite conversion kits. However, it is still necessary to include some form of control to monitor bisulphite conversion for each sample assayed. The completion of bisulphite conversion can be tested by monitoring exogenously spiked DNA controls, or retention of endogenous non-target sequence cytosine dinucleotides [4]. Theoretically, any DNA sample with a known consistent methylation pattern could be used as a control. In mammalian genomes, unamplified, nearly methylation-free genomic DNA from specific cell lines has been used as the template to optimize and test conditions for genome-wide bisulphite conversion, PCR amplification and subsequent library construction [8]. In Arabidopsis, specific unmethylated genes and chloroplast DNA have been used for establishing the degree of conversion [9,12,13]. Plant mitochondrial DNA is another potential control for monitoring conversion, since mitochondrial genomes are free of methylated cytosines and can be isolated with nuclear DNA from all organs and tissues [14]. However, to date the full sequence of mitochondrial genomes has only been established for a small number of plant species.
In this study, we first identified a nuclear endogenous gene IND.a, present in Brassica 'A' genomes, which remains unmethylated in different organs and tissues. We then designed primer pairs for two mitochondrial genes, ATP1 and NAD. Two primer pairs for ATP1 were effective across all dicotyledonous and monocotyledonous species tested, and are therefore valuable as universal controls for DNA methylation analysis of target genes or whole genome analysis in plants.

Results and Discussion
The genus Brassica includes a diverse range of important vegetable, oilseed, fodder, and mustard crops grown and consumed throughout the world. It has three widely cultivated diploid species Brassica rapa (AA, × = 10), B. nigra (BB, × = 8) and B. oleracea (CC, × = 9) and three allotetraploid species B. napus (AACC, × = 19), B. juncea (AABB, × = 18) and B. carinata (BBCC, × = 17). Brassica genomes are complex, with most genes present as multiple paralogous copies [15][16][17][18]. However, only two orthologues of the Arabidopsis IND (INDEHIS-CENT) gene were found in the amphidiploid B. napus, one within the A genome and one within the C genome [19,20]. We designed a specific primer pair for amplification of BnaA.IND.a and BraA.IND.a, located on chromosome A3, as a control to monitor bisulphite conversion in Brassica species with A genome ( Figure  1A). The primer pair IND.a_A3 gave reproducible PCR products when genomic DNA or bisulphite treated DNA was used as a substrate for different Brassica species that possess the A genome ( Figure 1B and 1C). PCR products from bisulphite treated DNA were cloned and for eight clones selected at random, the sequences demonstrated complete conversion of all cytosine residues to uracil. We therefore conclude that BraA.IND.a and BnaA.IND.a are consistently free of DNA methylation modification. As such they represent a suitable target for use as a control in DNA methylation analysis for those Brassica species possessing the A genome (i.e. B. rapa (A), B. napus (AC), B. juncea (AB)).
In order to develop an assay that would be applicable to all Brassica species, we next considered candidate genes within the mitochondrial genome. The orthologues of Arabidopsis NAD and ATP1 were chosen as suitable targets for developing the control assay due to their potential conservation across plant taxa, and the key role they play in energy production and storage. Two primer pairs for ATP1 and one for NAD were designed based on a region within the genes conserved amongst different species (Additional file 1). Bisulphite sequencing of eight clones for each treatment indicated complete conversion of cytosine to uracil within the target region of NAD for all Brassica species. This indicated that, as expected, the gene is free of methylated cytosine. However, it was not possible to amplify this target region of NAD from other species. In contrast, although some sequence polymorphism was present in different families of plants (data not shown), the PCR products generated using two primer pairs from the ATP1 gene were of identical size (227 bp for ATP1-1 and 252 bp for ATP1-2) in all species tested. Following bisulphite sequencing of eight clones from each treatment, ATP1 genes were also found to be universally unmethylated. This result is consistent with the known lack of 5-methylcytosine within mitochondrial genomes [14]. However, the transfer and incorporation of regions of mitochondrial and plastidic genomes within the nuclear chromosomes is relatively common amongst flowering plants [21]. In Arabidopsis, the mitochondrial ATP1 gene has been found in the nuclear genome where it is methylated at a low level [13,22]. Detailed analysis of the regions flanked by the two primers ATP1-1 and ATP1-2 indicates that these are free of methylated DNA [13]. Although the methylation status of nuclear ATP1 sequences in other plant species is unknown, it appears that the PCR products we generated here are clearly free of 5-methylcytosine, irrespective of their organellar or nuclear origin.
Based on these results, we have identified suitable controls for bisulphite DNA methylation analysis, using PCR amplification of four target regions within three genes. This will facilitate the study of epigenetic variation and regulation of single genes across tissues or environments, as well as genome-wide surveys in different species. In particular, the ATP1 primer pairs appear to be valuable as universal controls for the comparison of DNA methylation state across a wide range of plant taxa. Moreover, we identified the presence of specific restriction endonuclease recognition sites within the sequence flanking the ATP1-2 primer pair in different species ( Figure 2). This provides an additional application as a useful control for reduced representation bisulphite sequencing (RRBS) in different plant species. RRBS analysis involves fragmentation of genomic DNA by different restriction endonucleases followed by isolation of fragements within a discrete size range, which are then subjected to bisulphite treatment and sequencing [8]. The target region of ATP1-2 will be retained for bisulphite sequencing following digestion with specific endonuclease combination such as SacI and MseI ( Figure 2).
Although our assays appeared not to detect any methylated cytosine for each of the genes we screened, it was important to demonstrate that there had not been excess bisulphite exposure that may have led to conversion of methylated cytosine to uracil. ATS1 is a  nuclear encoded gene that is specifically transcribed during embryo development in Arabidopsis [23]. The promoter region of BraA.ATS1 (which contained a potential cytosine methylation, data not shown) is located on chromosome A1 of B. rapa, and was chosen as an additional control to test for the presence or variation of cytosine methylation. Two primer pairs designed from the promoter region were used to generate amplicons from a single source of bisulphite treated DNA isolated from the buds of B. napus Westar10 and B. rapa Chiifu-401. Although two CG sites were completely methylated in Westar10, the same sites were partially methylated in Chiifu-401 ( Figure 3). This suggests that ATS1 has a differential methylation pattern at the pre-embryonic stage in the diploid B. rapa compared with the amphidiploid B. napus. Moreover, the observed variation of cytosine methylation in B. rapa may indicate heterogeneity of methylation pattern either in different tissues within the floral bud or reflect different developmental stages. In tomato, the SBP-box gene LeSPL-CNR is required for normal ripening, and hyper-methylation within the promoter leads to the "Colorless non-ripening" mutation. Moreover, the methylation pattern varies in different tissues (leaf and fruits), during the stages of fruit development and ripening, and between genotypes [24]. Based on our results, we can be confident that the target regions of ATP1, NAD and IND.a were indeed unmethylated and that the full conversion of cytosine to uracil did not result from excess bisulphite treatment.
We also wished to determine whether two successive bisulphite treatments were required to convert all cytosine to uracil. We therefore repeated the analysis with treated DNA from buds of Westar10 and primer pair ATP1-2, analyzing cloned sequences following each round of bisulphite treatment. We found that there was a significant effect on the conversion frequency following the second round of bisulphite treatment, based on analysis of sequences from 15 clones selected at random from each treatment (Table 1). However, the additional purification following the first round of treatment had no significant effect (Table 1).  We also addressed the potential bias in PCR amplification based on use of the primer sets [25]. Bisulphite treatment converts cytosine to uracil whereas 5-methylcytosine is not converted. Thus the complexity and base composition (Tm) of predominantly unmethylated sequences with unmethylated DNA differ widely before and after treatment. In humans and mice it has been found that most of the primer sets are biased to amplify unmethylated DNA with a high content of thymine following bisulphite conversion [25,26]. Since the target regions of the four primer sets tested all appear to be free of methylation, PCR amplification may have resulted in a bias, leading to an inaccurate estimate of the bisulphite conversion ratio. We therefore designed an experiment to explore the bias of the four primer sets as suggested by Warnecke [25]. The results based on SSCP analysis indicated very different bias values for the different primer sets within the same species, and for the same primer in different species ( Table 2). The ATP1-1 primer set in B. napus (TapidorDH) had the highest bias value (7.81) whilst the ATP1-2 set in rice (Nipponbare) had the lowest bias value (1.98) ( Table 2). All primer sets had an average bias value greater than 1, which indicated that all primers were more likely to amplify un-converted DNA. This gives some reassurance that the bisulphite conversion ratio we estimated here is likely to be a reliable estimate, since the primers were not biased to amplify converted DNA.

Conclusion
In summary, we have developed a series of control assays that are valuable for ensuring accurate analysis of plant DNA methylation following bisulphite treatment. We have developed one assay that may be applied to Brassica species containing the A genome, three that may be applied to all Brassica species, and two that appear to be universally applicable to a wide range of monocotyledon and dicotyledon plant taxa. We provide the primer sequences, expected length and number of cytosine residues in the control assays (Table 3).

Plant materials
Twenty five plant species and synthetic species representing ten genera within six dicot and monocot families were used. DNA was isolated mostly from the leaves, as well as from floral buds and siliques, from plants grown in field or controlled environment conditions (Table 4).

Bisulphite sequencing
Collection of plant material, storage and DNA extraction followed the DNeasy Plant Mini Kit (Qiagen) handbook. 450-750 ng genomic DNA was subjected to two successive treatments of sodium bisulphite conversion using the EpiTect Bisulphite kit (Qiagen) according to the manufacturer's instructions. The reaction was then purified once more using the PCR purification kit (Qiagen). Forward (F) and reverse (R) primers for bisulphite sequencing PCR were designed using Kismeth http:// katahdin.mssm.edu/kismeth based on the reference sequences in GenBank (Additional file 1).The bisulphitetreated DNA was amplified using Maxima™ Hot start Taq DNA polymerase (Fermentas). The thermal cycling program was 95°C for 4 min followed by 35 cycles of  Percentage methylation (% C) was calculated as 100 ×C/ (C + T). DNA cytosine methylation in the CG, CHG, and CHH contexts was analyzed and displayed using CyMATE [27].

PCR bias analysis
Un-converted genomic DNA, either from TapidorDH leaf 9 (dicotyledon) or Nipponbare seedling leaf (monocotyledon), was diluted to the same concentration as bisulphite treated DNA. Four sets of samples were prepared with 80:20, 60:40, 40:60, 20:80 ratios of genomic to bisulphite treated DNA. PCR products generated from these templates with different primer combinations were cloned into the pMD18-T vector (TaKaRa). Twenty individual clones from each primer set were selected randomly for SSCP (single strand conformation polymorphism) analysis to discriminate different products from genomic DNA and bisulphate treated DNA. A value for the bias (b) was calculated as b = [y(100x)]/[x(100 -y)]; y is the percent of PCR products from genomic DNA and x is the percent of genomic DNA in mixed sample [25].

Additional material
Additional file 1: Gene information for primer design.