Improved DNase-seq protocol facilitates high resolution mapping of DNase I hypersensitive sites in roots in Arabidopsis thaliana
- Jason S. Cumbie†1,
- Sergei A. Filichkin†1, 3 and
- Molly Megraw1, 2, 3Email authorView ORCID ID profile
© Cumbie et al. 2015
Received: 9 June 2015
Accepted: 21 August 2015
Published: 4 September 2015
Identifying cis-regulatory elements is critical in understanding the direct and indirect regulatory mechanisms of gene expression. Current approaches include DNase-seq, a technique that combines sensitivity to the nonspecific endonuclease DNase I with high throughput sequencing to identify regions of regulatory DNA on a genome-wide scale. While this method was originally developed for human cell lines, later adaptations made the processing of plant tissues possible. Challenges still remain in processing recalcitrant tissues that have low DNA content.
By removing steps requiring the use of gel agarose plugs in DNase-seq, we were able to significantly reduce the time required to perform the protocol by at least 2 days, while also making possible the processing of difficult plant tissues. We refer to this simplified protocol as DNase I SIM (for simplified in-nucleus method). We were able to successfully create DNase-seq libraries for both leaf and root tissues in Arabidopsis using DNase I SIM.
This protocol simplifies and facilitates generation of DNase-seq libraries from plant tissues for high resolution mapping of DNase I hypersensitive sites.
Cis-regulatory elements (CREs) are short DNA sequences which are used by regulatory proteins such as transcription factors (TFs) to control the expression of genes . Because these elements need to be physically accessible to their respective regulatory proteins, they are often found in regions of the genome known as ‘open chromatin’ that are either unbound by or depleted of nucleosomes . Binding of regulatory proteins to their target DNA sequences can cause dynamic chromatin rearrangements resulting in displacement of nucleosomes in the regions of accessible chromatin (reviewed in [1, 2]).
Chromatin accessibility and the effects of chromatin structure modifications on gene transcription can be assessed directly and indirectly. Direct chromatin accessibility assays include Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE). FAIRE-seq  is a relatively simple method for probing nucleosome-depleted regions of a genome. However, a high level of background noise in the output data limits its resolution and value . Due to the lack of tightly bound histone proteins, regions of open chromatin are more readily digested by endonucleases such as micrococcal nuclease (MNase) and deoxyribonuclease I (DNase I). MNase is a low specificity endo-exonuclease that digests single-stranded, double-stranded, circular, and linear DNA. In MNase-seq experiments (commonly referred to as a nucleosome occupancy assay), mononucleosomes are extracted by MNase digestions of formaldehyde-crosslinked chromatin . The nucleosomal population is subsequently subjected to next generation sequencing (NGS), and nucleosome positioning is then deduced from the NGS read counts across the genome. Thus, MNase-seq is a method of choice for assessing genome-wide nucleosome positioning. It can also provide limited information on TF occupancy in different cell types . Drawbacks of the MNase-seq method are that it requires a large number of cells and meticulous enzymatic titrations for reproducible evaluation across samples. In addition, MNase has been shown to have a bias toward AT-cleavage specificity and comparisons between different experiments may vary significantly. In contrast, DNase I is a double stranded DNA-specific endonuclease that releases accessible chromatin by preferentially digesting nucleosome-free genomic regions categorized as DNase I hypersensitive sites (DHSs). Using DNase I digestions of intact nuclei in conjunction with NGS, known as DNase-seq, allowed for genome-wide identification of DHSs with unmatched specificity, sensitivity, and throughput . The improved quality of NGS data has made DNase-seq a preferred method of choice for probing chromatin accessibility in general, and TF occupancy in particular [6, 7].
DHSs have been shown to be strongly associated with CREs [8, 9]. While initial studies using DNase-seq were performed in human cell lines , DNase-seq was later adapted to plant tissues, with the first DNase-seq experiments occurring in rice seedling and callus tissue  and in Arabidopsis thaliana seedling and flower tissues . A critical step in preparing DNase-seq libraries requires isolating intact nuclei. The isolation of nuclei in plants is especially challenging due to the existence of the cell wall. Removal of the rigid cell wall and additional cellular debris requires an extensive amount of additional time and added steps to ensure that nuclei are not lysed in the process. The susceptibility of DNA to mechanical shearing must be carefully avoided to ensure that DNase I digestion can occur under optimal conditions, and that background noise due to spurious DNA fragments is not introduced in down-stream analyses. The latter of these challenges was addressed by introducing low-melt gel agarose plugs during DNase I digestion and T4 DNA polymerase blunt end repair to stabilize high molecular weight DNA in mammalian cell lines . Further adaptations to this protocol added a cell wall removal step . Because of the extensive molecular processing steps required in DNase-seq, tissues that are more resistant to homogenization and that have fewer cells per gram of tissue isolated will be prohibitively challenging to examine. To address this difficulty, here we present a simplified DNase-seq protocol in plants that bypasses the use of low-melt gel agarose plugs. In this protocol, DNA end repair by T4 DNA polymerase is performed directly in nuclei, thus we refer to this simplified protocol as DNase I SIM (for simplified in-nucleus method).
Recently, other protocols such as DNase-Flash  have been successfully adapted using the INTACT system  for use with biotinylated nuclei obtained from transgenic Arabidopsis lines . Where INTACT lines are available, the labor-saving ATAC-seq approach  that uses hyperactive Tn5 transposase to characterize DNA accessibility could also potentially be adapted to plant tissues, though output signal resolution in comparison to DNase-seq is still unclear. In this study, we have developed a purification and sequencing preparation that makes plant tissue studies using the original DNase-seq approach  feasible, even in recalcitrant tissues such as plant roots. We have successfully used the DNase I SIM protocol in A. thaliana leaf and root tissue, providing the very first DHS map in non-transgenic whole root tissue. This protocol greatly facilitates DHS sequencing in cases where an affinity purification system is not available. DNase I SIM thus provides an option that may be particularly desirable for DNase-seq studies in crop species where tissue is abundant but development of transgenic lines is impractical.
DNase I SIM protocol allows isolation and digestion of nuclei from leaf and root tissue in substantially reduced time
The past use of low-melting agarose plugs in combination with a more vigorous nuclei isolation protocol [6, 10] made it possible to analyze DHSs in leaf and flower tissue in Arabidopsis and seedling and callus tissue in rice [8, 9]. However, we found that we were unable to produce sufficient quantities of DNAse I digested DNA for NextGen sequencing using a similar version of this protocol when processing Arabidopsis root tissue samples. A possible reason for the low DNA yield was a particularly high content of the cell debris (including broken root hairs) that co-purified with root nuclei. The enormous required volume of preparations was prohibitive for the embedding of sufficient amounts of nuclei into the constricted volume of a PFGE agarose plug. As a result, visualization of the digested DNA was difficult to monitor using PFGE. In addition, scaling up the number of plugs to achieve higher yield required a sharp increase in the amounts of the T4 polymerase in order to polish DNA ends. Thus, the usage of agarose plugs made the protocol time consuming, labor-intensive, and less predictable.
DNase I SIM protocol data validation
In order to ensure that the modifications made to the original DNase-seq protocol did not change the nature of the data produced, we compared our leaf data to published leaf data in A. thaliana . We made these comparisons using three separate approaches that examined averaged DHS distribution genome-wide, across all genes, and a direct DHS-to-DHS comparison for identifying commonalities and differences for individual genes in all data sets analyzed. These same analyses were carried out using our data from root tissue. It is important to note that sequencing our leaf sample on the HiSeq-2000 generated nearly two-to-three times as many reads as had been previously published (100 × 106 compared to 46 × 106). To account for this difference in read depth, we provide separate analyses in which we randomly subsampled DNase I SIM data to a comparable read depth to provide the most direct comparisons. For these analyses our data is marked as “normalized”.
DHS genome distribution is depleted in centromeric and peri-centromic regions of the chromosome
DHS distribution on average localizes to promoter and transcriptional termination regions
DHS peaks are highly reproducible
To assess the reproducibility of individual DHSs, we compared the coordinates of DHSs defined by F-Seq  and identified those sites that overlapped between our DNase I SIM data sets and those produced using previously published data [8, 9]. In order for a pair of peaks to be considered overlapping between two data sets, at least 80 % of one of the two peaks had to be covered by the corresponding peak. For these analyses, we only used the normalized leaf data in order to provide the most direct comparisons. We found that 70–74 % of all peaks identified in the re-analysis were recapitulated in our normalized data sets (Additional file 4). Additionally we found that when analyzed on a gene-for-gene basis, 90–92 % of genes identified in our re-analysis of this published data were also found to have peaks in our own data sets, providing confidence that the alterations to the original DNase-seq protocol did not affect open chromatin peak identification.
An important point to consider when identifying DHSs is the introduction of background cleavage events. While it is possible to reproduce many of the peaks shown previously, if there is sufficient background noise it could be the case that this is a result of too many false positives contributing to peak agreement. To address this issue, F-Seq generates a background model using a kernel density estimate (kde) of the sequence data for all cleavage events, and then identifies regions that are four standard deviations (by default) above the mean of this kde. To verify that this calculation was estimating a similar level of background cleavage events between our normalized leaf data sets and previously published data, we calculated the percent of all tags that fell within DHSs. We found that ~46 % of our reads fell within identified DHSs in our normalized data sets compared with ~39 % of reads in the re-analyzed leaf data set, indicating that we were generating comparable levels of background cleavage events in our sequenced results.
Root- and leaf-specific genes show distinct differences in open chromatin
DNase-seq is a technically challenging protocol that provides a great deal of promise in its ability to map DHSs genome-wide in a wide variety of organisms and tissues. Protocol adaptations to tissues with cell walls were critical in expanding the utility of DNase-seq to plants, and have already begun to provide new insights into open chromatin and the epigenetic control of plant genomes [8, 9]. However, because most of the most time-consuming processing steps are performed using agarose gel plugs, tissues with low amounts of DNA or that are particularly recalcitrant due to high levels of cellular debris are prohibitively difficult to study. Using our simplified DNase I SIM protocol, we were able to bypass the gel agarose plugs and provide a method for processing tough plant tissue. More importantly, this new protocol generates sufficient quantities of genomic DNA for sequencing on NextGen sequencing platforms, providing an even greater depth of sequencing than was achieved in the past.
Previous studies were already able to generate DNase-seq data in Arabidopsis leaf and flowering tissue  and in rice callus and seedling tissue , however without the use of transgenics no current studies provide DHS maps for root tissue, a notoriously difficult tissue to process. One aspect of purification approaches that do not use transgenic lines is a realistic requirement for 5–15 g of input tissue depending on tissue type. While this is does not pose a serious limitation in most crop species, it is a feasible but non-negligible quantity in systems such as developing Arabidopsis roots. Therefore, if an INTACT transgenic line is available in the whole-organism, tissue, or cell type of interest [12, 13], this should be considered as it provides for an alternative purification strategy that requires less tissue. However, in cases where the production of new transgenic lines of interest requires kanamycin or Basta resistance, or if there is a need for DNase-I studies in mutant lines without an INTACT version, DNase I SIM may provide a potential alternative for generating DHS maps in a given sample of interest.
With our DNase I SIM protocol, we were able to successfully map DHSs genome-wide in Arabidopsis root tissue. We found that most DHSs were located near TSSs, in agreement with previous findings [8, 9]. We found that for leaf and root, about 16 % of all genes were associated with strongly overlapping sets of DHSs. Interestingly, most differences found in DHS localization occurred in their distribution within genes, rather than in distinct sets of genes with/without DHSs; however, a number of DHSs did localize to unique sets of genes in both leaf and root tissue.
In this study, we provide a simplified, more efficient, and time-saving DNase-seq protocol for preparation of genomic DNA libraries for NextGen sequencing. Bypassing the gel-agarose plug processing step allowed a decrease in the length of the protocol by at least 2 days. We successfully applied this protocol to Arabidopsis leaf and root tissues, providing for the first time a DHS map of non-transgenic whole root tissue. The data obtained using the modified protocol was consistent with publicly available datasets. We found that 16 % of all genes show strongly overlapping DHS sets between root and leaf tissues, with the largest differences occurring between the location of DHSs within or near a given gene.
Plant material and growth conditions
Arabidopsis seeds were surface sterilized in 12 % (w/v) bleach/0.1 % Tween 20 solution and washed extensively with sterile distilled water. Seeds were vernalized for 3 days at 4 °C in water, sown in two parallel rows on MS/agar plates (30 mM sucrose, 4.2 g Murashige and Skoog medium, and 0.8 % Phytagar, pH 5.8) covered with a 100 micron nylon membrane (Genesee Scientific). Seedlings were grown on vertical plates in the Conviron PGR15 growth chamber (12:12 h. light:dark, 21 °C, 50 % humidity, and 250 μmol/m2/s light intensity). Roots and leaves of 1 week old seedlings were dissected using surgical blade, flash-frozen in liquid nitrogen, and stored at −80 °C.
DNase I SIM protocol
Nuclei were isolated from roots and leaves of 1 week old Arabidopsis seedlings as described in Additional file 6. Chromatin from isolated nuclei was digested with DNase I, and DNase-seq libraries were prepared as described in Additional file 6. For a full detailed protocol, see Additional file 6. Additional file 7 provides a flowchart that outlines all protocol stages, and shows major differences with the original DNase-seq protocol . Additional file 8 provides a spreadsheet table containing a more detailed view of differences between the DNase I SIM protocol and the protocols in  and .
Genome alignment and DHS mapping
All leaf and root tissue reads have been deposited in the SRA  under the accession PRJNA285928. DNase-seq data was aligned against the TAIR10 version of the A. thaliana genome allowing for up to two mismatches using bowtie . Only those reads that aligned to one genomic locus were used. The peak calling software F-seq  was used to identify DHSs, using the aligned reads as input. F-seq version 1.84 was used and ran with a feature length of ‘300’ and only those DHSs that were at least 50 basepairs (bp) long were used for further analysis. In order to identify those DHSs that were shared between data sets, two criteria had to be met: (1) the genome coordinates of the DHS had to overlap, and (2) at least 80 % of one of the two DHSs had to be covered by the other DHS.
Read depth normalization
For all normalizations, the total number of reads that passed our alignment criteria was calculated and then reads were sampled from our leaf or root data to ensure that the total number of aligned reads in our normalized data set was equal to the number of aligned reads in the previously published data . These normalized alignments were than used to generate DHS maps. This procedure was performed 10 times, and the ranges from these comparisons were noted. For plots, a representative sample of each comparison was provided.
Analysis of genome-wide DHS distribution
To visualize the distribution of DHSs along the length of the genome, each chromosome was partitioned into non-overlapping bins of equal size. The size of each bin was calculated as the length of the longest chromosome (chromosome 1) divided by 1000. Each subsequent chromosome was then divided into bins of this length to plot the distributions proportionally for each chromosome. The total number of DHSs in each bin was then calculated, and this final value was plotted as a histogram using the R programming language .
Gene DHS matrix distribution
For each gene in a given sample, all of the DHS regions that overlapped the gene and the regions within 500 bp upstream and downstream of the gene were identified. All DHS start and end points were normalized such that: (1) position 1 started 500 bp in the 5′ direction from the gene start, (2) position 2000 was 500 bp in the 3′ direction from the gene start, and (3) positions 501–1500 were the normalized positions that fell within the gene body (e.g. if a DHS ended at 5 bp down from the TSS of a gene that was 500 bp long, the end coordinate would be 510—10 ‘normalized’ bps from the TSS, or position 501). These final normalized coordinate positions were then summed over a matrix, with each position enumerating the number of DHSs that fell within this normalized region, and this total was then plotted using the R programming language .
MM designed the study. SAF carried out tissue sample collection, preparation, and protocol modifications. SAF, JSC, and MM participated in analysis and troubleshooting for sequencing. JSC implemented computational analyses. JSC, SAF, and MM prepared the manuscript. All authors read and approved the final manuscript.
We would like to thank Greg Crawford and Yoichiro Shibata of Duke University for advice in troubleshooting. We would like to thank Spencer Kisler for his diligence in plant material production for roots and leaves. We would also like to thank Mark Dasenko of the Center for Genome Research and Biocomputing at Oregon State University for troubleshooting assistance in sample preparation for sequencing. This work was supported by the National Institutes of Health [GM097188 to MM]; and by startup funds from Oregon State University.
Compliance with ethical guidelines
Competing interests The authors declare no competing interests.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Jiang J. The ‘dark matter’ in the plant genomes: non-coding and unannotated DNA sequences associated with open chromatin. Curr Opin Plant Biol. 2015;24:17–23.View ArticlePubMedGoogle Scholar
- Tsompana M, Buck MJ. Chromatin accessibility: a window into the genome. Epigenetics Chromatin. 2014;7(1):33.PubMed CentralView ArticlePubMedGoogle Scholar
- Hogan GJ, Lee CK, Lieb JD. Cell cycle-specified fluctuation of nucleosome occupancy at gene promoters. PLoS Genet. 2006;2(9):e158.PubMed CentralView ArticlePubMedGoogle Scholar
- Cui K, Zhao K. Genome-wide approaches to determining nucleosome occupancy in metazoans using MNase-Seq. Methods Mol Biol. 2012;833:413–9.PubMed CentralView ArticlePubMedGoogle Scholar
- Rizzo JM, Sinha S. Analyzing the global chromatin structure of keratinocytes by MNase-seq. Methods Mol Biol. 2014;1195:49–59.View ArticlePubMedGoogle Scholar
- Song L. Crawford GE (2010) DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harb Protoc. 2010;2:pdb prot5384.Google Scholar
- John S, Sabo PJ, Canfield TK, Lee K, Vong S, Weaver M et al. Genome-scale mapping of DNase I hypersensitivity. Curr Protoc Mol Biol. 2013;Chapter 27:Unit 21 7.Google Scholar
- Zhang W, Wu Y, Schnable JC, Zeng Z, Freeling M, Crawford GE, et al. High-resolution mapping of open chromatin in the rice genome. Genome Res. 2012;22(1):151–62.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang W, Zhang T, Wu Y, Jiang J. Genome-wide identification of regulatory DNA elements and protein-binding footprints using signatures of open chromatin in Arabidopsis. Plant Cell. 2012;24(7):2719–31.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang W, Jiang J. Genome-wide mapping of DNase I hypersensitive sites in plants. Methods Mol Biol. 2015;1284:71–89.View ArticlePubMedGoogle Scholar
- Vierstra J, Wang H, John S, Sandstrom R, Stamatoyannopoulos JA. Coupling transcription factor occupancy to nucleosome architecture with DNase-FLASH. Nat Methods. 2014;11(1):66–72.View ArticlePubMedGoogle Scholar
- Deal RB, Henikoff S. The INTACT method for cell type-specific gene expression and chromatin profiling in Arabidopsis thaliana. Nat Protoc. 2011;6(1):56–68.View ArticlePubMedGoogle Scholar
- Sullivan AM, Arsovski AA, Lempe J, Bubb KL, Weirauch MT, Sabo PJ, et al. Mapping and dynamics of regulatory DNA and transcription factor networks in A. thaliana. Cell Rep. 2014;8(6):2015–30.View ArticlePubMedGoogle Scholar
- Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013;10(12):1213–8.PubMed CentralView ArticlePubMedGoogle Scholar
- Boyle AP, Guinney J, Crawford GE, Furey TS. F-Seq: a feature density estimator for high-throughput sequence tags. Bioinformatics. 2008;24(21):2537–8.PubMed CentralView ArticlePubMedGoogle Scholar
- Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000;408(6814):796–815.View ArticleGoogle Scholar
- The R Project for Statistical Computing. 2002. http://www.r-project.org. Accessed on 3 June 2015.
- The Arabidopsis Information Resource. 2000. http://www.arabidopsis.org. Accessed on 3 June 2015.
- Cheng Y, Zhou W, El Sheery NI, Peters C, Li M, Wang X, et al. Characterization of the Arabidopsis glycerophosphodiester phosphodiesterase (GDPD) family reveals a role of the plastid-localized AtGDPD1 in maintaining cellular phosphate homeostasis under phosphate starvation. Plant J. 2011;66(5):781–95.View ArticlePubMedGoogle Scholar
- Short Read Archive. 2007. http://www.ncbi.nlm.nih.gov/sra. Accessed on 3 June 2015.
- Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25.PubMed CentralView ArticlePubMedGoogle Scholar