An efficient procedure for plant organellar genome assembly, based on whole genome data from the 454 GS FLX sequencing platform
© Zhang et al; licensee BioMed Central Ltd. 2011
Received: 11 September 2011
Accepted: 29 November 2011
Published: 29 November 2011
Complete organellar genome sequences (chloroplasts and mitochondria) provide valuable resources and information for studying plant molecular ecology and evolution. As high-throughput sequencing technology advances, shotgun sequencing is becoming the norm for obtaining complete genome sequences, so organellar sequences increasingly have to be assembled from whole genome shotgun reads. However, the associated techniques are often cumbersome, time-consuming, and difficult, because true organellar DNA is hard to separate efficiently from nuclear copies, which have been transferred to the nucleus through the course of evolution.
We report a new, rapid procedure for plant chloroplast and mitochondrial genome sequencing and assembly using the Roche/454 GS FLX platform. Plant cells can contain multiple copies of the organellar genomes, and there is a significant correlation between the depth of sequence reads in contigs and the number of copies of the genome. Without isolating organellar DNA from the mixture of nuclear and organellar DNA for sequencing, we retrospectively extracted assembled contigs of either chloroplast or mitochondrial sequences from the whole genome shotgun data. Moreover, the contig connection graph property of Newbler (a platform-specific sequence assembler) ensures an efficient final assembly. Using this procedure, we assembled both chloroplast and mitochondrial genomes of a resurrection plant, Boea hygrometrica, with high fidelity. We also present information and a minimal sequence dataset as a reference for the assembly of other plant organellar genomes.
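The depth-based separation described above can be illustrated with a minimal Python sketch. It assumes that each assembled contig comes with a mean read depth, and that contigs whose depth lies far above the typical (mostly nuclear) depth are organellar candidates, with the highest depths pointing to the chloroplast. The `Contig` type, the function name, the two ratio cutoffs, and the example numbers are hypothetical and are not taken from this study; real cutoffs would have to be chosen from the observed depth distribution.

```python
from statistics import median
from typing import Dict, List, NamedTuple


class Contig(NamedTuple):
    name: str
    length: int   # contig length in bp
    depth: float  # mean read depth across the contig


def classify_by_depth(contigs: List[Contig],
                      mt_min_ratio: float = 5.0,
                      cp_min_ratio: float = 50.0) -> Dict[str, str]:
    """Label contigs by their depth relative to the median contig depth.

    The cutoffs are illustrative assumptions, not values from the paper.
    """
    if not contigs:
        return {}
    # Most contigs in a whole genome assembly are nuclear, so the median
    # depth is used here as a rough nuclear baseline.
    baseline = median(c.depth for c in contigs)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    labels: Dict[str, str] = {}
    for c in contigs:
        ratio = c.depth / baseline
        if ratio >= cp_min_ratio:
            labels[c.name] = "cp-like"
        elif ratio >= mt_min_ratio:
            labels[c.name] = "mt-like"
        else:
            labels[c.name] = "nuclear-like"
    return labels


# Made-up example: five low-depth contigs and two high-depth contigs.
example = [
    Contig("nuc_1", 12000, 4.0),
    Contig("nuc_2", 8000, 5.0),
    Contig("nuc_3", 15000, 4.5),
    Contig("nuc_4", 9000, 5.5),
    Contig("nuc_5", 7000, 5.0),
    Contig("mt_1", 30000, 40.0),
    Contig("cp_1", 60000, 800.0),
]
labels = classify_by_depth(example)
```

A depth filter of this kind only proposes candidates. Nuclear repeats can also reach high depth, so candidate contigs would still need to be checked against known organellar sequences.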
Organellar genomes are widely used in evolutionary and population genetics studies. The plastid genome contains many essential genes, especially those required for photosynthesis. Multiple plastid genomes together harbor suites of characters that transcend the green plant branch in the tree of life. Plant cells contain multiple copies of the organellar genomes; for example, leaf cells often contain 400 to 1,600 copies of the plastid genome. In angiosperms, most chloroplast (cp) genomes are circular DNA molecules ranging from 120 to 160 kb. They have a quadripartite organization: two copies of an inverted repeat (IR), each 20-28 kb in size, divide the rest of the genome into a large single-copy (LSC; 80-90 kb) region and a small single-copy (SSC; 16-27 kb) region. Plants have larger and more complex mitochondrial (mt) genomes than other unicellular and multicellular eukaryotes. Mitochondrial genomes, especially those in seed plants, vary exceptionally in size and structure, and they accumulate many repetitive sequences [4, 5].
Recently, there has been a dramatic increase in the number of completely sequenced organellar genomes. To date, sequences from 206 cp genomes and 47 mt genomes have been deposited in the GenBank Organelle Genome Resources. Most of them were generated with Sanger capillary sequencers. With the emergence of next-generation sequencing technologies, new approaches for cp genome sequencing and assembly have been proposed that save time, offer high throughput, and reduce cost [7–9]. For mt genomes, three main strategies have been used: physical map-based [10–12], shotgun-based [13–15], and gene-based. However, all these strategies for sequencing organellar genomes either require the isolation of cp or mt DNA from nuclear DNA or yield data that are difficult to assemble because of the dynamic structure of multipartite molecules [18–20]. Isolating mitochondria and their DNA is often challenging, so it is imperative to develop better methods for sequencing and assembling these genomes that do not require experimental sample enrichment.
In this study, we present a rapid procedure for complete cp and mt genome sequence assembly from whole genome shotgun data, without organellar DNA isolation. Using this procedure, we successfully assembled the complete cp and mt genomes of a resurrection plant, Boea hygrometrica (Bunge) R Br of the Gesneriaceae family. This is the first mitochondrial genome to be sequenced from a resurrection plant. Boea hygrometrica is an unusual, desiccation-tolerant angiosperm native to China [21, 22]. Comprehensive analyses of the organellar genomes of this particular plant, and comparison with those of other plants, will help us to understand the evolution of Boea hygrometrica.
Data summary of Roche/454 GS FLX sequencing (columns: mean read length (bp); read peak quality)
Data summary of SOLiD 4.0 sequencing (columns: read length (bp); total data (Mbp); insert size 1 (bp); insert size 2 (bp))
Assembly of a Cp Genome
Assembly of an Mt Genome
In comparison to non-plant unicellular and multicellular eukaryotes, plants have larger and more complex mitochondrial genomes. Several features of plant mt genomes, including RNA editing, genomic recombination, trans-splicing, and insertions of "foreign" DNA from other genomes, make assembling them difficult. As recent studies have shown, mt genome sequences vary exceptionally in size, structure, and sequence content, especially among seed plants [4, 5]. However, some essential genes are highly conserved in almost all plant mt genomes, such as those encoding NADH dehydrogenase, succinate dehydrogenase, ubiquinol cytochrome c reductase, cytochrome c oxidase, and ATP synthase. Using these genes, we could identify assembled contigs that originated from the mt genome. Such gene-based procedures have been used to enrich plant mtDNA for mt genomic sequencing.
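As a rough illustration of this gene-based identification, the sketch below collects contigs that have a convincing similarity hit to at least one conserved mt gene. It assumes the hits are available as tab-separated lines in the common 12-column BLAST tabular layout (query, subject, percent identity, alignment length, and so on), with the gene as query and the contig as subject. The text above does not state which search tool or thresholds were used, so the function name, the cutoffs, the contig names, and the example hits are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def mt_candidate_contigs(tabular_lines: Iterable[str],
                         min_identity: float = 80.0,
                         min_aln_len: int = 100) -> Dict[str, Set[str]]:
    """Map each contig to the conserved mt genes that hit it.

    Only the first four tabular columns are used:
    query (gene), subject (contig), percent identity, alignment length.
    The thresholds are illustrative assumptions.
    """
    hits: Dict[str, Set[str]] = defaultdict(set)
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip malformed lines
        gene, contig = fields[0], fields[1]
        identity, aln_len = float(fields[2]), int(fields[3])
        if identity >= min_identity and aln_len >= min_aln_len:
            hits[contig].add(gene)
    return dict(hits)


# Made-up hits: two strong hits on one contig, one on another,
# and one weak hit that should be filtered out.
example_hits = [
    "nad1\tcontig00012\t95.2\t850\t40\t1\t1\t850\t1200\t2049\t0.0\t1400",
    "cox1\tcontig00012\t97.0\t1500\t45\t0\t1\t1500\t5000\t6499\t0.0\t2600",
    "atp1\tcontig00031\t92.1\t600\t47\t0\t1\t600\t300\t899\t1e-150\t900",
    "nad1\tcontig00077\t72.0\t60\t17\t0\t1\t60\t10\t69\t0.001\t45",
]
candidates = mt_candidate_contigs(example_hits)
```

Contigs selected this way are seeds rather than a finished genome: intergenic and repetitive mt contigs carry no conserved gene and would have to be recovered by other means, such as read depth or the assembler's contig connections.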
SOLiD mate-pair read links of spanning repetitive contigs
Minimal Sequencing Data for Organellar Genome Assembly
After finishing the organellar genome assembly for B. hygrometrica, we carried out a simulation study to determine the minimal sequencing dataset for our procedure. We randomly sampled 50-1,400 Mbp of sequence from the raw Roche/454 data and assembled the organellar genomes with our procedure. A flow cytometry study showed that the genome size of B. hygrometrica is about 300 Mbp, roughly twice as large as that of Arabidopsis thaliana (our unpublished data). The sequencing coverage of B. hygrometrica is about 4.68×.
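The arithmetic behind these numbers, and one way to draw a random subsample of a given size, can be sketched in Python. Coverage is taken here as sampled bases divided by genome size; with the rounded values of 1,400 Mbp and 300 Mbp this gives about 4.67×, close to the reported 4.68×. The function names, the fixed read length of 400 bp, and the seed are assumptions for illustration only and do not describe the scripts used in this study.

```python
import random
from typing import List, Sequence, Tuple


def coverage(sample_mbp: float, genome_mbp: float = 300.0) -> float:
    """Sequencing coverage as sampled Mbp divided by genome size in Mbp."""
    return sample_mbp / genome_mbp


def subsample_reads(read_lengths: Sequence[int],
                    target_bp: int,
                    seed: int = 0) -> Tuple[List[int], int]:
    """Pick reads in random order until their total length reaches target_bp.

    Returns the indices of the chosen reads and the total bases chosen.
    If all reads together are shorter than target_bp, every read is returned.
    """
    rng = random.Random(seed)          # seeded for reproducible sampling
    order = list(range(len(read_lengths)))
    rng.shuffle(order)
    picked: List[int] = []
    total = 0
    for i in order:
        if total >= target_bp:
            break
        picked.append(i)
        total += read_lengths[i]
    return picked, total


# Coverage at the two ends of the sampled range (50 and 1,400 Mbp).
low_cov = round(coverage(50), 2)
high_cov = round(coverage(1400), 2)

# Made-up read set: 1,000 reads of 400 bp each, sampled down to 100 kbp.
picked, total = subsample_reads([400] * 1000, target_bp=100_000, seed=1)
```

Repeating the subsampling at increasing target sizes and re-running the assembly at each step is one way to locate the smallest dataset that still yields complete organellar genomes.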
Analysis of minimal sequencing coverage for the complete organellar genome assembly of B. hygrometrica (columns: sample data (Mbp); sequencing coverage (×))
We have successfully applied a new, efficient procedure to determine the complete chloroplast and mitochondrial genome sequences of the resurrection plant Boea hygrometrica. Subsequently, we applied the same approach to assemble the complete mt genome of Phoenix dactylifera L. from only one run of Roche/454 data, and two Hassawi rice (Oryza sativa L., from Saudi Arabia) organellar genomes (both cp and mt) (data not shown). We are therefore confident that this efficient and straightforward procedure will prove useful for further organellar genome sequencing and assembly.
Materials and methods
Materials and datasets
Boea hygrometrica plants were collected from their natural habitat in Beijing and maintained in a greenhouse (approximately 25°C, 16 h/8 h light period) with regular irrigation. After 2 weeks of growth, fresh green leaves were collected. We extracted genomic DNA from 50 g of leaves according to a CTAB-based protocol. Following the manufacturer's manual for the 454 GS FLX Titanium, we used 5 μg of purified DNA to construct the libraries. In addition, two mate-pair libraries were constructed for the SOLiD 4.0 (Applied Biosystems, Foster City, CA) sequencing platform. We downloaded 206 sequenced plant chloroplast genome sequences from the NCBI (National Center for Biotechnology Information) ftp site http://ftp.ncbi.nih.gov/genomes/Chloroplasts/plastids and 47 sequenced plant mitochondrial genome sequences from NCBI Organelle Genome Resources http://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=33090&opt=organelle.
The genome data have been submitted to the National Center for Biotechnology Information (NCBI) database. The accession numbers are [GenBank: JN107811] and [GenBank: JN107812] for Boea hygrometrica chloroplast and mitochondrial genomes, respectively.
We thank Xing Deng and Xuming Wang for preparing the B. hygrometrica materials for this project. We also thank Douglas Senalik and Simon Gladman for sharing two important Perl scripts used in this procedure.
This work was supported by grants from the Knowledge Innovation Program of the Chinese Academy of Sciences (KSCX2-EW-R-01-04), the Natural Science Foundation of China (90919024), the Natural Science Foundation of China (30900831) and the National Basic Research Program (973 Program) from the Ministry of Science and Technology of the People's Republic of China (2011CB944100).
- Wolf PG, Der JP, Duffy AM, Davidson JB, Grusz AL, Pryer KM: The evolution of chloroplast genes and genomes in ferns. Plant Mol Biol. 2011, 76: 251-261.
- Pyke KA: Plastid division and development. Plant Cell. 1999, 11: 549-556.
- Yang M, Zhang XW, Liu GM, Yin YX, Chen KF, Yun QZ, Zhao DJ, Al-Mssallem IS, Yu J: The Complete Chloroplast Genome Sequence of Date Palm (Phoenix dactylifera L.). PLoS One. 2010, 5:
- Alverson AJ, Wei XX, Rice DW, Stern DB, Barry K, Palmer JD: Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo (Cucurbitaceae). Mol Biol Evol. 2010, 27: 1436-1448.
- Alverson AJ, Zhuo S, Rice DW, Sloan DB, Palmer JD: The Mitochondrial Genome of the Legume Vigna radiata and the Analysis of Recombination across Short Mitochondrial Repeats. PLoS One. 2011, 6:
- Jansen RK, Raubeson LA, Boore JL, dePamphilis CW, Chumley TW, Haberle RC, Wyman SK, Alverson AJ, Peery R, Herman SJ: Methods for obtaining and analyzing whole chloroplast genome sequences. Methods in enzymology. 2005, 395: 348-384.
- Cronn R, Liston A, Parks M, Gernandt DS, Shen R, Mockler T: Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology. Nucleic acids research. 2008, 36: e122.
- Moore M, Dhingra A, Soltis P, Shaw R, Farmerie W, Folta K, Soltis D: Rapid and accurate pyrosequencing of angiosperm plastid genomes. BMC Plant Biol. 2006, 6: 17.
- Tangphatsornruang S, Sangsrakru D, Chanprasert J, Uthaipaisanwong P, Yoocha T, Jomchai N, Tragoonrung S: The chloroplast genome sequence of mungbean (Vigna radiata) determined by high-throughput pyrosequencing: structural organization and phylogenetic relationships. DNA research: an international journal for rapid publication of reports on genes and genomes. 2010, 17: 11-22.
- Handa H: The complete nucleotide sequence and RNA editing content of the mitochondrial genome of rapeseed (Brassica napus L.): comparative analysis of the mitochondrial genomes of rapeseed and Arabidopsis thaliana. Nucleic acids research. 2003, 31: 5907-5916.
- Kubo T, Nishizawa S, Sugawara A, Itchoda N, Estiati A, Mikami T: The complete nucleotide sequence of the mitochondrial genome of sugar beet (Beta vulgaris L.) reveals a novel gene for tRNA(Cys)(GCA). Nucleic acids research. 2000, 28: 2571-2576.
- Oda K, Yamato K, Ohta E, Nakamura Y, Takemura M, Nozato N, Akashi K, Kanegae T, Ogura Y, Kohchi T, Ohyama K: Gene Organization Deduced from the Complete Sequence of Liverwort Marchantia-Polymorpha Mitochondrial-DNA - a Primitive Form of Plant Mitochondrial Genome. J Mol Biol. 1992, 223: 1-7.
- Clifton SW, Minx P, Fauron CMR, Gibson M, Allen JO, Sun H, Thompson M, Barbazuk WB, Kanuganti S, Tayloe C: Sequence and comparative analysis of the maize NB mitochondrial genome. Plant Physiol. 2004, 136: 3486-3503.
- Sugiyama Y, Watase Y, Nagase M, Makita N, Yagura S, Hirai A, Sugiura M: The complete nucleotide sequence and multipartite organization of the tobacco mitochondrial genome: comparative analysis of mitochondrial genomes in higher plants. Mol Genet Genomics. 2005, 272: 603-615.
- Unseld M, Marienfeld JR, Brandt P, Brennicke A: The mitochondrial genome of Arabidopsis thaliana contains 57 genes in 366,924 nucleotides. Nat Genet. 1997, 15: 57-61.
- Ogihara Y, Yamazaki Y, Murai K, Kanno A, Terachi T, Shiina T, Miyashita N, Nasuda S, Nakamura C, Mori N: Structural dynamics of cereal mitochondrial genomes as revealed by complete nucleotide sequencing of the wheat mitochondrial genome. Nucleic acids research. 2005, 33: 6235-6250.
- Atherton RA, McComish BJ, Shepherd LD, Berry LA, Albert NW, Lockhart PJ: Whole genome sequencing of enriched chloroplast DNA using the Illumina GAII platform. Plant methods. 2010, 6: 22.
- Lonsdale DM, Brears T, Hodge TP, Melville SE, Rottmann WH: The Plant Mitochondrial Genome: Homologous Recombination as a Mechanism for Generating Heterogeneity. Philosophical Transactions of the Royal Society of London B, Biological Sciences. 1988, 319: 149-163.
- Palmer J: Plastid chromosomes: structure and evolution. Cell Culture and Somatic Cell Genetics of Plants, vol 7A, The Molecular Biology of Plastids. 1991, 5-53.
- Palmer JD, Herbon LA: Plant mitochondrial DNA evolves rapidly in structure, but slowly in sequence. J Mol Evol. 1988, 28: 87-97.
- Deng X, Hu ZA, Wang HX, Wen XG, Kuang TY: Effects of dehydration and rehydration on photosynthesis of detached leaves of the resurrective plant Boea hygrometrica. Acta Bot Sin. 2000, 42: 321-323.
- Deng X, Hu ZA, Wang HX, Wen XG, Kuang TY: A comparison of photosynthetic apparatus of the detached leaves of the resurrection plant Boea hygrometrica with its non-tolerant relative Chirita heterotrichia in response to dehydration and rehydration. Plant Sci. 2003, 165: 851-861.
- Xue JY, Liu Y, Li LB, Wang B, Qiu YL: The complete mitochondrial genome sequence of the hornwort Phaeoceros laevis: retention of many ancient pseudogenes and conservative evolution of mitochondrial genomes in hornworts. Curr Genet. 2010, 56: 53-61.
- Hecht J, Grewe F, Knoop V: Extreme RNA editing in coding islands and abundant microsatellites in repeat sequences of Selaginella moellendorffii mitochondria: the root of frequent plant mtDNA recombination in early tracheophytes. Genome Biology and Evolution. 2011
- Wang D, Wu Y-W, Shih AC-C, Wu C-S, Wang Y-N, Chaw S-M: Transfer of Chloroplast Genomic DNA to Mitochondrial Genome Occurred At Least 300 MYA. Mol Biol Evol. 2007, 24: 2040-2048.
- Wang D-Y, Zhang Q, Liu Y, Lin Z-F, Zhang S-X, Sun M-X, Sodmergen: The Levels of Male Gametic Mitochondrial DNA Are Highly Regulated in Angiosperms with Regard to Mitochondrial Inheritance. The Plant Cell Online. 2010, 22: 2402-2416.
- Koumandou VL, Howe CJ: The copy number of chloroplast gene minicircles changes dramatically with growth phase in the dinoflagellate Amphidinium operculatum. Protist. 2007, 158: 89-103.
- Wang W, Messing J: High-throughput sequencing of three lemnoideae (duckweeds) chloroplast genomes from total DNA. PLoS One. 2011, 6: e24670.
- Nock CJ, Waters DL, Edwards MA, Bowen SG, Rice N, Cordeiro GM, Henry RJ: Chloroplast genome sequences from total DNA for plant identification. Plant biotechnology journal. 2011, 9: 328-333.
- Dohm JC, Lottaz C, Borodina T, Himmelbauer H: Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res. 2008, 36: e105.
- Gawel N, Jarret R: A modified CTAB DNA extraction procedure for Musa and Ipomoea. Plant Mol Biol Rep. 1991, 9: 262-266.
- Alexander J: Identification and quantification of genomic repeats and sample contamination in assemblies of 454 pyrosequencing reads. 2010
- Krzywinski MI, Schein JE, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA: Circos: An information aesthetic for comparative genomics. Genome Research. 2009
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.