Transcriptome analysis of Polygonatum cyrtonema Hua: identification of genes involved in polysaccharide biosynthesis

Background Polygonatum cyrtonema Hua (P. cyrtonema) is one of the most important herbs in traditional Chinese medicine. Polysaccharides in P. cyrtonema plants comprise a class of important secondary metabolites and exhibit a broad range of pharmacological functions. Results In order to identify genes involved in polysaccharide biosynthesis, we performed RNA sequencing analysis of leaf, root, and rhizome tissues of P. cyrtonema. A total of 164,573 unigenes were obtained by assembling transcripts from all three tissues and 86,063 of these were annotated in public databases. Differentially expressed genes (DEGs) were determined based on expression profile analysis, and DEG levels in rhizome tissues were then compared with their counterparts in leaf and root tissues. This analysis revealed numerous genes that were either up-regulated or uniquely expressed in the rhizome. Multiple genes encoding important enzymes, such as UDP glycosyltransferases (UGTs), or transcription factors involved in polysaccharide biosynthesis were identified and further analyzed, while a few genes encoding key enzymes were experimentally validated using quantitative real-time PCR. Conclusion Our results substantially expand the public transcriptome dataset of P. cyrtonema and provide valuable clues for the identification of candidate genes involved in metabolic pathways. Electronic supplementary material The online version of this article (10.1186/s13007-019-0441-9) contains supplementary material, which is available to authorized users.


Background
Polygonatum cyrtonema Hua (Liliaceae) is a medicinal and edible perennial plant that is used in traditional Chinese medicine for the treatment of coughs, dizziness, and lung problems [1]. According to the Chinese Pharmacopoeia, P. cyrtonema is often prescribed as the dried rhizome of P. sibiricum Red. Indeed, polysaccharides isolated from P. cyrtonema are the major bioactive component of herbal medicines and exhibit a range of important biological characteristics such as immunomodulatory, anti-aging, and antiviral properties [2,3]. The content and composition of these polysaccharides in P. cyrtonema plants varies with geographic region, cultivation method, and individual age. This means that understanding the biosynthesis, metabolism, and regulation of polysaccharides in P. cyrtonema is of great significance.
Previously, high performance anion-exchange chromatography with pulsed amperometric detection (HPAEC-PAD) has been used to reveal that polysaccharides in P. cyrtonema include rhamnose, arabinose, galactose, glucose, mannose, and fructose [4]. The composition and structure of polysaccharides in P. cyrtonema show a remarkable level of diversity [5]; while polysaccharide biosynthesis in other medicinal plants remains poorly understood, studies on Codonopsis pilosula and P. sibiricum show that this process occurs via three main processes [6][7][8]. In the first place, sucrose is converted to glucose 1-phosphate (Glc-1P) and fructose 6-phosphate
A series of complete transcriptome and metabolomic functional analyses have been performed in the postgenomic era [18,19]. In particular, RNA sequencing (RNA-Seq) has been shown to be the most efficient, cost effective method for the analysis of functional genes as well as for the accurate quantification of their expression in the absence of a reference genome [20][21][22]. Dozens of medicinal plants have so far been subjected to RNA-Seq analysis, including Artemisia argyi [23], A. annua [24], Eugenia uniflora L. [25], P. sibiricum [26], and Dendrobium officinale [27]. The use of this approach has enabled the identification of a number of new functional genes involved in specific metabolic pathways.
We conducted a comprehensive transcriptome analysis of P. cyrtonema plants and identified numerous genes involved in polysaccharide biosynthesis. The transcriptome data reported here are expected to provide as a foundation for future studies that address the molecular mechanisms of polysaccharide biosynthesis and will provide new insights on this species.

Plant material and RNA isolation
Whole P. cyrtonema plants were collected from the herb garden at the Anhui University of Chinese Medicine and identified by Professor Qingshan Yang (Anhui University of Chinese Medicine). Plant samples were cleaned with ultrapure water and dried on filter paper. The rhizomes, roots, and leaves of each were then separated and immediately frozen in liquid nitrogen (N). Separated tissues were selected from three replicates and pooled together and the total RNA of rhizomes, roots, and leaves was isolated using an RNA Plant Kit (Aidlab Biotech, Beijing, China) following the manufacturer's instructions. RNA quality was evaluated using an Agilent 2100 BioAnalyzer (Agilent Technologies, Palo Alto, CA, USA) (Additional file 1: Table S1).

Determination of total polysaccharide content
Polysaccharides were isolated from dried samples of P. cyrtonema rhizomes, roots, and leaves as described previously [28]. Briefly, dried powder (0.5 g) from each sample was mixed with 80% ethanol and was then extracted twice at 85 °C (1 h each time) in boiling distilled water. The precipitate was then collected and dissolved with distilled water. Solution absorbance was then determined using UV-spectrophotometry (JASCO Company, Japan). Anhydrous glucose was used as a standard; the standard curve of the relationship between concentration and absorbance is shown in Additional file 1: Fig. S1. The yield of total polysaccharides was calculated as follows:

Construction of cDNA and RNA-Seq libraries
Total RNA was treated with DNase I to remove all traces of genomic DNA, and mRNA was purified using poly-T oligo-attached magnetic beads. Purified mRNA was then fragmented using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5X). Samples of mRNA were then used for first-strand cDNA synthesis using reverse transcriptase and random hexamer primers, while second-strand cDNA was synthesized with RNase H and DNA Polymerase I using NEBNext doublestranded cDNA (ds cDNA) Fragmentase (New England BioLabs). Short cDNA fragments were recovered and repaired with NEBNext End Repair Module (New England BioLabs), and a single nucleotide (adenine) was added to 3ends using the NEBNext dA-Tailing Module (New England BioLabs). Fragments of cDNA were then ligated to the NEBNext Adaptor using the NEBNext Quick Ligation Module (New England Bio-Labs). In order to select appropriate cDNA fragments, a library was purified using the AMPure XP system (Beckman Coulter, Beverly, USA) while PCR amplification was performed using NEBNext Q5 Hot Start HiFi PCR Master Mix (2X), universal PCR primer, and index primer.
RNA-Seq libraries were generated using a NEBNext ® Ultra ™ RNA Library Prep Kit for Illumina ® (NEB, USA), following the manufacturer's instructions. The quality of each sample library was evaluated using an Agilent 2100 Bioanalyzer system (ABI, New York, NY, USA); qualified libraries were then sequenced and paired-end reads were generated using an Illumina HiSeq 4000 platform (Beijing Genomics Institute, Wuhan, China).

Functional annotation of unigenes
In order to perform functional annotation, unigenes were mapped to five databases using the BLAST (version 2.2.23, e-value ≤ 1e-5) software package released by the National Center of Biotechnology Information (NCBI) [32]. These five databases included those containing NCBI nucleotide (NT) sequences, NCBI non-redundant (NR) protein sequences, Clusters of Orthologous Groups of Proteins (COG), the Kyoto Encyclopedia of Genes and Genomes (KEGG), and a manually annotated and reviewed protein sequence database (SwissProt). Gene Ontology (GO) e-value = 1e−6) functional annotation was also performed using Blast2GO (version 2.5.0; default parameters) [33] with NR annotations, while InterPro annotations were constructed using InterProS-can5 [34].

Identification of differentially expressed genes (DEGs)
Clean transcriptome reads were mapped to unigenes using the software Bowtie2 (version 2.2.5) with default settings [35]. The read density of each sample was calculated using the fragments per kilobase of transcript per million (FPKM) method. Unigenes that exhibit differences in expression between two tissue types (i.e., rhizome vs. root and rhizome vs. leaf ) at a fold change (FC) ≥ 2.00 and a false discovery rate (FDR) ≤ 0.001 were identified as DEGs using the PoissonDis method [36,37]. These DEGs were then used for GO and KEGG enrichment analyses as previously described [38].

Analysis of genes encoding transcription factors (TFs)
In order to determine the TF families represented in the P. cyrtonema transcriptome, the open reading frame (ORF) of each unigene was detected using the software getorf (EMBOSS:6.5.7.0) [39]. These ORFs were then aligned to all TF protein domains using the plant transcription factor database (PlnTFDB) via BLASTX (e-value ≤ 1e−5) and applying the hmmsearch method [40].

Quantitative real-time PCR (qRT-PCR) analysis
In order to validate RNA-Seq data, qRT-PCR analysis was conducted using a QuantiNova SyBr Green PCR kit (Qiagen) on a PIKOREAL 96 Real-Time PCR System. Candidate reference qRT-PCR gene primers were designed using Primer Premier (version 5.0) (Additional file 1: Table S2). In this experiment, each qRT-PCR reaction contained 1 µl of diluted cDNA, 1 µl of each primer, 5 µl of 2X SYBR Green mix, and 2 µl of RNase-free water. All qRT-PCRs were performed using the following conditions: denaturation at 95 °C for 1 min, followed by 40 cycles of 95 °C for 20 s and then at 60 °C for 1 min. Successive qRT-PCR assays were performed using three biological and three technical replicates and to verify product specificity, melting curve analysis was performed after each amplification. The actin gene (CL16529.Con-tig1) of P. cyrtonema was used as a reference and the relative expression level of each unigene was calculated using the 2 −ΔΔCt approach [41].

Total polysaccharide content of P. cyrtonema samples
We extracted polysaccharides from the dried rhizomes, roots, and leaves of P. cyrtonema. Results show that total polysaccharide content was highest in rhizomes (4.76%) and lowest in leaves (1.62%) (Additional file 1: Fig. S2).

Illumina sequencing and de novo transcriptome assembly
A total of 10.42 Gb, 10.18 Gb, and 11.06 Gb of clean reads were obtained from 83.34 Mb, 81.63 Mb, and 89.13 Mb sets of raw data from P. cyrtonema leaf, root, and rhizome samples, respectively. All these data sets were characterized by Q30 ≥ 93.94%. These clean reads were then assembled sequentially so that full-length transcriptomes of each tissue could be reconstructed. A total of 164,573 unigenes were generated after selecting the longest transcript of each using the TGICL tool. These unigenes had a mean length of 710 bp and an N50 value of 1234 bp; 21.94% (36,108) and 39.84% (65,567) of these exceeded 1000 bp and 500 bp in length, respectively (Additional file 1: Fig. S3).

Functional annotation and expression overview of unigenes
Out of the 164,573 unigenes identified in this analysis, 52.29% (86,063) were mapped to at least one public database; thus, 45.83%, 28.81%, 29.00%, 33.35%, 35.62%, 35.56%, and 13.76% unigenes were recorded as significant hits in the NR, NT, SwissProt, KEGG, KOG, InterPro, and GO databases, respectively (Table 1), and 20.22% unigenes (33,283) were co-annotated in all five databases (Additional file 1: Fig. S4A). Out of the 75,427 unigenes annotated in the NR database, 50.46%, 7.14%, 2.04%, and 35.68% were mapped to the genes of Asparagus officinalis (Liliaceae) [42], Elaeis guineensis (Arecaceae) [43], Ananas comosus (Bromeliaceae) [44], and others, respectively (Additional file 1: Fig. S4B). The GO enrichment analysis presented here divided these annotated unigenes into three classes: biological processes, cellular components, and molecular function. A total of 22,649 of these unigenes were then matched with one or more GO terms and comprise 54 functional groups (Additional file 1: Fig. S5). We found that 'metabolic process' and 'biological regulation' were the most abundant categories within biological processes, while within the molecular function term, 'catalytic activity' and 'binding' were the most abundant.
Unigenes with FPKM > 1 were counted in each tissue. The results of this comparison showed that 59,271, 52,184, and 45,893 unigenes were expressed in leaves, roots, and rhizome samples, respectively (Fig. 1a). Gene expression level was highest in leaves compared with rhizomes and roots (Fig. 1b).

Identification of genes involved in polysaccharide biosynthesis
In order to understand the most significant biological processes in P. cyrtonema plants, a total of 54,881 unigenes were annotated using the KEGG database and assigned to 136 pathways (19 subcategories) (Additional file 1: Fig. S6, Table S3). A total of 14 pathways were assigned to the biosynthesis of other secondary metabolites and the most abundant unigenes within this set were annotated within the phenylpropanoid biosynthesis pathway (Fig. 2a). The 'carbohydrate metabolism' subcategory included eight pathways with the largest number of unigenes (1078) involved in amino and nucleotide sugar metabolism. In addition, 3753 unigenes were matched with related polysaccharide biosynthesis pathways, including amino and nucleotide sugar metabolism, starch and sucrose metabolism, glycolysis/gluconeogenesis, and pentose and glucuronate interconversions (Fig. 2b).
We also identified 27 unigenes encoding UDP glycosyltransferases (UGTs) (Additional file 1: Table S4). The amino acid sequence alignment of nine UGTs showed that a 29-amino acid region was well conserved within the C-terminal domain (Fig. 4d). We then constructed three-dimensional (3D) structural models of three UGTs (CL12526. Contig2, CL16787.Contig1, and Unigene23260) based on the 3D model of Medicago   PyMOL. All UGT models comprised N-and C-terminal domains with similar Rossmann-type folds and contained either six or seven β-sheets flanked by eight or nine α-helices (Fig. 4a-c). This approach also showed that 'HCGWNS' residues were highly conserved (shown as magenta dots).

Validation and expression analysis of genes encoding key enzymes
We tested the expression levels of genes encoding hexokinase (HK), fructokinase (scrk), and β-fructofuranosidase (sacA) using qRT-PCR assays (Fig. 5). Results revealed that the expression levels of genes encoding HK and sacA were highest in rhizomes, while those for genes encoding scrk were highest in leaves.

Identification of DEGs
Amongst the unigenes identified in this study, 12,005 were expressed only in rhizomes, while 7402 were expressed in all three tissues (Fig. 6a). DEGs were identified in all three tissues using FPKM values for unigenes (Fig. 6b). A comparison of expression levels between rhizomes and leaves revealed 28,725 DEGs, of which 7739 were up-regulated and 20,986 were down-regulated in rhizomes compared with leaves. A comparison of expression levels between rhizomes and roots revealed 16,553 DEGs, of which 5016 were up-regulated and 11,537 were down-regulated in rhizomes compared with roots. A comparison of gene expression among all three tissues revealed 8774 DEGs, of which 2781 were up-regulated and 5993 were down-regulated in rhizomes compared with leaves and roots. The 'carbohydrate metabolism' subgroup was particularly enriched among DEGs; compared with leaves and roots, 772 and 475 unigenes were up-regulated in rhizomes, respectively ( Table 3).

Analysis of genes showing rhizome-specific expression
A total of 2781 up-regulated DEGs showed rhizomespecific expression, with log 2 FC > 1. We further explored these genes via GOSlim functional analysis; sequence homology analysis revealed that all 2781 DEGs mapped to at least one GO term while 932 mapped within biological processes, 1365 within cellular components, and 815 within molecular function (Additional file 1: Fig. S7). Specifically, within biological processes and molecular All 2781 rhizome-specific DEGs were also annotated in the KEGG database. In order to further evaluate the biological functions of these DEGs, we compared them with the transcriptome of P. cyrtonema. Most DEGs were mapped to 'translation' , 'carbohydrate metabolism' , 'transcription' , 'biosynthesis of other secondary metabolites' , and 'signal transduction' (Additional file 1: Fig. S8). KEGG enrichment analysis revealed that these DEGs were significantly enriched in 'biosynthesis of secondary metabolites' , 'MAPK signaling pathway-plant' , 'spliceosome' , 'mRNA surveillance pathway' , 'RNA transport' , 'plant hormone signal transduction' , and 'phenylpropanoid biosynthesis' pathways (Fig. 7).

Discussion
The herb P. cyrtonema is very commonly used in traditional Chinese medicine for the treatment of various human conditions because of its immunomodulatory, anti-aging, and antiviral activities. Although polysaccharides are the major active constituents of P. cyrtonema plants, genomic data remain unavailable in spite of the significance of this species in treating various health conditions. Little is also known about the mechanisms underlying polysaccharide biosynthesis and metabolism in P. cyrtonema. The aim of this study was therefore to identify genes involved in polysaccharide biosynthesis and metabolism. This information will facilitate the further investigation of P. cyrtonema and the identification of key markers of polysaccharide biosynthesis and metabolism.
Three different types of tissues were utilized in this study for RNA-Seq analysis using Illumina HiSeq 4000. Transcriptome analysis of P. cyrtonema using RNA-Seq generated 31.66 Gb of clean reads that were assembled into 164,573 unigenes with an average length of 710 bp. This approach therefore enabled an understanding of polysaccharide biosynthesis at the molecular level. A  [26]. The unigenes of P. cyrtonema exhibited a homogeneous size distribution and 24.43% of these unigenes (40,206) were longer than 1000 bp. Our transcriptome results are of higher quality and reliability than those previously assembled. The best hit for each unigene was then queried against the NCBI NR database and a functional GO annotation was assigned in terms of cellular components, biological processes, and molecular function. A large number of diverse GO terms were assigned to these unigenes, indicating remarkable diversity within the transcriptome of P. cyrtonema leaf, root, and rhizome. Numerous unigenes involved in polysaccharide biosynthesis were identified via KEGG annotations and the total expression levels of genes encoding various enzymes were determined based on FPKM values (Fig. 3).
The expression levels of genes encoding these enzymes were also examined using qRT-PCR which confirmed the reliability of our transcriptome data (Fig. 5). Higher expression levels of genes encoding HK (CL16751.Con-tig2) and sacA (CL3476.Contig4 and CL3610.Contig2) in rhizomes (as analyzed by qRT-PCR) are consistent with higher polysaccharide accumulation in P. cyrtonema rhizomes, as revealed by UV spectrophotometry (Additional file 1: Fig. S2). This result indicates that these genes play critical roles in regulating the polysaccharide content of P. cyrtonema rhizomes. The characterization of these unigenes will be useful for understanding the molecular mechanisms underlying polysaccharide biosynthesis. The expression levels recorded here for the gene encoding scrk (Unigene 20,408) in rhizomes versus leaves were inconsistent with those for the other genes involved in polysaccharide biosynthesis. These data are consistent with RNA-Seq analysis of P. cyrtonema and confirm the reliability of our RNA-Seq approach. The gene encoding scrk possesses numerous transcripts and has a synergistic effect on the regulation of polysaccharide biosynthesis. The qRT-PCR data on scrk transcripts obtained in this study may not be positively correlated with polysaccharide biosynthesis in the rhizome and several methodological challenges associated with RNA-seq analysis remain, including fragmentation, length, and transcriptome composition biases [51,52]. These factors all lead to biased gene expression results. It is clear that UGTs belong to a super family of enzymes and catalyze the addition of a glycosyl group from a nucleotide sugar (UDP glucuronic acid, UDP galactose, UDP glucose, or UDP xylose) to a small hydrophobic molecule [53]. A total of 27 unigenes encoding UGTs were identified here within the P. cyrtonema transcriptome. Amino acid sequence alignments for nine of the 27 UGTs with complete ORFs revealed a signature motif, specifically the plant secondary product glycosyltransferase one [54] which contains 29 conserved amino acids (Fig. 4d). This region is consistent with the consensus signature sequence seen in other plants and mammalian UGTs In this sequence, conserved amino acids are listed within parentheses and X denotes any amino acid. This motif has long been suspected to be the binding site for nucleotide-activated sugars, based on comparisons with crystallographic data from bacterial enzymes. A UGT model (UGT73A5) from Dorotheanthus bellidiformis (Aizoaceae) indicates direct interaction with the highly conserved 'HCGWNS' residues and the uracil moiety of UDP-glucose [56] with the last amino acid of the plant secondary product glycosyltransferase motif 'Q'; this result can be confirmed via discrimination between UDP-glucose and UDP-galactose [57]. Studies investigating the relationship between the structure and activity of UGTs suggest that the N-terminal domain acts as the acceptor binding site, while the C-terminal domain acts as the donor response site [54]. The 3D structural models of all three UGTs showed Nand C-terminal domains with similar Rossmann-type folds linked together via a long loop (Fig. 4a-c). Although the three genes encoding UGTs exhibit sequence diversity, their protein spatial structures are nevertheless conservative and suggest similar functions.
Results show that TFs are associated with a variety of plant biological processes. A large number of these have been isolated and shown to play major roles in the biosynthesis of polysaccharides and other secondary metabolism processes. A total of 1131 candidate TFs were assigned to the MYB, AP2-EREBP, WRKY, bHLH, C3H, C2H2, and NAC families; these TFs likely play roles in regulating polysaccharide biosynthesis. Over-expression of the gene encoding MYB46 in Arabidopsis thaliana resulted in a significant increase in the mannan content of hemicellulosic polysaccharides [58]. A total of 314 candidate unigenes encoding MYB TFs were identified here, of which 72 and 26 were up-regulated in the rhizome compared with leaves and roots, respectively (Table 4). These up-regulated unigenes are important candidates for future analyses aimed at investigating the regulation of polysaccharide biosynthesis in P. cyrtonema. At the same time, RNA-seq cannot be used to sufficiently detect genes or transcripts with low expression levels and are especially important for regulatory genes [59,60]. The genes doublesex (dsx) and fruitless (fru), transcription factors (TFs) involved in sexual dimorphism in flies, cannot be determined by RNA-Seq in the deeply sequenced modENCODE embryo samples with low expression [61]. These genes will influence expression estimates for transcripts as well as their functional research.

Conclusion
We conducted a comprehensive RNA-Seq analysis of leaves, roots, and rhizome tissues of P. cyrtonema plants and were able to identify numerous genes involved in polysaccharide biosynthesis. The results of RNA-Seq analysis were validated using qRT-PCR for a few genes. Our results help explain the accumulation of secondary metabolites. The transcriptome data reported here will facilitate future studies on the molecular mechanisms of polysaccharide biosynthesis and provide new insights into P. cyrtonema.

Additional file
Additional file 1: Table S1. RNA information of different tissues. Fig. S1. Standard curve of glucose at 582 nm. Table S2. Gene descriptions and primers used for qRT-PCR. Fig. S2. Total polysaccharides content from rhizomes, roots and leaves of P. cyrtonema. Fig. S3. The length distribution of unigenes for P. cyrtonema transcriptome assembly. Fig. S4A and Fig. S4B. (A) Venn diagram of annotated unigenes from the different databases. (B) Species distribution annotated in the NR database for P. cyrtonema. Fig. S5. GO function annotation of P. cyrtonema transcriptome. Fig. S6, Table S3. KEGG functional classifications of the annotated unigenes in P. cyrtonema. KEGG annotation of all unigenes. Table S4. UDP glycosyltransferases annotated in P. cyrtonema. Fig. S7. GOSlim analysis of rhizomespecifc up-regulation genes. Fig. S8. KEGG functional classifications of the rhizome-specifc up-regulation genes in P. cyrtonema.