A rapid, simple method for the genetic discrimination of intact Arabidopsis thaliana mutant seeds using metabolic profiling by direct analysis in real-time mass spectrometry

Background: Efficient high-throughput screening systems for useful mutants are a prerequisite for the study of plant functional genomics and for many fields of application. Advances in such screening tools have been made possible by the development of analytical instruments. Direct analysis in real-time (DART)-mass spectrometry (MS), which ionizes complex materials at atmospheric pressure, is a rapid, simple, high-resolution analytical technique. Here we describe a rapid, simple method for the genetic discrimination of intact Arabidopsis thaliana mutant seeds using metabolic profiling by DART-MS.
Results: To determine whether DART-MS combined with multivariate analysis can perform genetic discrimination based on global metabolic profiling, intact Arabidopsis thaliana mutant seeds were subjected to DART-MS without any sample preparation. Partial least squares-discriminant analysis (PLS-DA) of DART-MS spectral data from intact seeds classified 14 different lines of seeds into two distinct groups: Columbia (Col-0) and Landsberg erecta (Ler) ecotype backgrounds. A hierarchical dendrogram based on PLS-DA subdivided the Col-0 ecotype into two groups: mutant lines harboring defects in the phenylpropanoid biosynthetic pathway and mutants without these defects. These results indicated that metabolic profiling with DART-MS could discriminate intact Arabidopsis seeds at least at the ecotype level and at the metabolic pathway level within the same ecotype.
Conclusion: DART-MS combined with multivariate analysis allows rapid screening and metabolic characterization of large numbers of Arabidopsis mutant seeds without complex sample preparation steps. Moreover, potential novel metabolic markers can be detected and used to clarify the genetic relationship between Arabidopsis cultivars. Furthermore, this technique can be applied to predict novel gene functions of metabolic mutants regardless of morphological phenotype.


Background
Functional genomics of higher plants is conducted primarily using a phenotype-based approach. A knockout or over-expressed gene is assumed to produce an overt phenotype in a model plant. However, in practice a large proportion of mutants show no visible morphological phenotype or the phenotype results from a secondary or pleiotropic change, which hinders identification of the gene function. To achieve the practical goal of functional genomics, a more robust characterization system is required to identify mutants.
Metabolic profiling, which can reveal the metabolic phenotypes of mutants, is most often carried out using gas chromatography (GC)/MS, liquid chromatography (LC)/MS, and proton nuclear magnetic resonance (1H NMR) [27]. For example, Messerli et al. [11] reported that metabolite fingerprinting with GC-MS differentiated Arabidopsis mutants defective in starch metabolism from other mutants, indicating that non-targeted metabolic profiling of mutants provides clues about the mutated gene(s). However, it is difficult to achieve high throughput with these instruments, primarily because of complicated sample preparation, large sample requirements, and time-consuming operation. Recently, DART-MS has been used for non-invasive, high-throughput metabolic profiling of samples from various organisms. This new MS technique requires neither sample preparation nor ionization under vacuum, making it an extremely versatile high-throughput system [28][29][30].
The DART ion source is a recently developed ambient ion source that can ionize various organic molecules in diverse samples directly from the surface. In open air, helium, used as the carrier gas, produces protonated water clusters from atmospheric water molecules, which then transfer protons to molecules in the sample [28]. The DART ion source is especially powerful when combined with a high-resolution mass analyzer, because it gives the exact molecular weight of compounds ionized from the sample and provides matching molecular formulas. DART-MS has been adopted for qualitative analysis of various organic molecules, including pharmaceuticals, metabolites, synthetic organic molecules and phytochemicals [31][32][33].
For quantification, curcumin, a well-known antioxidant natural product, was successfully quantitated directly from raw material [34]. Furthermore, the analytical reproducibility of DART-MS was confirmed using caffeine-d3 with TLC analysis [35] and by olive oil ion analysis [36]. Because laborious sample preparation steps can be omitted in DART-MS analysis, high-throughput fingerprinting studies of natural resources are possible, and this is one of the most advantageous characteristics of the DART ion source for metabolomics. In addition, the DART ion source ionizes moderately polar to highly nonpolar compounds [35]. Phenylpropanoids are therefore easily ionized and well suited to detection.
In this study we attempted to establish high-throughput discrimination of intact mutant seeds with DART-MS. To determine whether metabolic profiling with DART-MS can discriminate seeds based on ecotype and altered metabolism, we used 14 different lines of Arabidopsis seeds in two different ecotype backgrounds, including previously identified knockout mutants of the phenylpropanoid biosynthetic pathway.

DART-MS spectra from Arabidopsis seeds
Representative DART-MS spectra from intact seeds of two Arabidopsis ecotypes, Col-0 and Ler, are shown in Figure 1. More than 319 peaks were detected from intact seeds by DART-MS analysis (Additional file 1). Spectral differences between the two ecotypes were significant: intact seeds of the Col-0 ecotype produced more prominent peaks than those of the Ler ecotype. These results implied quantitative and qualitative differences in metabolite patterns between the two Arabidopsis ecotypes. Interestingly, intact seeds of the Col-0 and Ler ecotypes could be successfully discriminated using DART-MS spectral data combined with PLS-DA, even though no morphological differences in seed size, color, or shape were apparent to the naked eye (Figure 2). In contrast, principal component analysis (PCA) could not fully differentiate these groups (Additional file 2). In cross-validation, each case was classified by the functions derived from all other cases; 100% of original grouped cases and 100% of cross-validated grouped cases were correctly classified (Table 1). These results indicated that metabolic profiling with DART-MS spectral data could discriminate the two Arabidopsis ecotypes even though the seeds had no visible, distinctive morphological characteristics.
Genetic discrimination of two ecotype backgrounds from DART-MS spectral data of intact seeds
Identification of gene function is limited when it is based on the morphological phenotype of a mutant, which is often silent or the result of secondary or pleiotropic changes. To overcome this limitation, non-targeted metabolic profiling of the mutant may provide clues about the mutated gene(s). To investigate whether genetically defective mutants from the same background can be discriminated, we used three Arabidopsis mutants (co, ft and tt2) from the Landsberg background and nine mutants (chs, f3h, f3'h, dfr, ldox, ban, pap1-D, phyB, and ugt78d2) from the Col-0 background in this study.
co and ft are representative Arabidopsis mutants showing a late-flowering phenotype; they are null mutants of CONSTANS (CO) and FLOWERING LOCUS T (FT), respectively [37]. tt2 is the null mutant of TRANSPARENT TESTA 2 (TT2), an R2R3 MYB domain transcription factor that acts as a key determinant of proanthocyanidin accumulation in the developing seed. TT2 forms a ternary complex with TT8 and TTG1 that is required for correct expression of BANYULS (BAN) in the seed endothelium [38][39][40]. The seed color of the tt2 mutant is therefore yellow, owing to defective accumulation of condensed tannins in the seed endothelium.
Nine Arabidopsis mutants (phyB, pap1-D, chs, f3h, f3'h, dfr, ldox, ban, and ugt78d2) belong to the Col-0 background. phyB is a null mutant of phytochrome B (PHYB), the primary red/far-red photoreceptor in the light signal transduction pathway, and pap1-D is an activation-tagged mutant of PRODUCTION OF ANTHOCYANIN PIGMENT 1 (PAP1), which acts as an activator of phenylpropanoid biosynthesis [41,42]. The remaining mutants in the Col-0 background are seven T-DNA insertion knockout mutants (chs, f3h, f3'h, dfr, ldox, ban, and ugt78d2) involved in the anthocyanin biosynthesis pathway. The gene expression levels of five of these mutants (chs, f3h, f3'h, dfr and ldox) were analyzed by RT-PCR using primer pairs for the full-length ORF (Figure 3). The results indicated that these mutants were completely defective, except that DFR was slightly expressed in the dfr mutant line, probably because the T-DNA insertion lies in the 3' UTR of the DFR gene. No expression of BAN was observed in either WT or ban mutant seedlings, as previously reported by Lee et al. [43]. Three mutants could easily be discriminated by seed color alone: the seed coat of the tt2 and chs mutants was yellow, whereas that of the pap1-D mutant was darker than wild type. Although the remaining mutants used in this study differed slightly from wild type in seed color, they could not easily be discriminated without direct comparison with wild type. Furthermore, the genetic background of the tt2, chs and pap1-D mutants could not be determined by the naked eye, even though these mutants had an obvious seed color.
Table 1 (note): The discriminant functions were the first five discriminant components from PLS-DA analyses (5 data). In cross-validation, each case was classified by the functions derived from all other cases; 100% of original grouped cases and 100% of cross-validated grouped cases were correctly classified.
PLS-DA of DART-MS spectral data from intact seeds divided the 14 lines of seeds into two distinct groups: Col-0 and Ler ecotype backgrounds (Figure 4). In cross-validation, each case was classified by the functions derived from all other cases; 100% of original grouped cases were correctly identified, whereas 80.6% of cross-validated grouped cases were correctly classified (Table 2). In particular, the mutant seed lines chs, co, and ft were perfectly predicted, whereas dfr was incorrectly classified. The comparatively low accuracy for dfr in cross-validation is thought to result from partial production of metabolic components owing to the slight expression of DFR in this mutant. In contrast, few prediction errors were made at the ecotype level by analysis of intact seeds. All mutant lines (chs, f3h, f3'h, dfr, ldox, ban, pap1-D, phyB, and ugt78d2) and wild type seeds belonging to the Col-0 ecotype were correctly predicted within the Col-0 ecotype. Likewise, the mutant lines co and ft and wild type Ler seeds were correctly predicted within the Ler ecotype. However, prediction of the tt2 line was 90.5% accurate. These results indicated that DART-MS combined with multivariate analysis of intact seeds was able to discriminate the lines of seeds at least at the ecotype level.
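The cross-validation scheme described above, in which each case is classified by functions derived from all other cases, corresponds to leave-one-out cross-validation. The following Python sketch illustrates the procedure on hypothetical two-component score data. A simple nearest-centroid rule stands in for the PLS-DA discriminant functions, and all function names and values are illustrative assumptions rather than the analysis performed in this study.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean vector (centroid) is closest.

    Stand-in classifier for illustration only; it is NOT PLS-DA.
    """
    groups = {}
    for row, label in zip(train_X, train_y):
        groups.setdefault(label, []).append(row)
    best_label, best_dist = None, float("inf")
    for label, rows in groups.items():
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(X, y):
    """Classify each case with a model built from all other cases."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]   # every case except case i
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical score vectors for three seeds per ecotype (not measured data).
X = [[1.0, 0.9], [1.2, 1.1], [0.8, 1.0],
     [-1.0, -0.9], [-1.1, -1.2], [-0.9, -1.0]]
y = ["Col-0", "Col-0", "Col-0", "Ler", "Ler", "Ler"]

accuracy = leave_one_out_accuracy(X, y)
print(f"Leave-one-out accuracy: {accuracy:.1%}")
```

In a full analysis the discriminant model itself would be refitted inside each fold, so that the held-out case has no influence on the functions used to classify it.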
In the present study, we described a robust method for high-throughput profiling with DART-MS. The functional identification of genes in Arabidopsis mutants is currently based on morphological phenotype. However, up to 85% of mutants exhibit no overt phenotype [21]. Metabolic alterations may be present in a large portion of these mutants; however, current methods for metabolic profiling are complicated and time-consuming, which precludes high-throughput screening of mutants. In the present study, a robust screen for mutants with altered metabolism was devised with DART-MS using intact Arabidopsis mutant seeds. This approach is not limited to screening mutants lacking genes expressed in the seed coat, but may be extended to the identification of genes expressed in other seed parts. DART-MS has been used to identify plant compounds [29,30], but this rapid, simple instrumentation has not yet been utilized for high-throughput screening of mutants. Considering the overall PLS-DA results, intact Arabidopsis seeds allow genetic discrimination of ecotypes and sorting of specific mutants harboring defects in the phenylpropanoid biosynthetic pathway (Figures 2 and 4) and in flowering time genes.
Hierarchical clustering of Arabidopsis seeds based on multivariate analysis of metabolic profiling
A hierarchical dendrogram based on PLS-DA of DART-MS spectral data from intact seeds showed that 13 lines (excluding the tt2 mutant line) were divided into two major branches by ecotype (Figure 5). Interestingly, the hierarchical dendrogram from intact seeds subdivided the Col-0 ecotype into two subgroups: the seven mutant lines lacking a gene involved in the phenylpropanoid biosynthetic pathway, except ugt78d2, clustered together, separately from the lines without such defects.
Anthocyanins are water-soluble vacuolar pigments that belong to the flavonoids, a class of compounds synthesized via the phenylpropanoid pathway. Anthocyanins are found in all tissues of higher plants, whereas proanthocyanidins accumulate especially in the seed coat. UGT78D2, which catalyzes the glucosylation of both flavonols and anthocyanidins (the latter being converted to anthocyanins), is highly expressed in anthocyanin-accumulating seedlings but repressed in condensed tannin-accumulating seed coats [43]. It therefore seems that metabolic flux toward the metabolic end-products in mature seed coats is not affected even if UGT78D2 is abolished in Arabidopsis seedlings or seeds, which suggests that there may be no differences in metabolic components between WT and ugt78d2 mutant seed coats. BAN encodes anthocyanidin reductase, a core enzyme of flavonoid biosynthesis. It converts anthocyanidins to flavan-3-ols, which are condensed into colorless proanthocyanidins [44]. Proanthocyanidins are located only in the seed coat and confer a brown color on the mature seed after oxidation. BAN also acts as a negative regulator of flavonoid biosynthesis during early embryogenesis and is highly expressed in tannin-accumulating mature seeds in Arabidopsis [43,[45][46][47]. Accumulation of proanthocyanidins is therefore probably inhibited in the developing seed coats of ban mutants. From this point of view, it is reasonable that ugt78d2 grouped with WT and that ban clustered with chs, dfr and f3'h.
We therefore inferred that the combination of multivariate analysis and DART-MS may reflect functional relationships among genes of the flavonoid biosynthetic pathway, given that ugt78d2 and ban clustered reasonably with their associated lines.
Figure 5 Hierarchical dendrogram of partial least squares-discriminant analysis (PLS-DA) score data from direct analysis in real-time mass spectrometry (DART-MS) spectra of intact seeds. Dotted rectangles mark the Col-0 (black) and Ler (blue) ecotypes.
Table 2 (note): The discriminant functions were determined using the first five discriminant components determined by PLS-DA analyses (5 data). In cross-validation, each case was classified by the functions derived from all other cases; 100% of original grouped cases and 80.61% of cross-validated grouped cases were correctly classified.
In the hierarchical dendrogram based on PLS-DA of DART-MS spectral data from intact seeds, Ler and its mutant lines (co and ft) were separated into a major branch distinct from the Col-0 ecotype, with the exception of the tt2 mutant line (Figure 5). TT2 functions as a regulator of proanthocyanidin accumulation in the developing seed only when TTG1 is expressed. The ternary complex of TT2, TT8 and TTG1 positively regulates BAN expression in whole seed coats by directly regulating BAN promoter activity in plants [39,40,48].
The absence of detectable BAN transcript in tt2, tt8 and ttg1 confers a homogeneously yellow hue on their seed coats [39,47]. We therefore expected tt2 to cluster with other mutants of the flavonoid biosynthetic pathway, especially the ban mutant, regardless of background. However, tt2 was placed near the Ler branch of the hierarchical dendrogram, although it was not included in that branch (Figure 5). These results implied that the overall metabolic differences between Arabidopsis ecotypes were greater than those caused by a specific gene, for example the metabolic change in tannin accumulation in the tt2 mutant. We therefore suggest that there are common metabolic compounds that mainly determine ecotype discrimination in Arabidopsis. Considering the overall hierarchical clustering analysis (HCA) results, we concluded that DART-MS combined with multivariate analysis of intact seeds could not only discriminate Arabidopsis seeds at the ecotype level, but could also cluster mutants of genes related to the same metabolic pathway. We therefore suggest that DART-MS may be useful as a tool for rapid discrimination of ecotypes and metabolic mutants of Arabidopsis.
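As an illustration of how a dendrogram can be derived from score data, the following Python sketch performs agglomerative clustering on hypothetical two-component scores. Average linkage and Euclidean distance are assumptions made for this example, since the linkage method and distance measure are not specified in the text, and the score values are invented for demonstration.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def average_linkage(points, labels):
    """Agglomerative clustering; returns the merge history.

    Each entry is (sorted member labels of the new cluster, merge distance).
    Average linkage is an assumption made for this illustration.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean pairwise distance between members of the two clusters.
                d = sum(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = clusters[a] + clusters[b]
        merges.append((sorted(labels[i] for i in merged), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Hypothetical PLS-DA scores for four seed lines (not measured data).
labels = ["Col-0", "chs", "Ler", "co"]
points = [[2.0, 0.5], [2.2, -0.6], [-2.0, 0.4], [-2.1, 0.1]]

merges = average_linkage(points, labels)
for members, distance in merges:
    print(f"{distance:.3f}  {members}")
```

With these invented values, lines of the same background merge first and the two backgrounds join only at the final step, which is the branch pattern that a dendrogram separating ecotypes would show.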

Assignments of chemical compounds for ecotype discrimination
Mass spectrometry is one of the most powerful analytical methods available for exact structural identification of organic compounds. In this study, more than 319 peaks were detected from intact seeds by DART-MS analysis (Figure 1). These peaks (m/z) have not yet been fully assigned to chemical compounds, because a comprehensive chemical database for Arabidopsis is lacking. We selected the 10 most significant metabolites for discrimination between the Col-0 and Ler ecotypes from the MS spectral data of intact seeds by logistic regression (Figure 6). In general, the DART ion source produces a mass spectrum consisting of the [M+H]+ cation formed by proton transfer. However, the molecular ion M+ is also commonly formed by Penning ionization. The ion peaks of the compounds selected in this study were therefore assigned as molecular ions formed by Penning ionization. Like all other mass spectrometers, however, DART-MS cannot distinguish compounds that share the same chemical formula; it gives only the exact molecular weights of the compounds in a sample, so assignment relies on in silico comparison with previous phytochemical studies of Arabidopsis thaliana. Assignment of the 10 metabolites was performed by direct comparison with the online chemical database Plant Metabolic Network (http://www.plantcyc.org) (Table 3).
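The difference between the two ion types discussed above can be made concrete with a short calculation. The following Python sketch computes the theoretical m/z of the [M+H]+ ion and of the molecular ion M+ for L-glutamic acid (C5H9NO4), one of the compounds named in this study. The function names are illustrative, and the values are theoretical monoisotopic masses calculated from standard atomic masses, not measurements from this work.

```python
# Monoisotopic atomic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON_MASS = 1.00727646     # u
ELECTRON_MASS = 0.00054858   # u


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def mz_protonated(mass):
    """m/z of the singly charged [M+H]+ ion formed by proton transfer."""
    return mass + PROTON_MASS


def mz_molecular_ion(mass):
    """m/z of the singly charged M+ ion formed by loss of one electron."""
    return mass - ELECTRON_MASS


glutamic_acid = {"C": 5, "H": 9, "N": 1, "O": 4}
mass = monoisotopic_mass(glutamic_acid)
mz_mh = mz_protonated(mass)
mz_m = mz_molecular_ion(mass)

print(f"M      = {mass:.4f}")
print(f"[M+H]+ = {mz_mh:.4f}")
print(f"M+     = {mz_m:.4f}")
```

The two ion types differ by the mass of one hydrogen atom, so deciding which ion a peak represents changes the inferred molecular weight by about 1.008 u. Any other compound with the formula C5H9NO4 would give exactly the same values, which is why isomers cannot be separated by mass alone.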
The intensity of the molecular ion peak of glycerol (MW = 92.115) was 2.5 times higher on the seed surface of the Col-0 ecotype than on that of Ler. The intensity of dimethyl fumarate (MW = 145.106) was 3.5 times higher on the seed surface of the Col-0 ecotype than on that of Ler. In contrast, the molecular ions of L-glutamic acid, pyridoxal and 5-hydroxyconiferyl aldehyde were more abundant in Ler than in the Col-0 ecotype. Ward et al. [4] reported that nine Arabidopsis ecotypes could be discriminated based on glucose and fumaric acid content by 1H NMR spectroscopy. Direct comparison of key metabolites between the report of Ward et al. [4] and this study was not suitable because of differences in the organic solvents and plant materials used for metabolite extraction. In the present study, we conducted direct MS analysis of the seed surface without any organic solvent extraction steps. Nevertheless, dimethyl fumarate was one of the key metabolites in the seed coat of the Col-0 ecotype. The biochemical and metabolic pathways of fumaric acid and dimethyl fumarate, especially in the seed coat, are not yet fully understood. If fumaric acid is actively methylated in the seed coat, the report of Ward et al. [4] and our study together suggest that fumaric acid derivatives are key metabolites for ecotype discrimination in Arabidopsis. In addition, we found that glycerol had a key role in ecotype discrimination of Arabidopsis. The results of this study could therefore be applied to the study of the glycerol biosynthesis pathway in Arabidopsis ecotypes.
Figure 6 Enlarged view of direct analysis in real-time mass spectrometry (DART-MS) spectra from intact seeds of Col-0 (A) and Ler (B) ecotypes. Arrows and numbers represent 10 major compounds for metabolic discrimination between the ecotypes.
Other mass spectrometers with higher analytical resolution, such as FT-ICR MS, could be used for this kind of study; they are particularly suitable for the analysis of unknown species owing to their ultra-high mass resolution and mass accuracy [49]. AccuTOF, the analyzer used in this study, is an orthogonal acceleration time-of-flight mass spectrometer (oa-TOF-MS) incorporating a single-stage reflectron. The resolving power of this analyzer is in excess of 6,000 (FWHM definition) [50]. Taking into account its relatively high resolution and fast scan speed with a wide dynamic range, AccuTOF-MS is a powerful tool for high-throughput profiling or chemical fingerprinting of intact seed samples.
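To give a sense of what a resolving power of 6,000 means in practice, the FWHM definition can be written out and applied to an illustrative peak. The m/z value of 150 below is an assumed example chosen to lie in the range of the small metabolites discussed here; it is not a value reported in this study.

```latex
% Resolving power under the FWHM definition:
% m is the m/z of a peak and \Delta m_{\mathrm{FWHM}} is its full width at half maximum.
\[
  R = \frac{m}{\Delta m_{\mathrm{FWHM}}}
  \quad\Longrightarrow\quad
  \Delta m_{\mathrm{FWHM}} = \frac{m}{R}
\]
% Illustrative case: a peak at m/z 150 with R = 6000
\[
  \Delta m_{\mathrm{FWHM}} = \frac{150}{6000} = 0.025
\]
```

Under these assumptions, two ions near m/z 150 whose masses differ by much less than about 0.025 would appear as a single unresolved peak.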

Conclusions
In this study, we demonstrate that DART-MS combined with multivariate analysis allows rapid screening and metabolic characterization of large numbers of Arabidopsis mutant seeds without complex sample preparation steps. Our results show that PLS-DA of DART-MS spectral data from 14 lines of intact Arabidopsis seeds, including mutant lines and wild types, classified the lines into two distinct groups corresponding to the Col-0 and Ler ecotype backgrounds. Furthermore, mutants in the Col-0 background were subdivided into two groups in the hierarchical dendrogram based on PLS-DA, one of which comprised mutants defective in the phenylpropanoid biosynthetic pathway. These results demonstrate that DART-MS combined with multivariate analysis can discriminate mutants based on quantitative and qualitative differences affecting global metabolic profiles. Considering these results, we infer that metabolic profiling with DART-MS can discriminate intact Arabidopsis seeds at least at the ecotype level and at the metabolic pathway level within the same ecotype. Screening mutants in the form of seeds saves the labor and time required to grow plants. Thus, we suggest that DART-MS combined with multivariate analysis is a useful tool not only for rapid screening of metabolic mutants, but also for discrimination of Arabidopsis ecotypes. Furthermore, plant functional genomics can be carried out in a high-throughput manner based on metabolic profiling of intact Arabidopsis mutant seeds by DART-MS.

DART-MS
A Jeol DART-MS instrument (Tokyo, Japan) was used, which comprised a DART ion source and a JMS-T100TD (AccuTOF) atmospheric pressure ionization time-of-flight mass spectrometer. For positive ion detection, the atmospheric pressure interface potentials were set to the following values: orifice 1 = 10 V, ring lens and orifice 2 = 5 V. The ion guide potential and detector voltage were set to 500 V and 2400 V, respectively. DART electrode potentials were set to needle electrode = 3000 V, electrode 1 = 100 V, electrode 2 = 100 V. Gas temperature was set to 250°C, and the helium gas flow rate was 3 L/min. Each seed was positioned midway between the DART source and mass spectrometer for measurement. Three measurements for each seed were averaged, and three different seeds of each wild type and mutant line were used as replicates.

Data processing and multivariate statistical analysis
To minimize the influence of sample size, DART-MS spectral data were normalized to percent of total ion count. Small noise peaks with low ion intensity (< 100) were removed from the original spectral data. Multivariate statistical analysis was performed on mean-centered and autoscaled data. These preprocessed metabolomic datasets were imported into R for PCA, PLS-DA and HCA. PCA, an unsupervised multivariate method, was performed to statistically analyze the comprehensive information contained in the data set [52]. PLS-DA, a representative supervised data mining algorithm, was used because it can give more precise group separation. To create the PLS-DA model, the entire data set was divided into two parts: a training set that was used to build the model, and a test set that was not used to build the classification model but was used to verify its predictive ability. Cross-validation was used to estimate the predictive power and significance of the latent variables of the model, and permutation testing was used to evaluate the statistical significance of the estimated predictive power. After predictive validation by cross-validation and response permutation testing, external validation, a more demanding and rigorous test of predictive performance, consisted of computing predictions for an independent set of test observations (the test set). HCA was performed to analyze the relationships contained in the PLS-DA score data from each sample.
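The preprocessing steps described above can be sketched in a few lines of Python. This is a minimal illustration on invented intensities, not the R code used in this study. It assumes that all spectra share a common m/z grid, that peaks below the threshold are set to zero before normalization, and that autoscaling uses the sample standard deviation; the function name and the data are hypothetical.

```python
from statistics import mean, stdev


def preprocess(spectra, noise_threshold=100.0):
    """Return (TIC-normalized, autoscaled) versions of the input spectra.

    spectra: list of samples, each a list of peak intensities on a shared m/z grid.
    """
    # 1. Remove small noise peaks (raw intensity below the threshold).
    filtered = [[x if x >= noise_threshold else 0.0 for x in row] for row in spectra]

    # 2. Normalize each spectrum to percent of its total ion count.
    normalized = []
    for row in filtered:
        total = sum(row)
        normalized.append([100.0 * x / total if total > 0 else 0.0 for x in row])

    # 3. Autoscale each m/z variable: mean-center, then divide by its standard deviation.
    n_vars = len(normalized[0])
    scaled = [[0.0] * n_vars for _ in normalized]
    for j in range(n_vars):
        column = [row[j] for row in normalized]
        mu, sd = mean(column), stdev(column)
        for i, value in enumerate(column):
            scaled[i][j] = (value - mu) / sd if sd > 0 else 0.0
    return normalized, scaled


# Hypothetical intensities for three seeds at three m/z values (not measured data).
spectra = [[50.0, 300.0, 700.0],
           [80.0, 600.0, 400.0],
           [120.0, 450.0, 450.0]]

normalized, scaled = preprocess(spectra)
for row in normalized:
    print([round(value, 2) for value in row])
```

After this step every spectrum sums to 100 and every m/z variable has zero mean and unit variance, so that intense and weak peaks contribute on the same scale to PCA, PLS-DA and HCA.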
To identify the significant metabolites for ecotype discrimination from MS spectral data, P-values for all metabolites were calculated by logistic regression, and the 10 metabolites with the most significant (lowest) P-values were selected. Assignment of the 10 metabolites was conducted by comparison with the database of the Plant Metabolic Network (http://www.plantcyc.org).

Additional material
Additional file 1: Raw data of DART-MS spectra from intact seeds of 14 Arabidopsis lines. The MS spectral data consist of three replicates for each seed line.