Simultaneous molecular formula determinations of natural compounds in a plant extract using 15 T Fourier transform ion cyclotron resonance mass spectrometry

Background: Plant extracts are a reservoir of pharmacologically active substances; however, conventional analytical methods can analyze only a small portion of an extract. Here, we report a high-throughput analytical method that detects most phytochemicals in a plant extract and provides their molecular formulae from a single experiment using ultra-high-resolution electrospray ionization mass spectrometry (UHR ESI MS). UHR mass profiling was used to analyze natural compounds in a 70% ethanol ginseng extract, which was infused directly into a 15 T Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometer for less than 10 min without prior separation.
Results: The UHR FT-ICR MS yielded a mass accuracy of 0.5 ppm and a mass resolving power (m/Δm) ranging from 1,000,000 to 270,000 over m/z 290–1,100. This resolution was sufficient to resolve the isotopic fine structure (IFS) of many compounds in the extract. After noise removal from 1,552 peaks, 405 compounds were detected. The molecular formulae of 123 compounds, including 33 ginsenosides, were determined from the observed IFS, the exact monoisotopic mass, and exact mass differences. For comparison, the extract was also analyzed by liquid chromatography (LC)/FT-ICR MS, which detected only 129 compounds, including 19 ginsenosides. Thus, UHR ESI FT-ICR MS detected about three times as many compounds as LC/FT-ICR MS, and in a shorter time. The molecular formula determination by UHR FT-ICR MS was validated by LC and tandem MS analyses of three known ginsenosides.
Conclusions: UHR mass profiling of a plant extract by 15 T FT-ICR MS showed that multiple compounds can be detected simultaneously and their molecular formulae determined unambiguously in a single experiment, owing to ultra-high mass resolution and mass accuracy.
Simultaneous molecular formula determination of multiple natural products by UHR ESI FT-ICR MS would be a powerful method for profiling a wide range of natural compounds.


Background
Plant extracts contain a large number of components, including many pharmacologically active compounds. Numerous compounds in plant extracts can be beneficial in treating many diseases [1,2]; however, the complexity of the phytochemicals makes their analysis difficult and hinders our understanding of the mechanisms underlying their medicinal effects. No analytical method can evaluate all of the compounds present in a plant extract. Most analytical methods for plant extracts combine bioactivity assays with separation steps to isolate a few target compounds from a pool of numerous components. Although these traditional methods have been useful, they have disadvantages such as the high cost in time and labor, the lack of molecular information, the possible loss of target compounds during separation, and the neglect of active compounds that are not screened by the bioassays used [3]. In general, separation methods use one or more molecular characteristics to discriminate compounds. Compounds that do not exhibit these characteristics are not separated properly or may even be lost during the separation process. For example, reversed-phase high-performance liquid chromatography (HPLC) separates compounds by hydrophobicity and seldom detects extremely hydrophilic or hydrophobic compounds, such as those in petroleum and natural products, which can instead be analyzed by direct infusion into a mass spectrometer [4,5]. Because separation-based methods can detect only some of the compounds extracted from a sample, high-throughput (HT) analytical methods are needed for the rapid analysis of most compounds in a plant extract. Many studies aimed at developing HT methods have focused on enhancing peak capacity so that a larger number of compounds can be analyzed; for example, multi-dimensional liquid chromatography [6,7] and high-resolution mass spectrometry (HR MS) [8,9] have been optimized in this fashion.
Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) is an established analytical method that provides unparalleled resolution and sub-ppm accuracy in mass measurement [10][11][12]. Although FT-ICR MS in narrowband mode can achieve a mass resolving power of several million, narrowband mode is unsuitable for investigating mixtures because its detectable mass range is much narrower than that of broadband mode. With the development of high-field FT-ICR MS, the resolution of FT-ICR MS in broadband mode has become high enough to resolve more than 5,000 species within an m/z range of 200-900 [13], so that FT-ICR MS can detect multiple ions simultaneously and characterize most compounds in a mixture without separation steps. These HT advantages of HR FT-ICR MS have been applied in studies of various mixtures, such as metabolome [3,14,15], petroleome [13], lipidome [16,17], and herbalome [8,18] analyses. HR MS of a compound has been used to report probable molecular formula candidates. Previous studies have shown that the molecular formula of a small organic molecule (less than 500 amu) can be determined if its mass is measured with 1 ppm accuracy together with its isotopic pattern [19]. In real sample analysis, determining a molecular formula by MS with a mass resolving power (m/Δm) below 100,000 is quite difficult because isobaric compounds and adduct ions increase the number of candidate formulae within a mass window. With high-field FT-ICR MS, the isotopic fine structure (IFS) has been revealed at ultra-high resolution (UHR) and used to determine the molecular formulae of small organic compounds unambiguously [9,18,20].
The IFS of a molecular ion is a unique pattern of mass peaks arising from the different mass defects of isotopic contributions such as ²H, ¹⁵N, ¹⁷O, ¹⁸O, ³³S, and ³⁴S, as well as ¹³C, the main contributor to the isotopic pattern. Because the masses and relative intensities of the peaks in the IFS of a molecule exactly reflect its atomic composition, the IFS is a fingerprint of the molecular formula. Using IFS and high mass accuracy, rapid and confirmatory molecular formula determination of multiple compounds in a mixture becomes possible in real sample analysis. Because compound identification in HT analysis is generally achieved by matching against chemical databases, and only a small portion of the possible phytochemicals are registered in such databases, HT molecular formula determination by IFS would be very useful in plant extract studies.
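To make the idea of isotope mass defects concrete, the short Python sketch below computes the mass shift caused by replacing one light atom with each heavy isotope listed above. The isotope masses are approximately the NIST tabulated values (an assumption for illustration; later mass evaluations differ only in the last digits), and the function name `substitution_shifts` is hypothetical. The ¹⁵N and ¹³C shifts come out at about +0.99703 and +1.00335 u, the values quoted later for the M + 1 fine structure.

```python
# Isotope masses in unified atomic mass units (u), approximately as tabulated by NIST;
# newer mass evaluations differ only in the last digits.
ISOTOPE_MASS = {
    "1H": 1.00782503207, "2H": 2.0141017778,
    "12C": 12.0, "13C": 13.0033548378,
    "14N": 14.0030740048, "15N": 15.0001088982,
    "16O": 15.99491461956, "17O": 16.99913170, "18O": 17.9991610,
    "32S": 31.97207100, "33S": 32.97145876, "34S": 33.96786690,
}

# Each heavy isotope and the most abundant isotope it replaces.
SUBSTITUTIONS = {
    "2H": "1H", "13C": "12C", "15N": "14N",
    "17O": "16O", "18O": "16O", "33S": "32S", "34S": "32S",
}


def substitution_shifts():
    """Mass shift (u) caused by replacing one light atom with its heavy isotope."""
    return {heavy: ISOTOPE_MASS[heavy] - ISOTOPE_MASS[light]
            for heavy, light in SUBSTITUTIONS.items()}


if __name__ == "__main__":
    # Print the shifts in increasing order: the near-+1 and near-+2 groups each
    # split into several closely spaced fine-structure positions.
    for isotope, shift in sorted(substitution_shifts().items(), key=lambda kv: kv[1]):
        print(f"{isotope:>4}: +{shift:.5f} u")
```

Because each element shifts the mass by a slightly different amount, isotopologues that share the same nominal mass (for example the ¹⁵N, ³³S, ¹³C, ¹⁷O, and ²H variants of M + 1) land at distinct exact masses, which is what the IFS resolves.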
In this study, an extract of ginseng was analyzed directly by UHR FT-ICR MS with a 15 T superconducting magnet to detect multiple compounds and determine their molecular formulae simultaneously. A scheme of the instrument is shown in Figure 1. Molecular formulae of more than 100 ginseng compounds were determined from their IFS. Liquid chromatography (LC)/FT-ICR MS of the ginseng extract was also performed, and the compounds detected by both MS approaches were compared to investigate the capability of UHR FT-ICR MS profiling in the study of phytochemicals.

UHR ESI FT-ICR MS of ginseng extract
The UHR mass profile of the compounds in a 70% ethanol ginseng extract was obtained using a 15 T FT-ICR mass spectrometer (Figure 2). The mass accuracy of the spectrum was 0.5 ppm after external calibration with a 0.1 mg/mL aqueous arginine solution. The external calibration used seven arginine cluster peaks in the range m/z 250-1500 with quadratic regression, and the maximum calibration error was 0.47 ppm. The mass resolving power of the spectrum ranged from 1,000,000 to 270,000 over m/z 290-1,100. A total of 1,552 peaks were detected in a single mass spectrum using a peak-picking threshold of a signal-to-noise ratio (S/N) of 5. Because the M + 1 and M + 2 isotope peaks of low-abundance compounds were difficult to detect, signals without a corresponding M + 1 isotope peak were regarded as noise. After removing the noise peaks, 405 compounds were detected in the extract by UHR electrospray ionization (ESI)/FT-ICR MS in positive ion mode. The achieved mass resolution was sufficient to show the IFS of ions below m/z 850, and these observed IFSs were used to determine the elemental compositions of the corresponding chemicals. Above m/z 850 the IFS was generally not clearly resolved, although the IFS of the ion at m/z 985.5 was observed at a mass resolving power of 300,000 at m/z 1000. The expanded spectrum in Figure 2B shows that all of the peaks observed within one m/z unit are resolved. The assignment of a molecular formula to a peak was enabled by the high mass accuracy and the IFS revealed in the UHR mass spectrum, as described below. The assigned molecular formulae are listed in the side table (Figure 2B). The elemental compositions of several peaks in Figure 2B were not determined because no candidate formula composed of the elements considered satisfied the 0.5 ppm mass tolerance.
These unassigned peaks may contain elements such as S, P, F or other halogens, or inorganic elements. Even though the peak at m/z 425.09655 matched [C18H9N12O2]+ (425.09659 amu) within a 0.1 ppm mass tolerance, this formula was rejected because the observed IFS of the peak was incompatible with the theoretical IFS of [C18H9N12O2]+. This result shows that IFS comparison is crucial to avoid false-positive assignments.
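The external calibration step can be illustrated with a minimal sketch. It assumes that the seven calibrants are the singly protonated arginine clusters [nArg + H]+ with n = 2-8 (the clusters that fall within m/z 250-1500; the source does not list them) and fits a generic quadratic correction from observed to reference m/z by least squares. This is not necessarily the calibration function implemented in the instrument software, all function names are hypothetical, and the drift applied in the usage lines is invented for illustration.

```python
PROTON = 1.00727646688         # proton mass, u
ARGININE = 174.1116757073      # monoisotopic mass of arginine, C6H14N4O2, u


def arginine_cluster_mz(n):
    """m/z of the singly protonated arginine cluster [nArg + H]+."""
    return n * ARGININE + PROTON


def _det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a, b):
    """Solve the 3 x 3 linear system a @ x = b with Cramer's rule."""
    d = _det3(a)
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(_det3(m) / d)
    return solution


def fit_quadratic_calibration(observed, reference, center=900.0, scale=500.0):
    """Least-squares fit of reference = c0 + c1*u + c2*u**2, u = (observed - center) / scale.

    Centering and scaling the observed m/z keeps the normal equations well conditioned.
    Returns a function that maps an observed m/z to a calibrated m/z.
    """
    us = [(o - center) / scale for o in observed]
    a = [[sum(u ** (i + j) for u in us) for j in range(3)] for i in range(3)]
    b = [sum(r * u ** i for u, r in zip(us, reference)) for i in range(3)]
    c0, c1, c2 = _solve3(a, b)

    def calibrate(mz):
        u = (mz - center) / scale
        return c0 + c1 * u + c2 * u * u

    return calibrate


def ppm_error(measured, true):
    """Relative mass error in parts per million."""
    return (measured - true) / true * 1e6


if __name__ == "__main__":
    reference = [arginine_cluster_mz(n) for n in range(2, 9)]  # about m/z 349 to 1394
    # Invented drift for illustration: a 3 ppm offset plus a small quadratic term.
    observed = [m * (1 + 3e-6) + 2e-9 * (m - 800.0) ** 2 for m in reference]
    calibrate = fit_quadratic_calibration(observed, reference)
    before = max(abs(ppm_error(o, r)) for o, r in zip(observed, reference))
    after = max(abs(ppm_error(calibrate(o), r)) for o, r in zip(observed, reference))
    print(f"max error before: {before:.2f} ppm, after: {after:.4f} ppm")
```

With real spectra the residuals after calibration reflect measurement scatter rather than being essentially zero as in this noise-free illustration; the maximum residual over the calibrants is the quantity reported as the calibration error.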

Molecular formula determination
The molecular formula of a natural compound was determined by comparing the experimentally observed isotope pattern with the theoretical IFS and monoisotopic peak calculated by the Generate Molecular Formula (GMF) software tool. Because the maximum deviation of the measured masses in the spectrum (Figure 2A) was 0.5 ppm, the GMF mass tolerance for candidate generation was also set to 0.5 ppm. GMF was applied to each monoisotopic peak to generate candidate formulae and their theoretical IFSs. The high mass accuracy of the spectrum considerably reduced the number of possible molecular formulae for each peak, particularly for compounds with molecular weights below 1,500 amu. For example, a compound detected at m/z 749.48341 had 9, 5, and 3 candidate formulae at mass tolerances of 2, 1, and 0.5 ppm, respectively. IFS comparison was then used to select the molecular formula from among the candidates, as shown in Figure 3, which presents one experimental IFS and the theoretical IFSs of five candidate formulae. The experimental monoisotopic molecular ion peak (M) was detected at m/z 749.48341, and expanded views of the experimental spectrum near M + 1, M + 2, and M + 3 are shown in Figure 3A. The relative intensities of the isotope peaks were calculated from the elemental composition and the isotopic abundances, and the peak intensities in Figure 3 are given relative to the monoisotopic peak (100%). GMF generated the theoretical IFSs of the five candidate formulae near M + 1, M + 2, and M + 3 (Figure 3B-3F). The monoisotopic peak of the molecular ion is a single peak by definition, whereas the M + 1, M + 2, and M + 3 peaks arise from heavy-isotope substitution and therefore have fine structure. The M + 1 fine structures in Figure 3 show that the peak caused by a ¹⁵N substitution (Δm = 0.99703 amu) was separated from the peak caused by a ¹³C substitution (Δm = 1.00335 amu).
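The relative intensities of individual fine-structure peaks follow from multinomial statistics on the natural isotopic abundances. The sketch below is a simplified illustration, not the GMF algorithm: it computes the intensity of a chosen isotopologue relative to the monoisotopic peak for the protonated ion [C42H69O11]+, using NIST representative abundances, and the function name `relative_intensity` is hypothetical. It gives roughly 45% for the ¹³C M + 1 peak and about 2.3% for the ¹⁸O M + 2 peak.

```python
from math import comb

# Representative natural isotopic abundances (mole fractions), NIST values.
ABUNDANCE = {
    "C": {"12C": 0.9893, "13C": 0.0107},
    "H": {"1H": 0.999885, "2H": 0.000115},
    "N": {"14N": 0.99636, "15N": 0.00364},
    "O": {"16O": 0.99757, "17O": 0.00038, "18O": 0.00205},
}
# The most abundant (monoisotopic) isotope of each element.
LIGHT = {"C": "12C", "H": "1H", "N": "14N", "O": "16O"}


def relative_intensity(formula, substitution):
    """Isotopologue abundance as a percentage of the monoisotopic peak.

    formula:      element -> atom count, e.g. {"C": 42, "H": 69, "O": 11}
    substitution: heavy isotope -> number of substituted atoms, e.g. {"13C": 2}
    Uses multinomial statistics, treating each element independently.
    """
    ratio = 1.0
    for element, count in formula.items():
        light = LIGHT[element]
        remaining = count
        for isotope, abundance in ABUNDANCE[element].items():
            k = substitution.get(isotope, 0)
            if isotope == light or k == 0:
                continue
            # Choose which k of the remaining atoms carry this heavy isotope.
            ratio *= comb(remaining, k) * (abundance / ABUNDANCE[element][light]) ** k
            remaining -= k
    return 100.0 * ratio


if __name__ == "__main__":
    ion = {"C": 42, "H": 69, "O": 11}  # [C42H69O11]+, protonated C42H68O11
    for label, sub in [("M+1 13C", {"13C": 1}), ("M+1 2H", {"2H": 1}),
                       ("M+1 17O", {"17O": 1}), ("M+2 13C2", {"13C": 2}),
                       ("M+2 18O", {"18O": 1})]:
        print(f"{label:>9}: {relative_intensity(ion, sub):6.2f} %")
```

Each fine-structure peak is a separate isotopologue, so its intensity is computed individually; summing all isotopologues of the same nominal mass would give the unresolved M + 1 or M + 2 intensity seen at lower resolution.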
The mass resolving power of 360,000 at m/z 750 was sufficient to resolve the ¹⁵N- and ¹³C-substitution peaks. Because no ¹⁵N-substituted isotopic peak (~750.48 amu) was present in the experimental data, only candidate formulae F and C remained. These two candidates can be distinguished by the relative intensities of their ¹³C- and ¹⁸O-substitution peaks. Candidate C (Figure 3C) is the best fit to the compound at m/z 749.48341 owing to the absence of ¹⁵N- and ⁴¹K-substituted isotopic peaks and the relatively intense peaks arising from ¹⁸O substitution. Thus, from the IFS comparison shown in Figure 3, the molecular formula of the compound detected at m/z 749.48341 is assigned as C42H68O11. IFS can also differentiate Na adduct ions from K adduct ions: Na is monoisotopic (²³Na) and therefore has no effect on the IFS, whereas the ⁴¹K isotope causes a conspicuous split of the M + 2 peak, as shown in Figure 3E.
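A rough estimate of the resolving power needed to separate two fine-structure peaks is R ≈ m/δm, where δm is the spacing between the peaks. The sketch below applies this rule of thumb using NIST isotope masses; it is an assumption-laden approximation (it only splits peaks near half height, and baseline separation needs more), and the names are hypothetical. By this criterion alone, 360,000 at m/z 750 is enough to separate the ¹⁵N and ²H peaks from the ¹³C peak and the ¹⁸O peak from the ¹³C₂ peak, whereas the ¹⁷O peak would need roughly 870,000.

```python
# Single-substitution mass shifts (u) derived from NIST isotope masses.
SHIFT = {
    "15N": 15.0001088982 - 14.0030740048,
    "13C": 13.0033548378 - 12.0,
    "17O": 16.99913170 - 15.99491461956,
    "2H": 2.0141017778 - 1.00782503207,
    "18O": 17.9991610 - 15.99491461956,
    "13C2": 2 * (13.0033548378 - 12.0),
    "41K": 40.96182576 - 38.96370668,
}


def required_resolving_power(mz, isotopologue_a, isotopologue_b):
    """Rough m/dm needed to separate two fine-structure peaks of a singly charged ion.

    Uses R = m / delta_m with delta_m the spacing between the two peaks; this only
    splits the peaks at about half height, and baseline separation needs more.
    """
    spacing = abs(SHIFT[isotopologue_a] - SHIFT[isotopologue_b])
    return mz / spacing


if __name__ == "__main__":
    for a, b, mz in [("13C", "15N", 750.5), ("13C", "2H", 750.5),
                     ("13C", "17O", 750.5), ("13C2", "18O", 751.5),
                     ("13C2", "41K", 751.5)]:
        print(f"{a:>4} vs {b:<4} at m/z {mz}: R ~ {required_resolving_power(mz, a, b):,.0f}")
```

Because the required resolving power scales with m/z, the same pair of isotopologues becomes harder to separate for heavier ions, which is consistent with the IFS becoming less clear at higher m/z in broadband spectra.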
Differences in the elemental composition of two compounds can be deduced from the mass difference between two peaks measured with high mass accuracy [21]. As shown in the inset of Figure 2A, the mass differences between the peaks at m/z 441.37264 and 457.36756 and between those at m/z 457.36756 and 459.38323 are 15.99492 and 2.01567 amu, respectively, which correspond to the addition of one ¹⁶O atom (15.99491 amu) and of two hydrogen atoms (H2, 2.01565 amu).
Figure 3 caption (fragment): … [C42H69O11]+, the molecular formula of candidate (C). The calculated intensity of each isotopic peak in the isotopic fine structures of the candidate molecular ions is listed in Additional file 1.
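The mass-difference reasoning in the inset of Figure 2A can be sketched as a lookup: the spacing between two singly charged peaks is compared with the exact masses of a few candidate composition differences, with a tolerance equal to the sum of the two 0.5 ppm peak errors. The list of differences and the function names are hypothetical examples, not taken from the study.

```python
MASS = {"H": 1.00782503207, "C": 12.0, "N": 14.0030740048, "O": 15.99491461956}

# Hypothetical short list of composition differences to test.
DIFFERENCES = {
    "+O": {"O": 1},
    "+H2": {"H": 2},
    "+CH2": {"C": 1, "H": 2},
    "+NH3": {"N": 1, "H": 3},
    "+H2O": {"H": 2, "O": 1},
    "+CO": {"C": 1, "O": 1},
}


def composition_mass(composition):
    """Exact mass (u) of an elemental composition given as element -> count."""
    return sum(MASS[element] * count for element, count in composition.items())


def match_difference(mz_low, mz_high, tol_ppm=0.5):
    """Composition differences consistent with the spacing of two singly charged peaks.

    Each peak may be off by tol_ppm, so the allowed error on the spacing is the sum
    of the two absolute peak errors.
    """
    delta = mz_high - mz_low
    tolerance = tol_ppm * 1e-6 * (mz_low + mz_high)
    return [name for name, comp in DIFFERENCES.items()
            if abs(delta - composition_mass(comp)) <= tolerance]


if __name__ == "__main__":
    print(match_difference(441.37264, 457.36756))
    print(match_difference(457.36756, 459.38323))
```

A matched difference only constrains how the two formulae relate; it complements, rather than replaces, the IFS comparison of each peak.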
… structural analysis methods such as tandem mass spectrometry (MS/MS) and nuclear magnetic resonance spectroscopy. The application of MS/MS analysis to three selected extract compounds is described below as an example. The molecular formulae of all 123 compounds are listed in Table 2. The molecular formulae of small molecules (<400 amu) were determined mainly from the accurate mass with a 0.5 ppm tolerance, which typically yielded a single candidate, whereas for larger molecules (>400 amu), IFS was required to select the correct formula from multiple candidates. These results suggest that improved mass accuracy and resolution could facilitate the characterization of large phytochemicals. The resolving power of 360,000 at m/z 750 clearly revealed the IFS in Figure 3, and IFSs near m/z 1000 were observed at a mass resolving power of 300,000 (data not shown). This result indicates that a mass accuracy of 0.5 ppm and a mass resolving power of 300,000 are required to use IFS for determining the molecular formulae of phytochemicals below 1,000 amu. Enhanced sensitivity would further improve the HT capability of this method, because the weak intensities of the M + 2 and M + 3 isotope peaks remain the main obstacle to molecular formula determination.
The peaks in the 733 LC/FT-ICR MS scans were counted as follows: all peaks with S/N > 5 were sorted by m/z value, and peaks within a 0.5 ppm mass bin were combined into one peak with an averaged mass. Isobaric peaks whose retention times (T) differed by more than 1 min (ΔT > 1 min) were counted as different peaks, whereas in UHR ESI/FT-ICR MS such isobaric peaks cannot be discriminated and are counted as a single peak. After combining the 733 scans of the LC/FT-ICR MS experiment, 1,073 peaks were identified. Because isobaric peaks were counted separately here but merged in the direct-injection spectrum, comparing this count with the 1,552 peaks detected by UHR ESI/FT-ICR MS suggests that many more compounds were detected by UHR ESI/FT-ICR MS.
After deisotoping to remove electrical noise and minor chemicals whose isotope peaks were not detected, 129 compound peaks remained, far fewer than the 405 compounds detected by direct injection. These results suggest that UHR mass profiling is more efficient than LC/MS for multi-compound detection. This observation is not surprising, because an LC column passes only a specific range of compounds, depending on the characteristics of the packing material, whereas direct sample injection can deliver almost all of the compounds into the mass spectrometer. Molecular formula determination by IFS was not performed with LC/FT-ICR MS because the shorter time-domain signal lowers the mass resolving power and can also suppress the signal intensity. Instead, the molecular weight measured by LC/FT-ICR MS was used to match each peak to a molecular formula already determined by UHR ESI/FT-ICR MS. Of the 33 putative ginsenosides, 13 were detected by LC/FT-ICR MS; their molecular formulae and retention times are listed in Table 1. The HT performance of LC/FT-ICR MS and UHR ESI FT-ICR MS is summarized in Table 3. LC/FT-ICR MS has the advantage of separating isobaric compounds, which allows structural isomers to be distinguished. Furthermore, because LC separates the constituent molecules, it reduces the competition for ionization that occurs among numerous molecules with different charge affinities in direct injection mode, and therefore allows the detection of extremely low-concentration molecules. On the other hand, UHR mass profiling by ESI/FT-ICR MS allows the molecular formulae of multiple compounds to be analyzed simultaneously in a single experiment and enables the detection of very weak signals, because the ICR signal can be accumulated continuously for as long as the sample is supplied, which improves the S/N.
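The LC peak-counting and deisotoping steps described above can be sketched as follows. The sketch assumes singly charged ions, merges peaks within a 0.5 ppm bin unless their retention times are more than 1 min apart, and then treats a feature as a compound only if a co-eluting ¹³C (M + 1) partner is found and the feature is not itself the M + 1 peak of a lighter feature. The 2 ppm isotope-partner tolerance, these rules, and all names are illustrative assumptions rather than the exact procedure used in the study.

```python
from dataclasses import dataclass

C13_SHIFT = 1.0033548378  # 13C - 12C mass difference, u


@dataclass
class Peak:
    mz: float
    rt: float  # retention time, min
    sn: float  # signal-to-noise ratio


@dataclass
class Feature:
    mz_sum: float
    count: int
    rt_min: float
    rt_max: float

    @property
    def mz(self):
        return self.mz_sum / self.count  # averaged mass of the merged peaks

    @property
    def rt(self):
        return (self.rt_min + self.rt_max) / 2


def within_ppm(a, b, tol_ppm):
    return abs(a - b) / b * 1e6 <= tol_ppm


def merge_scans(peaks, bin_ppm=0.5, max_dt=1.0, min_sn=5.0):
    """Merge peaks from all scans into features (one per m/z bin and elution window)."""
    features = []
    for p in sorted((p for p in peaks if p.sn > min_sn), key=lambda p: p.mz):
        for f in features:
            if (within_ppm(p.mz, f.mz, bin_ppm)
                    and f.rt_min - max_dt <= p.rt <= f.rt_max + max_dt):
                f.mz_sum += p.mz
                f.count += 1
                f.rt_min, f.rt_max = min(f.rt_min, p.rt), max(f.rt_max, p.rt)
                break
        else:  # no matching feature: start a new one (isobars with dT > max_dt land here)
            features.append(Feature(p.mz, 1, p.rt, p.rt))
    return features


def monoisotopic_features(features, tol_ppm=2.0, max_dt=1.0):
    """Keep features with a co-eluting M+1 (13C) partner that are not themselves an M+1."""
    def has_partner(f, shift):
        target = f.mz + shift
        return any(g is not f and within_ppm(g.mz, target, tol_ppm)
                   and abs(g.rt - f.rt) <= max_dt for g in features)
    return [f for f in features
            if has_partner(f, C13_SHIFT) and not has_partner(f, -C13_SHIFT)]
```

The pairwise search is quadratic in the number of features, which is acceptable for a sketch; a production version would index features by m/z.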
LC/FT-ICR MS provided more detailed analytical information, including retention times; nevertheless, UHR mass profiling is likely to be a competitive method for HT analysis owing to its non-discriminative detection, higher sensitivity, and higher mass resolving power.
Three standard ginsenosides (Re, Rf, and Rc) and the ginseng extract were analyzed by LC/FT-ICR MS using the same experimental parameters. As shown in Figure 4, three extract compounds eluting at 20.5, 24.7, and 26.3 min showed the same retention times and molecular weights as Re, Rf, and Rc, respectively, indicating that these extract compounds are the ginsenosides Re, Rf, and Rc.

MS/MS of ginseng extract
For further validation, collision-induced dissociation (CID) tandem mass spectrometry (MS/MS) was performed on the three extract compounds, and their spectra were compared with the MS/MS spectra of the three standard ginsenosides using LC/FT-ICR MS/MS. Using the m/z values and retention times obtained from LC/FT-ICR MS, the three extract compounds were selected as precursor ions for MS/MS at their retention times; for example, the precursor ion at 20.5 min had m/z 947.6. The MS/MS spectra of the three extract compounds and the three standard ginsenosides are shown in Figure 5. The spectra of the extract compounds (Figure 5A-5C) have relatively weak intensities and consequently lack several minor fragments present in the standard ginsenoside spectra (Figure 5D-5F). Nevertheless, the overall fragmentation patterns of the compounds at 20.5, 24.7, and 26.3 min are quite similar to those of Re, Rf, and Rc, respectively [23]. This agreement in MS/MS fragmentation patterns confirms the identification of the three ginsenosides and supports the reliability of the other molecular formulae determined by UHR mass profiling.

Conclusions
Ginseng ethanol extracts were analyzed using a UHR 15 T FT-ICR mass spectrometer. The mass resolving power in broadband mode ranged from 1,000,000 to 270,000 over m/z 290-1,100, which is sufficient to obtain the IFS of most compounds within that mass range. The HT performance of UHR ESI/FT-ICR MS was investigated by comparison with LC/FT-ICR MS of the same extract. UHR ESI/FT-ICR MS detected 405 ginseng compounds, more than three times the number detected by LC/FT-ICR MS. HT molecular formula determination of the compounds in the ginseng extract was achieved using the formula-specific IFS and high mass accuracy, and the molecular formulae of 123 compounds, including 33 ginsenosides, were determined from a single mass spectrum. The molecular formulae determined by UHR mass profiling were validated by comparing the CID fragmentation patterns and LC/FT-ICR MS retention times of three selected ginseng compounds with those of standard ginsenosides. In this study, UHR ESI/FT-ICR MS detected a wider range of components than conventional LC/MS and enabled unambiguous determination of molecular formulae by IFS. UHR mass profiling may be very useful in studies of multi-component mixtures such as plant extracts and metabolomes owing to its unique ability to determine molecular formulae simultaneously.

Methods
All MS experiments were performed using a 15 T FT-ICR mass spectrometer (ApexQe, Bruker Daltonics, Billerica, MA, USA). UHR ESI/FT-ICR MS was used to profile extract compounds and determine their molecular formulae, and LC/FT-ICR MS was employed to confirm the formulae determined by UHR mass profiling. Three known ginsenosides were analyzed to validate the molecular formulae determined by UHR ESI/FT-ICR MS.

Samples
Korean ginseng (Panax ginseng) was purchased from the Korea Ginseng Corp (Daejeon, Korea). Dried and powdered roots (10 g) were dissolved in 500 mL of 70% ethanol.

LC/FT-ICR MS
The components of the 70% ethanol extract were analyzed by LC/FT-ICR MS using a high-performance liquid chromatography (HPLC) system (HP1200: Agilent, Santa Clara, CA, USA) coupled to the 15 T FT-ICR mass spectrometer. Ginseng extract (1 mg) was dissolved in 1 mL of 50% methanol with 30 min of sonication. The extract solution (100 μL) was injected onto a C18 reversed-phase HPLC column (150 × 4.6 mm, 4 μm, 8 nm ODS-H80, YMC, Kyoto, Japan). A binary mobile phase consisting of solvent A (95:5 water/acetonitrile with 0.1% formic acid (FA)) and solvent B (95:5 acetonitrile/water with 0.1% FA) was applied to the column at a flow rate of 1 mL/min. The column temperature was set to 35°C. The solvent gradient was 5% B for 0-2 min, 10% B for 3 min, 50% B for 20 min, 100% B for 10 min, 100% B for 2 min, 5% B for 2 min, and 5% B for 1 min; the total run time was 40 min. The eluent was split 10:1 to produce a flow rate of 91 μL/min for positive-ion ESI LC/MS. The MS parameters were as follows: ESI capillary voltage, 4900 V; nebulizer gas flow rate, 2.5 L/min; drying gas flow rate, 3.5 L/min; drying gas temperature, 200°C; mass range, m/z 250-4000; skimmer voltage, 15 V; collision gas energy, −2.0 V; accumulation time, 0.3 s; acquisition size, 512 KB; transient length, 0.29 s; and number of averaged scans, 2. A sine-bell apodization window was applied before the Fourier transform. CID was performed in the hexapole collision cell with Ar gas. The Ar collision energy and flow were set to −8.0 V and 0.33 L/h, respectively. The Ar inflow was monitored by the pressure change from 4.1 × 10⁻⁶ to 5.3 × 10⁻⁶ mbar, measured by ion gauge 1 in Figure 1. Using the retention times and m/z values obtained from the LC/FT-ICR MS experiment, the duty cycle of LC/MS/MS was set to 50%.

Molecular formula determination
The molecular formula of a compound detected by UHR FT-ICR MS was determined by comparing the theoretical and observed IFSs. The theoretical IFSs of candidate molecular ions were generated using the GMF utility of DataAnalysis (Bruker Daltonics). C, H, N, O, Na, and K were considered in the theoretical IFS calculation. Because Na and K were treated as adduct ions, the maximum numbers of Na and K atoms were set equal to the charge number, and any candidate with more Na and K atoms than the charge number was removed from the candidate list. For singly charged ions, any candidate containing both Na and K was also removed. The GMF parameters were a mass tolerance of 0.5 ppm, a maximum H/C ratio of 3, and an even-electron configuration. The resolution of the theoretical IFSs generated by GMF was set equal to the observed resolution, allowing direct comparison of the experimental and theoretical IFSs. The isotope masses and abundances used in the theoretical calculations were obtained from the National Institute of Standards and Technology [24].
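The candidate-generation constraints listed above can be illustrated with a brute-force enumeration for singly charged ions. This is a hypothetical sketch, not the GMF algorithm: the element limits (up to 60 C, 10 N, and 25 O) are assumptions, and because GMF may apply additional rules, the sketch is not expected to reproduce the candidate counts reported for m/z 749.48341. It does, however, recover [C42H69O11]+, the protonated form of the assigned formula C42H68O11, within 0.5 ppm.

```python
# Monoisotopic masses (u), approximately as tabulated by NIST.
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "Na": 22.9897692809, "K": 38.96370668}
ELECTRON = 0.00054857990946  # electron mass, u


def candidate_formulae(mz, tol_ppm=0.5, max_c=60, max_n=10, max_o=25):
    """Enumerate singly charged CcHhNnOo(Na|K) cations whose m/z lies within tol_ppm.

    Returns tuples (c, h, n, o, na, k, error_ppm). The H count is solved directly
    because only the nearest integer can fall inside a sub-ppm window.
    """
    hits = []
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                for na, k in ((0, 0), (1, 0), (0, 1)):  # z = 1: at most one Na or K, not both
                    rest = (c * MASS["C"] + n * MASS["N"] + o * MASS["O"]
                            + na * MASS["Na"] + k * MASS["K"] - ELECTRON)
                    h = round((mz - rest) / MASS["H"])
                    if h < 0 or h > 3 * c:  # maximum H/C ratio of 3
                        continue
                    calc = rest + h * MASS["H"]
                    error = (mz - calc) / calc * 1e6
                    if abs(error) > tol_ppm:
                        continue
                    # Even-electron cation: the electron count is even only when
                    # h + n + Na + K is odd (C and O contribute even counts).
                    if (h + n + na + k) % 2 == 0:
                        continue
                    hits.append((c, h, n, o, na, k, round(error, 3)))
    return hits


if __name__ == "__main__":
    for tol in (2.0, 1.0, 0.5):
        print(tol, "ppm:", candidate_formulae(749.48341, tol))
```

Because the candidate sets are nested, widening the tolerance can only add candidates, mirroring the trend of fewer candidates at tighter tolerances described in the Results; the IFS comparison is then used to choose among the survivors.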