Development of support vector machine-based model and comparative analysis with artificial neural network for modeling the plant tissue culture procedures: effect of plant growth regulators on somatic embryogenesis of chrysanthemum, as a case study

Background Optimizing the somatic embryogenesis protocol can be considered the first and foremost step in successful gene transformation studies. However, an optimized embryogenesis protocol is usually difficult to achieve because the process is costly, time-consuming, and complex. Therefore, novel computational approaches such as machine learning algorithms are needed for this purpose. In the present study, two machine learning algorithms, Multilayer Perceptron (MLP) as an artificial neural network (ANN) and support vector regression (SVR), were employed to model somatic embryogenesis of chrysanthemum, as a case study, and their prediction accuracy was compared.
Results The results showed that SVR (R² > 0.92) was more accurate than MLP (R² > 0.82). Moreover, the Non-dominated Sorting Genetic Algorithm-II (NSGA-II) was applied to optimize somatic embryogenesis, and the results showed that the highest embryogenesis rate (99.09%) and the maximum number of somatic embryos per explant (56.24) can be obtained from a medium containing 9.10 μM 2,4-dichlorophenoxyacetic acid (2,4-D), 4.70 μM kinetin (KIN), and 18.73 μM sodium nitroprusside (SNP). According to our results, SVR-NSGA-II was able to optimize chrysanthemum somatic embryogenesis accurately.
Conclusions SVR-NSGA-II can be employed as a reliable and applicable computational methodology in future plant tissue culture studies.

diversity. Moreover, chrysanthemum has been used as a model plant for color modification studies [1][2][3]. Conventional propagation and breeding approaches cannot meet the increasing market demand for this valuable ornamental plant. Therefore, novel biotechnological methods can be employed to satisfy consumer demand [1,3]. Nowadays, in vitro culture methods are applied as biotechnological tools for the rapid multiplication of rare plant genotypes, micropropagation of disease-free plants, production of plant-derived metabolites, and gene transformation [1,3,4,5].
To study in vitro functional genomics, somatic embryos have been employed as a potential explant material [2,3,6,7]. Moreover, many studies have demonstrated the usefulness of embryogenesis as a comprehensive model for studying plant growth and development [6,8,9]. The unique developmental pathway represented by somatic embryogenesis comprises characteristic events such as cell differentiation, activation of cell division, dedifferentiation of cells, and reprogramming of their metabolism, gene expression patterns, and physiology [9]. Thus, an efficient somatic embryogenesis protocol can play a major role in successful chrysanthemum genetic manipulation and regeneration. Using the appropriate type and concentration of plant growth regulators (PGRs) in various combinations could improve the somatic embryogenesis of different plant species and explants [10][11][12]. Indeed, in vitro embryogenesis is controlled by the balance between exogenous PGRs and endogenous phytohormone concentrations. The levels of endogenous phytohormones regulate in vitro explant differentiation and are assumed to be the major source of variation between different genotypes and explants [1,13,14,15,16,17]. Therefore, optimizing the somatic embryogenesis protocol can be considered the first and foremost step in successful gene transformation studies. However, it is usually difficult to achieve an optimized embryogenesis protocol because it is a laborious, time-consuming, and complex process. Therefore, a novel computational approach is needed to address this bottleneck.
In vitro culture consists of highly complex and nonlinear processes such as dedifferentiation, re-differentiation, or differentiation that depend on genetic and environmental factors [18][19][20][21]. Therefore, it is difficult to predict in vitro culture parameters such as callogenesis rate, embryogenesis rate, and the number of somatic embryos, or to optimize the factors that affect them, with simple conventional mathematical methods [22][23][24]. Furthermore, biological processes such as somatic embryogenesis cannot be described as a simple stepwise algorithm, especially when the datasets are highly noisy and complex [25][26][27][28][29]. Machine learning algorithms can therefore be employed as an efficient and reliable computational methodology to interpret and predict such datasets [30][31][32][33][34]. Recently, Multilayer Perceptron (MLP), one of the most common artificial neural networks (ANNs), has been widely employed for modeling and predicting in vitro culture systems such as in vitro sterilization [35,36], callogenesis [37][38][39], cell growth and protoplast culture [40,41], somatic embryogenesis [38,42,43], shoot regeneration [25,44,45,46], androgenesis [47], hairy root culture [48,49], and in vitro rooting and acclimatization [31]. MLP is a nonlinear computational method that can be applied to clustering, prediction, and classification of complex systems [47,50]. It is able to identify the relationship between output and input variables and to recognize the knowledge inherent in the datasets without prior physical considerations [29,51]. The algorithm consists of numerous highly interconnected processing neurons that work in parallel to find a solution to a particular problem. MLP learns by example, and the examples should be chosen carefully; otherwise time is wasted or, in the worst case, the model may perform inaccurately [52].
Support vector machines (SVMs), developed by Vapnik [53], are powerful and easy-to-interpret machine learning algorithms that analyze data and recognize patterns; they are used for clustering, classification, and regression analysis of nonlinear relationships [54]. Some advantages of SVMs over MLP relate to network complexity: MLP usually uses a very small number of hidden neurons, whereas an SVM uses a large number of hidden units. The main advantage of SVMs is the formulation of the learning problem, which results in a quadratic optimization task [55,56]. Support Vector Regression (SVR) is the regression version of SVM. Recently, several studies have been published on SVM-based approaches to industrially or chemically important problems [57][58][59]. However, SVR, unlike MLP, is relatively unknown to scientists in the field of plant tissue culture. Also, no comprehensive study has compared MLP with other machine learning algorithms (e.g. SVR) in order to develop an appropriate model for predicting in vitro culture parameters such as callogenesis rate, embryogenesis rate, and number of somatic embryos.
Different studies [25,28,30,31,34] have widely employed evolutionary optimization algorithms, in particular the genetic algorithm (GA), as a single optimization algorithm to optimize the factors involved in in vitro culture parameters. This common single-objective optimization algorithm offers advantages over more conventional optimization methods [60,61]. GA also has the benefit that it does not need initial estimates for the decision variables. However, GA can only be applied to a single objective function. On the other hand, the Non-dominated Sorting Genetic Algorithm (NSGA) developed by Srinivas and Deb [62] has been successfully employed to optimize many multi-objective problems. The main disadvantages of NSGA are its lack of elitism, the need to specify sharing parameters, and the high computational complexity of its non-dominated sorting [60]. NSGA-II is an improved version of NSGA that incorporates elitism, uses a better sorting algorithm, and requires no sharing parameter to be chosen a priori [35]. Therefore, the elitist NSGA-II can be utilized for multi-objective optimization with two, three, or more objective functions [7]. Recently, NSGA-II has been successfully applied to optimize shoot regeneration rate, number of shoots, and callus weight simultaneously [33].
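The notion of non-domination that underlies NSGA and NSGA-II can be illustrated with a minimal Python sketch. It assumes that every objective is to be maximized; the function names and the candidate values are hypothetical and serve only as an illustration. It extracts the first non-dominated front only and is not the full NSGA-II procedure (no crowding distance, no elitist selection).

```python
def dominates(a, b):
    """True if solution `a` Pareto-dominates `b` when every objective is maximized."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(solutions):
    """Return the solutions that no other solution dominates (the first front)."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions)
    ]


# Hypothetical candidates: (embryogenesis rate in %, somatic embryos per explant).
candidates = [(90.0, 40.0), (95.0, 35.0), (85.0, 30.0), (80.0, 45.0)]
first_front = non_dominated(candidates)
```

Here (85.0, 30.0) is dropped because (90.0, 40.0) is better in both objectives, while the other three candidates represent different trade-offs and stay on the front.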
In the current study, SVR was employed to predict the somatic embryogenesis parameters of chrysanthemum, including callogenesis rate, embryogenesis rate, and the number of somatic embryos. The developed SVR-based model was compared with MLP in terms of statistical performance measures to find the most suitable model for modeling and predicting in vitro culture systems. Furthermore, NSGA-II was linked to the best model to find the optimal level of PGRs for somatic embryogenesis. To the best of our knowledge, this study is the first report of the application of SVR in the field of plant tissue culture.

Effects of PGRs on somatic embryogenesis
Although several investigations have focused on the effect of auxin and cytokinin concentrations on chrysanthemum embryogenesis, the influence of auxins, cytokinins, nitric oxide, and their interactions has received little study. PGRs are essential factors in plant tissue culture that markedly affect somatic embryogenesis. The current study determined the effects of 2,4-dichlorophenoxyacetic acid (2,4-D), kinetin (KIN), sodium nitroprusside (SNP), and their interactions on callogenesis rate (%), number of somatic embryos per explant, and embryogenesis rate (%) of chrysanthemum.
The results showed that culturing leaf explants in medium containing both 2,4-D and KIN led to both callogenesis and embryogenesis, whereas the medium without PGRs produced neither calli nor embryos. The cut ends of the leaf segments produced calli and embryos two and three weeks after culturing, respectively. According to Table 1, using SNP together with 2,4-D and KIN gave a higher embryogenesis rate and more somatic embryos per explant than the media without SNP. The highest callogenesis rate (100%), embryogenesis rate (100%), and number of somatic embryos per explant (57.8) were observed with the combination of 9.09 μM 2,4-D and 4.65 μM KIN along with 20 μM SNP (Table 1).

SVR modeling and evaluation
SVR was used for modeling the three target variables (callogenesis rate, embryogenesis rate, and the number of somatic embryos) based on three input variables, including 2,4-D, KIN, and SNP.
Two machine learning algorithms, MLP and SVR, were used for modeling and predicting the target variables. The R², RMSE, and MAE of each developed model are presented in Table 2. Comparative analysis of MLP and SVR (Table 2) showed that SVR was more accurate than MLP for all studied somatic embryogenesis parameters in both the training and testing sets. As can be seen in Figs. 1, 2 and 3, the regression lines demonstrated a good fit between the predicted and observed data of callogenesis rate, embryogenesis rate, and the number of somatic embryos for both the training and testing sets.

Sensitivity analysis of the models
Five hundred seventy-six data points were used to determine the overall variable sensitivity ratio (VSR) and thereby rank the inputs by importance. The results of the sensitivity analysis are summarized in Table 3. Callogenesis rate was most sensitive to 2,4-D, followed by KIN and SNP (Table 3). As can also be seen in Table 3, 2,4-D was the most important factor for both embryogenesis rate and the number of somatic embryos per explant, followed by SNP and KIN.

Model optimization
NSGA-II was linked to the SVR in order to determine the optimal level of 2,4-D, KIN, and SNP for obtaining the highest embryogenesis rate and the maximum number of somatic embryos per explant. The results of the optimization process were presented in Table 4 and Fig. 4. As can be seen in Table 4, the highest embryogenesis rate (99.09%) and the maximum number of somatic embryos per explant (56.24) can be obtained from a medium containing 9.10 μM 2,4-D, 4.70 μM KIN, and 18.73 μM SNP.

Validation experiment
According to the validation experiment, the differences between biological validation data and predicted data via SVR-NSGA-II were not significant (Table 5). Indeed, the optimized level of PGRs (9.10 μM 2,4-D, 4.70 μM KIN, and 18.73 μM SNP) led to the highest embryogenesis rate (100%) and the maximum number of somatic embryos per explant (57.86) which is negligibly higher than the predicted result. Therefore, it can be concluded that SVR-NSGA-II can be employed for accurately predicting and optimizing plant tissue culture processes.
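As a quick arithmetic check on the values quoted above, the gap between the predicted and the experimentally observed optimum can be expressed as a percentage of the observed value. The helper name below is hypothetical, and this simple ratio is only an illustration; it does not replace the statistical comparison reported in Table 5.

```python
def relative_deviation(observed, predicted):
    """Absolute difference as a percentage of the observed value."""
    return abs(observed - predicted) / observed * 100.0


# Values reported in the text: SVR-NSGA-II prediction vs. validation experiment.
rate_gap = relative_deviation(observed=100.0, predicted=99.09)
embryo_gap = relative_deviation(observed=57.86, predicted=56.24)

print(f"embryogenesis rate gap: {rate_gap:.2f}%")
print(f"embryos per explant gap: {embryo_gap:.2f}%")
```

Both gaps are below 3% of the observed values, which is in line with the statement that the differences were not significant.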

Discussion
Success in in vitro somatic embryogenesis depends on different factors such as the composition of the medium, gelling agents, light and temperature conditions, and the application of specific combinations of PGRs [1,13,14,15,16,63]. However, optimizing these factors is time-consuming and costly. Also, somatic embryogenesis is a highly complex and nonlinear process. Therefore, robust nonlinear computational methods are needed for optimizing embryogenesis parameters. The efficiency of a statistical approach depends on a clear understanding of the variable structure, the experimental design, and the use of an appropriate model [64]. One of the most important requirements for identifying suitable statistical approaches is understanding the type of data [65]. Variables can be divided into two groups, quantitative (continuous and discrete) and qualitative (ordinal and nominal). Names with two or more classes without a hierarchical order are categorized as nominal variables, while ordinal data have a distinct order (level X is more intense than level Y) [65,66]. Counts that consist of integers are classified as discrete data, while measurements along a continuum, which can include fractions, are categorized as continuous variables [67]. Plant tissue culture data can be ordinal (callus quality rated as weak, moderate, and good), nominal (callus types such as embryogenic and non-embryogenic callus), continuous (embryogenesis rate), or discrete (number of somatic embryos). Traditional linear methods such as regression and ANOVA should only be applied to continuous variables that show a linear relationship between the explanatory and dependent variables [52,68]. In vitro culture systems, on the other hand, are complex biological systems in which multiple factors can affect the system in nonlinear ways.
Hence, conventional computational approaches are not appropriate for analyzing plant tissue culture data [65]. Recently, different machine learning algorithms such as neural networks [34,46,47], fuzzy logic [7,69], and decision trees [70,71] have been successfully employed for predicting and optimizing various in vitro culture processes. Many studies [35,44,46,72] used MLP to predict the optimal in vitro conditions for different plant tissue culture systems. However, they applied only the MLP model and did not compare this common algorithm with other models. Another promising computational method not previously employed in in vitro data analyses is SVR. In the current study, MLP and SVR were used, for the first time, to develop a suitable model for chrysanthemum somatic embryogenesis and to compare their prediction accuracy. According to our results, SVR was more accurate than MLP for modeling and predicting the system. Although there is no report on the application of SVR in plant tissue culture, comparative studies in other fields, in line with our results, revealed better performance of SVR in comparison to ANNs such as MLP [57][58][59]. On the other hand, one of the weaknesses of machine learning algorithms is that it is hard to obtain an optimized solution [52]. To tackle this problem, several studies [25,28,30,31,34] used GA to optimize in vitro culture conditions. However, plant tissue culture involves several objective functions that sometimes conflict with one another. Hence, GA, as a single-objective algorithm, cannot optimize multi-objective problems [7]. Therefore, it is necessary to employ multi-objective optimization algorithms such as NSGA-II. In the current study, NSGA-II was linked to SVR, the most suitable model, for the optimization process. After predicting and optimizing somatic embryogenesis via SVR-NSGA-II, the predicted-optimized results were experimentally tested.
Based on our results, SVR-NSGA-II can be considered an efficient computational methodology for predicting and optimizing different plant tissue culture systems.
The results of the sensitivity analysis showed that 2,4-D is the most important component in somatic embryogenesis, followed by SNP, a nitric oxide (NO) donor, and KIN. In line with our results, several years of molecular and biological studies of somatic embryogenesis have shown that 2,4-D signaling is the most important in somatic embryogenesis, followed by NO and cytokinin signaling [73]. The type and concentration of PGRs play a pivotal role in somatic embryogenesis. Several studies [1,14,74] have shown that, among the tested auxins, the synthetic auxin 2,4-D resulted in the maximum somatic embryogenesis in chrysanthemum. In addition, kinetin, a cytokinin, promotes somatic embryogenesis in chrysanthemum [75,76]. For instance, Shinoyama et al. [75] reported that the maximum number of somatic embryos (21.3 ± 1.2) was obtained from 2 mg/l 2,4-D along with 1 mg/l kinetin. Nitric oxide is known as a messenger molecule regulating plant development and as a ubiquitous bioactive molecule that contributes to various plant developmental processes such as fruit ripening, flowering, organ senescence, and germination [73]. This molecule has recently been characterized as one of the phytohormones [77]. Exogenous application of nitric oxide might improve the tolerance of plants to various stresses such as temperature, heavy metals, ultraviolet radiation, drought, and salinity [78][79][80]. The activity of nitric oxide has been evaluated by exogenous application of sodium nitroprusside (SNP) instead of NO gas directly because of some technical difficulties [81]. In recent years, nitric oxide has become involved in the development of in vitro plant propagation [82]. Ötvös et al.
[9] demonstrated that although NO does not affect cell cycle progression in plant tissue culture, it may interact closely with auxins, linking the regulation of cell division to differentiation. Plants have significant developmental plasticity in comparison with animals. During the de-differentiation process, somatic plant cells can regain the ability to divide, and 'de-differentiated' plant cells can 'redifferentiate' into whole plants under appropriate conditions. Ötvös et al. [9] reported that NO together with auxin can play a significant role in the embryogenesis of leaf protoplast-derived cells. In the absence of auxin, SNP could not induce division of the protoplast-derived cells. Also, the different response of protoplast-derived cells to various concentrations of external auxin in the presence of SNP or L-NMMA may show that NO can alter the sensitivity of the cells to auxin and is involved in mediating the role of auxins during these processes [8]. Furthermore, NO and auxins were suggested to share similar steps in the signal transduction pathways leading to root formation and root elongation [73]. In addition to affecting the frequency of dividing cells, SNP and L-NMMA strongly affect the auxin concentration-dependent developmental pathway of leaf protoplast-derived cells [73]. It was previously indicated that these cells could develop into elongated cells or into small, vacuolized, isodiametric cells with dense cytoplasm showing embryogenic competence [83][84][85]. Although embryogenic-type cells can be obtained at a high concentration of auxin (5-10 μM 2,4-D), with SNP this type of cell can be obtained at a low concentration of 2,4-D [9]. Somatic embryo formation is associated with high-level expression of the MsSERK1 gene as well as with the development of these cells [86], which supports the use of SNP to alter the developmental pathway of auxin-treated cells.
SERK gene expression is usually used as a marker of embryogenic potential [73]. However, its up-regulated expression also accompanied auxin-promoted root formation [86], and it was therefore suggested to be a morphogenic marker rather than only an embryogenic one.

Conclusion
Recently, MLP has been widely applied for modeling and predicting in vitro culture systems. In the current study, SVR was applied for the first time to model and predict somatic embryogenesis, and its accuracy was compared with that of MLP. Our results showed that the SVR model is more accurate than MLP for modeling and predicting complex systems such as somatic embryogenesis. Also, SVR-NSGA-II was able to optimize chrysanthemum somatic embryogenesis accurately. The results of the sensitivity analysis showed that 2,4-D is the most important component in somatic embryogenesis, followed by SNP, a nitric oxide (NO) donor, and KIN. Interestingly, several years of molecular and biological studies of somatic embryogenesis have shown that 2,4-D signaling is the most important in somatic embryogenesis, followed by NO and cytokinin signaling. These results demonstrate that SVR-NSGA-II can offer a reliable and accurate route to a comprehensive study of plant biological processes. We recommend comparing SVR with other current machine learning methods (e.g., Random Forest, Gradient Boosting) to allow a more thorough appreciation of the relative merit of SVM for the presented problem.

Plant material, media, and culture condition
In this study, leaf explants of chrysanthemum 'Hornbill Dark' were selected for the in vitro somatic embryogenesis study. For primary disinfection, the explants were washed for 20 min with tap water. All further steps were performed in a laminar airflow cabinet. The explants were soaked in 70% ethanol for 40 s and then washed with sterilized distilled water for 3 min. Afterward, the explants were dipped in 1.5% (v/v) NaOCl solution for 15 min and then washed three times with sterilized distilled water, for 5 min each. The basal medium was Murashige and Skoog (MS) medium [87] containing 3% sucrose, 0.7% agar, and 100 mg/l myo-inositol. The pH of the medium was adjusted to 5.8 with 1 and/or 0.1 N NaOH and 1 and/or 0.1 N HCl before autoclaving for 20 min at 120 °C. The explants were cultured in 200-ml culture boxes containing 45 ml of basal medium. All culture boxes were kept in a growth chamber under a 16-h photoperiod with a light intensity of 50 μmol m⁻² s⁻¹ at 25 ± 2 °C.
The somatic embryogenesis experiments were conducted based on a randomized complete block design (RCBD) with a factorial arrangement with a total of 64 treatments with nine replications per treatment, and each replication consisted of five leaf explants.

Modeling procedures
The input variables were 2,4-dichlorophenoxyacetic acid (2,4-D), kinetin (KIN), and sodium nitroprusside (SNP). The target variables were callogenesis rate, embryogenesis rate, and the number of somatic embryos per explant. Before modeling, the datasets were scaled between 0 and 1 to ensure that all variables received equal attention during the training process. In the current study, two types of machine learning algorithms, MLP and SVR, were employed to model somatic embryogenesis of chrysanthemum. To train and test each model, 70% and 30% of the data points were randomly selected, respectively.
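The preprocessing described above (scaling each variable to the range 0 to 1 and a random 70/30 split) can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB code: the function names, the random seed, and the example concentrations are assumptions, and the 576 rows correspond to the 64 treatments with nine replications each.

```python
import random


def min_max_scale(column):
    """Rescale a list of numbers linearly onto [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: return zeros to avoid dividing by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Randomly assign rows to a training and a testing subset."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical 2,4-D concentrations (in uM), for illustration only.
two_four_d = [0.0, 2.26, 4.52, 9.09]
scaled = min_max_scale(two_four_d)

# 576 row indices stand in for the 64 treatments x 9 replications.
train_rows, test_rows = train_test_split(list(range(576)))
```

With 576 rows, the split yields 403 training rows and 173 testing rows.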

Multilayer perceptron (MLP) model
The MLP, one of the common ANNs, consists of three layers: input, hidden, and output. In the present study, this model was employed following the procedure of Hesami et al. [35]. Briefly, a 3-layer feed-forward backpropagation network was used to construct the MLP model. The Levenberg-Marquardt algorithm was applied to train the network and determine the optimal weights and biases. The hyperbolic tangent sigmoid (tansig) and linear (purelin) activation functions were used for the hidden and output layers, respectively.
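The structure of such a network can be illustrated with a minimal forward pass in Python. The sketch assumes that the tansig activation is mathematically the hyperbolic tangent and that purelin is the identity; the weights, the layer sizes, and the input values are made up for illustration, and training with the Levenberg-Marquardt algorithm is not shown.

```python
import math


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a 3-layer perceptron: tanh hidden layer, linear output layer."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return [
        sum(w * h for w, h in zip(weights, hidden)) + bias
        for weights, bias in zip(w_out, b_out)
    ]


# Illustrative (untrained) weights: 3 inputs -> 2 hidden neurons -> 1 output.
w_hidden = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b_hidden = [0.0, 0.1]
w_out = [[1.0, -1.0]]
b_out = [0.2]

# Scaled inputs in the order (2,4-D, KIN, SNP); the values are made up.
prediction = mlp_forward([1.0, 0.5, 0.0], w_hidden, b_hidden, w_out, b_out)
```

In a trained model, one such network (or one output neuron) would be fitted per target variable; the number of hidden neurons used in the study is not stated here and is left open.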

Support vector regression (SVR) model
Support vector machines (SVMs), developed by Vapnik [53], can be used for clustering, classification, and regression analysis of nonlinear relationships [54]. SVR, the regression version of SVM, was employed in the current study. Consider a dataset {(x_i, t_i)}, i = 1, …, n, where x_i is the i-th input vector, t_i is the i-th output vector, and n is the total number of observations. The following function was used for the SVR estimation:

y = wϕ(x) + b

where w denotes the weights, b is the bias, ϕ(x) represents the high-dimensional feature space that is non-linearly mapped from the input space x, and y is the output value. SVR minimizes a loss function so that all estimated values lie between the upper and lower prediction error bounds, which are y = wϕ(x) + b + ε and y = wϕ(x) + b − ε, respectively. Figure 5 shows a schematic view of SVR. The coefficients w and b were found through an optimization process in which ε, L_ε, and C represent the acceptable error (tube size), the ε-insensitive loss function, and the penalty parameter, respectively. Both ε and C are user-prescribed parameters. The dual form of the problem was obtained by applying Lagrange multipliers. After solving the optimization problem, w and b are determined. The training points whose Lagrange multipliers are non-zero are the support vectors, and the SVR prediction is computed from them. Among the various kernel functions in SVR, the radial basis function (RBF) is one of the most common for nonlinear problems. Therefore, SVR with the RBF kernel can be described by three parameters, SVR (γ, C, ε).
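For reference, the standard textbook form of ε-insensitive SVR is summarized below in the notation of the paragraph above. This is the common formulation and is given here as an assumption; the exact scaling used by the authors (for example, a 1/n factor in front of the loss term) may differ. The multipliers α_i and α_i*, the kernel K, and the kernel width γ are symbols introduced here for the illustration.

```latex
% Primal problem with the epsilon-insensitive loss
\min_{w,\,b}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n} L_{\varepsilon}(t_i, y_i),
\qquad
L_{\varepsilon}(t_i, y_i) = \max\bigl(0,\ \lvert t_i - y_i\rvert - \varepsilon\bigr),
\qquad
y_i = w\,\phi(x_i) + b

% Dual problem obtained with Lagrange multipliers \alpha_i, \alpha_i^{*}
\max_{\alpha,\,\alpha^{*}}\
  -\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}
     (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\,K(x_i, x_j)
  - \varepsilon\sum_{i=1}^{n}(\alpha_i + \alpha_i^{*})
  + \sum_{i=1}^{n} t_i\,(\alpha_i - \alpha_i^{*})

\text{subject to}\quad
  \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*}) = 0,
  \qquad 0 \le \alpha_i,\ \alpha_i^{*} \le C

% Prediction from the support vectors, and the RBF kernel
f(x) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^{*})\,K(x_i, x) + b,
\qquad
K(x_i, x_j) = \exp\bigl(-\gamma\,\lVert x_i - x_j\rVert^{2}\bigr)
```

Only training points with a non-zero difference α_i − α_i* contribute to f(x), which is why they are called support vectors; γ, C, and ε are the three parameters to be tuned for an RBF-kernel SVR.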

Performance measures
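To assess and compare the accuracy of the models, three performance measures were used, namely the coefficient of determination (R²), the root mean square error (RMSE), and the mean absolute error (MAE):

\[ R^{2} = 1 - \frac{\sum_{t=1}^{T} (y_t - \hat{y}_t)^{2}}{\sum_{t=1}^{T} (y_t - \bar{y})^{2}} \]

\[ \mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T} (y_t - \hat{y}_t)^{2}} \]

\[ \mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T} \lvert y_t - \hat{y}_t \rvert \]

where y_t, ȳ, ŷ_t, and T are the t-th observed value, the mean of the observed values, the t-th predicted value, and the total number of predicted values, respectively. Greater R² and smaller RMSE and MAE indicate better performance of the constructed models.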
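The three measures can be computed with a few lines of Python. This is a sketch using the usual definitions (R² as one minus the ratio of residual to total sum of squares); the function names and the example numbers are illustrative and are not taken from Table 2.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error."""
    squared = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)


# Made-up observed and predicted values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

scores = {
    "R2": r_squared(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
}
```

For these example values the residual sum of squares is 26 and the total sum of squares is 500, so R² is 0.948, RMSE is the square root of 6.5, and MAE is 2.5.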

Optimization of somatic embryogenesis via NSGA-II
To identify the optimal levels of the inputs (2,4-D, KIN, and SNP) for maximizing embryogenesis rate and the number of somatic embryos per explant, the developed SVR models were coupled with NSGA-II (Fig. 6). A roulette wheel selection method was applied to choose the elite population for crossover [88]. To obtain the best fitness, the initial population, generation number, mutation rate, and crossover rate were set to 200, 1000, 0.5, and 0.7, respectively. In the current study, the ideal point of the Pareto front was selected such that embryogenesis rate and the number of somatic embryos per explant were both at their maximum. Specifically, the point on the Pareto front chosen as the best solution was the one for which

\[ \sqrt{(\text{embryogenesis rate} - x)^{2} + (\text{number of somatic embryos per explant} - y)^{2}} \]

was minimal, where x and y are the highest embryogenesis rate and the maximum number of somatic embryos per explant in the observed data, respectively.
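The final selection step, choosing the Pareto-front point closest to the ideal point (x, y), can be sketched in Python as follows. The front below is hypothetical, the function name is an assumption, and the distance is computed on the raw objective values because the text does not state whether the objectives were rescaled first. The ideal point uses the highest observed values reported in the Results (100% and 57.8).

```python
import math


def closest_to_ideal(front, ideal):
    """Return the Pareto-front point with the smallest Euclidean distance to `ideal`."""
    return min(front, key=lambda point: math.dist(point, ideal))


# Hypothetical Pareto front: (embryogenesis rate in %, somatic embryos per explant).
front = [(99.5, 48.0), (99.09, 56.24), (92.0, 57.0)]

# Ideal point (x, y): the highest values observed in the experimental data.
ideal = (100.0, 57.8)

best = closest_to_ideal(front, ideal)
```

In this made-up front the second point lies about 1.8 units from the ideal point, far closer than the other two, so it is returned as the compromise solution.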

Sensitivity analysis
Sensitivity analysis was conducted to determine the relative importance of KIN, SNP, and 2,4-D for embryogenesis rate, callogenesis rate, and the number of somatic embryos per explant. Sensitivity was measured with the variable sensitivity error (VSE), which is the performance (root mean square error, RMSE) of the SVR-NSGA-II model when the input variable in question is removed from the model. The variable sensitivity ratio (VSR) was calculated as the ratio of the VSE to the error (RMSE) of the SVR-NSGA-II model when all input variables are available. A higher VSR indicates a more important variable in the model.
Fig. 5 The schematic view of the support vector regression (SVR) model
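The VSR calculation itself is a simple ratio, sketched below in Python. The RMSE values are made up for illustration and are not taken from Table 3; the function names are assumptions, and refitting the model without each input (which yields the VSE values) is not shown.

```python
def variable_sensitivity_ratio(vse_by_input, full_model_rmse):
    """VSR = VSE / RMSE of the model fitted with all inputs available."""
    return {name: vse / full_model_rmse for name, vse in vse_by_input.items()}


def rank_inputs(vsr_by_input):
    """Order input names from most to least important (highest VSR first)."""
    return sorted(vsr_by_input, key=vsr_by_input.get, reverse=True)


# Made-up values: RMSE of the model when each input is left out (the VSE),
# and RMSE of the model with all three inputs.
vse = {"2,4-D": 12.0, "KIN": 4.5, "SNP": 6.0}
full_rmse = 3.0

vsr = variable_sensitivity_ratio(vse, full_rmse)
ranking = rank_inputs(vsr)
```

With these example numbers, removing 2,4-D degrades the model most, so it ranks first, followed by SNP and KIN.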
MATLAB (Matlab, 2010) software was employed to write codes and run the models.

Validation experiments
To confirm the efficiency of the developed model, the optimized PGR levels obtained from SVR-NSGA-II (medium containing 9.10 μM 2,4-D, 4.70 μM KIN, and 18.73 μM SNP) were tested experimentally in the lab with three replications, each consisting of ten leaf explants. The experimental results were compared with the predicted results.