Skip to main content

Predictive modeling of Persian walnut (Juglans regia L.) in vitro proliferation media using machine learning approaches: a comparative study of ANN, KNN and GEP models



Optimizing plant tissue culture media is a complicated process, which is easily influenced by genotype, mineral nutrients, plant growth regulators (PGRs), vitamins and other factors, leading to undesirable and inefficient medium composition. Facing incidence of different physiological disorders such as callusing, shoot tip necrosis (STN) and vitrification (Vit) in walnut proliferation, it is necessary to develop prediction models for identifying the impact of different factors involving in this process. In the present study, three machine learning (ML) approaches including multi-layer perceptron neural network (MLPNN), k-nearest neighbors (KNN) and gene expression programming (GEP) were implemented and compared to multiple linear regression (MLR) to develop models for prediction of in vitro proliferation of Persian walnut (Juglans regia L.). The accuracy of developed models was evaluated using coefficient of determination (R2), root mean square error (RMSE) and mean absolute error (MAE). With the aim of optimizing the selected prediction models, multi-objective evolutionary optimization algorithm using particle swarm optimization (PSO) technique was applied.


Our results indicated that all three ML techniques had higher accuracy of prediction than MLR, for example, calculated R2 of MLPNN, KNN and GEP vs. MLR was 0.695, 0.672 and 0.802 vs. 0.412 in Chandler and 0.358, 0.377 and 0.428 vs. 0.178 in Rayen, respectively. The GEP models were further selected to be optimized using PSO. The comparison of modeling procedures provides a new insight into in vitro culture medium composition prediction models. Based on the results, hybrid GEP-PSO technique displays good performance for modeling walnut tissue culture media, while MLPNN and KNN have also shown strong estimation capability.


Here, besides MLPNN and GEP, KNN also is introduced, for the first time, as a simple technique with high accuracy to be used for developing prediction models in optimizing plant tissue culture media composition studies. Therefore, selection of the modeling technique to study depends on the researcher’s desire regarding the simplicity of the procedure, obtaining clear results as entire formula and/or less time to analyze.


The walnut is one of the most important nuts in the world. Persian walnuts (Juglans regia L.) are the only edible species of walnut which are widely grown for their nuts and timbers [1]. In general, walnut tree propagation is still mainly by using seeds rather than vegetative procedures which results in non-uniform nut quality and irregular yielding [2]. Therefore, in vitro propagation is used to overcome the mentioned problems. But walnuts are considered recalcitrant to in vitro culture which makes difficult the mass propagation of different genotypes while several micropropagation protocols have been published for different genotypes [3,4,5,6,7,8,9,10,11]. It has been proven that walnut micropropagation results are highly dependent upon genotype [7, 9,10,11]. In addition to genotype, the formulation of culture medium has a great impact on all micropropagation stages. Up to now, the [3] walnut (DKW) culture medium has been the most employed formulation for walnut tissue culture. Nevertheless, there are some researches reporting improved results using modified DKW or other formulations [6,7,8, 12,13,14,15].

However, to the best of our knowledge, no comprehensive study has been done on the balance of culture media components (mineral nutrients, plant growth regulators (PGRs) and vitamins) and their interaction together and with genotype on walnut in vitro performance to increase the efficiency of the micropropagation process by enhancing proliferation rate and reducing physiological disorders.

Predicting the interaction of mineral nutrients, PGRs, vitamins and genotype on the explant in vitro performance would involve modeling a very complex database, which is very problematic and time-consuming process using classic statistical analyses and needs accurate and advanced modeling procedures [16, 17]. Machine learning (ML) tools allow researchers to perceive the studied process and make proper decisions to develop optimal culture media [17]. In recent years, different ML models like neural networks [18,19,20,21,22,23] have been successfully applied for prediction and optimization of different plant tissue culture processes. In our previous studies, we described the ML hybrid techniques, combining artificial neural network (ANN) with genetic algorithm (ANN-GA) in Pyrus [24] and Prunus rootstocks [25,26,27], rootstocks gene expression programing (GEP) with GA (GEP-GA) [20] and particle swarm optimization (PSO) (GEP-PSO) [28] in Pyrus rootstocks as powerful data mining approaches, which allow modeling of complicated databases and finding the factors influencing a given response in micropropagation process.

ANNs are inspired by the functions of human brain [29]. The ANN [multi-layer perceptron neural network (MLPNN) and radial basis function neural network (RBFNN)] has revealed significant development in complex plant tissue culture systems [20, 24,25,26,27]. ANN does not require any previous knowledge regarding the creation or interrelationships between signals of input and output that is one of its profits [16]. Other benefits of ANN are prediction of the plant biomass [30], clustering the micropropagated plantlets and influencing growth and quality of the regenerated plants by controlling light, ventilation, CO2 and air temperature inside the culture containers which could be of ANN benefits [16].

GEP model is another ML-based optimization technique presented by [31] which comprises useful traits of both genetic programming (GP) and GA. This new model according to an evolving computer programs algorithm was used in our previous studies on Pyrus rootstocks micropropagation which precisely detected nonlinear and complicated relationships between input and output [20, 24].

Here, ANN and GEP are compared to k-nearest neighbors (KNN) method as one of the simplest machine learning techniques. The KNN technique recognizes the elements amongst the training samples that correspond “current” conditions maximum closely based on some predefined attributes: the neighbors. The prediction value is then specified from the groups of the next values of the neighbors [32]. Comparing to mathematical modeling, the KNN method involves no model development or confirmation and thus can be used without recombining data, contrasting in the case of common data-based models [33]. In spite of the potential advantages, no research has yet been done on the use of this technique in the area of plant micropropagation.

In our previous study [20], we compared the RBFNN and GEP in optimizing the in vitro culture media composition for pear rootstocks. Based on our results GEP was a significantly powerful and more precise technique than RBFNN in prediction of in vitro proliferation quantity and quality. So, GA technique was applied to optimize GEP models [20]. Nevertheless, GA optimized the level of inputs required for each specific output, distinctly. Consequently, in our recent study [28], in order to achieve a complete optimum formulation for culture medium, we compared two algorithms GEP and M5’ model tree, to predict the impacts of media minerals and PGRs on in vitro proliferation of pear rootstocks. We found that GEP showed a higher prediction precision than M5’ model tree. So, we optimized the GEP prediction models using multi-objective evolutionary optimization algorithms (MOEAs) including GA and PSO methods and compared to the mono-objective GA optimization procedure. The PSO optimized GEP prediction models made the best outputs in both rootstocks [28].

With MOEAs, inputs are evaluated as multi-objective optimization problems (MOPs) and the solutions specify the best probable balance between two reverse functions [34]. Recently, several mathematical methods have been used to solve MOPs, nonetheless the real MOPs applications are specifically nonlinear and also occasionally non-differentiable [35]. This has enhanced interest in metaheuristic methods, and among these procedures, MOEAs are of special interest. Here, PSO as an evolutionary computation technique was used for determining optimized culture media.

The aim of this study is to employ three soft computing methods namely MLPNN, GEP and KNN and to compare the accuracy of their prediction to multiple linear regression (MLR) technique as well as applying PSO algorithm with aim of predicting and optimizing walnut tissue culture media. Briefly, the new contributions of the present research are:

  • Comparing the appropriateness of MLPNN, KNN and GEP nonlinear methods for modeling the impacts of mineral nutrients, PGRs and vitamins on in vitro culture of walnut.

  • Constructing hybrid models in order to assess how Chandler and Rayen explants respond to the culture medium composition according to the new produced shoots attained from the Taguchi design.

  • Finding the optimal composition of culture media to maximize the proliferation rate (PR) and minimize callus weight (CW), shoot tip necrosis (STN) and vitrification (Vit) by optimizing the developed model using PSO.

To our knowledge, this study is the first application of MLR, KNN, ANN, GEP and PSO methods for optimizing walnut tissue culture media. In addition, this work is the first use of KNN modeling procedure in plant tissue culture.


Our models of the interaction of modifying inputs including nutrients, PGRs and vitamins on outputs including PR, CW, STN and Vit were developed using MLR, MLPNN, KNN and GEP techniques. Here, we assess the developed models’ performances through evaluating each modelling method precision to predict the composition of plant micropropagation media for walnut. After that, PSO optimization results of the selected modeling method is investigated to find the most efficient compositions of media for each considered trait. An outline of the techniques used here to achieve the most appropriate model is shown in Fig. 1.

Fig. 1
figure 1

Schema of the techniques used to construct prediction models for Persian walnut in vitro culture media

Comparison of modeling techniques performances

The mathematical equations attained from GEP method, which is showing the best estimate of the explant growth parameters, are shown in Table 1. Moreover, calculated statistics results for output variables (PR, CW, STN and Vit) related to the MLR, MLPNN, KNN and GEP models are given in Table 2. Unlike MLR, the trained MLPNN, KNN and GEP models of PR, CW, STN and Vit resulted in balanced statistic values for both the training and testing subsets (Table 2). For output variables (PR, CW, STN, and Vit) the calculated statistical values corresponding to the KNN, MLPNN and GEP models showed a considerably higher accuracy of prediction than for MLR models as calculated R2 for MLPNN, KNN and GEP vs. MLR models were: 0.672, 0.695 and 0.802 vs. 0.412 for PR of Chandler; 0.377, 0.354 and 0.428 vs. 0.178 for PR of Rayen; 0.923, 0.931 and 0.844 vs. 0.696 for CW of Chandler; 0.929, 0.930 and 0.839 vs. 0.276 for CW of Rayen; 0.855, 0.915 and 0.807 vs. 0.241 for STN of Chandler; 0.812, 0.831 and 0.808 vs. 0.341 for STN of Rayen; 0.974, 0.975 and 0.853 vs. 0.434 for Vit of Chandler; and 0.977, 0.978 and 0.891 vs. 0.299 for Vit of Rayen, respectively (Table 2).

Table 1 Constructed models using gene expression programming to predict explant growth traits in Persian walnut
Table 2 Evaluation of different developed models using various statistics for PR, CW, STN, and Vit of Persian walnut through in vitro proliferation

Comparison of the observed and predicted values of outputs may explain the performance of the developed models according to the studied inputs. A high squared correlation coefficient fitting technique was used to produce plots according to the constructed models derived, to show how each of the four outputs varied as the concentration of media components changed. The plots may be helpful to understand the complete relationship between media components and responses, and to assess the multiple effects of modifying the media components in the DKW medium. The predicted MLR, MLPNN, KNN and GEP models diagrams vs. observed values for the PR, CW, STN and Vit are shown in Figs. 2, 3, 4, 5, 6, 7, 8 and 9. Comparing the fitted simple regression lines of the MLR with ML models showed that MLR resulted in the lowest accordance between the observed and predicted values regarding all considered outputs so that calculated R2 for MLPNN, KNN and GEP vs. MLR were: 0.696, 0.672 and 0.802 vs. 0.412 for PR of Chandler (Fig. 2); 0.178, 0.359, 0.377 and 0.428 for PR of Rayen (Fig. 3); 0.696, 0.931, 0.924 and 0.844 for CW of Chandler (Fig. 4); 0.276, 0.874, 0.930 and 0.840 for CW of Rayen (Fig. 5); 0.241, 0.916, 0.856 and 0.807 for STN of Chandler (Fig. 6); 0.342, 0.810, 0.813 and 0.809 for STN of Rayen (Fig. 7), 0.435, 0.976, 0.975 and 0.853 for Vit of Chandler (Fig. 8); and 0.300, 0.979, 0.978 and 0.891 for Vit of Rayen (Fig. 9), respectively. Therefore, the ML models were able to accurately predict the outputs while the MLR developed models were not able to describe extensive diversity of growth parameters owing to the studied variables interaction, that may hide the effects of media components. Figures 2, 3, 4, 5, 6, 7, 8 and 9 may be helpful for realizing the complete relationship between media components and responses, and assessing the combined impacts of modifying the DKW medium components.

Fig. 2
figure 2

Observed vs. predicted values of proliferation rate (PR) related to A multiple linear regression (MLR); B multi-layer perceptron neural network (MLPNN); C k-nearest neighbors (KNN); D gene expression programming (GEP) developed models (n = 224) for walnut cv. Chandler

Fig. 3
figure 3

Observed vs. predicted values of proliferation rate (PR) related to A multiple linear regression (MLR); B multi-layer perceptron neural network (MLPNN); C k-nearest neighbors (KNN); D gene expression programming (GEP) developed models (n = 224) for walnut cv. Rayen

Fig. 4
figure 4

Observed vs. predicted values of callus weight (CW) related to A multiple linear regression (MLR); B multi-layer perceptron neural network (MLPNN); C k-nearest neighbors (KNN); D gene expression programming (GEP) developed models (n = 224) for walnut cv. Chandler

Fig. 5
figure 5

Observed vs. predicted values of callus weight (CW) related to A multiple linear regression (MLR); B multi-layer perceptron neural network (MLPNN); C k-nearest neighbors (KNN); D gene expression programming (GEP) developed models (n = 224) for walnut cv. Chandler

Fig. 6
figure 6

Observed vs. predicted values of shoot tip necrosis (STN) related to A multiple linear regression (MLR); B multi-layer perceptron neural network (MLPNN); C k-nearest neighbors (KNN); D gene expression programming (GEP) developed models (n = 224) for walnut cv. Chandler

Fig. 7
figure 7

Observed vs. predicted values of shoot tip necrosis (STN) related to A multiple linear regression (MLR); B multi-layer perceptron neural network (MLPNN); C k-nearest neighbors (KNN); D gene expression programming (GEP) developed models (n = 224) for walnut cv. Rayen

Fig. 8
figure 8

Observed vs. predicted values of vitrification (Vit) related to A multiple linear regression (MLR); B multi-layer perceptron neural network (MLPNN); C k-nearest neighbors (KNN); D gene expression programming (GEP) developed models (n = 224) for walnut cv. Chandler

Fig. 9
figure 9

Observed vs. predicted values of vitrification (Vit) related to A multiple linear regression (MLR); B multi-layer perceptron neural network (MLPNN); C k-nearest neighbors (KNN); D gene expression programming (GEP) developed models (n = 224) for walnut cv. Rayen

According to the results presented in Table 2 and Figs. 2, 3, 4, 5, 6, 7, 8 and 9 as well as the above-mentioned results, MLPNN, KNN and GEP models performed accurately in predicting the effect of media components on in vitro performance of Persian walnut. So, in order to select one of these ML modeling techniques to be optimized and achieve final models for in vitro proliferation of Persian walnut, we considered the ease of using model by the end user. In other words, although MLPNN and KNN performed relatively well, none of these models offer explicit mathematical expression. Unlike MLPNN and KNN methods which produce black-box models, GEP can provide the researchers with an opportunity to optimize the extractive equations (optimal values of the variables) by generating explicit mathematical equations between the independent variable and the dependent variable and can be used as an equation for the pre-test stages (initial phase of the study) in designing and developing of their studies. Hence, we selected GEP models to be optimized and achieve proliferation media formulations of Chandler and Rayen.

Optimization of GEP models

Consequently, to achieve the optimized medium resulting in the highest PR and the lowest CW, STN and Vit in walnut, we optimized developed GEP models by using multi-objective PSO technique.

The optimized amounts of the studied factors and the predicted values of growth parameters by the GEP models are shown in Table 3. The PSO optimization of the GEP models revealed that media containing 1.76 × NH4NO3, CaNO3 and ZnNO3, 1.67 × KNO3, 0.96 × K2SO4, 0.66 × MgSo4, MnSo4 and CuSo4, 2.35 × KH2PO4, H3BO3 and Na2MoO4, 1.64 × FeEDDHA and 1.89 × Thiamine, Nicotinic acid and Glycine concentrations in DKW medium, 0.67 mg/l BAP and 1.30 mg/l TDZ and 1.30 mg/l IBA could lead to optimal PR (23.54), CW (0.12), STN (2.23) and Vit (9.95) in Chandler and media containing 0.73 × NH4NO3, CaNO3 and ZnNO3, 0.69 × KNO3, 0.94 × K2SO4, 0.64 × MgSo4, MnSo4 and CuSo4, 0.83 × KH2PO4, H3BO3 and Na2MoO4, 1.35 × FeEDDHA and 1.52 × Thiamine, Nicotinic acid and Glycine concentrations in DKW medium, 0.67 mg/l BAP and 1.23 mg/l TDZ and 1.23 mg/l IBA could result in optimal PR (24.57), CW (0.64), STN (12.48) and Vit (3.04) in Rayen.

Table 3 Multi-objective PSO optimization of GEP models to achieve the highest quantity and quality through in vitro proliferation of walnut


Walnuts as one of the important woody plants are considered recalcitrant to in vitro culture in which genetic determinism besides other factors such as media components makes more complicated different stages of micropropagation, as well. In the present study, three different ML modeling approaches along with PSO optimization algorithm were applied to determine and predict the effect of genotype and the media formulation throughout the proliferation of walnut. Walnut micropropagation can be improved by involving different physiological disorders in modeling and optimization processes. The incidence of physiological disorders through micropropagation of walnut has not been comprehensively investigated. Different studies on walnut tissue culture have been focused on introducing some chemicals like phloroglucinol and FeEDDHA to DKW or [36] (MS) basal media [11, 15], supplementing media with various concentrations of different PGRs [37,38,39], removing agar [39], ventilation and reducing sucrose concentration [40], but a few of the studies focused on media components, including mineral nutrients [9, 41], vitamins and PGRs [39] interaction on proliferation quality and quantity.

Here, we concentrated on increasing PR and reducing important abnormalities occurring during this phase, by recording data associated to several designed experiments. The subsequent database including a range of concentrations of each component in culture media allows simultaneous evaluation of the impacts of all minerals, vitamins and PGRs used in media as well as genotype on the explant growth indices only through the ML tools.

Machine learning as a powerful tool has been effectively applied in plant biology studies [42, 43] including plant tissue culture data analysis and accurate prediction of optimal in vitro culture media composition [20, 24,25,26,27,28]. The development of in vitro plant tissues is controlled by minerals, vitamins and PGRs in the culture media. To achieve maximum explant performance, the prediction of the most efficient media composition is highly useful since the optimization of the type and concentration of minerals, vitamins and PGRs in media is a time-consuming, expensive and laborious job [9, 41].

In our previous studies, we successfully performed constructing neural models using ANN technique to study the effects of different combinations of minerals and PGRs on in vitro proliferation and rooting of G × N15 Prunus rootstock [25,26,27]. Our study on comparing ANN with MLR modeling to forecast the optimum concentrations of macronutrients for OHF 69 and Pyrodwarf Pyrus rootstocks in vitro media showed ANN as a precise and promising technique [24]. The important benefit of ANN-based methods is that they do not need a prior identification of proper fitting function consequently; they have an overall approximation ability to calculate all kinds of non-linear functions in practice. This trait may help the modeler to develop the most possible precise model. Despite the fact that ANN is a good alternative for MLR, it does not provide us any equations including the relationships between input and output variables. Moreover, the ANN technique needs a time-consuming process of trial and error to find network parameters like number of neurons and hidden layers [44,45,46]. ANNs as the most extensively used ML model, can efficiently solve different multivariate, non-linear and nonparametric problems via an unidentified ‘‘black box” training [47]. Nevertheless, there are also some drawbacks with ANN “black box” nature [48]. In general, ANN is unable to clarify its logical process and this constraint makes ANN application unfriendly in natural science studies, as it can just simulate the change process according to experimental data, without helping us to understand the reason of the change.

Considering these restrictions in using ANN models, in another study on Pyrus rootstocks in vitro proliferation [20] we compared the power of GEP technique to ANN (RBFNN) and MLR in predicting the optimal media. RBFNN and GEP exhibited higher performance precision towards the MLR, and the GEP resulted in the most precise model as well as being practical [20]. In our recent research [28], we used two algorithms, GEP and M5’ model tree to overcome the ANN method weaknesses and simplify forecast of the media components interactions on in vitro proliferation of Pyrus rootstocks. Again, we found GEP as a more accurate technique than M5’ model tree [28].

Consequently, in the present study, we applied GEP as the most precise modeling procedure found by [20, 28], MLPNN as an ANN technique that its models are easier to give precise prediction than RBFNN when input data are randomly distributed [49] and KNN as one of the simplest machine learning approaches which can also be used for regression problems [50]. The MLR was also applied as a linear modeling method to be compared with above-mentioned ML procedures in predicting the optimum in vitro proliferation media composition of walnut to achieve the most appropriate outcomes. The accuracy of the developed prediction models was evaluated using MAE, RMSE and R2 statistics and correlation coefficient between observed and predicted values of each output. To our knowledge, KNN algorithms have not ever been applied to predict the plant tissue culture media composition. The advantage of KNN algorithm is that it does not require specific assumptions about the predictors’ distribution. The samples of KNN are classified according to the k neighbor responses mean values in a space of predictor [51]. The examples of training are defined by n traits. Each example means a point in a space with n-dimension. So, all examples of training will be kept in a space with the pattern on n-dimension. Here, the number of neighbors (k) leading to the best results for each model are presented in Table 3.

A key advantage of GP-based procedures such as GEP, toward other methods is that they do not need any hypothesis for preceding form of the relationship to produce prediction equations. GP and its deviations have been applied in many researches to find any complicated relationships which fit different experimental data [52,53,54]. An individuals’ population is employed in this technique and afterwards, better individuals are chosen by using genetic variations and fitness function. The genetic variations are introduced by genetic operators. Machine learning approaches including GEP have been programed to learn the variables̛ relationships in data collections. GEP difference with GA and GP as its precursors is in the method of individual programming so that in GEP, individuals are programmed as chromosomes i.e. fixed length linear strings which are presented finally as a simple diagram called expression tree. Whereas, in GA and GP, individuals are expressed, as nonlinear entities with different shapes (parse trees) and sizes and chromosomes, respectively. One of the GEP strengths over GA and GP is that genetic operators work very simple at the level of chromosome in GEP making development of genetic diversity. GEP unique, multi-genic nature is another important point which allows more complicated programs with multiple sub-programs to be developed. The advantages of both GA and GP are collected in GEP, whereas some of their constraints are met [55].

Based on our results presented in Table 2, KNN, MLPNN and GEP models were much more accurate than MLR. On the other hand, in most cases, the MLPNN method provided better fit calculation than KNN and GEP. But based on the results of our aforementioned studies [20, 28], the optimized GEP method provides better fit calculation than other approaches. Furthermore, GEP is preferred over ANN models, as ANN is a black-box model, whereas GEP explains the constructed prediction models with mathematical Eqs. [54].

Through the previous years, GEP has been applied extensively in other areas because of its high efficiency and effectiveness. GEP applications are so wide and are rapidly enhancing [55]. GEP is one of the most effective function mining algorithms which has been widely used in classification, pattern recognition, prediction, and other research areas. This algorithm can mine an ideal function to deal with further complex tasks [56]. GEP has been used to determine the quality and stress of water on lakes or rivers as a result of the wastewater pollutants [31]. The problem of missing values in data set due to the measurement conditions can simply be solved by employing GEP [31]. Results based on actual data set confirmed that the multiple GEP and fuzzy expert system outperforms detection methods in medical field by attaining high prediction precision [57].

Our previous studies [20, 24, 28] on pear rootstocks using ML-based modeling showed that there are different responses to the concentrations of macronutrients and PGRs based on genotypes, as we found here in Persian walnut varieties. Regarding the complex interactions, detection of the optimum levels of minerals and PGRs for a certain plant genotype is complicated [58]. Furthermore, the incidence of physiological disorders like Vit and STN throughout the proliferation phase of walnut needs improvement of media for optimal growth of explants. Constructing optimized and effective media by using authentic mathematical modeling and optimization methods have been performed previously on different plant species [17, 20, 24,25,26,27,28, 59,60,61]. Here, we consequently suggested use of ML-based modeling to recognize concentrations of minerals and PGRs that would maximize PR while minimizing CW, STN and Vit [24]. As we found here (Table 3), our previous results on pear [20, 24] showed that ANN prediction models had higher precision than MLR models and MLR could not be a trustworthy method for assessing nonlinear or non-polynomial relationships among variables.

It has been revealed from our recent study on pear rootstocks micropropagation [28] that the most efficient optimization method for optimizing GEP models was multi-objective PSO. Therefore, here, we used multi-objective PSO method for optimization of selected GEP models. Our GEP-PSO optimized models could give us intact optimized formula for proliferation of Chandler and Rayen (Table 3).

The mono-objective GA optimized MLPNN and RBFNN-based models obtained in our previous studies [20, 24] on Pyrodwarf and OHF Pyrus rootstocks showed the significance of some minerals such as NH4+ and NO3 and/or PGRs for explants proliferation. Our previous research [25] on G × N15 Prunus rootstock by using mono-objective GA optimized ANN models found the higher importance of NO3, NH4+, Ca2+, K+, and PO42− towards Mg2+, Cl and SO42− for in vitro proliferation. Our recent study [28] on Pyrus rootstocks using mono-objective GA optimization of GEP models indicated that high PR may cause low quality plantlets. In accordance with it, our study [25] on G × N15 using mono-objective GA optimization of ANN models also predicted that increasing the NH4+ concentration will enhance shoot number and length with higher number of non-healthy shoots but decreasing amount of NH4+ will enhance the plantlets quality. Our results [28] on pear rootstock using RBFNN and GEP modeling procedures also indicated that a lower content of nitrogen will result in higher quality plantlets. NH4+, NO3 and K+ interaction has been the main subject of most in vitro studies [62] but using ML models, [63] reported interaction of K+, EDTA and SO42− with critical effect of K+ on PR of pistachio; as low and high concentrations of K+ resulted in the highest and lowest PR, respectively. Study on Prunus sp. also showed that K+ at low concentration promotes PR [64]. Nezami-Alanagh et al. [63] concluded that either low or too high amounts of K+, EDTA and SO42− ions result in low quality plantlets. Considering macro- and micro-elements, our multi-objective PSO optimized GEP models in Chandler showed that increasing NH4+, NO3 and SO42− increased PR and Vit while decreasing CW and STN. But the results in Rayen showed that increasing SO42− except K2SO4 as well as increasing NO3 except KNO3 increased PR and CW while decreasing STN and Vit (Table 3).

Reed et al. [65] emphasized on the optimization of nitrogen components content of the culture media to stimulate high number of elongated shoots and reduced amount of callus, in different pear species. Nezami-Alanagh et al. [66] suggested avoiding high content of NH+ to reduce callus formation in the in vitro pistachio shoots. Low amounts of some of the MS medium components such as KNO3, MgSO4, KH2PO4, CaCl2, and NH4NO3 have been reported to contribute to STN promotion in some Pyurus species [67]. Whereas based on our results, lower concentrations of K2SO4, MgSO4, MnSO4, CuSO4 in Chandler and K2SO4 and KNO3 in Rayen reduced the occurrence of STN. The results of [63] using neurofuzzy logic showed that low amount of K+ and mid to high concentrations of SO42− inhibit the STN in pistachio explants with lower signs in UCB1than in Ghazvini which refers to the genotypes differences as we found in our study. Ion confounding problem again prevents determining exact relationship between a given mineral and the physiological disorder.

The neurofuzzy logic procedure show a linear positive effect of nicotinic-acid and pyridoxine–HCl on pistachio parameters of shoot multiplication [68], but, to our knowledge, there is no study about the impact of vitamins on the proliferation of walnut. Nezami-Alanagh et al. [66] showed that the glycine and thiamin-HCl affected differently on some in vitro disorders of pistachio. They showed that increasing glycine content highly reduced the development of callus. Our study showed that higher content of vitamins reduced CW in Chandler (Table 3) and reduced vitamins content in Rayen which caused higher CW (Table 3). Rayen was more recalcitrant to micropropagation than Chandler, hence, achieving higher PR and lower incidence of STN and Vit can cover the low increase in CW.

Genotype is an important factor influencing the occurrence of physiological disorders in walnut which is in agreement with reports of [63, 66] in pistachio. Similarly, other researches on pear [20, 24, 28, 67] explained that the in vitro physiological disorders incidence caused by unbalanced mineral nutrition differed among genotypes.

The purpose of our study was to present an ML approach with high accuracy for prediction of optimized culture media. We applied techniques of MLPNN, KNN and GEP combined with PSO to walnut proliferation data sets to achieve the most appropriate proliferation results. Comparison of our results with the previous ones [20, 24,25,26,27,28, 63, 64, 66] indicates that using at least two methods together results in more precise consequences. So that, comparing the results of the used methods showed the effect of media components enhancing or reducing the measured parameters (Table 3). The efficiency of the developed optimized media was compared to DKW. The media constituents proposed by our PSO optimized GEP models related to Chandler showed that decrease in K2SO4, MgSO4, MnSO4, CuSO4 and BAP besides increase in other nutrients, PGRs and vitamins increased PR as well as Vit while reducing CW, and STN. Nevertheless, it was slightly different for Rayen as decrease in K2SO4, KNO3, vitamins and BAP along with increase in remained nutrients and PGRs caused higher PR and CW but lower STN and Vit (Table 3). The use of macro- and micro-nutrients as factors, in many micropropagation studies [20, 24, 28, 63, 68], indicates the ion confounding problem, being problematic to recognize precisely corresponding ion(s) affecting the studied parameter [69]. Our results in comparison to previous studies on walnut [11, 15, 37,38,39] which were about minerals and/or PGRs effects, showed for the first time that not only the effects of minerals depend on the used PGRs concentration but vitamins concentration affects the explant response. The interaction of minerals, PGRs and vitamins could determine the quantity and quality of proliferated plantlets. The plant species and genotype are also highly important in predicting the explant growth response to the minerals, PGRs and vitamins interaction.

Plant PGRs interactions make a critical complication in regulating the processes of plant growth, as well. Cytokinin controls cell proliferation [70] and auxin enhances the sensitivity of apical meristem less mitotically active cells to cytokinin [71]. Cytokinin to auxin ratio is a key signal which controls phenotype [72]. As auxin and cytokinin have roles in DNA replication and cell cycle regulation, respectively [73]. PGRs effects may vary with plant species. Ref. [26] results on Prunus rootstock indicated that applying cytokinin and auxin together will result in higher PR than employing each one alone. According to their results, PGRs concentration and interaction are also important. According to these results and [74] and [75] findings, we used various concentrations of BAP, TDZ and IBA in our experiments. Our adverse results can be attributed to the interaction of genotype and culture medium constituents [76] with PGRs [20]. Type and concentration of cytokinin highly affected in vitro growth and survival of black walnut [39]. Ref. [37] reported that lower concentrations of zeatin was better than BAP for fast shoot elongation of black walnut nodal explants, while higher levels of zeatin and BAP led to shoot necrosis. Using TDZ at 0.01–0.02 mg/l in the medium resulted in an enhanced rate of morphological disorders [37]. But higher levels of TDZ (1.30 and 0.52 mg/l in Chandler and Rayen, respectively) in our present study resulted in reduction of STN in both Chandler and Rayen. Juglans regia was successfully micropropagated using 0.1–2.01 mg/l BAP [4, 8, 12, 77,78,79,80,81]. Our used BAP concentrations (0.67 and 0.99 mg/l for Chandler and Rayen, respectively) are also in this range. There is no result in the literature about the effect of BAP on the incidence of walnut in vitro physiological disorders. But according to the results of in vitro studies on other plant species like pistachio [64, 82, 83], addition of adequate amount of BAP strongly decreases the incidence of STN.

Therefore, in the present study, we evaluated the interaction of cytokinin and auxin PGRs and medium components including nutrients and vitamins on proliferation of walnut to achieve the most efficient protocol with a reasonable range of PGRs. Our analyses using PSO optimized GEP modeling technique showed that this method can be used as an efficient procedure for evaluating the interaction of different factors on walnut explant growth indices in proliferation phase. Therefore, for the first time GEP is introduced as a great tool in optimizing higher quality and efficiency walnut tissue culture protocols in less time.

Callus development during explant proliferation is a common problem in walnut micropropagation which has been reduced here by increasing PR in Chandler while enhanced by increasing PR in Rayen (Table 3). Yegizbayeva et al. [15] reported that callus formation is not correlated with PR in walnut. Callus formation has been attributed to certain concentrations of different mineral nutrients in various plant species like KH2PO4, CaCl2 and MgSO4 in some Prunus cultivars [67], NO3 in germplasms of Robus [84] or MgSO4 in Prunus armeniaca [85]. Akin et al. [86] reported NH4+ and after that genotype and SO42− as significant factors affecting callus formation in hazelnut in vitro proliferation using CHAID analysis. Nezami-Alanagh et al. [63] using neurofuzzy logic predicted that high and low concentrations of Fe2+ and SO42−, respectively, result in the lowest callus formation in pistachio rootstocks explants. They suggested that lower concentration of SO42− in MS reduces shoot tip necrosis and callus development in pistachio in vitro proliferation. While our results showed that lower concentrations of both FeEDDHA and minerals containing SO42− in DKW caused lower CW in both Chandler and Rayen. Bosela et al. [37] showed that the high-salt media i.e. DKW and MS resulted in lower Vit vs. WPM and 1/2X DKW media in walnut.


Walnut micropropagation is a problematic process with lots of in vitro drawbacks including necrosis, callusing and vitrification. The present study demonstrated the efficiency of plant in vitro proliferation predictive models by using advanced ML modeling procedures. Therefore, a regression model i.e. MLR and three advanced ML models including MLPNN, KNN, and GEP were constructed to predict walnut in vitro PR and associated physiological disorders under the effect of culture medium constituents and genotype. According to the results, following conclusions and suggestions are presented:

  • Advanced computational models are the highly precise approaches which can be applied to control and predict walnut explant in vitro performance. They can also be employed as an alternative technique for linear regression and usual statistical analysis methods with noteworthy performance among them. The KNN model has been used for the first time in this study for predicting plant in vitro performance. The optimized models should be applied to predict walnut PR in experimental designs for controlling undesirable physiological disorders.

  • All ML models performed accurately for forecasting PR, CW, STN and Vit. Nevertheless, the accuracy of the GEP models were mostly higher than ANN and KNN models. So, the GEP models were selected to be optimized by PSO technique in order to achieve optimal culture media.

  • Using above-mentioned ML models is extremely useful for reducing time and cost for formulating efficient walnut tissue culture media.

  • The ML-designed media for walnut can not only raise PR (especially about Chandler) but, simultaneously, reduce CW, STN and Vit.

  • Genotype is a very important factor which affects the in vitro performance and based on our results, it seems that Rayen as a not bred genotype is more recalcitrant to in vitro propagation than the bred cultivar Chandler.

  • Other factors such as sucrose along with our studied medium components and their interaction on PR and occurrence of physiological disorders also need to be incorporated into the predicting model to control the PR comprehensively.


MLR, MLPNN, KNN and GEP modeling techniques were applied to make models using various arrangements of minerals, vitamins and PGRs with different concentrations as inputs and different proliferation indices as outputs. The selected models were used to achieve the optimized models using PSO. Two case studies were done using walnut cultivar Chandler and genotype Rayen which have explained details of the used procedures to understand the optimized inputs combinations as follows.

Case studies

In vitro established nodal cultures of Chandler and Rayen were sub-cultured in altered DKW media supplemented with various auxin and cytokinin PGRs concentrations, 30 g/l sucrose and 3 g/l Gelrite. The media were dispersed into jam jars (250 ml) with polyethylene caps after adjusting pH to 5.5. Then, the distributed media were autoclaved for 15 min at 1 kg cm−2 s−1 (121 °C). The cultures were kept under 16-h white fluorescent (80 µmol m2 s−1) light at 25 ± 2 °C for 30 days. Subsequently, parameters comprising PR, CW, STN and Vit were measured. In each experiment set, every treatment included 8 replicates (jam jars) for both Chandler and Rayen.

Taguchi experimental design for optimization of explant proliferation

Taguchi design is a strong and effective tool for the process of optimization that functions constantly and optimally through different conditions. Evaluating numerous factors with limited runs is possible via Taguchi designs i.e. orthogonal arrays. In this design, factors are not weighted more or less in the same experiment and therefore all factors are analyzed independently to each other. Deviation of a product efficient characteristics from their target values is produced by some noise factors such as human errors. Based on orthogonal arrays of Taguchi’s, a standard orthogonal array L27 (35) 27 experiments by 26 of freedom were applied for each of Chandler and Rayen to evaluate the effect of nine factors according to Table 4, on PR, CW, STN, and Vit. For each experiment, three different levels of factor variations were based on various coefficients × DKW basal medium nutrients and different PGRs concentrations (Tables 5). Every nutrient and PGR concentration treatment includes at least 8 replicates. 157 experimental sets (70% of data lines) among 224 sets were randomly chosen for training the modeling methods and the rest 67 sets (30% of data lines) were applied for testing the model’s generalization capacity. In all ML models, k-fold (k = 10) cross validation method [87, 88] was used for training to maintain and grantee the generalizability of constructed models.

Table 4 The components and levels of factors used for walnut micropropagation and measured traits mean values applied to characterize it
Table 5 The components of factors and experimental runs ranges based on DKW medium

Modeling techniques

Multiple linear regression

MLR analysis is a multivariate statistical method to assess the relationship between multiple independent variables and an individual dependent variable. Two important purposes of MLR are prediction and explanation. The MLR prediction comprises the level to which the independent variables can predict the dependent variables. The mentioned description of MLR estimates the coefficients of regression, their sign, magnitude and statistical interface, for each independent variable [89]. Linear regression is considered as the first statistical method in regression and assumed to be an index technique to be used by new methods. As other regression methods, the relationships between a dependent variable and multiple independent variables are modeled by MLR and a linear equation is fitted to the experimental data. MLR technique makes relationship between independent variable k value and the dependent variable M value. The regression equation of n input variables × 1, × 2, …, kn is according to the following:

$$M=\alpha 0+ \alpha 1k1+\dots + \alpha n$$

in which the dependent variable is M, k (× 1, …, kn) denotes a vector of input variables, α0 indicates intercept (a constant), and α is the coefficient of regression vector, each of which is for each expository variable. Y experimental values have various meanings and are supposed with the identical standard deviation ε. The SPSS 19 software package was used for the MLR modeling.

Multilayer perceptron neural network

The neural network is divided into various types based on the transfer functions basis. In the present study, we used multi-layer perceptron (MLPNN) network. The MLPNN model is the most common and widely used type of artificial neural network [90]. This model generally contains an input layer and an output layer. One or more hidden layers can be placed between these two layers. Each neuron in this structure has a number of inputs and a number of outputs. A neuron calculates its output responses based on the weighted sum of all its inputs, performed by a stimulus or transmission function. In the MLPNN model, starting from the input information in the first layer (independent variables), the information flows in only one direction and enters the output layer (dependent variable) by transferring from the hidden layer. The training process of MLPNN model involves adjusting and modifying the weights of the interface between neurons using different network training methods [91]. In this study, Broyden-Fletcher-Goldfarb-Shanno (BFGS) training algorithm has been used. Also, stimulus functions; The tangent hyperbolic (Tanh), sigmoid function (Logs), exponential function (Exp), relu function (Relu) in the hidden layer and linear function (Idn) in the output layer were compared and evaluated and the best function was selected. The number of hidden layers was also determined by trial and error by reaching the minimum error rate. See [91, 92] for more information.

k-nearest neighbors’ algorithm

The k-nearest neighbors (KNN) model is a non-supervised learning machine algorithm for data classification. In this model, each data represents a coordinate position in a vector-space model that the information of each particular section must have similar properties as well as be close to each other. In the KNN algorithm, determining the number of neighbors (k) as well as the method based on which the distance between them is calculated is of particular importance. If k is considered too small, then neighboring points that do not appear in the classification will reduce the accuracy of the results. On the other hand, if k is considered too large, the results of the same classifications may be merged as the computational volume increases [93]. The nearest neighbor was evaluated and selected from different values ​​to find the best value of k and to achieve the highest model accuracy. Distances between neighboring points were determined using various geometric methods. In this study, the methods of Euclidean Distance, Chebyshev Distance, Manhattan Distance and Minkowski Distance were studied and the best method was selected.

Gene expression programming

GP is a modeling approach used to model the structural engineering complications behavior. It is an extension of genetic algorithm that utilizes a program space for searching, rather than using a data space. An important benefit of applying GP-based techniques toward other methods is their capability to produce equations of prediction without using any hypothesis for previous relationship form. Many researchers have applied GP and GP-based methods to find any complicated relationships fitting different experimental data [44, 94, 95]. GEP has been introduced as an effective substitute method to the conventional GP [31, 46]. GEP have established many computer programs, by getting encoded in linear chromosomes with constant length, each of which included several encoding genes [31, 96]. GEP is originated of evolutionary algorithms such as GA and GP. In this technique, an individual population is applied and afterwards, fitness function and genetic variations are used to select better individuals. The genetic variations are presented by genetic operators. GEP is a learning machine which is assumed to learn the variables relationship in datasets. The individual programming technique is different in GEP and its predecessors GP and GA since GEP programs individuals as linear strings (chromosomes) with fixed length which are finally displayed by expression trees as unsophisticated diagram. While, GP and GA express individuals in the form of linear strings (chromosomes) with fixed length and nonlinear entities of diverse forms (parse trees) and dimensions, respectively. One of the strongpoints of GEP towards GP and GA is that genetic operators run very easily at the level of chromosome in GEP producing genetic diversity creation. Another strength of GEP is its exclusive, multi-genic nature letting more complicated programs with numerous subprograms to be developed. Both GP and GA advantages are collected in GEP, whereas some of their constraints are met [57, 97, 98].

The real GEP chromosome phenotype is the illustration in Fig. 10 and the genotype would be simply described of the phenotype as represented in Eq. (1)

Fig. 10
figure 10

Diagram of gene expression programming as a prediction model


Functional steps of the GEP are represented in Fig. 10 [31]. According to this diagram, the GEP start point is a population of chromosomes. After that, the chromosomes genes are expressed, and each individual fitness is analyzed. Then, the individuals are defined according to their fitness to reproduce with alteration. The same development process is run on the new individuals’ generation. Overall, this technique is replicated for a particular number of generations or it is performed until reaching a termination condition. Roulette wheel sampling with elitism is employed by GEP system to ensure that the top individuals, according to the fitness, are remained and copied to the next generation. Once genetic operator(s) are performed on chosen chromosomes, comprising mutation, cross over and rotation, diversity is developed into the population.

The GeneXpro software package was applied to perform the GEP models. The parameters employed in the GEP models are represented in Table 6.

Table 6 Parameters of training GEP model

In this study, the selected functions and mathematical operators are rational and not definite so that the plant modeling designer is free to select such functions according to the studied problem anatomy. The functions and operators were selected with a viewpoint of invocating simpleness of the advanced model assuring quicker convergence. The size of the population (chromosomes number) adjusts the programs number into the population. The larger the population, the longer it takes for an iteration run. High chromosomes number were tried to realize minimum error models. The program running continued to reach no significant rectification in the models’ performance. Here, we aimed to achieve obvious relationship between decision variables and response variables. GEP clear formulations were obtained for PR, CW, STN and Vit as a function of experimental parameters including Y1, Y2, Y3 and Y4 = f (X1, X2, X3, X4, X5, X6, X7, X8 and X9) (Table 1).

Input data were normalized in the range of 0 and 1 according to the Eq. 2:


where Xn is normalized dimensionless data, Xi is observed data, Xmin is the minimum amount of observed data, and Xmax is its maximum value.

Comparison of the performance of developed models

To evaluate the precision of created models, we used different statistical indices including coefficient of determination (R2), root mean square error (RMSE) and mean absolute error (MAE) based on Eqs. 5, 6 and 7:

$${R}^{2}=\frac{{\left({\sum }_{i=1}^{N}\left({y}_{i}-\overline{\widehat{y} }\right)\left(\widehat{{y}_{i}}-\overline{\widehat{y} }\right)\right)}^{2}}{{\sum }_{i=1}^{N}{\left({y}_{i}-\overline{y }\right)}^{2}{\sum }_{i=1}^{N}{\left(\widehat{{y}_{i}}-\overline{\widehat{y} }\right)}^{2}}$$
$$RMSE=\sqrt{\frac{1}{N}{\sum }_{i=1}^{N}{\left({y}_{i}-\widehat{y}\right)}^{2}}$$
$$MAE=\frac{1}{N}{\sum }_{i=1}^{N}\left|{y}_{i}-\widehat{y}\right|$$

where \(y\) and \(\overline{y }\) are observed values and their mean and \(\widehat{y}\) and \(\overline{\widehat{y} }\) are predicted values and their mean, respectively, as well for N samples. Analyses of parameters were performed together to achieve an accurate medium composition.

In addition, the predicted values by the developed models were plotted against the corresponding observed values to evaluate the ability of models for prediction.

Particle swarm optimization of GEP models

PSO is a method of evolutionary calculation and swarm intelligence algorithm according to population to solve the pervasive problem of optimization that was developed by [99]. It is a method of mathematical computation that starts with the swarm (a population of grain) and mostly based on social models, such as the swarm theory, fish schooling and bird flocking [20]. PSO key factors are with behavior of swarm i.e. keeping optimum distances between different members and their neighbors. To optimize each particle location, their position is modified as arranged for the objective function within the search area. Thus, PSO key factor is a particle velocity which is compared to the previous one in each repetition to lead the particle to its optimal position. The best solution (fitness) every particle in a swarm achieves so far in each repetition, named pbest. Extra “best” value that a particle is attained in the population up to now followed by the particle swarm optimizer which is global best, named gbest. Each particle velocity in a swarm is estimated by Eqs. 3 and 4 [99].


in which, Vi+1 is each particle new velocity based on prior velocity (Vi), w is inertial coefficient (0.8–1.2), c1 and c2 are cognitive and social coefficients, respectively (0–2), r1 and r2 are random values for each velocity update (0–1) and Xi+1 is new location for each particle according to the prior location (Xi).

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.



Plant growth regulators


Shoot tip necrosis




Machine learning


Multi-layer perceptron neural network


k-Nearest neighbors


Gene expression programming


Multiple linear regression


Particle swarm optimization


Driver and Kuniyuki walnut


Artificial neural network


Genetic algorithm


Radial basis function neural network


Genetic programming


Multi-objective evolutionary optimization algorithm


Multi-objective optimization problems


Proliferation rate


Callus weight


Murashige and Skoog


  1. McGranahan G, Leslie C. Walnut. In: Fruit breeding. Springer: Boston; 2012. p. 827–846.

  2. Skender A, Kurtovic M, Drkenda P, Becirspahic D, Ebrahimi A. Phenotypic variability of autochthonous walnut (Juglansregia L) genotypes in northwestern Bosnia and Herzegovina. Turk J Agric For. 2020;44(5):517–25.

    Article  Google Scholar 

  3. Driver JA, Kuniyuki AH. In vitro propagation of Paradox walnut rootstock. HortScience. 1984;19(4):507–9.

    Google Scholar 

  4. McGranahan GH, Driver JA, Tulecke W. Tissue culture of Juglans. In: Cell and tissue culture in forestry; 1987. Springer, Dordrecht. p. 261–271.

  5. Cornu D, Jay-Allemand C. Micropropagation of hybrid walnut trees (Juglans nigra x Juglans regia) through culture and multiplication of embryos. In: Annales des sciences forestières 1989 (Vol. 46, No. Supplement, pp. 113s-116s). EDP Sciences.

  6. Pijut PM. Micropropagation of Juglans cinerea L. (butternut). In: High-Tech and Micropropagation V 1997; Springer, Berlin. p. 345–357.

  7. Bourrain L, Navatel JC. Plant production of walnut Juglans regia L. by in vitro multiplication. In: IV International Walnut Symposium 544 1999. p. 465–471.

  8. Vahdati K, Jariteh M, Niknam V, Mirmasoumi M, Ebrahimzadeh H. Somatic embryogenesis and embryo maturation in Persian walnut. InV International Walnut Symposium. 2004;705:199–205.

    Google Scholar 

  9. Vahdati K, Najafian Ashrafi E, Ebrahimzadeh H, Mirmasoumi M. Improved micropropagation of walnut (Juglans regia L.) on media optimized for growth based upon mineral content of walnut seed. In: I International Symposium on Biotechnology of Fruit Species: BIOTECHFRUIT2008 839 200. p. 117–124.

  10. Gotea R, Gotea I, Sestras RE, Vahdati K. In vitro Propagation of Several Walnut Cultivars. In: Bulletin of the University of Agricultural Sciences & Veterinary Medicine Cluj-Napoca. Horticulture. 2012. p. 69.

  11. Licea-Moreno RJ, Contreras A, Morales AV, Urban I, Daquinta M, Gomez L. Improved walnut mass micropropagation through the combined use of phloroglucinol and FeEDDHA. Plant Cell Tissue Organ Culture. 2015;123(1):143–54.

    CAS  Article  Google Scholar 

  12. Gruselle R, Boxus P. Walnut micropropagation. Int Symp Walnut Production. 1989;284:45–52.

    Google Scholar 

  13. Lone IA, Misger FA, Banday FA. Effect of different growth regulator combinations on the per cent media browning in walnut in vitro studies using MS medium. Asian J Soil Sci. 2017;12(1):135–42.

    Article  Google Scholar 

  14. Sadat-Hosseini M, Vahdati K, Leslie CA. Germination of Persian walnut somatic embryos and evaluation of their genetic stability by ISSR fingerprinting and flow cytometry. HortScience. 2019;54(9):1576–80.

    CAS  Article  Google Scholar 

  15. Yegizbayeva TK, García-García S, Yausheva TV, Kairova M, Apushev AK, Oleichenko SN, Licea-Moreno RJ. Unraveling Factors Affecting Micropropagation of Four Persian Walnut Varieties. Agronomy. 2021;11(7):1417.

    CAS  Article  Google Scholar 

  16. Gago J, Martínez-Núñez L, Landín M, Gallego PP. Artificial neural networks as an alternative to the traditional statistical methodology in plant research. J Plant Physiol. 2010;167(1):23–7.

  17. Gallego PP, Gago J, Landín M. Artificial neural networks technology to model and predict plant biology process. In: Suzuki K. Artificial Neural Networks-Methodological and Biomedical Applications (Croatia: Intech Open Access Publisher). 2011. p. 197–216.

  18. Barone JO. Use of multiple regression analysis and artificial neural networks to model the effect of nitrogen in the organogenesis of Pinus taeda L. Plant Cell Tissue Organ Culture. 2019;137(3):455–64.

    Article  Google Scholar 

  19. Hesami M, Naderi R, Tohidfar M. Modeling and optimizing medium composition for shoot regeneration of Chrysanthemum via radial basis function-non-dominated sorting genetic algorithm-II (RBF-NSGAII). Sci Rep. 2019;9(1):1–1.

    Article  CAS  Google Scholar 

  20. Jamshidi S, Yadollahi A, Arab MM, Soltani M, Eftekhari M, Sabzalipoor H, Sheikhi A, Shiri J. Combining gene expression programming and genetic algorithm as a powerful hybrid modeling approach for pear rootstocks tissue culture media formulation. Plant Methods. 2019;15(1):1–8.

    Article  Google Scholar 

  21. García-Pérez P, Lozano-Milo E, Landin M, Gallego PP. Machine Learning unmasked nutritional imbalances on the medicinal plant Bryophyllum sp. cultured in vitro. Front Plant Sci. 2020;11:1887.

  22. Salehi M, Farhadi S, Moieni A, Safaie N, Ahmadi H. Mathematical modeling of growth and paclitaxel biosynthesis in Corylus avellana cell culture responding to fungal elicitors using multilayer perceptron-genetic algorithm. Front Plant Sci. 2020;11:114.

    Article  Google Scholar 

  23. Hesami M, Naderi R, Tohidfar M. Introducing a hybrid artificial intelligence method for high-throughput modeling and optimizing plant tissue culture processes: The establishment of a new embryogenesis medium for chrysanthemum, as a case study. Appl Microbiol Biotechnol. 2020;104(23):10249–63.

    CAS  PubMed  Article  Google Scholar 

  24. Jamshidi S, Yadollahi A, Ahmadi H, Arab MM, Eftekhari M. Predicting in vitro culture medium macro-nutrients composition for pear rootstocks using regression analysis and neural network models. Front Plant Sci. 2016;29(7):274.

    Google Scholar 

  25. Arab MM, Yadollahi A, Shojaeiyan A, Ahmadi H. Artificial neural network genetic algorithm as powerful tool to predict and optimize in vitro proliferation mineral medium for G× N15 rootstock. Front Plant Sci. 2016;19(7):1526.

    Google Scholar 

  26. Arab MM, Yadollahi A, Ahmadi H, Eftekhari M, Maleki M. Mathematical modeling and optimizing of in vitro hormonal combination for G× N15 vegetative rootstock proliferation using Artificial Neural Network-Genetic Algorithm (ANN-GA). Front Plant Sci. 2017;1(8):1853.

    Article  Google Scholar 

  27. Arab MM, Yadollahi A, Eftekhari M, Ahmadi H, Akbari M, Khorami SS. Modeling and optimizing a new culture medium for in vitro rooting of G× N15 Prunus rootstock using artificial neural network-genetic algorithm. Sci Rep. 2018;8(1):1–8.

    CAS  Article  Google Scholar 

  28. Jamshidi S, Yadollahi A, Arab MM, Soltani M, Eftekhari M, Shiri J. High throughput mathematical modeling and multi-objective evolutionary algorithms for plant tissue culture media formulation: Case study of pear rootstocks. PLoS ONE. 2020;15(12):e0243940.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  29. Kosiński W, Prokopowicz P, Ślȩzak D. Fuzzy reals with algebraic operations: Algorithmic approach. InIntelligent Information Systems 2002. Heidelberg: Physica; 2002. p. 311–20.

    Google Scholar 

  30. Albiol J, Campmajó C, Casas C, Poch M. Biomass estimation in plant cell cultures: a neural network approach. Biotechnol Prog. 1995;11(1):88–92.

    CAS  Article  Google Scholar 

  31. Ferreira C. Gene expression programming: a new adaptive algorithm for solving problems. arXiv preprint cs/0102027. 2001.

  32. Chu Y, Urquhart B, Gohari SM, Pedro HT, Kleissl J, Coimbra CF. Short-term reforecasting of power output from a 48 MWe solar PV plant. Sol Energy. 2015;1(112):68–77.

    Article  Google Scholar 

  33. Tongal H. Nonlinear forecasting of stream flows using a chaotic approach and artificial neural networks. Earth Sci Res J. 2013;17(2):119–26.

    Google Scholar 

  34. Antonio LM, Coello CA. Use of cooperative coevolution for solving large scale multiobjective optimization problems. In: 2013 IEEE Congress on Evolutionary Computation 2013. p. 2758–2765.

  35. Durillo JJ, Nebro AJ, Coello CA, García-Nieto J, Luna F, Alba E. A study of multiobjective metaheuristics when solving parameter scalable problems. IEEE Trans Evol Comput. 2010;14(4):618–35.

    Article  Google Scholar 

  36. Murashige T, Skoog F. A revised medium for rapid growth and bio assays with tobacco tissue cultures. Physiol Plant. 1962;15(3):473–97.

    CAS  Article  Google Scholar 

  37. Bosela MJ, Michler CH. Media effects on black walnut (Juglans nigra L) shoot culture growth in vitro: evaluation of multiple nutrient formulations and cytokinin types. Vitro Cell Develop Biol-Plant. 2008;44(4):316–29.

    CAS  Article  Google Scholar 

  38. Kepenek KA, Kolağasi Z. Micropropagation of Walnut (Juglans regia L). Acta Phys Pol, A. 2016;130:1.

    Google Scholar 

  39. Stevens ME, Pijut PM. Rapid in vitro shoot multiplication of the recalcitrant species Juglans nigra L. In Vitro Cell Develop Biol-Plant. 2018;54(3):309–17.

    CAS  Article  Google Scholar 

  40. Hassankhah A, Vahdati K, Lotfi M, Mirmasoumi M, Preece J, Assareh MH. Effects of ventilation and sucrose concentrations on the growth and plantlet anatomy of micropropagated Persian walnut plants. Int J Horticult Sci Technol. 2014;1(2):111–20.

    CAS  Google Scholar 

  41. Ashrafi EN, Vahdati K, Ebrahimzadeh H, Mirmasoumi M. Analysis of in-vitro explants mineral contents to modify medium mineral composition for enhancing growth of Persian walnut (Juglans regia L). J Food Agric Environ. 2010;8:325–9.

    CAS  Google Scholar 

  42. Eftekhari M, Yadollahi A, Ahmadi H, Shojaeiyan A, Ayyari M. Development of an artificial neural network as a tool for predicting the targeted phenolic profile of grapevine (Vitis vinifera) foliar wastes. Front Plant Sci. 2018;19(9):837.

    Article  Google Scholar 

  43. Sheikhi A, Mirdehghan SH, Arab MM, Eftekhari M, Ahmadi H, Jamshidi S, Gheysarbigi S. Novel organic-based postharvest sanitizer formulation using Box Behnken design and mathematical modeling approach: A case study of fresh pistachio storage under modified atmosphere packaging. Postharvest Biol Technol. 2020;160:111047.

    CAS  Article  Google Scholar 

  44. Gandomi AH, Alavi AH. A new multi-gene genetic programming approach to nonlinear system modeling Part I: materials and structural engineering problems. Neural Computing Appl. 2012;21(1):171–87.

    Article  Google Scholar 

  45. Gandomi AH, Alavi AH, Arjmandi P, Aghaeifar A, Seyednour R. Genetic programming and orthogonal least squares: a hybrid approach to modeling the compressive strength of CFRP-confined concrete cylinders. J Mech Mater Struct. 2010;5(5):735–53.

    Article  Google Scholar 

  46. Gandomi AH, Roke DA. Assessment of artificial neural network and genetic programming as predictive tools. Adv Eng Softw. 2015;1(88):63–72.

    Article  Google Scholar 

  47. De Giorgi MG, Congedo PM, Malvoni M, Laforgia D. Error analysis of hybrid photovoltaic power forecasting models: a case study of mediterranean climate. Energy Convers Manage. 2015;1(100):117–30.

    Article  Google Scholar 

  48. Kannangara M, Dua R, Ahmadi L, Bensebaa F. Modeling and prediction of regional municipal solid waste generation and diversion in Canada using machine learning approaches. Waste Manage. 2018;1(74):3–15.

    Article  Google Scholar 

  49. Bagheri M, Bazvand A, Ehteshami M. Application of artificial intelligence for the management of landfill leachate penetration into groundwater, and assessment of its environmental impacts. J Clean Prod. 2017;15(149):784–96.

    Article  CAS  Google Scholar 

  50. Quiros AR, Bedruz RA, Uy AC, Abad A, Bandala A, Dadios EP, Fernando A. A kNN-based approach for the machine vision of character recognition of license plate numbers. InTENCON 2017–2017 IEEE Region 10 Conference 2017. p. 1081–1086.

  51. Mahmoudzadeh H, Matinfar HR, Taghizadeh-Mehrjardi R, Kerry R. Spatial prediction of soil organic carbon using machine learning techniques in western Iran. Geoderma Reg. 2020;21:e00260.

    Article  Google Scholar 

  52. Dikmen E. Gene expression programming strategy for estimation performance of LiBr–H 2 O absorption cooling system. Neural Comput Appl. 2015;26(2):409–15.

    Article  Google Scholar 

  53. Nazari A. RETRACTED ARTICLE: Application of gene expression programming to predict the compressive damage of lightweight aluminosilicate geopolymer. Neural Comput Appl. 2019;31(2):767–76.

    Article  Google Scholar 

  54. Esha R, Imteaz MA. Pioneer use of gene expression programming for predicting seasonal streamflow in Australia using large scale climate drivers. Ecohydrology. 2020;13(8):e2242.

    Article  Google Scholar 

  55. Zhong J, Feng L, Ong YS. Gene expression programming: A survey. IEEE Comput Intell Mag. 2017;12(3):54–72.

    Article  Google Scholar 

  56. Xu L, Huang Y, Shen X, Liu Y. Parallelizing gene expression programming algorithm in enabling large-scale classification. Sci Program. 2017;20:2017.

    Google Scholar 

  57. Ferreira C. Gene expression programming: mathematical modeling by an artificial intelligence. New York: Springer; 2006.

    Book  Google Scholar 

  58. Nas MN, Read PE. A hypothesis for the development of a defined tissue culture medium of higher plants and micropropagation of hazelnuts. Sci Hortic. 2004;101(1–2):189–200.

    CAS  Article  Google Scholar 

  59. Gago J, Landín M, Gallego PP. A neurofuzzy logic approach for modeling plant processes: A practical case of in vitro direct rooting and acclimatization of Vitis vinifera L. Plant Sci. 2010;179(3):241–9.

  60. Gago J, Landín M, Gallego PP. Artificial neural networks modeling the in vitro rhizogenesis and acclimatization of Vitis vinifera L. J Plant Physiol. 2010;167(15):1226–31.

  61. Gago J, Pérez-Tornero O, Landín M, Burgos L, Gallego PP. Improving knowledge of plant tissue culture and media formulation by neurofuzzy logic: a practical case of data mining using apricot databases. J Plant Physiol. 2011;168(15):1858–65.

  62. Niedz RP, Hyndman SE, Evens TJ, Weathersbee AA. Mineral nutrition and in vitro growth of Gerbera hybrida (Asteraceae). In Vitro Cell Develop Biol-Plant. 2014;50(4):458–70.

    CAS  Article  Google Scholar 

  63. Nezami-Alanagh E, Garoosi GA, Landín M, Gallego PP. Combining DOE with neurofuzzy logic for healthy mineral nutrition of pistachio rootstocks in vitro culture. Front Plant Sci. 2018;9:1474.

  64. Nezami-Alanagh E, Garoosi GA, Haddad R, Maleki S, Landín M, Gallego PP. Design of tissue culture media for efficient Prunus rootstock micropropagation using artificial intelligence models. Plant Cell Tissue Organ Culture (PCTOC). 2014;117(3):349–59.

  65. Reed M, Sugae W, Randall P. Niedz & Barbara. In Vitro Cell Dev Biol-Plant. 2015;51:19–27.

    Google Scholar 

  66. Nezami-Alanagh E, Garoosi GA, Landín M, Gallego PP. Computer-based tools provide new insight into the key factors that cause physiological disorders of pistachio rootstocks cultured in vitro. Sci Reports. 2019;9(1):1–5.

  67. Plant IV. Barbara M. Reed, Sugae Wada, Jeanine DeNoma & Randall P. Niedz.

  68. Nezami-Alanagh E, Garoosi GA, Maleki S, Landín M, Gallego PP. Predicting optimal in vitro culture medium for Pistacia vera micropropagation using neural networks models. Plant Cell Tissue Organ Culture 2017;129(1):19–33.

  69. Niedz RP, Evens TJ. A solution to the problem of ion confounding in experimental biology. Nat Methods. 2006;3(6):417.

    CAS  PubMed  Article  Google Scholar 

  70. Vanstraelen M, Benková E. Hormonal interactions in the regulation of plant development. Annu Rev Cell Dev Biol. 2012;10(28):463–87.

    Article  CAS  Google Scholar 

  71. Schaller GE, Bishopp A, Kieber JJ. The yin-yang of hormones: cytokinin and auxin interactions in plant development. Plant Cell. 2015;27(1):44–63.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  72. George EF, Hall MA, De Klerk GJ. Plant growth regulators I: Introduction; auxins, their analogues and inhibitors. In: Plant propagation by tissue culture 2008 (pp. 175–204). Springer, Dordrecht.

  73. Pasternak T, Miskolczi P, Ayaydin F, Mészáros T, Dudits D, Fehér A. Exogenous auxin and cytokinin dependent activation of CDKs and cell division in leaf protoplast-derived cells of alfalfa. Plant Growth Regul. 2000;32(2):129–41.

    CAS  Article  Google Scholar 

  74. Hepaksoy S. Investigations on micropropagation of some cherry rootstocks I shoot development and proliferation. Ziraat Fakultesi Dergisi. 2004;41(3):11.

    CAS  Google Scholar 

  75. Ružić D, Vujović T, Nikolić D, Cerović R. In vitro growth responses of the ‘Pyrodwarf’pear rootstock to cytokinin types. Romanian Biotechnological Letters. 2011;16(5):6631.

    Google Scholar 

  76. Molassiotis AN, Dimassi K, Therios I, Diamantidis G. Fe-EDDHA promotes rooting of rootstock GF-677 (Prunus amygdalus× P persica) explants in vitro. Biol Plant. 2003;47(1):141–4.

    CAS  Article  Google Scholar 

  77. Chalupa V. Clonal propagation of broad-leaved forest trees in vitro. Communicationes Instituti Forestalis Cechosloveniae. 1982;12:9.

    Google Scholar 

  78. Revilla MA, Majada J, Rodriguez R. Walnut (Juglans regia L.) micropropagation. InAnnales des Sciences Forestières 1989 (Vol. 46, No. Supplement, pp. 149s-151s). EDP Sciences.

  79. Jay-Allemand C, Capelli P, Cornu D. Root development of in vitro hybrid walnut microcuttings in a vermiculite-containing gelrite medium. Sci Hortic. 1992;51(3–4):335–42.

    Article  Google Scholar 

  80. Heloir MC, Kevers C, Hausman JF, Gaspar T. Changes in the concentrations of auxins and polyamines during rooting of in-vitro-propagated walnut shoots. Tree Physiol. 1996;16(5):515–9.

    CAS  PubMed  Article  Google Scholar 

  81. Saadat YA, Hennerty MJ. The effects of different in vitro and ex vitro treatments on the rooting performance of Persian walnut (Juglans regia L) microshoots. In: IV International Walnut Symposium 544 1999. p. 473–480.

  82. Yang Z, Lüdders P. In vitro propagation of Pistachio (Pistacia vera L). Gartenbauwissenschaft. 1994;59(1):30–4.

    Google Scholar 

  83. Onay A. In Vitro organogenesis and embryogenesis of pistachio, Pistacia vera L (Doctoral dissertation, University of Edinburgh).

  84. Poothong S, Reed BM. Optimizing shoot culture media for Rubus germplasm: the effects of NH 4+, NO 3−, and total nitrogen. In Vitro Cell Devel Biol-Plant. 2016;52(3):265–75.

    CAS  Article  Google Scholar 

  85. Kovalchuk IY, Mukhitdinova Z, Turdiyev T, Madiyeva G, Akin M, Eyduran E, Reed BM. Modeling some mineral nutrient requirements for micropropagated wild apricot shoot cultures. Plant Cell Tissue Organ Culture. 2017;129(2):325–35.

    CAS  Article  Google Scholar 

  86. Akin M, Eyduran E, Niedz RP, Reed BM. Developing hazelnut tissue culture medium free of ion confounding. Plant Cell Tissue Organ Culture. 2017;130(3):483–94.

    CAS  Article  Google Scholar 

  87. Fushiki T. Estimation of prediction error by using K-fold cross-validation. Stat Comput. 2011;21(2):137–46.

    Article  Google Scholar 

  88. Shiri J, Marti P, Karimi S, Landeras G. Data splitting strategies for improving data driven models for reference evapotranspiration estimation among similar stations. Comput Electron Agric. 2019;1(162):70–81.

    Article  Google Scholar 

  89. Hair Jr JF, Black WC, Babin BJ, Anderson RE. Multivariate Data Analysis A Global Perspective. New Jersey: Person Education.

  90. Moghaddamnia A, Gousheh MG, Piri J, Amin S, Han D. Evaporation estimation using artificial neural networks and adaptive neuro-fuzzy inference system techniques. Adv Water Resour. 2009;32(1):88–97.

    CAS  Article  Google Scholar 

  91. Hassanpour Kashani M, Montaseri M, Lotfollahi Yaghin MA. Flood estimation at ungauged sites using a new hybrid model. J Appl Sci. 2008;8(9):1744–9.

    Article  Google Scholar 

  92. ASCE Task Committee on Application of Artificial Neural Networks in Hydrology. Artificial neural networks in hydrology. II: Hydrologic applications. J Hydrol Eng. 2000;5(2):124–37.

  93. Wei CC. Predictions of surface solar radiation on tilted solar panels using machine learning models: A case study of Tainan city. Taiwan Energies. 2017t;10(10):1660.

    Article  Google Scholar 

  94. Dey P, Das AK. A utilization of GEP (gene expression programming) metamodel and PSO (particle swarm optimization) tool to predict and optimize the forced convection around a cylinder. Energy. 2016;15(95):447–58.

    Article  Google Scholar 

  95. Gholampour A, Gandomi AH, Ozbakkaloglu T. New formulations for mechanical properties of recycled aggregate concrete using gene expression programming. Constr Build Mater. 2017;15(130):122–45.

    Article  Google Scholar 

  96. Mitchell M. An introduction to genetic algorithms. New York: MIT press; 1998.

    Book  Google Scholar 

  97. Ferreira C. Gene expression programming in problem solving. In: Soft computing and industry; 2002. Springer, London. pp. 635–653

  98. Ferreira SC, Bruns RE, Ferreira HS, Matos GD, David JM, Brandão GC, da Silva EP, Portugal LA, Dos Reis PS, Souza AS, Dos Santos WN. Box-Behnken design: an alternative for the optimization of analytical methods. Anal Chim Acta. 2007;597(2):179–86.

    CAS  PubMed  Article  Google Scholar 

  99. Kennedy J, Eberhart R. Particle swarm optimization. In: Proceedings of ICNN'95-international conference on neural networks 1995; Vol. 4, p. 1942–1948.

Download references


We also would like to thank University of Jiroft, University of Tehran, Tarbiat Modares University (TMU), Iran National Science Foundation (INSF), Iran’s National Elite Foundation (INEF) and Center of Excellence for Walnut Improvement and Technology of Iran, for their supports.


No funding was received.

Author information

Authors and Affiliations



MSH: performing the experiments, gathering data, interpreting data and revising the manuscript. MMA: designing the experiments, interpreting data, summing up and revising the manuscript. MS: statistical analyzing. ME: interpreting experiments, writing and revising the manuscript. AS: performing experiments. KV: revising manuscript. All authors listed, had direct, considerable and intellectual contribution to the research. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Mohammad Sadat-Hosseini.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Sadat-Hosseini, M., Arab, M.M., Soltani, M. et al. Predictive modeling of Persian walnut (Juglans regia L.) in vitro proliferation media using machine learning approaches: a comparative study of ANN, KNN and GEP models. Plant Methods 18, 48 (2022).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI:


  • Walnut in vitro propagation
  • Artificial neural network
  • Gene expression programming
  • k-nearest neighbors
  • Prediction model