Bayesian optimization for seed germination

Background
Efficient seed germination is a crucial task at the beginning of crop cultivation. Although the boundaries of the environmental parameters that should be maintained are well studied, fine-tuning can significantly improve efficiency; such tuning is infeasible to perform manually due to the high dimensionality of the parameter space.

Results
Traditionally, seed germination is performed in climatic chambers with controlled environmental conditions. In this study, we perform a set of multiple-day seed germination experiments in a controllable environment. We use up to three climatic chambers to adjust humidity, temperature, and water supply, and apply a machine learning algorithm called Bayesian optimization (BO) to find the parameters that improve seed germination. Experimental results show that our approach increases germination efficiency for different types of seeds compared to the initial expert-knowledge-based guess.

Conclusion
Our experiments demonstrated that BO can help to identify the values of the controllable parameters that increase seed germination efficiency. The proposed methodology is model-free, and we argue that it may be useful for a variety of optimization problems in precision agriculture. Further experimental studies are required to investigate the effectiveness of our approach for different seed cultures and controlled parameters.

Electronic supplementary material: The online version of this article (10.1186/s13007-019-0422-z) contains supplementary material, which is available to authorized users.


Introduction
Seed germination has been an interesting subject of study for many years. On the one hand, it is a topic of basic research, since many biochemical processes occur during dormancy and the different stages of seed germination. On the other hand, the problem is also of great practical importance: finding the optimal parameters, such as substrate material, amount of water supply, air temperature, proportion of plant growth promoters, etc., is a challenging task. Seed germination comprises many processes, and the relationships between the factors affecting the termination of seed dormancy are very diverse. For example, the aforementioned water and temperature, combined with light and nitrate level, influence seed germination; however, their effect depends on the level of dormancy of the seeds [1].
The problem becomes even more challenging when multiple parameters must be considered together, and specific sets of parameters are supposed to be optimized for each time step. Dynamic models of seed germination have been developed [1][2][3] to address this issue. These models may be helpful in understanding the underlying processes of seed germination. However, to achieve satisfactory optimization results using model-based techniques, comprehensive prior knowledge of the problem structure is required [4]. Moreover, particular dynamic models may not be appropriate for the specific conditions that these models were not developed for, e.g., different plant species, substrates or growth stimulators.
A more adaptive approach, based on machine learning (ML) methods, seems promising for tackling this issue. Among these methods, the Bayesian optimization (BO) [5, 6] algorithm based on Gaussian process regression (GPR) is one of the most attractive. It is a black-box optimization algorithm that does not require knowledge of the system intrinsics. It is widely used in the ML community for hyperparameter optimization and has even been successfully applied in the culinary arts [7]. Similarly, an approach based on genetic algorithms and GPR has previously been proposed for precision agriculture [8].

Plant Methods. *Correspondence: artem.nikitin@skolkovotech.ru. 1 CDISE, Skolkovo Institute of Science and Technology, Nobelya 3, Moscow, Russia 121205. Full list of author information is available at the end of the article.

In this paper, we apply BO to a simplified seed germination process in a controllable environment in order to identify the values of the controlled parameters that yield the best germination efficiency. First, we select a number of tunable parameters that we can control during the germination period (several days) with the help of climatic chambers, e.g., humidity, temperature, and the amount of water supplied, and choose reasonable bounds for these parameters based on expert knowledge. Then, we iteratively apply the BO algorithm to find the values of the parameters that maximize the number of germinated seeds. We show that, starting with an initial expert-knowledge-based guess, our approach finds parameter values that yield a solid improvement both when the initial germination efficiency is low (first experiment) and when it is high (second experiment).

Materials and methods
In this section, we describe the methodology and the algorithms used to build our framework. Figure 1 shows a schematic overview of the proposed system.

Seed germination
We conducted two experiments: the first using pea seeds (Pisum sativum L.) and the second using radish seeds (Raphanus sativus L.) in different settings. Seeds were purchased from the Federal Scientific Center of Vegetable (Odintsovo, Russia). The average weight of 100 seeds was 0.751 ± 0.01 g for radish and 19.95 ± 1.31 g for pea. All seeds were presterilized in a 0.5% KMnO4 solution for 10 min and then rinsed several times with deionized water. Three climatic chambers (Binder KBWF 240, KBF 240, KMF 240) allowed us to control air temperature (±0.1 °C) and humidity (±1%), which was maintained at 80%. No light sources were used in the chambers during the experiments.
The first experiment was conducted in the form of sequential trials, with each trial comprising three concurrent germination processes and lasting for 72 h (3 days in total). One hundred pea seeds were placed on a dish covered with sterile cheesecloth and put in each of the three climatic chambers to germinate. In total, 7 controllable parameters were selected: the air temperature at the 0, 24, 48, and 72 h time steps and the amount of water supplied at the 0, 24, and 48 h time steps. The temperature in the chambers was changed smoothly between the selected values during the trials.
During the second experiment, only two climatic chambers were used (KBF 240, KMF 240) to set 4 controllable parameters, namely temperatures at 0, 12, 24, 36 h. Seeds were placed in containers of size 21 × 15.5 × 0.8 cm with two sections (each accommodating 16 seeds) on the cloth and watered once at the beginning of a trial with a fixed amount of 6 ml. Figure 2 depicts a single container at the beginning (left) and the end (right) of a trial.
These containers were then grouped in threes, giving 96 seeds per group. Three such groups were then placed almost vertically in each of the two climatic chambers with the same set of controllable parameters, giving, for each trial, 6 repetitions with 96 seeds in each. Figure 3 shows how the containers with seeds were installed in the chambers during the second experiment.
After germination, the numbers of germinated and well-germinated seeds were counted in each chamber. In the first experiment, we considered a seed germinated when only the radicle had emerged and could be visibly separated from the seed. If not only the radicle but also the hypocotyl had emerged and could be visibly separated, the seed was classified as well-germinated. For the second experiment, we considered seeds germinated if the radicle had emerged and its length was less than 17.5 mm, and well-germinated if it was longer. Figure 4 shows an example of a not germinated (left), germinated (middle), and well-germinated (right) radish seed according to our methodology.

Bayesian optimization framework
In this section, we describe the Bayesian optimization framework based on the Gaussian process regression that we used in our work.

Gaussian process regression
Bayesian optimization relies on Gaussian process regression [9], also called kriging in geostatistics, which learns a generative probabilistic model of an arbitrary function of independent variables under the assumption of normality. A Gaussian process is completely determined by its mean \mu(\cdot) and covariance (kernel) k(\cdot, \cdot) functions:

f(\mathbf{x}) \sim \mathcal{GP}\big(\mu(\mathbf{x}),\, k(\mathbf{x}, \mathbf{x}')\big).

Consider the GP model with additive normal noise

y_i = f(\mathbf{x}_i) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \qquad i = 1, \ldots, n, \tag{1}

where n is the number of available measurements. Denoting X = (\mathbf{x}_1, \ldots, \mathbf{x}_n)^\top and \mathbf{y} = (y_1, \ldots, y_n)^\top, where (\cdot)^\top denotes the transpose, the predictive distribution at an unobserved point \mathbf{x}_* is given by

\mu(\mathbf{x}_*) = k(\mathbf{x}_*, X)\big[K(X, X) + \sigma^2 I\big]^{-1} \mathbf{y},
\sigma^2(\mathbf{x}_*) = k(\mathbf{x}_*, \mathbf{x}_*) - k(\mathbf{x}_*, X)\big[K(X, X) + \sigma^2 I\big]^{-1} k(X, \mathbf{x}_*),

where K(X, X) is a matrix of the form K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j), i, j = 1, \ldots, n. The particular choice of the kernel function depends on the assumptions about the model and the particular application; however, there exist commonly used kernels, such as the radial basis function (RBF) and Matérn kernels, that work well in general. Kernel hyperparameters are usually optimized using maximum likelihood estimation (MLE) [10] or its variations. Figure 5 shows an example of GPR with an RBF kernel over the sine function with noisy measurements, where the predictive variance increases at points with missing measurements. Outside of the interpolation region, the predictive variance increases significantly, with the mean failing to capture the true function trend.
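The GPR predictive distribution described above can be sketched in a few lines of NumPy on the noisy-sine setting of Figure 5. This is an illustrative toy, not the implementation used in the study; the kernel hyperparameters are fixed by hand rather than fit by MLE.

```python
import numpy as np

def rbf_kernel(A, B, variance=1.0, length_scale=1.0):
    """RBF (squared-exponential) kernel matrix between 1-D point sets A and B."""
    sq_dists = (A[:, None] - B[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dists / length_scale ** 2)

def gpr_predict(X, y, X_star, noise_var=0.01, length_scale=1.0):
    """Posterior mean and variance of a zero-mean GP with additive normal noise."""
    K = rbf_kernel(X, X, length_scale=length_scale) + noise_var * np.eye(len(X))
    K_s = rbf_kernel(X_star, X, length_scale=length_scale)      # k(x*, X)
    mean = K_s @ np.linalg.solve(K, y)                          # k(x*,X)[K+s^2 I]^-1 y
    v = np.linalg.solve(K, K_s.T)
    var = rbf_kernel(X_star, X_star,
                     length_scale=length_scale).diagonal() - np.sum(K_s * v.T, axis=1)
    return mean, var

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, 20)                 # noisy measurements of sin(x)
y = np.sin(X) + 0.1 * rng.normal(size=20)
X_star = np.linspace(0, 2 * np.pi, 100)
mean, var = gpr_predict(X, y, X_star)             # predictive mean and variance
```

Between observations the predictive variance grows, reproducing the behavior visible in Figure 5.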

Bayesian optimization
An advantageous property of GPR is that it provides not only a prediction of the value at unobserved points but the complete probabilistic distribution, determined by the mean and variance. The general idea behind BO algorithms is to use this distribution to explore the parameter space and select values \mathbf{x}_* that will most probably maximize the target function f(\mathbf{x}). The common approach is to select a particular acquisition function that takes the parameters of the predictive distribution of the fitted model as input and outputs a value that is maximized instead. There exist multiple strategies, for example, using the probability of improvement, the expected improvement or integrated expected improvement over the current best value, entropy search, or the upper confidence bound (UCB) [6]. We selected the UCB acquisition function in our work, as it is easy to evaluate and has been shown to be effective in practice. It is expressed using the predictive mean and variance as follows:

\mathrm{UCB}(\mathbf{x}; \kappa) = \mu(\mathbf{x}) + \kappa\, \sigma(\mathbf{x}). \tag{2}

The exploration-exploitation trade-off is managed by the parameter \kappa: for small \kappa, regions with a high mean (exploitation) are preferred, and for large \kappa, regions with high uncertainty (exploration). We will further omit \kappa from the arguments of the UCB function where it is assumed fixed. Figure 6 shows the 4th step (with 2 initial data points at the boundaries) of the BO algorithm on an example function with several local maxima, using the UCB acquisition function with fixed \kappa = 2.
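The full BO loop with UCB can be sketched as follows, mirroring the Figure 6 setting (two initial points at the boundaries, fixed kappa = 2). The toy GP, the grid-based acquisition maximizer, and the example target function are illustrative assumptions, not the study's setup.

```python
import numpy as np

def ucb(mu, sigma, kappa=2.0):
    """Upper confidence bound acquisition: UCB(x) = mu(x) + kappa * sigma(x)."""
    return mu + kappa * sigma

def gp_posterior(X, y, Xs, ls=0.5, noise=1e-4):
    """Posterior mean/std of a zero-mean GP with an RBF kernel (toy helper)."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(Xs, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.sqrt(np.maximum(var, 0.0))

def f(x):  # toy target with several local maxima
    return np.sin(3 * x) + 0.5 * np.cos(5 * x)

grid = np.linspace(0.0, 2.0, 200)
X = np.array([0.0, 2.0])          # two initial points at the boundaries
y = f(X)
for _ in range(15):
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(ucb(mu, sigma))]              # maximize acquisition
    X, y = np.append(X, x_next), np.append(y, f(x_next))  # evaluate and update

best = X[np.argmax(y)]
```

Early iterations land in high-uncertainty regions (exploration); later ones cluster around the discovered maximum (exploitation).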
It is critical to note that BO performance is profoundly affected by the dimensionality of the input data due to the exponential growth of the parameter space. It may start to perform poorly when the number of controlled parameters becomes larger than ten [11].

Noise estimation
We defined the target function that we aim to optimize as the sum of the averages of germinated and well-germinated seeds (see "Seed germination" section). First, let N denote the number of seeds used in the experiment. Second, due to the stochasticity, we model the success of a single seed germination for fixed values of the parameters \mathbf{x} as a Bernoulli trial. Then, the probability that a single seed germinates equals p(\mathbf{x}) = p, whereas the probability that a single seed is well-germinated, given that it has germinated, equals q(\mathbf{x}) = q. If N_g and N_{wg} denote the numbers of germinated and well-germinated seeds in the experiment, respectively, then it can be shown that for sufficiently large N (for details, see "Appendix" section) our target function is

y = \frac{N_g + N_{wg}}{N} \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{N}\right),

where \mu = p(1 + q) and \sigma^2 = p(1 + 3q) - p^2(1 + q)^2. Due to the normality of the obtained distribution, its variance can be interpreted as an input-dependent Gaussian noise in Eq. (1). Therefore, we can simplify hyperparameter optimization by setting a lower bound on the noise variance with the following value:

\sigma^2_{\mathrm{lb}} = \frac{1}{N} \max_{p, q \in [0, 1]} \left[p(1 + 3q) - p^2(1 + q)^2\right] = \frac{1}{N}.

Alternatively, for each obtained observation y_i, a lower bound on the noise variance can be estimated as (for details, see "Appendix" section)

\hat{\sigma}_i^2 = \frac{1}{N} \left[\hat{p}_i(1 + 3\hat{q}_i) - \hat{p}_i^2(1 + \hat{q}_i)^2\right], \qquad \hat{p}_i = \frac{N_{g,i}}{N}, \quad \hat{q}_i = \frac{N_{wg,i}}{N_{g,i}},

in order to incorporate the dependence on the values of the observations.

Fig. 5 Gaussian process regression (the red dashed line depicts the predictive mean and the orange fill depicts the standard deviation intervals) with noisy measurements (blue dots) of the sine function (solid green line) using the RBF kernel. The predictive variance increases in areas of missing measurements, and the predictive mean fails to capture the true function trend outside of the interpolation region.
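The per-observation noise estimate can be sketched as a plug-in computation from the raw counts. This assumes the estimator is the empirical version of \sigma^2 / N with \hat{p} = N_g / N and \hat{q} = N_{wg} / N_g; the function name and exact estimator form are illustrative, not taken from the paper's appendix.

```python
def germination_noise_var(n_g, n_wg, n_total):
    """Plug-in noise-variance estimate for the target y = (N_g + N_wg) / N.

    Uses p_hat = N_g / N and q_hat = N_wg / N_g in
    sigma^2 = p(1 + 3q) - p^2 (1 + q)^2, scaled by 1/N (CLT approximation).
    """
    p = n_g / n_total
    q = n_wg / n_g if n_g > 0 else 0.0
    sigma2 = p * (1 + 3 * q) - p ** 2 * (1 + q) ** 2
    return sigma2 / n_total
```

For N = 100, the worst case over p and q gives 1/N = 0.01, which matches the lower bound of the noise-kernel search interval used later in "Selecting parameters".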

Concurrent experiments
The aforementioned BO formulation assumes that the optimization process is sequential, i.e., only a single \mathbf{x}_* is selected at each step. However, it may be necessary to select several vectors of parameters to explore at once, e.g., if there are multiple CPU cores for computations or several experimental setups available (climatic chambers in our case). This is referred to in the literature as the batch setting [12, 13] or the setting with delayed feedback [14]. In this work, we consider the following approach from [12] to tackle this problem: for each trial comprising the selection of multiple vectors of parameters, we repeatedly find the maximizer of the acquisition function and "observe" the target function using the predictive mean of GPR instead of the real outcome (see Algorithm 1).
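This batch heuristic (maximize the acquisition, then "observe" the maximizer with the GP predictive mean before picking the next point) can be sketched as follows. The toy GP, grid search, and parameter values are illustrative assumptions, not the study's Algorithm 1 verbatim.

```python
import numpy as np

def batch_select(X, y, grid, n_batch, ls=0.5, noise=1e-4, kappa=2.0):
    """Select a batch of points by repeatedly maximizing UCB and
    hallucinating each maximizer's outcome with the GP predictive mean."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)
    X, y = X.copy(), y.copy()
    batch = []
    for _ in range(n_batch):
        K = k(X, X) + noise * np.eye(len(X))
        Ks = k(grid, X)
        mu = Ks @ np.linalg.solve(K, y)
        var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
        idx = np.argmax(mu + kappa * np.sqrt(np.maximum(var, 0.0)))
        batch.append(grid[idx])
        X = np.append(X, grid[idx])    # hallucinated observation:
        y = np.append(y, mu[idx])      # predictive mean stands in for the outcome
    return np.array(batch)
```

Because each hallucinated observation collapses the predictive variance around the chosen point, subsequent batch members are pushed toward other promising regions.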

Exploration-exploitation control
It may happen, when performing exploitation, that the algorithm proposes parameters that are very close to already explored data points, e.g., trying a temperature of 22.001 °C after 22.000 °C, which yields a change beyond the controllable precision. In order to cope with this problem and reduce the manual labor of an operator in selecting \kappa from Eq. (2) such that it gives a reasonable exploitation, we propose an additional optimization procedure. First, we formulate the notion of a reasonable exploitation as the following constraint:

\max_j \left|x_{*,j} - x_j^{(c)}\right| \ge \epsilon_{\mathrm{exploit}}, \qquad \mathbf{x}^{(c)} = \arg\min_{\mathbf{x}_i,\, i = 1, \ldots, n} \left\|\mathbf{x}_* - \mathbf{x}_i\right\|,

where n is the number of already observed data points and \epsilon_{\mathrm{exploit}} is a predefined constant. This constraint means that at least one of the parameters must be at least \epsilon_{\mathrm{exploit}} away from the respective parameter of the closest already observed data point. One can think of a fairer constraint, where a too-small change of a parameter is diminished to zero; however, it may pose challenges for the optimization algorithms. Similarly, in order to avoid unreasonable exploration, we consider the following constraint:

\min_{\mathbf{x}_i \in S} \left\|\mathbf{x}_* - \mathbf{x}_i\right\|_1 \le \epsilon_{\mathrm{explore}},

where \mathbf{x}_i is taken from a subset S of size s \le n of the already observed points; e.g., one may want to ignore manually initialized data (see "Data preparation" section) and prefer exploration around knowingly good regions. This constraint means that the selected parameters must be at most \epsilon_{\mathrm{explore}} far in total from the closest already observed data point. Algorithm 2 describes the exploration-exploitation control procedure.
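The two constraints can be expressed as simple predicates. This is a sketch: the closest observed point is taken in the Euclidean norm, and the epsilon values are the temperature thresholds quoted later in "Selecting parameters"; both are assumptions for illustration.

```python
import numpy as np

def exploit_ok(x_new, X_obs, eps=0.1):
    """Reasonable exploitation: at least one parameter of x_new differs from
    the corresponding parameter of the CLOSEST observed point by >= eps."""
    closest = X_obs[np.argmin(np.linalg.norm(X_obs - x_new, axis=1))]
    return bool(np.any(np.abs(x_new - closest) >= eps))

def explore_ok(x_new, X_subset, eps=10.0):
    """Reasonable exploration: total (L1) distance from x_new to the closest
    point of a chosen subset of observations must not exceed eps."""
    return bool(np.min(np.abs(X_subset - x_new).sum(axis=1)) <= eps)
```

For example, with an observed point at (22.0 °C, 30.0 °C), proposing (22.001, 30.0) fails the exploitation check, while (22.2, 30.0) passes it.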

Experimental evaluation
In this section, we describe the details of our experimental setup and provide the obtained results.

Selecting parameters
We implemented our solution in the Python 3 programming language using the Bayesian optimization library. As the covariance function, we selected the composition of constant, isotropic Matérn (with \nu = 2.5, assuming sufficient smoothness) and white noise kernels with tunable hyperparameters:

k(\mathbf{x}_i, \mathbf{x}_j) = \alpha \left(1 + \frac{\sqrt{5}\, r_{ij}}{\rho} + \frac{5 r_{ij}^2}{3 \rho^2}\right) \exp\left(-\frac{\sqrt{5}\, r_{ij}}{\rho}\right) + \sigma^2 \delta_{ij}, \qquad r_{ij} = \left\|\mathbf{x}_i - \mathbf{x}_j\right\|,

where \delta_{ij} is the Kronecker delta and \alpha, \rho \in \mathbb{R}_+. Optimization of the hyperparameters is performed at each step, when new data becomes available, using MLE with the number of optimizer restarts equal to 30. Bounds for hyperparameter optimization were set as follows: \alpha \in [10^{-5}, 10^5], \rho \in [10^{-5}, 10^5], and \sigma^2 \in [0.01, 10^5] (see "Seed germination" and "Noise estimation" sections). The GP mean was set to the mean value of the observed measurements.
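For illustration, an equivalent kernel composition and restart budget can be assembled with scikit-learn. This is a sketch of the configuration described above, not the study's exact code; `normalize_y=True` approximates setting the GP mean to the mean of the observed measurements.

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

# Constant * Matern(nu=2.5) + white noise, with the hyperparameter bounds
# quoted in the text (noise-variance floor of 0.01).
kernel = (
    ConstantKernel(1.0, constant_value_bounds=(1e-5, 1e5))
    * Matern(length_scale=1.0, length_scale_bounds=(1e-5, 1e5), nu=2.5)
    + WhiteKernel(noise_level=1.0, noise_level_bounds=(0.01, 1e5))
)

gp = GaussianProcessRegressor(
    kernel=kernel,
    n_restarts_optimizer=30,   # MLE restarts, as in the text
    normalize_y=True,          # GP mean ~ mean of observed measurements
)
```

Calling `gp.fit(X, y)` then runs the restarted MLE over the bounded hyperparameters each time new data arrives.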
Given the small number of tunable parameters (7 in the first experiment and 4 in the second), we considered the basic BO approach. As the acquisition function, we selected UCB, since it has been shown to be effective in various scenarios. The exploration-exploitation trade-off was managed through the \kappa parameter based on expert knowledge, i.e., at each step, \kappa was selected in such a way that the algorithm did not purely exploit almost the same parameters or explore knowingly unprofitable regions. Additional control was performed by setting \epsilon_{\mathrm{exploit}} equal to 0.1 °C and 1 ml and \epsilon_{\mathrm{explore}} equal to 10 °C and 100 ml for the temperature and the water supply, respectively. For constrained optimization, we used the SciPy [15] implementation of the sequential least squares programming (SLSQP) algorithm [16]. Each optimization step requires evaluating the maximum of the acquisition function at several points, which imposes computational overhead; however, it can be considered negligible compared to the time scale of a single trial.
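A minimal sketch of the constrained acquisition maximization with SciPy's SLSQP follows. The helper name and the acquisition are illustrative; the non-smooth max/abs exploitation constraint is used as-is here, which SLSQP tolerates in this simple setting but is not guaranteed to handle in general.

```python
import numpy as np
from scipy.optimize import minimize

def maximize_acquisition(acq, x0, bounds, X_obs, eps_exploit=0.1):
    """Maximize an acquisition function with SLSQP, subject to the
    'reasonable exploitation' constraint: the largest per-parameter
    distance to the closest observed point must be >= eps_exploit."""
    def exploit_constraint(x):
        closest = X_obs[np.argmin(np.linalg.norm(X_obs - x, axis=1))]
        return np.max(np.abs(x - closest)) - eps_exploit  # must be >= 0
    res = minimize(
        lambda x: -acq(x),              # SciPy minimizes, so negate
        x0, method="SLSQP", bounds=bounds,
        constraints=[{"type": "ineq", "fun": exploit_constraint}],
    )
    return res.x
```

With a single observed point at the unconstrained optimum, the solver is pushed to the constraint boundary, exactly the "no negligible changes" behavior described above.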

Data preparation
To set up the experiments, we had to consider several issues. First, we had to select the boundaries for the optimized parameters: we selected them as [0, 40] °C (in both experiments) and [0, 250] ml (in the first experiment) for the temperature and the water supply, respectively. Second, as the parameters may have different units of measure, which affects modeling due to the isotropy of the selected kernel, we needed to scale them appropriately: we linearly mapped the temperature and water supply values to the [0, 1] and [0, 0.5] intervals, respectively, assuming "equivalence" of 1 °C and 12.5 ml (during the second experiment, this step was skipped, as only the temperature was varied). Finally, we had to add some initial data so that the optimization could kick off: we picked all possible combinations of the 0 and 40 °C temperatures (in both experiments) with 0 water supply (in the first experiment) on each day and assigned "observed" target function values equal to 0 (in total, 2^4 = 16 initial points). This can be considered reasonable, as extreme conditions should produce poor results.
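The scaling and the initial design for the first experiment can be sketched as follows (function and variable names are illustrative):

```python
import itertools
import numpy as np

T_MAX, W_MAX = 40.0, 250.0   # parameter bounds from the text

def scale(temps, waters):
    """Map temperatures to [0, 1] and water supplies to [0, 0.5], treating
    1 deg C as 'equivalent' to 12.5 ml under the isotropic kernel."""
    return np.concatenate([np.asarray(temps) / T_MAX,
                           np.asarray(waters) / W_MAX * 0.5])

# Initial data: every combination of boundary temperatures on the 4 time
# steps, zero water supply, and target value 0 (extreme conditions fail).
init_X = [scale(t, [0.0, 0.0, 0.0])
          for t in itertools.product([0.0, 40.0], repeat=4)]
init_y = [0.0] * len(init_X)
```

Note that 1/40 of the scaled temperature range equals 12.5/250 * 0.5 of the scaled water range, which is exactly the stated 1 °C-to-12.5 ml equivalence.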

First experiment (poorly germinated pea seeds)
For a single germination process, we used N = 100 pea seeds and conducted only a single repetition for each selected vector of controlled parameters. The first trial was conducted using the single reference vector of parameters selected with expert knowledge, which yielded 73 germinated seeds, and the two vectors selected by the BO algorithm. At the 11th observation, the algorithm discovered parameters that yielded 73 germinated seeds with an additional 18 well-germinated. The 20th selected vector of parameters produced as many as 80 germinated and 33 well-germinated seeds, which in total gave a 55% improvement over the initial guess. The subsequent 13 steps did not provide any further enhancement. Figure 7 shows the target values obtained during the 11 trials of the first experiment. The black dashed line denotes the kriged average and shows the trend of improvement in germination efficiency, whereas the green top dotted line shows the best observed values for each trial. Table 1 lists all of the 33 vectors of parameters and the respective observed target function values obtained during the 11 trials.
Notably, without any prior knowledge of the underlying system, the algorithm was able to learn values of the controlled parameters that yield a substantial improvement in germination efficiency. The values of the parameters that achieved the maximum target function value found, 1.13 at the 20th iteration, are listed in italics in Table 1. The identified values can be explained from a physiological point of view. For example, a periodically changing temperature may be favorable due to the natural adaptation of seeds to day and night, whereas the water supply identified by the algorithm is in good agreement with the dynamics of water uptake by seeds, previously described in [17]. According to this study, water uptake by plant seeds is triphasic, comprising a rapid initial absorption, followed by a plateau phase and a further increase due to embryonic axis elongation.

Second experiment (well-germinated radish seeds)
Although the first experiment showed a substantial improvement of germination efficiency in the case of poorly germinated seeds, such an improvement could not be as easily obtained for well-germinated seeds. Therefore, in the second experiment, we used N = 96 radish seeds with 6 repetitions for a single germination trial. The first 4 trials were conducted by setting all of the temperature parameters to either 21, 22, 23, or 24 °C. At the 9th trial (the 5th automatic step), the algorithm discovered parameters that yielded the best average of 10 germinated and 88 well-germinated seeds. Figure 8 shows the target values obtained during the 12 trials, where the last trial served as a validation of the best vector of parameters found during the 9th trial. The green dotted line shows the best observed mean value of the target function, whereas the red dashed line depicts the first, expert-knowledge-based trial. Table 2 lists all of the 11 vectors of parameters and the corresponding means and standard deviations of the target function values obtained during the 12 trials. The complete table containing the target function values for every repetition during each trial can be found in Additional file 1.
Although the seeds already germinated efficiently with the initial guess, the algorithm was able to achieve a substantial improvement after several steps and identify the parameters that yielded the maximum mean target function value of 1.903 with low dispersion.

Conclusions and future work
We applied the Bayesian optimization framework to the seed germination process in a controlled environment. Our experiments demonstrated that the proposed methodology allows identifying the values of the controllable parameters that increase germination efficiency in different settings for different seeds, both when the initial expert-knowledge-based guess yields low germination efficiency and when it yields high germination efficiency. The proposed methodology is model-free, and we argue that it may be useful for a variety of optimization problems in precision agriculture. Using this approach, we achieved an increase in germination efficiency (according to our metric) from 36.5 to 56.5% over 19 iterations in the first experiment (pea seeds) with low initial germination efficiency, whereas in the second experiment (radish seeds) with high initial germination efficiency, the algorithm identified parameters that yielded a mean target function value of 1.903. We note that the selection of the controllable parameters must be made carefully during preliminary planning. On the one hand, increasing their number allows better fine-tuning; on the other hand, it makes BO algorithms less efficient and requires more trials to be conducted, which may be both overly time-consuming and equipment-demanding.
Combining the proposed technique with existing methods of computer vision-based seed counting [18, 19] and seed quality evaluation [20] may decrease manual labor significantly and improve scalability. BO methods could also help to reveal the optimal chemical parameters of growing media or to find environmentally friendly doses of plant biostimulants (humic substances, synthetic hormones, etc.), whose effects on plants are usually nonlinear.

Table 1 Values of the 33 explored vectors of parameters (t_1, ..., t_4, w_1, ..., w_3)^T and the respective target function values. Parameters t and w stand for the air temperature in °C and the water supply in ml, respectively. The optimal parameters are highlighted in italics.