A non-invasive method to predict drought survival in Arabidopsis using quantum yield under light conditions

Background: Survival rate (SR) is frequently used to compare drought tolerance among plant genotypes. While a variety of techniques for evaluating the stress status of plants under drought stress conditions have been developed, determining the critical point for the recovery irrigation to evaluate plant SR often relies directly on a qualitative inspection by the researcher or on the employment of complex and invasive techniques that invalidate the subsequent use of the tested individuals.

Results: Here, we present a simple, instantaneous, and non-invasive method to estimate the survival probability of Arabidopsis thaliana plants after severe drought treatments. The quantum yield (QY), or efficiency of photosystem II, was monitored in darkness (Fv/Fm) and light (Fv’/Fm’) conditions in the last phase of the drought treatment before recovery irrigation. We found a high correlation between a plant’s Fv’/Fm’ value before recovery irrigation and its survival phenotype seven days after, allowing us to establish a threshold between alive and dead plants in a calibration stage. This correlation was maintained in the Arabidopsis accessions Col-0, Ler-0, C24, and Kondara under the same conditions. Fv’/Fm’ was then applied as a survival predictor to compare the drought tolerance of transgenic lines overexpressing the transcription factors ATAF1 and PLATZ1 with the Col-0 control.

Conclusions: The results obtained in this work demonstrate that the chlorophyll a fluorescence parameter Fv’/Fm’ can be used as a survival predictor that gives a numerical estimate of the Arabidopsis drought SR before recovery irrigation. The procedure employed to get the Fv’/Fm’ measurements is fast, non-destructive, and requires inexpensive and easy-to-handle equipment. Fv’/Fm’ as a survival predictor can be used to offer an overview of the photosynthetic state of the tested plants and determine more accurately the best timing for rewatering to assess the SR, especially when the symptoms of severe dehydration between genotypes are not contrasting enough to identify a difference visually.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13007-023-01107-w.


Experiments and Data
The experiments analyzed in this section correspond to the ones referred to as "Calibration Stage" experiments, and were designed to evaluate plant survival as a function of the FVFM parameter. The experimental units were individual plants of each of the 4 ecotypes (Col-0, Ler-0, C24, and Kondara), grown in pots of 5 plants, with 4 biological replicates per experiment. All plants were subjected to the same drought treatment, and FVFM measurements began when the plants showed severe dehydration symptoms. Pots were watered at different times to obtain FVFM values in the range between 0.00 and 0.77 (normal conditions). Plant survival was evaluated seven days after taking the FVFM measurement. The whole experiment was repeated independently 3 times, to obtain a total of 60 evaluations of FVFM and plant survival for each of the 4 ecotypes; thus we have a total of 60 × 4 = 240 measurements.
Box 1 shows a general description and initial analysis of the data.

Analysis
A Generalized Linear Model (GLM) for the survival phenotype as a function of FVFM in the Col-0 ecotype. Having observed that the survival phenotype has a strong relation with FVFM within each genotype, we can estimate a GLM for the probability of survival as a function of the observed FVFM parameter, assuming the Binomial family distribution (see, for example, Draper and Smith (2014)).
Box 2 presents the R calculations and output of the model for the probability of surviving given the value of FVFM in the Col-0 ecotype.
In Figure 1 we can observe the strong dependence that exists between the estimated probability of survival and FVFM in the Col-0 ecotype. In this plot, FVFM = 0.45 (vertical blue line) is a threshold for the estimated probability of survival, i.e., all points below this threshold have an estimated probability of survival less than 0.5, while all points above it have an estimated probability of survival larger than 0.5. Figure 2, in turn, shows box plots of the distributions of the estimated probability of survival in the two phenotypes, plants that died (numeric phenotype = 0) and plants that survived (numeric phenotype = 1), in both cases 7 days after taking the FVFM measurement.
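As a rough illustration of the kind of model behind Figure 1, the sketch below fits a two-parameter Binomial GLM by Newton-Raphson and reports the FVFM value at which the fitted probability crosses 0.5. It is written in Python rather than the R code of Box 2, it assumes a logit link (the link function is not stated in this report), and the data and the names sigmoid and fit_logistic are invented for illustration only.

```python
import math

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(x, y, n_iter=50, tol=1e-8):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p              # gradient, intercept
            g1 += (yi - p) * xi       # gradient, slope
            h00 += w                  # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:              # near-singular: stop (e.g. separable data)
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Invented FVFM values and survival phenotypes (0 = dead, 1 = alive);
# NOT the measurements of the calibration experiments.
fvfm = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40,
        0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75]
alive = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(fvfm, alive)
threshold = -b0 / b1   # FVFM where the fitted probability equals 0.5
print("intercept:", b0, "slope:", b1, "FVFM threshold:", threshold)
```

With a logit link the 0.5 crossing is simply -b0/b1, which is why a single vertical line in a plot such as Figure 1 separates predicted deaths from predicted survivals.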
To predict the state (phenotype) of a plant seven days after taking the FVFM value, we can use the criterion "predict that a plant will survive after seven days if the estimated probability of survival is equal to or greater than 1/2". Table 1 presents the contingency table of the true state of the plants (rows) against the state predicted by the model (columns). From Table 1 we can see that the number of correct predictions of the model is 27 + 28 = 55, i.e., the model using all 60 observations correctly predicted the state of the plant in ≈ 92% (55/60 ≈ 0.9167) of the cases, indicating that the FVFM parameter is highly efficient at predicting survival under the conditions of the experiment. Also from Table 1 we can see that in 3 cases the model erroneously predicts that the plants will survive when in fact they are found dead after 7 days; this implies a rate of false positives of 3/60 = 0.05. These three cases correspond to the three points found on or above the 0.5 threshold in the left-hand box plot (numeric phenotype 0) in Figure 2. On the other hand, the model gives 2 false negatives, i.e., in two cases it predicted that the plants would be dead when in fact they were found alive after 7 days; this implies a rate of false negatives of 2/60 ≈ 0.0333. Those two cases correspond to the points found below the 0.5 threshold in the right-hand box plot (numeric phenotype 1) in Figure 2.
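The arithmetic above can be checked with a short Python sketch (not the R code of Box 2). The helper names confusion_counts and rates are hypothetical. The text gives 27 + 28 correct predictions without saying which count is true positives and which is true negatives, so the split used below is an arbitrary assumption; it does not affect the three rates printed.

```python
def confusion_counts(y_true, p_hat, cutoff=0.5):
    """Count TP/FP/FN/TN, predicting survival (1) when p_hat >= cutoff."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for y, p in zip(y_true, p_hat):
        pred = 1 if p >= cutoff else 0
        if pred == 1:
            counts["TP" if y == 1 else "FP"] += 1
        else:
            counts["FN" if y == 1 else "TN"] += 1
    return counts

def rates(counts):
    """Express each cell, and the total of correct predictions, as a fraction of n."""
    n = sum(counts.values())
    out = {k: v / n for k, v in counts.items()}
    out["Correct"] = (counts["TP"] + counts["TN"]) / n
    return out

# Counts from the text: 55 correct, 3 false positives, 2 false negatives.
# The TP/TN split (28/27) is assumed, not taken from Table 1.
full_model = {"TP": 28, "FP": 3, "FN": 2, "TN": 27}
r = rates(full_model)
print(round(r["Correct"], 4), r["FP"], round(r["FN"], 4))  # 0.9167 0.05 0.0333
```

Note that the false positive and false negative rates here are fractions of all 60 plants, following the usage in the text, not fractions of the dead or surviving plants.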
Assessing the power of the method to predict the phenotype. To evaluate how well the method predicts plant status as a function of the FVFM parameter, we must split the data into two almost independent subsets: a training set, from which the parameters of the model are estimated, and a validation set (not employed in the estimation process), on which the predictions are tested. We have 60 observations available per genotype and, ideally, the training and validation sets would be of the same size (n = 30). However, in this case the process to estimate the GLM fails to converge when the number of data points is less than approximately 50 (data not shown). Thus, to obtain a training set we select at random 50 of the 60 data points, and use the estimated parameters to predict the phenotype of the remaining 10 plants in order to test the power of the method. Because the "training" and "validation" sets are almost independent, i.e., they can be considered to be the results of independent experiments performed under the same conditions each time, the power of the method to predict the phenotype as a function of the FVFM parameter can be estimated without bias.
By repeating the random selection of the "training" (n = 50) and "validation" (n = 10) sets many times, say B = 1000, we obtain a good measure of the expected performance of the method when predicting future observations (Harrell and Harrell, 2001). Box 3 presents the calculations and general results of this procedure with the data of the Col-0 ecotype.
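The repeated training/validation scheme can be sketched as follows. This is a Python illustration, not the R code of Box 3: it assumes a logit link, uses a plain Newton-Raphson fit, and runs on a synthetic stand-in for one ecotype (60 evenly spaced FVFM values with six deliberately mislabelled plants around 0.45). The names fit_logistic and repeated_validation are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(x, y, n_iter=25, tol=1e-8):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:      # near-singular: keep the current estimates
            break
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def repeated_validation(x, y, B=1000, n_train=50, seed=1):
    """Proportion of correct held-out predictions in each of B random splits."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    accuracies = []
    for _ in range(B):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        b0, b1 = fit_logistic([x[i] for i in train], [y[i] for i in train])
        correct = 0
        for i in test:
            pred = 1 if sigmoid(b0 + b1 * x[i]) >= 0.5 else 0
            correct += (pred == y[i])
        accuracies.append(correct / len(test))
    return accuracies

# Synthetic data, NOT the experimental measurements.
x = [0.77 * i / 59 for i in range(60)]
flipped = {27, 30, 33, 36, 39, 42}   # plants whose phenotype contradicts the 0.45 rule
y = [int(xi > 0.45) ^ int(i in flipped) for i, xi in enumerate(x)]

acc = repeated_validation(x, y)
print("mean proportion of correct held-out predictions:", sum(acc) / len(acc))
```

Each entry of acc is a multiple of 0.1 because every validation set holds 10 plants, which is also why the box plots of such frequencies take only a few distinct values.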
- for the slope, with median p values of 0.008 and 0.006, respectively. This means that the resampling procedure, using only 83% of the data (50 of 60 points), gave models close to the ones obtained with the full data set, with only a relatively small decrease in significance.
Now we need to analyze the rate of correct predictions obtained from each one of the B = 1000 training models on the ten independent data points used for validation in each case. Box 4 presents the rearrangement and plotting of the results of the resampling procedure.
Note that in each training/validation case the result of each one of the 10 predictions can be classified as "Correct" if the model correctly classified the phenotype; see Table 1 for the results of the original model, which employed all 60 data points. See also that table for the definitions of the other four categories: "True Positive", "False Positive", "False Negative" and "True Negative". In Figure 3, white asterisks show the mean of each frequency, while white "X" marks show the estimates from the model with all 60 data points (see Box 4).
In Figure 3 we can observe how the results of the validation process are statistically congruent with the results obtained when using all 60 data points in the original model. First, in all 5 groups, the values estimated from the full model (60 data points), shown as white "X" marks in the figure, are close to the mean and median values (white asterisks and bold black lines, respectively). This means that, in the long run, the prediction of new data from models using only part of the data is both accurate and precise enough to adopt the method proposed in the paper to predict the phenotype of interest.
For example, in the box plot for the "Correct" group in Figure 3 we see that the proportion of correct results in the full model was 92%, while the mean of the B = 1000 replicates gave correct results for data not employed in the training in 90% of the cases, i.e., only 2 percentage points less on average than the model using the whole data set. Furthermore, the interquartile range in the "Correct" group goes from 0.8 to 1 with a median of 0.9, suggesting that the correct prediction of future phenotypes (dead or alive) using the FVFM measurement has a high probability of success. On the other hand, the rates of the 4 groups that segregate the results, i.e., the sets of "False Negative", "False Positive", "True Negative" and "True Positive" shown in Figure 3, are also close to the values estimated from the full model, demonstrating the statistical robustness of the method.
It could be argued that in the paper the FVFM thresholds were selected heuristically, using models simpler than the GLM employed here; i.e., the suitable FVFM threshold was taken as the FVFM value that gave the best partition between the phenotypes observed a posteriori. However, this does not alter the fact that FVFM is a good a priori predictor of the phenotype, and both simple graphical models and the GLM give concordant results. As shown in Figure 1 of this report, the FVFM threshold marking the sharp change in survival probability is similar to the thresholds employed in the figures of the paper, because determining the empirical threshold is almost equivalent to fitting the GLM.
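The summary statistics quoted for a group such as "Correct" (mean, median and interquartile range) can be obtained from the per-replicate frequencies as in the Python sketch below. The ten frequencies are invented for illustration and are not the B = 1000 values plotted in Figure 3; the function name summarize is hypothetical.

```python
import statistics

def summarize(freqs):
    """Mean, median and quartiles of per-replicate relative frequencies."""
    q1, _, q3 = statistics.quantiles(freqs, n=4)   # three quartile cut points
    return {
        "mean": statistics.mean(freqs),
        "median": statistics.median(freqs),
        "q1": q1,
        "q3": q3,
    }

# Invented proportions of correct predictions in 10 validation sets of 10 plants.
correct_freqs = [0.9, 1.0, 0.8, 0.9, 1.0, 0.9, 0.8, 1.0, 0.9, 0.7]
summary = summarize(correct_freqs)

full_model_correct = 55 / 60   # value reported for the model with all 60 points
print(summary)
print("difference from full model:", full_model_correct - summary["mean"])
```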

Estimation and validation in other ecotypes
The calibration experiments were performed for 4 ecotypes: Col-0, Ler-0, C24, and Kondara. Previously, we have seen that ecotype (source "Trat" in the ANOVA of Box 1) as well as the survival phenotype (source "Phenotype" in the ANOVA of Box 1) have strong and significant effects on the FVFM parameter.
We have examined in detail the results for the Col-0 ecotype (Figures 1 and 2), corroborating that the FVFM parameter carries enough statistical information to predict the survival phenotype with accuracy and precision in the Col-0 ecotype (Figure 3). In this section we present the results for the other 3 ecotypes, Ler-0, C24, and Kondara. Analyses in R were performed following exactly the same pipeline as with the Col-0 ecotype, and thus boxes with the R calculations for each one of these 3 ecotypes are omitted for brevity, but the R objects and log files are available upon request.
Table 2 presents the estimates of the parameters of the GLM in the four ecotypes, while Figure 4 presents the estimated probabilities for those models as a function of the FVFM parameter, also by ecotype. In Table 2 we can see that the estimated parameters of the GLM (both intercept and slope) depend on the ecotype, a fact previously seen in Box 1. Nevertheless, in all four ecotypes the statistical tests of the null hypotheses for these estimates are always highly significant, presenting p-values < 0.008 in all cases. This implies that, in general, the FVFM parameter has a definite, significant influence on the survival phenotype.
The differences in the fitted models for the 4 ecotypes can be appreciated in Figure 4, which presents the estimated probabilities as functions of the FVFM parameter per ecotype. In Figure 4 we can see that the range in which the estimated probabilities change from almost zero to almost one differs among ecotypes, being smaller and more alike in the Ler-0 and C24 ecotypes, intermediate in the Kondara ecotype and larger in the Col-0 ecotype. Even so, in all 4 cases this change is sudden, corroborating the fact that the estimated probability of survival changes quickly within a narrow range of FVFM values. For practical purposes this means that, to use the FVFM parameter to predict the survival phenotype, it is necessary to perform a pilot calibration study to adapt the method to the particular situation (ecotype, stress, etc.).
Using the criterion of predicting the survival phenotype by the 0.5 threshold of the estimated probability of survival produces very satisfactory results for all 4 ecotypes, as shown in Table 3. In Table 3 we can see that predicting the survival of an individual plant when the probability of survival estimated from the model was > 0.5 gives a large percentage (≥ 91%) of correct predictions in all 4 ecotypes, while the rates of false negatives and false positives are small, in all cases at most 4/60 (≈ 6.7%).
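To make the link between the fitted parameters and the position and width of the transition concrete, the Python sketch below computes, for a logistic model with intercept b0 and slope b1, the FVFM value at which the estimated probability reaches a given level. It assumes a logit link, and the coefficients are hypothetical placeholders, not the estimates of Table 2; the function name fvfm_at_probability is likewise invented.

```python
import math

def fvfm_at_probability(b0, b1, p=0.5):
    """FVFM at which sigmoid(b0 + b1 * FVFM) equals p (logit link assumed)."""
    logit = math.log(p / (1.0 - p))
    return (logit - b0) / b1

# Hypothetical coefficients for two imaginary ecotypes (NOT Table 2 values).
examples = {"ecotype A": (-9.0, 20.0), "ecotype B": (-6.0, 30.0)}

for name, (b0, b1) in examples.items():
    mid = fvfm_at_probability(b0, b1, 0.5)            # equals -b0 / b1
    width = (fvfm_at_probability(b0, b1, 0.9)
             - fvfm_at_probability(b0, b1, 0.1))      # equals 2 * ln(9) / b1
    print(name, "threshold:", mid, "10%-90% width:", width)
```

Under this parameterization the threshold depends on the ratio of intercept to slope, while the width of the transition depends on the slope alone, so a steeper slope gives the sudden change seen in Figure 4. Since both quantities vary with the fitted parameters, a threshold calibrated in one ecotype cannot be assumed to transfer to another.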
The main objective of the statistical analyses presented here was to estimate the capability of the methodology to predict the results of independent data points by using models with parameters estimated from a training set.
Recapitulating, for each ecotype, B = 1000 different models using n = 50 data points were estimated, and the results were used to predict the phenotype of the n = 10 data points left out of the estimation process. In Figure 3 we presented the results of the validation procedure for the Col-0 ecotype, showing that the rate of correct predictions is ≈ 90% and discussing the implications in detail. As in Figure 3, in the validation figures for the other ecotypes white asterisks show the mean of each frequency, while white "X" marks show the estimates from the model with all 60 data points.
From Figures 5, 6 and 7, together with Figure 3, we can see that the validation procedure produces correct predictions of independent observations in ≥ 90% of cases on average in all 4 ecotypes; in fact, for the C24 ecotype the median of correct predictions reaches 100%, while the rates of false positives and false negatives are reasonably low in all 4 ecotypes. Also, in all 4 ecotypes, the average rate of correct predictions in the validation procedure is very close to the estimated rate of correct predictions when using the full model (see white asterisks and "X" marks in the "Correct" distributions in Figures 3, 5, 6 and 7).

Conclusion
The estimated FVFM parameter is a robust predictor of the probability of survival of drought-stressed Arabidopsis plants, and thus can be used as a non-invasive method to predict plant survival. It is important to stress that in all cases a pilot calibration experiment must be performed to estimate the suitable FVFM value for predicting the survival of stressed plants under the particular conditions.

Figure 1. Estimated probability of plant survival as a function of the FVFM parameter in the Col-0 ecotype.

Figure 2. Box plots for the estimated probability of survival grouped by numeric phenotype.

Figure 3. Summary of the validation results: box plot distribution of the relative frequencies of each one of the groups in the B = 1000 random replications of the process.

Figure 4. Estimated probability of plant survival as a function of the FVFM parameter in the four ecotypes.

Figure 5. Box plots for the validation of the procedure in the Ler-0 ecotype. White asterisks show the mean of each frequency, while white "X" marks show the estimates from the model with all 60 data points.

Table 1. Contingency table showing the numbers of plants in each true state (rows) and the state predicted by the model (columns); object "tab1" in Box 2.

Table 2. Parameters (intercepts and slopes) estimated for the GLM in the four ecotypes.

Table 3. Frequency of occurrence of predicted groups per ecotype in the sets of 60 data points per ecotype.