Optimizing fully-efficient two-stage models for genomic selection using open-source software

Abstract

Genomic-assisted breeding has transitioned from theoretical concepts to practical applications in breeding. Genomic selection (GS) predicts genomic breeding values (GEBV) using dense genetic markers. Single-stage models predict GEBVs from phenotypic observations in one step, fully accounting for the entire variance-covariance structure among genotypes, but face computational challenges. Two-stage models, preferred for their simplicity and efficiency, first calculate adjusted genotypic means accounting for spatial variation within each environment, then use these means to predict GEBVs. However, unweighted (UNW) two-stage models assume independent errors among adjusted means, neglecting correlations among estimation errors. Here, we show that fully-efficient two-stage models perform similarly to UNW models for randomized complete block designs but substantially better for augmented designs. Our simulation studies demonstrate the impact of the fully-efficient methodology on prediction accuracy across different implementations and scenarios. Incorporating non-additive effects and augmented designs significantly improved accuracy, emphasizing the synergy between design and model strategy. Consistent performance requires the estimation error covariance to be incorporated into a random effect (Full_R model) rather than into the residuals. Our results suggest that the fully-efficient methodology, particularly the Full_R model, should be more prevalent, especially as GS increases the appeal of sparse designs. We also provide a comprehensive theoretical background and open-source R code, enhancing understanding and facilitating broader adoption of fully-efficient two-stage models in GS. Here, we offer insights into the practical applications of fully-efficient models and their potential to increase genetic gain, demonstrating a \(13.80\%\) improvement after five selection cycles when moving from UNW to Full_R models.

Background

Over the past two decades, genomic-assisted breeding has evolved significantly, transitioning from theoretical concepts to practical applications in both plant and animal breeding. Genomic selection (GS) predicts genomic breeding values (GEBV) for plants or animals using dense genetic markers [1]. In plants, several GS analysis approaches are employed, including kernel-based [2,3,4,5], regularized linear [6,7,8,9] and machine learning models [10,11,12,13,14]. An analysis is considered fully-efficient when all sources of variation are accounted for simultaneously in a single model [15, 16].

Single-stage models fit phenotypic observations in one step, fully accounting for the entire variance-covariance structure, making them fully-efficient [17]. However, single-stage analysis faces computational challenges due to the cubic complexity of inverting the high-dimensional coefficient matrix [18]. Two-stage analysis, preferred for its simplicity and computational efficiency, first calculates adjusted genotypic means accounting for spatial variation within each environment, and then uses these means to predict GEBVs based on markers [19,20,21].

Two-stage models generally assume independent errors among the adjusted means from the first-stage model, an approximation that neglects correlations among their estimation errors [20]. Unbalanced designs, where not all genotypes are grown in every environment and replication levels vary, complicate the variance-covariance structures [22,23,24]. Consequently, the independent, identically distributed (i.i.d.) residuals assumed by two-stage models do not accurately represent the actual estimation error variance (EEV) of the adjusted means. This discrepancy has to be resolved to make two-stage models equivalent to single-stage models.

To address this, there are two approaches: (i) using EEV as the assumed residual covariance matrix for the second-stage models to reflect reality or (ii) scaling the adjusted means using EEV to achieve an i.i.d. covariance structure. The former can be implemented through weighted regression with the diagonal [15] or the full [21] EEV matrix, while the latter can be done through rotations as described by Piepho et al. [21]. A detailed history of the development of fully-efficient models is available in Appendix 1. It is important to note that both the weighted regression with the full EEV and the rotations are mathematically equivalent to each other (proof available in Appendix 1) and to the single-stage models if the true values in EEV are known. However, this is rarely the case because EEV values depend on the estimation of variance components in the first-stage models, which introduces some errors. Nevertheless, several studies [21, 25,26,27,28,29] have demonstrated that this has a minor impact and that fully-efficient models are extremely similar to single-stage models in practice, as the correlations between their GEBVs almost always exceed 0.995. In contrast, unweighted models consistently show a lower correlation to the single-stage GEBVs [21, 25,26,27]. Furthermore, using the entire EEV matrix consistently outperformed models that consider only its diagonal [21, 25, 27,28,29], although a direct comparison of these models in terms of their actual predictive ability is still needed.
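The equivalence between the two remedies can be sketched in our own notation (these symbols are ours, not the source's): let \(\bar{y}\) be the vector of stage-one adjusted means, \(\Omega\) their EEV matrix, \(X\beta\) the fixed effects and \(Zg\) the genotypic random effects of the second stage.

```latex
% Weighted regression: the residual covariance is fixed to the EEV matrix
\bar{y} = X\beta + Zg + e, \qquad e \sim N(\varvec{0}, \Omega)
% Rotation: premultiplying by \Omega^{-1/2} restores i.i.d. residuals
\Omega^{-1/2}\bar{y} = \Omega^{-1/2}X\beta + \Omega^{-1/2}Zg + e^{*},
\qquad e^{*} = \Omega^{-1/2}e \sim N(\varvec{0}, I)
```

Since the rotated model is the weighted model premultiplied by a full-rank matrix, its generalized least squares solutions for \(\beta\) and BLUPs for \(g\) coincide with those of the weighted regression, which is the equivalence proved formally in Appendix 1.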

Fully-efficient models have also been applied in other scenarios, such as genome-wide association studies (GWAS) [27] and the modelling of genotype by environment interactions using different covariance structures [26] and with environmental covariates [30]. Moreover, it has also been shown that a fully-efficient three-stage analysis is almost as robust as its two-stage equivalent [21, 27]. For simplicity, this work will focus on two-stage analysis, but we have detailed how our models can be extended to fully-efficient three-stage analysis in Appendix 2.

In our experience, one major disadvantage of fully-efficient models is their inability to simplify reality. Phenotype is influenced by complex terms such as epistasis and genotype by environment interaction, which are often difficult to predict and not always of primary interest [31]. Consequently, models typically exclude terms for these effects, allowing them to be absorbed by the residuals. This absorption requires i.i.d. residuals that are not constrained by a user-defined covariance structure. Fully-efficient models, however, require a weighted regression in which the covariance matrix of the residuals is the EEV of the adjusted means [21]. This requirement prevents the residuals from absorbing unmodeled effects, often leading to significant performance drops, as demonstrated in this study. This limitation probably accounts for the limited adoption of fully-efficient methodologies.

Recently, Endelman [32] proposed a modification in which EEV is included as a random effect, leaving residuals i.i.d. This approach significantly enhances model versatility, enabling a fully-efficient model without explicitly including genotype by environment interactions. While Endelman’s work [32] is noteworthy and represents a significant advancement, he did not extensively discuss the implications of including EEV as a random effect instead of applying it to the residuals. Additionally, there is a lack of literature comparing these two approaches. Therefore, the first aim of this work is to showcase the limitations of Piepho’s methodology [21] in simple models and to compare it to Endelman’s implementation [32].
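In hedged notation of our own (symbols not taken from the source), the two placements of the EEV matrix \(\Omega\) differ only in where it enters the covariance structure of the second-stage model:

```latex
% EEV in the residuals (Full_Res): the residual is fixed to \Omega, leaving
% nothing free to absorb unmodeled effects
\bar{y} = X\beta + Zg + e, \qquad e \sim N(\varvec{0}, \Omega)
% EEV in a random effect (Full_R, Endelman): a dedicated random term carries
% \Omega, while the residual stays i.i.d. and can absorb unmodeled effects
\bar{y} = X\beta + Zg + w + e, \qquad w \sim N(\varvec{0}, \Omega), \quad
e \sim N(\varvec{0}, I\sigma^2_\epsilon)
```

The second formulation is strictly more flexible: if \(\sigma^2_\epsilon\) shrinks toward zero, it approaches the first, but when the model omits real effects the i.i.d. residual provides a safety valve that the first formulation lacks.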

Regarding the practical application of this methodology, the StageWise R package developed by Endelman [32] is powerful and user-friendly. However, it requires ASReml [33], which is not freely available, and it offers limited flexibility for user-defined models in the second-stage. Our aim in this paper is to detail the fully-efficient two-stage model for GS using an open-source alternative, providing benchmark data and detailed code to enhance understanding of fully-efficient two-stage models in GS using freely available software. To this end, we have made available in GitHub several Vignettes (https://github.com/TheRocinante-lab/Publications/tree/main/2024/Fernandez-Gonzalez_et_al_Fully_Efficient). Vignette 0 contains examples of the different ways the fully-efficient models can be implemented in R, while Vignettes 1 through 3 replicate the examples from StageWise package [32] using open-source code.

Results

In this section, we first explore the results of comparing different fully-efficient models in a simulation with different experimental designs and levels of heritability (\(H^2\)). Next, we assess the success of replicating the StageWise package using open-source software.

Simulation results

Figure 1 shows the cross-validation accuracy of the models summarized in Fig. 2, applied to different experimental designs and with or without non-additive effects. In Fig. 1, accuracies were substantially higher for the augmented designs compared to randomized complete block designs (RCBD). For example, with the single-stage (SS) model, the augmented design outperformed RCBD by 8.8% when considering only additive effects and by 7.1% when including non-additive effects. Similarly, incorporating non-additive effects improved SS model’s prediction accuracy by 3.1% in the augmented design and by 4.8% in RCBD. Generally, the SS model performed best across scenarios, followed by the model with EEV in a random effect (Full_R), the diagonal of EEV in a random effect (Diag_R), the unweighted model (UNW), the model with EEV in the residuals (Full_Res), and the diagonal of EEV in the residuals (Diag_Res). More details about these models are available in the Methodology section. Exceptionally, Full_R, Diag_R, and UNW performed better than SS in RCBD with only additive effects. Differences between models were smaller in RCBD designs compared to augmented ones. Full_Res and Diag_Res models tended to perform well when non-additive effects were present, but their performance dropped significantly when considering only additive effects, especially in the augmented design.

Fig. 1
figure 1

Overview of the average accuracy of prediction of the additive breeding values of the test set across the 500 repetitions for the different models in the four scenarios tested within the intermediate heritability simulation (RCBD and augmented RCBD designs, models with or without a random term for non-additive genetic effects). Within each scenario, six models were tested. From left to right, they correspond to (1–4) two-stage models with the full inverse of EEV (“Full”) or only its diagonal elements (“Diag”) in the residuals (“Res”) or a random effect (“R”); (5) unweighted two-stage model (“UNW”); (6) single-stage model (SS). The points correspond to the average accuracy and the error bars refer to the standard error of the mean. The asterisks indicate if there were significant differences between each model and the single-stage one (no asterisk: p > 0.05; one asterisk: 0.01 < p < 0.05; two asterisks: 0.001 < p < 0.01; three asterisks: p < 0.001). Asterisks in black were used when the model performed worse than single-stage and red color indicates that they were better

The main patterns described above are consistent between the intermediate \(H^2\) scenario (Fig. 1) and the other scenarios (Supplementary Figures S1, S2). The most substantial difference is that, for the augmented design with only additive effects, Diag_R, Full_R, and UNW models performed worse than SS in the low and intermediate \(H^2\) scenarios, but better in the high \(H^2\) scenario.

Additionally, the differences between models were larger at lower \(H^2\) values. For instance, in the augmented design with non-additive effects, Full_R outperformed UNW by 2.62% at low \(H^2\), 1.22% at intermediate \(H^2\), and 0.93% at high \(H^2\). In all scenarios, relatively small differences in average accuracy were highly significant. The statistical test performed was a Wilcoxon signed-rank test for matched pairs, as the assumptions for parametric tests were not met. Although the experimental design was not independent across all iterations, the simulated genetic and non-genetic effects were independent. Since the model’s performance primarily depends on these effects, we determined that the assumptions of the Wilcoxon signed-rank test were satisfied. The frequent significance of the observed differences is due to the large amount of replication and the fact that a paired statistical test was performed, which increased statistical power.

All of the above results apply to the accuracy of prediction for an unobserved test set (TS). However, the accuracy of predicting breeding values within the training set (TRS) is also of interest in breeding studies. The results for within-TRS accuracy are presented in Supplementary Figures S3-S5. In general, accuracy was higher within the TRS than in the TS. The relative accuracy across models and scenarios closely mirrored the general patterns described above. Accuracy improved when non-additive effects were included, and augmented designs outperformed RCBD. Among the models, SS consistently performed the best, followed by Full_R, Diag_R, UNW, Full_Res, and Diag_Res.

Table 1 Root mean square error (RMSE) of the estimation of variance components for the different combinations of models and experimental designs tested in the Intermediate Heritability scenario

Next, Table 1 shows the root mean square error (RMSE) in the estimation of variance components. The lowest RMSE values were recorded for the augmented design in models with both additive and non-additive effects, followed by RCBD with similar models, augmented with only additive effects, and RCBD with only additive effects. Generally, models that presented lower accuracy in Fig. 1 tended to have higher errors in the variance estimation as well. For instance, models with EEV in the residuals (Full_Res, Diag_Res) usually had higher RMSE in the variance estimation, especially for augmented designs and models with only additive effects, which coincides with the scenarios where they had lower accuracy. Diag_R, Full_R, UNW, and SS models generally had very similar RMSE values, with a few noteworthy differences. All models overestimated additive variance when non-additive effects were not considered, resulting in higher RMSE values. This trend was more pronounced for the SS model (higher RMSE) than for the others, while Diag_R presented the lowest RMSE in these cases, matching the patterns observed in Fig. 1.

We observed very similar general trends in scenarios with low and high heritability (Supplementary Figures, Tables S1, S2). Generally, the higher the heritability, the larger the RMSE values for genetic variances. This is not due to a worse estimation of the variance, but simply because at higher heritability the genetic variance components are larger, so the same relative error incurs a higher RMSE.

Finally, we also recorded the time required for calibrating the different models. The average time across all iterations and heritability levels is shown in Table 2. The time needed to train the first stage models was approximately 50% longer for augmented designs compared to RCBD. This trend became more pronounced for second-stage models, where the time requirement was 15 to 20 times greater for augmented designs than for RCBD. In contrast, single-stage models showed similar times for both RCBD and augmented designs. The fastest model across all scenarios was generally UNW. Next, the fully-efficient, two-stage models ranged from being slightly faster than UNW to up to 85% slower. The Diag_Res, Full_Res, Diag_R, and Full_R models exhibited broadly similar computational speeds, with minor fluctuations depending on the scenarios. The SS model was consistently the slowest, being an order of magnitude slower than the others for augmented designs, and two orders of magnitude slower for RCBD.

Table 2 Average time (in seconds) required to train each model

StageWise replication with open source software

In this section, we explore whether the genotypic values and variance components estimated by the fully-efficient, two-stage methodology in StageWise [32] (ASReml-based [33]) are equivalent to those obtained using open-source code (lme4, Sommer R packages [34, 35]). We used the same examples employed in the StageWise vignettes (more details in the Methodology section) and summarized the similarity between StageWise and open-source alternatives in Table 3.

Table 3 Comparison of the results obtained with StageWise package (ASReml-based) and with the open-source software (lme4 + Sommer)

The first vignette encompasses simple models focusing only on main genotypic effects. In this context, the first stage model fitted using lme4 yielded the same genotypic best linear unbiased estimators (BLUEs) and associated EEV as ASReml. In the subsequent second-stage, the genotypic best linear unbiased predictors (BLUPs) obtained with Sommer and ASReml were extremely similar, with a correlation ranging from 0.99 to 1 (see Table 3). The differences between the estimated variance components were no larger than rounding errors.

In the second vignette, which explores genotype by environment interactions, the first-stage models were again identical in ASReml and open-source software. In the second-stage models, the genotypic values within the observed locations were very similar, with a correlation ranging from 0.95 to 0.99 (see Table 3). In contrast, when making predictions for unobserved locations, the predictions from Sommer and ASReml were completely different, showing a correlation of only 0.05.

In the third vignette, multi-trait models are introduced. The complex variance-covariance structures involved in all stages make it impractical to use lme4, so we used Sommer for stage 1 instead. The stage 1 BLUEs and EEV were mostly similar between Sommer and ASReml, with a correlation value of 0.98. In stage 2 models, the estimated variance components and correlation between traits were very similar. The only exception was the residual variance component, which tended to be larger in Sommer than in ASReml. The genotypic values of observed traits were almost identical, with a correlation between 0.98 and 0.99 (see Table 3). When unobserved traits were predicted based on observed secondary traits and the correlation between traits estimated in a previous model, the differences between Sommer and ASReml became larger, but still, their results remained closely aligned, with a correlation between 0.93 and 0.94.

Discussion

In the literature, fully-efficient models have been shown to yield very similar predictions to the SS model [21, 25, 26, 28, 29]. However, this does not provide a complete perspective, as it does not directly reflect the impact on accuracy and genetic gain, and it is difficult to interpret.

As far as we know, only [27] has compared the accuracy of fully-efficient, UNW and SS models, but this was limited to a single empirical maize dataset. Additionally, much attention has been paid to the differences between weighting with the full EEV matrix or only with its main diagonal [21, 25,26,27,28,29]. However, the arguably more important question of whether the EEV should be included in the random effects or in the residuals has not been addressed. In this work, we used simulations to directly study the impact of the fully-efficient methodology on prediction accuracy using four different implementations in four different scenarios for three degrees of heritability. This approach enables us to obtain a general picture of when the fully-efficient methodology is most effective, and which implementation should be used. We also replicated the ASReml-based StageWise package using open-source software, enabling the widespread application of this methodology without the need for an expensive license.

Simulation results

In this section, we interpret the results of the model comparison in the simulated datasets (Fig. 1). First, models that included non-additive effects performed better than those with only additive ones. This outcome was expected because the simulation included substantial non-additive effects. Thus, they were necessary for the model to accurately reflect reality. Similarly, augmented designs outperformed RCBD by a large margin. This finding aligns with previous studies [36,37,38,39,40] and demonstrates that increasing the size of the training set is more important than increasing replication for maximizing GS accuracy on an unobserved test set. Therefore, especially in the early stages of breeding programs, replication should be kept to a minimum that provides enough degrees of freedom to fit the desired models.

Regarding the performance of the different types of models, the SS model was almost universally the best. The only exception was the RCBD scenario without non-additive effects (Fig. 1). This discrepancy may be related to the inflation of additive variance in the absence of non-additive effects (Table 1). While this inflation occurs with all models, it is more pronounced for the SS model than for UNW, Full_R or Diag_R. Considering all effects in a single-stage might allow the SS model to capture as much variance as possible through the additive effect, which is usually beneficial. However, this approach results in an overly inflated additive variance component when the model does not match reality due to the lack of non-additive effects. Additionally, the SS model might have more difficulty converging than the equivalent two-step models because it has to estimate more parameters in a single step.

Considering only additive effects had an even greater detrimental influence on the models with EEV in the residuals (Full_Res and Diag_Res). In this situation, the non-additive effects present in the simulated dataset had to be pushed into the additive effect or the residuals. Because these residuals are constrained to follow the variance-covariance in EEV (which only reflects the stage 1 estimation error), these models tended to break down. This issue is exemplified by the extreme RMSE they presented for the estimation of additive variance in the augmented design (Table 1). In RCBD, this pattern was less pronounced because EEV is substantially less informative for RCBD than for augmented designs [15], allowing more non-additive variance to be pushed into the residuals instead of the additive effect. This highlights the risks of using EEV as the variance-covariance of the residuals, as it may cause major disruptions in the model when it does not fully represent reality, which is relatively frequent when working with complex empirical datasets.

Finally, considering the full EEV matrix was often slightly better than using only its main diagonal (Fig. 1), which again was expected due to the loss of information incurred when discarding the off-diagonal elements of EEV and aligns with the findings in [21, 25,26,27,28,29]. However, this effect was relatively small, suggesting that using only the diagonal of EEV to increase computational efficiency is a viable strategy. We recommend this approach only if it is strictly necessary.

Regarding computational time (Table 2), the SS model was the slowest, as expected. This is likely due to the cost of performing spatial analysis across all 5 environments simultaneously. Since the spatial analysis depends on the number of plots, not genotypes, the computational time for SS was similar for both RCBD and augmented designs. Given that matrix operations have cubic time complexity and the SS model performs spatial analysis across 5 environments, it is expected to be \(5^3=125\) times slower than first-stage models. This is consistent with the results in Table 2. For second-stage models, computational time is primarily influenced by the number of genotypes. As the augmented design includes 2.6 times more genotypes than RCBD, we expect second-stage models to be approximately \(2.6^3 = 17.576\) times slower for augmented designs, which aligns with the observed results. The SS model remains slower than two-stage models because the number of plots is equal to or greater than the number of training set genotypes, making SS impractical for large datasets. While a simpler spatial analysis (e.g., block effects) could reduce time, this would likely compromise accuracy more than switching to a two-stage model would. Among the two-stage models, UNW was the fastest, while fully-efficient models were 0 to 85% slower, a reasonable difference. Models using the full EEV matrix showed minimal speed differences compared to those using only its diagonal, as the “mmer()” solver from the Sommer package does not leverage sparsity. Even when sparsity is exploited, fully-efficient models cannot be faster than UNW, so the time requirements of the full EEV matrix and of its diagonal are of the same order of magnitude. Furthermore, even the full EEV matrix is very sparse, as all covariances between different environments are zero.
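The back-of-envelope scaling argument above can be checked numerically. The following sketch is illustrative only (the paper's own code is in R); the `relative_cost` helper is a hypothetical name, and the cubic-cost assumption is the one stated in the text.

```python
# Back-of-envelope check of the cubic-complexity scaling argument.
# Assumption: solver cost grows as n^3 in the dominant problem dimension.

def relative_cost(n_new: float, n_old: float) -> float:
    """Expected slowdown when the dominant dimension grows from n_old to n_new."""
    return (n_new / n_old) ** 3

# Single-stage spatial analysis over 5 environments vs a single environment:
ss_vs_stage1 = relative_cost(5, 1)    # 125x, matching the text
# Second stage: the augmented design has ~2.6x more genotypes than RCBD:
aug_vs_rcbd = relative_cost(2.6, 1)   # ~17.6x, matching the text
print(ss_vs_stage1, round(aug_vs_rcbd, 3))
```

These ratios line up with the order-of-magnitude differences reported in Table 2.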
Overall, the Full_R model strikes the best balance between performance (nearly as good as SS) and speed (much faster than SS and only slightly slower than other two-stage alternatives).

In summary, the best scenario was the augmented design with non-additive effects considered in the models. This scenario also presented the largest improvements in accuracy from SS or fully-efficient models over UNW. Therefore, there is a favourable synergy between the best design (augmented designs) and the best model strategies. Since EEV is more informative for augmented designs [15], fully-efficient modelling allows for further increases in accuracy when moving from RCBD to augmented designs. As a result, we recommend SS if computational time allows for it and Full_R (the two-stage model with the best accuracy) otherwise. Full_Res and Diag_Res should be avoided when possible, as they are risky when the model does not match reality.

The strong performance of the augmented design for predicting an unobserved TS was expected, as testing more genotypes enlarges the TRS, leading to better predictions [41, 42]. However, we anticipated RCBD would outperform augmented designs for within-TRS predictions due to increased replication improving reliability [43,44,45]. Surprisingly, Supplementary Figures S3-S5 show that within-TRS accuracy in augmented designs was comparable to RCBD in the low heritability scenario and up to 10% better in high heritability. We hypothesize that this is due to: i) the larger TRS, ii) the genomic relationship matrix allowing genotypes to “borrow” information from one another, acting as pseudo-replication [46], iii) unreplicated genotypes within locations being fully replicated across locations, improving reliability, and iv) the simplicity of non-additive effects in the simulation, reducing the need for replication, especially at high heritability.

In addition, augmented designs offer additional benefits: the greater number of genotypes in the TRS may increase the genetic variance represented and allow for higher selection intensity without reducing the number of selected genotypes. These factors, along with improved accuracy, directly enhance the response to selection, as evident from the breeder’s equation [47].

In light of these results, adopting sparser designs with minimal replication within each environment may be beneficial. Fitting stage 1 models with fixed genotypic effects becomes challenging under such designs, but using random effects and deregressing the BLUPs, as demonstrated by Galli 2018 [48] and Verbyla 2023 [29], can perform similarly to fixed effects. However, this approach increases the complexity of the analysis, and extremely low replication may negatively impact the accuracy of estimating spatial trends within the field. This potential drawback could offset any accuracy gains from increased sparsity. Furthermore, to our knowledge, stage 1 deregressed BLUPs have only been used with residual weighting in stage 2 [29, 48], and their impact when \(EEV_{all}\) is included as a random effect rather than in the residuals has not been investigated. Further research is needed to explore this.

To exemplify the impact that upgrading to a fully-efficient methodology may have on genetic gain for a low heritability trait such as yield, Full_R presented an accuracy 2.62% greater than UNW for the augmented design with non-additive effects. All other things being equal, the response to selection over time is directly proportional to accuracy according to the breeder’s equation [47]. Therefore, in each selection cycle, the genetic gain would be 2.62% larger with the Full_R model than with the UNW model. This increase compounds over time: after five cycles, an improvement of \(1.0262^5 = 1.1380\) is expected. Thus, simply moving from the UNW to the Full_R model would increase the genetic gain by a very substantial 13.80% after only five cycles of selection.
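The compounding calculation generalizes to any per-cycle accuracy advantage. The snippet below is a minimal illustration in Python (the paper's code is R); the function name is our own.

```python
# Compounded genetic gain from a per-cycle accuracy advantage, using the fact
# that response to selection is proportional to accuracy (breeder's equation),
# all else being equal.

def compounded_gain(per_cycle_advantage: float, n_cycles: int) -> float:
    """Relative genetic gain of the better model after n_cycles of selection."""
    return (1.0 + per_cycle_advantage) ** n_cycles

# 2.62% per-cycle advantage of Full_R over UNW, compounded over five cycles:
gain = compounded_gain(0.0262, 5)
print(round((gain - 1.0) * 100, 2))  # -> 13.8
```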

StageWise replication with open source software

Sommer and ASReml models were equivalent in most circumstances (Table 3) with the noteworthy exception of FA2 for the modelling of genotype by environment interactions. This discrepancy arises because Sommer and ASReml use completely different implementations of the factor analysis methodology, as detailed in Appendix 4. Both methods yielded similar BLUPs within observed environments, but the estimated correlations between environments were completely different. This difference significantly impacted the prediction of unobserved locations, leading to the divergent results highlighted in Table 3. In multi-trait analyses, Sommer and ASReml also did not perform identically, contrary to our expectations. We hypothesize that, as model complexity increases without a correspondingly larger dataset, convergence becomes more difficult, causing Sommer and ASReml to diverge. Based on these observations, we can summarize the comparison between modeling software in Table 4.

Table 4 Equivalence between ASReml and open-source alternatives

The package lme4 is roughly as computationally efficient as ASReml, but it cannot handle complex covariance structures, while Sommer can. The main difference between Sommer and ASReml is computational time. In our experience, Sommer models were usually substantially slower, which could be a problem for very large datasets. This issue is partially mitigated by the presence of two Sommer solvers: mmer and mmec. The function mmer is faster if \(c > r\) and mmec is preferable when \(c < r\), where r is the number of records in the data and c is the number of coefficients to estimate by the model [35]. Furthermore, mmer uses dense matrix operations while mmec leverages sparsity. Using the most suitable solver can substantially reduce computational time, but it was never as fast as ASReml or lme4. This contrasts with the comparisons made by Covarrubias-Pazaran [35], in which Sommer was faster than ASReml. This inconsistency can be explained by the fact that both packages have been substantially updated since then, which may have altered their relative performance.

In summary, lme4 and Sommer can serve as an equivalent alternative to ASReml in most situations. However, Sommer may be non-viable for extremely large datasets, where ASReml would be required. For dataset sizes commonly used in research articles, Sommer offers good performance. The code in Vignette 0, available on GitHub (https://github.com/TheRocinante-lab/Publications/tree/main/2024/Fernandez-Gonzalez_et_al_Fully_Efficient), outlines the process needed to implement the fully-efficient methodology in Sommer across several scenarios, paving the way for wider adoption of this methodology.

Conclusions

The fully-efficient, two-stage methodology performs very similarly to UNW for RCBD, but it is substantially better for augmented designs. Therefore, we expect the fully-efficient methodology to become more prevalent in the future, as GS has increased the appeal of augmented designs due to their ability to increase model accuracy. This is achieved by testing a greater number of genotypes while keeping field size constant, resulting in a cost similar to RCBD.

In this work, we have compared the different implementations of fully-efficient models across a wide range of scenarios. As a result, if SS models are not an option due to computational time, the Full_R model is recommended for its strong performance, reasonable computational cost and high consistency across scenarios. Furthermore, we have provided all the necessary details about the theoretical background of fully-efficient models, as well as the R code to implement them in open-source software equivalent to the commercial ASReml, albeit with reduced computational efficiency. Therefore, this work combines a showcase of the importance of fully-efficient models with insights into their best implementation and provides all the tools needed for a wider adoption of this methodology.

Methods

Simulation

In this work, we used simulated datasets to compare the performance of the different models. The simulation was based on empirical genotypic information from an oat dataset [49], which included 699 lines and 17,288 SNP markers per line after quality control. We simulated the following effects from multivariate normal distributions:

$$\begin{aligned} y_{ijk} = \mu + A_i + Gres_i + E_j + P_{jk} + \epsilon _{ijk} \end{aligned}$$
(1)

where \(y_{ijk}\) is the simulated phenotype of genotype i in plot k of environment j, comprised of a fixed intercept \(\mu\); an additive genotypic effect (true breeding value) \(A_i \sim N(\varvec{0}, G_{str}\sigma ^2_a)\); a non-additive genotypic effect \(Gres_i \sim N(\varvec{0}, I\sigma ^2_{Gres})\), which can be viewed as a genetic residual comprising all genetic effects not captured by \(A_i\); an environmental effect \(E_j \sim N(\varvec{0}, I\sigma ^2_e)\); a plot spatial effect \(P_{jk} \sim N(\varvec{0}, AR1xAR1\sigma ^2_p)\); and a residual (nugget) containing the true error (e.g. measurement errors) \(\epsilon _{ijk} \sim N(\varvec{0}, I\sigma ^2_\epsilon )\). \(G_{str}\) refers to the VanRaden [2] additive genomic relationship matrix, calculated using the “G.matrix” function of the snpReady R package, version 0.9.6 [50]. The subscript str indicates that it is a variance-covariance structure (see Appendix 3 for a detailed discussion). I refers to an identity matrix of the needed dimensions. AR1xAR1 refers to a correlation matrix from a two-dimensional, separable autoregressive process of order one with an autocorrelation coefficient (\(\rho\)) of 0.7 for both rows and columns. \(\sigma ^2_a\), \(\sigma ^2_{Gres} = \frac{2}{3}\sigma ^2_a\), \(\sigma ^2_e = 1\cdot n_j\), \(\sigma ^2_p = \frac{1}{2} n_j n_k\) and \(\sigma ^2_\epsilon = \frac{1}{5} n_j n_k\) are the variance components of their corresponding effects. \(n_j = 5\) is the number of environments simulated and \(n_k = 3\) is the number of blocks per environment. Each block contained 100 plots. The value of \(\sigma ^2_a\) was tuned as needed to obtain the target heritability values of the three scenarios tested: a low heritability scenario with a broad sense heritability (\(H^2\)) of 0.3 and a narrow sense heritability (\(h^2\)) of 0.18; an intermediate heritability scenario with \(H^2 = 0.5\) and \(h^2 = 0.3\); and a high heritability scenario with \(H^2 = 0.7\) and \(h^2 = 0.42\).
Heritability was calculated as follows:

$$\begin{aligned} H^2 = \frac{\sigma ^2_a + \sigma ^2_{Gres}}{\sigma ^2_a + \sigma ^2_{Gres} + \frac{\sigma ^2_e}{n_j} + \frac{\sigma ^2_p}{n_j n_k} + \frac{\sigma ^2_\epsilon }{n_j n_k}}; h^2 = \frac{\sigma ^2_a}{\sigma ^2_a + \sigma ^2_{Gres} + \frac{\sigma ^2_e}{n_j} + \frac{\sigma ^2_p}{n_j n_k} + \frac{\sigma ^2_\epsilon }{n_j n_k}} \end{aligned}$$
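These relationships can be checked numerically. The sketch below (in Python for illustration; the study's own code is in R) solves for the \(\sigma ^2_a\) that yields a target \(H^2\) under the variance components stated above and verifies the implied \(h^2\):

```python
# Sketch: solve for sigma2_a that achieves a target broad-sense heritability H2,
# then verify the implied narrow-sense heritability h2 (values from the simulation).
n_j, n_k = 5, 3                            # environments, blocks per environment

def heritabilities(sigma2_a):
    sigma2_gres = (2 / 3) * sigma2_a       # non-additive genetic variance
    sigma2_e = 1.0 * n_j                   # environmental variance
    sigma2_p = 0.5 * n_j * n_k             # plot spatial variance
    sigma2_eps = 0.2 * n_j * n_k           # nugget (true error) variance
    # Non-genetic variance of a genotype mean over n_j environments, n_k blocks
    noise = sigma2_e / n_j + sigma2_p / (n_j * n_k) + sigma2_eps / (n_j * n_k)
    denom = sigma2_a + sigma2_gres + noise
    return (sigma2_a + sigma2_gres) / denom, sigma2_a / denom   # H2, h2

def sigma2_a_for(H2_target):
    # Invert H2 = (5/3)*sigma2_a / ((5/3)*sigma2_a + noise); here noise = 1 + 0.5 + 0.2
    noise = 1.0 + 0.5 + 0.2
    return (3 / 5) * H2_target * noise / (1 - H2_target)

for H2_target in (0.3, 0.5, 0.7):
    H2, h2 = heritabilities(sigma2_a_for(H2_target))
    print(f"H2 = {H2:.2f}, h2 = {h2:.2f}")   # reproduces 0.3/0.18, 0.5/0.3, 0.7/0.42
```

Because \(\sigma ^2_{Gres} = \frac{2}{3}\sigma ^2_a\), the two heritabilities are locked in the fixed ratio \(h^2 = 0.6\,H^2\), which is why the three scenarios share it.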

Within each heritability scenario, we tested two experimental designs: a randomized complete block design (RCBD) and an augmented RCBD. In the RCBD, 100 genotypes were randomly sampled and fully replicated across blocks and environments, with their positions randomized within each block. In the augmented design, 20 random genotypes were used as checks and fully replicated across environments and blocks. Additionally, 240 randomly selected genotypes appeared only once per environment but were present in all environments. The position of each genotype within a block was also randomized.
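Under the stated layout (three blocks of 100 plots per environment), both designs fill the field exactly, but the augmented design tests far more genotypes. A small bookkeeping sketch (Python, purely illustrative) makes this explicit:

```python
# Sketch: plot and genotype counts per environment for the two simulated designs.
blocks_per_env, plots_per_block = 3, 100
field_capacity = blocks_per_env * plots_per_block        # 300 plots per environment

# RCBD: 100 genotypes, each appearing once in every block
rcbd_genotypes = 100
rcbd_plots = rcbd_genotypes * blocks_per_env             # 300 plots

# Augmented RCBD: 20 checks in every block + 240 genotypes appearing once per environment
checks, unreplicated = 20, 240
aug_plots = checks * blocks_per_env + unreplicated       # 60 + 240 = 300 plots
aug_genotypes = checks + unreplicated                    # 260 genotypes tested

print(rcbd_plots, aug_plots, aug_genotypes)
```

Both designs occupy the same 300 plots per environment, yet the augmented design evaluates 260 genotypes instead of 100, which is what makes it attractive for GS.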

For both designs, we performed 50 repetitions of the random selection of genotypes and their distribution in the field. For each of these field layouts, we conducted 10 repetitions of the phenotypic simulation (Eq. 1), resulting in a total of 500 repetitions per design. The simulated phenotypes served as a training set (TRS) for calibrating the models. The remaining non-phenotyped genotypes were used as a test set (TS) to validate the model’s predictions. Model accuracy was computed as the correlation between the true breeding values and the GEBVs predicted by the model.

Models

In this section, we will explain the models tested in each simulation scenario. A summary is available in Fig. 2. All models were fitted using the Sommer R package, version 4.3.1 [35].

Fig. 2
figure 2

Summary of the different types of models tested in this work, fitted to multi-environment trials with five environments: E1 through E5. The single-stage model is on top, with the different types of two-stage models below it. The stage 1 model is common to all types of two-stage models, which differ only in how the stage 2 model accounts for the stage 1 estimation error variance-covariance (\(EEV_{all}\)). UNW: unweighted model, does not account for \(EEV_{all}\); Full_R: a model that includes \(EEV_{all}\) as the covariance matrix of a random effect (i.e., \(cov(\varvec{u_s}) = EEV_{all}\)); Diag_R: a model that includes only the diagonal information of \(EEV_{all}\) as the variance of a random effect (i.e., \(cov(\varvec{u_s}) = diag(EEV_{all}^{-1})^{-1}\)); Full_Res: a model that includes \(EEV_{all}\) as the covariance matrix of the residuals (weighted regression, \(cov(\varvec{\epsilon }) = EEV_{all}\)); Diag_Res: a model that includes only the diagonal information of \(EEV_{all}\) as the variance of the residuals (weighted regression, \(cov(\varvec{\epsilon }) = diag(EEV_{all}^{-1})^{-1}\)). Independent stage 1 models were fitted for each environment and their outputs were combined and used as input for each stage 2 model. Similar effects across models have been aligned in the same column and highlighted in different colors. The term for the non-additive genetic effects (\(Z_{Gres}\varvec{u_{Gres}}\)) is shown in brackets to indicate that all models were tested with and without it

Single-stage model

Modelling all effects in a single stage (SS) is considered the best approach, but it has a larger computational cost than the two-stage alternatives, which can make it unfeasible for industrial-scale datasets. The model used in this work is the following:

$$\begin{aligned} \varvec{y} = \varvec{1}\mu + Z_e \varvec{u_e} + Z_a \varvec{u_a} + (Z_{Gres} \varvec{u_{Gres}}) + Z_p \varvec{u_p} + \varvec{\epsilon } \end{aligned}$$
(2)

where \(\varvec{y}\) is a vector of simulated phenotypes. \(\mu\) is the fixed intercept. \(\varvec{u_e} \sim N(\varvec{0}, I\sigma ^2_e)\) is a vector of BLUPs for the main environmental effects. \(\varvec{u_a} \sim N(\varvec{0}, G_{str}\sigma ^2_a)\) is a vector of additive genotypic BLUPs. \(\varvec{u_{Gres}} \sim N(\varvec{0}, I\sigma ^2_{Gres})\) is a vector of non-additive genotypic BLUPs. \(\varvec{u_p}\) is a vector of random spatial effects following the 2D P-splines framework [35, 51,52,53] and nested within the environment. \(\varvec{\epsilon } \sim N(\varvec{0}, I\sigma ^2_\epsilon )\) is a vector of residuals. \(\varvec{1}\) is a vector of ones, i.e. the design matrix of the intercept. \(Z_e\), \(Z_a\), \(Z_{Gres}\) and \(Z_p\) are the design matrices of their corresponding effects and \(G_{str}\) is the additive genomic relationship matrix [2]. One important consideration about the residuals in this model is that they only contain the nugget (true error) from the simulation, which was specified to have the same variance in all environments. Therefore, we can use homogeneous residuals across environments for this model, but heterogeneous residuals are common when working with empirical datasets.

\(Z_{Gres}\varvec{u_{Gres}}\) is between brackets to indicate that the model was tested with and without this effect. When the \(Z_{Gres}\varvec{u_{Gres}}\) effect is included, this model has an explicit random effect for each of the components that influence the phenotype in the simulation, and therefore it perfectly matches the simulated data-generating process. In contrast, without this effect, the model is a simplification of reality.

Two-stage models

We tested five different kinds of two-stage models: the unweighted (UNW) model and four implementations that rely on EEV. The stage 1 model is the same for all of them: it simply removes the spatial effects within each environment to obtain adjusted genotypic means that can be used as the response variable for the next stage. In the second stage, these adjusted genotypic means are split into the environmental effect and the different genotypic effects. The only difference between the five kinds of stage 2 models tested is whether and how information about the reliability of the estimation of the adjusted genotypic means (EEV) is considered. The classical fully-efficient methodology employs EEV as the covariance matrix of the residuals [21], while it has recently been proposed that EEV can instead be treated as the covariance of a random effect [32]. Furthermore, the full EEV can be used [21, 32] or we can focus only on its main diagonal [15]. We aim to compare these different approaches in this work.


Stage 1 Model

A different stage 1 model is fit to each environment independently:

$$\begin{aligned} \varvec{y_j} = X_j \varvec{\beta _j} + Z_{p_j} \varvec{u_{p_j}} + \varvec{\epsilon _j} \end{aligned}$$
(3)

where \(\varvec{y_j}\) contains the phenotypic observations in environment j; \(\varvec{\beta _j}\) is a vector of fixed genotypic effects and \(X_j\) is its corresponding design matrix; \(\varvec{u_{p_j}}\) is a vector of random spatial effects following the 2D P-splines framework [35, 51,52,53] and \(Z_{p_j}\) is its corresponding design matrix; \(\varvec{\epsilon _j} \sim N(\varvec{0}, I\sigma ^2_{\epsilon _j})\) is a vector of residuals. I is the identity matrix of appropriate dimensions and \(\sigma ^2_{\epsilon _j}\) is the residual variance component.

From the model in Eq. 3, we can obtain BLUEs for the fixed genotypic effects (\(\varvec{\hat{\beta }_j}\)), which can be regarded as adjusted genotypic means free of the spatial effects of the field. We can also calculate their associated variance-covariance matrix, which can be used as a matrix of weights. Both will be carried forward to the second stage. For the sake of clarity, we will refer to the variance-covariance matrix associated with the stage 1 genotypic BLUEs as the estimation error variance-covariance matrix (EEV), analogous to the commonly used prediction error variance-covariance matrix (PEV) of random effect BLUPs.

\(EEV = var(\hat{\beta }_i - \beta _i) = var(\hat{\beta }_i)\), where \(var(\hat{\beta }_i)\) is the sampling variance of the \(i^{th}\) BLUE (i.e. its squared standard error) as described in [18]. It is important to note that \(var(\hat{\beta }_i - \beta _i) = var(\hat{\beta }_i)\) because, for fixed effects, the true \(\beta _i\) is assumed to be a constant, and subtracting a constant from the estimated \(\hat{\beta }_i\) does not alter the variance of the distribution. Confusingly, the nomenclature for EEV in the literature is not uniform, with different names sometimes appearing even within a single paper. Its most common names are variance-covariance matrix of the adjusted means [21, 25,26,27,28, 30], variance-covariance matrix of the errors [27, 30], and variance-covariance matrix of the stage 1 genotypic effects [26, 32].

We believe that naming it EEV makes it easier to differentiate from other variance-covariance matrices involved in mixed models, and it is a more intuitive concept due to its resemblance to the analogous and more widespread PEV. To interpret the EEV values, it is important to note that large error variances are associated with low reliability in the estimation of the BLUEs. In other words, if we repeated a similar field trial many times, we would expect very different BLUEs for the same genotype across repetitions. The opposite happens for BLUEs with small error variance. Therefore, BLUEs with small error variance should be “trusted” more during the calibration of the second-stage models. For readers not familiar with the EEV concept, we strongly recommend reading Appendix 5. There, we have included an overview of how EEV is calculated and explained its intuitive meaning in detail using three toy examples, reflecting how EEV captures extensive information about the experimental design and the choice of stage 1 model.
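For intuition, consider a purely fixed-effects stage 1 fitted by ordinary least squares (a deliberate simplification of Eq. 3 that drops the spatial term). The BLUEs and EEV then have the closed forms \(\hat{\beta } = (X'X)^{-1}X'y\) and \(EEV = \sigma ^2_\epsilon (X'X)^{-1}\). The numpy sketch below (illustrative only; in the real stage 1 model the random spatial effects also shape EEV) shows that a genotype with fewer replicates receives a larger estimation error variance:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy stage 1: 3 genotypes; the first two have 3 replicates, the third only 1
geno = np.array([0, 0, 0, 1, 1, 1, 2])
X = np.eye(3)[geno]                       # cell-means design matrix (one column per genotype)
beta_true = np.array([10.0, 12.0, 11.0])
sigma2_eps = 2.0
y = X @ beta_true + rng.normal(0, np.sqrt(sigma2_eps), size=len(geno))

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y              # BLUEs (here, simply per-genotype means)
EEV = sigma2_eps * XtX_inv                # estimation error variance-covariance

print(np.round(np.diag(EEV), 3))          # [0.667 0.667 2.   ]: fewer reps -> larger EEV
```

In this balanced fixed-effects toy case the EEV is diagonal; off-diagonal entries appear once random effects (e.g. blocks or spatial terms) connect the genotypic estimates.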

Before proceeding to the second-stage model, we need to merge the data from the first-stage models for each one of the j environments by stacking their BLUEs vectors and performing the direct sum of their EEV matrices:

$$\begin{aligned} \varvec{\hat{\beta }_{all}} = \begin{pmatrix} \varvec{\hat{\beta }_{1}} \\ \varvec{\hat{\beta }_{2}} \\ \vdots \\ \varvec{\hat{\beta }_{j}} \end{pmatrix} \end{aligned}$$
(4)
$$\begin{aligned} EEV_{all} = \begin{pmatrix} EEV_{1} & 0 & \ldots & 0 \\ 0 & EEV_{2} & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & EEV_{j} \\ \end{pmatrix} \end{aligned}$$
(5)

\(\varvec{\hat{\beta }_{all}}\) and \(EEV_{all}\) are the inputs needed to calibrate the stage 2 models. It is important to note that, in the stage 1 model, we followed the standard approach of setting the genotypic effects as fixed to obtain unbiased estimates and avoid double shrinkage. However, if replication of genotypes within the field is low or non-existent, fitting a fixed-effects model may be impossible. In such cases, using random genotypic effects and deregressing them [54], as demonstrated by Galli et al. [48] and Verbyla [29], can provide unbiased estimates. This approach performs similarly to using BLUEs in stage 1, with the advantage of accommodating sparser designs and the disadvantage of a more complex modelling process [29].
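The merging step in Eqs. 4 and 5 amounts to concatenating the BLUE vectors and building a block-diagonal (direct sum) matrix. A toy numpy sketch with two environments of two genotypes each:

```python
import numpy as np

# Per-environment stage 1 outputs (toy values): BLUE vectors and EEV matrices
beta_1, beta_2 = np.array([10.0, 12.0]), np.array([11.0, 12.5])
EEV_1 = np.array([[0.5, 0.1], [0.1, 0.5]])
EEV_2 = np.array([[0.8, 0.2], [0.2, 0.8]])

# Stacked BLUEs (Eq. 4) and direct sum of EEVs (Eq. 5)
beta_all = np.concatenate([beta_1, beta_2])
EEV_all = np.zeros((4, 4))
EEV_all[:2, :2], EEV_all[2:, 2:] = EEV_1, EEV_2   # off-diagonal blocks stay zero:
                                                  # stage 1 errors are independent across environments
print(EEV_all)
```

The zero off-diagonal blocks encode the assumption that estimation errors from different environments are independent, which holds because each stage 1 model was fitted separately.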


Stage 2 Models


Unweighted second-stage

This model (UNW) uses \(\varvec{\hat{\beta }_{all}}\) as the response variable but does not consider \(EEV_{all}\), therefore giving the same weight to all observations regardless of how confident we are in the quality of their estimation:

$$\begin{aligned} \varvec{\hat{\beta }_{all}} = \varvec{1}\mu + Z_e \varvec{u_e} + Z_a \varvec{u_a} + (Z_{Gres} \varvec{u_{Gres}}) + \varvec{\epsilon } \end{aligned}$$
(6)

where the \(\varvec{\hat{\beta }_{all}}\) genotypic BLUEs from the stage 1 models are the response variable and all other terms are the same as the effects with the same name in the SS model (Eq. 2).


Fully-efficient second-stage, full EEV in residuals

This model (Full_Res) uses the same equation as the UNW second-stage (Eq. 6), but now EEV is included as the variance-covariance matrix of the residuals: \(\varvec{\epsilon } \sim N(\varvec{0}, EEV_{all})\). This approach is commonly called weighted regression because it gives more weight to observations with lower error variance for the estimation of all non-residual terms. It is important to note that, when the \(Z_{Gres} \varvec{u_{Gres}}\) effect is included in the model, all possible sources of variation (Eq. 1) are explicitly stated in the model. As a result, the residuals will only contain the true error, which is well described by EEV. However, when the \(Z_{Gres} \varvec{u_{Gres}}\) effect is not present, the non-additive genotypic effects will have to be pushed into other terms, which can inflate the additive BLUPs and the residuals. Consequently, the residuals will no longer contain only the true error and they will no longer be well described by EEV. Therefore, we expect this model to perform poorly when the \(Z_{Gres} \varvec{u_{Gres}}\) term is not included.

Finally, it is important to note that there is no variance component in the residual term of Full_Res model. This is because \(EEV_{all}\) is already the result of the product of a variance-covariance structure and a variance component estimated in the stage 1 model. In other words, the amount of variation caused by the true error is already known, and there is no need for the model to estimate it. For more details and an explanation of how to extract a variance component from \(EEV_{all}\), see Appendix 3.
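The weighting mechanics of this approach follow from the generalized least squares estimator \(\hat{\beta } = (X'V^{-1}X)^{-1}X'V^{-1}y\) with a known residual covariance \(V\) (here playing the role of \(EEV_{all}\)). The toy numpy sketch below, not the full Full_Res model, estimates a common mean from two observations with very different error variances:

```python
import numpy as np

# GLS with a known residual covariance V: beta_hat = (X' V^-1 X)^-1 X' V^-1 y
y = np.array([10.0, 20.0])
X = np.ones((2, 1))                    # both observations measure the same mean
V = np.diag([1.0, 100.0])              # second observation has a much larger error variance

Vinv = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(beta_hat)                        # close to 10: the precise observation dominates
```

The estimate lands near the low-variance observation, which is exactly the behaviour weighted regression is meant to produce; note also that no residual variance component is estimated, because the scale of V is taken as known.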


Fully-efficient second-stage, diagonal EEV in residuals

This model (Diag_Res) is similar to the Full_Res model (Eq. 6), with the only exception that only the main diagonal elements of EEV are considered in the weighted regression. This approach allows for faster computations at the expense of losing some information. It is important to note that, in mixed models, the residual variance-covariance matrix is not used directly. Instead, its inverse is of interest. Therefore, to minimize the loss of information, the off-diagonals of EEV should be discarded only after inverting it, i.e. \(R^{-1} = diag(EEV_{all}^{-1})\), where R is the generic name for the residual variance-covariance matrix and diag() refers to selecting only the main diagonal elements of a matrix (and turning the rest into zero). This can be rearranged as: \((R^{-1})^{-1} = R = diag(EEV_{all}^{-1})^{-1}\). Therefore, \(\varvec{\epsilon } \sim N(\varvec{0}, diag(EEV_{all}^{-1})^{-1})\). We expect this model to have similar problems as Full_Res when the \(Z_{Gres} \varvec{u_{Gres}}\) term is not included in the model.
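The order of operations matters here: keeping the diagonal of \(EEV_{all}^{-1}\) and re-inverting is not the same as keeping the diagonal of \(EEV_{all}\) whenever the off-diagonals are non-zero, as this small numpy check (illustrative toy values) shows:

```python
import numpy as np

EEV = np.array([[1.0, 0.6],
                [0.6, 1.0]])           # toy EEV with correlated estimation errors

# Diag_Res / Diag_R approximation: discard off-diagonals AFTER inverting
R = np.linalg.inv(np.diag(np.diag(np.linalg.inv(EEV))))

print(np.diag(R))                      # [0.64 0.64]: differs from diag(EEV) = [1, 1]
```

When EEV is already diagonal the two operations coincide, so the approximation is exact only in that case.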

Fully-efficient second-stage, full EEV in random effect

This model (Full_R) includes \(EEV_{all}\) as the variance-covariance matrix of a random effect, as suggested by Endelman [32], and leaves the residuals independent and identically distributed (i.i.d.):

$$\begin{aligned} \varvec{\hat{\beta }_{all}} = \varvec{1}\mu + Z_e \varvec{u_e} + Z_a \varvec{u_a} + (Z_{Gres} \varvec{u_{Gres}}) + Z_s \varvec{u_s} + \varvec{\epsilon } \end{aligned}$$
(7)

The same nomenclature as in Eq. 6 has been used where applicable. \(\varvec{u_s} \sim N(\varvec{0}, EEV_{all})\) is the random effect for the stage 1 error, which contains the true error. As in the previous models, no variance component for the stage 1 error term is estimated as it is already implicitly included within \(EEV_{all}\) (more details in Appendix 3). In this model, \(\varvec{\epsilon } \sim N(\varvec{0}, I\sigma ^2_\epsilon )\), indicating that the residuals are i.i.d., which provides more flexibility to the model. For instance, if the \(Z_{Gres} \varvec{u_{Gres}}\) term is not included, the non-additive genotypic effects can be absorbed by the residuals, preventing them from interfering with \(EEV_{all}\). Therefore, we expect this model to work well with and without the \(Z_{Gres} \varvec{u_{Gres}}\) effect.


Fully-efficient second-stage, diagonal EEV in random effect

This model (Diag_R) is similar to Full_R (Eq. 7), but only the diagonal elements of \(EEV_{all}\) are used, i.e. \(\varvec{u_s} \sim N(\varvec{0}, diag(EEV_{all}^{-1})^{-1})\) as explained for Diag_Res model. We expect this model to perform similarly but slightly worse than Full_R due to the loss of information when discarding the off-diagonals of \(EEV_{all}^{-1}\).

StageWise replication with open-source software

Apart from comparing the different implementations of the fully-efficient models, another aim of this work is to provide an open-source alternative (R packages lme4 [34], version 1.1-35.1, and Sommer [35], version 4.3.1) to the commonly used ASReml-based approach and to demonstrate that it is equivalent. To that end, we have replicated the vignettes of the StageWise package [32], version 1.05 (using ASReml [33] version 4.2.0.257). All fully-efficient models in StageWise are based on the Full_R model (Eqs. 3, 7), but they are adapted to the different needs of the examples provided (e.g. different covariates in stage 1 models, the inclusion of genotype by environment (GxE) interactions or multi-trait analyses in stage 2 models, etc.). Here, we briefly describe the content of the different vignettes. For more details, all R scripts and a detailed explanation are available on GitHub, Vignettes 1 through 3 (https://github.com/TheRocinante-lab/Publications/tree/main/2024/Fernandez-Gonzalez_et_al_Fully_Efficient).

Vignette 1

This vignette focuses on models without GxE or multi-trait analyses. The stage 1 model is the same as the one in Eq. 3, with the addition of a fixed covariate for “stand.count” and with the 2D P-splines spatial analysis replaced by random, i.i.d. block effects. The stage 2 models can be summarized as:

$$\begin{aligned} \varvec{\hat{\beta }_{all}} = X_e \varvec{\beta _e} - \varvec{F}b + Z_{g_{main}} \varvec{u_{g_{main}}} + (Z_{g_{non.add}} \varvec{u_{g_{non.add}}}) + \varvec{s} + \varvec{\epsilon } \end{aligned}$$
(8)

Where applicable, the same nomenclature as in Eq. 7 has been used. Additionally, \(\varvec{u_{g_{main}}} \sim N(\varvec{0}, G_{str}\sigma ^2_{main})\) is a vector with the main random genotypic effects. \(G_{str}\) can be an identity matrix, a genomic relationship matrix [2] or a blend of genomic and pedigree relationship matrices [55]. \(\varvec{u_{g_{non.add}}} \sim N(\varvec{0}, D_{str}\sigma ^2_{non.add})\) is a vector of non-additive random genotypic effects. This effect is between brackets to indicate that it is never present when \(G_{str}\) is the identity matrix, as in that case the main effect already includes non-additive genotypic effects. \(D_{str}\) can be an identity matrix or a genomic dominance relationship matrix [56]. \(-\varvec{F}b\) is a fixed covariate for directional dominance (i.e. heterosis), where \(\varvec{F}\) is a vector of genomic inbreeding coefficients (e.g., computed from the main diagonal of the additive relationship matrix [32]) and b is its regression coefficient. The \(-\varvec{F}b\) effect is only present when \(D_{str}\) is a dominance relationship matrix.
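As an illustration of one of the options mentioned above, genomic inbreeding coefficients can be read off the main diagonal of a VanRaden-type additive relationship matrix via the identity \(G_{ii} = 1 + F_i\) (this identity is an assumption of the sketch; StageWise supports several ways of computing \(\varvec{F}\)):

```python
import numpy as np

# Sketch: genomic inbreeding from the diagonal of an additive relationship matrix,
# assuming G_ii = 1 + F_i (toy values, for illustration only).
G = np.array([[1.00, 0.10, 0.05],
              [0.10, 1.20, 0.00],
              [0.05, 0.00, 0.95]])     # toy genomic relationship matrix

F = np.diag(G) - 1.0                   # inbreeding coefficient per genotype
print(F)                               # [ 0.    0.2  -0.05]
```

Negative values are possible with genomic relationship matrices and indicate genotypes more heterozygous than expected under the base population allele frequencies.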

A genome-wide association study (GWAS) based on the model above was also performed and a fixed effect for the significant SNPs can subsequently be included in the model. More details are available in the GitHub file for Vignette1.

Vignette 2

This vignette includes genotype by location interactions (gxLoc). Stage 1 models are similar to the one in Eq. 3 without the spatial effect. The stage 2 models used are as follows:

$$\begin{aligned} \varvec{\hat{\beta }_{all}} = X_e \varvec{\beta _e} - \varvec{F}b + Z_{g_{main}} \varvec{u_{g_{main}}} + (Z_{g_{non.add}} \varvec{u_{g_{non.add}}}) + Z_{gxLoc} \varvec{u_{gxLoc}} + \varvec{s} + \varvec{\epsilon } \end{aligned}$$
(9)

All terms are similar to those in Eq. 8. The only difference is the addition of the genotype by location random effect \(\varvec{u_{gxLoc}} \sim N(\varvec{0}, G_{str} \otimes \Sigma _{Loc})\), where \(G_{str} \otimes \Sigma _{Loc}\) is the Kronecker product of the \(G_{str}\) matrix and a variance-covariance matrix between locations (\(\Sigma _{Loc}\)) estimated by the model using the factor analysis framework (FA2, more details in Appendix 4).
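The Kronecker structure simply scales every pairwise genotypic covariance by the corresponding between-location covariance, yielding one covariance entry per genotype-location pair. A toy numpy sketch with three genotypes and two locations (illustrative values):

```python
import numpy as np

# Sketch: covariance of the genotype-by-location effect as a Kronecker product
G_str = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.1],
                  [0.2, 0.1, 1.0]])    # toy genomic relationship matrix (3 genotypes)
Sigma_loc = np.array([[1.0, 0.3],
                      [0.3, 0.8]])     # toy between-location covariance (2 locations)

cov_gxloc = np.kron(G_str, Sigma_loc)  # (3*2) x (3*2) covariance over genotype-location pairs
print(cov_gxloc.shape)                 # (6, 6)
```

Entry \([i \cdot 2 + k,\; j \cdot 2 + l]\) of the product equals \(G_{str}[i,j] \cdot \Sigma _{Loc}[k,l]\), i.e. related genotypes in correlated locations share more of this effect.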

Vignette 3

In this vignette, multi-trait models are introduced, which are substantially different from the previous ones. Independent stage 1 models are fitted for each environment j. Within each environment, the model for replication l of genotype i and trait k is as follows:

$$\begin{aligned} y_{ijkl} = g_{ijk} + x_{ijl}b_{jk} + u_{ijk} + \epsilon _{ijkl} \end{aligned}$$
(10)

where \(y_{ijkl}\) is the phenotype; \(g_{ijk}\) is the fixed genotypic effect nested within the trait; \(b_{jk}\) is a fixed regression coefficient for the stand count covariate nested within the trait and \(x_{ijl}\) is the value for stand count of genotype i in location j, replication l; \(u_{ijk}\) is a random block effect nested within trait with \(var(\varvec{u_{j}}) = I \otimes \Sigma _{u_j}\), where \(\Sigma _{u_j}\) indicates a diagonal variance-covariance structure across traits; \(\epsilon _{ijkl}\) is the residual with \(var(\varvec{\epsilon _{j}}) = I \otimes \Sigma _{\epsilon _j}\), where \(\Sigma _{\epsilon _j}\) indicates an unstructured variance-covariance structure across traits.

The genotypic BLUEs for each trait across all locations and their corresponding EEV matrices are combined as in Eqs. 4, 5 and they are used as inputs for the following stage 2 model:

$$\begin{aligned} \hat{\beta }_{ijk} = E_{jk} - F_i b_k + a_{ik} + (d_{ik}) + s_{ijk} + \epsilon _{ijk} \end{aligned}$$
(11)

where \(\hat{\beta }_{ijk}\) is the BLUE for genotype i, trait k in environment j; \(E_{jk}\) is the fixed environmental effect nested within trait; \(b_k\) is a fixed directional dominance regression coefficient nested within trait and \(F_i\) is the inbreeding value for genotype i; \(a_{ik}\) is an additive genotypic value following \(var(\varvec{a}) = G_{str} \otimes \Sigma _{a}\); \(d_{ik}\) is a dominance genotypic value following \(var(\varvec{d}) = D_{str} \otimes \Sigma _{d}\); \(s_{ijk}\) is the random stage 1 error effect with \(var(\varvec{s}) = EEV_{all}\); \(\epsilon _{ijk}\) is the residual with \(var(\varvec{\epsilon }) = I \otimes \Sigma _{\epsilon }\). \(\Sigma _{a}\), \(\Sigma _{d}\) and \(\Sigma _{\epsilon }\) are unstructured variance-covariance matrices across traits. \(d_{ik}\) is between brackets to indicate that it is an optional term.

Availability of data and materials

The oat dataset on which the simulations were based was described in [49]. Example data used for the comparison of Sommer and StageWise are publicly available in the latter package [32]. Example code for the analyses performed here is available in the GitHub vignettes (https://github.com/TheRocinante-lab/Publications/tree/main/2024/Fernandez-Gonzalez_et_al_Fully_Efficient). Vignette0: example of the different kinds of two-stage models that can be fit to multi-environment trials with open-source software; here, we showcase and compare the single-stage model, an unweighted two-stage model, and the four different types of weighted two-stage models outlined in this work. Vignette1: replication of Vignette1 from the StageWise R package using open-source software; single-trait models without genotype by environment interactions. Vignette2: replication of Vignette2 from the StageWise R package using open-source software; single-trait models with genotype by environment interactions. Vignette3: replication of Vignette3 from the StageWise R package using open-source software; multi-trait models.

References

  1. Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–29.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. VanRaden P. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23.

    Article  CAS  PubMed  Google Scholar 

  3. Morota G, Gianola D. Kernel-based whole-genome prediction of complex traits: a review. Front Genet. 2014;5:363.

    Article  PubMed  PubMed Central  Google Scholar 

  4. Cuevas J, et al. Genomic prediction of genotype\(\times\) environment interaction kernel regression models. Plant Genome. 2016. https://doi.org/10.3835/plantgenome2016.03.0024.

    Article  PubMed  Google Scholar 

  5. Habier D, Fernando RL, Dekkers J. The impact of genetic relationship information on genome-assisted breeding values. Genetics. 2007;177:2389–97.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Habier D, Fernando RL, Kizilkaya K, Garrick DJ. Extension of the Bayesian alphabet for genomic selection. BMC Bioinform. 2011;12:186. https://doi.org/10.1186/1471-2105-12-186.

    Article  Google Scholar 

  7. Endelman JB. Ridge regression and other kernels for genomic selection with r package rrblup. Plant Genome. 2011;4:250–5. https://doi.org/10.3835/plantgenome2011.08.0024.

    Article  Google Scholar 

  8. Gianola D. Priors in whole-genome regression: the Bayesian alphabet returns. Genetics. 2013;194:573–96.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Wolc A, Dekkers JC. Application of Bayesian genomic prediction methods to genome-wide association analyses. Genet Sel Evol. 2022;54:31.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Long N, Gianola D, Rosa GJ, Weigel KA, Avendano S. Machine learning classification procedure for selecting snps in genomic selection: application to early mortality in broilers. J Animal Breed Genet. 2007;124:377–89.

    Article  CAS  Google Scholar 

  11. Ogutu JO, Piepho H-P, Schulz-Streeck T. A comparison of random forests, boosting and support vector machines for genomic selection. BMC Proc. 2011;5:1–5.

    Article  Google Scholar 

  12. McDowell R. Genomic selection with deep neural networks. Master’s thesis, Iowa state university 2016.

  13. Pérez-Enciso M, Zingaretti LM. A guide on deep learning for complex trait genomic prediction. Genes. 2019;10:553.

    Article  PubMed  PubMed Central  Google Scholar 

  14. Montesinos-López OA, et al. A review of deep learning applications for genomic selection. BMC Genomics. 2021;22:1–23.

    Article  Google Scholar 

  15. Smith A, Cullis B, Gilmour A. Applications: the analysis of crop variety evaluation data in Australia. Aust N Z J Stat. 2001;43:129–45.

    Article  Google Scholar 

  16. Caliński T, Czajka S, Kaczmarek Z, Krajewski P, Pilarczyk W. Analyzing multi-environment variety trials using randomization-derived mixed models. Biometrics. 2005;61:448–55.

    Article  PubMed  Google Scholar 

  17. Cullis B, Gogel B, Verbyla A, Thompson R. Spatial analysis of multi-environment early generation variety trials. Biometrics. 1998;54:1–18.

    Article  Google Scholar 

  18. Henderson CR. Best linear unbiased estimation and prediction under a selection model. Biometrics. 1975;31:423–47.

    Article  CAS  PubMed  Google Scholar 

  19. Frensham A, Cullis B, Verbyla A. Genotype by environment variance heterogeneity in a two-stage analysis. Biometrics. 1997;53:1373–83.

    Article  Google Scholar 

  20. Möhring J, Piepho H. Comparison of weighting in two-stage analyses of series of experiments. Crop Sci. 2009;49:1977–88.

    Article  Google Scholar 

  21. Piepho H-P, Möhring J, Schulz-Streeck T, Ogutu JO. A stage-wise approach for the analysis of multi-environment trials. Biometr J. 2012;54:844–60.

    Article  Google Scholar 

  22. Elias AA, Robbins KR, Doerge R, Tuinstra MR. Half a century of studying genotype\(\times\) environment interactions in plant breeding experiments. Crop Sci. 2016;56:2090–105.

    Article  Google Scholar 

  23. Costa-Neto G, Fritsche-Neto R. Enviromics: bridging different sources of data, building one framework. Crop Breed Appl Biotechnol. 2021;21:e393521S12.

    Article  CAS  Google Scholar 

  24. Crespo-Herrera L, et al. Genome-enabled prediction for sparse testing in multi-environmental wheat trials. Plant Genome. 2021;14:e20151.

    Article  CAS  PubMed  Google Scholar 

  25. Schulz-Streeck T, Ogutu JO, Piepho H-P. Comparisons of single-stage and two-stage approaches to genomic selection. Theor Appl Genet. 2013;126:69–82.

    Article  PubMed  Google Scholar 

  26. Damesa TM, Möhring J, Worku M, Piepho H-P. One step at a time: stage-wise analysis of a series of experiments. Agron J. 2017;109:845–57.

    Article  Google Scholar 

  27. Damesa TM, et al. Comparison of weighted and unweighted stage-wise analysis for genome-wide association studies and genomic selection. Crop Sci. 2019;59:2572–84.

    Article  Google Scholar 

  28. Gogel B, Smith A, Cullis B. Comparison of a one-and two-stage mixed model analysis of Australia’s national variety trial southern region wheat data. Euphytica. 2018;214:1–21.

    CAS  Google Scholar 

  29. Verbyla A. On two-stage analysis of multi-environment trials. Euphytica. 2023;219:121.

  30. Buntaran H, Forkman J, Piepho H-P. Projecting results of zoned multi-environment trials to new locations using environmental covariates with random coefficient models: accuracy and precision. Theor Appl Genet. 2021;134:1513–30.

  31. Falconer DS, Mackay TF. Quantitative genetics. London: Longman; 1983.

  32. Endelman JB. Fully efficient, two-stage analysis of multi-environment trials with directional dominance and multi-trait genomic selection. Theor Appl Genet. 2023;136:65.

  33. Butler D, Cullis B, Gilmour A, Gogel B, Thompson R. ASReml-R reference manual version 4. Hemel Hempstead: VSN International Ltd.; 2017.

  34. Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J Stat Softw. 2015;67:1–48.

  35. Covarrubias-Pazaran G. Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE. 2016;11:1–15.

  36. Moehring J, Williams ER, Piepho H-P. Efficiency of augmented p-rep designs in multi-environmental trials. Theor Appl Genet. 2014;127:1049–60.

  37. Federer WT, Crossa JI. Screening experimental designs for quantitative trait loci, association mapping, genotype-by-environment interaction, and other investigations. Front Physiol. 2012;3:156.

  38. Cullis BR, Smith AB, Coombes NE. On the design of early generation variety trials with correlated data. J Agric Biol Environ Stat. 2006;11:381–93.

  39. Cullis BR, Smith AB, Cocks NA, Butler DG. The design of early-stage plant breeding trials using genetic relatedness. J Agric Biol Environ Stat. 2020;25:553–78.

  40. Williams E, Piepho H-P, Whitaker D. Augmented p-rep designs. Biometr J. 2011;53:19–27.

  41. Isidro J, et al. Training set optimization under population structure in genomic selection. Theor Appl Genet. 2015;128:145–58.

  42. Fernandez-Gonzalez J, Akdemir D, Isidro y Sanchez J. A comparison of methods for training population optimization in genomic selection. Theor Appl Genet. 2023;136:30.

  43. Fisher RA. The arrangement of field experiments. J Minist Agric. 1926;33:503–15.

  44. Casler MD. Fundamentals of experimental design: guidelines for designing successful experiments. Agron J. 2015;107:692–705.

  45. Zystro J, Colley M, Dawson J. Alternative experimental designs for plant breeding. Plant Breed Rev. 2018;42:87–117.

  46. Endelman JB, et al. Optimal design of preliminary yield trials with genome-wide markers. Crop Sci. 2014;54:48–59.

  47. Lush J. Animal breeding plans. 2nd ed. Ames: Iowa State College Press; 1943.

  48. Galli G, et al. Impact of phenotypic correction method and missing phenotypic data on genomic prediction of maize hybrids. Crop Sci. 2018;58:1481–91.

  49. Canales FJ, et al. Population genomics of Mediterranean oat (A. sativa) reveals high genetic diversity and three loci for heading date. Theor Appl Genet. 2021. https://doi.org/10.1007/s00122-021-03805-2.

  50. Granato I, Fritsche-Neto R. snpReady: preparing genotypic datasets in order to run genomic analysis. 2018. https://CRAN.R-project.org/package=snpReady. R package version 0.9.6.

  51. Rodríguez-Álvarez MX, Boer MP, van Eeuwijk FA, Eilers PH. Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spat Stat. 2017;23:52–71. https://doi.org/10.1016/j.spasta.2017.10.003.

  52. Lee D-J, Durbán M, Eilers P. Efficient two-dimensional smoothing with P-spline ANOVA mixed models and nested bases. Comput Stat Data Anal. 2013;61:22–37.

  53. Eilers PH, Marx BD, Durbán M. Twenty years of P-splines. SORT Stat Oper Res Trans. 2015;39:149–86.

  54. Garrick DJ, Taylor JF, Fernando RL. Deregressing estimated breeding values and weighting information for genomic regression analyses. Genet Sel Evol. 2009;41:1–8.

  55. Legarra A, Aguilar I, Misztal I. A relationship matrix including full pedigree and genomic information. J Dairy Sci. 2009;92:4656–63.

  56. Vitezica ZG, Varona L, Legarra A. On the additive and dominant variance and covariance of individuals within the genomic selection scope. Genetics. 2013;195:1223–30.

  57. Xu ZC, Zhu J. An approach for predicting heterosis based on an additive, dominance and additive \(\times\) additive model with environment interaction. Heredity. 1999;82:510–7.

  58. Searle SR, Casella G, McCulloch CE. Variance components. Hoboken: John Wiley & Sons; 2009.

Funding

J.I.y.S and JFG were supported by grant PID2021-123718OB-I00 funded by MCIN/AEI/10.13039/501100011033, by “ERDF A way of making Europe”, and by CEX2020-000999-S. J.I.y.S was also supported by the Severo Ochoa Program for Centers of Excellence in R&D from the “Agencia Estatal de Investigación” of Spain to the CBGP. JFG was also supported by grant FPU22/02543 from the Ministerio de Ciencia, Innovación y Universidades of Spain.

Author information

Contributions

JFG and JIS jointly conceived the study, discussed the results, and wrote the article. JIS was responsible for funding acquisition. JFG performed the statistical analyses and prepared the figures.

Corresponding authors

Correspondence to Javier Fernández-González or Julio Isidro y Sánchez.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Appendices

Appendix 1: History of fully-efficient, two-stage models

Let us consider the single-stage model presented in the main text (Eq. 2). It would typically be split into two stages using an unweighted (UNW) model, as indicated in Eqs. 3 and 6. Using this as a starting point, we will trace the historical development of fully-efficient alternatives to the UNW model.

The main problem addressed by the fully-efficient models is that, in most experimental designs, the Stage 1 BLUEs (\(\hat{\varvec{\beta _{all}}}\)) have non-independent, non-identically distributed errors; specifically, \(EEV_{all}\) is not a diagonal matrix with uniform values. However, in Stage 2 of the UNW model (Eq. 6), the residuals are assumed to be i.i.d. This mismatch between the errors of the data used to train the model and the assumed error structure can be detrimental to model performance.

The first attempt to produce a fully-efficient model was the approach by Smith et al. [15], which accounted for the heterogeneous error variances, i.e., the diagonal values of \(EEV_{all}\). This is the Diag_Res model explained in the methodology section, with \(\varvec{\epsilon } \sim N(\varvec{0}, diag(EEV_{all}^{-1})^{-1})\). This adjustment brings the assumed error variance much closer to the true one. However, the error covariances are neglected in order to obtain sparser matrices, which slightly reduces computational time but means the model is not completely fully-efficient.
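The difference between the two residual covariance choices can be sketched numerically. Below is a minimal Python/numpy illustration (the manuscript's accompanying code is in R) using a made-up \(3\times 3\) \(EEV_{all}\); all values are hypothetical:

```python
import numpy as np

# Hypothetical EEV_all for three Stage 1 BLUEs, with heterogeneous
# error variances and non-zero error covariances.
EEV = np.array([[0.50, 0.10, 0.05],
                [0.10, 0.80, 0.15],
                [0.05, 0.15, 0.60]])

# Full_Res uses EEV itself as the residual variance-covariance matrix.
R_full = EEV

# Diag_Res keeps only the diagonal of EEV^{-1} and inverts it back:
# diag(EEV^{-1})^{-1}. The error covariances are discarded, which keeps
# the matrix sparse (diagonal) but loses full efficiency.
R_diag = np.diag(1.0 / np.diag(np.linalg.inv(EEV)))

print(np.round(R_diag, 4))
```

Note that the Diag_Res variances are not simply the diagonal of \(EEV_{all}\): inverting, taking the diagonal, and inverting back yields slightly different (never larger) values.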

The next improvement was the approach by Piepho et al. [21], which utilized the full \(EEV_{all}\) matrix and made the model fully-efficient. This is the Full_Res model described in the main text, a weighted regression model with \(\varvec{\epsilon } \sim N(\varvec{0}, EEV_{all})\). As a result, the assumed covariance structure of the residuals perfectly matches the true error variance of the response variable, making the model theoretically equivalent to the single-stage model. The only deviation from the single-stage model arises because the true values of \(EEV_{all}\) are not known: \(EEV_{all}\) is calculated through the Henderson equations [18] (more details in Appendix 5) and depends on the variance components estimated in the Stage 1 model. These variance components are estimated via REML, which is unbiased but carries some estimation error, so the values in \(EEV_{all}\) are themselves estimates with some error.

Another disadvantage of the weighted regression with \(\varvec{\epsilon } \sim N(\varvec{0}, EEV_{all})\) is that the residual covariance structure is no longer sparse, which increases computational time and complicates implementation in common software packages. Piepho et al. [21] proposed an alternative approach, the rotations, which is equivalent to the weighted regression but addresses these issues. The basic concept of the rotations is straightforward: instead of changing the model, the data are transformed. If the response variable (\(\hat{\varvec{\beta _{all}}}\)) can be scaled to be i.i.d., it will match the UNW model and can be used without loss of efficiency. This is easily achieved using the following property of the multiplication of any fixed matrix (A) by any random variable \(\varvec{x} \sim N(\varvec{\mu }, \Sigma )\):

$$\begin{aligned} var(A\varvec{x}) = A var(\varvec{x}) A' = A \Sigma A' \end{aligned}$$
(12)

Thus, from Eq. 12 it is evident that we can scale \(\hat{\varvec{\beta _{all}}}\) to unit variance. This is what Piepho et al. [21] called the rotated BLUEs (\(\tilde{\varvec{\beta _{all}}}\)):

$$\begin{aligned} \tilde{\varvec{\beta _{all}}} = EEV_{all}^{-\frac{1}{2}} \hat{\varvec{\beta _{all}}} \\ var(\tilde{\varvec{\beta _{all}}}) = EEV_{all}^{-\frac{1}{2}} var(\hat{\varvec{\beta _{all}}}) (EEV_{all}^{-\frac{1}{2}})' = EEV_{all}^{-\frac{1}{2}} EEV_{all} EEV_{all}^{-\frac{1}{2}} = I \end{aligned}$$
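This scaling is easy to verify numerically. The following Python/numpy sketch (hypothetical \(EEV_{all}\) and BLUEs; the manuscript itself provides R code) computes \(EEV_{all}^{-\frac{1}{2}}\) via an eigendecomposition and checks that the rotated errors have identity covariance:

```python
import numpy as np

# Hypothetical estimation-error covariance (EEV_all) of 3 Stage 1 BLUEs.
EEV = np.array([[0.50, 0.10, 0.05],
                [0.10, 0.80, 0.15],
                [0.05, 0.15, 0.60]])

# Inverse square root via eigendecomposition (EEV is symmetric PD).
vals, vecs = np.linalg.eigh(EEV)
EEV_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T

beta_hat = np.array([10.2, 9.8, 11.1])   # hypothetical Stage 1 BLUEs
beta_rot = EEV_inv_sqrt @ beta_hat       # rotated BLUEs

# var(A x) = A Sigma A': the rotated errors have identity covariance.
rotated_cov = EEV_inv_sqrt @ EEV @ EEV_inv_sqrt.T
print(np.round(rotated_cov, 10))
```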

This can be implemented into the Stage 2 model. To that end, let’s start with the model used in the weighted regression in Full_Res:

$$\begin{aligned} \varvec{\hat{\beta }_{all}} = \varvec{1}\mu + Z_e \varvec{u_e} + Z_a \varvec{u_a} + \varvec{\epsilon }; \end{aligned}$$
(13)

where the same nomenclature as in Eq. 6 is used but it is assumed that \(\varvec{\epsilon } \sim N(\varvec{0}, EEV_{all})\). Next, we can introduce the scaling factor \(EEV_{all}^{-\frac{1}{2}}\) by multiplying both sides of the equation:

$$\begin{aligned} EEV_{all}^{-\frac{1}{2}} \varvec{\hat{\beta }_{all}} = EEV_{all}^{-\frac{1}{2}}(\varvec{1}\mu + Z_e \varvec{u_e} + Z_a \varvec{u_a} + \varvec{\epsilon }) \end{aligned}$$

Rearranging and renaming:

$$\begin{aligned} \tilde{\varvec{\beta _{all}}} = (EEV_{all}^{-\frac{1}{2}}\varvec{1})\mu + (EEV_{all}^{-\frac{1}{2}}Z_e) \varvec{u_e} + (EEV_{all}^{-\frac{1}{2}}Z_a) \varvec{u_a} + EEV_{all}^{-\frac{1}{2}}\varvec{\epsilon } \end{aligned}$$
$$\begin{aligned} \tilde{\varvec{\beta _{all}}} = \tilde{\varvec{1}}\mu + \tilde{Z_e} \varvec{u_e} + \tilde{Z_a} \varvec{u_a} + \tilde{\varvec{\epsilon }} \end{aligned}$$
(14)

where \(\tilde{\varvec{1}} = EEV_{all}^{-\frac{1}{2}}\varvec{1}\), \(\tilde{Z_e} = EEV_{all}^{-\frac{1}{2}} Z_e\) and \(\tilde{Z_a} = EEV_{all}^{-\frac{1}{2}} Z_a\) are the rotated design matrices and \(\tilde{\varvec{\epsilon }} = EEV_{all}^{-\frac{1}{2}} \varvec{\epsilon }\) are the rotated residuals, which are i.i.d., as proven by:

$$\begin{aligned} var(\tilde{\varvec{\epsilon }}) = var(EEV_{all}^{-\frac{1}{2}}\varvec{\epsilon }) = EEV_{all}^{-\frac{1}{2}} EEV_{all} (EEV_{all}^{-\frac{1}{2}})' = I \end{aligned}$$

Thanks to rotations, the model has an i.i.d. response variable and i.i.d. residuals, thereby aligning the model’s assumptions with reality. It can be proven that the rotated model in Eq. 14 is equivalent to the weighted regression model in Eq. 13 using the Henderson equations [18]. We will start with the Henderson equations for the rotated model and then rearrange them to derive the Henderson equations for the weighted regression model:

$$\begin{aligned} \begin{bmatrix} \tilde{\varvec{1}}'R^{-1}\tilde{\varvec{1}} & \tilde{\varvec{1}}'R^{-1} \tilde{Z_e} & \tilde{\varvec{1}}'R^{-1} \tilde{Z_a} \\ \tilde{Z_e}'R^{-1}\tilde{\varvec{1}} & \tilde{Z_e}'R^{-1} \tilde{Z_e} + (I\hat{\sigma ^2_e})^{-1} & \tilde{Z_e}'R^{-1} \tilde{Z_a} \\ \tilde{Z_a}'R^{-1}\tilde{\varvec{1}} & \tilde{Z_a}'R^{-1} \tilde{Z_e} & \tilde{Z_a}'R^{-1} \tilde{Z_a} + (G_{str}\hat{\sigma ^2_a})^{-1} \end{bmatrix} \begin{bmatrix} \varvec{\hat{\mu }} \\ \varvec{\hat{e}} \\ \varvec{\hat{a}} \\ \end{bmatrix} = \begin{bmatrix} \tilde{\varvec{1}}'R^{-1}\tilde{\varvec{\beta _{all}}} \\ \tilde{Z_e}'R^{-1}\tilde{\varvec{\beta _{all}}} \\ \tilde{Z_a}'R^{-1}\tilde{\varvec{\beta _{all}}} \end{bmatrix} \end{aligned}$$

From the fact that \(var(\tilde{\varvec{\epsilon }}) = R = I\), and writing each rotated element (\(\tilde{\varvec{1}}\), \(\tilde{Z_e}\), \(\tilde{Z_a}\) and \(\tilde{\varvec{\beta _{all}}}\)) as \(EEV_{all}^{-\frac{1}{2}}\) times the corresponding unrotated element, we can rearrange the previous equation into:

$$\begin{aligned} \begin{bmatrix} \varvec{1}'EEV_{all}^{-1}\varvec{1} & \varvec{1}'EEV_{all}^{-1} Z_e & \varvec{1}'EEV_{all}^{-1} Z_a \\ Z_e'EEV_{all}^{-1}\varvec{1} & Z_e'EEV_{all}^{-1} Z_e + (I\hat{\sigma ^2_e})^{-1} & Z_e'EEV_{all}^{-1} Z_a \\ Z_a'EEV_{all}^{-1}\varvec{1} & Z_a'EEV_{all}^{-1} Z_e & Z_a'EEV_{all}^{-1} Z_a + (G_{str}\hat{\sigma ^2_a})^{-1} \end{bmatrix} \begin{bmatrix} \varvec{\hat{\mu }} \\ \varvec{\hat{e}} \\ \varvec{\hat{a}} \\ \end{bmatrix} = \begin{bmatrix} \varvec{1}'EEV_{all}^{-1}\hat{\varvec{\beta _{all}}} \\ Z_e'EEV_{all}^{-1}\hat{\varvec{\beta _{all}}} \\ Z_a'EEV_{all}^{-1}\hat{\varvec{\beta _{all}}} \end{bmatrix} \end{aligned}$$

This is exactly the Henderson equation for the weighted regression model (Eq. 13), proving its equivalence to the rotated model.
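This equivalence can also be checked numerically. The sketch below (Python/numpy, with toy dimensions and hypothetical variance components) builds the coefficient matrix of the rotated model with \(R = I\) and that of the weighted regression with \(R = EEV_{all}\), and confirms they coincide:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 6                                       # number of Stage 1 BLUEs
one = np.ones((n, 1))                       # intercept "design matrix"
Ze = np.kron(np.eye(2), np.ones((3, 1)))    # 2 environments, 3 BLUEs each
Za = np.kron(np.ones((2, 1)), np.eye(3))    # 3 genotypes per environment

# Symmetric positive-definite toy EEV_all.
A = rng.normal(size=(n, n))
EEV = A @ A.T + n * np.eye(n)

vals, vecs = np.linalg.eigh(EEV)
T = vecs @ np.diag(vals ** -0.5) @ vecs.T   # EEV^{-1/2}

# Variance-component penalties, identical in both formulations
# (hypothetical sigma2_e = 0.5, sigma2_a = 2, G_str = I).
pen_e = np.eye(2) / 0.5
pen_a = np.eye(3) / 2.0

def coeff(X1, Xe, Xa, Rinv):
    # Assemble the Henderson coefficient matrix for the given R^{-1}.
    top = np.hstack([X1.T @ Rinv @ X1, X1.T @ Rinv @ Xe, X1.T @ Rinv @ Xa])
    mid = np.hstack([Xe.T @ Rinv @ X1, Xe.T @ Rinv @ Xe + pen_e, Xe.T @ Rinv @ Xa])
    bot = np.hstack([Xa.T @ Rinv @ X1, Xa.T @ Rinv @ Xe, Xa.T @ Rinv @ Xa + pen_a])
    return np.vstack([top, mid, bot])

C_rotated  = coeff(T @ one, T @ Ze, T @ Za, np.eye(n))   # rotated model, R = I
C_weighted = coeff(one, Ze, Za, np.linalg.inv(EEV))      # weighted model, R = EEV

print(np.allclose(C_rotated, C_weighted))
```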

Using rotations allows for a sparser diagonal matrix (the identity) for the residuals, which can accelerate computations. However, the rotated design matrices are often less sparse than the original ones, potentially negating some of the computational gains. In any case, with improvements in computational hardware and modelling software, these small differences have become less critical. Furthermore, introducing the rotated design matrices into the most commonly used statistical software is often very challenging, making the weighted regression approach almost universally preferable. This is why the rotations are rarely seen in the literature.

The Piepho et al. [21] approach has been used in numerous studies [21, 25,26,27,28,29,30] and, to our knowledge, was the only fully-efficient method to leverage the full \(EEV_{all}\) matrix until a recent paper by Endelman [32]. Endelman proposed using \(EEV_{all}\) as the covariance structure of a random effect (\(\varvec{s}\)) rather than for the residuals, as explained in the main text for model Full_R, Eq. 7.

The rationale behind this is that models are usually a simplification of reality. Phenotypes can be influenced by numerous and complex interaction terms such as dominance, epistasis, additive by environment interaction, dominance by environment interaction, epistasis by environment interaction, etc. [57]. Modelling all these effects is often not viable. Instead, only the ones of interest are included in the model as random effects, while the rest are absorbed by the residuals. However, in this work we demonstrated that the Piepho et al. [21] approach (model Full_Res) does not perform well under these conditions: because the residuals are constrained to follow \(EEV_{all}\) as the covariance structure, they can only accommodate the Stage 1 estimation errors and cannot absorb sources of phenotypic variability not explicitly included in the model as fixed or random effects.

The Endelman [32] approach (model Full_R) addresses this issue. The Stage 1 estimation errors are modeled by the \(\varvec{s}\) term, leaving the residuals free to absorb any other non-modelled sources of phenotypic variability. Therefore, this model provides all the advantages of the Piepho et al. [21] approach while substantially increasing versatility and reducing the risk of the model underperforming due to unexpected sources of phenotypic variability.

Appendix 2: Fully-efficient, three-stage model example

When fitting mixed models to typical multi-environment field trials, there are three potential steps: (1) estimation of spatial effects within each location, (2) estimation of environmental effects across environments, and (3) use of genomic information to split genotypic values into additive and non-additive terms. Commonly, a two-stage analysis is performed in which steps 2 and 3 are merged into a single one, allowing for the estimation of genotype by environment interactions and generally maintaining a reasonable computational cost. This is the strategy followed in the main text of this manuscript. However, here we showcase how a fully-efficient, three-stage model can be fit. We will use the single-stage model described in the main text in Eq. 2 as a reference and split it into three stages:


Stage 1 model: In a similar way to the two-stage methodology, a different Stage 1 model is fit to each \(j^{th}\) environment independently:

$$\begin{aligned} \varvec{y_j} = X_j \varvec{\beta _j} + Z_{p_j} \varvec{u_{p_j}} + \varvec{\epsilon _j} \end{aligned}$$

where \(\varvec{y_j}\) contains the phenotypic observations in environment j; \(\varvec{\beta _j}\) is a vector of fixed genotypic effects and \(X_j\) is its corresponding design matrix; \(\varvec{u_{p_j}}\) is a vector of random spatial effects following the 2D P-splines framework [35, 51,52,53] and \(Z_{p_j}\) is its corresponding design matrix; and \(\varvec{\epsilon _j} \sim N(\varvec{0}, I\sigma ^2_{\epsilon _j})\) is a vector of residuals, where I is the identity matrix of appropriate dimensions and \(\sigma ^2_{\epsilon _j}\) is the residual variance component. The genotypic BLUEs from each environment (\(\varvec{\hat{\beta }_j}\)) are merged by stacking them, and their associated EEV matrices are merged through their direct sum, as shown in Eqs. 4 and 5. The resulting \(\varvec{\hat{\beta }_{all}}\) and \(EEV_{all}\) are used as the inputs to the second stage:


Stage 2 model: In this model we simply estimate and remove environmental effects:

$$\begin{aligned} \varvec{\hat{\beta }_{all}} = X\varvec{\beta _2} + Z_e \varvec{u_e} + Z_s \varvec{u_{s}} + \varvec{\epsilon } \end{aligned}$$

where \(\varvec{\beta _2}\) is a vector of fixed genotypic values with design matrix X, \(\varvec{u_e} \sim N(\varvec{0}, I\sigma ^2_l)\) is a vector of random environmental effects with design matrix \(Z_e\), \(\varvec{u_{s}} \sim N(\varvec{0}, EEV_{all})\) is a vector of stage 1 errors with design matrix \(Z_s\), and \(\varvec{\epsilon }\) is a vector of residuals with heterogeneous variance across environments. This model allows the estimation of a single adjusted phenotype for each genotype (\(\varvec{\hat{\beta _2}}\)), as all non-genetic terms have been estimated and removed. These BLUEs and their associated errors (\(EEV_2\)) are carried into the third and final stage:


Stage 3 model: In this model we split the genetic values into additive and non-additive components:

$$\begin{aligned} \varvec{\hat{\beta }_{2}} = \varvec{1}\mu + Z_a \varvec{u_a} + Z_{s2} \varvec{u_{s2}} + \varvec{\epsilon } \end{aligned}$$
(15)

where \(\mu\) is a fixed intercept, \(\varvec{u_a} \sim N(\varvec{0}, G_{str}\sigma ^2_a)\) is a vector of additive genotypic BLUPs with design matrix \(Z_a\), \(\varvec{u_{s2}} \sim N(\varvec{0}, EEV_{2})\) is a vector of stage 2 errors with design matrix \(Z_{s2}\) and \(\varvec{\epsilon } \sim N(\varvec{0}, I\sigma ^2_\epsilon )\) is a vector of residuals, which contains the non-additive genotypic values.

It is important to note that in Eq. 7, a \(Z_{Gres} \varvec{u_{Gres}}\) genetic residual term for the non-additive genotypic values was used. We did not use a similar term here because, with only a single observation per genotype in Eq. 15, it would be redundant with the residuals (both would have one coefficient per observation and both would be i.i.d.), making the model impossible to fit. Instead, we let the non-additive genetic effects go into the residuals. Alternatively, the \(\varvec{u_{s2}}\) effect could be removed and \(EEV_{2}\) set as the variance-covariance of the residuals, which would allow the inclusion of the \(Z_{Gres} \varvec{u_{Gres}}\) term without any problem.
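The merging of Stage 1 outputs used throughout the stage-wise pipeline (stacking the BLUEs, Eq. 4, and direct-summing the per-environment EEV matrices, Eq. 5) can be sketched as follows (Python/numpy with hypothetical values; two environments, three genotypes each):

```python
import numpy as np

# Hypothetical Stage 1 outputs from two environments.
beta_env1 = np.array([10.1, 9.7, 11.0])
beta_env2 = np.array([12.3, 11.9, 13.2])
EEV_env1 = np.array([[0.4, 0.1, 0.0],
                     [0.1, 0.5, 0.1],
                     [0.0, 0.1, 0.4]])
EEV_env2 = np.array([[0.6, 0.2, 0.1],
                     [0.2, 0.7, 0.2],
                     [0.1, 0.2, 0.6]])

# Stack the BLUEs; direct-sum the EEV matrices. Errors are independent
# between environments, so the off-diagonal blocks of EEV_all are zero.
beta_all = np.concatenate([beta_env1, beta_env2])
EEV_all = np.zeros((6, 6))
EEV_all[:3, :3] = EEV_env1
EEV_all[3:, 3:] = EEV_env2

print(beta_all.shape, EEV_all.shape)
```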

Appendix 3: Variance-covariance matrix and variance-covariance structure

In the literature, it is common not to make an explicit distinction between the variance-covariance matrix and the variance-covariance structure, which at times may be a source of confusion. It is, however, very important to make this distinction clear when using \(EEV_{all}\) as a variance-covariance matrix in fully-efficient two-stage modelling. Therefore, in this section we explore this distinction in detail.

Let’s imagine we have a generic random effect \(\varvec{u} \sim N(\varvec{0}, G_{str} \sigma ^2_{u})\), where \(G_{str}\) is its variance-covariance structure and \(\sigma ^2_{u}\) is its variance component. \(G_{str}\) defines the patterns of covariance in the population, i.e. the relative values between the different covariances in the matrix. For instance, a genomic relationship matrix would be a variance-covariance structure. \(\sigma ^2_{u}\) acts as a scaling factor, indicating the amount of variation present in the random effect \(\varvec{u}\). The variance-covariance matrix (G) of \(\varvec{u}\) would simply be the product of its variance-covariance structure and its variance component, i.e. \(G = G_{str} \sigma ^2_{u} \rightarrow \varvec{u} \sim N(\varvec{0}, G_{str} \sigma ^2_{u}) \rightarrow \varvec{u} \sim N(\varvec{0}, G)\). Here, we use the subscript “str” to differentiate the variance-covariance structure from the variance-covariance matrix.

Finding the variance-covariance matrix from the structure and the variance component is straightforward, but the inverse process is not so simple, which can be a problem because the variance component is often of interest, e.g., for heritability calculation. This is especially relevant in fully-efficient, two-stage models for the Stage 1 error term (\(\varvec{s} \sim N(\varvec{0}, EEV_{all})\)) in the second-stage model (Eq. 7), as the \(EEV_{all}\) matrix is a variance-covariance matrix, not a variance-covariance structure. Therefore, we need to find a way to calculate the overall variability represented by \(EEV_{all}\), i.e. extract its variance component. This is done in the StageWise package [32] by calculating \(\sigma ^2_s = mean(diag(EEV_{all}))-mean(EEV_{all})\), where diag() extracts the main diagonal of a matrix and mean() computes the average of all elements of a matrix or vector. This is the expected variance of a sample drawn from a distribution with covariance matrix \(EEV_{all}\) [58]. Note that \(mean(diag(EEV_{all}))-mean(EEV_{all})\) outputs larger values for large diagonal values (variances) and small off-diagonal values (covariances); therefore, high covariances actually reduce the overall variability described by a matrix.

We have used this methodology while replicating StageWise with open-source software for the sake of consistency, but other alternatives exist. For instance, for the scaling of genomic relationship matrices to unit variance, the snpReady package [50] uses \(\sigma ^2 = mean(diag(\Sigma ))\), where \(\sigma ^2\) is the variance component used as a scaling factor and \(\Sigma\) is the variance-covariance matrix. While this approach ignores the covariances, it has a robust mathematical background: the average of the main diagonal of a variance-covariance matrix is equal to the average of its eigenvalues and can be interpreted as the overall average variability in the matrix. Therefore, the variance component for the EEV could also be calculated simply as \(\sigma ^2_s = mean(diag(EEV_{all}))\).
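The two variance-component extractions discussed above are easy to compare numerically (a Python/numpy sketch with hypothetical \(2\times 2\) EEV matrices); note how larger covariances shrink the StageWise-style estimate but leave the diagonal-mean estimate unchanged:

```python
import numpy as np

def var_comp_stagewise(EEV):
    # Expected sample variance under covariance EEV (StageWise-style).
    return np.mean(np.diag(EEV)) - np.mean(EEV)

def var_comp_diag(EEV):
    # Mean of the diagonal = mean eigenvalue (snpReady-style scaling).
    return np.mean(np.diag(EEV))

EEV_low_cov  = np.array([[1.0, 0.1], [0.1, 1.0]])
EEV_high_cov = np.array([[1.0, 0.8], [0.8, 1.0]])

print(var_comp_stagewise(EEV_low_cov), var_comp_stagewise(EEV_high_cov))
print(var_comp_diag(EEV_low_cov), var_comp_diag(EEV_high_cov))
```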

The distinction between the variance-covariance matrix and structure also plays an important role in the two most common formulations of the Henderson equations, as one is based on variance-covariance matrices and the other on structures and variance components. This is relevant here since EEV and PEV are calculated from the inverse of the coefficient matrix (CoeffMat). Its most general form is the following:

$$\begin{aligned} CoeffMat^{-1} = \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}^{-1} \end{aligned}$$
(16)

where X and Z are the design matrices for fixed and random effects respectively and R and G are the variance-covariance matrices for the residuals and random effects. By splitting both R and G into their variance components and variance-covariance structures (again identified with a “str” subscript for the sake of clarity, although this is not standard nomenclature), we can make the following rearrangements:

$$\begin{aligned} CoeffMat^{-1} = \begin{bmatrix} X'R_{str}^{-1}\frac{1}{\sigma ^2_\epsilon } X & X'R_{str}^{-1}\frac{1}{\sigma ^2_\epsilon } Z \\ Z'R_{str}^{-1}\frac{1}{\sigma ^2_\epsilon } X & Z'R_{str}^{-1}\frac{1}{\sigma ^2_\epsilon } Z + G_{str}^{-1}\frac{1}{\sigma ^2_g} \end{bmatrix}^{-1} = \begin{bmatrix} X'R_{str}^{-1} X & X'R_{str}^{-1}Z \\ Z'R_{str}^{-1} X & Z'R_{str}^{-1}Z + \frac{\sigma ^2_\epsilon }{\sigma ^2_g} G_{str}^{-1} \end{bmatrix}^{-1}\sigma ^2_\epsilon \end{aligned}$$

Assuming i.i.d. residuals (\(R_{str} = I\)) and defining \(\lambda = \frac{\sigma ^2_\epsilon }{\sigma ^2_g}\) we can simplify the matrix to:

$$\begin{aligned} CoeffMat^{-1} = \begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + \lambda G_{str}^{-1} \end{bmatrix}^{-1}\sigma ^2_\epsilon \end{aligned}$$
(17)

This is a very commonly used formulation but it is only a particular case of Eq. 16 when the residuals are i.i.d.
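The equivalence between the two formulations when the residuals are i.i.d. can be verified numerically, as in the following Python/numpy sketch (toy design matrices and hypothetical variance components):

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 8, 4
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 1))])  # fixed effects
Z = np.kron(np.eye(q), np.ones((n // q, 1)))               # q random-effect levels

sigma2_eps, sigma2_g = 1.5, 0.75
R = np.eye(n) * sigma2_eps            # i.i.d. residuals: R = I * sigma2_eps
G = np.eye(q) * sigma2_g              # G = G_str * sigma2_g, with G_str = I
lam = sigma2_eps / sigma2_g

# General form (Eq. 16): built from variance-covariance matrices.
Rinv = np.linalg.inv(R)
C16 = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
                [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + np.linalg.inv(G)]])
C16_inv = np.linalg.inv(C16)

# Simplified form (Eq. 17): structures plus the variance ratio lambda.
C17_inv = np.linalg.inv(np.block([[X.T @ X, X.T @ Z],
                                  [Z.T @ X, Z.T @ Z + lam * np.eye(q)]])) * sigma2_eps

print(np.allclose(C16_inv, C17_inv))
```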

Appendix 4: Factor analysis for genotype by location interaction in ASReml and Sommer

The factor analysis framework for estimating genotype by location (GxLoc) effects is very appealing thanks to its ability to model the correlation between locations with far fewer estimated parameters than an unstructured covariance structure, which facilitates model convergence. Here, we present a brief conceptual overview of the factor analysis methodology and its implementation in Sommer and ASReml, which helps explain why these software packages yielded different GxLoc results. In this section, we explore factor analysis with two latent variables (FA2), but a different number of latent variables may also be used.

Factor analysis methodology

First, in the GxLoc context, let’s define a feature matrix (W) in which each location is described by the Stage 1 BLUEs of the genotypes (GIDs) it contains:

$$\begin{aligned} W = \begin{bmatrix} \hat{\beta }_{1,1} & \hat{\beta }_{1,2} & \cdots & \hat{\beta }_{1,j} \\ \hat{\beta }_{2,1} & \hat{\beta }_{2,2} & \cdots & \hat{\beta }_{2,j} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{\beta }_{n,1} & \hat{\beta }_{n,2} & \cdots & \hat{\beta }_{n,j} \end{bmatrix} \end{aligned}$$
(18)

where \(\hat{\beta }_{i,j}\) is the Stage 1 BLUE of the \(i^{th}\) genotype (GID) in the \(j^{th}\) location.

Factor analysis is a dimensionality reduction technique that replaces the features in W by a reduced number of latent variables (\(F_1\), \(F_2\)), plus a specific factor (e) that contains the variability not explained by \(F_1\) and \(F_2\):

$$\begin{aligned} Loc_1 = \lambda _{1,1}F_1 + \lambda _{1,2}F_2 + e_1 \\ Loc_2 = \lambda _{2,1}F_1 + \lambda _{2,2}F_2 + e_2 \\ \vdots \\ Loc_j = \lambda _{j,1}F_1 + \lambda _{j,2}F_2 + e_j \end{aligned}$$

The loadings (\(\lambda\)) link the locations in W with the latent variables \(F_1\) and \(F_2\). We can define the following loading matrix (\(\Gamma\)):

$$\begin{aligned} \Gamma = \begin{bmatrix} \lambda _{1,1} & \lambda _{1,2} \\ \lambda _{2,1} & \lambda _{2,2} \\ \vdots & \vdots \\ \lambda _{j,1} & \lambda _{j,2} \end{bmatrix} \end{aligned}$$

Finally, the variance-covariance matrix between locations (\(\Sigma _{Loc}\)) can be defined as:

$$\begin{aligned} \Sigma _{Loc} = \Gamma \Gamma ' + \psi \end{aligned}$$

where \(\psi\) is a diagonal matrix containing the variances explained by the specific factors (\(\phi\)):

$$\begin{aligned} \psi = \begin{bmatrix} \phi _1 & 0 & \cdots & 0 \\ 0 & \phi _2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \phi _j \end{bmatrix} \end{aligned}$$

In factor analysis, it is assumed that the latent variables and the specific factors are independent, and that the latent variables (\(F_1\), \(F_2\)) are i.i.d. with zero expectation. Many sets of loadings satisfy these assumptions; therefore, factor analysis has numerous different valid solutions, each of which leads to a different interpretation. Rotations allow one to obtain a new solution from an existing one.
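The FA2 decomposition can be illustrated numerically. The sketch below (Python/numpy, with hypothetical loadings and specific variances for four locations) reconstructs \(\Sigma _{Loc} = \Gamma \Gamma ' + \psi\) and converts it to a correlation matrix; locations loading on the same latent factor come out highly correlated:

```python
import numpy as np

# Hypothetical FA2 loadings for j = 4 locations and 2 latent factors:
# locations 1-2 load mostly on F1, locations 3-4 mostly on F2.
Gamma = np.array([[0.9, 0.1],
                  [0.8, 0.3],
                  [0.2, 0.9],
                  [0.1, 0.8]])
phi = np.array([0.10, 0.15, 0.12, 0.20])   # specific-factor variances
Psi = np.diag(phi)

# Between-location covariance implied by the FA2 decomposition.
Sigma_loc = Gamma @ Gamma.T + Psi

# Convert to a correlation matrix to see which locations cluster.
d = np.sqrt(np.diag(Sigma_loc))
Corr_loc = Sigma_loc / np.outer(d, d)
print(np.round(Corr_loc, 2))
```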

ASReml implementation

In an ASReml mixed model, a GxLoc term can be defined as \(\varvec{u_{gxLoc}} \sim N(\varvec{0}, G_{str} \otimes \Sigma _{Loc})\). \(G_{str}\) is the genomic relationship matrix and is therefore known, while \(\Sigma _{Loc}\) has to be estimated by the model. If \(\Sigma _{Loc}\) is treated as an unstructured variance-covariance structure (all elements within the matrix are estimated by the model with no constraints), the model needs to estimate \(\frac{j}{2}(j+1)\) parameters, which can be problematic when the number of locations (j) is large. Conversely, in an FA2 framework, instead of directly estimating \(\Sigma _{Loc}\), we assume \(\Sigma _{Loc} = \Gamma \Gamma ' + \psi\) and indirectly calculate \(\Sigma _{Loc}\) from the \(\Gamma\) and \(\psi\) matrices. Therefore, we only need to estimate the elements in \(\Gamma\) and \(\psi\), which amounts to 3j parameters, substantially fewer than the unstructured alternative when the number of locations is large.

In summary, in ASReml, REML is used to estimate the most likely values for \(\Gamma\) and \(\psi\) within the factor analysis framework (i.e. the factor analysis assumptions have to be met).

Sommer implementation

Sommer approaches factor analysis in a completely different way from ASReml. First, Sommer uses a reduced-rank factor analysis model, assuming that the specific effects (\(\psi\)) are equal to zero. Secondly, while ASReml updates the FA2 loadings in each REML iteration within the model, Sommer calculates one possible solution of the FA2 loadings (\(\Gamma\)) from the W matrix at the beginning, and that solution is subsequently used to fit the model without updating the loadings via REML. To that end, the GxLoc term in the Sommer model is converted into a genotype by factor scores interaction, \(\varvec{u_{gxF}} \sim N(\varvec{0}, G_{str} \otimes \Sigma _{F})\), where \(\Sigma _{F}\) is an unstructured \((n_F \times n_F)\) covariance matrix for the factor scores (\(n_F = 2\) in FA2) whose elements are estimated by REML. The factor effect has a design matrix \(Z_F = Z_{Loc}\Gamma\), where \(Z_{Loc}\) is the design matrix linking the observations to the locations. The BLUPs for the GxLoc term (\(\varvec{u_{gxLoc}}\)) and the correlation between locations (\(\Sigma _{Loc}\)) can be extracted using the loadings (\(\Gamma\)) to link \(\varvec{u_{gxF}}\) with \(\varvec{u_{gxLoc}}\) and \(\Sigma _{F}\) with \(\Sigma _{Loc}\).

In our experience, the GxLoc BLUPs estimated by Sommer are comparable to the results obtained with ASReml, but the correlations between locations and the variance components are not. This is because Sommer does not find the most likely loadings within the model, while ASReml does; furthermore, Sommer uses the simplifying assumption that the specific effects are zero. The advantage of the Sommer approach is the small number of parameters estimated by the model: only three are needed if two latent variables are used, regardless of the number of locations.
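The mapping described above can be sketched as follows (Python/numpy; the loadings, design matrix and factor-score covariance are all hypothetical, and this is an illustration of the concept rather than Sommer's actual implementation):

```python
import numpy as np

# Hypothetical fixed FA2 loadings, computed beforehand from W (j = 3 locations).
Gamma = np.array([[0.9, 0.2],
                  [0.7, 0.4],
                  [0.1, 0.9]])

# Design matrix linking 6 observations to locations (2 obs per location).
Z_loc = np.kron(np.eye(3), np.ones((2, 1)))
Z_F = Z_loc @ Gamma                   # genotype-by-factor design matrix

# Hypothetical unstructured 2x2 covariance of the factor scores
# (the elements REML would estimate).
Sigma_F = np.array([[1.2, 0.3],
                    [0.3, 0.8]])

# Recover the between-location covariance (reduced rank: psi = 0).
Sigma_loc = Gamma @ Sigma_F @ Gamma.T
print(Sigma_loc.shape, np.linalg.matrix_rank(Sigma_loc))
```

Because \(\psi = 0\), the recovered \(3\times 3\) \(\Sigma _{Loc}\) has rank 2, reflecting the reduced-rank assumption.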

Appendix 5: Intuitive overview of the information included in EEV

To understand the information carried by the EEV matrix, it is important to review how it is calculated, as well as the intuitive meaning of the operations involved. EEV (and PEV as well) are directly derived from the Henderson equations [18]:

$$\begin{aligned} \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix} \begin{bmatrix} \varvec{\hat{\beta }} \\ \varvec{\hat{u}} \end{bmatrix} = \begin{bmatrix} X'R^{-1}\varvec{y} \\ Z'R^{-1}\varvec{y} \end{bmatrix} \end{aligned}$$
(19)

Where X and Z are the design matrices for fixed and random effects, respectively; \(\varvec{\hat{\beta }}\) and \(\varvec{\hat{u}}\) are the vectors of BLUEs and BLUPs, respectively; \(\varvec{y}\) is the vector of response variable observations; and R and G are the variance-covariance matrices for the residuals and random effects, respectively (i.e. the assumed variance-covariance structures multiplied by the corresponding variance components, see Appendix 3).

The first matrix in equation 19 is the coefficient matrix (CoeffMat), and both EEV and PEV are directly extracted from its inverse:

$$\begin{aligned} CoeffMat^{-1} = \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}^{-1} = \begin{bmatrix} C_{11} = EEV & C_{12} \\ C_{21} & C_{22} = PEV \end{bmatrix} \end{aligned}$$
(20)

In all the examples elaborated here, we will assume that the residuals are i.i.d. (i.e. \(R = I\sigma ^2_\epsilon\)). As explained in Appendix 3, Eq. 17, this allows us to simplify the coefficient matrix equation to:

$$\begin{aligned} CoeffMat = \begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + \lambda G_{str}^{-1} \end{bmatrix} \frac{1}{\sigma ^2_\epsilon } \\ CoeffMat^{-1} = \begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + \lambda G_{str}^{-1} \end{bmatrix}^{-1}\sigma ^2_\epsilon = \begin{bmatrix} C_{11} = EEV & C_{12} \\ C_{21} & C_{22} = PEV \end{bmatrix} \end{aligned}$$

Where \(\lambda\) is the residual variance divided by the random effect variance. In all subsequent examples, we will also assume that both variance components are equal to one (i.e. \(\lambda = 1\)) and that the random effects are also i.i.d. (i.e. \(G_{str} = I\)) for the sake of simplicity.
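Under these simplifying assumptions, the extraction of EEV and PEV can be sketched as a short function. This is an illustrative sketch in Python/numpy (the code accompanying the paper is in R), and the function name `eev_pev` is ours:

```python
import numpy as np

def eev_pev(X, Z, lam=1.0, sigma2_e=1.0):
    """Extract the EEV and PEV blocks from the inverse of the
    mixed-model coefficient matrix, assuming R = I*sigma2_e and
    G_str = I (so the regularization term is simply lam * I)."""
    p, q = X.shape[1], Z.shape[1]
    coeff = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + lam * np.eye(q)],
    ])
    cinv = np.linalg.inv(coeff) * sigma2_e
    eev = cinv[:p, :p]   # C11: estimation error variances of the BLUEs
    pev = cinv[p:, p:]   # C22: prediction error variances of the BLUPs
    return eev, pev
```

For real models the explicit inverse would be replaced by sparse or factorized solvers, but for the small toy designs below a dense inverse is perfectly adequate.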

Thus, understanding the intuition behind the operations in the coefficient matrix is essential to fully understand EEV. To that end, we will use three toy examples based on an experimental design with 4 plots in which 3 different genotypes are tested:

Fig. 3

Toy example of a design in a field with two rows and two columns. Three genotypes (GID a, GID b, GID c) are distributed in the available plots organized in two blocks of two plots each

Using the design in Fig. 3, we will explore how CoeffMat and EEV change as we add increasingly complex spatial effects to a stage 1 model.

No spatial effects

First, we will assume the simplest possible stage 1 model:

$$\begin{aligned} \varvec{y} = X\varvec{\beta } + \varvec{\epsilon } \end{aligned}$$
(21)

This model only contains fixed genotypic effects (\(\varvec{\beta }\)), and therefore it could be solved by ordinary least squares (OLS). We will, however, analyze it in the context of Henderson equations, which would yield the same results as OLS. Due to the lack of any random effects (Z matrix and G do not exist), CoeffMat becomes simply \(X'X\frac{1}{\sigma ^2_\epsilon }\). The design matrix X would be as follows:

$$\begin{aligned} X = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \end{aligned}$$
(22)

Where rows correspond to \(Plot_1\) to \(Plot_4\) and columns to \(GID_a\), \(GID_b\) and \(GID_c\) (\(GID_a\) is present in plots 1 and 3).

Using this X matrix, we can calculate EEV:

$$\begin{aligned} CoeffMat = X'X\frac{1}{\sigma ^2_\epsilon } = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\frac{1}{\sigma ^2_\epsilon } \end{aligned}$$
(23)

The matrix multiplication \(X'X\) in Eq. 23 simply counts the number of times each genotype is replicated in the design: \(GID_a\) is present two times, the rest only once. The EEV matrix associated with it would be its inverse:

$$\begin{aligned} EEV = \left( X'X\right) ^{-1}\sigma ^2_\epsilon = \begin{bmatrix} 0.5 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\sigma ^2_\epsilon \end{aligned}$$
(24)

Therefore, as seen in equations 23 and 24, in a model without spatial effects, the error associated with each genotype is only influenced by the residual variance (which is common to all genotypes) and the number of times it is replicated (more replication implies less error). The errors of all genotypes are also considered independent (all off-diagonals are zero). If this EEV matrix is included in a second-stage model, more weight will simply be given to the BLUEs of the more replicated genotypes, i.e. their BLUEs will be “trusted” more during model calibration.

It is important to note that, in this scenario, calculating EEV simply amounts to dividing the residual variance by the number of times each genotype is observed, which is exactly the squared standard error of the mean.
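This can be checked numerically. Below is a minimal sketch in Python/numpy (rather than the R code accompanying the paper), with the plot-to-genotype assignment taken from Fig. 3 and \(\sigma ^2_\epsilon = 1\):

```python
import numpy as np

# Design matrix X from equation 22: rows are plots, columns are genotypes
# (GID_a in plots 1 and 3, GID_b in plot 2, GID_c in plot 4)
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])

# Without random effects, CoeffMat = X'X / sigma2_e and EEV = (X'X)^-1 * sigma2_e
sigma2_e = 1.0
XtX = X.T @ X                      # counts replication: diag(2, 1, 1)
EEV = np.linalg.inv(XtX) * sigma2_e

# Each diagonal entry equals sigma2_e / n_reps: the squared standard error
reps = np.diag(XtX)
print(np.allclose(np.diag(EEV), sigma2_e / reps))  # True
```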

Block effects

The previous scenario was not very realistic, as usually at least block effects are considered to correct for spatial effects:

$$\begin{aligned} \varvec{y} = X\varvec{\beta } + Z\varvec{u} + \varvec{\epsilon } \end{aligned}$$
(25)

Where all terms follow the same nomenclature as in equation 21 and the X matrix is unchanged from equation 22; \(\varvec{u}\) is a vector of block effects, \(\varvec{u} \sim N(\varvec{0}, G = I\sigma ^2_u)\), and Z is its corresponding design matrix:

$$\begin{aligned} Z = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} \end{aligned}$$
(26)

We will assume that the residual and random effect variance-covariance structures are the identity matrix and their corresponding variance components are equal to one (\(G_{str} = I\); \(\lambda = \frac{\sigma ^2_\epsilon }{\sigma ^2_u} = 1\)). This allows us to simplify the coefficient matrix to:

$$\begin{aligned} CoeffMat = \begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + \lambda G_{str}^{-1} \end{bmatrix}\frac{1}{\sigma ^2_\epsilon } = \begin{bmatrix} 2 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 3 & 0 \\ 1 & 0 & 1 & 0 & 3 \end{bmatrix}\frac{1}{\sigma ^2_\epsilon } \end{aligned}$$
(27)

The first three rows and columns (positions [1:3,1:3]) simply contain the amount of replication of the genotypes, as in equation 23. The last two rows and columns (positions [4:5,4:5]) contain the size of the blocks (each block has 2 plots, which becomes a 3 in the matrix due to the addition of \(\lambda G_{str}^{-1}=I\) for the regularization of the random effects). The positions [1:3,4:5] and [4:5,1:3] link the fixed and random effects, i.e. they indicate which genotypes have been tested in each block and how many times (in this case, each genotype appears either once or not at all in each block).

By inverting CoeffMat and subsetting the positions [1:3,1:3] corresponding to the genotypic fixed effects, we obtain the EEV matrix:

$$\begin{aligned} EEV = \left( CoeffMat^{-1}\right) _{[1:3,1:3]} = \begin{bmatrix} 1 & 0.5 & 0.5 \\ 0.5 & 1.75 & 0.25 \\ 0.5 & 0.25 & 1.75 \end{bmatrix}\sigma ^2_\epsilon \end{aligned}$$
(28)

In equation 28, we can observe that, apart from error variances (main diagonal), we now have nonzero error covariances (off-diagonals). First, in the previous scenario (equation 24), the non-replicated genotypes had twice the error variance of the replicated one (\(GID_a\)). Now, they have a slightly lower error variance relative to \(GID_a\) \((1.75<2*1)\). The reason is that, thanks to the blocking structure, the non-replicated genotypes can borrow some of the information gained by replicating \(GID_a\), and their reliability increases. Regarding the off-diagonals, the error of \(GID_a\) is substantially correlated with that of the other genotypes (covariance of 0.5) because \(GID_a\) has been tested in the same block as each of them (\(GID_a\) is together with \(GID_b\) in block 1 and with \(GID_c\) in block 2). Finally, \(GID_b\) and \(GID_c\) show a small error covariance (0.25) between themselves despite being tested in different blocks. This is because block 1 and block 2 are linked: both blocks contained \(GID_a\).

It is important to note that the error variance for the genotypes in the model with block effects (main diagonal in equation 28) is larger than for the model without spatial effects (main diagonal in equation 24). Therefore, if we ignore the off-diagonals, it may look like adding block effects has reduced the reliability of the estimations (a larger error variance implies lower reliability). However, high covariance values reduce the overall variability described by the variance-covariance matrix. An extreme case would be a perfect correlation of 1 among all genotypic errors: in that situation, regardless of the size of the error variances, the error would be the same for all genotypes and their ranking would be unaffected. Thus, the increase in error variance when block effects are added is compensated by the increase in error covariance.
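The values discussed above can be reproduced numerically. The following Python/numpy sketch (an illustration, not the paper's R code) builds CoeffMat for the block model with \(\lambda = 1\) and \(\sigma ^2_\epsilon = 1\) and extracts EEV:

```python
import numpy as np

# Design matrices for the Fig. 3 toy example
X = np.array([[1, 0, 0],   # plot 1: GID_a, block 1
              [0, 1, 0],   # plot 2: GID_b, block 1
              [1, 0, 0],   # plot 3: GID_a, block 2
              [0, 0, 1]])  # plot 4: GID_c, block 2
Z = np.array([[1, 0],      # block incidence matrix
              [1, 0],
              [0, 1],
              [0, 1]])

# CoeffMat with lambda = 1, G_str = I (regularization is simply + I)
coeff = np.block([[X.T @ X, X.T @ Z],
                  [Z.T @ X, Z.T @ Z + np.eye(2)]])

# EEV = genotype block [1:3,1:3] of the inverse (sigma2_e = 1)
EEV = np.linalg.inv(coeff)[:3, :3]
# variances: 1, 1.75, 1.75; cov(a,b) = cov(a,c) = 0.5; cov(b,c) = 0.25
```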

Row and column effects

The previous scenario showed a more detailed EEV matrix, and EEV becomes increasingly informative as more complex spatial effects are included in the stage 1 model. Here, we will use row and column effects as cofactors for illustrative purposes, but it is important to note that there are many other possibilities for spatial analysis (autoregressive structures, 2D P-splines, etc.):

$$\begin{aligned} \varvec{y} = X\varvec{\beta } + Z_r\varvec{u_r} + Z_c\varvec{u_c} + \varvec{\epsilon } \end{aligned}$$
(29)

In equation 29, we have included two random effects, one for the row effect (\(\varvec{u_r}\)) and another one for the column effect (\(\varvec{u_c}\)) with their corresponding design matrices (\(Z_r\) and \(Z_c\)). The other effects follow the same nomenclature as in previous models and the X matrix remains unchanged from equation 22.

$$\begin{aligned} Z_r = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} \quad Z_c = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \end{aligned}$$
(30)

For the sake of simplicity, we are assuming identity variance-covariance structures with a variance component equal to one for the residuals and both random effects (\(\lambda = 1; G_r = I; G_c = I\)). For the calculation of CoeffMat, we merge both random effects into a single, equivalent one \(\begin{pmatrix} \varvec{u_r} \\ \varvec{u_c} \end{pmatrix} = \varvec{u_{all}} \sim N(\varvec{0}, G_{all})\):

$$\begin{aligned} Z_{all} = \begin{bmatrix} Z_r&Z_c \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix} \end{aligned}$$
(31)
$$\begin{aligned} G_{all} = \begin{bmatrix} G_r & 0 \\ 0 & G_c \end{bmatrix} = I \end{aligned}$$
(32)

From the above equations, we can calculate CoeffMat (with a simplified equation thanks to the assumed \(G_{all}=I; \lambda = 1\)) as follows:

$$\begin{aligned} CoeffMat = \begin{bmatrix} X'X & X'Z_{all} \\ Z_{all}'X & Z_{all}'Z_{all} + I \end{bmatrix}\frac{1}{\sigma ^2_\epsilon } = \begin{bmatrix} 2 & 0 & 0 & 1 & 1 & 2 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 3 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 3 & 1 & 1 \\ 2 & 0 & 0 & 1 & 1 & 3 & 0 \\ 0 & 1 & 1 & 1 & 1 & 0 & 3 \end{bmatrix}\frac{1}{\sigma ^2_\epsilon } \end{aligned}$$
(33)

In equation 33, as before, positions [1:3,1:3] count the number of times each genotype is replicated and positions [4:7,4:7] count how many plots there are in each row or column (2 plots in each one, which turns into 3 in CoeffMat due to the addition of \(G_{all} = I\) to \(Z_{all}'Z_{all}\) for the regularization of the random effects). Furthermore, the off-diagonals in positions [4:7,4:7] count how many plots a given row and a given column have in common. For example, \(Row_1\) contains \(Plot_1\) and \(Plot_2\), while \(Col_1\) contains \(Plot_1\) and \(Plot_3\); therefore, they have a single plot (\(Plot_1\)) in common, the plot in which \(Row_1\) and \(Col_1\) intersect. Finally, positions [1:3,4:7] and [4:7,1:3] specify how many times each genotype is present in each row or column. The calculation of EEV from CoeffMat is done as in the previous scenario:

$$\begin{aligned} EEV = \left( CoeffMat^{-1}\right) _{[1:3,1:3]} = \begin{bmatrix} 2 & 0.5 & 0.5 \\ 0.5 & 2.75 & 1.25 \\ 0.5 & 1.25 & 2.75 \end{bmatrix}\sigma ^2_\epsilon \end{aligned}$$
(34)

In equation 34, we can appreciate that \(GID_b\) and \(GID_c\), despite being replicated only half the number of times of \(GID_a\), present an error variance smaller than 1.5 times the error variance of \(GID_a\). This is because all rows contain \(GID_a\), which allows other genotypes to borrow some of the information gained from its replication through the row effect.

The error covariances of \(GID_a\) with \(GID_b\) and \(GID_c\) are positive but relatively small (0.5). This happens because these pairs of genotypes are present in the same rows but not in the same columns. Finally, the covariance between \(GID_b\) and \(GID_c\) is very large (1.25) for two reasons: i) both genotypes are present in the same column and ii) although they are in different rows, these two rows are linked to each other because both contain \(GID_a\).

From the examples above, we can get an intuitive summary of how the experimental design is contained in EEV, which is carried over to the second stage. Recall that EEV is the variance-covariance matrix for the “stage 1 error” random effect within the stage 2 model. Genotypes with large variance values within EEV (low reliability) are thus likely to have large “stage 1 error” random effect BLUPs. These large “stage 1 error” BLUP values are obtained at the expense of reducing the contribution of the unreliable genotype to the main genotypic effects, i.e., the model will not “trust” genotypes with large error variances during model calibration, and their effects will be absorbed into the “stage 1 error” term instead of the main genotypic effects. The opposite happens for genotypes with low error variances, which will affect the main genotypic effects more strongly.

Finally, if two genotypes were subjected to similar spatial effects in stage 1 (e.g., they were in the same block), their BLUEs used for stage 2 calibration will contain a similar bias derived from the error in the estimation of the block effect. This is reflected in their positive error covariances in EEV. In stage 2, these positive covariances will induce them to have “stage 1 error” values with the same sign, correcting for their common bias due to their presence in the same block. That way, spatial information from stage 1 is taken into account during stage 2 calibration. Furthermore, with the above examples, it becomes apparent that the higher the resolution of the spatial effects in stage 1, the more detailed the information in EEV.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Fernández-González, J., Isidro y Sánchez, J. Optimizing fully-efficient two-stage models for genomic selection using open-source software. Plant Methods 21, 9 (2025). https://doi.org/10.1186/s13007-024-01318-9
