 Research
 Open access
 Published:
Genomic prediction for root and yield traits of barley under a water availability gradient: a case study comparing different spatial adjustments
Plant Methods volume 20, Article number: 8 (2024)
Abstract
Background
In drought periods, water use efficiency depends on the capacity of roots to extract water from deep soil. A semifield phenotyping facility (RadiMax) was used to investigate aboveground and root traits in spring barley when grown under a water availability gradient. Aboveground traits included grain yield, grain protein concentration, grain nitrogen removal, and thousand kernel weight. Root traits were obtained through digital images measuring the root length at different depths. Two nearestneighbor adjustments (M1 and M2) to model spatial variation were used for genetic parameter estimation and genomic prediction (GP). M1 and M2 used (co)variance structures and differed in the distance function to calculate betweenneighbor correlations. M2 was the most developed adjustment, as accounted by the Euclidean distance between neighbors.
Results
The estimated heritabilities (\({\widehat{h}}^{2}\)) ranged from low to medium for root and aboveground traits. The genetic coefficient of variation (\(GCV\)) ranged from 3.2 to 7.0% for aboveground and 4.7 to 10.4% for root traits, indicating good breeding potential for the measured traits. The highest \(GCV\) observed for root traits revealed that significant genetic change in root development can be achieved through selection. We studied the genotypebywater availability interaction, but no relevant interaction effects were detected. GP was assessed using leaveonelineout (LOO) crossvalidation. The predictive ability (PA) estimated as the correlation between phenotypes corrected by fixed effects and genomic estimated breeding values ranged from 0.33 to 0.49 for aboveground and 0.15 to 0.27 for root traits, and no substantial variance inflation in predicted genetic effects was observed. Significant differences in PA were observed in favor of M2.
Conclusions
The significant \(GCV\) and the accurate prediction of breeding values for aboveground and root traits revealed that developing genetically superior barley lines with improved root systems is possible. In addition, we found significant spatial variation in the experiment, highlighting the relevance of correctly accounting for spatial effects in statistical models. In this sense, the proposed nearestneighbor adjustments are flexible approaches in terms of assumptions that can be useful for semifield or field experiments.
Introduction
Barley (Hordeum vulgare L.) is one of the major cereal crops worldwide. Its production is mainly used for animal feed or malting for alcoholic beverage fabrication (FAO 2016). One of the challenges for barley production is the influence of drought stress [44]. This challenge is exacerbated by the changing climate conditions that pose a great risk to the future water supply for agricultural production [27, 33, 42]. Thus, barley varieties that can overcome periods of drought stress with acceptable productivity are important to ensure future sustainable production. In a waterlimited environment, efficient use of water depends on the capacity of roots to extract water from deep soil. In addition, deep roots allow the uptake of nitrogen and other nutrients available in deeper soil layers [27, 33, 42].
Many studies in crop breeding have focused on investigating aboveground traits with a primary focus on grain yield [41, 53, 60], and fewer studies have investigated root traits (Den [14, 24, 25, 31, 35, 51]. The complexity and difficulty of root phenotyping under field conditions are some of the major reasons for the scarcity of root studies. To cope with this issue, a largescale phenotyping facility to study root growth under semifield (rainout shelter) conditions called RadiMax was developed [59]. The RadiMax facility potentially allows testing different plant species in four semifield shelters of 150 rows of capacity each; see Svane et al. [59] for a full description of the facility. The varieties are assessed under different levels of water availability and genetic differences in root development in crops such as barley can be identified.
Molecular markers have been exploited in plant breeding approaches for the last three decades to improve traits of economic importance. This was initially achieved through markerassisted selection (MAS, [50],Collard and Mackil, 2007; [10]. The genetic improvements achieved with MAS were mainly relevant for traits affected by the effects of major quantitative trait loci ‘‘QTL’’ [5, 16, 40, 48]. The introduction of genotyping techniques using numerous markers spread over the whole genome has made it possible to perform genomic selection (GS, [39]. GS allows us to capture most QTL effects (major and minor) to predict breeding values for complex traits due to genetic linkage between markers and QTL.
Genomic prediction (GP) uses a biometrical prediction model that is first trained using a population that contains both genotyped and phenotyped individuals. The trained model can then predict genomic breeding values on individuals that have been only genotyped. Such models can also increase the accuracy of predicted breeding values for lines with both phenotypic and genotypic data due to better use of information from genotyped relatives. Genomic estimated breeding values (GEBVs) are calculated as the sum of effects of all dense genetic markers in linkage disequilibrium (LD) with one or more QTLs across the entire genome [23]. GP has been successfully used in barley to predict genomic breeding values for several traits of economic relevance [3, 41, 63] and hybrid performance [45].
Plant experiments are usually affected by spatial variation in the experimental fields that cannot be completely controlled by blocks in the experimental design. For example, intrablock variability can occur due to differences in the availability of nutrients, water and other uncontrolled biotic and abiotic factors [6]. A vast body of scientific literature have been published to address the spatial variation in experimental fields from more than one century ago [2, 12, 21, 43, 56, 61, 68, 69]. Specifying spatial effects in statistical models is important as it can improve the fitting of the model [6, 57].
A classical approach to model spatial variation proposed by Papadakis [43] and developed by Wilkinson et al. [68] is to use the neighbor information to adjust the spatial variation (NNA, nearest neighbor adjustment), which is a particular kind of geostatistical analysis for field trials [47]. Gleeson and Cullis [22] proposed to fit autoregressiveintegratedmoving average models (ARIMA) to the plot errors in one direction (rows or columns), which was later extended by Cullis and Gleeson [11] to two directions (rows and columns). Other methods based on spline function have been demonstrated to be efficient in modeling spatial variation in field experiments [46, 65,66,67]. Detailed reviews of methods for spatial modeling can be found in Martin [38], Hinkelmann and Kempthorne [28] and Piepho et al. [47]. Note that spatial models use information from neighbor observations in order to control environmental variation in the experiments,this needs to be differentiated from the genetic effect of neighbors that, for example, may occur by competition or differential exposure to diseases [8, 36, 37].
In this study, we used a set of spring barley breeding lines provided by breeding companies Nordic Seed A/S and Sejet Plant Breeding, and evaluated under a water availability gradient in the Radimax facility. The water availability gradient was characterized in wet and dry treatments, where aboveground and root traits were collected under each treatment conditions. Details about the facility and treatments are described in the material and methods section. Spatial variation in our experiment was expected along the facility due to differences in soil compaction and other factors. The first motivation for this study came from the need to improve barley lines to withstand drought periods and present higher yields under waterscarce conditions. Our second motivation was to study the effect of different spatial adjustments on the estimation of variance components (VCs), genetic parameters estimation and genomic prediction. The specific objectives of this study were to:

(i)
Investigate genetic variation for root development (shallow and deep), grain yield, grain protein content, grain nitrogen removal, and thousand kernel weight in spring barley.

(ii)
Investigate genotypebytreatment (wet and dry) interaction and the potential genetic effects due to neighboring lines in the RadiMax experiment.

(iii)
Analyze the possibility of performing genomic prediction for the several root and aboveground traits evaluated in RadiMax using a leaveonelineout (LOO) crossvalidation (CV) strategy.

(iv)
To compare two different spatial nearest neighbor adjustments (NNAs) based on (co)variance structures to model spatial relationships between neighboring rows.
Materials and methods
Plant materials
A total of 74 Danish spring barley (Hordeum vulgae L.) lines provided by Nordic Seed A/S and Sejet plant breeding companies were used in the current experiment (a Principal Component Analysis for genotypes is presented in Additional file 1: Figure S1). The experiment was carried out in 2017 in the RadiMax semifield root phenotyping facility [59] located at Copenhagen University experimental farm (Latitude 55.66815°N, Longitude 12.30848^{o}E). A detailed description of the RadiMax infrastructure can be found in Svane et al. [59].
RadiMax facility and experiment layout
The experiment was established in two beds, each divided into two independent experimental units (halfbeds) with 150 rows of length each. The halfbeds operated as an independent replicate so that the experiment presented four replicates. As shown in Fig. 1, the RadiMax facility contained concrete walls in the laterals to prevent runoff from adjacent areas and a Vshaped bottom lined with an impermeable plastic membrane. The Vshaped bottom is designed so that the soil depth increases from 1.1 m at the sides to 3.0 m at the deep end of the beds (Fig. 1). A moveable rainout shelter was used to cover the units during rainy periods. The open ends of the rainout shelter were covered with a transparent insect net, allowing enough ventilation to reduce warming effects. There were 74 barley lines randomized within each bed. A single barley line was sown in each row with 25 cm row distance. A wet treatment was defined for units one and four, and a dry treatment was defined for units two and three. Treatments are further described in the following heading.
Treatments
A subsurface irrigation system was used to create a water gradient along the sloped bottom of each experimental bed resulting in different amounts of water available to the plants (Fig. 1). The system uses the principle of capillary movement to distribute water without the assistance of external forces. However, since the distance from soil surface to the water supply was different a dry and wet treatment was defined. The water availability gradient was divided into two parts called treatments (wet and dry) along the rows where the plants were grown (Fig. 1). A row of the experimental bed contains both treatments, where half of the row is wet treatment and the other half is dry treatment. Further details on the RadiMax facility and experimental design can be found in Svane et al. [58].
Aboveground measures
Aboveground measurements were done separately for wet and dry treatments on halfsegment of the rows of each experimental bed. The outermost part of the rows (0.5 m to both sides) were not included for sampling in order to avoid border effects. Four aboveground traits were measured: (1) grain yield (GY) in t/ha, (2) grain protein content (GPC) expressed in percentage, (3) grain nitrogen removal (GNR) in kg/ha and (4) thousand kernel weight (TKW) in grams. GPC was determined using nearinfrared spectroscopy (NIR, Intratec grain analyzer, Foss, Hilleroed, Denmark) and expressed as a percentage. GNR was estimated from grain yield and grain protein concentration as: \(\frac{GY \times GPC}{6.25}\), where 6.25 is the standard nitrogentoprotein conversion factor for cereals [30]. TKW was determined as a measure of kernel size using an imagebased system for counting the number of seeds in a sample, and it was calculated as the quotient of the sample weight in grams and the total number of seeds, multiplied by 1000.
In total, 1200 observations were recorded for the four aboveground traits (GY, GPC, GNR, TKW). The 1200 observations resulted from collecting samples from beds (2)*halfbeds(2)*rows(150)*treatments(2). The following criteria were used for data editing: (1) lines with no observations for any of one trait were removed, (2) data records outside ± 3 standard deviations (SD) from the mean were removed, and (3) lines with no genomic information were removed. After the data cleaning, 1031 observations for 66 lines were available for statistical analysis. Descriptive statistics for aboveground and root traits are presented in Table 1, and a boxplot is included in Additional file 2: Figure S3.
Root images
Root images were taken in minirhizotrons (MR), transparent tubes installed 40 cm above the sloping bottom for half of the experimental beds (Fig. 1). Tubes were guided through holes in the concrete wall to facilitate capturing of root images in the range across a soil depth of 0.4 to 2.7 m. The MR tube has 70 mm outer diameter, 60 mm inner diameter and a total length of 5.5 m. The MR tubes were placed 25 cm apart. In total, there were 300 MR tubes, with 150 MR tubes for each MRequipped bed. A root image (19.8 cm.^{2}) facing upward was taken for every 5 cm within the tube, using a custombuilt multispectral MR imaging system. The system offers automated quantification of the length of living root structures, by a multivariate grouping of pixels based on differences in reflectance and suppression of background noise by the use of a vesselness enhancement filter algorithm. For a full description of the image acquisition system and the training procedure of the image analysis procedure see Svane et al. [58]
Root images were taken at three different time points during the growth season. Time points 1, 2, and 3 were on June 13, June 28, and July 19 of year 2017, respectively. The number of images for time points 1, 2, and 3 were 9,253, 13,485, and 12,106, respectively. The following criteria were used for editing the image data: (1) images with no observations were removed, (2) data records outside ± 3 SD from the mean were removed, and (3) lines with no genomic information were removed. After data cleaning, 6027, 10,032, and 11,642 images were available for analysis for time points 1, 2, and 3, respectively. On average, 21, 35, and 41 images were available per line for time points 1, 2 and 3, respectively. For the genetic analysis, the root data from the three time points were used as repeated observations in the model, resulting in a potential number of 900 observations from beds(2)*Minirhizotrons(150)*timepoints(3). The reason for combining the data as repeated observations was that the initial analysis for each time point separately showed relatively homogeneous genetic variance across time points, and the combined analysis tended to capture higher genetic variance.
For the root data analysis, we have summarized the root data for each minirhizotron and timepoint in three different ways:

(1)
Total root length (TRL) calculated by summing the visible root length present in each image for each MR tube.

(2)
Shallow root length (SRL) as the sum of visible root length in the interval of 100–120 cm of soil depth.

(3)
Deep root length (DRL) as the sum of visible root length in the interval of 120–180 cm of soil depth.
Genomic data
Samples consisting of seven flag leaves per line taken from wet treatment were used for transcriptome sequencing (RNAseq). RNA was extracted from leaf tissues using the Total Plant RNA kit (SigmaAldrich, Schnelldorf, Germany). Gene expression data was generated via RNAseq on the Illumina HiSeq4000 platform (2 × 100 bp, ~ 20 M reads per sample) by the Beijing Genomics Institute (BGI, Shenzhen, China). The initial data set included 135,329 SNPs and the following SNP filtering criteria was applied: (1) SNPs and lines with more than 10% missing data were excluded, (2) SNPs with minor allele frequency (MAF) lower than 3% were excluded. After filtering, 60,048 SNPs were available for the analysis.
The SNP data set was then used to compute the genomic relationship matrix \(({\varvec{G}})\) following the first method of VanRaden [64] as follows:
where M is the centered genotypic matrix, and \({p}_{j}\) is the minor allele frequency of j^{th} SNP. A Heatmap of \({\varvec{G}}\) is presented in Additional file 1: Figure S2.
Statistical models
Initially, several models were tested to estimate VCs for aboveground and root traits. Investigation of spatial effect separately for areas under wet and dry treatments for aboveground trait, and by timepoint for root traits, revealed heterogenous spatial variances. Thus, we considered heterogeneous spatial variance for each treatment (dry and wet) for above ground traits, and for time points 1, 2, and 3 for root traits. Two approaches based on covariance structures with different distance functions to compute neighbor relationships were used (Additional file 3: Figure S4). The two modeling approaches were referred to as M1 and M2, for the aboveground models as AM1 and AM2 and the root models as RM1 and RM2.
Aboveground models (AM)
The AM1 model was defined as follows:
where \({\varvec{y}}\) is the vector of phenotypes for GY, GPC, GNR, or TKW; \({\varvec{X}}\) is the design matrix for the fixed effects; \({\varvec{b}}\) is the vector of fixed effects (unitbedtreatment); \({\varvec{g}}\) is the vector of additive genomic breeding values of the lines with \({\varvec{g}} \sim N(0,{\varvec{G}}{\sigma }_{g}^{2})\), where \({\varvec{G}}\) is the genomic relationship matrix and \({\sigma }_{g}^{2}\) the additive genomic variance; \({\varvec{l}}\) is the vector of line effects that include nonadditive genetic effects and potential additive effects not explained by the genomic information with \({\varvec{l}} \sim N(0,{\varvec{I}}{\sigma }_{l}^{2})\), where \({\sigma }_{l}^{2}\) is the variance of line effects; \({{\varvec{n}}}_{g}\) is the vector of additive genomic effects due to neighboring effects with \({{\varvec{n}}}_{{\varvec{g}}} \sim N(0,{\varvec{G}}{\sigma }_{ng}^{2})\), with \({\varvec{G}}\) as previously defined and \({\sigma }_{ng}^{2}\) the genomic additive variance due to neighboring effects; \({{\varvec{n}}}_{l}\) is the vector of neighbor line effects due to neighboring effects including nonadditive effects and potential additive effects not explained by the genomic information with \({{\varvec{n}}}_{{\varvec{l}}} \sim N(0,{\varvec{I}}{\sigma }_{nl}^{2})\), where \({\sigma }_{nl}^{2}\) is the variance of neighbor line effects; \({\varvec{r}}\) is the vector for row effects with \({\varvec{r}}\boldsymbol{ }\sim \boldsymbol{ }N(0, {\varvec{I}}{\sigma }_{r}^{2})\), where \({\sigma }_{r}^{2}\) is the row variance, \({{\varvec{t}}}_{g}\) and \({{\varvec{t}}}_{l}\) are vectors of genotype by treatment interactions, \({{\varvec{t}}}_{{\varvec{g}}}\) with covariance structure \({{\varvec{t}}}_{g} \sim N(0,\left[\begin{array}{cc}G& 0\\ 0& G\end{array}\right]{\sigma }_{tg}^{2})\) and \({{\varvec{t}}}_{l} \sim N(0, {\varvec{I}}{\sigma }_{tl}^{2})\), where \({\sigma }_{tg}^{2}\) and \({\sigma }_{tl}^{2}\) are the genomic additivebytreatment interaction variance and the linebytreatment interaction variance, respectively; \({{\varvec{s}}}_{1}\) and \({{\varvec{s}}}_{2}\) are the vectors of spatial effects for wet and dry treatments, respectively, with \({{\varvec{s}}}_{1} \sim N(0,{{\varvec{S}}}_{knn}{\sigma }_{s1}^{2})\) and \({{\varvec{s}}}_{2} \sim N(0,{{\varvec{S}}}_{knn}{\sigma }_{s2}^{2})\), where \({{\varvec{S}}}_{knn}\) is a spatial correlation matrix [49] and \({\sigma }_{s1}^{2}\) and \({\sigma }_{s2}^{2}\) is the variance of \({{\varvec{s}}}_{1}\) and \({{\varvec{s}}}_{2}\) effects; \({\varvec{e}}\) is a vector of random residual effect with \({\varvec{e}} \sim N(0,{\varvec{I}}{\sigma }_{e}^{2})\), where \({\sigma }_{e}^{2}\) is the residual variance. The spatial effects were defined as the combination of 11 spatial subcomponents integrated by the row centered in the row of the observation (i.e. target row) and the ten neighboring rows (five to the left and five to the right). Virtual rows were added to complete empty row spaces [62]. The \({{\varvec{S}}}_{knn}\) matrix was computed as:
where \({\mathbf{X}}_{1}\) is an n × q matrix, with n = number of observations (i.e., number of real rows), and q = number of real plus virtual rows. The \({{\mathbf{X}}_{1}}_{{\text{n}}\times {\text{q}}}\) is an indicator matrix relating observations to spatial effects in \({{\varvec{S}}}_{knn}\), \(tr\) is the trace (sum of diagonal elements) and \(n\) the total number of rows.
The AM2 model had the same fixed and random effects as AM1, with the same definition for\({\varvec{y}}\),\({\varvec{X}}\),\({{\varvec{Z}}}_{{\varvec{n}}}\),\({\varvec{b}}\),\({\varvec{g}}\),\({\varvec{l}}\),\({{\varvec{n}}}_{{\varvec{g}}}\),\({{\varvec{n}}}_{{\varvec{l}}}\),\({\varvec{r}}\),\({{\varvec{t}}}_{{\varvec{g}}}\),\({{\varvec{t}}}_{{\varvec{l}}}\), and\({\varvec{e}}\), but a different definition for \({{\varvec{s}}}_{1}\) and\({{\varvec{s}}}_{2}\), which accounted for the Euclidean distance among neighbors. The spatial effects in AM2 were defined as \({{\varvec{s}}}_{1} \sim N(0,{{\varvec{S}}}_{euc}{\sigma }_{s1}^{2})\) and\({{\varvec{s}}}_{2} \sim N(0,{{\varvec{S}}}_{euc}{\sigma }_{s2}^{2})\), where \({{\varvec{S}}}_{euc}\) is a spatial correlation matrix and \({\sigma }_{s1}^{2}\) and \({\sigma }_{s2}^{2}\) is the variance of \({{\varvec{s}}}_{1}\) and \({{\varvec{s}}}_{2}\) effects. The \({{\varvec{S}}}_{euc}\) matrix was computed as\({{\varvec{S}}}_{euc}=\frac{{\mathbf{X}}_{2}{{\mathbf{X}}_{2}}^{\mathbf{^{\prime}}}}{tr\left({\mathbf{X}}_{2}{{\mathbf{X}}_{2}}^{\mathbf{^{\prime}}}\right)/n}\), where \({\mathbf{X}}_{2}\) is an n × q matrix, with n = number of observations and q = number of real plus virtual rows. The \({{\mathbf{X}}_{2}}_{{\text{n}}\times {\text{q}}}\) matrix was built first relating the target rows with the neighbors within a distance ≤ 2.75 m (equivalent to the distance of 11 rows as defined for the\({{\varvec{S}}}_{knn}\)), and second by scaling the Euclidean distance between the target rows and their neighbors (\({d}_{ij}\)) by the maximum distance \({d}_{max}\) = 2.75 m [13, 15]. Virtual rows were added to complete empty row spaces. This approach was implemented using the "adespatial" R package [15]. Note that the spatial effect in M1 and M2 are both based on nearest neighbor adjustment (NNA) but differ in the distance function used to compute the relationship between neighbors \(.\) A scatter plot of the between neighbor correlations in \({{\varvec{S}}}_{knn}\) and \({{\varvec{S}}}_{euc}\) as a function of neighbors distance, and a heatmap of the \({{\varvec{S}}}_{knn}\) matrix is provided in Additional file 3. Note that the main difference between \({{\varvec{S}}}_{knn}\) and \({{\varvec{S}}}_{euc}\) are that \({{\varvec{S}}}_{euc}\) weight higher correlations for closer neighbors and lower for more distant neighbors compared to\({{\varvec{S}}}_{knn}\).
Root models (RM)
Two models, RM1 and RM2, were developed for root traits. Unlike the AM models, the RM models did not include neighboring line effects because of the high model complexity. The RM1 model was defined as follows:
where \({\varvec{y}}\) is the vector of phenotypes for TRL, SRL, and DRL; \({\varvec{X}}\) is the design matrix for the fixed factors; \({\varvec{b}}\) is the vector of fixed effects (bedcameratime); and \({{\varvec{Z}}}_{{\varvec{n}}}\), \({\varvec{g}}\), \({\varvec{l}}\), \({\varvec{r}}\), and \({\varvec{e}}\) were defined as in AM1; \({{\varvec{s}}}_{1}\), \({{\varvec{s}}}_{2}\), and \({{\varvec{s}}}_{3}\) are spatial variances for timepoint 1, 2, and 3, respectively, with \({{\varvec{s}}}_{1} \sim N(0,{{\varvec{S}}}_{knn}{\sigma }_{s1}^{2})\), \({{\varvec{s}}}_{2} \sim N(0,{{\varvec{S}}}_{knn}{\sigma }_{s2}^{2})\), and \({{\varvec{s}}}_{3} \sim N(0,{{\varvec{S}}}_{knn}{\sigma }_{s3}^{2})\). The RM2 model had the same fixed and random effects as RM1, but with \({{\varvec{s}}}_{1}\), \({{\varvec{s}}}_{2}\), and \({{\varvec{s}}}_{3}\) using the \({{\varvec{S}}}_{euc}\) covariance structure instead of \({{\varvec{S}}}_{knn}\).
Variance components and genetic parameters
Variance components were estimated by Restricted Maximum Likelihood (REML) using the Average Information (AIREML) module in DMU software [34]. The narrow (\({\widehat{h}}^{2}\)) and broadsense (\({\widehat{H}}^{2}\)) heritabilities were estimated at the level of halfrows for the different treatments for AM and different timepoints for RM models. The narrow (\({h}^{2}\)) and broadsense heritabilities (\({H}^{2}\)) were estimated for all models as \({\widehat{h}}^{2}=d(\mathbf{G}){ \widehat{\sigma }}_{g}^{2}/{ \widehat{\sigma }}_{P}^{2}\) and \({\widehat{H}}^{2}=({\widehat{\sigma }}_{l}^{2}+d(\mathbf{G}){ \widehat{\sigma }}_{g}^{2})/{ \widehat{\sigma }}_{P}^{2}\), where \(d(\mathbf{G})\) is the average of diagonal elements of the genomic relationship matrix \(d\left(\mathbf{G}\right)\) = 1.856, \({\widehat{\sigma }}_{g}^{2}\) is the estimated additive genomic variance,\({\widehat{\sigma }}_{l}^{2}\) is the estimated variance of line effects, and \({\widehat{\sigma }}_{P}^{2}\) is the estimated halfrow phenotypic variance. The \({\widehat{\sigma }}_{P}^{2}\) for AM models was calculated as:
where \(i\) is 1 or 2 for wet and dry treatments, respectively; the “hat” denotes the estimate of the parameter for \({\sigma }_{g}^{2}\), \({\sigma }_{l}^{2}\), \({\sigma }_{ng}^{2}\), \({\sigma }_{nl}^{2}\), \({\sigma }_{r}^{2}\), \({\sigma }_{tg}^{2}\), \({\sigma }_{tl}^{2}\), \({\sigma }_{{s}_{i}}^{2}\) (\(i=1\) and \(2\) for wet and dry treatment, respectively), and \({\sigma }_{e}^{2}\). The \({\widehat{\sigma }}_{P}^{2}\) for RM models was calculated as:
where \(k\) is 1, 2, or 3 for timepoint 1, 2, and 3, respectively, \({\widehat{\sigma }}_{g}^{2}\), \({\widehat{\sigma }}_{l}^{2}\), \({\widehat{\sigma }}_{r}^{2}\), and \({\widehat{\sigma }}_{e}^{2}\) were defined as for the AM models, and \({\widehat{\sigma }}_{{s}_{k}}^{2}\) is the estimated variance for the spatial effect \(k\) (\(k=1\), 2, and \(3\) for timepoint 1, 2, and 3, respectively).
The genetic coefficient of variation (\(GCV\)) was calculated for all aboveground and root traits as \(GCV= \frac{{\sigma }_{g}}{\overline{x} }*100\), where \({\sigma }_{g}\) is the square root of the additive genomic variance and \(\overline{x }\) the general average for each trait.
Genomic predictions
Models were validated using a leaveonelineout (LOO) crossvalidation (CV) strategy, where the GEBV of each line was predicted from a model trained on all the other lines. This validation strategy first estimates variance components and fixed effects from the full data set. Then estimates of the fixed effects were subtracted from the phenotype to get corrected phenotype \(({y}_{c})\). For breeding value prediction, one line was left out at a time and prediction for the leftout line was done based on the rest of the lines. This process continued until all lines were predicted. The predictive ability (PA) of models were determined as the correlation between the phenotypic values corrected for the fixed effects and GEBVs. The maximum potential PA was computed as \(\sqrt{n{h}^{2}/(1+\left(n1\right){h}^{2})}\) [9, 19], where \({\text{n}}\) is the average number of line repetitions (\({\text{n}}=\)15.9 for aboveground traits and \({\text{n}}=\)11.5 for root traits including the three time points). Note that the prediction accuracy (ACC) can be estimated by scaling the PA by the estimated maximum potential PA. The regression coefficient of predicted genetic values obtained with whole phenotypic information on predicted values obtained with partial phenotypic information was used as an estimate for variance inflation: \({{\varvec{b}}}_{w,p}= \frac{cov({\widehat{g}}_{w}, {\widehat{g}}_{p})}{var({\widehat{g}}_{p})}\) [32]. An ordinary nonparametric bootstrapping with replacement, full sample size, and 10,000 replication was used to obtain standard errors for PA and \({{\varvec{b}}}_{w,p}\). The PA from the different model was assessed using a HotellingWiliams ttest [17]. Differences were considered significant for a Pvalue lower than 0.01.
Results
Treatment estimates and variance component estimates for aboveground traits
The main treatment effect (wet and dry) was estimated as fixed effects in AM1 and AM2. Similar treatment estimates were observed between models. The betweenmodel average estimates for wet treatment were 7.20, 9.30, 107.02, and 53.42 for GY, GPC, GNR and TKW, respectively, and 6.91, 9.59, 106.72, and 52.40 for GY, GPC, GNR and TKW for dry treatment, respectively.
The VCs and genetic parameters estimates for aboveground traits are presented in Table 2. The AM1 had an estimated additive genomic variance (\({\widehat{\sigma }}_{g}^{2}\)) of 0.23, 0.09, 50.1, and 8.97 for GY, GPC, GNR and TKW, respectively, and it was higher than the estimated line variance (\({\widehat{\sigma }}_{l}^{2}\)) for all aboveground traits. The estimated row variance (\({\widehat{\sigma }}_{r}^{2}\)) was 0.13, 0.03, 34.3, and 0.55 for GY, GPC, GNR and TKW, respectively. The estimated genetic variance of effect of neighbors (\({\widehat{\sigma }}_{{n}_{g}}^{2}\) and \({\widehat{\sigma }}_{{n}_{l}}^{2}\)) and geneticbytreatment interaction (\({\widehat{\sigma }}_{tg}^{2}\) and \({\widehat{\sigma }}_{tl}^{2}\)) was low for all traits. The spatial variances estimated with AM1 for the wet treatment (\({\widehat{\sigma }}_{s1}^{2}\)) were 0.12, 0.26, 89.6, and 0.40 for GY, GPC, GNR and TKW, respectively, which were higher than for the dry treatment and represented 11.2, 59.2, 29.4, and 2.6% of the estimated phenotypic variance (\({\widehat{\sigma }}_{P}^{2}\)). The \({\widehat{\sigma }}_{s2}^{2}\) for dry treatment were 0.06, 0.16, 43.9, and 1.41, representing 5.9, 47.7, 16.9, and 7.8% for GY, GPC, GNR and TKW, respectively.
The AM2 model presented similar VCs estimates to AM1 for \({\widehat{\sigma }}_{g}^{2}\), \({\widehat{\sigma }}_{l}^{2}\), \({\widehat{\sigma }}_{{n}_{g}}^{2}\), \({\widehat{\sigma }}_{{n}_{l}}^{2}\), \({\widehat{\sigma }}_{tg}^{2}\) and \({\widehat{\sigma }}_{tl}^{2}\) for all aboveground traits. The estimated spatial variances with AM2 for the wet treatment (\({\widehat{\sigma }}_{s1}^{2}\)) were 0.15, 0.45, 111.6, and 0.20, representing 13.3, 63.3, 31.7, and 1.1% for GY, GPC, GNR and TKW, respectively, and for the dry treatment (\({\widehat{\sigma }}_{s2}^{2}\)) were 0.04, 0.18, 40.4, and 1.23, representing 4.2, 40.8, 14.3, and 6.7% for GY, GPC, GNR and TKW, respectively. The estimated row (\({\widehat{\sigma }}_{r}^{2}\)), spatial (\({\widehat{\sigma }}_{s1}^{2}\) and \({\widehat{\sigma }}_{s2}^{2}\)), and residuals (\({\widehat{\sigma }}_{e}^{2}\)) variances with AM2 were, in general, larger than for AM1. Note that there were three model effects, row (\({\varvec{r}}\)), spatial (\({\varvec{s}}\)), and residual (\({\varvec{e}}\)), describing the environmental variance in AM1 and AM2.
Analysis of residuals was performed for aboveground traits (Additional file 4: Figure S6). Normal distribution and homoscedasticity of residual variances were observed for all traits.
Genetic parameters for aboveground traits
The narrow (\({\widehat{h}}^{2}\)) and broad sense heritabilities (\({\widehat{H}}^{2}\)) at halfrow level were estimated for AM1 and AM2 (average values for the treatment are reported in Table 2). The \({\widehat{h}}^{2}\) and \({\widehat{H}}^{2}\) for AM1 and AM2 varied for the different traits, with higher values reported for TKW (\({\widehat{h}}^{2}\) = 0.49 to 0.52 and \({\widehat{H}}^{2}\) = 0.76 to 0.78), lower for GNR (\({\widehat{h}}^{2}\) = 0.18 and \({\widehat{H}}^{2}\) = 0.20 to 0.23), and intermediate values for GY (\({\widehat{h}}^{2}\) = 0.22 and \({\widehat{H}}^{2}\) = 0.31 to 0.32) and GPC (\({\widehat{h}}^{2}\) = 0.19 to 0.24 and \({\widehat{H}}^{2}\) = 0.20 to 0.27). In general, \({\widehat{h}}^{2}\) and \({\widehat{H}}^{2}\) were similar for the wet and dry treatments, but some small differences were observed depending on the traits analyzed, with a trend of slightly higher heritability for the dry treatment. AM1 and AM2 presented similar trends on \({\widehat{h}}^{2}\) and \({H}^{2}\) for GY, GNR and TKW; however, the AM2 revealed higher heritabilities than AM1 for GPC.
The genetic coefficient of variation (\(GCV\)) was estimated for AM1 and AM2 (Table 2), and similar \(GCV\) were observed for both models. The AM1 had a \(GCV\) of 6.8, 3.2, 6.6, and 5.7 for GY, GPC, GNR and TKW, respectively, and the AM2 presented a \(GCV\) of 7.0, 3.4, 7.0, and 5.6 for GY, GPC, GNR and TKW, respectively.
Variance component estimates for root traits
The VCs and genetic parameter estimates for root traits are presented in Table 3. The RM1 had a \({\widehat{\sigma }}_{g}^{2}\) of 10.13, 4.02, and 0.69 for TRL, SRL and DRL, respectively. The \({\widehat{\sigma }}_{g}^{2}\) captured most of the genetic variation in the RM1 and \({\widehat{\sigma }}_{l}^{2}\) were close to zero for all traits. The estimated row variance (\({\widehat{\sigma }}_{r}^{2}\)), spatial variance for timepoint 1 (\({\widehat{\sigma }}_{s1}^{2}\)), 2 (\({\widehat{\sigma }}_{s2}^{2}\)), and 3 (\({\widehat{\sigma }}_{s3}^{2}\)) captured most of the variation for all root traits. The \({\widehat{\sigma }}_{r}^{2}\) were 127.82, 109.82, and 34.46 for TRL, SRL and DRL, respectively. The \({\widehat{\sigma }}_{s1}^{2}\) for RM1 were 61.89, 16.76, and 6.30, and represented 24.7, 11.2, and 10.4% of \({\widehat{\sigma }}_{P}^{2}\) for TRL, SRL and DRL, respectively; \({\widehat{\sigma }}_{s2}^{2}\) were 88.45, 1.93, and 29.88 (32.0, 1.4, and 35.6%, respectively) and \({\widehat{\sigma }}_{s3}^{2}\) were 232.72, 26.42, and 67.70 (55.3, 16.6, and 55.5%, respectively) for TRL, SRL and DRL, respectively.
The RM2 had similar \({\widehat{\sigma }}_{g}^{2}\) to RM1 for TRL and SRL, but had lower \({\widehat{\sigma }}_{g}^{2}\) (\({\widehat{\sigma }}_{g}^{2}\) = 0.36) than RM1 for DRL. The \({\widehat{\sigma }}_{l}^{2}\) for RM2 was close to zero for all root traits. Similarly to RM1, in RM2 the \({\widehat{\sigma }}_{r}^{2}\), \({\widehat{\sigma }}_{s1}^{2}\), \({\widehat{\sigma }}_{s2}^{2}\), and \({\widehat{\sigma }}_{s3}^{2}\) captured most of the variation for all root traits. The \({\widehat{\sigma }}_{r}^{2}\) were 193.46, 110.64, and 40.25 for TRL, SRL and DRL, respectively. The \({\widehat{\sigma }}_{s1}^{2}\) for RM1 were 95.16, 29.55, and 1.70, and represented 26.6, 18.0, and 2.6% of \({\widehat{\sigma }}_{P}^{2}\) for TRL, SRL and DRL, respectively; \({\widehat{\sigma }}_{s2}^{2}\) were 53.37, 2.15, and 44.11 (16.9, 1.6, and 40.5%, respectively) and \({\widehat{\sigma }}_{s3}^{2}\) were 238.64, 66.19, and 159.85 (47.6, 33.0, and 71.2%, respectively) for TRL, SRL and DRL, respectively. The estimated row (\({\widehat{\sigma }}_{r}^{2}\)), spatial (\({\widehat{\sigma }}_{s1}^{2}\), \({\widehat{\sigma }}_{s2}^{2}\), and \({\widehat{\sigma }}_{s2}^{2}\)), and residuals (\({\widehat{\sigma }}_{e}^{2}\)) variances with RM2 were, in general, larger than for RM1. The three model effects, row (\({\varvec{r}}\)), spatial (\({\varvec{s}}\)), and residual (\({\varvec{e}}\)), described the environmental variance in RM1 and RM2.
An analysis of residuals was performed for root traits (Additional file 4: Figure S7). Normal distribution and homoscedasticity of residual variances were observed for all traits.
Genetic parameters for root traits
The narrow (\({\widehat{h}}^{2}\)) and broad sense (\({\widehat{H}}^{2}\)) heritabilities at halfrow level were estimated for RM1 and RM2 (Table 2). The \({\widehat{h}}^{2}\) and \({H}^{2}\) for root traits were low, and they were lower than for aboveground traits. The \({\widehat{h}}^{2}\) and \({H}^{2}\) had similar values within each trait due to the \({\widehat{\sigma }}_{l}^{2}\) being close to zero for all root traits, and all genetic variation was mostly explained by \({\widehat{\sigma }}_{g}^{2}\). The highest \({\widehat{h}}^{2}\) was observed for TRL, followed by SRL and the lowest \({\widehat{h}}^{2}\) was observed for DRL. In RM1, the \({\widehat{h}}^{2}\) for TRL was 0.046 (timepoint 1 and 2) and 0.024 (timepoint 3); for SRL it was 0.027 (timepoint 1), 0.030 (timepoint 2), and 0.025 (timepoint 3); and for DRL it was 0.011 (timepoint 1), 0.008 (timepoint 2), and 0.010 (timepoint 3). The RM2 presented a higher \({\widehat{h}}^{2}\) than RM1. In RM2, the \({\widehat{h}}^{2}\) for TRL was 0.046 (timepoint 1 and 2) and 0.040 (timepoint 3); for SRL was 0.28 (timepoint 1 and 3) and 0.029 (timepoint 3); and for DRL it was 0.012 (timepoint 1), 0.011 (timepoint 2), and 0.010 (timepoint 3).
The genetic coefficient of variation (\(GCV\)) was estimated for RM1 and RM2 (Table 2), and similar values were observed for both models. The RM1 had a \(GCV\) of 9.4, 10.2, and 6.5 for TRL, SRL, and DRL, respectively, and the RM2 had a \(GCV\) of 8.8, 10.4, and 4.7 for TRL, SRL, and DRL, respectively. Generally, the \(GCV\) observed for root traits were higher than for aboveground traits.
Genomic predictions for aboveground traits
The performance of genomic prediction (GP) was determined for AM1 and AM2 using LOOCV. The predictive abilities (PAs) and prediction accuracy (ACC) for aboveground traits are presented in Fig. 3. The PAs for AM1 were 0.408, 0.482, 0.394, and 0.331 for GY, GPC, GNR and TKW, respectively, and for AM2 were 0.414, 0.489, 0.408, and 0.325 for GY, GPC, GNR and TKW, respectively. The AM2 had slightly higher PA than AM1 for GY, GPC and GNR. The differences in PA between M1 and M2 were significant in a HotellingWilliams (significance threshold set at 0.01). The theoretical maximum PAs for AM1 and AM2 were similar between models for the different aboveground traits (green bars in Fig. 2). The ACC (Fig. 2b) followed a similar trend as the PA for all aboveground traits.
The regression coefficient of additive genetic values obtained with whole phenotypic information on additive values obtained with partial phenotypic information (\({{\varvec{b}}}_{w,p}\)) in LOOCV is presented in Fig. 3. The \({{\varvec{b}}}_{w,p}\) is as an estimate for variance inflation in the predicted genetic effect, where \({{\varvec{b}}}_{w,p}\) = 1 reveals no under or overdispersion (i.e., no variance inflation for \({{\varvec{b}}}_{w,p}\) = 1). The bootstrapbased distribution of estimates revealed that \({{\varvec{b}}}_{w,p}\) was close to 1 for all aboveground traits with both models, indicating no significant variance inflation. Nevertheless, \({{\varvec{b}}}_{w,p}\) values around 0.9 for the different aboveground traits indicated a low overdispersion (variance inflation) for predicted values.
Genomic prediction for root traits
The performance of GP was evaluated for RM1 and RM2 using a LOOCV. The PAs and ACC for root traits are presented in Fig. 2. The RM1 had a PA of 0.199, 0.153, and 0.250 for SRL, DRL and TRL, respectively, and the RM2 had a PA of 0.216, 0.171, and 0.267 for SRL, DRL, and TRL, respectively. The RM2 showed an improvement in PA compared to RM1 of 11.8, 8.9, and 6.5% for SRL, DRL and TRL, respectively, and differences between M1 and M2 were significant in a HotellingWilliams (significance threshold set at 0.01). The theoretical maximum PAs for RM1 was higher than for RM2. Note that the theoretical maximum PAs depend on \({\widehat{h}}^{2}\), and therefore, the higher values observed for RM1 reflect the higher \({\widehat{h}}^{2}\) found for RM1. The ACC for root traits is presented in Fig. 2b. Higher ACCs were observed for RM2 compared to RM1, and larger differences between models were observed for ACC than for the PA.
The regression coefficients (\({{\varvec{b}}}_{w,p}\)) for RM1 and RM2 in LOOCV are presented in Fig. 3. The bootstrapbased distribution of estimates revealed that \({{\varvec{b}}}_{w,p}\) was close to 1 for all root traits with both models, indicating that no significant variance inflation was present for any of the root traits. Nevertheless, the value around 1.1 for all root traits indicated a low underdispersion (variance deflation) for predicted breeding values.
Discussion
The present study was carried out in a semifield (rainout shelter) phenotyping facility (RadiMax) with four specific aims. First, we investigated the genetic variation and parameters for root development (shallow and deep) and four aboveground traits relevant to barley breeding (GY, GPC, GNR and TKW). Second, we studied the genotypebytreatment (wet and dry) interaction for the root and aboveground traits in the RadiMax facility, and no relevant genotypebytreatment interaction was detected for the different traits. Third, genomic prediction (GP) was investigated, and we observed that GP is feasible for all the analyzed traits. All the analyses were performed comparing two alternative nearestneighbor adjustments (NNA) to model spatial variation in RadiMax.
Genetic parameters and variance estimates for aboveground traits
The heritabilities (\({\widehat{h}}^{2}\) and \({\widehat{H}}^{2}\)) and variance components (VCs) were estimated for aboveground traits. In our study, \({\widehat{h}}^{2}\) and \({\widehat{H}}^{2}\) were reported for wet and dry conditions at the halfrow level and not as family heritability, as sometimes are used in plant breeding [29]. Our estimates of heritabilities do not account for the number of repetitions used in the experiment. Therefore, the estimates are expected to be lower than family heritability, but they are more suitable to compare across studies and populations. The differences in \({\widehat{h}}^{2}\) and \({\widehat{H}}^{2}\) for wet and dry treatments were small and were attributed to heterogeneous spatial variance observed for each treatment, which resulted in a different phenotypic variance (\({\widehat{\sigma }}_{P}^{2}\)). The estimates of \({\widehat{h}}^{2}\) for GY ranged from 0.21 to 0.23, were in a similar range of previous estimates of 0.24 found by Tsai et al. [63] for a larger field population from Nordic Seed A/S breeding program, and by Ahmadi et al. [1]. The \({\widehat{h}}^{2}\) for GPC varied from 0.21 to 0.35, where the upper range values were obtained using AM2. The \({\widehat{h}}^{2}\) reported for GPC agreed with Nielsen et al. [41], who found a value of 0.21 using a Nordic Seed A/S breeding population. The \({\widehat{h}}^{2}\) for GNR in our population ranged from 0.16 to 0.21; other studies have reported family \({\widehat{h}}^{2}\) for GNR, as found in Schmidt et al. [55], but as previously discussed, the plot and family \({\widehat{h}}^{2}\) are not directly comparable. The TKW was the trait with the highest \({\widehat{h}}^{2}\) ranging from 0.50 to 0.54. Other studies have reported the family \({\widehat{h}}^{2}\) for TKW [4, 52].
The variance components for genotypebytreatment interaction effects (\({\varvec{t}}{\varvec{g}}\) and \({\varvec{t}}{\varvec{l}}\)) were close to zero, meaning no relevant interaction effects were detected. Similar results have been found for wheat by Guo et al. [24] in a similar experiment in the RadiMax facility. The lack of interaction effects may be attributed to how the water stress gradient was managed in the experiment. According to a previous study using the same barley population in RadiMax, Svane et al. [59] observed that by the time the plants should have shown water stress symptoms, there was sufficient water available. Consequently, it delayed the start of water stress symptoms reducing the possibility of genotypebytreatment interactions.
The genetic coefficient of variation (\(GCV\)) was estimated for all traits. The \(GCV\) is a useful genetic parameter as it allows us to infer the potential of selection for the traits of interest and to compare the genetic variation across traits and populations. The \(GCV\) were 6.9, 3.3, 6.8, and 5.7 for GY, GPC, GNR and TKW, respectively (average of AM1 and AM2), and no large differences were observed between AM1 and AM2. The same traits were investigated for a wheat breeding population in RadiMax facility [24], and the authors found \(GCV\) estimates is an similar range for all traits.
Genetic parameters and variance estimates for root traits
The heritabilities (\({\widehat{h}}^{2}\) and \({\widehat{H}}^{2}\)) and VCs were estimated for root traits in the three time points. The estimates of \({\widehat{h}}^{2}\) for TRL, SRL, and DRL were low and mainly dominated by the large row (\({\widehat{\sigma }}_{r}^{2}\)) and spatial (\({\widehat{\sigma }}_{s1}^{2}\), \({\widehat{\sigma }}_{s2}^{2}\), and \({\widehat{\sigma }}_{s3}^{2}\)) variation observed. The \({\widehat{h}}^{2}\) ranged from 0.02 to 0.04 for TRL, 0.02 to 0.03 for SRL, and 0.002 to 0.01 for DRL. Comparing between time points, higher \({h}^{2}\) was seen for timepoint 1, which was measured 36 days before the last timepoint. The differences in the \({h}^{2}\) between time points were explained by the heterogeneous spatial variance, resulting in a different \({\widehat{\sigma }}_{P}^{2}\) for each time point. Other studies have reported \({\widehat{h}}^{2}\) for root traits [31, 51], but they have focused on family mean \({\widehat{h}}^{2}\) instead of halfrow \({\widehat{h}}^{2}\) as in our study.
The \(GCV\) was estimated for TRL, SRL, and DRL. In our study, the \(GCV\) is a particularly relevant parameter for root traits as it allows us to interpret the genetic variation independently from the environmental variables affecting the population. The \(GCV\) were 9.1, 10.3, and 5.6 for TRL, SRL, and DRL, respectively (average of RM1 and RM2), and no large differences were observed for RM1 and RM2. The \(GCV\) for TRL and SRL were larger than the aboveground traits analyzed. High values of \(GCV\) are desirable for breeding as it means that there is good potential for selection and there will be a good response to selection for the analyzed traits. The \(GCV\) for DRL was lower than the rest of the root traits, revealing a lower potential for breeding and response to selection.
Genomic prediction for aboveground traits
Genomic prediction for AM1 and AM2 was evaluated for the aboveground traits using a LOOCV (Fig. 2). The LOOCV uses the largest possible training population, which maximize the genetic correlations between training and testing sets. The LOOCV allows us to compare and investigate the potential PA of genetic models. The PA, measured as the correlation between corrected phenotypes \(({y}_{c})\) and GEBVs, was used as an estimate of the correlation between GEBV and true underlying breeding values. The AM1 and AM2 showed a similar PA and ACC for the aboveground traits, with a trend of slightly higher PA and ACC for AM2 to GY, GPC and GNR. The PAs observed in our study were in the range reported in previous studies for GY [18] and GPC [41, 55], and were higher for GY, PC, and GNR than PAs reported for a fivefold CV in Hansen et al. [25]. For TKW, we observed similar PA than Hansen et al. [25] and lower PA than Schmid and Thorwarth [54] and Thorwarth et al. [60], which found PAs around 0.7. The higher PA obtained for TKW in other studies could be related to performing GP on populations coming from a single breeding program. Note that genetic relations between lines are higher within breeding programs (e.g. due to full and halfsibs are included in the population) than between breeding programs. Consequently, higher genetic relationships are present between individuals in TP and VP, resulting in higher PA. The differences in PA among traits could be attributed to several factors, among them, the LD (linkage disequilibrium) between markers in training and testing populations, the heritability of the trait under investigation, and the genetic architecture of traits, where complex traits controlled by many loci with small effects have lower predictability than traits controlled with less number of loci [26, 53]. Another relevant parameter in GS is the regression coefficients \({{\varvec{b}}}_{w,p}\), which is used as an estimate for variance inflation in the predicted genetic effect. In general no large variance inflation was observed as \({{\varvec{b}}}_{w,p}\) was not statistically different from 1 for all aboveground traits (Fig. 3; nevertheless, a low overdispersion (inflation in \({{\varvec{b}}}_{w,p}\) was reported (\({{\varvec{b}}}_{w,p}\) ~0.9). The variance infaltion could be attributed to having a mixture of different breeding populations (and breeding cycles) in the experiment, which could result in differences in allele frequency and LD for individuals genetically distant.
Genomic prediction for root traits
Genomic prediction for RM1 and RM2 was evaluated for root traits using a LOOCV (Fig. 2). The PAs for root traits were lower than for aboveground traits (Fig. 2a). The highest PA was observed for TRL, followed by SRL, and the lowest was for DRL. Higher PA for RM2 was observed, and differences between models were significant in a HotellingWilliams test (significance threshold set at 0.01). The improvement in PA conferred by RM2 was 11.8, 8.9, and 6.5% for SRL, DRL and TRL, respectively. Similarly, higher ACC were obtained with RM2. The differences between RM1 and RM2 were acentuated for the ACC estimate (Fig. 2b). This can be explained due to the estimate of ACC is inversely related to \({\widehat{h}}^{2}\), and the lower \({\widehat{h}}^{2}\) values found fot RM2, contributed to higher ACC obtained with RM2. Our results confirm that the accuracy of predicting genetic effects for root development in barley is sufficient to allow genetic selection. Other reports have demonstrated the viability of GP for root traits in barley [51] and other species [24, 35, 70].
An additional analysis was performed using a multitrait model to study the genetic correlation (\(\rho\)) between SRL and DRL (results not shown) and revealed a positive correlation between the traits (\(\rho =0.36\)). The positive correlation is convenient for breeding since the better performance of GP for SRL could be exploited by selecting higher values on SRL, leading to higher DRL.
The regression coefficients (\({{\varvec{b}}}_{w,p}\)) were used to estimate variance inflation in the predicted genetic effect for root traits (Fig. 3). In general, \({{\varvec{b}}}_{w,p}\) was close to 1, revealing no relevant variance inflation. Nevertheless the \({{\varvec{b}}}_{w,p}\) ~1.1 indicated a low underdispersion (deflation) for predicted values, which could be explained by having a mixture of different breeding populations in the experiment, resulting in differences in allele frequency and LD for individuals genetically distant.
Modeling of spatial effects in genetic models
Two spatial methodologies (M1 and M2) based on NNA [47, 68] using 5left and right neighbors and (co)variance structures (\({{\varvec{S}}}_{knn}\) or \({{\varvec{S}}}_{euc}\)) to model spatial variation were utilized. The M1 and M2, differ in the distance function used to compute the correlation between neighbors (Additional file 3: Figure S4). The \({{\varvec{S}}}_{knn}\) connected the row of the observation with the 5left and right neighbors for each observation; while the \({{\varvec{S}}}_{euc}\) is a more developed adjustment which starts connecting the target rows to the 5left and right neighbors, followed by weighing neighbors’ relationships according to the Euclidean distance between them and the target row. In general, both methodologies (either for aboveground or root traits) presented similar trends for the different traits. In some cases, VCs were similar between the methodologies, but differences in allocating variance were observed in some specific cases. Other studies, as Guo et al. [24] for barley and Malinowska et al. [35] for perennial ryegrass modeled spatial variation in RadiMax, with methodologies comparable to M1. Similarly to our work, they have concluded that the spatial variation was significant and accounting for spatial effects was needed to reduce the noise level. In addition to the results presented, an analysis without including virtual rows was performed. A relevant influence of virtual rows was observed in VCs analysis, decreasing the variance captured by spatial effects when they were not included. The CVs analysis revealed better predictive performance for M2 (especially for root traits), and it is therefore, our preferred methodology for the RadiMax experiment.
Conclusions
In this research, we utilized data from a semifield phenotyping facility (RadiMax) to investigate aboveground and root traits of spring barley under a water availability gradient. First, we concluded that heritable genetic variation and significant genetic coefficient of variation (\(GCV\)) were present, indicating a good breeding potential for all analyzed traits. Second, no relevant genotypebytreatment (wet and dry) interaction and genetic effects due to neighboring lines were observed in RadiMax. Third, we concluded that there is good potential to perform genomic prediction for all analyzed traits, as revealed by the high predictive ability and prediction accuracy, and low variance inflation of predicted genetic effect in the leaveonelineout crossvalidation analysis. Fourth, all the performed analyses were carried out using two proposed spatial methodologies, and the main conclusion was that our most developed spatial methodology had a significant effect on predictive performance, improving genomic prediction especially for root traits.
Availability of data and materials
The datasets analyzed in the current study are available at the Harvard dataverse public repository at the following link for phenotyipic (https://doi.org/https://doi.org/10.7910/DVN/KZZSRH) and genomic information (https://doi.org/https://doi.org/10.7910/DVN/W1QIJR).
References
Ahmadi J, Vaezi B, PourAboughadareh A. Analysis of variability, heritability, and interrelationships among grain yield and related characters in barley advanced lines. Genetika. 2016;48(1):73–85.
Baenziger PS, Depauw RM. Wheat breeding: Procedures and strategies Wheat science and trade. Hoboken: Wiley; 2009.
Bhatta M, Gutierrez L, Cammarota L, Cardozo F, Germán S, GómezGuerrero B, et al. Multitrait genomic prediction model increased the predictive ability for agronomic and malting quality traits in barley (Hordeum vulgare L). Genes Genomes Genetics. 2020;10(3):1113–24.
Bouhlal O, Affricot JR, Puglisi D, ElBaouchi A, El Otmani F, Kandil M, et al. Malting quality of ICARDA elite winter barley (Hordeum vulgare l) germplasm grown in Moroccan middle atlas. J Am Soc Brewing Chem. 2022;80(4):401–12.
Buerstmayr H, Ban T, Anderson JA. QTL mapping and markerassisted selection for Fusarium head blight resistance in wheat: a review. Plant Breeding. 2009;128(1):1–26.
Burgueño, J. (2018). Spatial analysis of field experiments. Applied statistics in agricultural biological and environmental sciences. Madison.
Collard BC, Mackill DJ. Markerassisted selection: an approach for precision plant breeding in the twentyfirst century. Phil Trans Royal Soc B Biol Sci. 2008;363(1491):557–72.
Costa Silva J, Potts B, Gilmour A, Kerr R. Geneticbased interactions among tree neighbors: identification of the most influential neighbors, and estimation of correlations among direct and indirect genetic effects for leaf disease and growth in Eucalyptus globulus. Heredity. 2017;119(3):125–35.
Crossa J, Campos Gde L, Perez P, Gianola D, Burgueno J, Araus JL, et al. Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics. 2010;186(2):713–24. https://doi.org/10.1534/genetics.110.118521.
Crossa J, PérezRodríguez P, Cuevas J, MontesinosLópez O, Jarquín D, De Los Campos G, et al. Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci. 2017;22(11):961–75.
Cullis B, Gleeson A. Spatial analysis of field experimentsan extension to two dimensions. Biometrics. 1991;47:1449–60.
Cullis B, Gogel B, Verbyla A, Thompson R. Spatial analysis of multienvironment early generation variety trials. Biometrics. 1998;54:1–18.
Cuyabano BCD, Rovere G, Lim D, Kim TH, Lee HK, Lee SH, et al. GPS coordinates for modelling correlated herd effects in genomic prediction models applied to hanwoo beef cattle. Animals. 2021;11(7):2050.
Den Herder G, Van Isterdael G, Beeckman T, De Smet I. The roots of a new green revolution. Trends Plant Sci. 2010;15(11):600–7.
Dray S, Blanchet G, Borcard D, Guenard G, Jombart T, Larocque G, et al. Package ‘adespatial.’ R package. 2018;2018:3–8.
Dreisigacker S, Sukumaran S, Guzmán C, He X, Lan C, Bonnett D, et al. Molecular markerbased selection tools in spring bread wheat improvement: CIMMYT experience and prospects. Molecular Breeding for Sustainable Crop Improvement: 2016;2:421–74.
Dunn OJ, Clark V. Comparison of tests of the equality of dependent correlation coefficients. J Am Stat Assoc. 1971;66(336):904–8.
Endelman JB, Atlin GN, Beyene Y, Semagn K, Zhang X, Sorrells ME, et al. Optimal design of preliminary yield trials with genomewide markers. Crop Sci. 2014;54(1):48–59.
Fè, D., Ashraf, B.H., Pedersen, M.G., Janss, L., Byrne, S., Roulund, N., et al. (2016). Accuracy of genomic prediction in a commercial perennial ryegrass breeding program. The Plant Genome 9(3), plantgenome2015.2011.0110.
Francia E, Tacconi G, Crosatti C, Barabaschi D, Bulgarelli D, Aglio E, et al. Marker assisted selection in crop plants. Plant Cell Tissue Organ Cult. 2005;82(317):342.
Gilmour AR, Cullis BR, Verbyla AP. Accounting for natural and extraneous variation in the analysis of field experiments. J Agric Biol Environ Stat. 1997;2:269–93.
Gleeson AC, Cullis BR. Residual maximum likelihood (REML) estimation of a neighbour model for field experiments. Biometrics. 1987;43:277–87.
Goddard M, Hayes B. Genomic selection. J Anim Breed Genet. 2007;124(6):323–30.
Guo X, Svane SF, Füchtbauer WS, Andersen JR, Jensen J, ThorupKristensen K. Genomic prediction of yield and root development in wheat under changing water availability. Plant Methods. 2020;16(1):1–15.
Hansen PB, Ruud AK, de Los Campos G, Malinowska M, Nagy I, Svane SF, et al. Integration of DNA methylation and transcriptome data improves complex trait prediction in hordeum vulgare. Plants. 2022;11(17):2190.
Hayes BJ, Pryce J, Chamberlain AJ, Bowman PJ, Goddard ME. Genetic architecture of complex traits and accuracy of genomic prediction: coat colour, milkfat percentage, and type in Holstein cattle as contrasting model traits. PLoS Genet. 2010;6(9): e1001139.
Hertel TW, Burke MB, Lobell DB. The poverty implications of climateinduced crop yield changes by 2030. Glob Environ Chang. 2010;20(4):577–85.
Hinkelmann K, Kempthorne O. Design and analysis of experiments Introduction to experimental design. Hoboken: John Wiley Sons; 2007.
Holland, J.B., Nyquist, W.E., CervantesMartínez, C.T., and Janick, J. (2003). Estimating and interpreting heritability for plant breeding: an update. Plant breeding reviews 22.
ISO16634 (2016). Food products—Determination of the total nitrogen content by combustion according to the Dumas principle and calculation of the crude protein content. International Organization for Standardization.
Jia Z, Liu Y, Gruber BD, Neumann K, Kilian B, Graner A, et al. Genetic dissection of root system architectural traits in spring barley. Front Plant Sci. 2019;10:400.
Legarra A, Reverter A. Semiparametric estimates of population accuracy and bias of predictions of breeding values and future phenotypes using the LR method. Genet Sel Evol. 2018;50(1):53. https://doi.org/10.1186/s1271101804266.
Lobell DB, Schlenker W, CostaRoberts J. Climate trends and global crop production since 1980. Science. 2011;333(6042):616–20.
Madsen, P., and Jensen, J. (2013). "An User's Guide to DMU, Version 6, Release 5.1. Center for Quantitative Genetics and Genomics," in Dept. of Molecular Biology and Genetics, University of Aarhus. Research Centre Foulum Tjele, Denmark).
Malinowska M, Ruud AK, Jensen J, Svane SF, Smith AG, Bellucci A, et al. Relative importance of genotype, gene expression, and DNA methylation on complex traits in perennial ryegrass. Plant Genome. 2022;15(4): e20253.
Marjanovic J, Mulder HA, Rönnegård L, Bijma P. Modelling the coevolution of indirect genetic effects and inherited variability. Heredity. 2018;121(6):631–47.
Marjanovic J, Mulder HA, Rönnegård L, de Koning DJ, Bijma P. Capturing indirect genetic effects on phenotypic variability: competition meets canalization. Evol Appl. 2022;15(4):694–705.
Martin R. 15 Spatial experimental design. Handbook Statist. 1996;13:477–514.
Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genomewide dense marker maps. Genetics. 2001;157(4):1819–29.
Miedaner T, Korzun V. Markerassisted selection for disease resistance in wheat and barley breeding. Phytopathology. 2012;102(6):560–6.
Nielsen NH, Jahoor A, Jensen JD, Orabi J, Cericola F, Edriss V, et al. Genomic prediction of seed quality traits using advanced barley breeding lines. PLoS ONE. 2016;11(10): e0164494. https://doi.org/10.1371/journal.pone.0164494.
Olesen JE, Trnka M, Kersebaum KC, Skjelvåg AO, Seguin B, PeltonenSainio P, et al. Impacts and adaptation of European crop production systems to climate change. Eur J Agron. 2011;34(2):96–112.
Papadakis, J. (1937). Méthode statistique pour des expériences sur champ. Thessalonike: Institut d'Amélioration des Plantes à Salonique.
Pham AT, Maurer A, Pillen K, Brien C, Dowling K, Berger B, et al. Genomewide association of barley plant growth under drought stress using a nested association mapping population. BMC Plant Biol. 2019;19(1):1–16.
Philipp N, Liu G, Zhao Y, He S, Spiller M, Stiewe G, et al. Genomic prediction of barley hybrid performance. Plant Genome. 2016. https://doi.org/10.3835/plantgenome2016.02.0016.
Piepho HP, Boer MP, Williams ER. Twodimensional Pspline smoothing for spatial analysis of plant breeding trials. Biom J. 2022;64(5):835–57.
Piepho HP, Richter C, Williams E. Nearest neighbour adjustment and linear variance models in plant breeding trials. Biom J. 2008;50(2):164–89.
Raffo M, Azzimonti G, Pereyra S, Pritsch C, Lado B, Dreisigacker S, et al. Introgression of the coupled Fhb1Sr2 to increase Fusarium head blight and stem rust resistance of elite wheat cultivars. Plant Genetic Resources. 2022;20(1):36–45.
Raffo M, Sarup P, Andersen J, Orabi J, Jahoor A, Jensen J. Integrating a growth degreedays based reaction norm methodology and multitrait modeling for genomic prediction in wheat. Front n Plant Sci. 2022. https://doi.org/10.3389/fpls.2022.939448.
Ribaut JM, Hoisington D. Markerassisted selection: new tools and strategies. Trends Plant Sci. 1998;3(6):236–9.
Robinson H, Kelly A, Fox G, Franckowiak J, Borrell A, Hickey L. Root architectural traits and yield: exploring the relationship in barley breeding trials. Euphytica. 2018;214:1–16.
Rode J, Ahlemeyer J, Friedt W, Ordon F. Identification of markertrait associations in the German winter barley breeding gene pool (Hordeum vulgare L). Mol Breeding. 2012;30:831–43.
Sallam A, Endelman J, Jannink JL, Smith K. Assessing genomic selection prediction accuracy in a dynamic barley breeding population. Plant Genome. 2015. https://doi.org/10.1007/9783662444061_19.
Schmid KJ, Thorwarth P. Genomic selection in barley breeding. Biotechnol Approaches Barley Improve. 2014. https://doi.org/10.1007/9783662444061_19.
Schmidt M, Kollers S, MaasbergPrelle A, Großer J, Schinkel B, Tomerius A, et al. Prediction of malting quality traits in barley based on genomewide marker data to assess the potential of genomic selection. Theor Appl Genet. 2016;129:203–13.
Smith A, Cullis B, Thompson R. Analyzing variety by environment data using multiplicative mixed models and adjustments for spatial field trend. Biometrics. 2001;57(4):1138–47.
Stringer JK, Cullis BR, Thompson R. Joint modeling of spatial variability and withinrow interplot competition to increase the efficiency of plant improvement. J Agric Biol Environ Stat. 2011;16(2):269–81.
Svane SF, Dam EB, Carstensen JM, ThorupKristensen K. A multispectral camera system for automated minirhizotron image analysis. Plant Soil. 2019;441:657–72.
Svane SF, Jensen CS, ThorupKristensen K. Construction of a largescale semifield facility to study genotypic differences in deep root growth and resources acquisition. Plant Methods. 2019;15:1–16.
Thorwarth P, Ahlemeyer J, Bochard AM, Krumnacker K, Blümel H, Laubach E, et al. Genomic prediction ability for yieldrelated traits in German winter barley elite material. Theor Appl Genet. 2017;130:1669–83.
TownleySmith T, Hurd E. Use of moving means in wheat yield trials. Can J Plant Sci. 1973;53(3):447–50.
Tsai HY, Cericola F, Edriss V, Andersen JR, Orabi J, Jensen JD, et al. Use of multiple traits genomic prediction, genotype by environment interactions and spatial effect to improve prediction accuracy in yield data. PLoS ONE. 2020;15(5): e0232665.
Tsai HY, Janss LL, Andersen JR, Orabi J, Jensen JD, Jahoor A, et al. Genomic prediction and GWAS of yield, quality and diseaserelated traits in spring barley and winter wheat. Sci Rep. 2020;10(1):1–15.
VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–23.
Velazco JG, RodríguezÁlvarez MX, Boer MP, Jordan DR, Eilers PH, Malosetti M, et al. Modelling spatial trends in sorghum breeding field trials using a twodimensional Pspline mixed model. Theor Appl Genet. 2017;130:1375–92.
Verbyla AP, Cullis BR, Kenward MG, Welham SJ. The analysis of designed experiments and longitudinal data by using smoothing splines. J Roy Stat Soc: Ser C. 1999;48(3):269–311.
Verbyla AP, De Faveri J, Wilkie JD, Lewis T. Tensor cubic smoothing splines in designed experiments requiring residual modelling. J Agric Biol Environ Stat. 2018;23:478–508.
Wilkinson G, Eckert S, Hancock T, Mayo O. Nearest neighbour (NN) analysis of field experiments. J Roy Stat Soc: Ser B. 1983;45(2):151–78.
Wood TB, Stratton F. The interpretation of experimental results. J Agric Sci. 1910;3(4):417–40.
Yonis BO, Pino del Carpio D, Wolfe M, Jannink JL, Kulakow P, Rabbi I. Improving root characterisation for genomic prediction in cassava. Sci Rep. 2020;10(1):8003.
Acknowledgements
We thank Aarhus and Copenhagen Universities for providing financial support. We also thank Prof. Torben Asp's group at QGG Flakkebjerg for sample collection and Dr. Istvan Nagy for supervising variant calling implementation. Nordic Seed A/S breeding company and Sejet Plant Breeding I/S are also greatly acknowledged for providing phenotypic data.
Funding
This project was funded by Innovation Fund Denmark (Grant no 5184–000358), Crop Innovation Denmark, Promilleafgiftsfonden, Plan Danmark and Godfred Birkedal Hartmanns Familiefond.
Author information
Authors and Affiliations
Contributions
Miguel A. Raffo and Biructawit B. Tessema contributed equally and share first authorship. Conceptualization: JJ, MAR, BBT; data curation: BBT, MR, AKR, MM, SFS; formal analysis: MAR, BBT; funding acquisition: JJ, KTK, LK, JDJ; investigation: MAR, BBT, JJ; methodology: MAR, BBT, JJ; project administration: MAR, BBT, JJ; resources: JJ, KTK, LK, JDJ; software: MAR, BBT: supervision: JJ; validation: MAR, BBT; visualization: MAR, BBT; writing—original draft: MAR, BBT; writing—review and editing: JJ, XG, AKR, MM.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Figure S1.
Principal component analysis (PCA) of genomic relationship lines. Figure S2. Heatmap of genomic relationship matrix.
Additional file 2: Figure S3.
Boxplot of aboveground and root traits after data edition. GY grain yield, GPC grain protein content, GNC grain nitrogen content, TKW thousand kernel weight, SRL shallow root length, DRL deep root length, TRL total root length
Additional file 3: Figure S4.
Scatter plot of the betweenneighbor correlations in M1 (\({{\varvec{S}}}_{knn}\)) and M2 (\({{\varvec{S}}}_{euc}\)) as a function of plot distances. Figure S5. Heatmap for the spatial correlation structure \({{\varvec{S}}}_{euc}\). Example for timepoint 1 and subbed 1.
Additional file 4: Figure S6.
Histogram and scatter plots for residuals of aboveground traits (example for AM1). GY grain yield, GPC grain protein content, GNC grain nitrogen content, TKW thousand kernel weight. Figure S7. Histogram and scatter plots for residuals of root traits (example for RM1). TRL: total root length; SRL: shallow root length; DRL: deep root length.
Additional file 5: Table S1.
Standard errors of variance estimates for aboveground traits. Table S2. Standard errors of variance estimates for root traits.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Tessema, B.B., Raffo, M.A., Guo, X. et al. Genomic prediction for root and yield traits of barley under a water availability gradient: a case study comparing different spatial adjustments. Plant Methods 20, 8 (2024). https://doi.org/10.1186/s1300702301121y
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s1300702301121y