### Field experiment and data acquisition

An experiment was carried out at the International Maize and Wheat Improvement Center (CIMMYT), located in the Yaqui Valley near Obregon, Sonora, in northwestern Mexico. The experiment aimed to investigate the effect of different levels of phosphorus (P) fertilization on the P content of wheat grain. No P fertilization was applied in the experimental area during the four wheat cycles preceding the experiment, to avoid residual effects. The climate in the Yaqui Valley is semi-arid, with variable precipitation averaging 280 mm per year and an average daily temperature of 24 °C. The soils in this region are coarse sandy clays mixed with montmorillonite clay.

A split-plot experiment to evaluate phosphorus in wheat grain was carried out over three cycles, corresponding to the years 2010, 2011 and 2012. Two phosphorus levels, 0 kg/ha and 80 kg/ha, were assigned to the main plots and 21 different wheat genotypes to the subplots, with three replications. Each of the 126 plots consisted of four 0.8 m beds, 5 m long, with two rows on the top of each bed.

Hyperspectral reflectance was measured in each plot using a JAZ spectroradiometer Z31 with a CC-3 cosine corrector attached to the optical fiber, with a field of view (FOV) aperture of 25° (Ocean Optics, Dunedin, FL, USA). The sensor has a spectral range of 339 to 1029 nm with a bandwidth of 0.38 nm, giving a total of 2048 bands. A dark reading was taken just before measurements to set the lower reflectance point of the device. A diffuse white reflectance target, Spectralon (Labsphere, North Sutton, NH, USA), was used periodically during field measurements as a reference for the upper reflectance point of the device. The data were downloaded and subsequently calibrated using SpectraSuite software (Ocean Optics, Dunedin, FL, USA). Measurements were taken at the center of each of the four beds, typically from 11:00 a.m. to 2:00 p.m., aiming at the canopy from a constant height. This procedure was carried out two weeks after anthesis.

This experiment was originally carried out for fertilization studies; for our purposes, only the data were retrieved. Because 21 wheat genotypes were grown in 2010 and 2011, and a different set of 21 genotypes in 2012, the data capture not only the variability between individuals but also spatial variability, allowing results to be compared under three scenarios in which the variability of each year can be observed (Fig. 1). It is this ability to capture variability that can make predictions improve or worsen.

Figure 1 shows the distribution of phosphorus content in wheat grain for the genotypes of 2010, 2011 and 2012. In 2012, 21 genotypes different from those of the previous years were included, and these genotypes generally show lower phosphorus content in the grain.

### Pre-processing of hyperspectral reflectance data

Information on 2048 spectral bands was available for use as independent variables in the models. However, up to 65% of the data were missing in both the first 296 bands and the last 592 bands, so we decided to narrow the range of the light spectrum, considering only 450 to 850 nm. Additionally, several studies have shown that multicollinearity exists between the bands of the spectrum; since the information was available at a narrow bandwidth of 0.38 nm, piecewise linear combinations could be generated to resample the data using B-splines [7] and retain a bandwidth of 4 nm, resulting in 101 wavelengths used for the data analysis.
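The paper's implementation is in R; purely as an illustration, the following Python sketch mimics the resampling step by averaging the ~0.38 nm bands onto a 4 nm grid over 450–850 nm, which yields the 101 wavelengths mentioned above. This is a simplification of the B-spline resampling of [7]; the function name and synthetic spectrum are hypothetical.

```python
import numpy as np

def resample_spectrum(wavelengths, reflectance, start=450, stop=850, width=4):
    """Aggregate narrow-band reflectance onto a coarser grid.

    Simplified stand-in for the B-spline resampling: each original band
    is assigned to the nearest 4 nm centre and values are averaged.
    """
    centres = np.arange(start, stop + width, width, dtype=float)  # 101 centres
    keep = (wavelengths >= start - width / 2) & (wavelengths <= stop + width / 2)
    wl, refl = wavelengths[keep], reflectance[keep]
    idx = np.clip(np.round((wl - start) / width).astype(int), 0, len(centres) - 1)
    out = np.array([refl[idx == k].mean() for k in range(len(centres))])
    return centres, out

# ~0.38 nm bands, as reported for the JAZ sensor, restricted to 450-850 nm
wl = np.arange(450.0, 850.0 + 0.19, 0.38)
refl = 0.3 + 0.1 * np.sin(wl / 50.0)  # synthetic spectrum for illustration
centres, coarse = resample_spectrum(wl, refl)
```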

### Models

Consider the model:

$${y}_{i}={x}_{i}^{t}\beta +{\epsilon }_{i}$$

(1)

where \({y}_{i}\) is the natural logarithm of phosphorus in wheat grain for case \(i=1,\dots ,n\), \({x}_{i}^{t}={(x}_{i1},\dots ,{x}_{ip})\) represents the light reflectance value at each wavelength, \({\epsilon }_{i}|{\sigma }^{2}\sim NIID\left(0,{\sigma }^{2}\right)\), where “NIID” stands for normal, independent and identically distributed random variables, \(\beta\) is the vector of effects of the independent variables and \({\sigma }^{2}\) is the variance component associated with the residuals.

Model (1) can be further extended to include spatial correlation between observations, the so-called geo-spatial model, which can be written as:

$${y}_{i}={x}_{i}^{t}\beta +{w}_{i}+{\epsilon }_{i}$$

(2)

where \(w = \left( {w_{1} , \ldots ,w_{n} } \right)\prime\) is the vector of spatial random effects with distribution \(N\left(0,{\sigma }^{2}\mathrm{H}\left(\phi \right)\right)\), \({\tau }^{2}\) is the variance component associated with the residuals \({\epsilon }_{i}\), and \(\mathrm{H}\left(\phi \right)=exp\left(-\phi ||{s}_{i}-{s}_{j}||\right)\) is the isotropic exponential correlation function, where \(||{s}_{i}-{s}_{j}||\) is the Euclidean distance between sites *i* and *j* [2].
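The correlation matrix \(\mathrm{H}(\phi)\) above is straightforward to build from site coordinates. A minimal Python sketch (the grid coordinates and \(\phi\) value are hypothetical; the actual analysis was done in R):

```python
import numpy as np

def exp_corr(coords, phi):
    """Isotropic exponential correlation from model (2):
    H[i, j] = exp(-phi * ||s_i - s_j||), Euclidean distance."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.exp(-phi * dist)

# hypothetical plot coordinates on a small 3x3 grid
coords = np.array([[x, y] for x in range(3) for y in range(3)], dtype=float)
H = exp_corr(coords, phi=0.21)
```

By construction H has unit diagonal (zero distance to itself) and is symmetric.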

Another model frequently used in spatial statistics for dealing with areal data is the conditional autoregressive (CAR) model, which can be written as:

$${y}_{i}|{y}_{j:i\ne j}={x}_{i}^{t}\beta +{\sum }_{j=1}^{n}{c}_{ij}\left({y}_{j}-{x}_{j}^{t}\beta \right)+{\epsilon }_{i}$$

(3)

where \({\epsilon }_{i}|{\tau }^{2}\sim NIID\left(0, {\tau }^{2}\right)\), \({\tau }^{2}>0\), and the \({c}_{ij}>0\) are covariance parameters, with \({c}_{ii}=0\) for all *i*. For the set of full conditional distributions to determine a well-defined joint distribution for \(y\), we need to consider:

$$y\sim N\left(X\beta , {\tau }^{2}{\left({D}_{M}-\phi M\right)}^{-1}\right)$$

where \({\tau }^{2}\) is the variance component of \(y\), \(M\) is the neighborhood matrix, \({D}_{M}\) is the diagonal matrix with entries \({m}_{i+}={\sum }_{j=1}^{n}{m}_{ij}\), and \(\phi\) is the autocorrelation parameter, bounded by the ordered eigenvalues \(\left({\lambda }_{(1)}<{\lambda }_{(2)}<\dots <{\lambda }_{(n)}\right)\) of \({{D}_{M}}^{-1/2}M{{D}_{M}}^{-1/2}\) (see [2] for more details).
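When \(\phi\) lies within the eigenvalue bounds, \({D}_{M}-\phi M\) is positive definite and the joint covariance \({\tau }^{2}{\left({D}_{M}-\phi M\right)}^{-1}\) is valid. A Python sketch, assuming a hypothetical rook-adjacency neighborhood for a 2×3 grid of plots:

```python
import numpy as np

def car_covariance(M, tau2, phi):
    """Joint covariance tau^2 * (D_M - phi*M)^{-1} of the CAR model;
    D_M is the diagonal matrix of neighbourhood row sums."""
    D = np.diag(M.sum(axis=1))
    return tau2 * np.linalg.inv(D - phi * M)

def phi_bounds(M):
    """Valid range for phi: (1/lambda_min, 1/lambda_max), with the
    eigenvalues taken from D^{-1/2} M D^{-1/2}."""
    d = M.sum(axis=1)
    Dm = np.diag(1.0 / np.sqrt(d))
    lam = np.linalg.eigvalsh(Dm @ M @ Dm)
    return 1.0 / lam.min(), 1.0 / lam.max()

# rook-adjacency matrix for a hypothetical 2x3 layout of plots
M = np.array([
    [0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],
], dtype=float)
lo, hi = phi_bounds(M)
Sigma = car_covariance(M, tau2=0.45, phi=0.9 * hi)  # phi inside the bounds
```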

### Selection criteria

The fitting procedure itself includes the selection of bands (variables), adapting the approach proposed by [10], which induces variable selection. As a result, the posterior probability \(p\left({\beta }_{j}\ne 0|data\right)\) is obtained, where \({\beta }_{j}\) is the regression coefficient corresponding to the *j*-th band; or, equivalently, \(p\left({\gamma }_{j}=1|data\right)\), where \({\gamma }_{j}\) is an indicator variable for the *j*-th band, equal to 1 if \({\beta }_{j}\ne 0\) and 0 otherwise. With this information, the spectral bands can be selected under two criteria:

1. Bands with \(p\left({\beta }_{j}\ne 0|data\right)\ge 0.6\) [3].

2. The *γ* vector represents a submodel; its posterior probability was obtained by counting the frequency of each sampled submodel, allowing the submodels to be ranked and the most likely one identified (avgmod).
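Both criteria can be computed directly from the MCMC draws of \(\gamma\). A Python sketch with toy draws (illustrative only; function name and data are hypothetical, and the actual analysis used R):

```python
import numpy as np
from collections import Counter

def select_bands(gamma_draws, threshold=0.6):
    """Posterior inclusion probabilities p(gamma_j = 1 | data) and the
    most frequent submodel among the sampled gamma vectors."""
    gamma_draws = np.asarray(gamma_draws)
    incl_prob = gamma_draws.mean(axis=0)             # criterion 1
    selected = np.where(incl_prob >= threshold)[0]
    counts = Counter(tuple(g) for g in gamma_draws)  # criterion 2
    top_model, freq = counts.most_common(1)[0]
    return incl_prob, selected, np.array(top_model), freq / len(gamma_draws)

# toy draws for 3 bands: band 1 (index 0) is almost always included
draws = [(1, 0, 0), (1, 1, 0), (1, 0, 0), (1, 0, 0), (0, 1, 0)]
probs, sel, best, p_best = select_bands(draws)
```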

Once the previous criteria for band selection have been applied and their respective parameters have been estimated in each model, the Deviance Information Criterion (DIC) [20] is calculated in order to select the best model.

### Sampling model and likelihood function

Assume a random sample from (1), (2) and (3), respectively. The conditional distribution of \({y}_{i}|\beta ,{\sigma }^{2}\) under (1) is normal with mean \({x}_{i}^{t}\beta\) and variance \({\sigma }^{2}\). The conditional distribution of \({y}_{i}|\beta ,{\sigma }^{2},{\tau }^{2},\phi ,w\) under (2) is normal with mean \({x}_{i}^{t}\beta +{w}_{i}\) and variance \({\tau }^{2}\). Finally, the conditional distribution of \({y}_{i}|\beta ,{\tau }^{2},\phi\) under (3) is normal with mean \({x}_{i}^{t}\beta\) and variance \({\tau }^{2}{\left({\sum }_{j=1}^{n}{m}_{ij}-\phi {m}_{i.}\right)}^{-1}\). In general, the joint conditional distribution of \(y|\theta\) is given by \(p\left(y|\theta \right)=\prod_{i=1}^{n}p\left({y}_{i}|\theta \right)\), where \(\theta\) denotes the unknowns of each model.

### Prior distributions

In order to complete the specification of the models, we assign prior distributions to the model unknowns. Let \(p\left({\beta }_{j}|{\gamma }_{j}\right)=\left(1-{\gamma }_{j}\right){p}_{spike}\left({\beta }_{j}\right)+{\gamma }_{j}{p}_{slab}\left({\beta }_{j}\right)\), with \({p(\gamma }_{j}=0)={p}_{j}\), \({p}_{spike}\left({\beta }_{j}\right)={I}_{(0)}{(\beta }_{j})\) and \({p}_{slab}\left({\beta }_{j}\right)=N({\beta }_{j}|{\mu }_{j},{\nu }_{j})\) for each model. For model (1), \({\sigma }^{2}|{\alpha }_{11},{\eta }_{11}\sim IG({\alpha }_{11},{\eta }_{11})\). For model (2), \({\sigma }^{2}|{\alpha }_{21},{\eta }_{21}\sim IG\left({\alpha }_{21},{\eta }_{21}\right)\), \({\tau }^{2}|{\alpha }_{22},{\eta }_{22}\sim IG({\alpha }_{22},{\eta }_{22})\) and, for \(\phi >0\), \(\phi |min,max\sim U(min,max)\). For model (3), \({\tau }^{2}|{\alpha }_{31},{\eta }_{31}\sim IG({\alpha }_{31},{\eta }_{31})\) and \(\phi |\frac{1}{{\lambda }_{(1)}}, \frac{1}{{\lambda }_{(n)}}\sim U\left(\frac{1}{{\lambda }_{(1)}}, \frac{1}{{\lambda }_{(n)}}\right)\). By \(IG(\alpha ,\eta )\) we denote an inverse gamma distribution, whose probability density function is \(f\left(x;\alpha ,\eta \right)=\frac{{\eta }^{\alpha }}{\Gamma (\alpha )}{\left(1/x\right)}^{\alpha +1}exp\left(-\eta /x\right)\), where \(\alpha\) and \(\eta\) are the shape and rate parameters, respectively. The joint prior distribution \(p\left(\theta |H\right)\) of the unknowns of each model is given by:
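The inverse gamma density as written above can be coded directly. A minimal Python sketch (the function name is hypothetical):

```python
import math

def inv_gamma_pdf(x, alpha, eta):
    """Density of IG(alpha, eta) with shape alpha and rate eta:
    f(x) = eta^alpha / Gamma(alpha) * (1/x)^(alpha+1) * exp(-eta/x)."""
    if x <= 0:
        return 0.0
    return (eta ** alpha / math.gamma(alpha)) * x ** (-(alpha + 1)) * math.exp(-eta / x)
```

For example, for IG(2, 1) the density at \(x=1\) is \(e^{-1}\approx 0.368\), since \(\Gamma(2)=1\).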

\(p\left( {\beta ,\sigma^{2} {|}H_{1} } \right) \propto p(\beta |\gamma )p\left( {\sigma^{2} |\alpha_{11} ,\eta_{11} } \right)\) for model (1), where \({H}_{1}=\left\{{\alpha }_{11},{\eta }_{11}\right\}\) is the set of hyper-parameters.

\(p\left( {\beta ,\sigma^{2} ,\tau^{2} ,\phi ,w{|}H_{2} } \right) \propto p(\beta |\gamma )p\left( {\sigma^{2} |\alpha_{21} ,\eta_{21} } \right)p\left( {\tau^{2} |\alpha_{22} ,\eta_{22} } \right)p\left( {\phi |min,max} \right)p\left( {w|\sigma^{2} ,\phi } \right)\) for model (2), where \({H}_{2}=\left\{{\alpha }_{21},{\eta }_{21},{\alpha }_{22},{\eta }_{22},min,max\right\}\) is the set of hyper-parameters.

\(p\left( {\beta ,\tau^{2} ,\phi {|}H_{3} } \right) \propto p(\beta |\gamma )p\left( {\tau^{2} |\alpha_{31} ,\eta_{31} } \right)p\left( {\phi |\frac{1}{{\lambda_{\left( 1 \right)} }},{ }\frac{1}{{\lambda_{\left( n \right)} }}} \right)\) for model (3), where \({H}_{3}=\left\{{\alpha }_{31},{\eta }_{31},\frac{1}{{\lambda }_{(1)}}, \frac{1}{{\lambda }_{(n)}}\right\}\) is the set of hyper-parameters.

### Posterior distributions

The joint posterior distribution of all quantities can be obtained by applying Bayes’ theorem, yielding \(p\left(\theta |data,H\right)\propto p\left(y|\theta \right)p\left(\theta |H\right)\).

The hierarchical structure of this distribution allows us to obtain the conditional distributions necessary to implement the Gibbs sampler [8] and draw samples from the joint posterior distribution, which is otherwise analytically intractable. Not all full conditional distributions have a closed form, so it was necessary to implement the Metropolis–Hastings algorithm [6]; the algorithms are described in the Appendix.
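As a sketch of the Metropolis step used for parameters without a closed-form full conditional, the following Python code applies a random-walk proposal for \(\phi\) on \((0,1)\), targeting a hypothetical unnormalized log full conditional (the actual implementation is the R code described in the Appendix):

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_phi(log_post, n_iter=2000, step=0.05, lo=0.0, hi=1.0):
    """Random-walk Metropolis for phi; proposals outside the support of
    the uniform prior (lo, hi) are rejected outright."""
    phi = 0.5 * (lo + hi)                 # start at the middle of the support
    draws = np.empty(n_iter)
    for t in range(n_iter):
        prop = phi + step * rng.standard_normal()
        if lo < prop < hi and np.log(rng.uniform()) < log_post(prop) - log_post(phi):
            phi = prop                    # accept
        draws[t] = phi
    return draws

# hypothetical unnormalised log full conditional, peaked near 0.21
log_post = lambda p: -((p - 0.21) ** 2) / (2 * 0.05 ** 2)
draws = mh_phi(log_post)
```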

The hyper-parameters of the inverse gamma distributions are set to \(\alpha =\eta =0.01\) because this provides a weakly informative prior [14]; for the uniform distributions, they are set to \(min=0\), \(max=1\), and \({\lambda }_{(1)},{\lambda }_{(n)}\) are the eigenvalues mentioned in model (3).

### Software

The algorithm to fit the models was implemented in a program written in R [18]. The input arguments are the response vector y, the matrices X, w and M, the number of Markov chain Monte Carlo (MCMC) iterations, a burn-in period and the hyper-parameters. The outputs include the mean of the predictive distribution obtained through the MCMC algorithm, the variables selected under the criteria previously described, the DIC [20] and the correlations obtained in cross-validation. Three R libraries were used: MCMCpack [15], mvtnorm [9] and truncnorm [16].

### Simulations

To evaluate the behavior of the variable selection procedure in each model, a simulation experiment was carried out. A matrix \(X\) of 100 variables was generated from a standard normal distribution, and 50 additional variables were created as linear combinations \({x}_{i}+{x}_{j}={x}_{k}\) with \(i\in \left\{\mathrm{1,3},5,\dots ,99\right\}\), \(j\in \left\{\mathrm{2,4},6,\dots ,100\right\}\) and \(k\in \left\{\mathrm{101,102},\dots ,150\right\}\), to simulate the multicollinearity expected in the real dataset. The neighborhood matrix was the same as in the real dataset. The vector \(\beta\) was generated with values close to zero for variables with little effect and large values for the most important ones: letting \(A=\{\mathrm{2,6},\mathrm{13,33,25,67,71,77,85,94,96,99}\}\) and \(B=\{\mathrm{1,2},3,\dots ,150\}\), we set \({\beta }_{j}=0.9\) for \(j\in A\) and \({\beta }_{j}=0.01\) for \(j\in B-A\); the parameter values were set to \({\sigma }^{2}=0.3,{\tau }^{2}=0.45,\phi =0.21\). Using these data and the conditional distributions, \(y\) was simulated for each model. Finally, the simulated data were used as input to our R script to check that the parameters of interest were correctly estimated in each model.
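The simulated design can be reproduced in a few lines. A Python sketch of the data-generating step for model (1) (the seed and the 0-based index mapping are implementation choices for illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_design(n=126):
    """100 standard normal variables plus 50 exact linear combinations
    x_k = x_i + x_j, pairing (x_1, x_2), (x_3, x_4), ... to induce
    the multicollinearity described in the text."""
    X = rng.standard_normal((n, 150))
    for k in range(100, 150):
        i, j = 2 * (k - 100), 2 * (k - 100) + 1  # 0-based pairs
        X[:, k] = X[:, i] + X[:, j]
    return X

def make_beta():
    """Effect vector: 0.9 on the important bands (set A, 1-based
    indices from the text), 0.01 elsewhere."""
    A = [2, 6, 13, 33, 25, 67, 71, 77, 85, 94, 96, 99]
    beta = np.full(150, 0.01)
    beta[[a - 1 for a in A]] = 0.9
    return beta

X = simulate_design()
beta = make_beta()
y = X @ beta + np.sqrt(0.3) * rng.standard_normal(X.shape[0])  # model (1), sigma^2 = 0.3
```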

### Cross validation and predictions

In order to assess the predictive power of the models, cross-validation was performed. The response vector \(y\) was randomly divided into two disjoint sets, always with 25 observations in the testing set and 101 in the training set, so that \(y=\left( {y_{training} , y_{testing} } \right)\prime\). The matrix of covariates (bands) was divided correspondingly, so that \(X = \left( {X_{training} , X_{testing} } \right)\prime\). The models were fitted with the training information, and the fitted model was used to predict the response of the testing set. This procedure was repeated 5 times.
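The repeated random splits above can be sketched as follows (Python for illustration; the set sizes follow the text and the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)

def cv_splits(n=126, n_test=25, n_reps=5):
    """Repeated random splits: 25 observations for testing and 101 for
    training on each of the 5 repetitions."""
    splits = []
    for _ in range(n_reps):
        perm = rng.permutation(n)
        splits.append((perm[n_test:], perm[:n_test]))  # (train, test) indices
    return splits

splits = cv_splits()
```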