Experiment and data collection
We used the maize diversity panel, which consists of 282 genetically diverse lines [19]. This panel was selected to capture as much of the genetic diversity present in maize as possible, while consisting of lines that could be reliably grown to maturity in temperate North America [19]. This panel has also been phenotyped for a wide range of traits across many years and environments.
The field experiment was conducted at the Havelock Research Farm of the University of Nebraska-Lincoln (45°51′49″N, 96°31′09″W). The predominant soil types were Zook silty clay loam and Colo silty clay loam. The maize diversity panel was grown in two replicates, one under a low nitrogen condition (− N) and the other under a normal condition (+ N). For the + N treatment, 135 kg/ha of urea (dry fertilizer) was applied, whereas for the − N treatment no N fertilizer was added. The planting date was May 16, 2018. Each replicate consisted of 288 plots, of which 229 were from the maize diversity panel. The remaining plots were hybrid check varieties (B73xMo17 and B37xMo17) and expired plant variety protection (PVP) lines, interspersed randomly. Each plot was 1.6 m wide and 6.3 m long and comprised two rows of 38 seeds from one maize line. All other agronomic practices followed the recommendations of the University of Nebraska’s Research Farm support group.
Plant leaf sampling was conducted on July 12 and 13, 2018, when roughly 50% of the plots were tasseling or had already tasseled. In each plot (one maize genotype), a representative plant was identified. Leaves 2, 3 and 4 (counting the flag leaf as leaf 1) were cut from the plant at the stem, immediately placed in a Ziploc bag, and stored in an ice cooler. The leaf samples were then transported to the lab, where they were processed and analyzed for leaf chemical properties.
VIS–NIR–SWIR reflectance spectra of the leaf samples were measured with a benchtop spectroradiometer (FieldSpec4, Malvern Panalytical Ltd., formerly Analytical Spectral Devices) fitted with a contact probe. The spectral range of the instrument was 350–2500 nm and the spectral sampling interval was 1 nm, so each raw spectrum had 2151 data points. The contact probe had a light aperture of 10 mm, which was its effective sampling area. For each leaf, three spectral measurements were taken, at the tip, middle and base sections (avoiding the midrib area), to account for within-leaf variability. All measurements were made on the adaxial side of the leaf. The nine VIS–NIR–SWIR scans (three leaves × three positions) were then averaged to represent the spectral reading from that plant.
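The size of each raw spectrum and the per-plant averaging can be illustrated with a minimal Python sketch. It is not the authors' code; the function name `average_scans` and the constant-reflectance toy scans are hypothetical and serve only to show the arithmetic.

```python
# Wavelength grid: 350-2500 nm at a 1 nm sampling interval.
wavelengths = list(range(350, 2501))

def average_scans(scans):
    """Average equal-length reflectance scans band by band."""
    n_scans = len(scans)
    if n_scans == 0:
        raise ValueError("need at least one scan")
    n_bands = len(scans[0])
    if any(len(s) != n_bands for s in scans):
        raise ValueError("all scans must have the same number of bands")
    return [sum(s[b] for s in scans) / n_scans for b in range(n_bands)]

# Hypothetical data: 3 leaves x 3 positions = 9 scans, each with a
# constant reflectance (0.10, 0.11, ..., 0.18) across all bands.
scans = [[0.10 + 0.01 * k] * len(wavelengths) for k in range(9)]
plant_spectrum = average_scans(scans)

print(len(wavelengths))      # number of data points per raw spectrum
print(len(plant_spectrum))   # the averaged spectrum keeps the same grid
```

The averaged spectrum has the same 2151-point grid as each individual scan, so later preprocessing steps operate on one spectrum per plant.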
Leaf chlorophyll concentration (CHL) was measured with a handheld chlorophyll concentration meter (MC-100, Apogee Instruments, Inc., Logan, UT) using the sensor’s built-in calibration for maize. As with the VIS–NIR–SWIR measurements, chlorophyll concentration was measured at three locations per leaf, and the nine readings from each plant were averaged. The unit of CHL was µmol/m2.
Leaf area (LA) was measured with a leaf area meter (LI-3100, LI-COR Biosciences, Lincoln, NE). Fresh weight (FW) of the leaves was recorded by a digital balance. Leaf samples were placed in a walk-in oven set to 50 °C and dried over 72 h to a constant weight. Dry weight (DW) of the leaves was then recorded. Leaf Water Content (LWC, %) was calculated as (FW–DW)/FW × 100%. Specific Leaf Area (SLA, m2/kg) was calculated as LA/DW.
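The two derived traits can be written out as a short Python sketch. The input units (leaf area in cm2, weights in g) are an assumption made for illustration, as are the function names and the example values; only the formulas themselves come from the text.

```python
def leaf_water_content(fresh_weight_g, dry_weight_g):
    """LWC (%) = (FW - DW) / FW x 100."""
    if fresh_weight_g <= 0:
        raise ValueError("fresh weight must be positive")
    return (fresh_weight_g - dry_weight_g) / fresh_weight_g * 100.0

def specific_leaf_area(leaf_area_cm2, dry_weight_g):
    """SLA (m2/kg) = LA / DW, after converting cm2 to m2 and g to kg."""
    if dry_weight_g <= 0:
        raise ValueError("dry weight must be positive")
    area_m2 = leaf_area_cm2 / 1e4   # assumed input unit: cm2
    dry_weight_kg = dry_weight_g / 1e3
    return area_m2 / dry_weight_kg

# Hypothetical sample: 20 g fresh, 5 g dry, 1000 cm2 of leaf area.
lwc = leaf_water_content(20.0, 5.0)
sla = specific_leaf_area(1000.0, 5.0)
print(lwc, sla)
```

For this hypothetical sample the LWC is 75% and the SLA is about 20 m2/kg.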
Dried plant samples were sent to a commercial lab (Midwest Laboratories, Inc., Omaha, NE), where the samples were ground, homogenized, and analyzed for N, phosphorus (P), and potassium (K) concentration. N was analyzed with the Dumas method using a LECO FP428 nitrogen analyzer (AOAC method 968.06). P and K were analyzed with microwave nitric acid digestion followed by inductively coupled plasma spectrometry (AOAC method 985.01).
A third replicate of the maize diversity panel was grown at the University of Nebraska-Lincoln’s Greenhouse Innovation Center. Three seeds were sown in each 9.08 L pot (diameter 24 cm, height 26 cm) and thinned to one plant per pot after germination. Temperature in the greenhouse was set between 22.7 and 28.3 °C, and relative humidity was approximately 60%. The lighting cycle was set at 16 h, from 0600 to 2200 hours. Each pot was filled with growth media (Premier Tech Horticulture Promix BX) mixed with 0.015 kg of 15-9-12 osmocote (3–4 months release), 0.015 kg of 15-9-12 osmocote (5–6 months release), 0.037 kg of lime, and 1.3 kg of water. Water was added to the pots daily by automated watering stations, with a target weight of 7.4 kg (including the pot carrier) at the beginning and 8.3 kg at the end. The planting date was August 1, 2018, and the leaf samples were taken on October 9 and 10, 2018 (plants were at the flowering stage), following the same protocols as for the field samples described above. The total number of samples from the greenhouse was 262, which included 229 lines from the maize diversity panel and 33 maize landraces.
In summary, the six leaf physiological and chemical properties targeted for VIS–NIR–SWIR modeling were: leaf chlorophyll concentration (CHL, µmol/m2 of leaf area), leaf water content (LWC, %), specific leaf area (SLA, m2/kg), nitrogen (N, %), phosphorus (P, %) and potassium (K, %). CHL, LWC, SLA and N are among the most important leaf properties and are frequently studied by plant breeders, physiologists, and agronomists. Although P and K have been less studied spectroscopically, both are essential nutrients with significant implications for crop production.
Spectral preprocessing and multivariate modeling
Spectral data from 350 to 450 nm exhibited relatively high levels of noise and were excluded from downstream spectral analysis. The remaining spectra were preprocessed with a Savitzky–Golay smoothing filter to further reduce noise (window size = 5 and polynomial order = 2) [20]. The smoothed spectra were then down-sampled to one data point every 5 nm to reduce the dimensionality of the predictor variables for more efficient computation.
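The three preprocessing steps can be sketched in Python as follows. This is an illustration, not the code used in the study, and it rests on three assumptions: the retained range starts at 451 nm, the two points at each end of the spectrum (where the 5-point window does not fit) are dropped, and down-sampling keeps every fifth point starting from the first retained wavelength. The coefficients (−3, 12, 17, 12, −3)/35 are the standard Savitzky–Golay smoothing weights for a 5-point window and a second-order polynomial.

```python
# Savitzky-Golay smoothing weights for window size 5, polynomial order 2.
SG_COEFFS = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]

def trim(wavelengths, spectrum, min_nm=451):
    """Drop the noisy short-wavelength end (assumed cut: keep >= 451 nm)."""
    kept = [(w, r) for w, r in zip(wavelengths, spectrum) if w >= min_nm]
    return [w for w, _ in kept], [r for _, r in kept]

def savgol_smooth(wavelengths, spectrum):
    """Smooth with the 5-point filter; edge points without a full window are dropped."""
    half = len(SG_COEFFS) // 2
    out_w, out_r = [], []
    for i in range(half, len(spectrum) - half):
        window = spectrum[i - half:i + half + 1]
        out_w.append(wavelengths[i])
        out_r.append(sum(c * v for c, v in zip(SG_COEFFS, window)))
    return out_w, out_r

def downsample(wavelengths, spectrum, step=5):
    """Keep every `step`-th point (5 nm spacing on a 1 nm grid)."""
    return wavelengths[::step], spectrum[::step]

# Hypothetical smooth spectrum (a quadratic in wavelength) on the raw grid.
wavelengths = list(range(350, 2501))
spectrum = [0.2 + 1e-7 * (w - 1000) ** 2 for w in wavelengths]

w_trim, r_trim = trim(wavelengths, spectrum)
w_smooth, r_smooth = savgol_smooth(w_trim, r_trim)
w_ds, r_ds = downsample(w_smooth, r_smooth)
print(len(w_trim), len(w_smooth), len(w_ds))
```

Because a second-order Savitzky–Golay filter reproduces any quadratic exactly, the toy spectrum passes through the smoothing step unchanged, which makes the sketch easy to check; real spectra would of course be altered by the filter.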
The entire sample set was randomly split into a training set (60%) and a test set (40%). The training set was used to calibrate prediction models of the six maize leaf properties from the spectral data, and model performance was assessed on the test set. We investigated two multivariate modeling approaches: Partial Least Squares Regression (PLSR) and Support Vector Regression (SVR). Both approaches are widely employed for modeling with all wavebands in VIS–NIR–SWIR hyperspectral data. PLSR is a linear modeling technique in which the regression is conducted between the response variable and the PLS Latent Variables (LVs). The LVs are linear combinations of the original wavebands that (1) account for the maximum variability in the hyperspectral data and (2) are maximally correlated with the response variable [21]. SVR, on the other hand, can model nonlinear relationships: the original wavebands are mapped through a kernel function into a higher dimensional feature space, where an optimal hyperplane is constructed and a linear regression function is computed [22, 23]. PLSR and SVR, together with other techniques (such as Random Forest and Artificial Neural Network), are usually referred to as Machine Learning approaches [24].
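To make the split and the role of the latent variables concrete, the sketch below shows a random 60/40 partition and a single-response PLS regression fitted with the NIPALS algorithm in plain Python. It is an illustration under stated assumptions, not the implementation used in this study: the function names, the random seed and the toy data are hypothetical, and SVR is omitted because a solver is beyond the scope of a short example.

```python
import random

def train_test_split(n_samples, train_fraction=0.6, seed=0):
    """Randomly partition sample indices into training and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pls1_fit(X, y, n_lv):
    """Fit single-response PLS by NIPALS with up to n_lv latent variables."""
    n, n_bands = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(n_bands)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(n_bands)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                      # centered y
    components = []
    for _ in range(n_lv):
        # Weight vector: direction in band space with maximal covariance with y.
        w = [dot([E[i][j] for i in range(n)], f) for j in range(n_bands)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]  # scores: one LV value per sample
        tt = dot(t, t)
        loadings = [dot([E[i][j] for i in range(n)], t) / tt for j in range(n_bands)]
        q = dot(f, t) / tt              # regression of y on this LV
        # Deflate X and y before extracting the next LV.
        E = [[E[i][j] - t[i] * loadings[j] for j in range(n_bands)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, loadings, q))
    return x_mean, y_mean, components

def pls1_predict(model, x):
    """Predict the response for one spectrum x."""
    x_mean, y_mean, components = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, loadings, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [v - t * l for v, l in zip(e, loadings)]
    return y_hat

train_idx, test_idx = train_test_split(100)

# Hypothetical toy data: three "wavebands" and an exactly linear response.
X = [[1, 2, 0], [2, 1, 1], [3, 5, 2], [4, 3, 1], [5, 8, 4], [6, 4, 7]]
y = [2 * a - b + 0.5 * c + 3 for a, b, c in X]
model = pls1_fit(X, y, n_lv=3)
fitted = [pls1_predict(model, row) for row in X]
```

With as many latent variables as there are independent predictors, PLSR coincides with ordinary least squares, so the toy response is reproduced essentially exactly. With hundreds of correlated wavebands, far fewer LVs than wavebands are normally retained.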
Before modeling, the response and predictor variables were zero-centered (by their respective means) and scaled to unit variance (by their respective standard deviations). Ten-fold cross validation (with random segments) was employed in model training to balance model complexity and predictive accuracy (i.e., to avoid overfitting). In PLSR, models with up to 25 latent variables (nLV) were considered, and the best model was the one that gave the lowest cross-validated root mean squared error (RMSECV). In SVR, a linear kernel function was used. Five values of the regularization parameter C (the cost of constraint violation) were tested: 0.01, 0.1, 1, 10, and 100; the optimal C was the one that gave the lowest RMSECV.
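The scaling and tuning procedure can be outlined in Python as below. This is a sketch under assumptions, not the authors' code: the sample standard deviation (n − 1 denominator) is assumed for scaling, the helper names are hypothetical, and `toy_fit_predict` is a stand-in one-predictor model used only so that the example runs. In practice it would be replaced by a PLSR model indexed by nLV or an SVR model indexed by C.

```python
import random

def standardize(values):
    """Center to zero mean and scale to unit variance (sample SD assumed)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values], mean, sd

def random_folds(n_samples, k=10, seed=0):
    """Assign samples to k random cross-validation segments."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def rmsecv(fit_predict, X, y, param, folds):
    """Cross-validated RMSE for one candidate hyperparameter value."""
    squared_error = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in fold], param)
        squared_error += sum((p - y[i]) ** 2 for p, i in zip(preds, fold))
    return (squared_error / len(y)) ** 0.5

def select_best(fit_predict, X, y, grid, folds):
    """Return the grid value with the lowest RMSECV, plus all scores."""
    scores = {param: rmsecv(fit_predict, X, y, param, folds) for param in grid}
    return min(scores, key=scores.get), scores

def toy_fit_predict(X_train, y_train, X_val, penalty):
    """Hypothetical stand-in: one-predictor ridge regression through the origin."""
    sxy = sum(x[0] * v for x, v in zip(X_train, y_train))
    sxx = sum(x[0] ** 2 for x in X_train)
    slope = sxy / (sxx + penalty)
    return [slope * x[0] for x in X_val]

# Hypothetical noise-free data, so the unpenalized model should win.
X = [[float(i)] for i in range(1, 31)]
y = [2.0 * x[0] for x in X]
folds = random_folds(len(y), k=10)
best, scores = select_best(toy_fit_predict, X, y, [0.0, 1.0, 10.0], folds)
z, mean, sd = standardize(y)
```

The same `select_best` loop would cover both tuning tasks described above: a grid of 1 to 25 for nLV in PLSR, or the five candidate values of C in SVR.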
The best models were then applied to the test set. The models were evaluated by comparing the lab-measured and model-estimated leaf properties using the Coefficient of Determination (R2), Root Mean Squared Error of Testing (RMSET, Eq. 1), Mean Absolute Percent Error of Testing (MAPET, Eq. 2) and Ratio of Performance to Deviation (RPD, Eq. 3). These analyses were performed in the R statistical environment [25] with the “pls” [26], “prospectr” [27], and “e1071” [28] packages.
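$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{Y}_{i} - Y_{i} \right)^{2}}$$
(1)
$$MAPE = \frac{\frac{1}{N} \sum_{i=1}^{N} \left| \hat{Y}_{i} - Y_{i} \right|}{\bar{Y}}$$
(2)
$$RPD = \frac{SD}{RMSE}.$$
(3)
Here, \(\hat{Y}_{i}\) and \(Y_{i}\) are the model-estimated and lab-measured values of sample \(i\), \(N\) is the number of samples in the test set (not the nitrogen concentration), \(\bar{Y}\) is the mean of the lab-measured values, and \(SD\) is their standard deviation.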
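Equations 1–3 translate directly into code. The Python sketch below is illustrative only (the analyses in this study were run in R). Two choices in it are assumptions rather than statements from the text: SD is computed as the sample standard deviation (n − 1 denominator), and R2 is computed as the squared Pearson correlation between measured and estimated values. The example values are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def rmse(measured, estimated):
    """Eq. 1: root mean squared error."""
    return math.sqrt(mean([(e - m) ** 2 for m, e in zip(measured, estimated)]))

def mape(measured, estimated):
    """Eq. 2: mean absolute error divided by the mean measured value."""
    return mean([abs(e - m) for m, e in zip(measured, estimated)]) / mean(measured)

def sample_sd(values):
    """Standard deviation with an n - 1 denominator (assumed)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def rpd(measured, estimated):
    """Eq. 3: ratio of performance to deviation."""
    return sample_sd(measured) / rmse(measured, estimated)

def r_squared(measured, estimated):
    """Squared Pearson correlation (an assumed definition of R2)."""
    mm, me = mean(measured), mean(estimated)
    cov = sum((m - mm) * (e - me) for m, e in zip(measured, estimated))
    var_m = sum((m - mm) ** 2 for m in measured)
    var_e = sum((e - me) ** 2 for e in estimated)
    return cov ** 2 / (var_m * var_e)

# Hypothetical test-set values: every estimate is 0.5 units too high.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
estimated = [1.5, 2.5, 3.5, 4.5, 5.5]
print(rmse(measured, estimated), mape(measured, estimated),
      rpd(measured, estimated), r_squared(measured, estimated))
```

As written in Eq. 2, `mape` returns a fraction; multiplying by 100 expresses it as a percentage. The toy example also shows why R2 alone is insufficient: a constant bias leaves the squared correlation at 1 while RMSE and MAPE remain nonzero.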
Vegetation indices
Hyperspectral-based, narrow-band VIs are commonly used to quantify leaf CHL, N and LWC. To test the usefulness of VIs for predicting the leaf properties in our dataset, we computed three common VIs from the VIS–NIR–SWIR hyperspectral data: the Green Normalized Difference Vegetation Index (GNDVI) [29], the Red-edge Normalized Difference Vegetation Index (RENDVI) [30], and the Normalized Difference Water Index (NDWI) [31]. GNDVI and RENDVI have been shown to be useful for CHL and N quantification [32, 33], and NDWI for foliar water content [34].
As in the PLSR and SVR analyses, we used the training set (60%) to develop calibration models between the leaf properties and the VIs (linear regression with a linear and a quadratic term), then applied the models to the test set and reported the test R2 and RPD. In addition, we conducted an exhaustive search of all possible two-band combinations of the form \(\left( {B1 - B2} \right)/\left( {B1 + B2} \right)\) (GNDVI, RENDVI and NDWI are all computed in this form), selected the combination with the highest correlation with the target leaf property, and tested its performance.
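The exhaustive two-band search can be sketched in Python as follows. This is an illustration, not the code used in the study. It assumes that "highest correlation" is judged by the squared Pearson correlation, so that strongly negative relationships also qualify and each unordered band pair needs to be evaluated only once (swapping B1 and B2 merely flips the sign of the index). The function names, the four wavelengths and the toy spectra are hypothetical.

```python
from itertools import combinations

def normalized_difference(b1, b2):
    """Two-band index of the form (B1 - B2) / (B1 + B2)."""
    return (b1 - b2) / (b1 + b2)

def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant index carries no information
    return cov / (var_a * var_b) ** 0.5

def best_band_pair(spectra, wavelengths, target):
    """Return (r squared, B1 wavelength, B2 wavelength) for the best pair."""
    best = None
    for i, j in combinations(range(len(wavelengths)), 2):
        index = [normalized_difference(s[i], s[j]) for s in spectra]
        r2 = pearson_r(index, target) ** 2
        if best is None or r2 > best[0]:
            best = (r2, wavelengths[i], wavelengths[j])
    return best

# Hypothetical reflectance at four wavelengths (nm) for six samples.
wavelengths = [550, 670, 800, 1650]
spectra = [
    [0.10, 0.05, 0.40, 0.30],
    [0.12, 0.08, 0.45, 0.22],
    [0.08, 0.04, 0.50, 0.35],
    [0.15, 0.09, 0.38, 0.28],
    [0.11, 0.03, 0.42, 0.31],
    [0.09, 0.07, 0.47, 0.25],
]
# Toy target built from the 800 and 550 nm bands, so the search should recover them.
target = [normalized_difference(s[2], s[0]) for s in spectra]
best = best_band_pair(spectra, wavelengths, target)
print(best)
```

On the down-sampled spectra the number of candidate pairs grows with the square of the number of wavebands, which is one practical reason to reduce the spectral dimensionality before such a search. Consistent with the procedure described above, the search would be run on the training set only, with the selected pair then evaluated on the test set.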