High-throughput phenotyping allows the selection of soybean genotypes for earliness and high grain yield

Background Precision agriculture techniques are widely used to optimize fertilizer and soil applications. Furthermore, these techniques could also be combined with new statistical tools to assist in phenotyping in breeding programs. In this study, the research hypothesis was that soybean cultivars show phenotypic differences concerning wavelength and vegetation index measurements. Results In this research, we associate variables obtained via high-throughput phenotyping with the grain yield and cycle of soybean genotypes. The experiment was carried out during the 2018/2019 and 2019/2020 crop seasons, under a randomized block design with four replications. The evaluated soybean genotypes included 7067, 7110, 7739, 8372, Bonus, Desafio, Maracai, Foco, Pop, and Soyouro. The phenotypic traits evaluated were: first pod height (FPH), plant height (PH), number of branches (NB), stem diameter (SD), days to maturity (DM), and grain yield (YIE). The spectral variables evaluated were wavelengths and vegetation indices (NDVI, SAVI, GNDVI, NDRE, SCCCI, EVI, and MSAVI). The genotypes Maracai and Foco showed the highest grain yields throughout the crop seasons, in addition to belonging to the groups with the highest means for all VIs. YIE was positively correlated with the NDVI and certain wavelengths (735 and 790 nm), indicating that genotypes with higher values for these spectral variables are more productive. By path analyses, GNDVI and NDRE had the highest direct effects on the dependent variable DM, while NDVI had a higher direct effect on YIE. Conclusions Our findings revealed that early and productive genotypes can be selected based on vegetation indices and wavelengths. Soybean genotypes with a high grain yield have higher means for NDVI and certain wavelengths (735 and 790 nm). Early genotypes have higher means for NDRE and GNDVI. These results reinforce the importance of high-throughput phenotyping as an essential tool in soybean breeding programs.

prices, the demand for quality raw materials and fair prices has increased [2], requiring highly productive cultivars and increasingly efficient farming systems.
In this sense, Brazilian soybean breeding programs have sought cultivars that combine high grain yield and earliness. Earliness has been a target because it allows farmers to grow corn or cotton in the off-season, after soybean cultivation (in-season). Furthermore, early-cycle genotypes remain less time in the field and are subject to less disease pressure [3]. However, plant cycle characterization is a time and labor-demanding task since it requires on-field counting of the number of days from emergence until flowering or maturation of each genotype. This is because hundreds of soybean genotypes are evaluated annually in the breeding programs, and the cycle monitoring for each plot must be performed daily. To overcome these difficulties, the use of remote sensing techniques emerges as a high potential tool, providing specific and large-scale information for crop assessment [4][5][6][7].
Remote sensing-based high-throughput phenotyping (HTP) is a reliable and fast approach to real-time and large-scale plant trait measurements [8][9][10]. Adopting this approach is essential to achieve greater efficiency of plant breeding, as it provides monitoring and decision support with applications in several scenarios, such as monitoring the plant status [8,[11][12][13], discriminating cultivars [4,6], predicting crop yield [7,14,15], and selecting genotypes for traits of interest [9,16,17]. Unmanned Aerial Vehicles (UAVs) are essential for remote sensing-based HTP since they provide fast realtime data via remote sensors. These tools are required for obtaining Vegetation Indices (VIs), which are mathematical models for different wavelengths [7,10,18]. UAVs can estimate the spectral component of vegetation through combinations between red and near-infrared spectral bands [19] and can be assembled to assess the growth vigor, nutrient status, and photosynthetic activity of the plants in the field [20][21][22]. Thus, the use of spectral variables obtained by UAV imaging shows to be a promising approach for reliable, faster, and cost-effective measurements of the cycle and yield-related traits in soybean.
Santana et al. [11] assessed the relationship between VIs obtained from UAV multispectral imagery and leaf N content and yield-related traits in corn varieties grown in different N topdressing levels, and they verified a positive relationship between NDVI and NDRE and grain yield under adequate N levels. Da Silva et al. [7], in a study aiming at identifying which VIs can be used in soybean grain yield prediction by using UAV and remote multispectral sensor, verified that NDVI and SAVI had the higher direct effect on grain yield. However, further studies assessing the relationship between VIs and cycle and yield-related traits in soybean cultivars are still needed. Identifying the cause-and-effect relationship between spectral and agronomic variables provides an easier and faster phenotyping process in breeding programs since efforts can be directed only to the wavelengths and VIs showing the highest cause-and-effect relationship with cycle and yield. Additionally, genotypes with better means for these spectral variables should be identified to achieve an efficient selection for yield and earliness.
The research hypothesis was that soybean cultivars show phenotypic differences concerning the measurements of wavelengths and VIs. Thus, the objective of this study was to identify variables obtained by UAV-based HTP that are related to the grain yield and cycle of soybean genotypes.

Field trials
During the 2018/2019 and 2019/2020 crop seasons, two experiments were carried out at the experimental field of the Federal University of Mato Grosso do Sul, campus of Chapadão do Sul (18° 46′ 26″ S, 52° 37′ 28″ W, mean altitude of 810 m). According to the Köppen classification system, the climate is classified as tropical savanna (Aw). The soil of the experimental area is classified as dystrophic Red Latosol with a clay texture [23], and has the following chemical characteristics in the 0-20 cm layer, according to International System of Units: pH The experimental design consisted of randomized blocks with four replications. The evaluated soybean genotypes included 7067, 7110, 7739, 8372, Bonus, Desafio, Maracai, Foco, Pop, and Soyouro. The main phenotypic traits of the cultivars are shown in Table 1. The plots consisted of five rows (4 m long) with a spacing of 0.45 m. The sowing density was 15 plants m −1 . The climatic conditions during the experiments are shown in Fig. 1, respectively.

Evaluated variables
During both crop seasons, the high-throughput phenotyping measurements were performed at 60 days after emergence (DAE). On this date, soybean cultivars were in R1 and R2 according to phenological scale of Fehr and Caviness [24], showing the highest vegetative peak and maximum physiological development [25]. For spectral data acquisition, we used a Sensefly eBee RTK fixedwing unmanned aerial vehicle (UAV), with autonomous take-off control, flight plan, and landing. The overflights were performed with 75% lateral and 80% longitudinal overlap of the taken images. The overflight was performed at 100 m altitude, allowing a spatial image resolution of 0.10 m. The aircraft can fly between 50 and 400 ha field per flight (at a fully charged battery and dependent of the needed spatial resolution, wind speed and flight altitude). The flight autonomy depends on the image's desired spatial resolution and the overlap of the passes or flight lines, which can be up to 45 min. The aircraft has 0.96 m of wingspan, and the weight without a camera and battery is 0.46 kg. The nominal take-off weight with camera e battery is 0.73 kg. The nominal cruise speed is between 40 and 90 km h −1 (wind dependent).
SenseFly eBee RTK was equipped with a luminosity sensor and the Parrot Sequoia multispectral camera, with 1280 × 960 pixels and pixel size of 3.75 × 3.75 µm (Focal Length of 3.98 mm). The Parrot Sequoia includes a sunshine sensor at the top of the equipment, which registers the sun's total spectral irradiance at-sensor level and, thus, facilitates the automatic determination of the atsensor reflectance. The assumed Full-Width Half Maximum (FWHM) provided in the specification sheet, by guessing the shape of the relative spectral response function [26], are: Green 530-570 nm; Red 640-680 nm; Rededge 730-740 nm; and NIR 770-810 nm. The overflights were carried out near the zenith due to the minimization of the shadows of the trees, at 11 a.m., given that the multispectral sensor is passive type, that is, dependent on the solar luminosity.
The following wavelengths were evaluated: green (550 nm), red (660 nm), near-infrared (735 nm), and infrared (790 nm). The information acquired in these wavelengths allowed calculating the different vegetation indices, as shown in Table 2. The aerial survey was carried out using Real-Time Kinematics (RTK) technology, which was used to estimate the position of the camera at the time of image collection, with an accuracy of 2.5 cm. The images were mosaiced and orthorectified using the Pix4Dmapper software package. The positional accuracy of the orthoimages was verified using ground control points (GCP), obtained via data surveys in combination From each plot, five plants were randomly selected to evaluate the following agronomic traits: first pod height (FPH, cm), plant height (PH, cm), main stem diameter (SD, cm), hundred grain mass (HGM), days to maturity (DM), and grain yield (YIE). A measuring tape was used to evaluate both the FPH and PH. The SD was assessed with the aid of a digital caliper. The DM corresponded to the number of days between the emergence and maturity of the plants. The HGM was assessed using an analytical precision balance and corrected to 13% humidity. The central row of each plot was manually harvested to evaluate the YIE, which was then corrected for 13% humidity and extrapolated to kg ha −1 . Figure 2 demonstrates a diagram of ground data collection.

Statistical analysis
The data were submitted to individual analyses of variance, considering all effects as fixed. After verifying that the ratio between the largest and smallest mean squared errors did not exceed 7.0, a joint analysis was performed in accordance with the model described in Eq. 1. The Scott-Knott test [27] was used for grouping the means.
where Y ijk is the observation for the k-th block evaluated in the i-th cultivar during the j-th crop season; B k is the fixed block effect; G i is the fixed genotype effect; S j is the random crop season effect; G × S ij is the random interaction between genotypes and crops; and ε ijk is the error associated with observation Y ijk .
Pearson's correlations (r) between the evaluated trait pairs were estimated according to Eq. 2: where COV (XY) is the covariance between traits X and Y; σ 2 x is the variance of trait X; and σ 2 y is the variance of the YIE.
The graphical expression was performed using the functional relationship between the correlation coefficient estimates of the different environments, using a correlation network generated using Rbio software [28], in which the proximity between the nodes (traces) was proportional to the total value of the correlation between these nodes [29]. The thickness of the edges was controlled by applying a 0.60 cut-off value, in which only |rij| ≥ 0.60 have their edges highlighted. Thus, positive correlations were highlighted in green, while negatives correlations were highlighted in red.
The path analysis, considering YIE or DM as the principal dependent variable and the wavelengths and VIs as explanatory variables, was conducted according to the model described in Eqs. 3 and 4: where β 1 , β 2 , · · · β 11 are the direct effects for the variables 550, 660, 735, 790, NDVI, SAVI, GNVDI, NDRE, SCCCI, EVI, and MSAVI; and p ε is the residual effect. All statistical analyses were performed using Genes [30], Sisvar [31], and Rbio software, following the criteria recommended by Cruz et al. [32]. Table 3 shows the analyses of variance for the agronomic traits, wavelengths, and vegetation indices evaluated in ten soybean cultivars. There were significant differences (p-value ≤ 0.05) between the genotypes (G) for all analyzed variables. The crop season (S) was not significant for the FPH, GNDVI, and SCCCI. It is important to emphasize that all evaluated variables showed a coefficient of variation (CV) below 20%. The wavelengths and VIs showed the lowest CVs, varying from 1.15 (NDVI) to 7.88% (EVI). The CVs of the agronomic traits varied from 9.33 (DM) to 18.89 (SD). These results reveal a high precision of the measurements, especially for the spectral variables, and the possibility of an accurate association between the spectral variables and cycle and yield-related variables.

Results
Regarding the grouping of the agronomic trait means (Table 4), the genotype 7739 had the highest FPH, NB, and diameter of the main stem (DS). The genotype 8372 had the highest FPH, NB, and DM, while the genotype Bonus showed the highest PH and SD. The genotype Pop had higher PH, NB, and SD means along with a lower DM. The genotypes Maracai and Foco showed the highest grain yields throughout the crop seasons. The genotype 7739 presented the highest means for all assessed wavelengths, as shown in Table 5. Other genotypes obtained high means for two of the wavelengths, including 8372 (660 and 735 nm), Bonus (550 and 790 nm), Foco, and Maracai (735 and 790 nm). Table 6 shows the mean groupings of the VIs between the genotypes. It is important to note that the genotypes Maracai and Foco belonged to the groups with the highest means for all VIs. The genotypes 7067, 71,110, Bonus, Desafio, Maracai, Foco, and Soyouro obtained the highest means for the GNDVI. For the NDVI, the genotypes 7739, 8372, Bonus, Maracai, and Foco presented the better results.
The Pearson's correlation network between the evaluated variables is shown in Fig. 3. The YIE was positively correlated with the NDVI and certain wavelengths (735 and 790 nm). The path analysis considering the DM as the principal dependent variable is shown in Table 7. The GNDVI and NDRE vegetation indices had the highest direct effects (module), which were also in the same direction as their correlations with the DM. Table 8 shows the direct and indirect effects of the wavelengths and VIs on the YIE. The NDVI had a higher direct effect (module), which was in the same direction as its correlations with the YIE and MSAVI. The coefficients of determination (R) for the path analysis considering     The relationships between the NDVI, NDRE, and GNDVI with the grain yield and days to maturity are shown in Fig. 4. The dashed lines between the VIs show a positive correlation and high magnitude between the NDRE and GNDVI. The NDVI presents a direct positive effect on the grain yield. As previously mentioned, there is a direct relationship between both variables, so it is possible to estimate the final production of a crop using NDVI data. The NDRE and GNDVI showed direct negative effects on the days to maturity, meaning that the higher the NDVI and GNDVI, the lower the DM of the crop.

Discussion
The significance of the other variables can be related to the distinct weather conditions between the crop seasons, such as rainfall and temperature. The interaction (G × S) was significant for the first pod height (FPH), plant height (PH), number of branches (NB), days to maturity (DM), grain yield (YIE), 550 nm wavelength, and NDVI.
The variables differ between the genotypes due to their genetic and morphological characteristics and the environmental conditions to which they were subjected. Overall, the plants showed a low reflectivity in the visible spectrum range (400 to 700 nm) due to the influence of chlorophyll, the most abundant pigment in leaves. The chlorophyll presented a high absorption of blue and red wavelengths, while reflecting the green wavelength, which is remarkable, especially in healthy plants [33].
The presence of genotypes Maracai and Foco in the groups with the highest means for all VIs is interesting,  as these genotypes were the ones with the highest grain yield means. This finding supports the existence of a high correlation between the YIE and VIs during the reproductive stage of soybean, in which the plant reaches the maximum leaf area index and consequently has a high photosynthetic rate [34]. Another meaningful relationship was observed for Pop and Soyouro, which were the earliest genotypes and showed the highest means for the NDRE and SCCCI indices. This finding is supported by the association between soybean reflectance and phenological crop stage, in which cultivars with short cycles have faster development and higher chlorophyll concentration [35].
For the VIs, high values were only obtained for the NDVI and GNDVI. This result can be explained by the greater sensitivity of these VIs to the identification of canopy biomass, since both the GNDVI and NDVI are more sensitive to detect differences in the plant canopy [36], especially in terms of chlorophyll content and photosynthetic activity [37]. The VIs EVI, SAVI and MSAVI differ from NDVI and GNDVI especially by using correction factors, such as areas with a high presence of bare soil [6], while NDRE is more sensitive to detecting differences in late stages of growth, characterizing one of the possible reasons why the NDVI and GNVI values are higher.
The positive correlation between YIE x NDVI and wavelengths (735 and 790 nm) indicates that the higher the estimates of these wavelengths and the NDVI, the higher the grain yield achieved by the evaluated genotypes. Such results are relevant because although the grain yield is the most crucial trait in a soybean breeding program, it has low heritability due to the high environmental effect and laborious measurement [16,38].
In this sense, including the NDVI and 735/790 nm wavelength measurements as auxiliary variables for selecting soybean genotypes is a promising strategy since they are easier to measure, faster to obtain, require less labor, and provide more accurate results compared to grain yield measurements [7,39]. The NDVI and 735/790 nm wavelengths can remotely measure a large number of candidates for selection [16], which can improve the efficiency of breeding programs. In addition, the NDVI was positively correlated with the 735, 790, and 550 nm wavelengths, which in turn showed a positive correlation with the 660 nm wavelength. There was also a strong negative correlation between DM, GNDVI, and NDRE. Although important, Pearson's correlation coefficients can produce misunderstandings regarding the relationship between two variables, which may not be a true cause-and-effect relationship. A high or low correlation coefficient between two variables may result from the effects of a third variable or group of variables, thus not giving the exact relative importance of the direct and indirect effects of these factors [32]. Therefore, we performed path analysis, which investigates cause-and-effect relationships. This analysis promotes a detailed understanding of the effects of the variables involved and justifies the existence of positive and negative correlations (of a high or low magnitude) between the studied variables [40].
However, to obtain the direct and indirect effects by path analysis, the matrix X′X must be well-conditioned. Under the presence of multicollinearity, the variances associated with the path coefficient estimators can reach the highest values, becoming unreliable. Furthermore, the parameter estimates can assume values beyond the parametric space [32]. According to the criteria established in Montgomery and Peck [41], the obtained phenotypic correlation matrix estimates has strong multicollinearity since the condition number (CN) was equal to 521 and 223 when considering the YIE and DM as principal dependent variables, respectively. The CN of the phenotypic correlation matrix is calculated by the ratio of its highest eigenvalue over its lowest eigenvalue. When the condition number is less than 100, multicollinearity is weak; between 100 and 1000, multicollinearity is moderate to strong; finally, when greater than 1000, multicollinearity is severe [41]. Thus, a constant k = 0.05 was added to the X′X diagonal matrix to correct the multicollinearity for both cases.
These results of path analysis on days to maturity reveal a negative cause-and-effect relationship between the VIs and DM. Thus, the higher the values of these indices, the earlier the soybean genotypes are. This is due to the rapid initial development and higher chlorophyll concentration of these genotypes [35].
For path analysis on grain yield, we found a positive cause-and-effect relationship between NDVI and YIE. Thus, the higher the NDVI values, the higher the yield of the soybean genotypes. Lopresti et al. [14] reported that wheat crop monitoring using grain yield maps (obtained using the NDVI) could predict the grain yield 30 days before harvest. The NDVI allows for the monitoring of the soybean biomass growth, which provides information throughout the sub-periods of the crop cycle, thus establishing production estimates [42].
Soybean breeding programs increasingly seek to develop early soybean genotypes to facilitate the cultivation of crops, such as maize and cotton, during the second harvest. Thus, the DM is a continuously evaluated trait in hundreds of genotypes, but there is a lack of information concerning its relationships with the emitted wavelength and vegetation indices. The results provided by the correlation network demonstrate a negative association between the NDRE, GNDVI, and SCCCI with the DM, indicating that genotypes with higher values for these VIs can be selected for earliness. This is an important finding for soybean breeding, as it reveals the possibility of identifying early genotypes by UAV-based HTP using the VIs mentioned. Whereas traditional phenotyping of the soybean cycle is a time and labor-consuming task, requiring daily field visits to count the number of days to maturity, the use of VIs as a tool for selecting early genotypes can contribute to a significant decrease in the time and effort spent on this step of the program.
The acquisition of large-scale phenotypic data has become one of the major bottlenecks hindering crop breeding [22]. Our study provides relevant information to support management and decision-making in soybean breeding since we demonstrate that it is possible to select genotypes for earliness and yield through an easy and economical high-throughput phenotyping approach. Using the approach employed here, which involves the employment of statistical techniques to study the relationship between agronomic traits and VIs as well as the selection of genotypes based on VIs obtained by UAV imagery can increase the efficiency of current breeding programs by enabling large-scale evaluations with time and labor savings. In this sense, further studies addressing yield and maturity prediction of soybean genotypes based on the vegetation indices studied here are very promising and could be used to improve the efficiency of phenotypic evaluations in breeding programs.

Conclusions
Soybean genotypes with a high grain yield (Maracai and Foco) have higher vegetation index values, especially for the 735 and 790 nm wavelengths and NDVI. This vegetation index has a cause-and-effect relationship with the grain yield of soybean. Our findings suggest that NDVI can be used for high-throughput phenotyping to select genotypes for high grain yield in soybean breeding programs.
The earliest soybean genotypes have a higher NDRE and GNDVI. Due to the requirement for earlier genotypes, the number of days to maturity has been increasingly evaluated in soybean breeding programs. For a cause-and-effect relationship with the DM, we recommend that the NDRE and GNDVI vegetation indices be used for high-throughput phenotyping in soybean breeding programs seeking to select earlier genotypes.