
Modeling maize above-ground biomass based on machine learning approaches using UAV remote-sensing data

Abstract

Background

Above-ground biomass (AGB) is a basic agronomic parameter for field investigation and is frequently used to indicate crop growth status, the effects of agricultural management practices, and the ability to sequester carbon above and below ground. The conventional way to obtain AGB is to use destructive sampling methods that require manual harvesting of crops, weighing, and recording, which makes large-area, long-term measurements challenging and time consuming. However, with the diversity of platforms and sensors and the improvements in spatial and spectral resolution, remote sensing is now regarded as the best technical means for monitoring and estimating AGB over large areas.

Results

In this study, we used structural and spectral information provided by remote sensing from an unmanned aerial vehicle (UAV) in combination with machine learning to estimate maize biomass. Of the 14 predictor variables, six were selected to create a model by using a recursive feature elimination algorithm. Four machine-learning regression algorithms (multiple linear regression, support vector machine, artificial neural network, and random forest) were evaluated and compared to create a suitable model, following which we tested whether the two resampling methods influence the training model. To estimate the AGB of maize, we propose an improved method for extracting plant height from UAV images and a volumetric indicator (i.e., BIOVP). The results show that (1) the random forest model gave the most balanced results, with low error and a high ratio of explained variance for both the training set and the test set; (2) importance analysis of the predictors shows that BIOVP retained the strongest effect on the AGB estimate across the four machine-learning models; and (3) comparing the plant heights calculated by the three methods with manual ground-based measurements shows that the proposed method increased the ratio of explained variance and reduced errors.

Conclusions

These results lead us to conclude that the combination of machine learning with UAV remote sensing is a promising alternative for estimating AGB. This work suggests that structural and spectral information can be considered simultaneously rather than separately when estimating biophysical crop parameters.

Background

Above-ground biomass (AGB) is a basic agronomic parameter for field investigation and is frequently used to indicate crop growth status, the effects of agricultural management practices, and the ability to sequester carbon above and below ground [1, 2]. The conventional way to obtain AGB is to use destructive sampling methods that require manual harvesting of crops, weighing, and recording, which makes large-area, long-term measurements challenging and time consuming. However, with the diversity of platforms and sensors and the improvements in spatial and spectral resolution, remote sensing is now regarded as the best technical means for monitoring and estimating AGB over large areas [3].

Many studies have used satellite remote-sensing images as a data source to estimate the biomass of various vegetation types, such as grassland [3, 4], forest [5,6,7,8], croplands [9,10,11], and wetland [7, 12]. Most research to date has focused on forest and has used vegetation indices (VIs) to build models, especially the normalized difference vegetation index (NDVI). Although satellite remote sensing can be used for large-scale observation, it remains limited by cloud cover, satellite revisit time, and coarse resolution [13]. Remote sensing using a low-altitude unmanned aerial vehicle (UAV) is more flexible than satellite remote sensing, thereby overcoming these restrictions and providing remote-sensing data with higher temporal, spatial, and spectral resolution. As a result, UAV remote sensing is becoming a promising tool for frequent observations [14]. The higher spatial resolution allows more accurate extraction of plant-height information from digital images, thereby providing an attractive alternative based on modeling of plant height to estimate biomass. Plant height can be obtained from the crop surface model (CSM), which is created by using structure-from-motion techniques. Several studies have already used CSMs to estimate plant height and biomass for various crops, including maize [15,16,17], rice [18], barley [19, 20], cotton [21, 22], sugarcane [23], wheat [24] and sorghum [16, 25]. Previous studies have confirmed that combining spectral information and plant-height information can improve biomass estimates [1, 26,27,28,29].

A literature review reveals that machine-learning methods are more prevalent in combination with satellite remote-sensing data. To estimate the biomass of a region, such approaches usually classify the vegetation first and then calculate the number of pixels of each class [30]. Yang et al. [3] used the back propagation artificial neural network (BP-ANN) model to estimate grassland AGB at 500 m spatial resolution and demonstrated that the BP-ANN model achieves better results than traditional multifactor regression models (R2 = 0.75–0.85 vs. 0.40–0.64, RMSE = 355–462 vs. 537–689 kg DW/ha). Mutanga et al. [12] used random forest regression and WorldView-2 imagery to predict wetland biomass and compared the results with those of stepwise multiple linear regression (MLR). The results demonstrate that random forest regression is more advantageous for estimating high-density biomass. Zhang et al. [31] used Landsat data and four machine-learning regression algorithms [support vector machine (SVM), random forest (RF), k-nearest neighbor (k-NN), and ANN] to estimate both live and total sawgrass biomass. The results indicate that ANN and SVM produce similar results for estimating live biomass.

However, few studies have used structural and spectral information provided by UAV remote sensing in combination with machine learning to estimate maize biomass. The specific objectives of this study are therefore (1) to compare the performance of different machine-learning modeling methods for estimating maize AGB, (2) to verify an improved method to extract plant height and obtain an indicator to estimate AGB, and (3) to explore the potential of machine-learning modeling based on remote sensing to quantify AGB.

Methods

Experimental materials and field measurements

The study area was located in the research station of Xiao Tangshan National Precision Agriculture Research Center of China, Changping District of Beijing City (115°50′17″–116°29′49″E, 40°20′18″–40°23′13″N), at an average elevation of 36 m. The study area has a warm temperate semi-humid continental monsoon climate, with the rainy season lasting from June to August. The average annual temperature is 11.8 °C [29]. Eight hundred plots were planted at a seeding density of 6 plants/m2 with a row spacing of 0.6 m and divided into four groups: mixed, TEM (temperate), TST (tropical/subtropical) and DH (doubled-haploid) according to the genetic background differences. The plots were 2 m × 2.4 m, and 72 plots were used as sampling plots for destructive biomass measurements; all other non-destructive measurements were made on other plots on June 28 and July 11, 2017 (Fig. 1). All plots were seeded on May 15, 2017.

Fig. 1

Maize experiment at Xiao Tangshan National Precision Agriculture Research Center, Changping, 2017. a UAV platform and sensors. b Experimental site. “GCPs” refers to the ground control points used to limit errors and improve the accuracy of plant-height extraction

Sixteen ground control points (GCPs) distributed evenly within the field were used to obtain accurate geographical references and were located with millimeter accuracy by using a Differential Global Positioning System (DGPS, South Surveying & Mapping Instrument Co., Ltd., Guangzhou, China). Three plants were selected at random in the central part of each sampling plot for measuring plant height and fresh biomass. The plant height was measured manually with a telescopic leveling rod. The mean height of the three plants was used as the canopy height of the given sampling plot. Next, the three plants were subjected to destructive biomass sampling. Fresh biomass was sealed in plastic bags and weighed on the same day. Finally, the masses were rescaled to kg/m2 by counting the actual number of plants in each sampling plot. Because 14 fresh biomass samples were unavailable owing to a recording problem in the laboratory, the final fresh-biomass data set comprised 130 samples. Table 1 summarizes the data obtained from field measurements.

Table 1 Basic statistics of the field measurements

Unmanned aerial vehicle and camera setup

The digital and multispectral imagery was collected over three flights with an octocopter DJI Spreading Wings S1000 UAV (SZ DJI Technology Co., Shenzhen, China) platform equipped with two cameras. Digital imagery was collected by using a 20.2 megapixel Cyber-shot DSC-QX100 (Sony Electronics, Inc., Tokyo, Japan). Multispectral imagery was collected with a 1.2 megapixel Parrot Sequoia camera (MicaSense Inc., Seattle, USA), which captures four discrete spectral bands: green (wavelength = 550 nm, bandwidth = 40 nm), red (660 nm, 40 nm), red-edge (735 nm, 10 nm), and near infrared (790 nm, 40 nm). The radiometric calibration images of the Parrot Sequoia camera were captured on the ground before and after each flight by using a calibrated reflectance panel (MicaSense Inc., Seattle, USA). The Parrot Sequoia camera relies on a sunshine sensor to automatically adjust the readings to ambient light to minimize error during image capture [32].

Flight paths over the trial area were designed by the DJI ground station, yielding six strips. The forward overlap was 80% and the lateral overlap was 75%. The flight speed was fixed at 6 m/s. ISO and shutter speed were fixed at 160 and 1/2000, respectively. The flight altitude above ground level (AGL) on June 28 and July 11, 2017 was 60 m. The ground sampling distances for digital and multispectral imagery were approximately 1.3 and 5.5 cm, respectively. To obtain a high-precision digital elevation model, the flight altitude above ground level for the first flight on June 8, 2017 was 40 m, yielding a ground sampling distance of 0.72 cm. The details of the UAV data acquisition are listed in Table 2.

Table 2 Details of UAV data acquisition

Image processing and data extraction

Pix4Dmapper Pro (version 4.0, PIX4D, Lausanne, Switzerland) was used to produce digital surface models (DSMs), generate orthomosaics, perform radiometric calibration, and calculate vegetation indices. The key steps of this process included image geolocation, importing ground control points, aligning images, building a dense point cloud, building a DSM and an orthomosaic, processing and calibrating radiometric information, and generating vegetation index (VI) maps. Sixteen ground control points in the Pix4D project were used to georeference the study area, increase the global accuracy, and reduce noise. The contents listed in Table 3 were used to evaluate the accuracy of the DSMs. Radiometric calibration was done by using radiometric calibration images with known reflectance values provided by MicaSense. The radiometric corrections were used to improve the radiometric quality of the data and correct the image reflectance. Seven near-infrared VI maps and four visible-band VI maps were produced by using the index calculator in the Pix4D software. The calculated VIs are listed in Table 4. The related formulas are given in Additional file 1.

Table 3 Processing quality report for evaluating the accuracy of DSMs
Table 4 Spectral vegetation indices used in this study to evaluate maize above ground biomass

In the second column, the letters represent spectral reflectance, such as NIR, which represents near-infrared reflectance in the UAV multispectral images.
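As an illustration of how such indices are computed from band reflectance, the sketch below implements three of the VIs named later in the text (NDVI, NGRDI, and VARI) using their widely used textbook definitions. The exact formulas used in the study are those in Additional file 1, which is not reproduced here, so these definitions and the reflectance values are assumptions for illustration only.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - R) / (NIR + R)."""
    return (nir - red) / (nir + red)


def ngrdi(green, red):
    """Normalized green-red difference index: (G - R) / (G + R)."""
    return (green - red) / (green + red)


def vari(green, red, blue):
    """Visible atmospherically resistant index: (G - R) / (G + R - B)."""
    return (green - red) / (green + red - blue)


# Hypothetical reflectance values for one vegetated pixel (not from the study).
pixel = {"nir": 0.50, "red": 0.10, "green": 0.20, "blue": 0.05}

print("NDVI :", round(ndvi(pixel["nir"], pixel["red"]), 3))
print("NGRDI:", round(ngrdi(pixel["green"], pixel["red"]), 3))
print("VARI :", round(vari(pixel["green"], pixel["red"], pixel["blue"]), 3))
```

In practice the same arithmetic is applied per pixel to whole reflectance rasters, which is what an index calculator does internally.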

Because these VIs can respond to different targets, we used the Otsu algorithm [47] to determine thresholds and binarize the VI maps, and then separated plants from the soil background in these VI maps. ArcMap (version 10.2, Esri Inc., Redlands, USA) was used to create the area of interest (AOI) with separated plant areas and to extract the average VI for each plot. This process was also applied to extract plant height.
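The thresholding step can be sketched as follows. This is a minimal pure-Python version of Otsu's method (choose the threshold that maximizes between-class variance of a histogram), not the implementation used in the study. The function name `otsu_threshold`, the bin count, and the VI values are hypothetical.

```python
def otsu_threshold(values, bins=64):
    """Return the threshold that maximizes between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    total_sum = sum(c * h for c, h in zip(centers, hist))

    best_t, best_between = lo, -1.0
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]                      # pixels at or below bin i
        sum0 += centers[i] * hist[i]
        w1 = total - w0                    # pixels above bin i
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_between = between
            best_t = lo + (i + 1) * width  # upper edge of bin i
    return best_t


# Hypothetical VI values: four soil pixels followed by four plant pixels.
vi_values = [0.01, 0.02, 0.03, 0.04, 0.30, 0.32, 0.35, 0.38]
threshold = otsu_threshold(vi_values)
plant_mask = [v > threshold for v in vi_values]
print(threshold, plant_mask)
```

The resulting boolean mask plays the role of the binarized VI map: pixels above the threshold are treated as plants and the rest as soil background.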

The CSM, which is widely used to extract plant-height information from different crops, was used in the present study. The CSM can be obtained by subtracting the digital elevation model from the DSM by using the raster calculator in ArcMap. On June 8, 2017, the maize was at about growth stage 13 (BBCH scale) [48] and had an average height of less than 20 cm. We extracted 1332 elevation points from the June 8 DSM at locations not covered with vegetation and interpolated a digital elevation model (DEM) from these data by using the Kriging spatial interpolation method. Thus, two CSMs were created (one for June 28 and one for July 11).

We propose an improved method to filter out the points in the point cloud formed by the soil background and the lower leaves. The method uses image segmentation and a kernel neighborhood maximum calculation (i.e., kernel thinning) to create a set of pixels that image the upper leaves of multiple plants. Resampling was used to control the number of pixels involved in the computation. These pixels have three-dimensional spatial coordinates and thus have spatial distribution characteristics. To account for spatial variation, Kriging interpolation was applied to these three-dimensional points to generate a plant-height surface. The peak values on the surface were extracted as the representative values of plant height at the plot scale. Using areas of interest that cover only vegetation, we extracted plant-height information from the above results by using ENVI software (version 4.5, Esri Inc., Redlands, USA). All the concepts and terminology related to the above are illustrated in Additional file 2.
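The first two steps of this workflow (CSM = DSM minus DEM, then kernel thinning) can be sketched on small grids as shown below. This is one possible reading of the kernel neighborhood maximum, assuming non-overlapping k × k blocks and a precomputed vegetation mask. The function names, block size, and elevation values are hypothetical, and the subsequent Kriging interpolation is omitted.

```python
def crop_surface_model(dsm, dem):
    """CSM = DSM - DEM, computed cell by cell."""
    return [[s - g for s, g in zip(s_row, g_row)]
            for s_row, g_row in zip(dsm, dem)]


def kernel_max_thinning(csm, veg_mask, k=2):
    """Keep, per non-overlapping k x k block, the highest vegetation pixel.

    Returns (row, col, height) points; blocks without vegetation are skipped.
    """
    rows, cols = len(csm), len(csm[0])
    points = []
    for r0 in range(0, rows, k):
        for c0 in range(0, cols, k):
            best = None
            for r in range(r0, min(r0 + k, rows)):
                for c in range(c0, min(c0 + k, cols)):
                    if veg_mask[r][c] and (best is None or csm[r][c] > best[2]):
                        best = (r, c, csm[r][c])
            if best is not None:
                points.append(best)
    return points


# Hypothetical 4 x 4 elevations in metres (flat bare-earth DEM at 36 m).
dem = [[36.0] * 4 for _ in range(4)]
dsm = [
    [36.0, 36.5, 37.0, 36.0],
    [36.25, 37.5, 36.75, 36.0],
    [36.0, 36.0, 36.0, 36.0],
    [36.0, 36.0, 37.25, 38.0],
]
# Hypothetical vegetation mask, e.g. from VI-based segmentation.
veg_mask = [
    [False, True, True, False],
    [True, True, True, False],
    [False, False, False, False],
    [False, False, True, True],
]

csm = crop_surface_model(dsm, dem)
upper_leaf_points = kernel_max_thinning(csm, veg_mask, k=2)
print(upper_leaf_points)
```

The retained points carry row, column, and height, so they could be passed to any spatial interpolator to build the plant-height surface described above.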

The canopy elevation relief ratio (CRR) was calculated by using the plant-height data. The CRR is commonly used in forestry studies as a metric that describes the relative shape of the canopy; it reflects the degree to which outer-canopy surfaces are in the upper (CRR > 0.5) or lower (CRR < 0.5) portions of the height range [49, 50]. Because the CRR is susceptible to outliers, we made a simple adjustment. The BIOVP is the sum of pixel values (i.e., plant height) in the CSM without soil background and after resampling (Additional file 2). The definitions of these three variables appear in Table 5.

Table 5 Definitions of three plant height-related metrics this study used for biomass estimation

CRR is the canopy elevation relief ratio with a simple adjustment. The maximum (PH10%max) and minimum (PH10%min) values are calculated by using the top 10% and bottom 10% plant-height data in a plot, respectively. PHkri is the plant height calculated by using a Kriging interpolation. BIOVP is a volume metric used to estimate crop biomass within a plot. S represents the area covered by plants after resampling and image segmentation, PHi indicates the plant height represented by the ith pixel, and N is the number of pixels within S.
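Because Table 5 is not reproduced here, the following is only a sketch of how metrics of this kind could be computed from the vegetation pixels of one plot. It assumes that PH10%max and PH10%min are the means of the top and bottom 10% of heights, that the adjusted CRR is (mean − PH10%min) / (PH10%max − PH10%min), and that BIOVP is the per-pixel area S/N multiplied by the sum of the pixel heights. All function names and values are hypothetical.

```python
def decile_means(heights):
    """Mean of the bottom 10% and of the top 10% of the heights (assumed form)."""
    ordered = sorted(heights)
    n = max(1, len(ordered) // 10)
    return sum(ordered[:n]) / n, sum(ordered[-n:]) / n


def adjusted_crr(heights):
    """Relief ratio using decile means instead of the raw min and max."""
    low, high = decile_means(heights)
    if high == low:
        raise ValueError("height range is zero; CRR is undefined")
    mean = sum(heights) / len(heights)
    return (mean - low) / (high - low)


def biovp(heights, plant_area_m2):
    """Volume metric: per-pixel area (S / N) times the sum of pixel heights."""
    pixel_area = plant_area_m2 / len(heights)
    return pixel_area * sum(heights)


# Hypothetical plant heights (m) for the vegetation pixels of one plot.
heights = [0.5, 1.0, 1.0, 1.0, 1.0, 1.5, 1.5, 1.5, 1.5, 2.0]

print("CRR  :", adjusted_crr(heights))
print("BIOVP:", biovp(heights, plant_area_m2=2.0), "m^3")
```

Using decile means rather than the single lowest and highest pixel is what makes the ratio less sensitive to isolated outliers.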

Selecting predictor variables

A high Pearson’s correlation was found between AGB and some predictors, such as BIOVP, PHkri, VARI, CRR, and NGRDI. However, multi-collinearity is also present between these continuous predictor variables (Fig. 2). Data redundancy and multi-collinearity can increase model complexity and seriously affect regression performance [51]. The goal of selecting predictor variables is to find the optimal subset of the input, thereby reducing the effect of noise or uncorrelated variables, improving prediction performance, and reducing runtime [52, 53]. The recursive-feature-elimination (RFE) algorithm provides a way to automatically select predictor variables by repeatedly creating a model and removing predictors with low weights. This study uses the R package “caret” (version 6.0-80) [54] to implement this algorithm, which is based on the Gini criterion with repeated tenfold cross validation within the context of a random forest model [55, 56]. The subset of the recursive results with the smallest error served as the subset of predictors. The importance of the selected predictor variables was quantified based on the percent increase in mean square error (IncMSE%) and the total increase in node purities (IncNodePurity) [28, 57].

Fig. 2

Pearson’s correlation among predictor variables

Modeling and resampling

To obtain the most suitable model for estimating maize AGB by comparative analysis, we adopted three modeling strategies and created four models (Table 6). SVM and ANN models require predictor variables on a common scale, so data pre-processing should be performed on the training set before modeling [57]. In this study, pre-processing consisted of data standardization and skewness transformations.

Table 6 Modeling strategies and methods implemented in this study

The cost-penalty parameter C indicates the tolerance to error. When C is large, the model cannot tolerate large error and becomes more flexible, which leads to overfitting. When C is small, the model becomes rigid and is more prone to underfitting. Sigma is a parameter of the radial basis function: a smaller sigma corresponds to fewer support vectors, which affects model training and prediction accuracy [58]. H is the number of hidden units that are linear combinations of some or all predictors, lambda is weight decay that restricts overfitting, and mtry is the number of randomly selected predictors at each split.

Because the 130 samples comprised two subsamples corresponding to the two observation dates, stratified random sampling was used to divide the samples into a training set and a test set with a split ratio of 70:30. Repeated tenfold cross validation was used to train and tune the models. In this method, the training set was partitioned randomly into 10 subsets of approximately equal size. Each time, 90% of the training samples were used to fit the model and the remaining 10% were held out to estimate performance metrics. The 10 resampled performance estimates (i.e., the evaluation metrics of the model’s predictive capabilities) were summarized to analyze the relationship between the tuning parameters and model utility. For each modeling strategy, this procedure was repeated 10 times, yielding 10 random partitions of the training set and 100 trained models. With the exception of the MLR model, each model had at least one tuning parameter. The grid-search method was applied over a set of candidate values to find the optimal parameters [59].
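The partitioning scheme can be sketched in a few lines of Python. This is not the caret implementation used in the study. It only illustrates a 70:30 split stratified by observation date, followed by 10 repeats of tenfold cross validation on the training set. The per-date sample counts, function names, and seeds are hypothetical.

```python
import random


def stratified_split(strata, train_frac=0.7, seed=1):
    """Split sample indices into train/test, sampling within each stratum."""
    rng = random.Random(seed)
    by_stratum = {}
    for i, s in enumerate(strata):
        by_stratum.setdefault(s, []).append(i)
    train, test = [], []
    for idx in by_stratum.values():
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)


def repeated_kfold(indices, k=10, repeats=10, seed=1):
    """Return (fit, held_out) index pairs: k folds for each random repeat."""
    rng = random.Random(seed)
    splits = []
    for _ in range(repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]   # k near-equal subsets
        for held_out in folds:
            held = set(held_out)
            fit = [i for i in shuffled if i not in held]
            splits.append((fit, held_out))
    return splits


# 130 samples over two dates; the per-date counts are hypothetical.
dates = ["June 28"] * 66 + ["July 11"] * 64
train_idx, test_idx = stratified_split(dates)
splits = repeated_kfold(train_idx)

print(len(train_idx), len(test_idx), len(splits))
```

Fixing the seed before each model is trained means every model sees identical partitions, so differences in the resampled metrics reflect the models rather than the splits.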

To examine how the resampling method affected the training model, a modified bootstrap resampling method (the .632+ bootstrap) [60] was used for comparison. This method consists of repeatedly and randomly selecting a sample from the training set.

Three types of regression diagnostic plots were used to check whether the model works well for the samples. The tuning-parameter plot shows how to determine the optimal parameter configuration when retaining an evaluation metric during the resampling procedure. The plot of observed values versus predicted values shows outliers or areas where the model is not calibrated and allows us to assess the proximity of the predictions to the actual values. The plot of residual values versus predicted values allows us to check whether the variance changes across the range of predictions. If the plot shows that residuals do not appear to be randomly scattered about zero with respect to the predicted values, major predictors may be missing from the model. In this plot, marginal rugs were used to visualize the distribution of data with respect to each axis [61].

The coefficient of determination (R2), RMSE, and mean absolute error (MAE) were used as evaluation metrics to quantify the performance of the regression model and to determine how well the model predicts new data and whether the model is too complicated. Equations (1)–(3) are used to calculate R2, MAE, and RMSE:

$${\text{R}}^{2} = 1 - \frac{\sum\nolimits_{i = 1}^{N} \left( y_{i} - \hat{y}_{i} \right)^{2}}{\sum\nolimits_{i = 1}^{N} \left( y_{i} - \bar{y} \right)^{2}}$$
(1)
$${\text{MAE}} = \frac{1}{N}\sum\limits_{i = 1}^{N} \left| y_{i} - \hat{y}_{i} \right|$$
(2)
$${\text{RMSE}} = \sqrt{\frac{1}{N}\sum\limits_{i = 1}^{N} \left( y_{i} - \hat{y}_{i} \right)^{2}}$$
(3)

where N is the total sample size, \(y_{i}\) is the ith measured AGB, \(\hat{y}_{i}\) is the ith predicted value, and \(\bar{y}\) is the mean of the measured values.
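Equations (1)–(3) translate directly into code. The sketch below is a plain-Python illustration in which the function names and the sample values are hypothetical.

```python
import math


def r_squared(y, y_hat):
    """Eq. (1): 1 - (residual sum of squares / total sum of squares)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def mae(y, y_hat):
    """Eq. (2): mean absolute error."""
    return sum(abs(yi - pi) for yi, pi in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Eq. (3): root-mean-square error."""
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat)) / len(y))


# Hypothetical measured and predicted AGB values (kg/m2).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print(r_squared(measured, predicted), mae(measured, predicted),
      rmse(measured, predicted))
```

RMSE penalizes large residuals more heavily than MAE, so reporting both gives a rough indication of whether the error is dominated by a few large misses.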

Comparison analysis was done for both the training set (during cross validation) and the test set. Random-number seeds were set before training each model to ensure that each model had the same data partition and repeats. The results included evaluation metrics from the final model, and we applied a statistical hypothesis test to check whether a statistically significant difference existed in the results. More specifically, Student’s t test was applied if the results were normally distributed and the Wilcoxon rank sum test was applied if the distribution was unknown. The importance of each predictor variable to a model was evaluated by changing the input value and comparing the sensitivity of the output of the trained model. Importance scores were scaled to have a maximum value of 100 and a minimum value of 0.

The caret package [57] was used to create these machine-learning models in R (version 3.5.1, R Development Core Team, 2018); it provides a comprehensive framework for building and evaluating predictive models. The R package ggplot2 and its extensions were used to draw figures. A schematic diagram of the methodology appears in Fig. 3. The relevant R code is provided in Additional file 3.

Fig. 3

Schematic diagram of methodology used in this study. The red rectangular box contains all predictors extracted from the UAV images, and the blue rectangular box contains modeling methods and analysis procedures

Results

Model evaluation and comparison

Repeated cross validation was used to determine the optimal number of predictor variables required to minimize the RMSE. Figure 4 shows that the RFE algorithm found a minimum RMSE (0.472) for a subset containing six predictor variables. Sorted in order of decreasing importance, the selected predictors were BIOVP, PHkri, NGRDI, VARI, CRR, and NDVI, which were used for training models and obtaining optimal parameters. For the ANN model, three different weight-decay values were evaluated (lambda = 0.001, 0.01, and 0.10) along with a single hidden layer with sizes ranging from one to six hidden units. The optimal model was the average of five different neural networks created by using different initial values for parameters and used two hidden units with a medium degree of regularization (i.e., lambda = 0.01). For the SVM model, the kernel parameter was estimated analytically to be sigma = 0.3592 and the model was tuned over ten cost values between 0.25 and 128 on the log2 scale. A cost parameter C = 2 for the optimal model minimized the RMSE. As the cost parameter continued to increase, the error also began to increase and the model overfit. The RF model was numerically optimal at mtry = 2, which is also a recommended value (i.e., one third of the number of predictors) [62]. Figure 5 shows how the grid-search method was used to evaluate the optimal parameters of these models.
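The tuning grids described above can be written out explicitly. The sketch below builds the ten SVM cost values and the 18 ANN combinations and shows the shape of a grid search. The function `toy_cv_rmse` is a placeholder error curve, not a measured result: it stands in for the cross-validated RMSE and is constructed so that its minimum falls at C = 2 purely for illustration.

```python
import math
from itertools import product

# Ten SVM cost values on the log2 scale: 2^-2, 2^-1, ..., 2^7 (0.25 to 128).
svm_costs = [2.0 ** k for k in range(-2, 8)]

# ANN grid: one to six hidden units crossed with three weight-decay values.
ann_grid = list(product(range(1, 7), [0.001, 0.01, 0.10]))


def grid_search(candidates, cv_rmse):
    """Return the candidate with the smallest resampled RMSE."""
    return min(candidates, key=cv_rmse)


def toy_cv_rmse(cost):
    # Placeholder (not measured): a V-shaped curve in log2(cost) standing in
    # for the RMSE that repeated cross validation would return per cost value.
    return abs(math.log2(cost) - 1.0)


best_cost = grid_search(svm_costs, toy_cv_rmse)
print(len(svm_costs), len(ann_grid), best_cost)
```

In a real run, the scoring function would fit the model on each resampled training fold and average the held-out RMSE for every candidate.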

Fig. 4

Cross-validated RMSE by recursive feature elimination (RFE) algorithm. RMSE is minimized with six predictor variables

Fig. 5

Tuning parameters obtained by using the grid-search method and cross validation. The RMSE was used to select the optimal model (smallest value). a Artificial neural network model. The optimal ANN model used a medium degree of regularization (i.e., lambda = 0.01) and a single hidden layer with two hidden units. b Support vector machine model. When using the radial basis function, the SVM model was numerically optimal at sigma = 0.3592 and C = 2 on the log2 scale. c Random forest model. The RF model was numerically optimal at mtry = 2

Regression diagnostic plots with marginal rugs from the four models all showed that the residuals appear to be randomly distributed about zero with respect to the predicted values, which implies that the six selected variables can adequately replace the other variables (Fig. 6). The RF model had the narrowest residual interval, whereas the MLR model had the widest residual interval. In the training set, the RF model was the most accurate of the four models, with R2 = 0.944 (RMSE = 0.495, MAE = 0.355). Overall, the nonlinear models performed significantly better than the linear model (i.e., the MLR model, with RMSE = 0.986, MAE = 0.714, R2 = 0.757), which revealed a nonlinear relationship between the response variable (i.e., AGB) and the predictors. Compared with the training set, the prediction ability of the four models in the test set was, to varying degrees, worse. This result may be caused by the small sample size of the test set, which leads to unstable results. The ANN and RF models had a larger R2 (i.e., 0.691 and 0.699, respectively) and a smaller RMSE (i.e., 1.210 and 1.200, respectively) with the test set, and so were better than the other models. The two outliers located in the upper-right corner occurred with the test set and may be due to measurement errors. Therefore, poor-quality data were also one reason for the degraded performance of the models with the test set.

Fig. 6

Regression diagnostics plots based on four modeling methods. a Multiple linear regression model. b Support vector machine model. c Artificial neural network model. d Random forest model. The horizontal axis represents the predicted AGB obtained from the model, and the vertical axis represents the AGB measured manually at ground level. Marginal rugs in the residuals-analysis plot were used to visualize the distribution of data on each axis. The solid cyan line represents a 1:1 relationship. In the training set, the four models tended to underestimate the AGB, whereas in the test set they tended to overestimate the AGB

Figure 7 shows the difference between performance metrics calculated by using the training-set data and by using the test-set data. Longer lines represent a larger performance difference between the two sets of data. The random forest model was not sensitive to outliers, so it performed best in the training set, which had a relatively large sample size. However, the performance advantages of the random forest model were not fully demonstrated in the test set, which had a small sample size. In other words, the proportion and distribution of outliers and the small sample size narrowed the performance difference between models in the test set, because the advantages and disadvantages of each model were not fully exposed. Overall, the RF model performed best with both the training set and the test set and was thus selected for this study to produce the AGB maps for June 28 and July 11, 2017.

Fig. 7

Difference between performance metrics calculated by using cross-validation and test-set data. Longer lines represent a larger performance difference. ANN and RF models had a higher R2 and a lower RMSE both in the training set and test set, which indicates that they performed better than other models

At the 5% significance level, the Wilcoxon test did not reject the null hypothesis that the two sets of performance metrics calculated by using the two different resampling methods were drawn from the same distribution (Fig. 8). From this we inferred that the two resampling methods did not differ significantly in their effect on the optimal model.

Fig. 8

Test for a significant difference between the two resampling methods. With a p value > 0.05, the Wilcoxon test did not reject the null hypothesis that the two sets of performance metrics calculated by using the two different resampling methods were drawn from the same distribution

Mapping above-ground biomass of maize

We estimated the spatial distribution of the AGB at the plot scale based on the selected RF model (Fig. 9a, b). During the period from June 28 to July 11, 2017, strong winds and heavy rainfall lodged maize in some plots, which resulted in abnormal fluctuations in both plant height and spectral information. This was the main reason that the predicted values of some plots decreased instead of increasing (Fig. 9b). In some plots, maize grew rapidly because of abundant rain, causing the AGB to increase significantly in the short term. Most of the lodged plots were planted with the TST group (Fig. 9c). Han et al. [63] provide an in-depth analysis of the underlying association between maize lodging and the selected feature factors in this study area. These factors include, but are not limited to, genetic background, terrain, and plant height.

Fig. 9

Spatial distribution of maize AGB (kg/m2) at the plot scale from RF model estimation. a On June 28, 2017. b On July 11, 2017. c Distribution of maize plots with four genetic backgrounds

The overall ANOVA had a p value < 0.05, so we further compared the mean AGB of each group against that of all plots without grouping. Where the Wilcoxon signed-rank test was significant, the DH group had significantly lower AGB than all plots (i.e., without grouping) and the TST group had higher AGB than all plots. The test was not significant for the Mixed group, which indicates no significant difference in AGB between the Mixed group and all plots (Fig. 10).

Fig. 10

Genotypic differences in maize AGB (kg/m2). a On June 28, 2017. b On July 11, 2017. DH, Mixed, TEM and TST represent the four genetic backgrounds of maize. The dashed black line indicates the mean biomass from all plots (i.e., baseMean). The black plus sign indicates the mean biomass from each genetic background (i.e., group). ANOVA is used to determine whether differences exist among the four group means. The Wilcoxon signed-rank test is used to compare each group against all plots without grouping (i.e., baseMean). The following symbols indicate statistical significance: p > 0.05 (ns); p ≤ 0.05 (*); p ≤ 0.01 (**); p ≤ 0.001 (***); p ≤ 0.0001 (****). Where the test is significant, the DH group has significantly lower AGB than all plots and the TST group has higher AGB than all plots. The test is not significant for the Mixed group, which indicates no significant difference in AGB between the Mixed group and all plots

Importance of predictors and BIOVP

Although the predictor variables were the same, the importance of the predictors differed between the four models (Fig. 11a). We summed the importance scores of the predictors across models and found that BIOVP had the highest score (Fig. 11b). In this study, BIOVP thus retained the strongest effect on the AGB estimate, regardless of the modeling strategy used. Figure 11 also shows that plant height exerts a more direct effect than the VIs on the estimate of maize AGB. As a volume metric, BIOVP has a base area equal to the sum of the areas of all pixels that image vegetation. This base area is a product of image segmentation based on a vegetation index (i.e., NGRDI). Thus, BIOVP includes implicit spectral information.

Fig. 11

Importance scores for predictor variables. a Differences in the importance scores of predictors between models. The importance scores of the predictor variables are the same in the ANN and SVM models. Because NDVI had a very small importance score in the MLR and RF models, removing it from these two models and re-modeling could be considered. CRR could be removed from the ANN and SVM models for similar reasons. b Importance scores of predictors aggregated over the four model types and displayed on the x axis

To assess how well the BIOVP alone estimates AGB, we developed a bivariate linear regression (BLR) model based on the BIOVP (Fig. 12). The BLR model performed worse than even the MLR model, which was the weakest of the four models mentioned above. On the training set, the BLR model explained 71.7% of the variation in maize AGB, with an RMSE of 1.06 kg/m2 at the plot scale. The residual plot revealed non-constant variance in the BLR model. Because the residuals do not appear to be randomly scattered about zero with respect to the predicted values, some predictors may be missing from the BLR model. This result also shows that, in this study, AGB estimates based on the single predictor BIOVP were less effective than those based on multiple predictors.
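A single-predictor fit and the diagnostics discussed above (explained variance, RMSE, residuals versus predicted values) can be sketched as follows. This is an illustrative example on invented data, not the study's R script; the function names and the sample values are assumptions.

```python
import math


def fit_simple_linear(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def evaluate(x, y, intercept, slope):
    """Return R2, RMSE, predictions and residuals for a fitted line."""
    predicted = [intercept + slope * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    mean_y = sum(y) / len(y)
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / len(y))
    return r2, rmse, predicted, residuals


# Invented placeholder data: one volume-type predictor and plot AGB.
biovp = [1.0, 2.0, 3.0, 4.0, 5.0]
agb = [1.1, 1.9, 3.2, 3.8, 5.0]

intercept, slope = fit_simple_linear(biovp, agb)
r2, rmse, predicted, residuals = evaluate(biovp, agb, intercept, slope)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

Plotting `residuals` against `predicted` is the usual check: a curved or funnel-shaped pattern, rather than a random scatter about zero, suggests a missing predictor or non-constant variance.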

Fig. 12

Bivariate linear-regression model based on the BIOVP. Residuals were not randomly scattered about zero with respect to the predicted values

Discussion

Estimating maize height from UAV images

Plant height is an important architectural trait that is highly correlated with biomass yield, and several researchers have highlighted plant height as a key contributor to biomass yield [64,65,66]. Because of the small planting area and low planting density in this study, the CSM of each plot contained relatively few vegetation pixels. In this scenario, if the average method were used to extract height information from the CSM, the result would be contaminated by soil-background pixels, causing an obvious underestimate of plant height. Previous studies have confirmed this result [1, 14, 20, 24, 25, 67, 68]. To tackle this issue, various researchers have suggested using quantile and maximum statistics to represent plant height at the plot scale. However, these statistics are susceptible to outliers and lack explanatory power. Taking the maximum statistic as an example, from the viewpoint of digital photogrammetry, the plot-scale plant height is then just the value of a single pixel in the CSM. Thus, the value of one pixel represents the height of multiple plants in a plot, which is not appropriate, especially when the plot is small and contains few plants. A more appropriate method is to calculate the plant height at the plot scale from the pixels representing the upper leaves of multiple plants, which requires considering the spatial distribution of the plants in a plot. The approach we propose herein differs from previous approaches in that it considers the spatial distribution of crops and has a clear mathematical interpretation. Upon comparing the plant heights calculated by the three methods with manual ground-based measurements, we found that the proposed method increased the ratio of the explained variance (R2 = 0.85 vs. 0.61) while reducing the error (RMSE = 14.61 cm vs. 27.59 cm, MAE = 12.36 cm vs. 20.68 cm), which shows that the proposed method is feasible and effective (Fig. 13).
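The behavior of the mean, maximum, and quantile statistics discussed above can be illustrated with a toy plot. This is a minimal sketch, assuming the CSM of one plot is available as a flat list of per-pixel heights in cm with soil pixels near zero; the values are invented, the helper `percentile` is hypothetical, and the Kriging-based method proposed in this study is not reproduced here.

```python
import math


def percentile(values, q):
    """Percentile (q in 0-100) with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    if lower == upper:
        return ordered[lower]
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


# Invented CSM heights (cm): eight soil pixels, five canopy pixels,
# and one spurious spike from a reconstruction artifact.
csm = [0, 2, 1, 0, 3, 1, 0, 2, 150, 160, 155, 165, 158, 240]

mean_height = sum(csm) / len(csm)   # pulled down by the soil pixels
max_height = max(csm)               # equals the single spike pixel
p90_height = percentile(csm, 90)    # lies between the two extremes

print(f"mean = {mean_height:.1f} cm")
print(f"max  = {max_height} cm")
print(f"p90  = {p90_height:.1f} cm")
```

In this toy case the mean falls far below every canopy pixel, while the maximum reports the artifact rather than any plant, which is the weakness of single-pixel statistics noted above.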

Fig. 13

Plant height extracted from CSM versus manual ground-based measurements made with a telescopic leveling rod. PHkri, mean, and maximum are three methods to calculate plant height extracted from CSM. PHobs represents manually measured plant height

Limitations and implications of study

For this study, the predictor variables used for estimating maize AGB were collected rapidly and non-destructively by UAV. The UAV remote-sensing data contained uncertainties associated with multiple sources of error, which affected the accuracy of the estimate of maize AGB. Predictors measured by UAV remote sensing were all at the canopy scale and were affected by observation angle, illumination conditions, canopy structure, and leaf-morphology characteristics [69]. The VIs are susceptible to the confounding influences of canopy greenness and soil reflectance [70], whereas the accumulation of maize AGB is more directly associated with changes in the physical structure of maize. When calculating the ground-based measurements of AGB at the plot scale, the growth differences between maize plants in a given plot were not considered. For destructive sampling, simply multiplying the average biomass by the number of plants may lead to systematic errors and the appearance of outliers in the data. In addition, this study used only a small number of spectral predictor variables because the multispectral sensor was limited to four narrow bands. When the UAV platform is equipped with a hyperspectral sensor, more spectral features can be used to estimate AGB [29, 71], which can reduce the collinearity and redundancy of spectral predictors caused by similar calculation formulas [28].
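The plot-scale scaling mentioned above can be written compactly. The form below is an assumed reading of "multiplying the average biomass by the number of plants", with the division by plot area added to obtain kg/m2; the symbols are introduced here for illustration only.

```latex
\mathrm{AGB}_{\mathrm{plot}} = \frac{\bar{m}\, N}{A}
```

Here $\bar{m}$ is the mean biomass of the sampled plants (kg per plant), $N$ is the number of plants in the plot, and $A$ is the plot area (m2). Any bias in $\bar{m}$ caused by unrepresentative sampled plants is multiplied by $N$, which is how a small sampling error can become a systematic error at the plot scale.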

This study explored four machine-learning regression algorithms (MLR, SVM, ANN, and RF), all of which produced acceptable accuracy. The RF model yielded the best results, with low error and a high ratio of the explained variance. In this study, the nonlinear regression models performed significantly better than the MLR model because they could fit the nonlinear relationships in the data. However, a distinct advantage of the MLR model is that it is highly interpretable [57]. The MLR model can thus be used to determine the strength of the effect that one or more predictor variables have on a response variable by using the standardized partial regression coefficient [72].
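For reference, the standardized partial regression coefficient is commonly defined as follows; the notation is ours and does not appear in the original text. For a fitted model $y = b_0 + \sum_j b_j x_j$:

```latex
\beta_j^{*} = b_j \, \frac{s_{x_j}}{s_{y}}
```

Here $b_j$ is the unstandardized coefficient of predictor $x_j$, and $s_{x_j}$ and $s_{y}$ are the sample standard deviations of the predictor and the response. Because $\beta_j^{*}$ is unit-free, its magnitude can be compared across predictors, although the comparison becomes less reliable when the predictors are strongly collinear.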

Note also that the comparison of models has limitations. Because the sample size was small, the advantages and disadvantages of the different modeling strategies were not fully demonstrated. For example, the ANN model requires many rounds of repeated training to obtain an optimal neural network, which requires more computing time. The inner workings of the ANN and SVM are difficult to understand, which leads to their being treated as black-box models [63]. The RF model has been applied in a wide variety of scientific areas because of its ability to resist overfitting and deal with high-dimensional data [73].

The BIOVP is a volumetric indicator for estimating maize AGB. Because image segmentation is a prerequisite for obtaining the BIOVP, this indicator is more strongly correlated with certain spectral indices (e.g., NGRDI and VARI). Because BIOVP includes both spectral and plant-height information, errors in either affect the accuracy of BIOVP calculations. In this study, the BIOVP was calculated by using point clouds based on digital images; point clouds based on LiDAR (light detection and ranging) are also applicable. Further research is required to determine how BIOVP affects AGB estimates for different crops, plot scales, and other scenarios.
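How a volume-type indicator combines spectral segmentation with height can be sketched as follows. This is a simplified illustration, not the BIOVP definition of this study: it assumes that the volume is the sum of pixel area times CSM height over pixels classified as vegetation, and it uses a fixed, hypothetical NGRDI threshold of 0 instead of the study's segmentation procedure. The function names and pixel values are invented.

```python
def ngrdi(green, red):
    """Normalized green-red difference index for one pixel."""
    total = green + red
    return (green - red) / total if total != 0 else 0.0


def volume_metric(pixels, pixel_area_m2, threshold=0.0):
    """Sum pixel_area * height over pixels classified as vegetation.

    pixels: iterable of (green, red, height_m) tuples.
    threshold: hypothetical NGRDI cut-off for vegetation.
    """
    volume = 0.0
    for green, red, height_m in pixels:
        if ngrdi(green, red) > threshold:
            volume += pixel_area_m2 * height_m
    return volume


# Invented pixels: (green reflectance, red reflectance, height in m).
pixels = [
    (0.30, 0.10, 1.5),  # vegetation
    (0.28, 0.12, 1.6),  # vegetation
    (0.15, 0.20, 0.0),  # soil
    (0.14, 0.18, 0.1),  # soil
]

# Pixel area is the squared ground sampling distance (here 0.1 m GSD).
plot_volume = volume_metric(pixels, pixel_area_m2=0.01)
print(f"volume = {plot_volume:.3f} m3")
```

The sketch makes the coupling explicit: a change in the segmentation threshold changes which pixels contribute, and an error in the height surface changes how much each pixel contributes.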

Conclusions

This study used multispectral and digital images collected by a UAV system to estimate maize AGB by using four machine-learning algorithms (MLR, SVM, ANN, and RF). The RF model gave the most balanced results, with low error and a high ratio of the explained variance for both the training set and the test set. We proposed an improved method for extracting plant height from UAV images and an indicator (BIOVP) to evaluate crop AGB. The BIOVP considers both structural and spectral information and contributes significantly to improving estimates of maize AGB. The suitability of this approach still needs to be verified for different crops and on different scales. Overall, this work suggests that structural and spectral information can be considered simultaneously rather than separately when estimating biophysical crop parameters.

Abbreviations

AGB: above-ground biomass

AGL: flight altitude above ground level

ANN: artificial neural network

ANOVA: analysis of variance

AOI: area of interest

BIOVP: a volume metric used to estimate crop biomass within a plot

BLR: bivariate linear regression

CRR: canopy elevation relief ratio

CIgreen: chlorophyll index green

CIrededge: chlorophyll index rededge

CVI: chlorophyll vegetation index

CSM: crop surface model

DSMs: digital surface models

DEM: digital elevation model

ExG: excess green index

GLI: green leaf index

MAE: mean absolute error

GSD: ground sampling distance

MLR: multiple linear regression

NGRDI: normalized green–red difference index

NDRE: normalized difference red-edge

NDVI: normalized difference vegetation index

PHobs: plant height measured manually

PHkri: plant height calculated by using a Kriging interpolation

RF: random forest

R2: coefficient of determination

RMSE: root mean square error

RVI: ratio vegetation index (also called simple ratio)

SVM: support vector machine

UAV: unmanned aerial vehicle

VIs: vegetation indices

VARI: visible atmospherically resistant index

WDRVI: wide dynamic range vegetation index

References

  1. Bendig J, Yu K, Aasen H, Bolten A, Bennertz S, Broscheit J, Gnyp ML, Bareth G. Combining UAV-based plant height from crop surface models, visible, and near infrared vegetation indices for biomass monitoring in barley. Int J Appl Earth Obs Geoinf. 2015;39:79–87.

  2. Li W, Niu Z, Huang N, Wang C, Gao S, Wu CY. Airborne LiDAR technique for estimating biomass components of maize: a case study in Zhangye City, Northwest China. Ecol Indic. 2015;57:486–96.

  3. Yang S, Feng Q, Liang T, Liu B, Zhang W, Xie H. Modeling grassland above-ground biomass based on artificial neural network and remote sensing in the Three-River Headwaters Region. Remote Sens Environ. 2018;204:448–55.

  4. Yang X, Xu B, Yunxiang J, Jinya L, Zhu X. On grass yield remote sensing estimation models of China’s northern farming-pastoral ecotone. In: Lee G, editor. Advances in computational environment science. Berlin: Springer; 2012. p. 281–91.

  5. Zheng G, Chen JM, Tian QJ, Ju WM, Xia XQ. Combining remote sensing imagery and forest age inventory for biomass mapping. J Environ Manag. 2007;85:616–23.

  6. Zheng D, Rademacher J, Chen J, Crow T, Bresee M, Le Moine J, Ryu S-R. Estimating aboveground biomass using Landsat 7 ETM+ data across a managed landscape in northern Wisconsin, USA. Remote Sens Environ. 2004;93:402–11.

  7. Güneralp İ, Filippi AM, Randall J. Estimation of floodplain aboveground biomass using multispectral remote sensing and nonparametric modeling. Int J Appl Earth Obs Geoinf. 2014;33:119–26.

  8. Anaya JA, Chuvieco E, Palacios-Orueta A. Aboveground biomass assessment in Colombia: a remote sensing approach. For Ecol Manag. 2009;257:1237–46.

  9. Moriondo M, Maselli F, Bindi M. A simple model of regional wheat yield based on NDVI data. Eur J Agron. 2007;26:266–74.

  10. Bai J-H, Li S-K, Wang K-R, Sui X-Y, Chen B, Wang F-Y. Estimating aboveground fresh biomass of different cotton canopy types with homogeneity models based on hyper spectrum parameters. Agric Sci China. 2007;6:437–45.

  11. Yan N, Wu B. Integrated spatial–temporal analysis of crop water productivity of winter wheat in Hai Basin. Agric Water Manag. 2014;133:24–33.

  12. Mutanga O, Adam E, Cho MA. High density biomass estimation for wetland vegetation using WorldView-2 imagery and random forest regression algorithm. Int J Appl Earth Obs Geoinf. 2012;18:399–406.

  13. Matese A, Toscano P, Di Gennaro S, Genesio L, Vaccari F, Primicerio J, Belli C, Zaldei A, Bianconi R, Gioli B. Intercomparison of UAV, aircraft and satellite remote sensing platforms for precision viticulture. Remote Sens. 2015;7:2971.

  14. Aasen H, Burkart A, Bolten A, Bareth G. Generating 3D hyperspectral information with lightweight UAV snapshot cameras for vegetation monitoring: from camera calibration to quality assurance. ISPRS J Photogramm Remote Sens. 2015;108:245–59.

  15. Geipel J, Link J, Claupein W. Combined spectral and spatial modeling of corn yield based on aerial images and crop surface models acquired with an unmanned aircraft system. Remote Sens. 2014;6:10335–55.

  16. Pugh NA, Horne DW, Murray SC, Carvalho G, Malambo L, Jung J, Chang A, Maeda M, Popescu S, Chu T, Starek MJ, Brewer MJ, Richardson G, Rooney WL. Temporal estimates of crop growth in sorghum and maize breeding enabled by unmanned aerial systems. Plant Phenome J. 2018;1:170006.

  17. Varela S, Assefa Y, Prasad PVV, Peralta NR, Griffin TW, Sharda A, Ferguson A, Ciampitti IA. Spatio-temporal evaluation of plant height in corn via unmanned aerial systems. J Appl Remote Sens. 2017;11:12.

  18. Bendig J, Willkomm M, Tilly N, Gnyp M, Bennertz S, Qiang C, Miao Y, Lenz-Wiedemann V, Bareth G. Very high resolution crop surface models (CSMs) from UAV-based stereo images for rice growth monitoring in Northeast China. Int Arch Photogramm Remote Sens Spat Inf Sci. 2013;40:45–50.

  19. Bendig J, Bolten A, Bennertz S, Broscheit J, Eichfuss S, Bareth G. Estimating biomass of barley using crop surface models (CSMs) derived from UAV-based RGB imaging. Remote Sens. 2014;6:10395–412.

  20. Brocks S, Bareth G. Estimating barley biomass with crop surface models from oblique RGB imagery. Remote Sens. 2018;10:268.

  21. Chu TX, Chen RZ, Landivar JA, Maeda MM, Yang CH, Starek MJ. Cotton growth modeling and assessment using unmanned aircraft system visual-band imagery. J Appl Remote Sens. 2016;10:17.

  22. Muharam FM, Bronson KF, Maas SJ, Ritchie GL. Inter-relationships of cotton plant height, canopy width, ground cover and plant nitrogen status indicators. Field Crop Res. 2014;169:58–69.

  23. Souza CHWD, Lamparelli RAC, Rocha JV. Height estimation of sugarcane using an unmanned aerial system (UAS) based on structure from motion (SfM) point clouds. Int J Remote Sens. 2017;38:2218–30.

  24. Holman FH, Riche AB, Michalski A, Castle M, Wooster MJ, Hawkesford MJ. High throughput field phenotyping of wheat plant height and growth rate in field plot trials using UAV based remote sensing. Remote Sens. 2016;8:1031.

  25. Watanabe K, Guo W, Arai K, Takanashi H, Kajiya-Kanegae H, Kobayashi M, Yano K, Tokunaga T, Fujiwara T, Tsutsumi N, Iwata H. High-throughput phenotyping of sorghum plant height using an unmanned aerial vehicle and its application to genomic prediction modeling. Front Plant Sci. 2017;8:11.

  26. Tilly N, Aasen H, Bareth G. Fusion of plant height and vegetation indices for the estimation of barley biomass. Remote Sens. 2015;7:11449.

  27. Jing R, Gong ZN, Zhao WJ, Pu RL, Deng L. Above-bottom biomass retrieval of aquatic plants with regression models and SfM data acquired by a UAV platform—a case study in Wild Duck Lake Wetland, Beijing, China. ISPRS J Photogramm Remote Sens. 2017;134:122–34.

  28. Li W, Niu Z, Chen HY, Li D, Wu MQ, Zhao W. Remote estimation of canopy height and aboveground biomass of maize using high-resolution stereo images from a low-cost unmanned aerial vehicle system. Ecol Indic. 2016;67:637–48.

  29. Yue JB, Yang GJ, Li CC, Li ZH, Wang YJ, Feng HK, Xu B. Estimation of winter wheat above-ground biomass using unmanned aerial vehicle-based snapshot hyperspectral sensor and crop height improved models. Remote Sens. 2017;9:19.

  30. Ali I, Greifeneder F, Stamenkovic J, Neumann M, Notarnicola C. Review of machine learning approaches for biomass and soil moisture retrievals from remote sensing data. Remote Sens. 2015;7:15841.

  31. Zhang C, Denka S, Cooper H, Mishra DR. Quantification of sawgrass marsh aboveground biomass in the coastal Everglades using object-based ensemble analysis and Landsat data. Remote Sens Environ. 2018;204:366–79.

  32. Hassan M, Yang M, Rasheed A, Jin X, Xia X, Xiao Y, He Z. Time-series multispectral indices from unmanned aerial vehicle imagery reveal senescence rate in bread wheat. Remote Sens. 2018;10:809.

  33. Gitelson AA, Gritz Y, Merzlyak MN. Relationships between leaf chlorophyll content and spectral reflectance and algorithms for non-destructive chlorophyll assessment in higher plant leaves. J Plant Physiol. 2003;160:271–82.

  34. Vincini M, Frazzi E, D’Alessio P. A broad-band leaf chlorophyll vegetation index at the canopy scale. Precis Agric. 2008;9:303–19.

  35. Gitelson A, Merzlyak MN. Quantitative estimation of chlorophyll-a using reflectance spectra: experiments with autumn chestnut and maple leaves. J Photochem Photobiol B Biol. 1994;22:247–52.

  36. Pearson RL, Miller LD. Remote mapping of standing crop biomass for estimation of the productivity of the shortgrass prairie. In: Remote sensing of environment, VIII. 1972. p. 7–12.

  37. Serrano L, Filella I, Peñuelas J. Remote sensing of biomass and yield of winter wheat under different nitrogen supplies. Crop Sci. 2000;40:723–31.

  38. Jordan CF. Derivation of leaf-area index from quality of light on the forest floor. Ecology. 1969;50:663–6.

  39. Tucker CJ. Red and photographic infrared linear combinations for monitoring vegetation. Remote Sens Environ. 1979;8:127–50.

  40. Rouse JW Jr, Haas RH, Schell JA, Deering DW. Monitoring vegetation systems in the great plains with ERTS. In: Freden SC, Mercanti EP, Becker MA, editors. Third earth resources technology satellite-1 symposium, vol. 1. Washington: NASA; 1974. p. 309–17.

  41. Gitelson AA. Wide dynamic range vegetation index for remote quantification of biophysical characteristics of vegetation. J Plant Physiol. 2004;161:165–73.

  42. Louhaichi M, Borman M, Johnson D. Spatially located platform and aerial photography for documentation of grazing impacts on wheat. Geocarto Int. 2001;16:65–70.

  43. Hunt ER Jr, Daughtry CST, Eitel JUH, Long D. Remote sensing leaf chlorophyll content using a visible band index. Agron J. 2011;103:1090.

  44. Gitelson AA, Viña A, Arkebauer TJ, Rundquist DC, Keydan G, Leavitt B. Remote estimation of leaf area index and green leaf biomass in maize canopies. Geophys Res Lett. 2003;30:1248.

  45. Gitelson AA, Kaufman YJ, Stark R, Rundquist D. Novel algorithms for remote estimation of vegetation fraction. Remote Sens Environ. 2002;80:76–87.

  46. Woebbecke DM, Meyer GE, Von Bargen K, Mortensen DA. Color indices for weed identification under various soil, residue, and lighting conditions. In: American society of agricultural engineers meeting. 1994.

  47. Otsu N. A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern. 1979;9:62–6.

  48. Lancashire PD, Bleiholder H, Den Boom TV, Langeluddeke P, Stauss R, Weber E, Witzenberger A. A uniform decimal code for growth stages of crops and weeds. Ann Appl Biol. 1991;119:561–601.

  49. Pike RJ, Wilson SE. Elevation-relief ratio, hypsometric integral, and geomorphic area-altitude analysis. Geol Soc Am Bull. 1971;82(4):1079–84.

  50. Parker GG, Harmon ME, Lefsky MA, Chen JQ, Van Pelt R, Weis SB, Thomas SC, Winner WE, Shaw DC, Franklin JF. Three-dimensional structure of an old-growth Pseudotsuga-Tsuga canopy and its implications for radiation balance, microclimate, and gas exchange. Ecosystems. 2004;7:440–53.

  51. Yue JB, Feng HK, Yang GJ, Li ZH. A comparison of regression techniques for estimation of above-ground winter wheat biomass using near-surface spectroscopy. Remote Sens. 2018;10:23.

  52. Chandrashekar G, Sahin F. A survey on feature selection methods. Comput Electr Eng. 2014;40:16–28.

  53. Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res. 2003;3:1157–82.

  54. Kuhn M. Building predictive models in R using the caret package. J Stat Softw. 2008;28:1–26.

  55. Lebedev AV, Westman E, Van Westen GJP, Kramberger MG, Lundervold A, Aarsland D, Soininen H, Kłoszewska I, Mecocci P, Tsolaki M, et al. Random Forest ensembles for detection and prediction of Alzheimer’s disease with a good between-cohort robustness. NeuroImage Clin. 2014;6:115–25.

  56. Kuhn M. Variable selection using the caret package. Int Rev Electr Eng. 2010;1:44–9.

  57. Kuhn M, Johnson K. Applied predictive modeling. New York: Springer; 2013.

  58. Nasrabadi NM. Pattern recognition and machine learning. J Electron Imaging. 2007;16:049901.

  59. Bergstra J, Bardenet R, Bengio Y, Kégl B. Algorithms for hyper-parameter optimization. In: 25th annual conference on neural information processing systems (NIPS 2011); 2011-12-12; Granada, Spain. Neural Information Processing Systems Foundation; 2011.

  60. Efron B, Tibshirani R. Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc. 1997;92:548–60.

  61. Chang W. R graphics cookbook: practical recipes for visualizing data. San Francisco: O’Reilly Media Inc.; 2012.

  62. Breiman L. Random forests. Mach Learn. 2001;45:5–32.

  63. Han L, Yang G, Feng H, Zhou C, Yang H, Xu B, Li Z, Yang X. Quantitative identification of maize lodging-causing feature factors using unmanned aerial vehicle images and a nomogram computation. Remote Sens. 2018;10:1528.

  64. Salas Fernandez MG, Becraft PW, Yin Y, Lübberstedt T. From dwarves to giants? Plant height manipulation for biomass yield. Trends Plant Sci. 2009;14:454–61.

  65. Alheit KV, Busemeyer L, Liu W, Maurer HP, Gowda M, Hahn V, Weissmann S, Ruckelshausen A, Reif JC, Würschum T. Multiple-line cross QTL mapping for biomass yield and plant height in triticale (× Triticosecale Wittmack). Theor Appl Genet. 2014;127:251–60.

  66. Montes JM, Technow F, Dhillon BS, Mauch F, Melchinger AE. High-throughput non-destructive biomass determination during early plant development in maize under field conditions. Field Crops Res. 2011;121:268–73.

  67. Liebisch F, Kirchgessner N, Schneider D, Walter A, Hund A. Remote, aerial phenotyping of maize traits with a mobile multi-sensor approach. Plant Methods. 2015;11:19.

  68. Matese A, Di Gennaro SF, Berton A. Assessment of a canopy height model (CHM) in a vineyard using UAV-based multispectral imaging. Int J Remote Sens. 2017;38:2150–60.

  69. Walter A, Liebisch F, Hund A. Plant phenotyping: from bean weighing to image analysis. Plant Methods. 2015;11:14.

  70. Jimenez-Berni JA, Deery DM, Rozas-Larraondo P, Condon AG, Rebetzke GJ, James RA, Bovill WD, Furbank RT, Sirault XRR. High throughput determination of plant height, ground cover, and above-ground biomass in wheat with LiDAR. Front Plant Sci. 2018;9:237.

  71. Yue J, Feng H, Jin X, Yuan H, Li Z, Zhou C, Yang G, Tian Q. A comparison of crop parameters estimation using images from UAV-mounted snapshot hyperspectral sensor and high-definition digital camera. Remote Sens. 2018;10:1138.

  72. Graham MH. Confronting multicollinearity in ecological multiple regression. Ecology. 2003;84:2809–15.

  73. Lin X, Sun L, Li Y, Guo Z, Li Y, Zhong K, Wang Q, Lu X, Yang Y, Xu G. A random forest of combined features in the classification of cut tobacco based on gas chromatography fingerprinting. Talanta. 2010;82:1571–5.

Authors’ contributions

Liang Han drafted and revised the manuscript. Guijun Yang proposed the conceptualization of this study and reviewed the manuscript. Huayang Dai edited the manuscript. Hao Yang and Liang Han performed field experiments. Bo Xu and Haikuan Feng collected image data. Liang Han, Zhenhai Li and Xiaodong Yang analyzed and interpreted the results. All authors read and approved the final manuscript.

Acknowledgements

We thank the Maize Research Center department of the Beijing Academy of Agriculture and Forestry Sciences for preparing the seed and planting for the trial, and Dr. Yanxin Zhao, Dr. Xiaqing Wang, and Mr. Ruyang Zhang for designing the experiments and helping to collect the field data. We are also grateful to the anonymous reviewers for their valuable comments and recommendations.

Competing interests

The authors declare that they have no competing interests.

Availability of data

The datasets analysed during the current study are available from the corresponding author on reasonable request.

Funding

This study was supported by the National Key Research and Development Program of China (2016YFD0300602), the Natural Science Foundation of China (61661136003), the Beijing Natural Science Foundation (6182011), and the Special Funds for Technology innovation capacity building sponsored by the Beijing Academy of Agriculture and Forestry Sciences (KJCX20170423).

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Corresponding authors

Correspondence to Guijun Yang or Xiaodong Yang.

Additional files

Additional file 1.

Method for calculating total error of estimating GCPs location in UAV images.

Additional file 2.

A schematic illustration for explaining the concepts of BIOVP and PHkri.

Additional file 3.

Running R scripts for machine learning modeling and diagnostic plots.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Han, L., Yang, G., Dai, H. et al. Modeling maize above-ground biomass based on machine learning approaches using UAV remote-sensing data. Plant Methods 15, 10 (2019). https://doi.org/10.1186/s13007-019-0394-z

  • DOI: https://doi.org/10.1186/s13007-019-0394-z

Keywords