Improving the estimation of alpine grassland fractional vegetation cover using optimized algorithms and multi-dimensional features

Lin, Xingchen; Chen, Jianjun; Lou, Peiqing; Yi, Shuhua; Qin, Yu; You, Haotian; Han, Xiaowen

doi:10.1186/s13007-021-00796-5

Research
Open access
Published: 17 September 2021

Xingchen Lin¹,
Jianjun Chen ORCID: orcid.org/0000-0001-9464-3442^1,2,
Peiqing Lou⁴,
Shuhua Yi³,
Yu Qin⁴,
Haotian You^1,2 &
Xiaowen Han^1,2

Plant Methods volume 17, Article number: 96 (2021)


Abstract

Background

Fractional vegetation cover (FVC) is an important basic parameter for the quantitative monitoring of the alpine grassland ecosystem on the Qinghai-Tibetan Plateau. Measured data acquired by unmanned aerial vehicle (UAV) can be matched with satellite remote sensing images at the pixel scale, which makes it possible to determine the proper driving data and inversion algorithms; this selection is crucial for generating high-precision alpine grassland FVC products.

Methods

This study presents estimations of alpine grassland FVC using optimized algorithms and multi-dimensional features. The multi-dimensional feature set (using original spectral bands, 22 vegetation indices, and topographical factors) was constructed from many sources of information, then the optimal feature subset was determined based on different feature selection algorithms as the driving data for optimized machine learning algorithms. Finally, the inversion accuracy, sensitivity to sample size, and computational efficiency of the four machine learning algorithms were evaluated.

Results

(1) The random forest (RF) algorithm (R²: 0.861, RMSE: 9.5%) performed the best for FVC inversion among the four machine learning algorithms driven by the four typical vegetation indices. (2) Compared with the four typical vegetation indices, using multi-dimensional feature sets as driving data obviously improved the FVC inversion accuracy of the four machine learning algorithms (R² of the RF algorithm increased to 0.890). (3) Among the three variable selection algorithms (Boruta, sequential forward selection [SFS], and permutation importance-recursive feature elimination [PI-RFE]), the constructed PI-RFE feature selection algorithm had the best dimensionality reduction effect on the multi-dimensional feature set. (4) The hyper-parameter optimization of the machine learning algorithms and feature selection of the multi-dimensional feature set further improved FVC inversion accuracy (R²: 0.917 and RMSE: 7.9% in the optimized RF algorithm).

Conclusion

This study provides a highly precise, optimized algorithm with an optimal multi-dimensional feature set for FVC inversion, which is vital for the quantitative monitoring of the ecological environment of alpine grassland.

Introduction

Known as the “Third Pole” and “Water Tower of Asia”, the Qinghai-Tibet Plateau (QTP) plays a very important role in regulating climate and water resources in East Asia and is thus regarded as the trigger and amplifier of climate change in Asia and even the Northern Hemisphere [76, 77]. As the main vegetation type on the QTP, alpine grassland has experienced serious degradation in the past few decades by the combined impact of climate warming, overgrazing, and rodent disturbance [17, 73]. Fractional vegetation cover (FVC) is an ideal indicator for the dynamic monitoring of the vegetation condition of the alpine ecosystems on the QTP [60, 61, 33, 48]. Therefore, high-precision FVC assessment of the alpine grassland on the QTP is of great significance as it provides insight into ecological environment changes and their accompanied influences [54, 82, 91].

Remote sensing technology has been widely used in FVC inversion at the regional scale. The inversion methods are generally divided into three categories: the regression model, the pixel dichotomy model, and machine learning algorithms. The regression model inverts FVC based on the statistical relationship between the vegetation index and measured data. Although this method is easy to implement, it is difficult to extend to other regions, owing to the limitations of the established model itself [25, 66]. The pixel dichotomy model generally determines FVC by dividing the surface features in the mixed pixel into vegetation and non-vegetation categories. However, it is difficult to find pure spectral pixels due to the restriction of the spatial resolution of remote sensing images [24, 38, 79, 87]. Machine learning algorithms include multiple linear regression (MLR), back-propagation neural networks (BPNNs), support vector regression (SVR), random forest (RF), and a series of other algorithms [26, 39, 53]. The basic idea of machine learning algorithms is to invert FVC by simulating the intrinsic relationship between remote sensing information and FVC [72]. Although many types of algorithms exist, it is still unknown which has the best inversion accuracy and computational efficiency for FVC inversion.

In addition to the algorithm, the selection of features from the remote sensing dataset, such as vegetation indices calculated from the original spectral bands of the remote sensing data, also has a great impact on the FVC inversion accuracy [23]. The normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), soil-adjusted vegetation index (SAVI), modified soil-adjusted vegetation index (MSAVI), etc., are usually used [83]. To date, it is unknown whether other vegetation indices have a higher correlation with the FVC of alpine grassland than these typical vegetation indices. Given the obvious differences in elevation across the QTP [92], there are great variations in the digital elevation model (DEM), slope, and aspect of the alpine grassland. The influence of these topographical factors, however, is usually neglected. Although a multi-dimensional feature set including original spectral bands, various vegetation indices, and topographical factors is expected to improve the FVC inversion accuracy of machine learning algorithms, this still needs to be explored further: with so many features in the dataset, data redundancy will inevitably occur, leading to longer training time for the inversion model and to overfitting [74, 88]. Therefore, it is essential to eliminate the redundancy of the multi-dimensional feature set, which helps to improve the inversion accuracy and calculation efficiency of alpine grassland FVC.

No matter which kind of FVC remote sensing inversion method is utilized, high-precision measured FVC data are necessary for calibration and verification. The measured data mainly depend on field surveys. Although traditional surveys can obtain high-precision FVC at the quadrat scale, they consume a lot of manpower and material resources. This causes two problems with current FVC remote sensing inversion methods [51, 89]. On one hand, most FVC inversion studies have little or no measured data [8, 57]. On the other hand, the FVC data measured at the quadrat scale by traditional field survey methods do not match the spatial scale of the satellite remote sensing image pixels [10, 20, 75, 90]. Consequently, there is an urgent need for an efficient field survey method that is both available at a large scale and matches satellite remote sensing image pixels at the spatial scale [12, 56]. In recent years, the gradual maturity of unmanned aerial vehicle (UAV) technology has brought new opportunities. Due to the lower flying height of a UAV, it is not disturbed by atmospheric factors and can take ultra-high-resolution aerial images. In addition, a UAV is portable and inexpensive, and it is suitable for FVC field surveys under harsh ecological environments [52, 68, 16, 81]. A previous study has shown that UAV technology can not only solve the problem of the mismatch between the measured FVC data and the pixel scale of satellite remote sensing images, but can also collect massive amounts of high-resolution data with high efficiency [13].

The objective of this study was to find a high-precision and high-efficiency FVC survey and inversion method for analyzing alpine grassland FVC to use in future studies with a focus on: (1) calibrating and evaluating different FVC inversion methods (regression model methods, the pixel dichotomy model, and machine learning algorithms) based on mass FVC measurement data obtained by UAV; (2) constructing a multi-dimensional feature set including original spectral bands, various vegetation indices and topographic factors, and then analyzing the influence of different features on the FVC inversion accuracy through three different feature selection algorithms; (3) tuning parameters for the four machine learning algorithms based on the grid search method to construct an optimized regression model; and (4) quantitatively analyzing the inversion accuracy, computational efficiency, and sensitivity for the sample size of the four machine learning algorithms (MLR, BPNNs, RF, and SVR).

Study area and data source

Study area

The source area of the Yellow River Basin (SYRB) is located in the northeastern part of the QTP and is the birthplace of the Yellow River, China's most important freshwater resource. It spans six states and 18 counties in Qinghai, Sichuan, and Gansu provinces, and its total area is approximately 132,000 km² (Fig. 1). Since the average altitude is greater than 4000 m, this area has the environmental characteristics of a low annual average temperature, a large daily temperature difference, long sunshine time, strong solar radiation, and obvious seasonal precipitation. The SYRB is sensitive to climate change, and the ecological environment is fragile. The vegetation types in the SYRB are mainly alpine meadow and alpine steppe, the latter accounting for about 80% of the total land area, which is a microcosm of the QTP. Therefore, high-precision FVC inversion analysis of alpine grassland in the SYRB is vital for local ecological protection and benefits the entire QTP.

Data source and data preprocessing

Field data based on UAV imagery

In this study, 91 observation sites were set in the SYRB (Fig. 1), and field aerial surveys were carried out from July to August 2015. The 91 observation sites covered different grassland types, underlying surfaces, and environmental conditions, and were therefore representative. Our UAV aerial photography operation system, Fragmentation Monitoring and Analysis with aerial Photography (FragMAP) [81], was employed at each observation site to set the UAV flight route. Each observation site contained a route covering the entire monitoring plot and 16 aerial points (Fig. 2). The UAV then flew autonomously along the flight route according to the preset parameters and took aerial photographs at a height of 20 m. The spatial resolution of the aerial images was about 1 cm, and the coverage of each aerial image was approximately 30 m × 30 m, which matched the pixel coverage of the Landsat 8 satellite image. The ground truth data were the FVC values obtained from each aerial image. The Phantom 3 Professional, a vertical takeoff and landing drone manufactured by SZ DJI Technology Co., Ltd. (Shenzhen, China) that can fly and hover accurately, was used for aerial photography. It uses a GPS/GLONASS dual satellite positioning module. The horizontal and vertical hovering accuracies are approximately 1.5 and 0.5 m, respectively, and the gimbal control accuracy is 0.03°. The onboard 12-megapixel camera was used for photography; it generates central-projection images with three spectral bands, red, green, and blue (RGB), which were saved in Joint Photographic Experts Group (JPEG) format. During testing, the UAV was flown at more than 4000 m above sea level in the SYRB, and the drone could hover for up to 20 min with its maximum flying height exceeding 300 m.

Previous studies have shown that the threshold segmentation method based on the Excess Green Index (EGI = 2G − R − B, where G, R, and B represent the gray values of the green, red, and blue bands in the image, respectively) has good accuracy for extracting FVC from aerial images [11, 12]. Therefore, the EGI threshold segmentation method was also used in this study. The extraction process was as follows. First, the EGI of each pixel of the aerial image was calculated, and an initial EGI threshold (ranging from 40 to 160 based on our experience) was set. The threshold is not fixed; it was determined for each image with the Java-based FVC Estimator software [12]. Second, a pixel whose EGI value was greater than the threshold was classified as a vegetation pixel; otherwise, it was classified as a non-vegetation pixel. Third, the segmentation result was superimposed on the original image, and its accuracy was judged by visual interpretation. If the segmentation result was not accurate, the threshold was adjusted until it was. Finally, the percentage of vegetation pixels out of the total number of pixels was calculated and taken as the FVC of the image [11, 12] (Fig. 3).
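The thresholding steps can be sketched in a few lines of Python. This is only an illustration, not the FVC Estimator software itself: the function names, the toy pixel values, and the threshold of 40 (the lower end of the range quoted above) are assumptions, and the visual check and threshold adjustment are left to the operator.

```python
def excess_green(r, g, b):
    """Excess Green Index for one pixel: EGI = 2G - R - B."""
    return 2 * g - r - b


def fvc_from_rgb(pixels, threshold):
    """Return FVC (%) as the share of pixels whose EGI exceeds the threshold.

    `pixels` is an iterable of (R, G, B) gray values.
    """
    pixels = list(pixels)
    vegetation = sum(1 for r, g, b in pixels if excess_green(r, g, b) > threshold)
    return 100.0 * vegetation / len(pixels)


# Hypothetical 4-pixel image: two green (vegetation) and two brownish (soil) pixels.
toy_image = [(60, 140, 50), (70, 150, 60), (120, 110, 90), (130, 115, 100)]
print(fvc_from_rgb(toy_image, threshold=40))
```

In practice the same function would be called again with a different threshold whenever the overlay on the original image shows that vegetation is over- or under-segmented.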

Remote sensing data

Landsat 8 Operational Land Imager (Landsat 8 OLI) images were downloaded from the United States Geological Survey (USGS) Earth Explorer website (https://earthexplorer.usgs.gov/). To ensure that the acquisition time of the images was consistent with the field investigation time, images with cloud cover of less than 5% acquired from July 1 to August 31, 2015 were selected. A total of 20 Landsat 8 images were needed to cover the entire SYRB. Orthorectification of the Landsat 8 images was conducted using the rational polynomial coefficient (RPC) Orthorectification Using Reference Image tool in ENVI 5.3 (Exelis Visual Information Solutions, Boulder, CO, USA) based on the 12.5 m Advanced Land Observing Satellite (ALOS) DEM, with an error of less than 0.5 pixels. The Radiometric Correction tool in ENVI 5.3 was used for radiometric calibration, and the original digital number (DN) values of the Landsat 8 images were converted into spectral reflectance values. Atmospheric correction was then performed with the Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) algorithm, which converted the sensor reflectance values into surface reflectance values. Detailed information on the Landsat 8 OLI images used in this study is shown in Table 1. DEM data generated from the Shuttle Radar Topography Mission (SRTM) at 30 m spatial resolution were downloaded from the USGS, and the slope and aspect were calculated from the DEM in ArcGIS 10.2 (Environmental Systems Research Institute, Redlands, CA, USA).
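Once surface reflectance is available, the four typical vegetation indices named in the Introduction (NDVI, EVI, SAVI, and MSAVI) can be computed per pixel. The sketch below uses the commonly cited forms of these indices; the coefficients (L = 0.5 for SAVI; 2.5, 6, 7.5, and 1 for EVI) and the closed-form MSAVI are assumptions here, since the exact definitions in the paper's Table 2 are not reproduced in this text. For Landsat 8 OLI, blue, red, and near-infrared correspond to bands 2, 4, and 5.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index with soil adjustment factor L."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def msavi(nir, red):
    """Modified soil-adjusted vegetation index (closed form, no explicit L)."""
    return (2 * nir + 1 - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2


def evi(nir, red, blue):
    """Enhanced vegetation index with the usual coefficients."""
    return 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)


# Hypothetical surface reflectance values for one vegetated pixel.
nir, red, blue = 0.5, 0.1, 0.05
print(ndvi(nir, red), savi(nir, red), msavi(nir, red), evi(nir, red, blue))
```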

Table 1 Characteristics of the Landsat 8 OLI images


Method

Regression model method

The regression model method is also called the empirical model method, which is used to establish the relationship between the single band of remote sensing images or the vegetation index obtained by the band calculation and the measured data of the FVC, and then extend the relationship to the study area and finally obtain the FVC of the whole study area [37]. Previous studies have shown that the normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), soil-adjusted vegetation index (SAVI), and modified soil-adjusted vegetation index (MSAVI) have a high correlation with FVC and are often used as driving data in FVC inversion studies [12, 39, 41]. Therefore, we selected these four typical vegetation indices for linear fitting and polynomial fitting (Table 2). The fitting formulas are as follows:

$$\text{FVC} = a \times \text{VI} + b$$

(1)

$$\text{FVC} = c \times \text{VI}^{2} + d \times \text{VI} + e$$

(2)

Table 2 Multi-dimensional features used in this study


where FVC is fractional vegetation cover; VI is vegetation index; a is the slope of linear fitting; b and e are the intercepts of linear fitting and polynomial fitting, respectively; and c and d are the parameter estimation values of polynomial fitting.
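As an illustration of Eqs. (1) and (2), the following sketch fits both forms by ordinary least squares using only the Python standard library. The helper `polyfit` and the VI/FVC pairs are hypothetical, chosen only to show the mechanics; they are not data from this study.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, highest power first."""
    n = degree + 1
    # Normal equations A c = b for the basis 1, x, x^2, ...
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    # Back substitution; coef[i] multiplies x^i.
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - s) / A[i][i]
    return coef[::-1]


# Hypothetical (VI, FVC %) pairs.
vi = [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75]
fvc = [12.0, 22.0, 35.0, 47.0, 60.0, 70.0, 78.0]

a, b = polyfit(vi, fvc, 1)      # Eq. (1): FVC = a * VI + b
c, d, e = polyfit(vi, fvc, 2)   # Eq. (2): FVC = c * VI^2 + d * VI + e
print(a, b)
print(c, d, e)
```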

Pixel dichotomy model

The pixel dichotomy model is currently the most widely used method for estimating FVC. It assumes that the pixel information received by the satellite sensor is composed of vegetation and soil, and FVC is the percentage of a pixel occupied by vegetation. The NDVI is considered to be a good indicator for FVC, so the pixel dichotomy model with the NDVI as the input parameter was used in this study to estimate the FVC of the SYRB [22, 70]. The formula is as follows:

$$\text{FVC} = \frac{\text{NDVI} - \text{NDVI}_{s}}{\text{NDVI}_{v} - \text{NDVI}_{s}}$$

(3)

where NDVI_s and NDVI_v are the NDVI values of areas completely covered by soil and by vegetation, respectively.

In this study, a total of two sets of NDVI_s and NDVI_v values were used to estimate the FVC of the SYRB based on the pixel dichotomy model: (1) the values of pure vegetation pixels and pure soil pixels based on the statistical results of the ecological function area in the existing literature [40] (NDVI_v = 0.837, NDVI_s = 0.164) and (2) the values of pure vegetation pixels and pure soil pixels determined by 95% confidence intervals [9] (NDVI_v = 0.882, NDVI_s = 0.067).
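Eq. (3) with the two parameter sets can be written as a short function. Clipping the result to the range 0–1 for pixels whose NDVI falls outside [NDVI_s, NDVI_v] is an assumption of this sketch; the text does not say how such pixels were handled. The dictionary name and the example NDVI value are also illustrative.

```python
def pixel_dichotomy_fvc(ndvi, ndvi_soil, ndvi_veg):
    """FVC from Eq. (3), clipped to [0, 1] (the clipping is an assumption)."""
    fvc = (ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)
    return min(1.0, max(0.0, fvc))


# (NDVI_s, NDVI_v) pairs quoted in the text.
PARAMETER_SETS = {
    "ecological function area": (0.164, 0.837),
    "95% confidence interval": (0.067, 0.882),
}

for name, (ndvi_s, ndvi_v) in PARAMETER_SETS.items():
    print(name, pixel_dichotomy_fvc(0.5, ndvi_s, ndvi_v))
```

The same NDVI value maps to a different FVC under each parameter set, which is why the two sets can share an R² and still differ in RMSE.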

Machine learning algorithms

For all machine learning algorithms, the input is the multi-dimensional feature set and the output is the estimated FVC. First, before running the machine learning algorithms, we used R to normalize all features in the multi-dimensional feature set (original spectral bands, multiple vegetation indices, and topographical factors) whose values were not in the range of 0–1. We then split the dataset randomly, with 70% used for training and 30% for validation. Next, each machine learning algorithm was trained on the training dataset to learn the relationship between the multi-dimensional features and the measured FVC. Finally, the FVC inversion accuracy of the trained algorithm was evaluated on the validation dataset.
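The preprocessing described here (scaling features to 0–1, then a random 70/30 split) can be sketched as follows. The study did this in R; the Python below is only an illustration, and min-max scaling, the function names, and the fixed random seed are assumptions, since the text does not name the normalization formula.

```python
import random


def min_max_normalize(column):
    """Scale one feature column to the range 0-1 (constant columns map to 0)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Randomly split rows into training and validation subsets."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = round(train_fraction * len(rows))
    return [rows[i] for i in indices[:cut]], [rows[i] for i in indices[cut:]]


# Hypothetical elevation values (m) as an example of a feature outside 0-1.
print(min_max_normalize([4000, 4500, 5000]))

train, validation = train_test_split(list(range(100)))
print(len(train), len(validation))
```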

Optimized MLR

MLR is based on two or more variables for regression analysis. It is considered to be an effective and more realistic statistical analysis method, which is widely used in the field of vegetation physiological structure parameter inversion [32]. The MLR model in this study was constructed and optimized based on the "stats" package of the R language platform and the multiple linear formula is as follows:

$$\text{FVC} = a + b_{1} x_{1} + b_{2} x_{2} + \cdots + b_{n} x_{n}$$

(4)

where a, b₁, b₂, … b_n are parameters to be optimized, FVC is the result of fractional vegetation cover predicted by MLR, and x₁, x₂ … x_n are feature variables in the multi-dimensional feature set.

Optimized BPNNs

BPNNs, proposed by Rumelhart et al. [65], are multi-layer feed-forward neural networks trained with the error back-propagation algorithm. They are widely used in the inversion of physiological structure parameters of vegetation [47]. BPNNs in this study were based on the "neuralnet" package of the R language platform. The weight attenuation parameter and threshold value in the BPNN algorithm were set to 0.01. In addition, the grid search method was used to tune the number of hidden layers of the BPNN algorithm and the number of neurons in each hidden layer. The setting range of the number of hidden layers was 1–5, and the setting range of the number of neurons in each hidden layer was 1–10. After cross-validating 10 times, the model training results showed that the optimal number of hidden layers was two, the optimal number of neurons in the first hidden layer was two, and the optimal number of neurons in the second hidden layer was four. The hidden layer activation function was set to tansig after the optimization of sigmoid, the output layer transfer function was set to purelin to make the constructed BPNNs suitable for the linear model, and trainlm was selected as the training function.

Optimized SVR

Support vector machines (SVMs) are machine learning algorithms based on statistical learning theory. They can achieve high accuracy on classification and regression problems with high-dimensional features without relying on all the data to determine the decision hyperplane [18]. Support vector regression (SVR) is the application of the SVM method to regression [84]. SVR in this study is based on the LIBSVM interface in the "e1071" package of the R language platform, and the FVC for the source area of the Yellow River Basin was predicted via regression. The SVM type was set to e-SVR, the loss function P was 0.01, and the kernel function type was radial basis function (RBF). To achieve a better prediction result for SVR, the grid search method was used to optimize the RBF kernel parameter (gamma) and penalty coefficient (cost) in the SVR algorithm. The range of gamma was set to 0.5–4, the range of cost was set to 0.5–8, and the step length of both gamma and cost was 0.5. After cross-validating 10 times, the model training results showed that the optimal gamma and cost values were 0.5 and 4, respectively.
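The grid search over gamma and cost can be sketched generically. In the sketch below, `grid_search` and `frange` are illustrative helpers, and `toy_cv_rmse` is a made-up placeholder for the cross-validated RMSE that an actual SVR fit would return for each parameter pair; it does not reproduce the study's error surface or its optimum.

```python
from itertools import product


def frange(start, stop, step):
    """Inclusive list of evenly spaced floats from start to stop."""
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination; return (best_params, best_score),
    where lower scores are better."""
    names = list(param_grid)
    best = None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if best is None or score < best[1]:
            best = (params, score)
    return best


# Ranges quoted in the text: gamma 0.5-4 and cost 0.5-8, both in steps of 0.5.
svr_grid = {"gamma": frange(0.5, 4, 0.5), "cost": frange(0.5, 8, 0.5)}


def toy_cv_rmse(gamma, cost):
    # Hypothetical stand-in for the mean RMSE over cross-validation folds.
    return (gamma - 1.5) ** 2 + (cost - 2.0) ** 2


best_params, best_score = grid_search(svr_grid, toy_cv_rmse)
print(len(svr_grid["gamma"]) * len(svr_grid["cost"]), "combinations")
print(best_params, best_score)
```

With these ranges the grid holds 8 × 16 = 128 combinations, each of which costs one full cross-validation run, which is why the step length matters for training time.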

Optimized RF

The RF algorithm was proposed by Breiman in 2001. This algorithm is based on the bagging ensemble learning method, which integrates multiple decision trees into a forest and combines them to predict the final result [7]. The RF algorithm has a good anti-noise ability. It is simple, fast, easy to parallelize, and avoids overfitting to a certain extent [49]. In the RF regression algorithm, a decision tree represents a set of constraints. These constraints are organized hierarchically and applied from the root to the leaves in succession. Two parameters of the RF algorithm need to be defined: the number of decision trees (ntree) and the number of characteristic variables required to create branches (mtry). Based on the “randomForest” package of the R language platform, the grid search method was used to optimize the parameters of mtry and ntree in the RF regression algorithm. The range of mtry was set to 1–31 with a step size of 1, and the range of ntree was set to 100–2000 with a step size of 100. After cross-validating 10 times, the model training results showed that the optimal mtry and ntree values were 13 and 1200, respectively.

Feature selection

Feature selection directly affects the training speed and prediction performance of machine learning algorithms, which enables us to have a better understanding of the true distribution behind the multi-dimensional feature set. It is an important means to eliminate redundant information. If a feature is considered by different variable selection algorithms to have an important effect on the accuracy of the inversion result, it is a feature worthy of attention. In this study, we used Boruta, Sequential Forward Selection (SFS), and Permutation Importance-Recursive Feature Elimination (PI-RFE), which are three different feature selection methods applied to multi-dimensional feature sets to determine the appropriate dimension, eliminate redundant features, and obtain satisfactory FVC inversion accuracy.

Boruta is an all-relevant feature selection algorithm, and its main objective is to select all features related to the dependent variable [45]. The SFS algorithm is a greedy search algorithm that is used to reduce the initial multi-dimensional feature set to a low-dimensional feature set [21]. The main idea of the SFS algorithm is to automatically select the subset of features most relevant to the dependent variable, and to improve calculation efficiency and reduce generalization errors by removing irrelevant features. PI-RFE is an optimized RFE algorithm constructed in this research. RFE is a greedy algorithm that finds the best feature subset [27]. The main idea of the RFE algorithm is to repeatedly build the model to select the best feature, and then repeat this process on the remaining features until all the features are evaluated. PI makes one feature in the multi-dimensional dataset unavailable and characterizes the importance of that feature by the resulting decrease in accuracy of the inversion model [2]. In this study, the built-in weight parameters of the RFE algorithm were replaced with the variable importance determined by PI.
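The idea of driving RFE with permutation importance can be sketched as follows. This is not the authors' implementation: the k-nearest-neighbour regressor is a swapped-in stand-in for the machine learning model, a feature is made "unavailable" by shuffling its column in the held-out data, and the synthetic data and all function names are assumptions.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Stand-in regressor: mean target of the k nearest training rows."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return sum(train_y[i] for i in order[:k]) / k


def subset_rmse(train_X, train_y, test_X, test_y, cols):
    """Test RMSE of the stand-in model using only the feature columns in cols."""
    tr = [[row[c] for c in cols] for row in train_X]
    sq = [
        (knn_predict(tr, train_y, [row[c] for c in cols]) - y) ** 2
        for row, y in zip(test_X, test_y)
    ]
    return math.sqrt(sum(sq) / len(sq))


def permutation_importance(train_X, train_y, test_X, test_y, cols, rng):
    """Importance of each column = rise in test RMSE after shuffling it."""
    base = subset_rmse(train_X, train_y, test_X, test_y, cols)
    scores = {}
    for c in cols:
        shuffled = [row[c] for row in test_X]
        rng.shuffle(shuffled)
        permuted = [row[:c] + [v] + row[c + 1:] for row, v in zip(test_X, shuffled)]
        scores[c] = subset_rmse(train_X, train_y, permuted, test_y, cols) - base
    return scores


def pi_rfe(train_X, train_y, test_X, test_y, seed=0):
    """Repeatedly drop the least important feature; return the surviving
    columns, the elimination order, and the (n_features, RMSE) history."""
    rng = random.Random(seed)
    cols = list(range(len(train_X[0])))
    eliminated, history = [], []
    while len(cols) > 1:
        history.append((len(cols), subset_rmse(train_X, train_y, test_X, test_y, cols)))
        scores = permutation_importance(train_X, train_y, test_X, test_y, cols, rng)
        worst = min(cols, key=lambda c: scores[c])
        cols.remove(worst)
        eliminated.append(worst)
    history.append((1, subset_rmse(train_X, train_y, test_X, test_y, cols)))
    return cols, eliminated, history


# Synthetic data: the target depends on feature 0 only; features 1 and 2 are noise.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(120)]
y = [100 * row[0] for row in X]
kept, eliminated, history = pi_rfe(X[:84], y[:84], X[84:], y[84:])
print("kept:", kept, "eliminated in order:", eliminated)
```

The `history` list plays the role of an RMSE-versus-number-of-features curve: the feature count with the lowest RMSE marks the subset to retain.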

Accuracy assessment

In this study, the data set was randomly divided. Seventy percent was used as model training data while the remaining 30% was used as model test data. The correlation between the inversion results of the model test data and the measured results of FVC was analyzed. The determination coefficient (R²) and the root mean square error (RMSE) were considered to be reasonable evaluation indicators of accuracy. The performance of the above-mentioned inversion models of FVC was evaluated by the values of R² and RMSE. They were calculated by Eqs. (5) and (6) below:

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} (S_{i} - S_{i}^{\prime})^{2}}{\sum_{i=1}^{n} (S_{i} - \bar{S})^{2}}$$

(5)

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (S_{i} - S_{i}^{\prime})^{2}}$$

(6)

where n represents the number of samples, S_i represents the measured value at site i, S_i′ represents the corresponding value predicted by the model, and $\bar{S}$ represents the mean of the measured values. Generally, a higher R² and a smaller RMSE indicate better model performance.
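Eqs. (5) and (6) translate directly into code. The sketch below uses the mean of the measured values in the denominator of R², which is the standard definition of the coefficient of determination; the measured and predicted FVC values are made up for illustration.

```python
import math


def r_squared(measured, predicted):
    """Coefficient of determination, Eq. (5)."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((s - p) ** 2 for s, p in zip(measured, predicted))
    ss_tot = sum((s - mean_measured) ** 2 for s in measured)
    return 1 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root mean square error, Eq. (6)."""
    n = len(measured)
    return math.sqrt(sum((s - p) ** 2 for s, p in zip(measured, predicted)) / n)


# Hypothetical measured and predicted FVC values (%).
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(r_squared(measured, predicted), rmse(measured, predicted))
```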

To evaluate the sensitivity of the four machine learning algorithms to the training sample size, R² and RMSE were calculated on the verification samples for models trained on subsets of different sizes. The subsets were randomly drawn from the full set of training samples (270). The smallest subset contained 30 samples, and the size was increased in steps of 30 up to 270, giving nine data sets in total.
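The nine training subsets can be generated as follows. Whether the subsets were drawn independently or nested inside one another is not stated, so the independent draws, the function name, and the fixed seed are assumptions of this sketch.

```python
import random


def training_subsets(n_total=270, step=30, seed=0):
    """Map each subset size (30, 60, ..., n_total) to randomly drawn sample indices."""
    rng = random.Random(seed)
    indices = list(range(n_total))
    return {size: rng.sample(indices, size) for size in range(step, n_total + 1, step)}


subsets = training_subsets()
print(sorted(subsets))  # the nine subset sizes
```

Each subset would then be used to train the four algorithms, with R² and RMSE computed on the same verification samples so that only the training size varies.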

Results

Regression model method

Linear fitting showed that there was a good relationship between the four vegetation indices (NDVI, EVI, SAVI, and MSAVI) and the measured FVC (Table 3). The FVC obtained by linear fitting inversion showed that the NDVI fitting had the highest accuracy (R²: 0.717, RMSE: 10.8%), followed by SAVI (R²: 0.665, RMSE: 11.4%), MSAVI (R²: 0.642, RMSE: 11.7%), and EVI (R²: 0.635, RMSE: 12.1%), as shown in Table 4. The polynomial fitting relationship between vegetation indices and measured FVC was better than the linear fitting (Table 3). The FVC obtained by polynomial fitting inversion showed that the NDVI fitting had the highest accuracy (R²: 0.745, RMSE: 9.8%), followed by SAVI (R²: 0.725, RMSE: 10.3%), MSAVI (R²: 0.724, RMSE: 10.5%), and EVI (R²: 0.715, RMSE: 11.8%), as shown in Table 4.

Table 3 Linear and polynomial fitting relationships and inversion accuracy between the four vegetation indices (VIs) and FVC


Table 4 Features selected by different feature selection algorithms


Pixel dichotomy model

The FVC inversion results based on the pixel dichotomy model had good inversion accuracy. The R² values based on the ecological function area and on the 95% confidence interval were both 0.717, while the RMSE of the latter was lower than that of the former (Fig. 4).

Machine learning algorithms

FVC evaluation using four typical vegetation indices

FVC inversion results of the four machine learning algorithms showed that when the driving data were the four commonly used vegetation indices, the FVC inversion accuracy was higher than in the regression model method and pixel dichotomy model. The RF regression algorithm (R²: 0.861, RMSE: 9.5%) and SVR (R²: 0.830, RMSE: 10.4%) showed the highest accuracy, followed by BPNNs (R²: 0.764, RMSE: 12.1%) and MLR (R²: 0.689, RMSE: 13.7%), as shown in Fig. 5.

FVC estimation using a multi-dimensional feature set

FVC inversion results showed that the accuracy of the four machine learning algorithms had been improved after adding original spectral bands, 18 vegetation indices and DEM, aspect, and slope (Fig. 6). The R² was greater than 0.81 and the RMSE was less than 11.9%. RF had the highest inversion accuracy among the four machine learning algorithms with an R²: 0.890 and RMSE: 9.0%, followed by SVR (R²: 0.849, RMSE: 10.6%) and BPNNs (R²: 0.820, RMSE: 11.6%). MLR had the lowest inversion accuracy with an R²: 0.812 and RMSE: 11.9%.

Optimal feature subset and feature importance

The results of feature selection for multi-dimensional feature sets based on three feature selection algorithms showed that 22 features in the Boruta model were retained: DEM, VARI, slope, ARVI, NDVI, SR, TDVI, IPVI, GARI, b7, MSR, OSAVI, b2, b4, NLI, MNLI, GNDVI, GRVI, RDVI, b6, aspect, and EVI. In the SFS algorithm, 15 features were retained: DEM, slope, VARI, b7, ARVI, b4, b2, OSAVI, GARI, aspect, b6, IPVI, SR, MSR, and NDVI. In the PI-RFE algorithm, 18 features were retained: DEM, VARI, slope, b7, aspect, b2, b4, ARVI, OSAVI, NDVI, SR, GARI, TDVI, NDVI, IPVI, b1, MSR, and b6. Across all feature selection algorithms, it was consistently revealed that the most important feature was DEM, followed by slope and VARI. A comprehensive comparison of the three algorithms found that among vegetation indices, eight vegetation indices (VARI, ARVI, SR, NDVI, IPVI, GARI, MSR, and OSAVI) were selected as important features. Among reflectance bands, four reflectance bands (b2, b4, b6, and b7) were selected as important features. Among topographical factors, DEM, slope and aspect were all retained as important features (Fig. 7).

Performance evaluation of feature selection algorithms

The variation trend of RMSE with the number of features in the three different feature selection algorithms is shown in Fig. 8. With Boruta-based feature selection, the RMSE dropped sharply to 11.7% as the number of features increased from 1 to 11; the fluctuation then decreased, and the RMSE reached its lowest value of 11.3% at 22 features. With SFS-based feature selection, the RMSE dropped sharply to 11.0% as the number of features increased from 1 to 5; the fluctuation then decreased, and the RMSE reached its lowest value of 10.6% at 15 features. With PI-RFE-based feature selection, the RMSE dropped sharply to 10.1% as the number of features increased from 1 to 6 and then maintained a steady downward trend, reaching its lowest value of 9.8% at 18 features.

FVC inversion based on optimized machine learning algorithms and feature subset

The FVC inversion results of the four machine learning algorithms after parameter tuning showed that when the driving data was the optimized feature subset from PI-RFE-based variable selection, the accuracy of FVC inversion was greatly improved compared with the use of multi-dimensional feature sets and the original machine learning algorithms (R² was at least 0.833 and RMSE was at most 11.8%), as shown in Fig. 9. RF had the highest inversion accuracy among the four machine learning algorithms with an R² of 0.917 and an RMSE of 7.9%, followed by SVR (R²: 0.870, RMSE: 9.8%) and BPNNs (R²: 0.852, RMSE: 10.5%). MLR had the lowest inversion accuracy with an R² of 0.833 and an RMSE of 11.8%.

Computational efficiency

Training and estimation times differed clearly among the four machine learning algorithms (Table 5). SVR had the longest training and estimation times (168.84 and 182.82 s, respectively), followed by MLR and BPNNs. The RF regression algorithm had the shortest training and estimation times, at 87.51 and 99.35 s, respectively.
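How the times in Table 5 were measured is not stated here. One simple way to time a single training/estimation cycle is sketched below, assuming wall-clock timing with `time.perf_counter`; the `fit_mean` and `predict_mean` functions are hypothetical stand-ins for a real regressor.

```python
import time


def time_fit_predict(fit, predict, train_X, train_y, test_X):
    """Return (training seconds, estimation seconds) for one fit/predict cycle."""
    start = time.perf_counter()
    model = fit(train_X, train_y)
    train_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predict(model, test_X)
    predict_seconds = time.perf_counter() - start
    return train_seconds, predict_seconds


# Stand-in "model" (assumption): predicts the training mean for every sample.
def fit_mean(train_X, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, test_X):
    return [model for _ in test_X]


train_s, predict_s = time_fit_predict(
    fit_mean, predict_mean, [[0.1], [0.2]], [10.0, 20.0], [[0.3]]
)
```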

Table 5 Training/estimation time required for one iteration of the machine learning algorithms


Sensitivity to training sample size

The sensitivity test showed that the four machine learning algorithms responded differently as the training sample size increased (Fig. 10). When the training sample size was small, all four algorithms were sensitive to changes in it, and the RF and SVR regression algorithms were more sensitive than MLR and BPNNs. As the training sample size increased, this sensitivity gradually decreased. Once the training sample size exceeded 120, the four algorithms stabilized: R² and RMSE no longer changed noticeably with further increases in training sample size.

In addition, when the training sample size was small and fixed, the four machine learning algorithms differed in their sensitivity (Fig. 10): the RF regression algorithm was the most robust and MLR the least. When the training sample size was large, however, the differences among the four algorithms were not obvious.
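A sample-size sensitivity test of this kind can be sketched as a learning curve: fit on random subsets of increasing size and record the error on a fixed test set. The example below is a minimal illustration under stated assumptions, using a one-variable least-squares line and synthetic data rather than the study's models and measurements; the function names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def sample_size_curve(xs, ys, test_xs, test_ys, sizes, repeats=20, seed=0):
    """Mean test RMSE for each training size, averaged over random subsets."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    curve = {}
    for n in sizes:
        errors = []
        for _ in range(repeats):
            pick = rng.sample(indices, n)
            a, b = fit_line([xs[i] for i in pick], [ys[i] for i in pick])
            sq = [(a + b * x - y) ** 2 for x, y in zip(test_xs, test_ys)]
            errors.append(math.sqrt(sum(sq) / len(sq)))
        curve[n] = sum(errors) / len(errors)
    return curve


# Synthetic stand-in (assumption) for NDVI (x) and FVC in percent (y).
_rng = random.Random(1)
ndvi = [_rng.uniform(0.1, 0.8) for _ in range(200)]
fvc = [100.0 * x + _rng.gauss(0.0, 8.0) for x in ndvi]
curve = sample_size_curve(
    ndvi[:150], fvc[:150], ndvi[150:], fvc[150:], sizes=[30, 60, 90, 120, 150]
)
```

Repeating the random draw at each size also gives a spread of errors, which is one way to look at robustness for a fixed sample size.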

Discussion

Accuracy evaluation of different FVC inversion methods

Previous studies have shown that the EVI, SAVI, and MSAVI can, to some extent, account for changes in the optical characteristics of the background and correct for atmospheric and soil background effects, which the NDVI cannot [1]. However, we found that the NDVI achieved the highest FVC inversion accuracy (R²: 0.717, RMSE: 11.7%) among the four vegetation indices (NDVI, EVI, SAVI, and MSAVI), which indicates that the NDVI was more suitable for FVC inversion in alpine grassland than the EVI, SAVI, and MSAVI. A likely reason is that biomass per unit area in alpine grassland is limited, so NDVI saturation had no obvious influence [14].
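For reference, the four indices compared above can be computed from band reflectances as sketched below. The formulas are the commonly published ones; the coefficients (EVI gain 2.5 and constants 6, 7.5, 1; SAVI soil factor 0.5) are the usual defaults and are assumed here, since the exact definitions used in this study appear outside this section.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced vegetation index with the usual MODIS coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index; soil_factor is the adjustment term L."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def msavi(nir, red):
    """Modified soil-adjusted vegetation index (closed form, no L parameter)."""
    term = 2.0 * nir + 1.0
    return (term - math.sqrt(term ** 2 - 8.0 * (nir - red))) / 2.0
```

The soil terms in SAVI and MSAVI matter most over sparse cover, where bare ground contributes more of the pixel signal.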

The pixel dichotomy model is another commonly used method for FVC remote sensing inversion. The key to constructing the model is the determination of the end-members. In general, NDVI_s and NDVI_v determined from measured spectral data yield a higher FVC inversion accuracy. However, the special climate and topography of the QTP led to some deviations in data collection, which affected the final inversion results [46]. In addition, end-member determination was easily influenced by factors such as soil type, vegetation type, and chlorophyll content. NDVI_s and NDVI_v determined from the statistical results of ecological function areas and from the 95% confidence interval have both been reported to give high FVC inversion accuracy [4, 40]. This study showed that the pixel dichotomy model based on the 95% confidence interval for NDVI_s and NDVI_v was more suitable for FVC inversion in alpine grassland than that based on the statistical results of ecological function areas. A possible reason is that the ecological function area method is a universal one, whereas the 95% confidence interval method derives NDVI_s and NDVI_v from the statistics of NDVI values in the study area itself. Therefore, when applied to a particular vegetation type or area, the former would give a lower FVC inversion accuracy than the latter.
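The pixel dichotomy model can be sketched as follows. The linear mixing formula FVC = (NDVI - NDVI_s) / (NDVI_v - NDVI_s) is the standard form; reading the "95% confidence interval" as the 5th and 95th percentiles of the scene NDVI is an assumption made for this illustration, and the NDVI values are hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def pixel_dichotomy_fvc(ndvi, ndvi_soil, ndvi_veg):
    """FVC = (NDVI - NDVI_s) / (NDVI_v - NDVI_s), clipped to [0, 1]."""
    fvc = (ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)
    return max(0.0, min(1.0, fvc))


# Hypothetical NDVI values for a study area.
scene_ndvi = [0.05, 0.10, 0.20, 0.35, 0.50, 0.65, 0.80, 0.85]
ndvi_s = percentile(scene_ndvi, 5)    # assumed bare-soil end-member
ndvi_v = percentile(scene_ndvi, 95)   # assumed full-vegetation end-member
fvc_values = [pixel_dichotomy_fvc(v, ndvi_s, ndvi_v) for v in scene_ndvi]
```

Because the end-members come from the scene itself, pixels below NDVI_s or above NDVI_v are clipped to 0 and 1 rather than producing out-of-range cover values.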

In recent years, machine learning algorithms have been widely used to invert vegetation physiological parameters from satellite remote sensing images [39, 55]. In this study, four commonly used vegetation indices (NDVI, EVI, SAVI, and MSAVI) were used as driving data for FVC inversion. Among the four machine learning algorithms, RF obtained the highest inversion accuracy (R²: 0.861, RMSE: 9.5%) and MLR the lowest (R²: 0.689, RMSE: 13.7%), while SVR and BPNNs also achieved good accuracy. Our findings suggest that machine learning algorithms performed better than the regression model method and the pixel dichotomy model in the FVC inversion of alpine grassland, and that they can be used for high-precision FVC inversion of alpine grassland [85].

Factors that influence FVC inversion in machine learning algorithms

Driving data directly affect the accuracy of machine learning predictions [69]. This study selected original spectral reflectance bands, topographic factors, and multiple vegetation indices as the driving data for FVC inversion. Compared with the four commonly used vegetation indices, this multi-dimensional feature set clearly improved the alpine grassland FVC inversion accuracy of all four machine learning algorithms. The accuracy of the MLR algorithm improved the most, which indicates that topographic factors and various vegetation indices are also highly correlated with the FVC of alpine grassland. However, because machine learning algorithms are “black box” models, they cannot by themselves explain how individual features in the driving data affect the FVC inversion results [86].

Therefore, we further quantitatively evaluated the importance of each feature in the driving data by its influence on the accuracy of FVC inversion. The results showed that the four commonly used vegetation indices and the original spectral reflectance bands were not the ideal driving data. In fact, the three topographic factors played a more important role in the accuracy of the inversion results than the other features. DEM ranked first among all features, because changes in altitude directly affect temperature, precipitation, solar radiation, and other factors closely related to vegetation growth. In addition, vegetation indices such as VARI, ARVI, and SR were more important than the four commonly used vegetation indices. This shows that the choice of driving data should not be underestimated, and that the driving data for FVC inversion should be selected flexibly according to region, vegetation type, and season [44].

Introducing a multi-dimensional feature set inevitably brings redundant features. To keep redundant features from reducing the computational efficiency and inversion accuracy of the algorithms, and to avoid the limitations of relying on a single feature selection method, three feature selection algorithms were compared. Among them, the PI-RFE feature selection algorithm constructed in this study had the best dimensionality reduction performance and retained 15 features as the input data of the machine learning algorithms. Furthermore, the built-in parameters of machine learning algorithms have a great impact on their performance [13]. The grid search method was used in this study to tune the built-in parameters of the four machine learning algorithms, which avoided the uncertainty of manually choosing parameter values and achieved better model accuracy. It is worth mentioning that the regression line drifted farther from the 1:1 line when the RF algorithm was used for FVC inversion based on the full multi-dimensional feature set, which may reflect over-fitting caused by the additional features; after feature selection, this no longer occurred. This indicates that feature selection and parameter tuning improved the computational efficiency of the machine learning algorithms while further improving the FVC inversion accuracy (the R² of RF was higher than 0.90), and also provided better model parameter choices for alpine grassland FVC inversion.
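Grid search, as used for parameter tuning here, evaluates every combination in a predefined parameter grid and keeps the one with the best cross-validated score. The sketch below is a minimal illustration only: it assumes a k-nearest-neighbour regressor with two hypothetical hyper-parameters (`k` and `power`) and synthetic data, not the algorithms or grids of this study.

```python
import itertools
import math
import random


def knn_predict(train_X, train_y, row, k, power):
    """Mean target of the k nearest rows under a Minkowski-style distance."""
    dists = sorted(
        (sum(abs(a - b) ** power for a, b in zip(tx, row)), ty)
        for tx, ty in zip(train_X, train_y)
    )
    return sum(ty for _, ty in dists[:k]) / k


def cv_rmse(X, y, folds, k, power):
    """Average RMSE over contiguous cross-validation folds."""
    fold_size = len(X) // folds
    scores = []
    for f in range(folds):
        lo, hi = f * fold_size, (f + 1) * fold_size
        train_X, train_y = X[:lo] + X[hi:], y[:lo] + y[hi:]
        sq = [
            (knn_predict(train_X, train_y, row, k, power) - target) ** 2
            for row, target in zip(X[lo:hi], y[lo:hi])
        ]
        scores.append(math.sqrt(sum(sq) / len(sq)))
    return sum(scores) / len(scores)


def grid_search(X, y, grid, folds=5):
    """Try every parameter combination; return (best params, best CV RMSE)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = cv_rmse(X, y, folds, **params)
        if best is None or score < best[1]:
            best = (params, score)
    return best


# Synthetic data (assumption): two features with a linear effect plus noise.
_rng = random.Random(7)
X = [[_rng.random(), _rng.random()] for _ in range(100)]
y = [50.0 * a + 20.0 * b + _rng.gauss(0.0, 2.0) for a, b in X]
best_params, best_score = grid_search(X, y, {"k": [1, 3, 5, 9], "power": [1, 2]})
```

Scoring each combination by cross-validation rather than by training error is what guards the tuned model against the over-fitting discussed above.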

Evaluation of efficiency and sensitivity for machine learning algorithms

The computational efficiency of machine learning algorithms is considered an important evaluation criterion for remote sensing inversion of FVC at high spatial and temporal dimensions [34]. In this study, the training and prediction times of the four machine learning algorithms differed greatly: those of the RF algorithm were the shortest and those of the SVR algorithm the longest. These findings indicate that SVR is not suitable for generating long time-series products [78]. The sensitivities of the machine learning algorithms to the training sample size also differed. As the number of training samples gradually increased, the R² of the four algorithms first increased and the RMSE first decreased, and both then leveled off. The inversion accuracy of the RF and SVR algorithms changed markedly when the sample size increased from 30 to 120, whereas that of the BPNN algorithm remained relatively stable. Our results suggest that RF and SVR are more sensitive to sample size than the other machine learning algorithms, whereas BPNNs are less sensitive and are therefore well suited to FVC inversion with a small sample size. The robustness of the machine learning algorithms was evaluated by analyzing different training sample sizes. We found that the robustness of the four algorithms differed clearly when the sample size was small, and that increasing the training sample size clearly improved stability and reduced these differences.

Conclusion

In this study, machine learning algorithms, which performed best among the three commonly used FVC inversion methods, were optimized. In addition, a multi-dimensional feature set was constructed, and feature selection algorithms were used to reduce its dimensionality while quantitatively evaluating the importance of different features for the FVC of alpine grassland. Finally, the optimized algorithms and multi-dimensional features were used to improve the estimation of alpine grassland FVC, and the accuracy was verified with a large amount of measured data. The main conclusions are as follows:

(1) With four typical vegetation indices as driving data, the machine learning algorithms performed best among the three FVC inversion methods. Compared with the four typical vegetation indices, the multi-dimensional feature set constructed in this study improved the FVC inversion accuracy of the four machine learning algorithms.
(2) Topographic factors (DEM, slope, and aspect) and several vegetation indices (VARI, ARVI, SR, and NDVI) played important roles in FVC inversion. The constructed PI-RFE feature selection algorithm had both the best dimensionality reduction effect and the highest accuracy.
(3) The combination of feature selection and parameter tuning effectively improved the FVC inversion accuracy of the four machine learning algorithms. The optimized RF algorithm had the highest inversion accuracy and computational efficiency, while the BPNN algorithm was more stable.

In conclusion, the proposed FVC inversion method of alpine grassland is reliable and suitable for operationally producing FVC data. At the same time, it is crucial for the quantitative monitoring of the ecological environment.

Availability of data and materials

The remotely sensed data and field-measured data used in this study are available upon approval from Dr. Jianjun Chen, College of Geomatics and Geoinformation, Guilin University of Technology, China.

References

Ahmad F. A review of remote sensing data change detection: comparison of Faisalabad and Multan Districts, Punjab Province, Pakistan. J Geogr Reg Plann. 2012;5(9):236–51. https://doi.org/10.5897/JGRP11.121.
Altmann A, Tolosi L, Sander O, et al. Permutation importance: a corrected feature importance measure. Bioinformatics. 2010;26(10):1340–7. https://doi.org/10.1093/bioinformatics/btq134.
Bannari A, Asalhi H, Teillet PM. Transformed difference vegetation index (TDVI) for vegetation cover mapping. In: IEEE international geoscience and remote sensing symposium. IEEE; 2002. p. 5. https://doi.org/10.1109/IGARSS.2002.1026867.
Bauer T, Strauss P. A rule-based image analysis approach for calculating residues and vegetation cover under field conditions. CATENA. 2014;113:363–9. https://doi.org/10.1016/j.catena.2013.08.022.
Birth GS, McVey GR. Measuring the color of growing turf with a reflectance spectrophotometer. Agron J. 1968;60:640–3.
Boegh E, Soegaard H, Broge N, et al. Airborne multispectral data for quantifying leaf area index, nitrogen concentration, and photosynthetic efficiency in agriculture. Remote Sens Environ. 2002;81(2–3):179–93. https://doi.org/10.1016/S0034-4257(01)00342-X.
Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.
Bunting EL, Munson SM, Bradford JB. Assessing plant production responses to climate across water-limited regions using Google Earth Engine. Remote Sens Environ. 2019;233: 111379. https://doi.org/10.1016/j.rse.2019.111379.
Castaldi F, Casa R, Pelosi F, et al. Influence of acquisition time and resolution on wheat yield estimation at the field scale from canopy biophysical variables retrieved from SPOT satellite data. Int J Remote Sens. 2015;36(9):2438–59. https://doi.org/10.1080/01431161.2015.1041174.
Chen J, Zhao X, Zhang H, et al. Evaluation of the accuracy of the field quadrat survey of alpine grassland fractional vegetation cover based on the satellite remote sensing pixel scale. ISPRS Int J Geo-Inf. 2019;8(11):497. https://doi.org/10.3390/ijgi8110497.
Chen J, Yi S, Qin Y. The contribution of plateau pika disturbance and erosion on patchy alpine grassland soil on the Qinghai-Tibetan Plateau: implications for grassland restoration. Geoderma. 2017;297:1–9. https://doi.org/10.1016/j.geoderma.2017.03.001.
Chen J, Yi S, Qin Y, et al. Improving estimates of fractional vegetation cover based on UAV in alpine grassland on the Qinghai-Tibetan Plateau. Int J Remote Sens. 2016;37(8):1922–36. https://doi.org/10.1080/01431161.2016.1165884.
Chen J, Sun G, Xing M, et al. A parameter optimization model for geosynchronous SAR sensor in aspects of signal bandwidth and integration time. IEEE Geosci Remote S. 2016;13(9):1374–8. https://doi.org/10.1109/lgrs.2016.2587318.
Chen W, Sakai T, Moriya K, et al. Estimation of vegetation coverage in semi-arid sandy land based on multivariate statistical modeling using remote sensing data. Environ Model Assess. 2013;18(5):547–58. https://doi.org/10.1007/s10666-013-9359-1.
Chen W, Li X, Wang Y, et al. Forested landslide detection using LiDAR data and the random forest algorithm: a case study of the Three Gorges, China. Remote Sens Environ. 2014;152:291–301. https://doi.org/10.1016/j.rse.2014.07.004.
Chen Y, Shi P, Li X, et al. A combined approach for estimating vegetation cover in urban/suburban environments from remotely sensed data. Comput Geosc. 2006;32(9):1299–309. https://doi.org/10.1016/j.cageo.2005.11.011.
Cheng G, Wu T. Responses of permafrost to climate change and their environmental significance, Qinghai-Tibet Plateau. J Geophys Res-Earth. 2007. https://doi.org/10.1029/2006JF000631.
Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20:273–97. https://doi.org/10.1007/BF00994018.
Crippen RE. Calculating the vegetation index faster. Remote Sens Environ. 1990;34(1):71–3. https://doi.org/10.1016/0034-4257(90)90085-Z.
Deines JM, Kendall AD, Crowley MA, et al. Mapping three decades of annual irrigation across the US High Plains Aquifer using Landsat and Google Earth Engine. Remote Sens Environ. 2019. https://doi.org/10.1016/j.rse.2019.111400.
Demir B, Minello L, Bruzzone L. Definition of effective training sets for supervised classification of remote sensing images by a novel cost-sensitive active learning method. IEEE T Geosci Remote. 2014;52(2):1272–84. https://doi.org/10.1109/tgrs.2013.2249522.
Ding Y, Zheng X, Zhao K, et al. Quantifying the impact of NDVIsoil determination methods and NDVIsoil variability on the estimation of fractional vegetation cover in Northeast China. Remote Sens. 2016;8(1):29. https://doi.org/10.3390/rs8010029.
Gao L, Wang X, Johnson BA, et al. Remote sensing algorithms for estimation of fractional vegetation cover using pure vegetation index values: a review. ISPRS J Photogramm. 2020;159:364–77.
García-Haro FJ, Gilabert MA, Meliá J. Linear spectral mixture modelling to estimate vegetation amount from optical spectral data. Int J Remote Sens. 2007;17(17):3373–400. https://doi.org/10.1080/01431169608949157.
García-Haro FJ, Campos-Taberner M, Muñoz-Marí J, et al. Derivation of global vegetation biophysical parameters from EUMETSAT Polar System. ISPRS J Photogramm. 2018;139:57–74. https://doi.org/10.1016/j.isprsjprs.2018.03.005.
Ge J, Meng B, Liang T, et al. Modeling alpine grassland cover based on MODIS data and support vector machine regression in the headwater region of the Huanghe River, China. Remote Sens Environ. 2018;218:162–73. https://doi.org/10.1016/j.rse.2018.09.019.
Georganos S, Grippa T, Vanhuysse S, et al. Less is more: optimizing classification performance through feature selection in a very-high-resolution remote sensing object-based urban application. GISci Remote Sens. 2017;55(2):221–42. https://doi.org/10.1080/15481603.2017.1408892.
Gitelson AA, Stark R, Grits U, et al. Vegetation and soil lines in visible spectral space: a concept and technique for remote estimation of vegetation fraction. Int J Remote Sens. 2010;23(13):2537–62. https://doi.org/10.1080/01431160110107806.
Gitelson AA, Kaufman YJ, Merzlyak MN. Use of a green channel in remote sensing of global vegetation from EOS-MODIS. Remote Sens Environ. 1996;58(3):289–98. https://doi.org/10.1016/S0034-4257(96)00072-7.
Gitelson AA, Merzlyak MN. Remote sensing of chlorophyll concentration in higher plant leaves. Adv Space Res. 1998;22(5):689–92. https://doi.org/10.1016/S0273-1177(97)01133-2.
Goel NS, Qin W. Influences of canopy architecture on relationships between various vegetation indices and LAI and Fpar: a computer simulation. Remote Sens Rev. 1994;10(4):309–47. https://doi.org/10.1080/02757259409532252.
Guerschman JP, Michael JH, Luigi JR, et al. Estimating fractional cover of photosynthetic vegetation, non-photosynthetic vegetation and bare soil in the Australian tropical savanna region upscaling the EO-1 Hyperion and MODIS sensors. Remote Sens Environ. 2009;113(5):928–45. https://doi.org/10.1016/j.rse.2009.01.006.
Guo X, Shao Q, Li Y, et al. Application of UAV remote sensing for a population census of large wild herbivores—taking the headwater region of the yellow river as an example. Remote Sens. 2018. https://doi.org/10.3390/rs10071041.
Han M, Liu B. Ensemble of extreme learning machine for remote sensing image classification. Neurocomputing. 2015;149:65–70. https://doi.org/10.1016/j.neucom.2013.09.070.
Huete A, Didan K, Miura T, et al. Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens Environ. 2002;83(1–2):195–213. https://doi.org/10.1016/S0034-4257(02)00096-2.
Huete AR. A soil-adjusted vegetation index (SAVI). Remote Sens Environ. 1988;25(3):295–309. https://doi.org/10.1016/0034-4257(88)90106-X.
Iizuka K, Kato T, Silsigia S, et al. Estimating and examining the sensitivity of different vegetation indices to fractions of vegetation cover at different scaling grids for early stage acacia plantation forests using a fixed-wing UAS. Remote Sens. 2019. https://doi.org/10.3390/rs11151816.
Jia K, Li Y, Liang S, et al. Combining estimation of green vegetation fraction in an arid region from Landsat 7 ETM+ data. Remote Sens. 2017. https://doi.org/10.3390/rs9111121.
Jia K, Liang S, Gu X, et al. Fractional vegetation cover estimation algorithm for Chinese GF-1 wide field view data. Remote Sens Environ. 2016;177:184–91. https://doi.org/10.1016/j.rse.2016.02.019.
Jia K, Liang S, Liu S, et al. Global land surface fractional vegetation cover estimation using general regression neural networks from MODIS surface reflectance. IEEE T Geosci Remote. 2015;53(9):4787–96. https://doi.org/10.1109/tgrs.2015.2409563.
Jiang Z, Huete AR, Didan K, et al. Development of a two-band enhanced vegetation index without a blue band. Remote Sens Environ. 2008;112(10):3833–45. https://doi.org/10.1016/j.rse.2008.06.006.
Kaufman YJ, Tanre D. Atmospherically resistant vegetation index (ARVI) for EOS-MODIS. IEEE T Geosci Remote. 1992;30(2):261–70. https://doi.org/10.1109/36.134076.
Kauth RJ, Thomas GS. The tasselled cap—a graphic description of the spectral temporal development of agricultural crops as seen by LANDSAT. In: Proceedings of the LARS 1976 Symposium of machine processing of remotely-sensed data, West Lafayette. IN: Purdue University. p 4B41–4B51.
Korhonen LH, Packalen P, Rautiainen M. Comparison of Sentinel-2 and Landsat 8 in the estimation of boreal forest canopy cover and leaf area index. Remote Sens Environ. 2017;195:259–74. https://doi.org/10.1016/j.rse.2017.03.021.
Kursa MB, Rudnicki WR. Feature selection with the Boruta package. J Stat Softw. 2010. https://doi.org/10.18637/jss.v036.i11.
Lehnert LW, Meyer H, Wang Y, et al. Retrieval of grassland plant coverage on the Tibetan Plateau based on a multi-scale, multi-sensor and multi-method approach. Remote Sens Environ. 2015;164:197–207. https://doi.org/10.1016/j.rse.2015.04.020.
Li C, Zhu X, Wei Y, et al. Estimating apple tree canopy chlorophyll content based on Sentinel-2A remote sensing imaging. Sci Rep. 2018;8(1):3756. https://doi.org/10.1038/s41598-018-21963-0.
Liang S, Ge S, Wan L, et al. Characteristics and causes of vegetation variation in the source regions of the Yellow River, China. Int J Remote Sens. 2011;33(5):1529–42. https://doi.org/10.1080/01431161.2011.582187.
Liaw A, Wiener M. Classification and regression by randomForest. R News. 2002.
Liu J, Chen J, Qin Q, et al. Patch pattern and ecological risk assessment of alpine grassland in the source region of the Yellow River. Remote Sens. 2020;12:3460. https://doi.org/10.3390/rs12203460.
Ma Y, Wu H, Wang L, et al. Remote sensing big data computing: Challenges and opportunities. Future Gener Comp Sy. 2015;51:47–60. https://doi.org/10.1016/j.future.2014.10.029.
Maimaitijiang M, Ghulam A, Sidike P, et al. Unmanned Aerial System (UAS)-based phenotyping of soybean using multi-sensor data fusion and extreme learning machine. ISPRS J Photogramm. 2017;134:43–58. https://doi.org/10.1016/j.isprsjprs.2017.10.011.
Melville B, Fisher A, Lucieer A. Ultra-high spatial resolution fractional vegetation cover from unmanned aerial multispectral imagery. Int J Appl Earth Obs. 2019;78:14–24. https://doi.org/10.1016/j.jag.2019.01.013.
Meusburger K, Konz N, Schaub M, et al. Soil erosion modelled with USLE and PESERA using QuickBird derived vegetation parameters in an alpine catchment. Int J Appl Earth Obs. 2010;12(3):208–15. https://doi.org/10.1016/j.jag.2010.02.004.
Omer G, Mutanga O, Abdel-Rahman E, et al. Empirical prediction of leaf area index (LAI) of endangered tree species in intact and fragmented indigenous forests ecosystems using WorldView-2 data and two robust machine learning algorithms. Remote Sens. 2016. https://doi.org/10.3390/rs8040324.
Otero V, Kerchove RVD, Satyanarayana B, et al. Managing mangrove forests from the sky: forest inventory using field data and Unmanned Aerial Vehicle (UAV) imagery in the Matang Mangrove Forest Reserve, peninsular Malaysia. Forest Ecol Manag. 2018;411:35–45. https://doi.org/10.1016/j.foreco.2017.12.049.
Patel NN, Angiuli E, Gamba P, et al. Multitemporal settlement and population mapping from Landsat using Google Earth Engine. Int J Appl Earth Obs. 2015;35:199–208. https://doi.org/10.1016/j.jag.2014.09.005.
Pinty B, Verstraete MM. GEMI: a non-linear index to monitor global vegetation from satellites. Vegetatio. 1992;101(1):15–20. https://doi.org/10.1007/BF00031911.
Qi J, Chehbouni A, Huete AR, et al. A modified soil adjusted vegetation index. Remote Sens Environ. 1994;48(2):119–26. https://doi.org/10.1016/0034-4257(94)90134-1.
Qin Y, Yang D, Gao B, et al. Impacts of climate warming on the frozen ground and eco-hydrology in the Yellow River source region, China. Sci Total Environ. 2017;605–606:830–41. https://doi.org/10.1016/j.scitotenv.2017.06.188.
Ren X, Dong Z, Hu G, et al. A GIS-based assessment of vulnerability to aeolian desertification in the source areas of the Yangtze and Yellow Rivers. Remote Sens. 2016. https://doi.org/10.3390/rs8080626.
Rondeaux G, Steven M, Baret F. Optimization of soil-adjusted vegetation indices. Remote Sens Environ. 1996;55(2):95–107. https://doi.org/10.1016/0034-4257(95)00186-7.
Roujean J, Breon F. Estimating PAR absorbed by vegetation from bidirectional reflectance measurements. Remote Sens Environ. 1995;51(3):375–84. https://doi.org/10.1016/0034-4257(94)00114-3.
Rouse JWJ, Haas RH, Schell JA, et al. Monitoring vegetation systems in the great plains with ERTS. In: third earth resources technology satellite-1 symposium, NASA, WA; 1973; p 309–17.
Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature. 1986;323(6088):533–6. https://doi.org/10.1038/323533a0.
Song W, Mu X, Ruan G, et al. Estimating fractional vegetation cover and the vegetation index of bare soil and highly dense vegetation with a physically based method. Int J Appl Earth Obs. 2017;58:168–76. https://doi.org/10.1016/j.jag.2017.01.015.
Sripada RP, Heiniger RW, White JG, et al. Aerial color infrared photography for determining early in-season nitrogen requirements in corn. Agron J. 2006;98(4):968–77. https://doi.org/10.2134/agronj2005.0200.
Tang L, He M, Li X. Verification of fractional vegetation coverage and NDVI of desert vegetation via UAVRS technology. Remote Sens. 2020. https://doi.org/10.3390/rs12111742.
Tao G, Jia K, Zhao X, et al. Generating high spatio-temporal resolution fractional vegetation cover by fusing GF-1 WFV and MODIS data. Remote Sens. 2019. https://doi.org/10.3390/rs11192324.
Tu Y, Jia K, Liang S, et al. Fractional vegetation cover estimation in heterogeneous areas by combining a radiative transfer model and a dynamic vegetation model. Int J Digit Earth. 2018;13(4):487–503. https://doi.org/10.1080/17538947.2018.1531438.
Tucker CJ. Red and photographic infrared linear combinations for monitoring vegetation. Remote Sens Environ. 1979;8(2):127–50. https://doi.org/10.1016/0034-4257(79)90013-0.
Verrelst J, Muñoz J, Alonso L, et al. Machine learning regression algorithms for biophysical parameter retrieval: opportunities for Sentinel-2 and -3. Remote Sens Environ. 2012;118:127–39. https://doi.org/10.1016/j.rse.2011.11.002.
Wang G, Wang Y, Li Y, et al. Influences of alpine ecosystem responses to climatic change on soil properties on the Qinghai-Tibet Plateau, China. CATENA. 2007;70(3):506–14. https://doi.org/10.1016/j.catena.2007.01.001.
Wang W, Ma X, Nizami SM, et al. Anthropogenic and biophysical factors associated with vegetation restoration in Changting, China. Forests. 2018. https://doi.org/10.3390/f9060306.
Williams M, Bell R, Spadavecchia L, et al. Upscaling leaf area index in an Arctic landscape through multiscale observations. Global Change Biol. 2008;14(7):1517–30. https://doi.org/10.1111/j.1365-2486.2008.01590.x.
Yao T, Wu F, Ding L, et al. Multispherical interactions and their effects on the Tibetan Plateau’s Earth system: a review of the recent researches. Natl Sci Rev. 2015;2(4):468–88. https://doi.org/10.1093/nsr/nwv070.
Yang K, Ye B, Zhou D, et al. Response of hydrological cycle to recent climate changes in the Tibetan Plateau. Clim Change. 2011;109(4):517–34. https://doi.org/10.1007/s10584-011-0099-4.
Yang L, Jia K, Liang S, et al. Comparison of four machine learning methods for generating the GLASS fractional vegetation cover product from MODIS data. Remote Sens. 2016. https://doi.org/10.3390/rs8080682.
Yang L, Jia K, Liang S, et al. A robust algorithm for estimating surface fractional vegetation cover from Landsat data. Remote Sens. 2017. https://doi.org/10.3390/rs9080857.
Yang Z, Willis P, Mueller R. Impact of band-ratio enhanced awifs image on crop classification accuracy. In: Proceedings of the pecora 17 remote sensing symposium. 2008.
Yi S. FragMAP: a tool for long-term and cooperative monitoring and analysis of small-scale habitat fragmentation using an unmanned aerial vehicle. Int J Remote Sens. 2016;38(8–10):2686–97. https://doi.org/10.1080/01431161.2016.1253898.
Yi S, Zhou Z, Ren S, et al. Effects of permafrost degradation on alpine grassland in a semi-arid basin on the Qinghai-Tibetan Plateau. Environ Res Lett. 2011. https://doi.org/10.1088/1748-9326/6/4/045403.
Younes N, Joyce KE, Northfield TD, et al. The effects of water depth on estimating Fractional Vegetation Cover in mangrove forests. Int J Appl Earth Obs. 2019. https://doi.org/10.1016/j.jag.2019.101924.
Yu K, Lenz-Wiedemann V, Chen X, et al. Estimating leaf chlorophyll of barley at different growth stages using spectral indices to reduce soil background and canopy structure effects. ISPRS J Photogramm. 2014;97:58–77. https://doi.org/10.1016/j.isprsjprs.2014.08.005.
Yuan H, Yang G, Li C, et al. Retrieving soybean leaf area index from unmanned aerial vehicle hyperspectral remote sensing: analysis of RF, ANN, and SVM regression models. Remote Sens. 2017. https://doi.org/10.3390/rs9040309.
Zabalza J, Ren J, Yang M, et al. Novel Folded-PCA for improved feature extraction and data reduction with hyperspectral imaging and SAR in remote sensing. ISPRS J Photogramm. 2014;93:112–22. https://doi.org/10.1016/j.isprsjprs.2014.04.006.
Zhang X, Liao C, Li J, et al. Fractional vegetation cover estimation in arid and semi-arid environments using HJ-1 satellite hyperspectral data. Int J Appl Earth Obs. 2013;21:506–12. https://doi.org/10.1016/j.jag.2012.07.003.
Zhang Y, Chen L, Wang Y, et al. Research on the contribution of urban land surface moisture to the alleviation effect of urban land surface heat based on Landsat 8 data. Remote Sens. 2015;7(8):10737–62. https://doi.org/10.3390/rs70810737.
Zhao W, Li A, Huang Q, et al. An improved method for assessing vegetation cooling service in regulating thermal environment: a case study in Xiamen, China. Ecol Indic. 2019;98:531–42. https://doi.org/10.1016/j.ecolind.2018.11.033.
Zhou Y, Dong J, Xiao X, et al. Continuous monitoring of lake dynamics on the Mongolian Plateau using all available Landsat imagery and Google Earth Engine. Sci Total Environ. 2019;689:366–80. https://doi.org/10.1016/j.scitotenv.2019.06.341.
Zhou Y, Li Z, Li J, et al. Glacier mass balance in the Qinghai-Tibet Plateau and its surroundings from the mid-1970s to 2000 based on Hexagon KH-9 and SRTM DEMs. Remote Sens Environ. 2018;210:96–112. https://doi.org/10.1016/j.rse.2018.03.020.
Zhou Z, Yi S, Chen J, et al. Responses of alpine grassland to climate warming and permafrost thawing in two basins with different precipitation regimes on the Qinghai-Tibetan Plateaus. Arct Antarct Alp Res. 2015;47(1):125–31. https://doi.org/10.1657/aaar0013-098.

Acknowledgements

Not applicable.

Funding

This study was supported by grants from the National Natural Science Foundation of China (41801030, 41901370, 41961065), the Natural Science Foundation of Guangxi Province (2018GXNSFBA281054, 2018GXNSFBA281075, GuikeAD19245032), and the BaGuiScholars program of the provincial government of Guangxi (Guoqing Zhou).

Author information

Authors and Affiliations

College of Geomatics and Geoinformation, Guilin University of Technology, No.12 Jiangan Road, Guilin, 541004, China
Xingchen Lin, Jianjun Chen, Haotian You & Xiaowen Han
Guangxi Key Laboratory of Spatial Information and Geomatics, Guilin University of Technology, 12 Jiangan Road, Guilin, 541004, China
Jianjun Chen, Haotian You & Xiaowen Han
Institute of Fragile Ecosystem and Environment, Nantong University, 999 Tongjing Road, Nantong, 226007, China
Shuhua Yi
State Key Laboratory of Cryospheric Science, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, 320 Donggang West Road, Lanzhou, 730000, China
Peiqing Lou & Yu Qin

Contributions

XL and JC: conceptualization, data processing, analysis, writing original draft; PL: data analysis, reviewing and editing; SY, YQ, HY, and XH: reviewing and editing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jianjun Chen.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Lin, X., Chen, J., Lou, P. et al. Improving the estimation of alpine grassland fractional vegetation cover using optimized algorithms and multi-dimensional features. Plant Methods 17, 96 (2021). https://doi.org/10.1186/s13007-021-00796-5

Received: 10 November 2020
Accepted: 07 September 2021
Published: 17 September 2021
DOI: https://doi.org/10.1186/s13007-021-00796-5
