Skewed distribution of leaf color RGB model and application of skewed parameters in leaf color description model
Plant Methods volume 16, Article number: 23 (2020)
Image processing techniques have been widely used in the analysis of leaf characteristics. Earlier techniques for processing digital RGB color images of plant leaves had several drawbacks, such as inadequate de-noising and the adoption of normal-probability statistical estimation models, which have few parameters and limited applicability.
We confirmed the skewed distribution characteristics of the red, green, blue and grayscale channels of tobacco leaf images. Twenty skewed-distribution parameters were computed, comprising the mean, median, mode, skewness and kurtosis of each of the four channels. We used the mean parameters to establish a stepwise regression model similar to earlier models. Other models, based on the median and skewness parameters, gave a more accurate RGB-based description and prediction, as well as a better fit to the SPAD value. Using more parameters improved the accuracy of the RGB model description and prediction, and extended its application range. Indeed, the skewed-distribution parameters can describe changes in leaf color depth and homogeneity.
The color histogram of the blade images follows a skewed distribution, whose parameters greatly enrich the RGB model and can describe changes in leaf color depth and homogeneity.
In recent years, high-throughput techniques for phenotype identification in greenhouses and fields have been proposed in combination with non-invasive imaging, spectroscopy, robotics, high-performance computing and other new technologies, to achieve higher resolution, accuracy and speed [1, 2]. With the increasing maturity of digital image technology and the rising popularity of high-resolution camera equipment, qualitative and quantitative descriptions of plant phenotypic traits using digital imaging techniques are becoming more feasible [3,4,5,6]. Digital cameras can record spectral leaf information in the visible color bands, with high resolution and at low cost. In addition, digital color images contain rich information on plant morphology, structure and leaf color. Digital leaf images are therefore often used to identify changes in leaf color [8,9,10].
The most commonly used color representation for digital color images is the RGB color model. For an RGB color image, three color sensors per pixel can be used to capture the intensity of light in the red, green and blue channels, respectively. Existing software tools, such as MATLAB, are used to process the resulting digital pictures. The study of RGB color models of plant leaves has a long history. After decades of development, the RGB color information of plant leaves has been exploited to determine chlorophyll content and to indicate changes in this content. To exploit the data further, researchers have suggested a number of RGB-based color features for determining chlorophyll levels in potato, rice, wheat, broccoli, cabbage, barley, tomatoes, quinoa and amaranth [15,16,17,18,19,20,21,22,23]. Many formulas have also been suggested to determine leaf chlorophyll content from RGB components, such as (RMean − BMean)/(RMean + BMean), GMean/(RMean + GMean + BMean), RMean/(RMean + GMean + BMean), GMean/RMean, RMean + GMean + BMean, RMean − BMean, RMean + BMean, RMean + GMean, and logsig((GMean − RMean/3 − BMean/3)/255). However, all of these formulas are built from the channel means alone, so the amount of information they carry remains small. This information scarcity has become a bottleneck in the application of RGB models, greatly limiting their use.
In the analysis of RGB data of leaf images, the cumulative frequency distributions of the RMean, GMean and BMean components have generally been assumed to follow a normal distribution. However, recent studies have reported that the cumulative frequency distributions of leaf colors follow skewed distributions. For example, Wu et al. found that the cumulative frequency of tea leaf color has a skewed distribution, and that the deviations for new and old leaves differed clearly. Also, the moisture condition of maize leaves is related to the deviation of the grayscale values in the RGB blade model. The asymmetry of a skewed distribution can be described by the partial frequency distributions of the skewed distribution curve. Several parameters can be derived from a skewed distribution, including the mean, median, mode, skewness, kurtosis and others.
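The mean-based indices listed above can be sketched in a few lines. The following Python snippet is an illustrative sketch only (the tools named in this article are MATLAB and SPSS); the function names are hypothetical, and "logsig" is assumed here to denote the logistic sigmoid 1/(1 + exp(−x)).

```python
import math


def logsig(x):
    """Logistic sigmoid, 1 / (1 + exp(-x)) (assumed meaning of 'logsig')."""
    return 1.0 / (1.0 + math.exp(-x))


def rgb_mean_indices(r_mean, g_mean, b_mean):
    """Return the mean-based colour indices as a dict (hypothetical helper).

    Assumes the channel means are positive, so no denominator is zero.
    """
    total = r_mean + g_mean + b_mean
    return {
        "(R-B)/(R+B)": (r_mean - b_mean) / (r_mean + b_mean),
        "G/(R+G+B)": g_mean / total,
        "R/(R+G+B)": r_mean / total,
        "G/R": g_mean / r_mean,
        "R+G+B": total,
        "R-B": r_mean - b_mean,
        "R+B": r_mean + b_mean,
        "R+G": r_mean + g_mean,
        "logsig((G-R/3-B/3)/255)": logsig(
            (g_mean - r_mean / 3 - b_mean / 3) / 255
        ),
    }


# Made-up channel means, for illustration only.
indices = rgb_mean_indices(90.0, 120.0, 30.0)
```

Every index in this dictionary is a function of only three numbers per leaf, which is the information bottleneck described above.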
The SPAD leaf chlorophyll meter is one of the most widely used hand-held meters for rapid and non-destructive assessment of the chlorophyll content in many crops . In this paper, we analyzed the frequency distributions of the red, green, blue and grayscale channels in RGB leaf images and confirmed the skewed characteristics of these distributions. By extracting relevant distribution parameters, models are established for the correlation of the color characteristic parameters and the SPAD chlorophyll concentration values. When the skewness parameter was exploited, we found that both the fitting degree and the prediction accuracy were greatly improved. The proposed spatial model could predict the SPAD values more accurately, and explain the physiological significance of the leaf color changes. We hope that this work would provide researchers with a new method for the analysis of blade color patterns in RGB digital images.
Materials and Methods
In this work, the tobacco was planted in pots on November 25, 2017 at Shanghang County Township, Fujian, China (24°57′N, 116°30′E). The 50-day-old seedlings were transferred to the field. After 15 days, 400 new tobacco leaves that showed consistent normal growth and leaf color, with no signs of pests or diseases, were tagged. A total of 100 leaves were collected at each of 40, 50, 60 and 65 days of leaf age. For each leaf, the SPAD value was measured at 10 AM. The leaves were then picked and sent to a dark room to be photographed immediately.
Leaf image collection
On the same day as plant sampling, the tobacco leaves were transferred to a platform in a dark room. The platform used for image acquisition was a rectangular desktop 300 cm long, 200 cm wide and 80 cm high. The desktop surface was a white, matte, frosted countertop. Images were captured using a high-resolution camera (CANON EOS-550D, Canon Company, Japan) with a resolution of 3840 × 5120 pixels. The camera was mounted on a tripod at the nadir position, at a constant height of 1 m above the top of the platform. The light sources were two 20-W strip white LED lamps with a color temperature of 4000 K. To ensure uniform lighting, the lamps were suspended above the platform at 1/4 and 3/4 of the 200 cm distance to the fixed digital camera.
Leaf image segmentation, denoising and color feature extraction
The commercial image-editing software Adobe Photoshop CS was used to manually cut out the leaf from each original image, save it as a PNG image with a transparent background, and adjust the image size to 1000 × 1330 pixels. The MATLAB 2016R computing environment was used for the extraction and analysis of the color image data. First, the imread and rgb2gray functions were used, respectively, to read each color image and obtain its gray-level information. Then, the double function was used to convert each array into a double-precision array. Finally, the mean, median, mode, skewness and kurtosis functions were applied to the double-precision arrays of the red, green and blue channels and of the gray-level image, giving the mean, median, mode, skewness and kurtosis of each channel for each color leaf image.
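The parameter extraction described above can be sketched as follows. This is a plain Python illustration, not the MATLAB code used in the study. It assumes that the leaf pixels have already been separated from the transparent background and are supplied as (R, G, B) tuples; that skewness and kurtosis are the uncorrected moment ratios m3/m2^1.5 and m4/m2^2 (kurtosis not reduced by 3); that ties in the mode are resolved to the smallest value; and that the gray level is the usual luminance-weighted sum rounded to an integer. The function names are hypothetical.

```python
import statistics


def channel_stats(values):
    """Mean, median, mode, skewness and kurtosis of one channel.

    Assumes the channel is not constant (otherwise the variance is zero).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return {
        "Mean": mean,
        "Median": statistics.median(values),
        "Mode": min(statistics.multimode(values)),  # smallest value on ties
        "Skewness": m3 / m2 ** 1.5,
        "Kurtosis": m4 / m2 ** 2,
    }


def to_gray(r, g, b):
    """Luminance-weighted gray level, rounded to the nearest integer."""
    return int(0.2989 * r + 0.5870 * g + 0.1140 * b + 0.5)


def leaf_color_parameters(pixels):
    """Return the 20 parameters (RMean ... YKurtosis) for one leaf.

    pixels: list of (r, g, b) tuples for leaf (non-background) pixels.
    """
    channels = {
        "R": [p[0] for p in pixels],
        "G": [p[1] for p in pixels],
        "B": [p[2] for p in pixels],
        "Y": [to_gray(*p) for p in pixels],  # gray-level channel
    }
    return {
        ch + name: value
        for ch, vals in channels.items()
        for name, value in channel_stats(vals).items()
    }
```

Calling leaf_color_parameters on the pixel list of one leaf yields a dictionary with keys such as RMean, GSkewness and YKurtosis, matching the parameter names used in the model-building section.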
Color cumulative histogram construction and normality testing
The imread and rgb2gray functions are used to read each color image and obtain its gray-level counterpart. Then, using the image histogram functions, the cumulative histograms of the double-precision arrays of the red, green, blue and gray-level data were obtained. The Lilliefors and Jarque–Bera tests were used to test the distribution normality.
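As an illustration of the normality check, the Jarque–Bera statistic can be computed directly from the sample skewness S and kurtosis K as JB = n/6 · (S² + (K − 3)²/4). The Python sketch below is an assumption-laden stand-in for the MATLAB test functions: it uses the asymptotic chi-square distribution with 2 degrees of freedom for the p value, which may differ from how a statistics package computes it for small samples, and it omits the Lilliefors test. The function names are hypothetical.

```python
import math


def jarque_bera(values):
    """Return (JB statistic, asymptotic p value) for a list of numbers.

    Assumes the values are not all equal (non-zero variance).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square variable with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


def cumulative_histogram(values, levels=256):
    """Cumulative relative frequency for integer gray levels 0..levels-1."""
    counts = [0] * levels
    for v in values:
        counts[int(v)] += 1
    cumulative, running = [], 0
    for c in counts:
        running += c
        cumulative.append(running / len(values))
    return cumulative
```

A p value below 0.05 rejects the hypothesis that the channel values are normally distributed.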
Chlorophyll concentration measurement
To measure the chlorophyll concentration, a chlorophyll meter (SPAD-502, Zhejiang Topuiunnong Technology Co., Ltd., China) was used to obtain the SPAD values of 50 fully expanded tobacco leaves at each of 40, 50, 60 and 65 days of age. Each leaf blade was measured at five points: one on the upper part, two in the middle part, and two near the petiole, on both sides of the leaf. During measurement, the sample had to cover the receiving window completely, and the veins were avoided so that only mesophyll tissue was measured. For each blade, the SPAD value is the mean of the five measured points.
Model building and goodness-of-fit testing
We mainly used the IBM SPSS Statistics 22 software to analyze the blade features at ages of 40, 50, 60 and 65 days, and to establish two multivariate linear regression models, F1 and F2, by stepwise regression. For the F1 model, we obtained the parameters RMean, GMean and BMean by applying the mean function to the three color channels. Then, we used these three parameters and ten combinations of them (namely RMean + GMean + BMean, RMean/(RMean + GMean + BMean), GMean/(RMean + GMean + BMean), BMean/(RMean + GMean + BMean), RMean − BMean, RMean − GMean, GMean − BMean, RMean + BMean, RMean + GMean and BMean + GMean) to establish a multivariate linear regression model by stepwise regression. The parameter equation with the highest prediction accuracy was used to construct the F1 model. Similarly, all 20 parameters (namely RMean, RMedian, RMode, RSkewness, RKurtosis, GMean, GMedian, GMode, GSkewness, GKurtosis, BMean, BMedian, BMode, BSkewness, BKurtosis, YMean, YMedian, YMode, YSkewness and YKurtosis) were used to establish a multivariate linear regression model by stepwise regression. The parameter equation with the highest prediction accuracy was used to construct the F2 model. Using the MATLAB software, the data were fitted with Fourier and spatial functions based on all 20 parameters at 40, 50, 60 and 65 days of blade age, to establish two further models, F3 and F4. Then, goodness-of-fit testing was performed.
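To make the idea of stepwise selection concrete, the following Python sketch implements a simplified forward-only variant. It is not the SPSS procedure used in the study: it is assumed that SPSS uses significance-based entry and removal criteria, whereas this sketch adds, at each step, the predictor that reduces the residual sum of squares most, and stops when the gain in R² falls below a threshold (min_gain, a hypothetical parameter). It also assumes that no candidate predictors are exactly collinear; combinations such as RMean + GMean alongside RMean and GMean would need to be screened first.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_sse(columns, y):
    """Least-squares fit of y on an intercept plus the given columns.

    Returns (coefficients, residual sum of squares).
    """
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    sse = sum(
        (yi - sum(b * v for b, v in zip(beta, r))) ** 2
        for r, yi in zip(rows, y)
    )
    return beta, sse


def forward_select(predictors, y, min_gain=0.01):
    """Greedy forward selection; predictors maps name -> list of values."""
    mean_y = sum(y) / len(y)
    sst = sum((v - mean_y) ** 2 for v in y)  # total sum of squares
    chosen, best_sse = [], sst
    remaining = list(predictors)
    while remaining:
        trials = [
            (ols_sse([predictors[n] for n in chosen + [name]], y)[1], name)
            for name in remaining
        ]
        sse, name = min(trials)
        if best_sse - sse <= min_gain * sst:  # R-squared gain too small
            break
        chosen.append(name)
        remaining.remove(name)
        best_sse = sse
    return chosen, best_sse
```

In use, predictors would map parameter names such as "GMedian" or "RSkewness" to per-leaf values, and y would hold the corresponding SPAD values.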
In this work, images and data were processed using a virtual private server. The hardware resources included Intel Xeon CPU E5-2640 2.5 GHz with 2 DDR4 8 GB RAMs. This server type can perform billion double-precision real-time floating-point operations.
Distribution characteristics and normality verification of color gradation cumulative frequency of leaf-color RGB model
In previous studies, the histogram of RGB leaf colors was mostly assumed to follow a normal distribution [24,25,26,27]. However, the validity of this assumption has been contested by some reports. To verify the suitability of the proposed method, we designed an experiment involving tobacco leaf images with different sample sizes and growth periods. We found that the tobacco leaves gradually decayed, and that the leaf color changed from green to yellow after 40 days. All histograms of single-leaf RGB images at the different leaf ages (40, 50, 60 and 65 days) had skewed distributions (Fig. 1). None of the color distributions (red, green, blue or grayscale) was completely normal, and the skewness changed regularly with increasing leaf age. To further confirm our histogram-based findings, we performed the Lilliefors and Jarque–Bera normality tests using the color gradation data of 50 leaves. The results showed that the hypothesis test value was 1 (the normality hypothesis was rejected), and the p value was 0.001 (< 0.05). This means that the leaf color distribution follows a skewed distribution, not a normal one.
Correlation between skewed-distribution parameters and SPAD values
We have shown that the leaf RGB color distribution is a skewed distribution. Using skewed-distribution analysis in MATLAB, we got 20 parameters including the mean, median, mode, skewness and kurtosis for the red, green, blue and grayscale channels, respectively. In the individual-leaf color distribution, the parameters of the skewness and kurtosis represent the state of the leaf color distribution (Table 1). The skewness showed obvious changes with different leaf ages and decreased from positive to negative values. This also indicates that the color distribution of tobacco leaves is skewed throughout their lifetime. The SPAD values showed increasing and then decreasing trends.
We performed correlation analysis using the mean parameters (RMean, GMean, BMean) and their combinations (namely RMean + GMean + BMean, RMean/(RMean + GMean + BMean), GMean/(RMean + GMean + BMean), BMean/(RMean + GMean + BMean), RMean − BMean, RMean − GMean, GMean − BMean, RMean + BMean, RMean + GMean and BMean + GMean), whereas earlier studies used only the parameters in Table 2. For Table 3, we carried out correlation analysis using the 20 RGB skewed-distribution parameters with 200 leaves of four leaf ages. The results showed that 17 out of 20 parameters were significantly correlated with the SPAD values at the 0.01 level. This means that the change in chlorophyll content was highly correlated with the change in leaf color. Although the chlorophyll is not distributed uniformly over the leaf area, its content is numerically related to the increase in skewness.
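A correlation analysis of this kind can be sketched as follows, under the assumption that the Pearson correlation coefficient was used (the text does not name the coefficient). The Python function below, with a hypothetical name, returns r together with the t statistic t = r·sqrt((n − 2)/(1 − r²)); judging significance at the 0.01 level then requires comparing |t| with the critical value of Student's t distribution with n − 2 degrees of freedom, which is left to a statistics table or package.

```python
import math


def pearson_with_t(x, y):
    """Pearson correlation r and its t statistic (n - 2 degrees of freedom).

    Assumes x and y have equal length and neither is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        t = math.copysign(math.inf, r)  # perfect linear relation
    else:
        t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t
```

Here x would hold one colour parameter (for example GSkewness) across all leaves and y the matching SPAD values.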
Construction of the correlation models between the SPAD and leaf color parameters
The correlation model can be established by the leaf color parameters based on the skewed distribution and the SPAD value. In previous studies, researchers generally used stepwise regression methods based on ordinary least squares (OLS) to construct the association model. For comparison with previous models, we used the mean parameters RMean, GMean, BMean and their combinations to establish multivariate linear regression models by stepwise regression, then chose the best combination as the model F1 (Table 4). We also extended the parameter range and adopted 20 parameters to establish multivariate linear regression models by stepwise regression, then chose the best as the model F2. We found that the leaf color parameters changed linearly with increasing leaf ages, while the SPAD value was characterized by first increasing and then decreasing. Since different color gradations represent different wavelengths of light, we were inspired to use the Fourier functions to fit and get the model F3 (Fig. 2). The leaf color showed different kinds of change, both in depth and in heterogeneity at different positions, with non-planar characteristics. Therefore, to model the bidirectional changes of leaf color (i.e. the change of leaf color depth and distribution), we used the MATLAB Curve Fitting Toolbox to fit the polynomial F4 that incorporates spatial bidirectional patterns (Fig. 3).
To assess the advantages and disadvantages of the four models, we compared their fitting performance (Table 5). The models F2, F3 and F4 had higher R² values than F1, and the R² of model F4 was 21% higher than that of model F1. To evaluate the prediction accuracy of the four models, we collected another batch of leaf images at the four leaf ages, with 50 blades for each age (Table 5). The models F2 and F4 predicted more accurately, and the accuracy of F4 was 5% higher than that of F1. The SSE and RMSE metrics of the F4 model were superior to those of the other models. Therefore, the model F4, based on the spatial feature polynomial with spatial bidirectional patterns, is the optimal model.
In the past, the use of RGB models for leaf color analysis had obvious limitations. The biggest drawback of such models was that they had too few parameters to use: only the mean values of the red, green, blue and grayscale intensities. Although previous studies have proposed a variety of models based on combinations of these parameters, no plausible explanation was given for the physiological significance of these parameters in describing leaf color changes [21, 22]. The reason is that when RGB features were extracted from digital images, the descriptive statistics were based on a normal distribution. This normality assumption is only a convenience for finding approximate values, and it cannot reflect the distribution of leaf colors in a comprehensive and truthful way.
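For reference, the goodness-of-fit metrics named above can be computed from observed and fitted SPAD values as in the Python sketch below. The definitions are the common ones and are an assumption about what the fitting software reported: SSE is the sum of squared residuals, R² = 1 − SSE/SST, and RMSE is taken here as sqrt(SSE/n). Some fitting tools divide by the residual degrees of freedom instead of n, which gives a slightly larger RMSE. The function name is hypothetical.

```python
import math


def goodness_of_fit(observed, predicted):
    """Return SSE, RMSE and R-squared for paired observed/predicted values.

    Assumes the observed values are not all equal (non-zero SST).
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "SSE": sse,
        "RMSE": math.sqrt(sse / n),  # divides by n, not by degrees of freedom
        "R2": 1.0 - sse / sst,
    }
```

Applying the same function to each model's fitted values gives a like-for-like comparison of the kind summarised in Table 5.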
In this work, we verified through general normality tests that the RGB color gradation histogram followed a skewed distribution for tobacco leaves with different leaf ages. As a result, we extend the color gradation distribution parameters in the RGB model. These parameters include the mean, median, mode, skewness, and kurtosis. This gives a total of 20 parameters for 4 channels, while the common normal-distribution parameter is only the mean value.
Each of these parameters reflects some property or trait of leaf color. When the mean value is extracted based on a normality assumption, the leaf color heterogeneity is ignored. The mean can only describe the state of the leaf color depth quantitatively. This cannot fully reflect a real leaf color distribution at any leaf age. The description of the skewed distribution not only expands quantitative leaf color information but also systematically characterizes the leaf color depth and homogeneity. The skewness and kurtosis are features that mainly reflect the leaf color homogeneity. These features make it possible to accurately and quantitatively describe leaf color from different aspects.
We found 17 of the 20 parameters to be significantly correlated with the SPAD value at the 0.01 significance level. We therefore tried to model the chlorophyll content and distribution of leaves with these parameters. In earlier studies, the mean parameters of the R, G and B components, as well as their combinations, were generally used under a normality assumption to establish models by stepwise regression. We also used this method to obtain the model F1. After comparing the models F2, F3 and F4, which use skewed-distribution parameters, with F1, we found that the model based on the median and the skewness could better fit the SPAD value. More parameters increased the accuracy of the RGB model description and prediction, and extended its application range. When we used the Fourier method in the model F3, we found that the fitting degree was higher than that of the model F1, indicating that the numerical SPAD distribution was more in line with a curve than with a straight line. Predicting the SPAD value with the mean value only did not work well. This means that the depth of the leaf color alone cannot describe the leaf color accurately. When we introduced the skewness, we found that both the fitting degree and the prediction accuracy were greatly improved. So, these skewed-distribution parameters can describe changes in leaf color depth and homogeneity.
To sum up, the color distribution histogram of blade images follows a skewed distribution, whose parameters (such as the mean, median, mode, skewness, and kurtosis) greatly enrich the RGB model. We hope that this work will provide researchers with a new method for the analysis of blade color patterns in RGB digital images. This work shall also inspire the extraction and exploitation of novel leaf color descriptors for plant monitoring and treatment.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
1. Chen DJ, Neumann K, Friedel S, Kilian B, Chen M, Altmann T, Klukas C. Dissecting the phenotypic components of crop plant growth and drought responses based on high-throughput image analysis. Plant Cell. 2014;26:4636–55. https://doi.org/10.1105/tpc.114.129601.
2. Barker J, Zhang NQ, Sharon J, Steeves R, Wang X, Yong W, Poland J. Development and evaluation of a field-based high-throughput phenotyping platform. Comput Electron Agr. 2016;122:74–85. https://doi.org/10.1071/FP13126.
3. Vasseur F, Bresson J, Wang G, Schwab R, Weigel D. Image-based methods for phenotyping growth dynamics and fitness components in Arabidopsis thaliana. Plant Methods. 2018;14:63. https://doi.org/10.1101/208512.
4. Conn SJ, Hocking B, Dayod M, Xu B, Athman A, Henderson S, Aukett L, Conn V, Shearer MK, Fuentes S, Tyerman SD, Gilliham M. Protocol: optimising hydroponic growth systems for nutritional and physiological analysis of Arabidopsis thaliana and other plants. Plant Methods. 2013;9:4. https://doi.org/10.1186/1746-4811-9-4.
5. Vasseur F, Violle C, Enquist BJ, Granier C, Vile D. A common genetic basis to the origin of the leaf economics spectrum and metabolic scaling allometry. Ecol Lett. 2012;15:1149–57. https://doi.org/10.1111/j.1461-0248.2012.01839.x.
6. Bresson J, Bieker S, Riester L, Doll J, Zentgraf U. A guideline for leaf senescence analyses: from quantification to physiological and molecular investigations. J Exp Bot. 2018;69:769–86. https://doi.org/10.1093/jxb/erx246.
7. Bama BS, Valli SM, Raju S, Kumar VA. Content based leaf image retrieval (CBLIR) using shape, color and texture features. Indian J Eng Mater S. 2011;1:202–11.
8. Zhang YH, Tang L, Liu XJ, Liu LL, Cao WX, Zhu Y. Modeling dynamics of leaf color based on RGB value in rice. J Integr Agr. 2014;13:749–59. https://doi.org/10.1016/S2095-3119(13)60391-3.
9. Arai K, Nugraha I, Oku H. Image identification based on shape and color descriptors and its application to ornamental leaf. I J Image Graph Signal Process. 2013;5:1–8. https://doi.org/10.5815/ijigsp.2013.10.01.
10. Mansour M, Sepideh T, Reza DM. Predicting cut rose stages of development and leaf color variations by means of image analysis technique. J Ornam Plants. 2017;7:25–36.
11. Bai JY, Ren HE. An algorithm of leaf image segmentation based on color features. Key Eng Mat. 2011. https://doi.org/10.4028/www.scientific.net/KEM.474-476.846.
12. Dobrescu A, Scorza LCT, Tsaftaris SA, McCormick AJ. A “Do-It-Yourself” phenotyping system: measuring growth and morphology throughout the diel cycle in rosette shaped plants. Plant Methods. 2017;13:95. https://doi.org/10.1186/s13007-017-0247-6.
13. Kawashima S, Nakatani M. An algorithm for estimating chlorophyll content in leaves using a video camera. Ann Bot. 1998;81:49–54. https://doi.org/10.1006/anbo.1997.0544.
14. Hu H, Zhang J, Sun X, Zhang X. Estimation of leaf chlorophyll content of rice using image color analysis. Can J Remote Sens. 2013;39:185–90. https://doi.org/10.5589/m13-026.
15. Adamsen FJ, Pinter PJ, Barnes EM, LaMorte RL, Wall GW, Leavitt SW, Kimball BA. Measuring wheat senescence with a digital camera. Crop Sci. 1999;39:719–24. https://doi.org/10.2135/cropsci1999.0011183x003900030019x.
16. Hu H, Liu HQ, Zhang H, Zhu JH, Yao XG, Zhang XB, Zheng KF. Assessment of chlorophyll content based on image colour analysis, comparison with SPAD-502. In: Proceedings of 2nd International Conference on Information Engineering and Computer Science (ICIECS), Wuhan, China, 2010. https://doi.org/10.1109/iciecs.2010.5678413.
17. Cai H, Haixin C, Weitang S, Lihong G. Preliminary study on photosynthetic pigment content and colour feature of cucumber initial blooms. Trans CSAE. 2006;22:34–8.
18. Ali MM, Al-Ani A, Eamus D, Tan DKY. A new image processing based technique to determine chlorophyll in plants. Am Eurasian J Agric Environ Sci. 2012;12:1323–8.
19. Yadav SP, Ibaraki Y, Dutta Gupta S. Estimation of the chlorophyll content of micropropagated potato plants using RGB based image analysis. Plant Cell Tiss Org. 2010;100:183–8. https://doi.org/10.1007/s11240-009-9635-6.
20. Zhu J, Deng J, Shi Y, Chen Z, Han N, Wang K. Diagnoses of rice nitrogen status based on characteristics of scanning leaf. Spectrosc Spect Anal. 2009;29:2171–5. https://doi.org/10.3964/j.issn.1000-0593(2009)08-2171-05.
21. Wu XM, Zhang FG, Lu JT. Research on recognition of tea tender leaf based on image color information. J Tea Sci. 2013;33:584–9.
22. Han WT, Sun Y, Xu TF, Chen XW, Su KO. Detecting maize leaf water status by using digital RGB images. Int J Agric Biol Eng. 2014;7:45–53. https://doi.org/10.3965/j.ijabe.20140701.005.
23. Hu H, Liu H, Zhang H, Zhu J, Yao XG, Zhang X, Zheng KF. Assessment of chlorophyll content based on image color analysis comparison with SPAD-502. 2010. https://doi.org/10.1109/iciecs.2010.5678413.
24. Li L, Zhang Q, Huang DF. A review of imaging techniques for plant phenotyping. Sensors. 2014;14:20078–111. https://doi.org/10.3390/s141120078.
25. Mercadoluna A, Ricogarcía E, Laraherrera A. Nitrogen determination on tomato (Lycopersicon esculentum Mill.) seedlings by color image analysis (RGB). Afr J Biotech. 2010;9:5326–32. https://doi.org/10.1186/1471-2180-10-219.
26. Vibhute A, Bodhe SK, More BM. Wavelength based nitrogen estimation of grapes using RGB color images. World Res J Eng Technol. 2014;3:38–40.
27. Feng Y, Ren G, He K, Liu Y, Li L. RGB color channel variation based segmentation of crop leaf lesion. IEEE Conf Indus Electr Appl. 2015. https://doi.org/10.1109/iciea.2015.7334180.
We would like to thank Rongzhan Guan for assistance during the laboratory experiment.
This study was funded by The National Key Research and Development Program of China (2018YFD1000900).
The authors declare that they have no competing interests.
Cite this article
Chen, Z., Wang, F., Zhang, P. et al. Skewed distribution of leaf color RGB model and application of skewed parameters in leaf color description model. Plant Methods 16, 23 (2020). https://doi.org/10.1186/s13007-020-0561-2
- RGB model
- Leaf color
- Skewed distribution
- Skewed parameters