Two-dimensional multifractal detrended fluctuation analysis for plant identification

Background: In this paper, a novel method is proposed to identify plant species by using two-dimensional multifractal detrended fluctuation analysis (2D MF-DFA). Our method involves calculating a set of multifractal parameters that characterize the texture features of each plant leaf image. An index, $I_0$, that characterizes the relation between the intra-species variances and the inter-species variances is introduced. This index is used to select three multifractal parameters for the identification process. The procedure is applied to the Swedish leaf data set containing leaves from fifteen different tree species. Results: The three chosen parameters form a three-dimensional space in which samples from the same species cluster together and are separated from other species. Support vector machines and kernel methods are employed to assess the identification accuracy. The resulting average discriminant accuracy reaches 98.4% for every pair of species under 10-fold cross validation, while the accuracy reaches 93.96% for all fifteen species. Conclusions: Our method, based on the 2D MF-DFA, provides a feasible and efficient procedure to identify plant species.


Introduction
The increasing interest in biodiversity and biocomplexity, together with the growing availability of digital images and image analysis algorithms, has made plant species identification and classification a topic that attracts many researchers' attention. In general, many parts of a plant, such as flowers, seeds, roots, and leaves, can be used to identify plant species [1][2][3]. In this paper, we focus on the use of leaf images, as they are widely available. Leaf shape, color, vein properties, texture and contour are important features for plant identification. For example, leaf shapes were used in [4][5][6], complex veins and contours of leaves were used in [7], and leaf texture was used in [8][9][10][11] for plant species identification. For plant species identification using digital morphometrics, we refer the reader to [12][13][14] and the references therein.
Note that in [7], a monofractal method was used to extract features from leaf images. This method was then used in [15,16]. It has been recognized that the monofractal method cannot fully extract detailed information from a leaf image and therefore cannot be efficiently applied to images of objects that are locally irregular [17]. To overcome this difficulty, several multifractal analysis (MFA) methods were proposed [18][19][20][21][22]. For example, Backes et al. [18,19] used multi-scale fractal dimensions to describe the texture of the leaf surface to identify plants, which turned out to be very efficient. Note that the classical MFA is based on capacity measurement or probability measurement and thus describes only stationary measurements [17]. For a leaf image, the surface itself is hardly stationary. Therefore, the multifractal detrended fluctuation analysis (MF-DFA) method, which can deal with non-stationarity, is a desirable method for leaf image analysis [23]. Though the MF-DFA method has been successfully applied in many fields to non-stationary series and surfaces [24][25][26][27][28][29][30], to the best of our knowledge, no work has yet applied the MF-DFA to leaf images for plant identification and classification. In this paper, we attempt to identify plant species via leaf images by using the MF-DFA. More precisely, we first adopt the MF-DFA to extract important texture features from leaf images and obtain several key multifractal parameters, and then we apply the support vector machines and kernel methods (SVMKM) to distinguish leaves from different plant species. The widely used Swedish leaf data set [31], containing leaves from fifteen different Swedish tree species, is used for our experiments. Our results show that the average accuracy is 98.4% for every pair of species under 10-fold cross validation; for all fifteen species together, the average accuracy reaches 93.96% under the same validation criterion.
We organize the rest of this paper as follows: in Methods and materials we adopt the two-dimensional (2D) MF-DFA to calculate the multifractal parameters. In Results and discussion, we present and discuss our results. Our method is then further tested in Model test. A summary is provided in Conclusions.

Multifractal detrended fluctuation analysis
We first adapt the 2D MF-DFA method proposed in [32] to our setting as follows:

Step 1: Regard a leaf image as a self-similar surface and represent it by an $M \times N$ matrix $X = (X(i, j))$, $i = 1, 2, \ldots, M$ and $j = 1, 2, \ldots, N$. Partition the surface into $M_s \times N_s$ non-overlapping square sub-surfaces of equal side length $s$, where $M_s \equiv [M/s]$ and $N_s \equiv [N/s]$ are positive integers (here $[u]$ stands for the largest integer that is less than or equal to $u$). Each sub-surface is denoted by $X_{m,n} = X_{m,n}(i, j)$ with $X_{m,n}(i, j) = X(r + i, t + j)$ for $1 \le i, j \le s$, where $r = (m-1)s$ and $t = (n-1)s$. Note that $M$ and $N$ are not necessarily multiples of the length $s$; therefore, the regions at the right and bottom edges of the surface may not be covered. We can then repeat the partitioning procedure starting from the other three corners.
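Step 2: For each sub-surface $X_{m,n}$, find its cumulative sum

$$G_{m,n}(i, j) = \sum_{k_1=1}^{i} \sum_{k_2=1}^{j} X_{m,n}(k_1, k_2),$$

where $1 \le i, j \le s$, $m = 1, 2, \ldots, M_s$ and $n = 1, 2, \ldots, N_s$. Then $G_{m,n} = G_{m,n}(i, j)$ ($i, j = 1, 2, \ldots, s$) is itself a surface.

Step 3: For each surface $G_{m,n}$, obtain a local trend $\tilde{G}_{m,n}$ by fitting it with a pre-chosen bivariate polynomial function. In this paper, we choose the trending function as

$$\tilde{G}_{m,n}(i, j) = ai + bj + c,$$

where $1 \le i, j \le s$ and $a$, $b$ and $c$ are free parameters to be determined by the least-squares method. The residual matrix is then given by $y_{m,n} = y_{m,n}(i, j)$ with

$$y_{m,n}(i, j) = G_{m,n}(i, j) - \tilde{G}_{m,n}(i, j).$$

Step 4: Define the detrended fluctuation function $F(m, n, s)$ for the segment $X_{m,n}$ as follows:

$$F^2(m, n, s) = \frac{1}{s^2} \sum_{i=1}^{s} \sum_{j=1}^{s} y_{m,n}^2(i, j),$$

and the $q$th-order fluctuation function

$$F_q(s) = \left\{ \frac{1}{M_s N_s} \sum_{m=1}^{M_s} \sum_{n=1}^{N_s} \left[ F(m, n, s) \right]^q \right\}^{1/q}, \quad q \ne 0,$$

$$F_0(s) = \exp\left\{ \frac{1}{M_s N_s} \sum_{m=1}^{M_s} \sum_{n=1}^{N_s} \ln F(m, n, s) \right\}, \quad q = 0.$$

Step 5: Vary the value of $s$ from 6 to $\min(M, N)/4$. If there is long-range power-law correlation for large values of $s$, then

$$F_q(s) \sim s^{h(q)}.$$

This allows us to obtain the scaling exponent $h(q)$ by linearly regressing $\ln F_q(s)$ on $\ln s$. Since $h(2)$ is the so-called Hurst index of the surface, we call $h(q)$ the generalized Hurst index of the surface. For each $q$, the corresponding classical multifractal scaling exponent $\tau(q)$ is given by

$$\tau(q) = q h(q) - D_f,$$

where $D_f$ is the fractal dimension of the geometric support of the multifractal measure, and takes the value $D_f = 2$ in our work. The generalized multifractal dimension $D_q$ is then given by

$$D_q = \frac{\tau(q)}{q - 1} = \frac{q h(q) - D_f}{q - 1}.$$

In the case where $q = 1$, $D_1$ can be obtained via a linear regression of $\sum_{m=1}^{M_s} \sum_{n=1}^{N_s} P_{m,n} \ln P_{m,n}$ against $\ln s$, where $P_{m,n}$ is the normalized probability measure of the sub-surface $X_{m,n}$ at scale $s$.

The other two indicators characterizing the singularity strength of the multifractal surface are the Hölder exponent $\alpha(q)$ and the singularity spectrum $f(\alpha)$, which are given by

$$\alpha(q) = \frac{d\tau(q)}{dq}, \qquad f(\alpha) = q\,\alpha(q) - \tau(q).$$

Here $\alpha(q)$ characterizes the local singularity of an image texture, and $f(\alpha)$ measures the global singularity of an image texture. Varying the value of $q$ in the range from $-15$ to $15$ determines $\Delta\alpha$ and $\Delta f$ as follows:

$$\Delta\alpha = \alpha_{\max} - \alpha_{\min}, \qquad \alpha_{\max} = \max\{\alpha(q) : q \in [-15, 15]\}, \qquad \alpha_{\min} = \min\{\alpha(q) : q \in [-15, 15]\},$$

and $\Delta f$ is the corresponding difference between the values of $f(\alpha)$ at $\alpha_{\max}$ and $\alpha_{\min}$. Note that the index $\Delta\alpha$ is regarded as an indicator of the absolute magnitude of the gray scale volatility.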
The larger the value of $\Delta\alpha$, the less even the distribution of the probability measure and the rougher the image surface is expected to be. The index $\Delta f$ is the Hausdorff dimension of the measured object, which measures the degree of confusion. Therefore both $\Delta\alpha$ and $\Delta f$ are important multifractal parameters in describing the characteristics of an image in our study.
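Steps 1 to 5 can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch under stated assumptions, not the implementation used in this work: the function names are hypothetical, the local trend is assumed to be the plane $ai + bj + c$, and the surface is partitioned from the top-left corner only rather than from all four corners.

```python
import numpy as np


def block_fluctuation(block):
    """Return F(m, n, s) for one s-by-s sub-surface."""
    s = block.shape[0]
    # Step 2: cumulative sum over both axes gives the surface G_{m,n}(i, j).
    G = block.cumsum(axis=0).cumsum(axis=1)
    # Step 3: least-squares fit of the plane a*i + b*j + c (assumed trend).
    i, j = np.meshgrid(np.arange(1, s + 1), np.arange(1, s + 1), indexing="ij")
    A = np.column_stack([i.ravel(), j.ravel(), np.ones(s * s)])
    coef, *_ = np.linalg.lstsq(A, G.ravel(), rcond=None)
    residual = G.ravel() - A @ coef
    # Step 4: root-mean-square residual of the sub-surface.
    return np.sqrt(np.mean(residual ** 2))


def fluctuation_function(X, s, qs):
    """Return F_q(s) for every q in qs (top-left partition only)."""
    X = np.asarray(X, dtype=float)
    Ms, Ns = X.shape[0] // s, X.shape[1] // s
    F = np.array([block_fluctuation(X[m * s:(m + 1) * s, n * s:(n + 1) * s])
                  for m in range(Ms) for n in range(Ns)])
    F = F[F > 0]  # keep logs and negative powers finite
    values = []
    for q in qs:
        if q == 0:
            values.append(np.exp(np.mean(np.log(F))))  # geometric mean for q = 0
        else:
            values.append(np.mean(F ** q) ** (1.0 / q))
    return np.array(values)


def generalized_hurst(X, scales, qs):
    """Step 5: slope of ln F_q(s) against ln s for each q."""
    log_F = np.log([fluctuation_function(X, s, qs) for s in scales])
    log_s = np.log(scales)
    return np.array([np.polyfit(log_s, log_F[:, k], 1)[0] for k in range(len(qs))])
```

For a gray-level matrix `X`, a call such as `generalized_hurst(X, scales=[8, 16, 32], qs=[-3, -2, -1, 1, 2, 3])` returns one estimate of $h(q)$ per value of $q$; the scales should stay within the range from 6 to $\min(M, N)/4$ described in Step 5.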

Experiment materials
To demonstrate our method of identifying plant species by leaf texture, we use the Swedish leaf data set [31], which is widely employed in computer vision and pattern recognition [4,33,34], plant taxonomy [1] and image processing [6,35]. This data set contains images of leaves from 15 species, with 75 sample images per species. We label the fifteen species MI, MII, …, MXV (see Figure 1).
We first transform the color image to gray scale so that each image can be viewed as a three-dimensional surface with the first two coordinates (i, j) denoting the 2D position and the third coordinate z denoting the gray level of the corresponding pixel.
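As a minimal sketch of this preprocessing step, the conversion below maps an RGB array to a matrix of gray levels. The luma weights (0.299, 0.587, 0.114) are an assumption made for illustration; the text does not state which conversion was used, and the function name is hypothetical.

```python
import numpy as np


def to_gray_surface(rgb):
    """Map an H x W x 3 uint8 RGB array to an H x W matrix of gray levels (0..255)."""
    # Assumed luma weights (ITU-R BT.601); not specified in the text.
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb.astype(np.float64) @ weights
    # Entry (i, j) is the height z of the surface at pixel (i, j).
    return np.rint(gray).astype(np.uint8)
```

The resulting matrix can be passed directly to a 2D MF-DFA routine as the surface $X$.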

Multifractal nature of image surfaces
Each image is stored as a 2D matrix with 256 grey levels. This allows us to follow the procedure introduced in Multifractal detrended fluctuation analysis to calculate the associated $h(q)$ and $\tau(q)$. If $\tau(q)$ is nonlinear in $q$, that is, if $h(q)$ is not independent of $q$, then the image is multifractal in nature.
For the Swedish leaf data set, we find that all leaf images are multifractal in nature. Figure 2 and Figure 3 demonstrate the multifractal nature of two randomly chosen leaf images, namely image MIV004, which has 1793 × 979 pixels, and image MX017, which has 2934 × 1771 pixels. In each figure, the left panel shows the detrended fluctuation function $F_q(s)$ as a function of the scale $s$ for different values of $q$. The well-fitted straight lines indicate evident power-law scaling of $F_q(s)$ versus $s$. The right panel shows that $\tau(q)$ is nonlinear in $q$, which reflects the fact that $h(q)$ depends on $q$.
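The multifractality check described in this section can be illustrated with a short sketch that derives $\tau(q)$, $\alpha(q)$ and $f(\alpha)$ from a given set of $h(q)$ values. The function names are hypothetical, and the derivative $d\tau/dq$ is approximated by finite differences, which is an assumption of this sketch rather than a detail given in the text.

```python
import numpy as np


def multifractal_spectrum(qs, h, Df=2.0):
    """Return tau(q), alpha(q) and f(alpha) computed from h(q)."""
    qs = np.asarray(qs, dtype=float)
    h = np.asarray(h, dtype=float)
    tau = qs * h - Df              # tau(q) = q h(q) - D_f
    alpha = np.gradient(tau, qs)   # alpha(q) = d tau / d q (finite differences)
    f = qs * alpha - tau           # f(alpha) = q alpha(q) - tau(q)
    return tau, alpha, f


def delta_alpha(alpha):
    """Width of the singularity spectrum, alpha_max - alpha_min."""
    return float(np.max(alpha) - np.min(alpha))
```

For a monofractal surface $h(q)$ is constant, so $\tau(q)$ is linear and `delta_alpha` is numerically zero; a width clearly above zero indicates that $h(q)$ depends on $q$.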

Results and discussion
For each image, we calculate the generalized Hurst exponents $h(q)$ and six other multifractal parameters: $\alpha_{\max}$, $\alpha_{\min}$, $\Delta\alpha$, $\Delta f$, $D_1$ and $D_2$. For each tree species, we take the average over the 75 samples and report the calculated values in Figures 4 and 6. The corresponding standard deviations are given in Figures 5 and 7, respectively.
As seen in Figure 4, compared with $h(2)$ and $h(3)$, the estimates of $h(-3)$, $h(-2)$, $h(-1)$ and $h(1)$ vary over relatively wider ranges and are thus better able to distinguish textures among different species.
Yet there are relatively large standard deviations among the 75 samples for the $h(q)$ exponents in Figure 5. This suggests that this indicator alone may not be adequate to identify the fifteen tree species. Figure 6 also shows that the three parameters $\alpha_{\max}$, $\Delta\alpha$ and $\Delta f$ have wider dynamic ranges than the other three parameters. The variations among the 75 samples of the same tree species are notably large, as shown in Figure 7.
For species $i$ ($i$ = I, II, …, XV) and each calculated multifractal parameter, we denote the standard deviation of the 75 samples by $\sigma_{in}(i)$ and define $\sigma_{in}$ as

$$\sigma_{in} = \frac{1}{15} \sum_{i=\mathrm{I}}^{\mathrm{XV}} \sigma_{in}(i),$$

which represents the intra-species variance. Note also that for each indicator, we can calculate its value for each species, which gives 15 values in total for the 15 species. We define $\sigma_{bet.}$ as the standard deviation of these 15 calculated values. The term $\sigma_{bet.}$ then represents the inter-species variance for each multifractal indicator. We now define an index, $I_0$, as

$$I_0 = \frac{\sigma_{bet.}}{\sigma_{in}}.$$

From the definition, a multifractal parameter with a larger $I_0$ serves better as an indicator to distinguish species. We present the calculated values of $I_0$ in Table 1.

We choose the combination of three multifractal parameters with larger $I_0$ values, namely $\{h(-3), \alpha_{\min}, \Delta\alpha\}$, as the feature descriptors for our classification and apply the support vector machines and kernel methods (SVMKM) with the heavy-tailed radial basis function ('htrbf') as the kernel [36]. It is worth mentioning that a combination of four or more parameters does not lead to significantly higher accuracy, yet requires much longer computation time and offers no advantage for visualization. In this sense, the combination of the above three parameters is optimal. For the total sample set containing 75 × 15 = 1125 samples, we use K-fold cross validation to evaluate the learning performance. This means that $100(K-1)/K$% of the samples are randomly chosen as a training set and the remaining $100/K$% of the samples are used as a test set. The calculation is then repeated 10 times to reduce the impact of randomness.
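The selection index can be illustrated with a small sketch using only the Python standard library. It assumes that $\sigma_{in}$ is the mean of the per-species standard deviations, that $\sigma_{bet.}$ is the standard deviation of the per-species means, and that $I_0$ is their ratio; the use of the sample (rather than population) standard deviation and the function name are further assumptions of this sketch.

```python
from statistics import mean, stdev


def separation_index(values_by_species):
    """Return I_0 for one multifractal parameter.

    values_by_species maps a species label to the list of parameter values
    measured on that species' sample images.
    """
    # Intra-species term: average of the per-species standard deviations.
    sigma_in = mean([stdev(values) for values in values_by_species.values()])
    # Inter-species term: standard deviation of the per-species means.
    sigma_bet = stdev([mean(values) for values in values_by_species.values()])
    return sigma_bet / sigma_in
```

Computing this index for every candidate parameter and ranking the results gives the kind of ordering that Table 1 is used for: parameters whose species means are far apart relative to the within-species scatter receive larger values.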
In our first identification experiment, we test the proposed method by examining its ability to discriminate between every pair of species. To this end, we form a three-dimensional parameter space with components given by the feature descriptors chosen above, $\{h(-3), \alpha_{\min}, \Delta\alpha\}$. In this space, each point represents one leaf sample image. In Figure 8(a)-(d), we plot the corresponding points for Ulmus carpinifolia versus Alnus incana, Salix aurita versus Salix alba Sericea, Salix sinerea versus Tilia, and Sorbus aucuparia versus Fagus silvatica, respectively. As shown in these plots, the samples from the same tree species are clustered together reasonably well.
In addition, we calculate the discriminant accuracy for every pair of tree species by SVMKM using K-fold cross validation with different values of K. The average accuracies over 10 trials are shown in Figure 9(a). To illustrate how well our proposed method distinguishes different tree species, we plot, as an example, the accuracy of identifying species MI (Ulmus carpinifolia) versus each of the other 14 species with K = 10 in Figure 9(b). As expected, the average accuracy over all pairs of species increases with K. The best accuracy obtained is 98.40%, higher than the 96.82% reported in [35], which requires a very complex pre-processing step for leaf images. Figure 9(b) shows that the accuracy varies between Ulmus carpinifolia and the other 14 species. Five species, namely Salix aurita, Betula pubescens, Ulmus glabra, Salix sinerea and Fagus silvatica, have accuracies below the average. This suggests that Ulmus carpinifolia is highly similar to these five species, which agrees with the observation from Figure 1.
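The repeated K-fold evaluation used in these experiments can be sketched as follows. To keep the example self-contained, a simple nearest-centroid classifier stands in for SVMKM; this substitution, the function names and the seed handling are assumptions of the sketch and not part of the method described here.

```python
import random


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def fit_centroids(X, y):
    """Stand-in classifier: one mean feature vector per species label."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(pts) for col in zip(*pts)]
            for label, pts in groups.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    return min(centroids,
               key=lambda label: sum((c - v) ** 2 for c, v in zip(centroids[label], x)))


def cross_val_accuracy(X, y, k=10, repeats=10, seed=0):
    """Average accuracy of K-fold cross validation over several random repetitions."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        correct = 0
        for test in k_fold_indices(len(X), k, rng):
            held_out = set(test)
            train = [i for i in range(len(X)) if i not in held_out]
            model = fit_centroids([X[i] for i in train], [y[i] for i in train])
            correct += sum(predict(model, X[i]) == y[i] for i in test)
        accuracies.append(correct / len(X))
    return sum(accuracies) / len(accuracies)
```

With `X` holding the three feature values of each image from two species and `y` the corresponding labels, `cross_val_accuracy(X, y, k=10, repeats=10)` mirrors the pairwise protocol: each fold serves once as the test set, and the result is averaged over 10 random partitions.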
For each species, the average of $\{h(-3), \alpha_{\min}, \Delta\alpha\}$ over the 75 samples is represented by a single point in the three-dimensional parameter space (see Figure 10), in which points representing different species may be clustered into several groups. We use the calculated discriminant accuracy of every pair of species as the distance between the corresponding two points (species). This allows us to conduct a cluster analysis for all samples of the 15 species by the method of hierarchical clustering [37]. The result is given in Figure 10( … Figure 1, showing that our proposed approach is applicable.

As another important aspect of the identification experiment, we next test our method by calculating the identification accuracies for different numbers of species. The average accuracy calculated with K = 10 is shown in Figure 11(a). Note that the average accuracy decreases as the number of tree species increases. This is due to the increasing probability of incorrect classification. Even in the worst case, in which all 75 × 15 = 1125 sample leaf images are mixed together, the lowest average accuracy is 93.96%, which still indicates that our approach is feasible. We also calculate the identification accuracy for each species with K = 10 and report the result in Figure 11(b), while the identification result for each species is displayed in Table 2. The best three accuracies reach 98.67%, 97.33% and 96%, and the corresponding species are Sorbus aucuparia, Sorbus intermedia and Tilia. As seen in Figure 1, these three species are clearly distinct from the other species in leaf shape and texture. This again shows that our method is effective and feasible.
Table 2 The results of identification for the fifteen species of tree leaves by the method of SVMKM with K = 10

        MI   MII  MIII  MIV   MV   MVI  MVII  MVIII  MIX   MX   MXI  MXII  MXIII  MXIV  MXV
MI      69    0     2    0     1    0     0     0     3     0    0     0     0      0     0
MII      0   70     0    0     1    0     1     0     1     1    0     1     …
…

We remark that the sample size of each species has little effect on the average discriminant accuracy. To justify this, we randomly choose n (n ≤ 75) leaf samples for each species and run the procedure. We then repeat the process 10 times and take the average accuracy, which is reported in Figure 12. It can be seen from Figure 12 that as the number of samples changes from 40 to 75, the accuracy changes by only 0.73%.

Model test
In this section, we test our proposed method to demonstrate its efficiency. More precisely, we test the validity of the optimal multifractal parameter combination $\{h(-3), \alpha_{\min}, \Delta\alpha\}$. To this end, we choose four other combinations of three multifractal parameters from Table 1 to construct four three-dimensional spaces: …, $\{h(1), h(2), \Delta f\}$. Each of the first three combinations contains one multifractal parameter from $\{h(-3), \alpha_{\min}, \Delta\alpha\}$, and the fourth combination consists of the three parameters that produce the three smallest $I_0$ values. Following the procedure described in the previous section, we place the 1125 leaf samples into the four new three-dimensional spaces and again use the SVMKM to distinguish them. Under K-fold cross validation, the discriminant accuracies for increasing K are shown in Figure 13. Clearly, the highest accuracy still comes from the combination $\{h(-3), \alpha_{\min}, \Delta\alpha\}$ for each K, and the lowest accuracy comes from the combination $\{h(1), h(2), \Delta f\}$. This again suggests that the index $I_0$ successfully indicates the optimal multifractal parameter combination.

Conclusions
In this paper we have adopted the 2D MF-DFA method proposed in [32] to extract important texture features from leaf images. This allows us to calculate the generalized Hurst exponents $h(q)$ and several other multifractal parameters, including $\alpha_{\max}$, $\alpha_{\min}$, $\Delta\alpha$, $\Delta f$, $D_1$ and $D_2$. By defining an index, $I_0$, which relates the inter-species variances to the intra-species variances, we are able to find an optimal combination of multifractal parameters that best characterizes the key features of plant species and allows high accuracy in plant species identification. For the Swedish leaf data set, which contains 15 species and 75 × 15 = 1125 samples in total [31], the combination $\{h(-3), \alpha_{\min}, \Delta\alpha\}$ turns out to be optimal compared with other combinations of parameters. We have obtained an average discriminant accuracy of 98.4% for every pair of species by SVMKM with 10-fold cross validation, while the accuracy reaches 93.96% for all 15 species together. Software based on our work can be designed and coded; for that purpose, we provide the corresponding flow chart in Figure 14.
We should point out that most of the existing work on texture image recognition focuses mainly on standard multifractal analysis. Our work has shown that the MF-DFA is of particular practical value for plant leaf identification, as the MF-DFA multifractal parameters can be combined to distinguish similar but different leaf textures.