Unsupervised Bayesian learning for rice panicle segmentation with UAV images

Background: In this paper, an unsupervised Bayesian learning method is proposed to perform rice panicle segmentation with optical images taken by unmanned aerial vehicles (UAVs) over paddy fields. Unlike existing supervised learning methods that require a large amount of labeled training data, the unsupervised learning approach detects panicle pixels in UAV images by analyzing statistical properties of pixels in an image without a training phase. Under the Bayesian framework, the distributions of pixel intensities are assumed to follow a multivariate Gaussian mixture model (GMM), with different components in the GMM corresponding to different categories, such as panicle, leaves, or background. The prevalence of each category is characterized by the weights associated with each component in the GMM. The model parameters are iteratively learned by using the Markov chain Monte Carlo (MCMC) method with Gibbs sampling, without the need for labeled training data.

Results: Applying the unsupervised Bayesian learning algorithm on diverse UAV images achieves an average recall, precision, and F1 score of 96.49%, 72.31%, and 82.10%, respectively. These numbers outperform existing supervised learning approaches.

Conclusions: Experimental results demonstrate that the proposed method can accurately identify panicle pixels in UAV images taken under diverse conditions.


Background
Rice is the most consumed staple food on earth. More than half of the world's population depends on rice for their daily calories [1]. The yield of a paddy field is directly related to rice panicles, which are the parts of the plant that carry the grains. Fast panicle screening can help rice yield prediction, disease detection, nutrition value assessment, precision irrigation and fertilization, etc. [2,3]. With the rapid development of unmanned aerial vehicles (UAVs) and machine learning, there has been growing interest in high-throughput rice field phenotyping using optical images taken by UAVs over paddy fields [4][5][6].
Image-based rice panicle phenotyping relies on accurate panicle segmentation [7]. One of the main challenges faced by rice panicle segmentation with optical images is the diverse conditions under which the images are taken. There are significant variations among images taken under different conditions, such as water reflections, lighting conditions, weather conditions, cluttering backgrounds, panicle rigidness, rice growth phase, rice strains, UAV altitudes, etc. All these factors will affect the accuracy of panicle identification. This motivates the development of panicle segmentation algorithms that can operate over images taken under a large variety of conditions.
Image-based plant phenomics has gained increasing attention recently. An automated panicle counting algorithm was developed in [7] by using an artificial neural network (ANN). The algorithm was developed by using multi-angle images of rice plants, which were rotated on a turntable to obtain images at multiple angles. In [8], a rice panicle segmentation algorithm, Panicle-SEG, was developed by using deep learning with a convolutional neural network (CNN) and superpixel optimization. The Panicle-SEG algorithm is trained with a large number of images, from both pot-grown and field plants, to improve its robustness against the diverse conditions of images. CNN-based deep learning algorithms are also used for rice panicle detection in [9], and for sorghum panicle detection in [10] and [11]. Optical images were also used in [12] for wheat ear detection during the wheat heading stage, and in [13] for studying the flowering dynamics of rice plants. Both [12] and [13] use support vector machines (SVMs) for detection. All of the algorithms mentioned above require a significant amount of labeled training data. More recently, an active learning approach with weak supervision was proposed to reduce the number of labeled training images for panicle detection in cereal crops such as sorghum and wheat [14]. In addition to optical images, hyperspectral images have been widely studied for detecting different plant diseases [15] based on machine learning techniques like principal component analysis (PCA) and the chi-square kernel support vector machine (chiSVM) [16,17].
All of the above works are based on supervised learning, which requires a substantial number of labeled images for training. To the best of our knowledge, no unsupervised learning method has been developed or applied for rice panicle segmentation. The performance of a supervised segmentation algorithm relies heavily on the quality of the training data set. Due to the diverse conditions of rice fields, there are significant variations in the statistical properties of pixels from different images. For example, the illumination and weather conditions have a big impact on the statistical distributions of panicle pixels in different images. Even though a supervised algorithm can be trained by using a large number of images taken under different conditions, it is almost impossible to capture the large variations among different images by using a single trained model. As a result, an algorithm trained with one set of images might not perform well on other sets of images taken under different conditions. This motivates us to develop an unsupervised learning algorithm that can learn, identify, and adapt to the underlying statistical properties of each individual image, and thus works well under all conditions. The objective of this paper is to develop an unsupervised Bayesian learning algorithm for rice panicle segmentation with UAV images. The algorithm performs panicle detection by identifying the inherent differences in statistical distributions between panicle pixels and non-panicle pixels within the same image, without the need for a training stage. The difference in statistical distributions can then be used to classify the pixels into different categories. The algorithm adopts a probabilistic learning approach that can iteratively calculate the probability of each pixel in an image belonging to different categories, such as panicle, leaves, and background.
Such a probabilistic approach can quantify the uncertainty in the detection results, which is not available in conventional deterministic approaches. Under the Bayesian framework, a multivariate Gaussian mixture model (GMM) is used to represent the pixel intensities in one image, with each component in the GMM corresponding to one possible category. With the unsupervised learning approach, the model parameters are directly learned by using unlabeled data from each individual UAV image. Different images will have different model parameters, and this makes the algorithm adaptable to images taken under a wide variety of conditions. Markov chain Monte Carlo (MCMC) [18] with Gibbs sampling [19][20][21] is employed to learn and update the model parameters. Experimental results demonstrate that the unsupervised Bayesian learning approach can achieve accurate panicle segmentation with UAV images, and it outperforms existing supervised learning approaches. Moreover, the algorithm can also be used in active learning and semi-supervised learning models.

Results
The proposed unsupervised Bayesian learning algorithm is applied to the UAV images for panicle segmentation. The UAV images were stored in RGB format, with each pixel represented by a p = 3 dimensional vector corresponding to the colors red, green, and blue, respectively. The value of each color is normalized to the range between 0 and 1. A total of 12 images were processed by the algorithm. Among them, images 1 to 6 were taken at an altitude of 3 m, and images 7 to 12 were taken at an altitude of 6 m. The average spatial resolutions (distance between two adjacent ground samples) for the 3 m and 6 m images are 0.52 mm and 1.17 mm per pixel, respectively. Figure 1 shows two images of one square segment taken at altitudes of 3 m and 6 m, respectively. The images were acquired during the middle heading stage of rice on August 21, 2017 and September 1, 2018, respectively. The measurements were carried out between 10:00 a.m. and 2:00 p.m. The weather conditions on those two days were sunny with a temperature between 21 and 31 °C and class 1-2 south wind, and sunny with a temperature between 19 and 28 °C and class 1-2 northeast wind, respectively. Table 1 shows the detailed information of all 12 images studied in this paper.
To evaluate the accuracy of the unsupervised Bayesian learning algorithm, the pixels in all UAV images were manually labeled into panicle segments and non-panicle segments. The manually labeled results are used as a benchmark for evaluation. A pixel-by-pixel comparison is performed between the manually labeled images and the automatically segmented images to quantitatively evaluate the results of the proposed algorithm. The percentage of panicle pixels in Table 1 is obtained by using the manually labeled results. As an example, Fig. 2a, b show Image 3 in RGB format and the corresponding manually labeled results, respectively.
In the Bayesian learning algorithm, the initial parameters for the Dirichlet distribution are set as α_1 = ⋯ = α_k = 1. The prior mean of the mean vector µ_j of the GMM is set as τ_1 = ⋯ = τ_k = 0_p, where 0_p is a length-p all-zero vector. The prior precision matrix of the mean vector µ_j is initialized to a common fixed value for all categories. A total of T iterations are used in Gibbs sampling, and samples from the first T_0 = 75 iterations are discarded as burn-in before evaluation. The pixels in each image are classified into one of k = 3 categories: panicle, leaves, and dark background. Let µ_{i,j} represent the mean of the j-th color channel in the i-th class, where i ∈ {1, 2, …, k} and j ∈ {r, g, b}. Define m_i as the total mean across all channels for class i as

m_i = (1/3) (µ_{i,r} + µ_{i,g} + µ_{i,b}).

Based on our experimental results, panicle pixels have the largest total mean across all channels, followed by the leaves and background, respectively. Thus the panicle class can be detected as

î = arg max_{i ∈ {1, …, k}} m_i.

Since the manually labeled results identify only panicle and non-panicle pixels, the automatically classified pixels belonging to the leaves and background categories are grouped together as non-panicle pixels before comparison. Figure 2c shows the classification results of Image 3 with the unsupervised Bayesian learning algorithm and the MAP (maximum a posteriori) decision rule as shown in Algorithm 2 in the "Methods" section. A visual comparison between Fig. 2b and c indicates that the automatically detected results are strongly correlated with the manually labeled results.
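The panicle-class selection rule above can be sketched in a few lines of Python with numpy (the paper's own implementation is in MATLAB); the per-cluster channel means below are made-up values for illustration only.

```python
import numpy as np

# Illustrative per-cluster channel means mu[i, j] for k = 3 clusters and
# p = 3 color channels (r, g, b); the values are hypothetical.
mu = np.array([
    [0.71, 0.68, 0.55],   # bright cluster (candidate panicle)
    [0.25, 0.42, 0.18],   # greenish cluster (leaves)
    [0.08, 0.10, 0.07],   # dark cluster (background)
])

# Total mean across channels for each cluster: m_i = (1/3) * sum_j mu[i, j]
m = mu.mean(axis=1)

# The panicle class is the cluster with the largest total mean, followed
# by the leaves and background.
panicle_class = int(np.argmax(m))
```

With these hypothetical means, the first (brightest) cluster is selected as the panicle class.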
To quantitatively evaluate the performance of the unsupervised Bayesian learning algorithm, Fig. 3 shows the average receiver operating characteristic (ROC) curves of the proposed algorithm. Each ROC curve is obtained by averaging over all images at the same altitude. Each point on the ROC curve is obtained by adjusting the threshold of the posterior probability of the panicle category. The tradeoff between the probabilities of true positive (TP) and false positive (FP) can be adjusted by tuning the threshold p_TH. In this application a low FP probability is important, thus proper selection of p_TH matters. As an example, for Image 3, a TP probability of 0.9788 is achieved with an FP probability of 0.0144 by setting p_TH = 0.9990. Based on the ROC results, the algorithm operates well for images obtained at both 3 m and 6 m, with the performance of the 3 m images slightly better than that of the 6 m images. Averaged over all 12 images, the unsupervised Bayesian learning algorithm can achieve an average recall of 96.49% with an average precision of 72.31%, and this is achieved by setting p_TH = 0.9990. Figure 2d shows the segmentation result of Image 3 with p_TH = 0.9990.
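The thresholding step behind each ROC point can be illustrated with a small Python sketch; the posterior probabilities and ground-truth labels below are toy values, not data from the paper.

```python
import numpy as np

# Hypothetical per-pixel posterior probabilities of the panicle class and
# a manually labeled ground-truth mask (1 = panicle, 0 = non-panicle).
post = np.array([0.999, 0.9995, 0.50, 0.10, 0.9992, 0.02])
truth = np.array([1, 1, 0, 0, 0, 0])

def tp_fp_rates(post, truth, p_th):
    """Classify a pixel as panicle when its posterior exceeds p_th, then
    return the (true-positive rate, false-positive rate) pair."""
    pred = post > p_th
    tp = np.sum(pred & (truth == 1)) / np.sum(truth == 1)
    fp = np.sum(pred & (truth == 0)) / np.sum(truth == 0)
    return tp, fp

tp, fp = tp_fp_rates(post, truth, p_th=0.999)
```

Sweeping p_th from 0 to 1 and recording each (fp, tp) pair traces out one ROC curve.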
The detection results of all 3 m and 6 m images are tabulated in Tables 2 and 3, respectively. The results from k-means clustering [22] and Panicle-SEG [8] are also shown in Tables 2 and 3 for comparison. The k-means clustering is an unsupervised algorithm aiming to minimize the within-cluster variation after assigning each observation to one of k clusters. In this paper, the within-cluster variation is measured by using the Euclidean distance, and the algorithm is implemented with the "k-means++" algorithm [23], which is the default k-means implementation in MATLAB. The Panicle-SEG algorithm is based on a model pre-trained with both in-lab and field measurements of 684 images, including 49 top-view field rice images, 30 overhead-view field rice images, 302 pot-grown rice side-view images, and 303 pot-grown rice top-view images [8], and the pre-trained model is available online for download [24]. The balancing parameter and optimization coefficient for Panicle-SEG are set as 0.5 and 0.9, respectively, for the 3 m images, and as 0.5 and 0.8, respectively, for the 6 m images. In addition, Figs. 4 and 5 compare the performance of the three algorithms by averaging over all images obtained at the same altitude.
It is evident that the Bayesian-based method consistently outperforms the Panicle-SEG and k-means algorithms for all the images considered in this paper. For the 3 m images, the proposed algorithm achieves an average recall of 96.35% with an average precision of 75.15%. These two values are 75.48% and 73.35% for Panicle-SEG, and 76.54% and 48.73% for k-means. The corresponding F1 scores of the three algorithms are 83.78%, 74.23%, and 51.52%, respectively, for the 3 m images. Therefore, compared to the Panicle-SEG algorithm, the proposed algorithm achieves a much higher recall with a similar precision for the 3 m images, which results in a significantly improved F1 score. For the 6 m images, the performance of the Bayesian-based method and the k-means algorithm remains similar to that for the 3 m images. However, the performance of the Panicle-SEG method drops significantly for the 6 m results. The recall and precision of the Bayesian-based method are 96.23% and 69.47%, respectively. These two metrics drop to 51.70% and 34.38% for the Panicle-SEG algorithm, and they are 79.36% and 51.94% for the k-means algorithm. Consequently, the F1 scores of the Bayesian-based method, Panicle-SEG, and k-means are 80.42%, 41.24%, and 62.50%, respectively.
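The recall, precision, and F1 metrics compared above are standard pixel-wise scores; a minimal Python sketch of their computation on toy masks (not the paper's data) is shown below.

```python
import numpy as np

def segmentation_scores(pred, truth):
    """Pixel-wise recall, precision, and F1 score between a predicted
    panicle mask and a manually labeled mask (boolean arrays)."""
    tp = np.sum(pred & truth)           # true positives
    recall = tp / np.sum(truth)         # fraction of panicle pixels found
    precision = tp / np.sum(pred)       # fraction of detections correct
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1

# Toy 1-D masks standing in for flattened images.
truth = np.array([1, 1, 1, 0, 0, 0], dtype=bool)
pred = np.array([1, 1, 0, 1, 0, 0], dtype=bool)
recall, precision, f1 = segmentation_scores(pred, truth)
```

In the paper these scores are averaged over all images at the same altitude to produce Tables 2 and 3.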
The performance degradation of Panicle-SEG is partly due to the fact that many of its training images were taken at close range with pot-grown plants. On the other hand, the unsupervised learning approach can automatically adjust to different altitudes and achieve similar performance regardless of the altitude differences. This again demonstrates the versatility and adaptability of the Bayesian-based unsupervised learning approach.

Discussion
The data used in this paper were collected on sunny, cloudless days during the middle heading stage of rice. The proposed algorithm relies on the brightness of the pixels, so proper care should be taken to ensure uniform brightness within each image; failure to maintain this condition can seriously deteriorate the performance. As with other methods [8], weather is an important factor when the performance is evaluated. As long as panicle pixels and non-panicle pixels follow different Gaussian distributions, the algorithm works efficiently irrespective of the height at which the images are captured. At higher altitudes, the UAV can scan the field with fewer images, allowing faster processing. The results presented in this paper are obtained using just one rice cultivar; results may vary depending on the cultivar and on the rigidness and brightness of the rice panicles. All simulations were done using custom routines in MATLAB. The computer used in the simulations was equipped with 8 GB RAM and an Intel Core i7-4790 processor; no GPU or parallel computing paradigms were used. As the number of pixels in the 3 m images is almost 6 times that in the 6 m images, the 6 m images are much faster to process. T_0 has been chosen to make sure that the samples drawn from the following T − T_0 iterations are approximately from the stationary distribution. Also, a non-informative prior for the precision matrix has been used in this paper for faster implementation; informative priors for the precision matrix can also be used. Figure 6 illustrates the probability density functions of pixels under the three classes obtained from Image 7. The probability density functions are obtained by using the classification results from the Bayesian-based method.
As can be seen from the estimated distributions, the pixels in different color channels and under different categories roughly follow Gaussian distributions, which justifies the selection of the Gaussian distribution in the mixture model. In addition, the principle of maximum entropy [25] states that for a given mean and variance, the Gaussian distribution has the maximum entropy among all distributions. Even if the actual distribution of the underlying data is not Gaussian, under the same mean and variance, the Gaussian assumption represents the worst case with the maximum uncertainty. Thus the Gaussian assumption is a good starting point when the actual distribution is not known a priori. For these two reasons, the multivariate Gaussian distribution is used to model the pixel distributions under different clusters.

Number of classes
Under different illumination conditions the leaf pixels might correspond to multiple categories due to reflection, diffusion, and shadowing. In that case the number of categories k can be increased to capture the diverse conditions, and some of the clusters close to each other can be combined later before detection. Details of the method for determining the number of clusters are given in Algorithm 3 in the "Methods" section. Since the objective is to identify panicles, all categories other than panicles are grouped together at the final output of the segmentation. Segmentation results of Image 3 with k = 4 and k = 5 are shown in Fig. 7. The recall, precision, and F1 score for Image 3 with k = 3, 4, and 5 categories are summarized in Table 4. For k = 3 and 4, the results are almost the same, but the performance drops considerably when k is increased to 5. Setting k too high creates unnecessary categories that negatively affect the performance. Therefore, the number of categories should depend on the illumination conditions to achieve better classification results.
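The cluster-combining step can be sketched as follows; this is a hypothetical greedy merge by Euclidean distance between cluster means, not the exact procedure of Algorithm 3, and the means and threshold are illustrative.

```python
import numpy as np

# Hypothetical channel means for k = 5 clusters; rows 1 and 2 are two
# leaf-like clusters that lie close together and should be merged, as
# should the two dark background clusters in rows 3 and 4.
mu = np.array([
    [0.70, 0.66, 0.52],
    [0.24, 0.40, 0.18],
    [0.26, 0.43, 0.20],
    [0.07, 0.09, 0.06],
    [0.05, 0.08, 0.05],
])

def merge_close_clusters(mu, eps):
    """Greedily assign each cluster whose mean vector lies within
    Euclidean distance eps of an earlier cluster to that cluster's
    group, so nearby clusters share one label before detection."""
    k = len(mu)
    group = list(range(k))
    for i in range(k):
        for j in range(i):
            if np.linalg.norm(mu[i] - mu[j]) < eps:
                group[i] = group[j]
                break
    return group

groups = merge_close_clusters(mu, eps=0.1)
```

After merging, the effective number of categories drops back toward k = 3 and all non-panicle groups are pooled at the final output.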

Robustness against anomaly object
The proposed algorithm is robust against the existence of anomaly objects. In case of an anomaly object, the number of clusters k can be increased to account for the distribution of pixels of the anomaly object. Cluster i is declared an anomaly if its total mean across all channels is greater than a predefined threshold ε_a, i.e., m_i ≥ ε_a. Details of anomaly detection are discussed in Algorithm 3 in the "Methods" section. In this paper ε_a = 0.9 has been used. Figure 8 shows an image with a white rectangle used to mark the rice field. The image is segmented by using k = 4 clusters. After classification, the white rectangle has the highest average mean across channels (0.9239), and the panicle pixels have the second highest average mean across channels (0.4973). Detection results for the panicle and anomaly object are shown in Fig. 8c, d, respectively. With the existence of the anomaly object, the recall, precision, and F1 values for panicle pixels with p_TH = 0.999 are 0.86, 0.80, and 0.83, respectively.
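A minimal Python sketch of this anomaly test is given below; the cluster means are made up to loosely mirror the example in the text (a white marker with total mean above 0.9 and a panicle cluster around 0.5).

```python
import numpy as np

# Hypothetical channel means for k = 4 clusters.
mu = np.array([
    [0.93, 0.92, 0.92],   # white rectangle (anomaly)
    [0.52, 0.50, 0.47],   # panicle
    [0.24, 0.40, 0.18],   # leaves
    [0.07, 0.09, 0.06],   # background
])

eps_a = 0.9                 # anomaly threshold on the total mean
m = mu.mean(axis=1)         # total mean across channels per cluster
anomalies = np.where(m >= eps_a)[0]

# Among the remaining clusters, the panicle class is the one with the
# largest total mean across channels.
valid = [i for i in range(len(mu)) if i not in set(anomalies)]
panicle_class = valid[int(np.argmax(m[valid]))]
```

The anomaly cluster is excluded first, so the bright marker is not mistaken for panicles.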

Spatial information
The Bayesian-based method treats all pixels as independent in the spatial domain, whereas spatial information is utilized by the Panicle-SEG method. The omission of spatial information in the Bayesian-based method can sometimes lead to the misclassification of stems as panicles, as shown in Figs. 2d and 7a. Such misclassification is not present in Panicle-SEG, as shown in Fig. 7c. However, a larger number of background pixels surrounding the panicles are misclassified as panicles by the Panicle-SEG algorithm, which leads to a relatively high false positive rate in Panicle-SEG. The CNN of Panicle-SEG was trained on patches of 32 × 32 pixels, thus some panicle pixels remain undetected in rectangular regions because of this patch-based training. This phenomenon also increases the false negative rate of Panicle-SEG, and it gets worse in the lower-resolution 6 m images, resulting in the low scores shown in Table 3.

Conclusions
Rice panicle segmentation in UAV images with unsupervised Bayesian learning has been studied in this paper. The unsupervised learning approach does not require a training phase, which makes it extremely useful for dealing with images taken under diverse conditions and at different UAV altitudes. Each pixel in the UAV image was modeled by using the multivariate GMM, and the model parameters of the different categories were iteratively learned from the UAV data by using MCMC with Gibbs sampling. Experimental results demonstrated that the proposed algorithm can detect panicle pixels in UAV images with very high accuracy, and it outperforms existing supervised learning approaches such as Panicle-SEG.
To the best of the authors' knowledge, there does not exist any unsupervised method for panicle segmentation in the literature. For future work, the results from this paper will be leveraged to estimate the number of rice panicles in a unit area. The results can be used to predict the rice yield for a given field by building new statistical models linking yields with panicle counts in UAV images. In addition, in this paper each pixel is assumed to be independent of neighboring pixels, but in practice neighboring pixels are correlated. It is expected that the performance can be further improved by considering spatial dependence among the pixels.

Experiment setup and data collection
The field experiments were conducted in 2017 and 2018 at the Super Rice achievement Transformative Base (SRTB) (E 123°55′85″, N 41°81′63″) of the Shenyang Agricultural University (SYAU) in northeastern China. Shenyang has a temperate semi-humid continental climate, where annual mean temperatures range between 6.2 and 9.7 °C and annual rainfall ranges between 600 and 800 mm. Both experiments were performed during the middle heading stage of rice using a randomized complete block design with 7 types of nitrogen treatments (N1-N7). The seven nitrogen treatments were: null N (0 kg/ha), low N (150 kg/ha), moderate N (240 kg/ha), high N (330 kg/ha), and organic fertilizer substitution of 10%, 20%, and 30%. Each nitrogen treatment has three replicates (R1-R3), resulting in a total of 21 plots, as shown in Fig. 9a. Each plot has an area of 30 m² (4.2 m × 7.61 m), separated by dirt paths. The rice cultivar was Shennong 9816.
Images were acquired during the middle heading stage of rice on August 21, 2017, and September 2, 2018.

Problem formulation with Bayesian mixture model
Assume each UAV image contains n pixels. The i-th pixel in an image can be represented as a p-dimensional vector as

x_i = [x_{i,1}, x_{i,2}, …, x_{i,p}]^T, (1)

where a^T represents the matrix transpose. For a regular optical camera, we have p = 3, with the three dimensions corresponding to the intensities of red, green, and blue of the pixel. The UAV image can thus be represented as the collection of the n pixels as X = [x_1, x_2, …, x_n].
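The pixel-matrix representation above amounts to flattening the image into n column vectors of length p; a small Python sketch with a toy random image (not UAV data) makes this concrete.

```python
import numpy as np

# A toy 4x5 RGB image with intensities in [0, 1]; each pixel x_i is a
# p = 3 dimensional vector of (red, green, blue) values.
rng = np.random.default_rng(0)
image = rng.random((4, 5, 3))

h, w, p = image.shape
n = h * w

# X collects the n pixel vectors as columns: X = [x_1, x_2, ..., x_n].
X = image.reshape(n, p).T
```

With row-major flattening, the first column of X is the top-left pixel of the image.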
Each pixel can be classified into one of k categories, such as panicles, leaves, dirt, water, etc. Define a sequence of independent latent variables z_i ∈ {1, 2, …, k}, for i = 1, 2, …, n. The latent variables are used to indicate the classification result, that is, z_i = j means that the i-th pixel belongs to the j-th category, for j = 1, …, k. Define z = [z_1, z_2, …, z_n]^T. It is assumed that the latent variables follow a multinomial distribution, with the probability mass function (PMF) of z_i given by

π(z_i = j) = q_j, (2)

where q_j is the prior probability of the i-th pixel belonging to the j-th category.
The objective of the unsupervised classifier is to identify the value of z_i, for i = 1, …, n, based on the UAV image data X. The optimum classifier that can minimize the classification error is the maximum a posteriori probability (MAP) classifier, which maximizes the posterior probability of z_i as

ẑ_i = arg max_{j ∈ {1, …, k}} Pr(z_i = j | X), (4)

where ẑ_i is the classification result, and Pr(z_i | X) is the posterior probability of z_i given the UAV data X. It is in general difficult, if not impossible, to directly calculate the posterior probability Pr(z_i = j | X). We propose to iteratively learn the posterior probability and the corresponding probability distributions by using a Bayesian mixture model and Markov chain Monte Carlo.
A multi-modal Bayesian mixture model is used to represent the probability distributions of the intensities of pixels in the UAV image, with each component in the mixture model corresponding to one possible category. The probability density function (pdf) of the i-th pixel can be represented as

f(x_i) = Σ_{j=1}^{k} q_j f(x_i | z_i = j, θ_j), (5)

where f(x_i | z_i = j, θ_j) is the likelihood function of x_i given that the i-th pixel is in the j-th category, and θ_j is the corresponding distribution parameter of the j-th category. In Bayesian inference, θ_j is assumed to be unknown and random, with a prior distribution π(θ_j).
The multivariate Gaussian mixture model (GMM) is adopted in this paper, where the likelihood function is assumed to follow a Gaussian distribution with mean vector µ_j ∈ R^{p×1} and covariance matrix Λ_j^{-1} ∈ R^{p×p} as

f(x_i | z_i = j, θ_j) = (2π)^{-p/2} |Λ_j|^{1/2} exp[ -(1/2) (x_i − µ_j)^T Λ_j (x_i − µ_j) ], (6)

where Λ_j, the inverse of the covariance matrix, is the precision matrix. Using the precision matrix instead of the covariance matrix can reduce the number of matrix inversions in the learning process. The corresponding distribution parameters are thus θ_j = {µ_j, Λ_j}. Under the Bayesian setting, the mean vector µ_j and precision matrix Λ_j are unknown and random. The Bayesian posterior probability can then be calculated by marginalizing over the unknown parameters as

Pr(z_i = j | X) = ∫ Pr(z_i = j | X, {θ_l}_{l=1}^{k}, q) π({θ_l}_{l=1}^{k}, q | X) dθ_1 ⋯ dθ_k dq. (7)

The calculation of the posterior probability requires multi-level integration with respect to the multi-dimensional parameters θ_j and q, which is usually difficult to carry out either analytically or numerically. We propose to solve this problem by employing unsupervised Bayesian learning with Gibbs sampling [19,20], and details are given in the next section.
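The precision-matrix parameterization of the Gaussian likelihood can be sketched directly in Python; the sanity check at the end uses an identity precision (so the covariance is also the identity) with made-up vectors.

```python
import numpy as np

def gaussian_pdf_precision(x, mu, Lam):
    """Multivariate Gaussian density parameterized by the precision
    matrix Lam (the inverse covariance), so no matrix inversion is
    needed when evaluating the likelihood."""
    p = len(mu)
    d = x - mu
    norm = np.sqrt(np.linalg.det(Lam)) / (2 * np.pi) ** (p / 2)
    return norm * np.exp(-0.5 * d @ Lam @ d)

# Sanity check with an identity precision matrix and toy vectors.
x = np.array([0.5, 0.2, 0.1])
mu = np.array([0.4, 0.3, 0.2])
val = gaussian_pdf_precision(x, mu, np.eye(3))
```

Storing Λ_j instead of the covariance is what lets the Gibbs sampler avoid repeated inversions when evaluating (6) for every pixel.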

Unsupervised Bayesian learning with Gibbs sampling
In this section, an unsupervised Bayesian learning method with the Gaussian mixture model (GMM) is used to classify the pixels in the UAV images into one of several categories, such as panicles, leaves, dirt, water, etc. The classification is performed by analyzing and identifying the statistical properties of the pixels belonging to different categories, without the need of a training phase.
As in the problem formulated in (4) and (7), the classification requires the knowledge of the posterior probability. MCMC with Gibbs sampling can obtain a numerical approximation of Pr(z_i = j | X) by iteratively taking samples from the joint distribution f(z, {θ_j}_{j=1}^{k}, q | X). For a given X, if T samples are drawn from the joint distribution, and the samples are denoted as {z^(t), {θ_j^(t)}_{j=1}^{k}, q^(t)}, for t = 1, …, T, then the posterior probability can be approximated as

Pr(z_i = j | X) ≈ (1 / (T − T_0)) Σ_{t=T_0+1}^{T} 1(z_i^(t) = j), (8)

where 1(·) is the indicator function and the samples from the first T_0 iterations are discarded as burn-in.
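The Monte Carlo approximation in (8) is just a fraction of retained label samples; the short Python sketch below uses a made-up sample sequence for one pixel.

```python
import numpy as np

# Hypothetical Gibbs draws of the label z_i of one pixel over T = 10
# iterations, with the first T0 = 4 discarded as burn-in.
z_samples = np.array([2, 1, 1, 3, 1, 1, 2, 1, 1, 1])
T0 = 4
kept = z_samples[T0:]

# Pr(z_i = j | X) is approximated by the fraction of retained samples
# equal to j, as in (8).
post = {j: np.mean(kept == j) for j in (1, 2, 3)}
```

The MAP rule then assigns the pixel to the class with the largest estimated posterior (class 1 in this toy sequence).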

Initialization
In order to start the iterative sampling process, the values of the unknown variables and parameters need to be initialized. The results from k-means clustering [22,23] are used as initial values. With the k-means algorithm, the pixels {x_i}_{i=1}^{n} are classified into k categories. Denote the set of pixels that correspond to the j-th category as S_j^(0) = {i : z_i^(0) = j}. The vector q^(0) is then initialized as

q_j^(0) = |S_j^(0)| / n, for j = 1, …, k.

Define a matrix X_j = {x_i}_{i∈S_j}, which contains all pixels x_i labeled as z_i = j.
The unknown parameters θ_j can then be estimated from X_j by using maximum likelihood estimation. Under the GMM, the unknown parameters are θ_j = {µ_j, Λ_j}, and they can be initialized as

µ_j^(0) = (1/n_j) Σ_{i∈S_j^(0)} x_i,

Λ_j^(0) = [ (1/n_j) Σ_{i∈S_j^(0)} (x_i − µ_j^(0))(x_i − µ_j^(0))^T ]^{-1},

where n_j = |S_j^(0)| is the number of pixels initially assigned to the j-th category.
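The initialization step can be sketched as follows; in the paper the labels come from MATLAB's k-means++, while here they are fixed toy labels on a four-pixel example, and a small regularizer is added so the tiny clusters give a non-singular covariance.

```python
import numpy as np

# Toy pixel matrix (n = 4 pixels, p = 3 channels) and k-means-style
# initial labels z0; values are illustrative only.
X = np.array([
    [0.70, 0.68, 0.55],
    [0.72, 0.70, 0.57],
    [0.08, 0.10, 0.07],
    [0.06, 0.09, 0.05],
])
z0 = np.array([0, 0, 1, 1])
k, n = 2, len(X)

q0, mu0, Lam0 = [], [], []
for j in range(k):
    Xj = X[z0 == j]
    q0.append(len(Xj) / n)            # q_j^(0): fraction of pixels in cluster j
    mu0.append(Xj.mean(axis=0))       # mu_j^(0): sample mean of cluster j
    # Lambda_j^(0): inverse sample covariance; the small diagonal term is
    # a regularizer (an assumption for this two-point toy cluster).
    cov = np.cov(Xj.T) + 1e-3 * np.eye(3)
    Lam0.append(np.linalg.inv(cov))
```

These values of q^(0), µ_j^(0), and Λ_j^(0) seed the first Gibbs iteration.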

Gibbs sampling with GMM
Gibbs sampling is used to iteratively take samples from the joint distribution f(z, {θ_j}_{j=1}^{k}, q | X). The results are then used to estimate the posterior probability as in (8).
With the GMM, the unknown model parameters are θ_j = {µ_j, Λ_j} for j = 1, 2, …, k. Under the Bayesian setting, both µ_j and Λ_j are assumed to be unknown and random, and their values will be learned from the data. The prior for the mean vector µ_j is assumed to be Gaussian with mean vector τ_j and precision matrix Φ_j, that is,

π(µ_j) = N(µ_j; τ_j, Φ_j^{-1}).

The parameters τ_j and Φ_j will be iteratively updated during the Gibbs sampling process.
The non-informative prior [26] for the precision matrix Λ_j is taken as

π(Λ_j) ∝ |Λ_j|^{−(p+1)/2}. (13)

During the iterative Gibbs sampling process, the samples of different variables at each step are drawn from their respective posterior distributions, conditional on the current states of all other variables. Thus the implementation of Gibbs sampling requires the knowledge of the full conditional posterior distributions of all parameters of interest, which include z, q, {µ_j}_{j=1}^{k}, and {Λ_j}_{j=1}^{k}. The full conditional posterior distributions of all parameters are given as follows. Detailed derivations of (14)-(17) are given in the Appendix.
Posterior distribution of q
Let n_j denote the number of pixels belonging to the j-th category. Then

q | z ~ Dirichlet(α_1 + n_1, …, α_k + n_k),

where n = [n_1, n_2, …, n_k]^T.
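One Gibbs step for q simply draws from this Dirichlet full conditional; a Python sketch with hypothetical per-class pixel counts is shown below.

```python
import numpy as np

# Dirichlet prior parameters (alpha_1 = ... = alpha_k = 1, as in the
# paper) and hypothetical per-class pixel counts n from the current
# label assignment z.
alpha = np.array([1.0, 1.0, 1.0])
n_counts = np.array([500, 1200, 300])

# The full conditional of q is Dirichlet(alpha + n); one Gibbs step
# draws a fresh weight vector from it.
rng = np.random.default_rng(1)
q = rng.dirichlet(alpha + n_counts)
```

The drawn weights sum to one and concentrate around the empirical class fractions as the counts grow.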

Posterior distribution of Λ_j (non-informative prior)
Λ_j | z, µ_j, X ~ W_p(S_j^{-1}, n_j),

where S_j = Σ_{i∈S_j} (x_i − µ_j)(x_i − µ_j)^T, S_j = {i : z_i = j}, and W_p(S_j^{-1}, n_j) is the Wishart distribution with n_j degrees of freedom and scale matrix S_j^{-1}.
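A Wishart draw with integer degrees of freedom can be generated with numpy alone as a sum of outer products of Gaussian vectors; the sketch below uses a made-up scatter matrix S_j and cluster size n_j to illustrate one Gibbs update of Λ_j.

```python
import numpy as np

# One draw of the precision matrix Lambda_j from its full conditional
# W_p(S_j^{-1}, n_j), using the construction Lambda = sum of n_j outer
# products of N(0, S_j^{-1}) vectors (valid for integer n_j >= p).
rng = np.random.default_rng(2)
p, n_j = 3, 50

# Hypothetical scatter matrix S_j of the pixels in cluster j,
# built to be symmetric positive definite.
A = rng.random((p, p))
S_j = A @ A.T + p * np.eye(p)
V = np.linalg.inv(S_j)               # Wishart scale matrix S_j^{-1}

Y = rng.multivariate_normal(np.zeros(p), V, size=n_j)
Lam_j = Y.T @ Y                      # one sample from W_p(S_j^{-1}, n_j)
```

The resulting Λ_j is symmetric positive definite, as a precision matrix must be, and is plugged back into the Gaussian likelihood (6) for the next label-sampling step.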