Leaf to panicle ratio (LPR): a new physiological trait for rice plant architecture based on deep learning


Background: Identification and characterization of new traits with a sound physiological foundation is one of the primary objectives of crop physiology for crop breeding and management. Technological advances in high-throughput phenotyping have strengthened the power of physiological breeding, and methods are being developed for mining the big data acquired by various phenotyping platforms, among which deep learning is used in image analysis to explore spatial and temporal information concerning crop growth and development. However, method development is still necessary to enable the simultaneous extraction of both leaf and panicle data from complex field backgrounds, as required by breeders adopting physiological strategies to balance source and sink for yield improvement. Results: We applied a deep learning approach to accurately extract leaf and panicle data and subsequently developed the GvCrop procedure to calculate the leaf to panicle ratio (LPR) of rice populations during grain filling. Images of the training data set were captured in field experiments with large variations in camera shooting angle, the elevation and azimuth angles of the sun, rice genotype, and plant phenological stage. After all panicle and leaf regions were accurately labeled by manual annotation, the resulting dataset was used to train FPN-Mask models, each consisting of a backbone network and a task-specific subnetwork. The model with the highest accuracy was then selected to study the variations in LPR among 192 rice germplasms and among agronomical practices. Despite the challenging field conditions, the FPN-Mask models achieved high detection accuracy, with a Pixel Accuracy of 0.99 for panicles and 0.98 for leaves. The calculated LPRs showed large spatial and temporal variations as well as genotypic differences. Conclusion: Deep learning techniques can achieve high accuracy in simultaneously detecting panicle and leaf data from complex rice field images.
The proposed FPN-Mask model is applicable to detect and quantify crop performance under field conditions. The proposed trait of LPR should provide a high throughput protocol for breeders to select superior rice cultivars as well as for agronomists to precisely manage field crops that have a good balance of source and sink.


Background
Sustainable improvement in crop production is crucial for meeting the demand of an increasing global population, particularly considering that 821 million people lack sufficient food for their daily lives [1]. Recent technological advances in genome biology, such as next-generation sequencing, genome editing and genomic selection, have paved the way for crop breeders to identify, characterize, transfer, or modify the genes responsible for grain yield or quality traits in a rapid and precise way [2]. However, there is a huge gap between the fundamental plant sciences and the applied science of crop breeding, as reflected by the limited understanding of the link between genotype and phenotype. Crop physiology is a key interface between the genome and the plant phenotype, and thus is indispensable for hastening crop improvement [3]. Accordingly, physiological breeding, a methodology for selection on physiological traits such as canopy temperature, carbon isotope discrimination, and stomatal conductance, was proposed. This approach has shown greater impact than conventional breeding, for example in water-stressed Australian environments and in the heat- and drought-stressed conditions of the International Wheat Improvement Network [4].
Chemically, cereal grain yield consists of photosynthetic assimilates first produced in the leaf source organs and then translocated to the sink organ, the grain. Source-sink relations, the core concept of crop physiology, are therefore a critical factor dominating crop yield formation. Improving the source activity of leaf photosynthesis to harness light irradiation more efficiently is one of the major targets of crop breeding. In rice, ideotypes have long been pursued by breeders, resulting in several successfully implemented concepts such as the New Plant Type, Super Rice, and Ideal Plant Architecture [5][6][7]. One common feature shared by these new plant types is the emphasis on leaf erectness, especially of the top three leaves, which is supposed to be essential for improving source activity. However, some of the main cultivars with this ideotype suffered from incomplete filling of inferior grains, especially those with large numbers of grains, indicating the importance of optimizing the source-sink ratio [8,9].
In addition to storing photosynthetic assimilates from leaves, sink organs such as glumes and awns have photosynthetic activity themselves. Cumulative evidence supports a sizable contribution of the spike or panicle to grain filling in terms of providing carbohydrates as well as nitrogen (N), magnesium, and zinc [10,11]. In wheat and barley, the contribution of spikes to grain filling ranges from 10% to 76% [12]. In rice, the gross photosynthetic rate of the panicle is 30% of that of the flag leaf, and panicle photosynthesis has been estimated to contribute 20% to 30% of the dry matter in grain [13]. Thus light interception by the ear or panicle should be integrated into breeding programmes aiming for source-sink balance.
Technical advances in high-throughput field phenotyping on a breeding scale in realistic field environments have strengthened the power of physiological breeding [4]. Concurrently, methods for mining the big data acquired by various phenotyping platforms have been developed. Among them, deep learning has been widely used in image analysis to explore spatial and temporal information concerning crop growth and development [14]. Leaf area and number indicate the photosynthetic capacity of the crop canopy, and the precise segmentation and counting of leaves has been one of the objectives of image processing. Studies have produced robust deep learning methodology for quantifying leaf number from 2D images [15] and 3D images [16][17][18], providing effective tools for growth estimation and yield prediction of crop plants. Spike (wheat) or panicle (rice) number per square meter is a key component of cereal grain yield, and numerous attempts have been made to segment and count this reproductive organ accurately in rice [19][20][21] and wheat [22][23][24]. Collectively, these robust, low-cost and efficient methods for assessing the number of economic organs are highly relevant to phenotyping efforts towards increased cereal grain yield. However, to our knowledge, method development is still necessary to simultaneously extract both leaf and panicle from the background of a field crop population, as required by the breeder to adopt physiological strategies to balance source and sink.
In this study, we applied a deep learning approach to accurately extract leaf and panicle image data and subsequently calculate the leaf to panicle ratio (LPR) of rice populations during grain filling stage.
Of note, the LPR is a proximate estimate of the distribution of light interception between leaf and panicle, since the light captured by the camera is sunlight reflected by the leaf or panicle.
Images of the training data set were captured in field experiments with large variations in camera shooting angle, the elevation and azimuth angles of the sun, rice genotype, and plant phenological stage. After all panicle and leaf regions were accurately labeled by manual annotation, the resulting dataset was used to train FPN-Mask models, each consisting of a backbone network and a task-specific subnetwork. The model with the highest accuracy was then selected to study the variations in LPR among 192 rice germplasms and among agronomical practices. Our aim was to provide a high-throughput protocol for breeders to select superior rice cultivars as well as for agronomists to precisely manage field crops so as to achieve a good balance of source and sink.

Methods
We explored an end-to-end pixel-wise segmentation method that automatically labels each pixel as panicle, leaf or background under natural field conditions, and then computed the leaf to panicle ratio (LPR) from the numbers of pixels assigned to each class in a field image. Figure 1 shows the overall workflow of this method, which has two parts. Part 1 is the offline training workflow, which builds a deep learning network called FPN-Mask to segment panicle and leaf from field RGB images.
Part 2 is the GvCrop procedure, a software system for calculating LPR.

Experimental setup
In 2018, plots of ongoing field experiments at Danyang (31°54′31″N, 119°28′21″E), Jiangsu Province, China were selected for capturing the training data set. Of note, these experiments were not specially designed for a phenotyping study. In brief, the plant materials of these experiments were highly diverse, containing seven main cultivars of Jiangsu and 195 mutants with contrasting agronomical traits, as reported by Abacar et al [25]. Further, the seven cultivars had two sowing dates, resulting in clearly different phenotypes for a given genotype. The diversity in plant architecture and canopy structure of the tested materials thus provided as many kinds of phenotypes as possible for image analysis.
In 2019, three experiments were conducted to test and apply the proposed FPN-Mask model. (1) Genotypic variations in LPR. A total of 192 mutants were investigated. The plot area was 2.4 m × 1.4 m with a row spacing of 30 cm and a plant spacing of 20 cm. Nitrogen, phosphate (P2O5) and potassium (K2O) fertilizers were applied at rates of 240, 120 and 192 kg ha-1, respectively, each split equally between basal fertilizer (before transplanting) and topdressing (at the 4th leaf age in reverse order). (2) N fertilization effects on LPR. A japonica rice cultivar, Wuyunjing 30, was grown in a field experiment with a randomized complete-block design, three replications, and a plot area of 2.4 m × 1.4 m. Total N fertilizer was 240 kg ha-1, applied in two fertilization modes with different base/topdressing ratios: N5-5 (base/topdressing, 5/5) and N10-0 (base/topdressing, 10/0). (3) Plant growth regulator effects on LPR. Solutions of uniconazole, 25 mM 2,4-epibrassinolide, and 25 mM brassinolide, as well as the water control, were made up in distilled water with 0.5% TWEEN-20. One cultivar, Ningjing 8, from the N experiment was used as material. Spraying was conducted at a rate of 500 ml m-2 after sunset, three times starting at the booting stage on August 22, with a 2-day interval.
In addition, a dynamic canopy light interception simulating device (DCLISD) was designed to capture images from the sun's position along a supporting track. The bottom part consists of four pillars with wheels; the upper part comprises two arches consolidated by two steel pipes and a moveable rail for mounting the RGB camera (Fig. 2A). The sun's trajectory is simulated by two angles, the elevation angle and the azimuth angle (Fig. 2B, C), which were calculated from the latitude, longitude, and growth periods at the experimental site.
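The two angles that the DCLISD tracks can be approximated with standard solar-geometry formulas. The sketch below is illustrative only: the function name is ours, it uses Cooper's declination approximation with local solar time, and it ignores the longitude and equation-of-time corrections a real device schedule would need.

```python
import math

def solar_position(lat_deg, day_of_year, solar_hour):
    """Approximate solar elevation and azimuth (degrees).

    lat_deg: site latitude in degrees (e.g. ~31.9 for Danyang).
    day_of_year: 1..365.
    solar_hour: local solar time in hours (12 = solar noon).
    """
    lat = math.radians(lat_deg)
    # Cooper's approximation for solar declination.
    decl = math.radians(23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0)))
    # Hour angle: 15 degrees per hour from solar noon.
    h = math.radians(15.0 * (solar_hour - 12.0))
    sin_elev = (math.sin(lat) * math.sin(decl)
                + math.cos(lat) * math.cos(decl) * math.cos(h))
    elev = math.asin(sin_elev)
    # Azimuth measured clockwise from north; clamp guards against rounding.
    cos_az = ((math.sin(decl) - math.sin(lat) * sin_elev)
              / (math.cos(lat) * math.cos(elev)))
    az = math.acos(max(-1.0, min(1.0, cos_az)))
    if solar_hour > 12.0:  # afternoon: sun west of the meridian
        az = 2.0 * math.pi - az
    return math.degrees(elev), math.degrees(az)
```

At solar noon the azimuth comes out due south (180°) for a northern mid-latitude site, and the elevation peaks, matching the geometry the device is meant to reproduce.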

Image acquisition
Images of the training data set were captured in the field experiments in 2018, reflecting the large variations in camera shooting angle, the elevation and azimuth angles of the sun, rice genotype, and plant phenological stage (Fig. 3). Images for validation and application of the proposed model were acquired in 2019. For the three experiments on genotypes, N fertilization, and regulator spraying, the tripod was set at an angle of 40°. The height of the camera (Canon EOS 750D, 24.2 megapixels) was 167.1 cm, the average height of a Chinese adult, and the distance between the central point of the target area and the vertical projection of the camera on the ground was 90 cm. The camera settings were: focal length, 18 mm; aperture, f/11; ISO, automatic; exposure time, automatic. In the experiment with the DCLISD, the camera was a SONY DSC-QX100 with the following settings: focal length, 10 mm; aperture, automatic; ISO, automatic; exposure time, 1/200 s.

Dataset preparation
Training dataset: Considering the camera angle, solar angle, panicle type and growth stage (Fig. 3), we selected 360 representative images as the training dataset (Table 1). Testing dataset: Depending on the conditions of data acquisition, we divided all collected images into three groups by growth stage. From each group we randomly selected 30 testing images, giving 90 testing images in total (Table 2). The captured field images include many other objects, such as tracks, chains, neighboring plots, a color chart and sky, that are not required by our approach, so a representative region of each plot was selected as the region of interest (ROI) and cropped manually for every testing image.

Network structure
In this study, we proposed a deep learning-based method for rice panicle segmentation, called FPN-Mask, consisting of a backbone network and a task-specific subnetwork. The Feature Pyramid Network (FPN) [27], originally designed for object detection, was selected as the backbone for extracting features over the entire input; it has the advantage of extracting a multi-level feature pyramid from a single-scale input image. The subnetwork follows the Unified Perceptual Parsing Network [28] and performs semantic segmentation based on the output of the backbone network (Fig. 4).
Backbone network for feature extraction: The FPN [27] is a standard feature extractor with a top-down architecture and lateral connections. The top-down pathway is built on a Residual Network (ResNet) [29] consisting of four stages, whose last feature maps are denoted {C2, C3, C4, C5}. In our backbone we removed the global max pooling layer before C2 because it discards semantic information; this reduces the down-sampling rates of the stages {C2, C3, C4, C5} from {4, 8, 16, 32} to {1, 2, 4, 8}. The down-sampling rates of the FPN output feature maps {P2, P3, P4, P5} are likewise {1, 2, 4, 8}: for a 256×256 input, P2 is 256×256, P3 is 128×128, P4 is 64×64, and P5 is 32×32. Each stage outputs 32 feature maps.
Subnetwork for semantic segmentation: the subnetwork operates on the multi-level features extracted by the backbone described above. All levels are fused into a single input feature map for semantic segmentation, which has been shown to outperform using only the highest-resolution feature map [28,30]. To up-sample the low-level feature maps {P3, P4, P5} to the size of the original image, we directly adopt bilinear interpolation instead of the time-consuming deconvolution layer, and attach a convolution layer after each interpolation layer to refine the result. After up-sampling, the different levels of features are concatenated into the final semantic feature. The concatenated multi-level features then pass through one convolution layer to refine the result and another to reduce the channel dimension (both followed by a batch normalization layer and a ReLU layer). Finally, we obtain a 3-channel semantic segmentation result.
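The bilinear up-sampling and channel-wise concatenation of the pyramid levels can be sketched in NumPy as follows. This is a simplified stand-in for the corresponding network layers (the refining convolutions are omitted), and the array shapes and function names are ours.

```python
import numpy as np

def bilinear_upsample(fmap, out_h, out_w):
    """Bilinear interpolation of an (H, W, C) feature map to (out_h, out_w, C)."""
    h, w, c = fmap.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]   # fractional row weights
    wx = (xs - x0)[None, :, None]   # fractional column weights
    top = fmap[y0][:, x0] * (1 - wx) + fmap[y0][:, x1] * wx
    bot = fmap[y1][:, x0] * (1 - wx) + fmap[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def fuse_pyramid(p2, p3, p4, p5):
    """Up-sample P3-P5 to P2's resolution and concatenate along channels."""
    h, w, _ = p2.shape
    levels = [p2] + [bilinear_upsample(p, h, w) for p in (p3, p4, p5)]
    return np.concatenate(levels, axis=-1)
```

With 32 channels per level, the fused map for a 256×256 input would be 256×256×128 before the channel-reducing convolution.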

Loss function for semantic segmentation
The cross-entropy loss is the standard loss function for classification [31]. In practice, however, because the numbers of pixels in the different classes are highly unbalanced, a loss computed with plain cross-entropy does not reflect the real situation [32]. We therefore use the focal loss, which is specifically designed for class imbalance [32]: it re-weights the categories so that the loss focuses on locations that are harder to classify. For details refer to [32].
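A minimal NumPy sketch of the focal loss idea of [32] is shown below, assuming per-pixel softmax probabilities are already available; the function name and the specific gamma/alpha defaults are ours, not taken from the paper.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Pixel-wise multi-class focal loss (cf. [32]).

    probs:  (N, n_classes) softmax probabilities, one row per pixel.
    labels: (N,) integer class index per pixel.
    The (1 - p_t)^gamma factor down-weights easy, well-classified pixels,
    so scarce, hard pixels (e.g. leaf/panicle boundaries) dominate.
    """
    pt = np.clip(probs[np.arange(len(labels)), labels], eps, 1.0)
    return float(np.mean(-alpha * (1.0 - pt) ** gamma * np.log(pt)))
```

A pixel predicted correctly with probability 0.99 contributes orders of magnitude less loss than one predicted at 0.4, which is exactly the re-weighting behaviour described above.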

Training
We experimented with ResNet-18 as the FPN backbone. All convolutional layers were initialized as in He et al [33], and batch normalization layers were initialized with their default weight and bias values. The mini-batch size was 24, the optimizer was Adam, and the model was trained for 7 days with a base learning rate of 1e-3. To improve robustness and avoid overfitting, we tested model performance every day and iteratively added poorly performing samples, 40 samples per day. All experiments in this article were conducted on a high-performance computer with an Intel 3.50 GHz processor and 128 GB of memory; two NVIDIA GeForce 1080 graphics processing units (GPUs) with 12 GB of memory were used to accelerate training.
During training, we tested model performance on all collected images every day, and images that performed poorly were added as supplementary training samples to ensure the training set covered all cases in the collected image data. In total, 60 field images, generating 302 patches, were added as supplementary samples. The criteria for good or bad performance were set by our own visual observation. Training continued until the testing performance on all images was visually satisfactory and the loss curve was smooth without fluctuations.
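The daily hard-sample step can be sketched as follows. Note that the paper ranks "bad performance" by visual inspection; this illustrative helper (our own naming) instead ranks candidates by a per-image evaluation loss, which is one way to automate the same idea.

```python
def select_hard_samples(image_losses, training_set, k=40):
    """Pick the k worst-performing images not yet in the training set.

    image_losses: dict mapping image id -> evaluation loss.
    training_set: set of image ids already used for training.
    Mirrors the daily step of adding ~40 poorly segmented images.
    """
    candidates = [(loss, img) for img, loss in image_losses.items()
                  if img not in training_set]
    candidates.sort(reverse=True)  # highest loss first
    return [img for _, img in candidates[:k]]
```

Each day the returned ids would be labeled and appended to the training set before the next training round.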

PostProcess
Although deep networks are powerful at semantic segmentation, automatic segmentation alone cannot reach 100% accuracy, so a tool for manually correcting the segmentation results is necessary. To this end, we developed software called GvCrop, which integrates both the pixel-wise segmentation method (Fig. 1, (6)) and interactive modification of the segmentation results (Fig. 1, (7)). Because pixel-level relabelling of erroneous locations is time consuming, processing image regions with homogeneous characteristics instead of single pixels accelerates manual labelling (Fig. 1, (7)). Accordingly, based on the image's color space and boundary cues, we used the gSLICr algorithm [34] to group pixels into perceptually homogeneous regions; gSLICr is Simple Linear Iterative Clustering (SLIC) [35] implemented on the GPU with the NVIDIA CUDA framework, 83× faster than the CPU implementation of SLIC. gSLICr has three parameters: the superpixel size S, the compactness coefficient C, and the number of iterations N; we set S = 15, C = 0.2, and N = 50. After superpixel segmentation, users can modify the automatic segmentation results superpixel by superpixel.
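For intuition, a toy CPU version of SLIC-style clustering is sketched below. It is not gSLICr: real SLIC restricts each pixel's search to a 2S×2S window around nearby centers, while this global version (our own simplification) compares every pixel to every center, so it is only practical for tiny images.

```python
import numpy as np

def slic_like(image, s=15, compactness=0.2, n_iter=50):
    """Minimal sketch of SLIC-style superpixels (cf. SLIC [35], gSLICr [34]).

    image: (H, W, 3) float RGB in [0, 1].  s: grid step (superpixel size).
    Assigns every pixel to the nearest of a regular grid of cluster centers
    using a combined colour + spatial distance, then updates the centers.
    """
    h, w, _ = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Initial centers on a regular grid of step s.
    cy = np.arange(s // 2, h, s); cx = np.arange(s // 2, w, s)
    centers = np.array([[y, x] for y in cy for x in cx], dtype=float)
    colors = np.array([image[int(y), int(x)] for y, x in centers])
    pos = np.stack([yy, xx], axis=-1).reshape(-1, 2).astype(float)
    rgb = image.reshape(-1, 3)
    for _ in range(n_iter):
        d_col = ((rgb[:, None, :] - colors[None]) ** 2).sum(-1)
        d_pos = ((pos[:, None, :] - centers[None]) ** 2).sum(-1)
        # Spatial term scaled by s^2 and weighted by the compactness C.
        labels = np.argmin(d_col + compactness * d_pos / (s * s), axis=1)
        for k in range(len(centers)):  # recompute cluster centers
            m = labels == k
            if m.any():
                centers[k] = pos[m].mean(0)
                colors[k] = rgb[m].mean(0)
    return labels.reshape(h, w)
```

A larger C yields more compact, grid-like superpixels; a smaller C lets superpixels follow colour boundaries more freely, which is the trade-off behind the C = 0.2 setting above.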

Accuracy assessment
To quantify the performance of our method, we chose Pixel Accuracy (P.A.) (1) and mean IoU (mIoU) (2) as the metrics for evaluating semantic segmentation; both are standard metrics for semantic segmentation tasks [28]:

P.A. = \sum_i p_{ii} / \sum_i \sum_j p_{ij}   (1)

mIoU = (1/n) \sum_i p_{ii} / (\sum_j p_{ij} + \sum_j p_{ji} - p_{ii})   (2)

where n is the number of classes and p_{ij} is the number of pixels of class i predicted to belong to class j, so that p_{ii} counts the true positives of class i, while p_{ij} (j ≠ i) counts its false negatives and p_{ji} (j ≠ i) its false positives.
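These two metrics can be computed directly from a confusion matrix; a minimal sketch (function names are ours) follows.

```python
import numpy as np

def confusion_matrix(pred, truth, n_classes=3):
    """cm[i, j] = number of pixels of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(truth.ravel(), pred.ravel()):
        cm[t, p] += 1
    return cm

def pixel_accuracy(cm):
    """P.A. = trace / total, i.e. the fraction of correctly labeled pixels."""
    return float(np.trace(cm) / cm.sum())

def mean_iou(cm):
    """mIoU: mean over classes of p_ii / (row_sum + col_sum - p_ii)."""
    tp = np.diag(cm)
    union = cm.sum(axis=1) + cm.sum(axis=0) - tp
    return float(np.mean(tp / union))
```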

Calculation of leaf-panicle ratio (LPR)
Based on the extraction and identification of leaf and panicle pixels, the GvCrop software calculates LPR from the numbers of pixels contained in the leaf and panicle regions of an image. The formula is LPR = L / P, where L and P are the total numbers of leaf and panicle pixels in the picture, respectively.
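Given a per-pixel label mask from the segmentation step, the calculation is a simple pixel count. In this sketch the integer class codes are our assumption; the actual encoding used by GvCrop may differ.

```python
import numpy as np

# Class codes assumed by this sketch (hypothetical, not GvCrop's actual codes).
BACKGROUND, PANICLE, LEAF = 0, 1, 2

def leaf_panicle_ratio(label_mask):
    """LPR = L / P: ratio of leaf pixels to panicle pixels in one image."""
    leaf = int(np.sum(label_mask == LEAF))
    panicle = int(np.sum(label_mask == PANICLE))
    if panicle == 0:
        raise ValueError("no panicle pixels detected in this image")
    return leaf / panicle
```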

Results

Accuracy verification
The semantic segmentation results for the 90 field images were assessed both visually (Fig. 5) and quantitatively (Table 3). Figure 5 shows some examples of semantic segmentation results. Visual assessment suggested that the predicted and real segmentations were very similar under different conditions. However, some subtle segmentation errors remained: 1) background and shadow pixels on leaves were visually very similar, so some shadow pixels were misclassified as background; 2) segmentation was somewhat poor at the borders of plant parts, with pixels at the junction between leaf and panicle assigned to the wrong categories; and 3) some small scattered patches on the leaves were misclassified as panicle. As shown in Table 4, which lists the P.A. of each class, accuracy was highest for panicle pixels (mean 0.99), followed by leaf pixels (mean 0.982), and lowest for background pixels (mean 0.814).
The efficiency of training can also be assessed from the training loss. Figure 6 shows that the loss decreased quickly over successive epochs, although it was initially high. Because we iteratively added samples to the training dataset to avoid overfitting (Fig. 1(5)), the curve did not decrease smoothly.

Verification and application of the FPN-Mask model
The most important output of the FPN-Mask model is an estimate of the distribution of light interception between leaf and panicle. Using GvCrop, we calculated the LPR values of the crop stand in various field experiments and detected large spatial and temporal variations as well as genotypic differences.
Overall, these results suggest the feasibility of the model in detecting and quantifying crop performance under field conditions.

(1) Daily changes of LPR
LPR showed an obvious pattern of daily change, being higher after sunrise and before sunset but lower at noon (Fig. 7). The larger values of LPR in the morning or afternoon can be explained by the shading of leaves when the solar angle of incidence is lower.

(2) Genotypic variations in LPR
Large genotypic differences in LPR were detected among the 192 mutants, ranging from 1.32 to 7.44 (Supplemental Table 1). As shown in Fig. 8, the six panicle types showed marked differences in LPR. Generally, cultivars with compact panicles (CP) had the highest values, while those with loose panicles and awns (LPA) had the lowest. The former can be attributed to the high density of spikelets on the panicle, which reduces panicle area; the latter can be explained by the large panicle area resulting from sparse spikelets. Temporal variations of LPR showed a diminishing trend from the early to the late stage of grain filling, meaning that the relative area of leaf was reduced, partly because the panicle area increased as the panicle changed from erect and dense at the early stage to loose and drooping at the late stage.

(3) N effect on LPR
N fertilization mode exerted a substantial influence on LPR. On average, the N topdressing of N5-5 increased LPR by 0.80 and 0.82 at the middle and late stages, respectively (Fig. 9), compared with N10-0.
The promoting effect of N topdressing is associated with the elongation of flag leaf (Fig. 9). Similarly, LPR decreased gradually as grain filling progressed for both N treatments.

(4) Modification of LPR by plant growth regulators
The plant growth regulators brassinolide, brassinazole, gibberellin, and uniconazole obviously reshaped plant architecture (Fig. 10). The effects of these regulators agreed well with their well-documented phenotypes, for example, the drooping flag leaf caused by brassinolide spraying [36,37] and the elongated upper internode caused by gibberellin [38]. More importantly, LPR can be either up-regulated or down-regulated by these regulators, depending on growth stage. As shown in Fig. 10, LPR at the grain filling stage was increased by brassinazole and uniconazole, whereas it was reduced by brassinolide and gibberellin. The degree of increase or decrease depended on the regulator, with uniconazole having the most significant influence.

Weakness of the methodology and improvement
In this study, we built a robust, high-accuracy deep learning network, FPN-Mask, which can segment panicle, leaf and background at the pixel level from a field RGB image. For convenience, we developed the GvCrop software, which not only includes basic image processing functions such as I/O, cropping, rotation, zoom in/out, and translation, but also integrates the automatic semantic segmentation method described above, manual modification of the automatic segmentation results, and export of LPR reports.
The work represents a proof of concept that deep learning can enable accurate organ-level (panicle, leaf) pixel-wise segmentation of field rice images. However, several challenges should be resolved in future work.
First, segmentation accuracy was quite high on these datasets, but for conditions or objects not represented in the training dataset the model will not perform as well; in other words, the robustness of a deep learning model depends partly on the diversity of its training data. In future work we will seek to improve the robustness of FPN-Mask by collecting a wider range of field data. Second, the shadows on leaves and the background exhibit very similar visual patterns that are difficult to distinguish using only the red, green and blue visible bands, and the junctions between different plant parts are also hard to separate. These effects explain most of the loss in precision of the semantic segmentation, and errors of this type occurred in every image in the testing dataset; other researchers have met the same problem [18].
Third, perspective photography can deform objects projected into 2D images, which in turn affects the accuracy of LPR. Recently, light detection and ranging (lidar) has shown its advantages in providing high-resolution three-dimensional (3D) structural information on terrain and vegetation [39][40][41] and in segmenting plant organs [16,42,43]. Shi et al [18] also showed that a multi-view 3D system can avoid such errors. In the future, we will combine the height information provided by lidar with the texture and color information provided by RGB images to distinguish object categories more effectively and accurately.

Significance of LPR for crop breeding and management
To some degree, the essence of crop science is knowledge for the selection (by breeders) or regulation (by agronomists) of agronomical traits. Traditionally, crop scientists depend heavily on visual inspection of crops in the field and evaluate target traits based on their experience and expertise, which is labor intensive, time consuming, relatively subjective, and prone to errors [14,44]. In addition, the target traits are mainly morphological, including leaf senescence, plant height, tillering capacity, panicle or spike size, and growth periods, while fewer physiological traits are monitored and analyzed. With the development of plant phenotyping techniques, image-based methods have been successfully applied to obtain phenotypic data related to crop morphology and physiology [16]. In wheat, high-throughput methods for a large array of traits are available to breeders, including canopy temperature, normalized difference vegetation index (NDVI), and chlorophyll fluorescence [45]. However, the capacity for precision phenotyping of physiological traits still lags far behind the requirements of crop science.
In this study, we propose a new physiological trait, LPR, based on deep learning. Physiologically, LPR indicates the distribution of light interception within the canopy between the source organ leaf and the sink organ panicle. Historically, breeders and agronomists focused on improvement in source activity, with traits of the leaf such as photosynthesis, erectness, and stay-green as the main targets.
On the other hand, the role of the panicle has been largely overlooked, receiving little attention beyond grain number per panicle or panicle erectness [7]. The significance of the panicle is increasingly recognized in terms of its substantial contribution of carbohydrates, nitrogen, and minerals to grain filling. Therefore, light interception by panicles is indispensable for yield formation, and there should be a suitable LPR value for a crop stand growing in a given ecological condition. Further, LPR was sensitive to foliar application of plant growth regulators such as BR and GA, and can be increased by brassinazole and uniconazole, or reduced by brassinolide and gibberellin. Thus it is possible to develop methods for targeted regulation of crop stands with undesirable LPR by chemical intervention. In addition, LPR can be easily measured by a digital camera and even a smartphone camera (data not shown), making it a high-throughput and user-friendly phenotyping trait that can be widely used in crop science. Nevertheless, more work is needed before applying LPR in crop breeding or management, in particular elucidating the inherent link between LPR and yield and proposing a set of suitable LPR values for different environments or plant types.

Conclusion
The work represents a proof of concept that deep learning can achieve high accuracy in simultaneously detecting panicle and leaf data from complex rice field images. The FPN-Mask model is applicable to detecting and quantifying crop performance under field conditions. The proposed trait of LPR should provide a high-throughput protocol for breeders to select superior rice cultivars as well as for agronomists to precisely manage field crops so as to achieve a good balance of source and sink.
However, there are several challenges which should be resolved in future work, in particular combining plant height by lidar with the texture and color information from RGB image to distinguish object categories more effectively and accurately.

Declarations
Ethics approval and consent to participate.
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
All data analyzed during this study are presented in this published article.

Competing interest
The authors declare that they have no conflicts of interest.

Funding
The research was supported by the National Key R&D Program, Ministry of Science and Technology,

Supplementary Files
This is a list of supplementary files associated with this preprint.

Supplement.pdf
Equations.pdf