DeepCob: precise and high-throughput analysis of maize cob geometry using deep learning with an application in genebank phenomics

Kienbaum, Lydia; Correa Abondano, Miguel; Blas, Raul; Schmid, Karl

doi:10.1186/s13007-021-00787-6

Methodology
Open access
Published: 21 August 2021

Plant Methods volume 17, Article number: 91 (2021)

Abstract

Background

Maize cobs are an important component of crop yield that exhibit a high diversity in size, shape and color in native landraces and modern varieties. Various phenotyping approaches were developed to measure maize cob parameters in a high throughput fashion. More recently, deep learning methods like convolutional neural networks (CNNs) became available and were shown to be highly useful for high-throughput plant phenotyping. We aimed at comparing classical image segmentation with deep learning methods for maize cob image segmentation and phenotyping using a large image dataset of native maize landrace diversity from Peru.

Results

Comparison of three image analysis methods showed that a Mask R-CNN trained on a diverse set of maize cob images was highly superior to classical image analysis using the Felzenszwalb-Huttenlocher algorithm and a Window-based CNN due to its robustness to image quality and object segmentation accuracy ($r=0.99$). We integrated Mask R-CNN into a high-throughput pipeline to segment both maize cobs and rulers in images and perform an automated quantitative analysis of eight phenotypic traits, including diameter, length, ellipticity, asymmetry, aspect ratio and average values of red, green and blue color channels for cob color. Statistical analysis identified key training parameters for efficient iterative model updating. We also show that a small number of 10–20 images is sufficient to update the initial Mask R-CNN model to process new types of cob images. To demonstrate an application of the pipeline we analyzed phenotypic variation in 19,867 maize cobs extracted from 3449 images of 2484 accessions from the maize genebank of Peru to identify phenotypically homogeneous and heterogeneous genebank accessions using multivariate clustering.

Conclusions

A single Mask R-CNN model and the associated analysis pipeline are widely applicable tools for maize cob phenotyping in contexts like genebank phenomics or plant breeding.

Background

High-throughput precision phenotyping of plant traits is rapidly becoming an integral part of plant research, plant breeding, and crop production [4]. This development complements the rapid advances in genomic methods that, when combined with phenotyping, enable rapid, accurate, and efficient analysis of plant traits and the interaction of plants with their environment [65]. However, for many traits of interest, plant phenotyping is still labor intensive or technically challenging. Such a bottleneck in phenotyping [17] limits progress in understanding the relationship between genotype and phenotype, which is a problem for plant breeding [24]. The phenotyping bottleneck is being addressed by phenomics platforms that integrate high-throughput automated phenotyping with analysis software to obtain accurate measurements of phenotypic traits [28, 46]. Existing phenomics platforms cover multiple spatial and temporal scales and incorporate technologies such as RGB image analysis, near-infrared spectroscopy (NIRS), or NMR spectroscopy [31, 32, 60]. The rapid and large-scale generation of diverse phenotypic data requires automated analysis to convert the output of phenotyping platforms into meaningful information such as measures of biological quantities [11, 22]. Thus, high-throughput pipelines with accurate computational analysis will realize the potential of plant phenomics by overcoming the phenotyping bottleneck.

A widely used method for plant phenotyping is image segmentation and shape analysis using geometric morphometrics [70]. Images are captured in standardized environments and then analyzed either manually or automatically using image annotation methods to segment images and label objects. The key challenge in automated image analysis is the detection and segmentation of relevant objects. Traditionally, object detection in computer vision (CV) has been performed using multivariate algorithms that detect, for example, edges. Most existing pipelines using classical image analysis in plant phenotyping are species-dependent and assume homogeneous plant material and standardized images [40, 45, 68]. Another disadvantage of classical image analysis methods is low accuracy and specificity when image quality is low or background noise is present. Therefore, the optimal parameters for image segmentation often need to be fine-tuned manually through experimentation. In recent years, machine learning approaches have revolutionized many areas of CV such as object recognition [37] and are superior to classical CV methods in many applications [48]. The success of machine learning in image analysis can be attributed to the evolution of neural networks from simple architectures to advanced feature-extracting convolutional neural networks (CNNs) [64]. The complexity of CNNs could be exploited because deep learning algorithms offered new and improved training approaches for these more complex networks. Another advantage of machine learning methods is their robustness to variable image backgrounds and image qualities when model training is based on a sufficiently diverse set of training images. Through their capability to learn from small training datasets, these deep learning techniques have a huge potential to carry out few-shot learning in agriculture, thereby saving work effort and costs in generating large real-world training datasets [5, 67]. Although CNNs have been very successful in general image classification and segmentation, their application in plant phenotyping is still limited to a few species and features. Current applications include plant pathogen detection, organ and feature quantification, and phenological analysis [16, 31, 62].

Maize cobs can be described with few geometric shape and color parameters. Since the size and shape of maize cobs are important yield components with a high heritability and are correlated with total yield [43, 53], they are potentially useful traits for selection in breeding programs. High throughput phenotyping approaches are also useful for characterizing native diversity of crop plants to facilitate their conservation or utilize them as genetic resources [41, 47]. Maize is an excellent example to demonstrate the usefulness of high throughput phenotyping because of its high genetic and phenotypic diversity, which has arisen since its domestication in South-Central Mexico about 9,000 years ago [27, 34, 42]. A high environmental variation within its cultivation range in combination with artificial selection by humans resulted in many phenotypically divergent landraces [8, 69]. Since maize is one of the most important crops worldwide, large collections of its native diversity were established in ex situ genebanks, whose genetic and phenotypic diversity are now being characterized [56]. This unique pool of genetic and phenotypic variation is threatened by genetic erosion [23, 49, 50, 51], and understanding its role in environmental and agronomic adaptation is essential to identify valuable genetic resources and develop targeted conservation strategies.

In the context of native maize diversity we demonstrate the usefulness of a CNN-based deep learning model implemented in a robust and widely applicable analysis pipeline for recognition, semantic labeling and automated measurement of maize cobs in RGB images for large scale plant phenotyping. Highly variable traits like cob length, kernel color and number were used for classification of the native maize diversity of Peru [52] and are useful for the characterization of maize genetic resources because cobs are easily stored and field collections can be analyzed at a later time point. We demonstrate the application of image segmentation to photographs of native maize diversity in Peru. So far, cob traits have been studied only for small sets of Peruvian landraces, such as cob diameter in 96 accessions of 12 Peruvian maize landraces [2], or cob diameter in 59 accessions of 9 highland landraces [49, 50]. Here we use automated image analysis to obtain cob parameters from 2,484 accessions of the Peruvian maize genebank hosted at Universidad Nacional Agraria La Molina (UNALM). We also show that the DeepCob image analysis pipeline can be easily expanded to different image types of maize cobs such as segregating populations resulting from genetic crosses.

Results

Comparison of image segmentation methods

To address large-scale segmentation of maize cobs, we compared three different image analysis methods for their specificity and accuracy in detecting and segmenting both maize cobs and measurement rulers in RGB images. Correlations between true and derived values for cob length and diameter show that Mask R-CNN far outperformed the classical Felzenszwalb-Huttenlocher image segmentation algorithm and a window-based CNN (Window-CNN) (Fig. 1). For two sets of old (ImgOld) and new (ImgNew) maize cob images (see Materials and methods), Mask R-CNN achieved correlations of 0.99 and 1.00, respectively, while correlation coefficients ranged from 0.14 to 0.93 with Felzenszwalb-Huttenlocher segmentation and from 0.03 to 0.42 with Window-CNN. Since Mask R-CNN was strongly superior in accuracy to the other two segmentation methods, we restricted all further analyses to this method.
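The agreement between true and image-derived measurements can be quantified as sketched below, assuming that the reported coefficients are Pearson correlations (which the symbol $r$ suggests but this section does not state explicitly). The function name `pearson_r` and the example cob lengths are hypothetical and not taken from the study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical cob lengths in cm: manually measured vs. derived from masks.
true_len = [12.0, 15.5, 9.8, 18.2, 14.1]
pred_len = [12.1, 15.3, 9.9, 18.0, 14.3]
r = pearson_r(true_len, pred_len)
```

A coefficient close to 1 only indicates a strong linear relationship; a systematic offset between true and derived values would have to be checked separately.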

Parameter optimization of Mask R-CNN

We first describe parameter optimizations during training of the Mask R-CNN model based on the old (ImgOld) and new (ImgNew) maize cob image data from the Peruvian maize genebank. A total of 90 models were trained, differing by the parameters learning rate, total epochs, epochs.m, mask loss weight, monitor, minimask (see Materials and methods), using a small (200) and a large (1,000) set of randomly selected images as training data. The accuracy of Mask R-CNN detection depends strongly on model parameters, as $AP@[0.5:0.95]$ values for all models ranged from 5.57 to 86.74 for 200 images and from 10.49 to 84.31 for 1,000 images for model training (Additional file 1: Table S1). Among all 90 models, M104 was the best model for maize cob and ruler segmentation with a score of 86.74, followed by models M101, M107, and M124 with scores of 86.56. All four models were trained with the small image dataset.
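The $AP@[0.5:0.95]$ score averages the average precision (AP) over ten intersection-over-union (IoU) thresholds from 0.50 to 0.95 in steps of 0.05. The following is a minimal sketch under simplifying assumptions: one object class, one image, masks represented as sets of pixel coordinates, and all-point interpolation of the precision-recall curve rather than the 101-point interpolation of the COCO evaluation tools. All function names are hypothetical, and the result lies between 0 and 1, whereas the scores in this section are reported on a scale of 0 to 100.

```python
def mask_iou(a, b):
    """IoU of two masks given as sets of (row, col) pixel coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def average_precision(preds, truths, iou_thr):
    """AP for one class at one IoU threshold.

    preds: list of (confidence, mask); truths: list of ground-truth masks.
    """
    if not truths:
        return 0.0
    matched, hits = set(), []
    # Visit predictions from most to least confident.
    for _, mask in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(truths):
            if j not in matched:
                iou = mask_iou(mask, gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_thr:
            matched.add(best_j)
            hits.append(1)  # true positive
        else:
            hits.append(0)  # false positive
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / len(truths))
    # Make precision non-increasing with recall (interpolation step).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, rec in zip(precisions, recalls):
        ap += p * (rec - prev_recall)
        prev_recall = rec
    return ap

def ap_50_95(preds, truths):
    """Mean AP over the IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [(50 + 5 * i) / 100 for i in range(10)]
    return sum(average_precision(preds, truths, t) for t in thresholds) / 10

def rect(height, width):
    """Helper for the example: a filled rectangular mask at the origin."""
    return {(r, c) for r in range(height) for c in range(width)}
```

For example, a single predicted mask that overlaps its ground-truth mask with an IoU of 0.8 counts as a detection at seven of the ten thresholds, so `ap_50_95` returns 0.7, which illustrates why high scores require precisely fitting masks.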

Given the high variation of the scores, we evaluated the contribution of each training parameter to this variation with an ANOVA (Table 1). There is an interaction effect between the size of the training set and the total number of epochs trained, as well as an effect of a minimask, which is often used as a resizing step of the object mask before fitting it to the deep learning model. The other training parameters learning rate, monitoring, epochs.m (mode to train only heads or all layers), and mask loss weight had no effect on the $AP@[0.5:0.95]$ value. The lsmeans show that training without minimask leads to higher scores and more accurate object detection. Table 1 also shows the interaction between the size of the training set and the total number of epochs. Model training with 200 images over 200 epochs was not significantly different from training over 50 epochs or from model training with 1,000 images over 200 epochs at $p<0.05$. With the same number of training epochs, we did not observe an advantage of 1,000 over 200 training images. In contrast, model training over only 15 epochs resulted in lower $AP@[0.5:0.95]$ values.

Table 1 Lsmeans of AP@[.5:.95] in the ANOVA analysis for Mask R-CNN model parameters minimask and the interaction of training set size x total number of epochs

Loss behavior of Mask R-CNN during model training

Monitoring loss functions of model components (classes, masks, boxes) during model training identifies components that need further adjustments to achieve full optimization. Compared to the other components, mask loss contributed the highest proportion to all losses (Fig. 2), which indicates that the most challenging process in model training and optimization is segmentation by creating masks for cobs and rulers. The training run with the parameter combination generating the best model M104 in epoch 95 shows a decreasing training and validation loss during the first 100 epochs and a tendency for overfitting in additional epochs (Fig. 2A, B). This suggests that model training on the Peruvian maize images over 100 epochs is sufficient. Other parameter combinations like M109 (Fig. 2C) exhibit overfitting with a tenfold higher validation loss than M104. In this case the model memorizes the training data instead of learning patterns, which increases the validation loss and results in weak predictions for object detection and image segmentation.
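The reasoning about overfitting can be illustrated with a small sketch that selects the epoch with the lowest validation loss and reports how far training continued beyond it. This is not the monitoring code of the pipeline; the function names and the loss values are hypothetical.

```python
def best_epoch(val_losses):
    """1-based index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i]) + 1

def epochs_past_best(val_losses):
    """Number of epochs trained after the validation minimum was reached."""
    return len(val_losses) - best_epoch(val_losses)

def generalization_gap(train_losses, val_losses):
    """Validation minus training loss per epoch; a growing gap hints at overfitting."""
    return [v - t for t, v in zip(train_losses, val_losses)]

# Hypothetical loss curves: training loss keeps falling while
# validation loss rises again after the fourth epoch.
train = [1.20, 0.80, 0.55, 0.40, 0.30, 0.22]
val = [1.25, 0.90, 0.70, 0.65, 0.72, 0.85]

chosen = best_epoch(val)
gap = generalization_gap(train, val)
```

In this toy example the weights saved at the fourth epoch would be kept, and the widening gap in later epochs corresponds to the memorization of training data described above.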

Visualization of feature maps generated by Mask R-CNN

Although neural networks are considered a "black box" method, a feature map visualization of selected layers shows interpretable features of trained networks. In a feature map, high activations correspond to high feature recognition activity in that area, as shown in Fig. 3A for the best model M104. Over several successive CNN layers, the cob shape is increasingly well detected until, in the last layer (res4a), the feature map indicates a robust distinction between the foreground with the cob and ruler objects and the background. High activations occur at the top of the cobs (Fig. 3A, res4g layer), which may contribute to localization. Because the cobs were oriented according to their lower (apical) end in the images, it may be more difficult for the model to detect the upper edges, which are variable in height. Overall, the feature maps show that the network learned specific features of the maize cob and the image background.

The Mask R-CNN detection process can be visualized by its main steps, which we demonstrate using the best model (Fig. 3B). The top 50 anchors are output by the Region Proposal Network (RPN) and the anchored boxes are then further refined. In the early stages of refinement, all boxes already contain a cob or ruler, but boxes containing the same image element have different lengths and widths. In later stages, the boxes are further reduced in size and refined around the cobs and rulers until, in the final stage, mask recognition provides accurate-fitting masks, bounding boxes, and class labels around each recognized cob and ruler.

The best Mask R-CNN model for detection and segmentation of both maize cobs and rulers is very robust to image quality and variation. This robustness is evident from a representative subset of ImgOld and ImgNew images that we did not use for training and that show a high variation in image quality, backgrounds and diversity of maize cobs (Fig. 4). Both the identification of bounding boxes and object segmentation are highly accurate regardless of image variability. The only inaccuracies in the location of bounding boxes or masks occur at the bottom edge of cobs.

Maize model updating on additional image datasets

To extend the use of our model to images of maize cobs taken under different circumstances and in different environments (e.g., in the field), we investigated whether updating our maize model for new image types with additional image data included in the ImgCross and ImgDiv data sufficiently improves the segmentation accuracy of cob and ruler elements compared to a full training process starting again with the standard COCO model. We used the best maize model trained on ImgOld and ImgNew data (model M104, hereafter maize model), which is pre-trained only on the cob and ruler classes. In addition to updating our maize model, we updated the COCO model with the same images. In this context, the COCO model serves as a validation, as it is a standard Mask R-CNN model trained on the COCO image data [38], which contains 80 annotated object classes in 330K images.

Overall, model updating using training images significantly improved the $AP@[0.5:0.95]$ scores of the additional image datasets (Fig. 5), with scores differing between image sets, initial models, and training set sizes. With standard COCO model weights (Fig. 5a, c), $AP@[0.5:0.95]$ scores were initially low, down to a value of 0, at which neither cobs nor rulers were detected. However, scores increased rapidly up to 0.7 during the first 30 epochs. In contrast, with the pre-trained weights of the maize model (Fig. 5b, d), $AP@[0.5:0.95]$ scores were already high during the first epochs and then rapidly improved to higher values than with the COCO model. Therefore, object segmentation using additional maize cob image data was significantly better with the pre-trained maize model from the beginning and throughout the model update.

Given the high variation in these scores, we determined the contribution of the three factors starting model, training set size and training dataset to the observed variation in $AP@[0.5:0.95]$ scores with an ANOVA. In this analysis, the interactions between dataset and starting model were significant. According to the lsmeans of these significant interactions (Table 2), updating the pre-trained maize model gave better results than updating the COCO model in both datasets. With respect to training set sizes, $AP@[0.5:0.95]$ scores of the maize model were essentially the same for different sizes and were always higher than those of the COCO model. In summary, there is a clear advantage in updating a pre-trained maize model over the COCO model for cob segmentation with diverse maize cob image sets.

Table 2 Lsmeans of AP@[.5:.95] score of the significant interactions for model updating, dataset x starting model and starting model x training set size

Descriptive data obtained from cob image segmentation

To demonstrate that the Mask R-CNN model is suitable for large-scale and accurate image analysis, we present the results of a descriptive analysis of 19,867 maize cobs that were identified and extracted from the complete set of images from the Peruvian maize genebank, i.e., the ImgOld and ImgNew data. Here, we focus on the question whether image analysis identifies genebank accessions which are highly heterogeneous with respect to cob traits by using measures of trait variation and multivariate clustering algorithms.

Our goal was to identify heterogeneous genebank accessions that either harbor a high level of genetic variation or are admixed because of co-cultivation of different landraces on farmers' fields or mix-ups during genebank storage. We therefore analyzed variation of cob parameters within images to identify genebank accessions with a high phenotypic diversity of cobs using two different multivariate analysis methods to test the robustness of the classification.

The first approach consisted of calculating a $Z$-score of each cob in an image as a measure of deviation from the mean of the image (within-image $Z$-scores), applying a PCA to these scores, followed by clustering with CLARA and determining the optimal number of clusters with the average silhouette method. The second approach consisted of calculating a centered and scaled standard deviation of cob parameters for each image, applying a PCA to the values of all images, clustering with $k$-means and determining the optimal cluster number with the gap statistic. With both approaches, the best-fitting number of clusters was $k=2$ with a clear separation between clusters and little overlap along the first principal component (Fig. 7). The distribution of trait values between the two groups shows that they differ mainly by the three RGB colors and cob length (in the $Z$-score analysis only), suggesting that cob color tends to be more variable than most morphological traits within genebank accessions. Additional file 1: Figure S1 shows images of genebank accessions classified as homogeneous and variable, respectively.
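The two variation measures that enter the PCA can be sketched as follows for a single trait. The sketch assumes the sample standard deviation (denominator $n-1$), which is not specified in this section, and it requires at least two cobs per image. The function names and the cob lengths are hypothetical; the PCA and clustering steps are not shown.

```python
from statistics import mean, stdev

def within_image_z(values):
    """Z-score of each cob relative to the mean and SD of its own image."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return [0.0] * len(values)  # all cobs identical for this trait
    return [(v - m) / s for v in values]

def scaled_image_sd(images):
    """Centered and scaled per-image standard deviation of one trait.

    images: one list of trait values per image (at least two cobs each).
    """
    sds = [stdev(values) for values in images]
    m, s = mean(sds), stdev(sds)
    return [(sd - m) / s for sd in sds]

# Hypothetical cob lengths (cm) for three images.
lengths = [
    [10.0, 12.0, 14.0],  # moderately variable
    [10.0, 10.5, 11.0],  # homogeneous
    [8.0, 12.0, 16.0],   # heterogeneous
]
z_first_image = within_image_z(lengths[0])
scaled_sd = scaled_image_sd(lengths)
```

In this toy example the third image receives the highest scaled standard deviation and the second the lowest, mirroring the distinction between heterogeneous and homogeneous accessions.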

Discussion

Our comparison of three image segmentation methods showed Mask R-CNN to be superior to the classic image analysis method Felzenszwalb-Huttenlocher segmentation and Window-CNN for maize cob detection and segmentation. Given the recent success of Mask R-CNN for image segmentation in medicine or robotics, its application for plant phenotyping is highly promising as demonstrated in strawberry fruit detection for harvesting robots [72], orange fruit detection [18], pomegranate tree detection [73], disease monitoring in wheat [59], and seed analysis in rice and soybean [30, 71]. Here we present another application of Mask R-CNN for maize cob instance segmentation and quantitative phenotyping in the context of genebank phenomics. In contrast to previous studies we performed a statistical analysis on the relative contribution of Mask R-CNN training parameters, and our application is based on more diverse and larger training image sets of 200 and 1,000 images. Finally, we propose a simple and rapid model updating scheme for applying the method on different maize cob image sets to make this method widely useful for cob phenotyping. The provided manuals offer a simple application and update of the deep learning model on custom maize cob datasets.

Advantages and limitations of the method for few-shot learning in agriculture

After optimizing various model parameters, the final Mask R-CNN model detected and segmented cobs and rulers very reliably with a very high $AP@\left[.5:.95\right]$ score of 87.7, enabling accurate and fast extraction of cob features. Such scores have not been reported for existing pipelines for maize cob annotation because they are mainly used in deep learning. We therefore compared our score to other contexts of image analysis and plant phenotyping where these scores are available. Our score is higher than that of the original Mask R-CNN implementation on COCO with Cityscapes images [55], possibly due to a much smaller number of classes (2 versus 80) in our dataset. Depending on the backend network, the score of the original implementation ranged between 26.6 and 37.1. The maize cob score is also greater than the score of 57.5 in the test set for pomegranate tree detection [73] and comparable to a score of 89.85 for strawberry fruit detection [72]. Compared to such Mask R-CNN implementations on other crops, our method reached similar or even higher accuracy while requiring substantially fewer images. Only a small dataset of 200 images was required for the initial training, and only a few images (10–20) are needed for model updating on a custom image set. Thereby, this method has the potential to contribute to few-shot learning in agriculture if applied to other crops or plant phenotypes. By releasing relevant Mask R-CNN parameters for fine-tuning the model to a specific crop like maize cobs, this work facilitates the development of standard Mask R-CNN models for different crops or plant phenotypes. A unique Mask R-CNN model covering many crops and plant phenotypes is unrealistic in the short term due to the very different plant features and the unavailability of large annotated image datasets. However, such a model could be created in an open source project with a large and diverse set of annotated crop images and extensive model training, similar to the Model Zoo project (https://modelzoo.co/). Although both maize cob and ruler detection and segmentation performed well, we observed minor inaccuracies in some masks. A larger training set did not improve precision or eliminate these inaccuracies, possibly because the resolution of the mask branch in the Mask R-CNN framework is too low. This resolution could be increased by adding a convolutional layer of, for example, 56 $\times$ 56 pixels instead of the usual 28 $\times$ 28 pixels at the cost of longer computing time.
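The effect of mask resolution can be made concrete with a short calculation. In Mask R-CNN, the mask branch predicts a fixed-size square mask per object, which is then resized to the bounding box of that object, so each mask cell covers several image pixels for large objects. The sketch below assumes a hypothetical cob bounding box of 420 $\times$ 140 pixels; the function name is illustrative.

```python
def mask_cell_size_px(box_height_px, box_width_px, mask_side=28):
    """Image pixels covered by one mask cell along each axis when a
    square mask of side `mask_side` is stretched over a bounding box."""
    return box_height_px / mask_side, box_width_px / mask_side

# Hypothetical cob bounding box of 420 x 140 pixels.
coarse = mask_cell_size_px(420, 140, mask_side=28)
fine = mask_cell_size_px(420, 140, mask_side=56)
```

Under this assumption, doubling the mask side from 28 to 56 halves the size of a mask cell from 15 to 7.5 pixels along the cob axis, which would reduce the quantization error at the cob edges.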

Mask R-CNN achieved higher correlation coefficients between true and predicted cob measurements than existing image analysis methods, which reported coefficients of $r=0.99$ for cob length, $r=0.97$ for cob diameter [40] and $r=0.93$ for cob diameter [45]. Our Mask R-CNN achieved coefficients of $r=0.99$ for cob diameter and $r=1$ for cob length. Such correlations are a remarkable improvement considering that they were obtained with the highly diverse and inhomogeneous ImgOld and ImgNew image data (Fig. 8 and Additional file 1: Table S4), whereas previous studies used more homogeneous images with respect to color and shape of elite maize hybrid breeding material taken with uniform backgrounds. The high accuracy of Mask R-CNN indicates the advantage of learning specific cob and ruler patterns with deep learning.

Another feature of our automated pipeline is the simultaneous segmentation of cob and ruler, which allows pixel measurements to be instantly converted to centimeters and morphological measurements to be returned. Such an approach was also used by Makanza et al. [40], but no details on ruler measurements or accuracy of ruler detection were provided. The ability to detect rulers and cobs simultaneously is advantageous in a context where professional imaging equipment is not available, such as agricultural fields.
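The conversion from pixels to centimeters via the ruler can be sketched as follows. The sketch assumes that the physical length covered by the ruler mask is known and that cobs are aligned with the image axes; the function names, the toy mask and the numbers are hypothetical and do not reproduce the post-processing code of the pipeline.

```python
def extent_px(mask):
    """(height, width) in pixels of the bounding box of a binary mask,
    given as a list of rows containing 0 or 1."""
    if not mask or not any(any(row) for row in mask):
        raise ValueError("empty mask")
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    return rows[-1] - rows[0] + 1, cols[-1] - cols[0] + 1

def px_to_cm(length_px, ruler_px, ruler_cm):
    """Convert a pixel length to cm using a ruler of known physical length."""
    px_per_cm = ruler_px / ruler_cm
    return length_px / px_per_cm

# Toy cob mask: 4 pixels long and 2 pixels wide.
cob_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
length_px, diameter_px = extent_px(cob_mask)

# Hypothetical scale: a 30 cm ruler spanning 600 pixels (20 px per cm).
cob_length_cm = px_to_cm(300, ruler_px=600, ruler_cm=30.0)
```

Because the scale is derived per image, differences in camera distance or scan resolution between images do not affect the resulting measurements.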

Selection of training parameters to reduce annotation and training workload

Our Mask R-CNN workflow consists of annotating the data, training or updating the model, and running the pipeline to automatically extract features from the maize cobs. The most time-consuming and resource-intensive step was the manual annotation of cob images to provide labeled images for training, which took several minutes per image, but can be accelerated by supporting software [12]. In the model training step, model weights are learned automatically from the annotated images, which is a major advantage over existing maize cob detection pipelines that require manual fine-tuning of parameters for different image datasets using operations such as thresholding, filtering, water-shedding, edge detection, corner detection, blurring and binarization [40, 45, 68].

Statistical analysis of the Mask R-CNN training parameters helps to reduce the amount of annotation and fine-tuning required (Tables 1 and 2). For example, there was no significant improvement with a large training set of 1,000 images compared to 200 images, as learning and segmenting only two object classes seems to be a simple task for Mask R-CNN. Therefore, we do not expect further model improvement with a set of more than 1,000 images, and the significant amount of work involved in manual image annotation can be reduced if no more than 200 images need to be annotated. Since many training parameters did not have a strong impact on the final model result, such parameters do not need to be fine-tuned. For example, training all layers instead of only the network heads (the last part of the network involving the fully-connected layers) did not significantly improve the final detection result. Training only the network heads on image datasets with only a few object classes greatly reduces the runtime for model training.

Technical equipment and computational resources for deep learning

The robustness of the Mask R-CNN approach imposes only simple requirements for creating images for both training and application purposes. RGB images taken with a standard camera are sufficient. In contrast, neural network training requires significant computational resources and is best performed on a high performance computing cluster or on GPUs with significant amounts of RAM. Training of the 90 different models (Additional file 1: Table S6) was executed over 3 days, using 4 parallel GPUs on a dedicated GPU cluster. However, once the maize model is trained, model updating with only a few annotated images from new maize image data does not require a high performance computing infrastructure anymore, as in our case updating with 20 images was achieved in less than an hour on a normal workstation with 16 CPU threads and 64 GB RAM.

Model updating with the pre-trained maize model on two different image datasets ImgCross and ImgDiv significantly improved the $AP@\left[.5:.95\right]$ score for cob and ruler segmentation on the new images. The improvement was achieved despite additional features in the new image data that were absent from the training data. New features include rotated images, cobs in different orientation (horizontal instead of vertical) and different backgrounds (Fig. 6). The advantage of a pre-trained maize model over the standard COCO model was independent of the image dataset, and the maize model achieved higher $AP@\left[.5:.95\right]$ scores within a small number of epochs (Fig. 5). It therefore saves training time for new image types, is widely applicable, and can be easily transferred to new applications for maize cob phenotyping. Importantly, the initial training set is not required for model updating. Our analyses indicate that only 10–20 annotated new images are required and the update can be limited to 50 epochs. The updated model can then be tested on the new image dataset, either by visual inspection of the detection or by annotating some validation images to obtain a rough estimate of the $AP@\left[.5:.95\right]$ score. The phenotypic traits can then be extracted by the included post-processing workflow, which itself only needs to be modified if additional parameters are to be implemented.

The runtime of the pipeline after model training is very short. Image segmentation with the trained Mask R-CNN model and parameter estimation of eight cob traits took an average of 3.6 s per image containing an average of six cobs. This time is shorter than that of previously published pipelines (e.g., 13 s per image in [45]), although it should be noted that such comparisons are not based on the same hardware and the same set of traits. For example, the pipeline for three-dimensional cob phenotyping performs a flat projection of the surface of the entire cob, but is additionally capable of annotating individual cob kernels, and the total time for analyzing a single cob is 5–10 min [68]. The ear digital imaging (EDI) pipeline of Makanza et al. [40] processes more than 30 unthreshed ears at the same time and requires more time per image at 10 s, but also extracts more traits. However, this pipeline was developed on uniform and standardized images and does not involve a deep learning approach to make it generally applicable.
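As a rough plausibility check of these throughput figures, the following sketch combines the reported 3.6 s per image with the 3,449 images and 19,867 cobs of the genebank dataset. It assumes sequential processing and that the reported average holds across the whole image set; the variable names are illustrative.

```python
SECONDS_PER_IMAGE = 3.6  # reported average per image
N_IMAGES = 3449          # ImgOld plus ImgNew
N_COBS = 19867           # cobs extracted from these images

total_hours = N_IMAGES * SECONDS_PER_IMAGE / 3600
cobs_per_image = N_COBS / N_IMAGES
seconds_per_cob = SECONDS_PER_IMAGE / cobs_per_image

print(f"total runtime: {total_hours:.2f} h")
print(f"cobs per image: {cobs_per_image:.2f}")
print(f"time per cob: {seconds_per_cob:.2f} s")
```

Under these assumptions the complete image set is processed in roughly three and a half hours, and the ratio of cobs to images (about 5.8) is consistent with the stated average of six cobs per image.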

Application of the Mask R-CNN pipeline for genebank phenomics

To demonstrate the utility of our pipeline, we applied it to original images of maize cobs collected from farmers’ fields during the establishment of the official maize genebank in Peru in the 1960s and 1970s (ImgOld) and to more recent photographs taken during the regeneration of existing maize material in 2015 (ImgNew). The native maize diversity of Peru was divided into individual landraces based mainly on cob traits. Our interest was to identify genebank accessions with high or low diversity of cob traits within accessions to classify accessions as ’pure’ representatives of a landrace or as accessions with high levels of native genetic diversity, evidence of recent gene flow, or random admixture of different landraces. We used two different approaches to characterize the amount of variation for each trait within the accessions based on the eight traits measured by our pipeline. Unsupervised clustering of variance measures identified two groups of accessions that differed in their overall level of variation. The distributions of normalized variance parameters (Z-scores and standard deviations) within both groups indicate that variation in cob color has the strongest effect on variation within genebank accessions, suggesting that cob color is more variable than morphometric characters like cob length or cob diameter. This information is useful for subsequent studies of the relationship between genetic and phenotypic variation in native maize diversity, the geographic patterns of phenotypic variation within landraces, or the effect of seed regeneration during ex situ conservation on phenotypic diversity, which we are currently investigating in a separate study.

Conclusion

We present the successful application of deep learning by Mask R-CNN to maize cob segmentation in the context of genebank phenomics by developing a pipeline written in Python for large-scale image analysis of highly diverse maize cobs. We also developed a post-processing workflow to automatically extract measurements of eight phenotypic cob traits from cob and ruler masks obtained with Mask R-CNN. In this way, cob parameters were extracted from 19,867 individual cobs with a fast automated pipeline suitable for high-throughput phenotyping. Although the Mask R-CNN model was developed based on the native maize diversity of Peru, the model can be easily used and updated for additional image types in contexts like the genetic mapping of cob traits or in breeding programs. It is therefore of general applicability in maize breeding and research, and for this purpose we provide simple manuals for maize cob detection, parameter extraction and deep learning model updating. Future developments of the pipeline may include linking it to mobile phenotyping devices for real-time measurements in the field and using the large number of segmented images to develop refined models for deep learning, for example, to estimate additional parameters such as row numbers or characteristics of individual cob kernels.

Materials and methods

Plant material

The plant material used in this study is based on 2,484 genebank accessions of 24 Peruvian maize landraces collected from farmers’ fields in the 1960s and 1970s, which are stored in the Peruvian maize genebank hosted at the Universidad Agraria La Molina (UNALM), Peru. These accessions originate from the three different ecogeographical environments (coast, highland and rainforest) present in Peru and therefore represent a broad sample of Peruvian maize diversity.

Image data of maize cobs

All accessions were photographed during their genebank registration. An image was taken with a set of 1–12 maize cobs per accession laid out side by side with a ruler and accession information. Because the accessions were collected over several years, the images were not taken under the same standardized conditions of background, rulers and image quality. Prints of these photographs were stored in light-protected cupboards of the genebank and were digitized with a flatbed scanner in 2015 and stored as PNG files without further image processing. In addition, all genebank accessions were regenerated in 2015 at three different locations reflecting their ecogeographic origin, and the cobs were photographed again with modern digital equipment under standardized conditions and also stored as PNG images. The image data thus consist of 1,830 original (ImgOld) and 1,619 new (ImgNew) images for a total of 3,449 images. Overall, the images show a high level of variation due to technical and genetic reasons, which are outlined in Fig. 8. These datasets were used for training and evaluation of the image segmentation methods. Passport information available for each accession and their assignment to the different landraces is provided in Additional file 1: Table S5. All images were re-scaled to a size of 1000 × 666 pixels with OpenCV, version 3.4.2 [7].

We used two additional datasets for updating the image segmentation models and evaluating their robustness. The ImgCross image dataset contains images of maize cobs and spindles derived from a cross of Peruvian landraces with a synthetic population generated from European elite breeding material and therefore reflects genetic segregation in the F2 generation. The images were taken with a digital camera at the University of Hohenheim under standardized conditions and differ from the other datasets by a uniform green background, a higher resolution of 3888 × 2592 pixels (no re-sizing), a variable orientation of the cobs, orange labels and differently colored squares instead of a ruler.

A fourth set of images (ImgDiv) was obtained mainly from publicly available South American maize genebank catalogs and from special collections available as downloadable figures on the internet. The ImgDiv data vary widely in terms of number and color of maize cobs, image dimensions and resolution, number, position and orientation of cobs. Some images also contain rulers as in ImgOld and ImgNew.

Software and methods for image analysis

Image analysis was mainly performed on a workstation running Ubuntu 18.04 LTS and the analysis code was written in Python (version 3.7; [63]) for all image operations. OpenCV (version 3.4.2 [7]) was used to perform basic image operations like resizing and contour finding.

For Window-CNN and Mask R-CNN, deep learning was performed with the TensorFlow (version 1.5.0; [1]) and Keras (version 2.2.4; [10]) libraries. For Mask R-CNN, the framework [25] from the Matterport implementation (https://github.com/matterport/Mask_RCNN) was used and adapted to the requirements of the maize cob image datasets. Statistical analyses for evaluating the contribution of different parameters in Mask R-CNN and for the clustering of the obtained cob traits were carried out with R version 3.6.3 [54].

Due to the lack of previous studies on cob image analysis in maize genetic resources, we tested three very different approaches (Felzenszwalb-Huttenlocher segmentation, Window-CNN and Mask R-CNN) for cob and ruler detection and image segmentation. Details on their implementation and comparison can be found in Additional file 2: Text, but our approach is briefly described below. For image analysis using traditional approaches, we first applied various tools such as filtering, watershed segmentation, edge detection and corner detection to representative subsets of ImgOld and ImgNew. These algorithms can be tested quickly and easily on image subsets; however, they are usually not robust to changes in image properties (i.e. color, brightness, contrast, object size) and require manual fine-tuning of parameters. With our image dataset, the best segmentation results were obtained with the graph-based Felzenszwalb-Huttenlocher image segmentation algorithm [15] implemented in the Python scikit-image library version 0.16.2 [66], and the best ruler detection with the naive Bayes classifier implemented in the PlantCV library [19]. The parameters had to be manually fine-tuned for each of the two image datasets.

To evaluate deep learning, we used a window-based CNN (Window-CNN) and a Mask R-CNN, both of which require training on annotated and labeled image data. Convolutional neural networks (CNNs) [36] are powerful feature extractors, and their popularity for image classification dates back to the ImageNet classification challenge, which was won by the AlexNet architecture [35]. Generally, a CNN consists of three different layer types that are connected in sequence: convolutional layers, pooling layers and fully connected (FC) layers. In a CNN for cob detection, the classes ‘cob’ and ‘ruler’ can be learned as features, which provides maize cob feature extraction that is robust to the challenges of diverse images such as scale, cob color, cob shape, background color and contrast.

Since our goal was to localize and segment the cobs within the image, we first used a sliding window CNN (Window-CNN), which passes one part of an image at a time to a CNN and returns the probability that it contains a particular object class. Sliding windows have been used in plant phenotyping to detect plant segments [3, 9]. The main advantage of this method is the ability to customize the CNN structure to optimize automatic learning of object features. Our implementation of Window-CNN is described in detail in Additional file 2: Text.
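As a minimal illustration of the sliding-window idea (not the Window-CNN implementation described in Additional file 2), the following Python sketch enumerates the image regions that would be passed to a classifier one at a time. The function name, window size and stride are assumptions chosen only for illustration.

```python
def sliding_windows(width, height, window, stride):
    """Yield (x, y, w, h) boxes that scan an image of the given size.

    `window` and `stride` are illustrative values, not those of the study.
    """
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            yield (x, y, window, window)


# Images in this study were re-scaled to 1000 x 666 pixels.
boxes = list(sliding_windows(1000, 666, window=200, stride=100))
print(len(boxes), boxes[0], boxes[-1])
```

Even with this coarse stride, each image produces dozens of classifier calls, which illustrates why the runtime of a sliding-window approach grows quickly with finer strides or multiple window sizes.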

Since sliding window CNNs have low accuracy and a very long runtime, region-based CNNs instead use feature maps to filter out putative regions of interest, on which boxes are refined around objects. Mask R-CNN [25] is the most recent addition to the family of R-CNNs [21] and includes a Region Proposal Network (RPN) to reduce the number of bounding boxes by passing only $N$ region proposals that are likely to contain some object to a detection network block. The detection network generates the final object localizations along with the appropriate classes from the RPN proposals and the appropriate features from the feature CNN. Mask R-CNN extends Faster R-CNN [55] with a mask branch of two additional convolutional layers that performs instance segmentation, so that a bounding box, a pixel-wise segmentation mask and a class label are returned for each detected object. We tested Mask R-CNN on our maize cob image set to investigate the performance of a state-of-the-art deep learning framework for object detection, classification and segmentation. The method requires time-consuming image annotation and expensive computational resources (high memory and GPUs).

Implementation of Mask R-CNN to detect maize cobs and rulers

The training image data (200 or 1,000 images) were randomly selected from the two datasets ImgOld and ImgNew to achieve maximum diversity in terms of image properties (Additional file 1: Tables S1, S8). Both subsets were each randomly divided into a training set (75%) and a validation set (25%). Both image subsets were annotated using the VGG Image Annotator (VIA; version 2.0.8 [13]). A pixel-precise mask was drawn by hand around each maize cob (Additional file 1: Figure S2). The ruler was labeled with two masks, one for the horizontal part and one for the vertical part, which facilitates later prediction of the bounding boxes of the ruler compared to annotating the entire ruler element as one mask. Each mask was labeled as "cob" or "ruler", and the annotations for training and validation sets were exported separately as JSON files.
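The random 75%/25% division can be sketched as follows. This is a minimal example assuming a fixed random seed and hypothetical file names; the text does not state how the random selection was implemented.

```python
import random


def split_train_val(image_ids, train_fraction=0.75, seed=0):
    """Randomly split image identifiers into training and validation lists."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # seeded only to make the sketch reproducible
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Hypothetical file names standing in for the 200-image subset.
images = [f"img_{i:04d}.png" for i in range(200)]
train, val = split_train_val(images)
print(len(train), len(val))  # 150 50
```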

Next, models were trained on multiple GPUs using a standard TensorFlow implementation of Mask R-CNN for maize cob and ruler detection. We used the pre-trained weights of the COCO model, which is the standard model [25] derived from training on the MS COCO dataset [38], with a ResNet-101 backbone (transfer learning). The original Mask R-CNN implementation was modified by adding two classes for cob and ruler in addition to the background class. Instead of saving the model after every training epoch, only the best model with the lowest validation loss was saved to reduce storage requirements. For training the Mask R-CNN models, we used Tesla K80 GPUs with 12 GB RAM each on the BinAC GPU cluster at the University of Tübingen.
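The rule of keeping only the model with the lowest validation loss can be illustrated in plain Python. The loss values below are hypothetical, and the function is a sketch of the selection logic rather than the training code itself. In Keras, comparable behaviour is commonly obtained with the `ModelCheckpoint` callback and `save_best_only=True`; whether the pipeline used exactly that callback is not stated here.

```python
def improvement_epochs(val_losses):
    """Return the 1-based epochs at which the validation loss reaches a new minimum."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            saved.append(epoch)  # a checkpoint would be written (or overwritten) here
    return saved


losses = [1.20, 0.95, 0.97, 0.80, 0.85, 0.79]  # hypothetical validation losses
print(improvement_epochs(losses))  # [1, 2, 4, 6]
```

The last entry of the returned list corresponds to the single model that is retained when earlier checkpoints are overwritten.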

We trained 90 different models with different parameter settings (Additional file 1: Tables S1, S6) on both image datasets. The learning rate parameter learningrate was set to vary from ${10}^{-3}$, as in the standard implementation, to ${10}^{-5}$, since models with smaller datasets often suffer from overfitting, which may require smaller steps in learning the model parameters. Training was performed over 15, 50, or 200 epochs (epochsoverall) to capture potential overfitting issues. The parameter epochs.m distinguishes between training only the heads, or training the heads first, followed by training on the complete layers of resnet101. The latter requires more computation time, but offers the possibility to fine tune not only the heads, but all the layers to obtain a more accurate detection. The mask loss weight (masklossweight) was given the value of 1, as in the default implementation, or 10, which means a higher focus on reducing mask loss. The monitor metric (monitor) for the best model checkpoint was set to vary between the default validation loss and the mask validation loss. The latter option was tested to optimize preferentially for mask creation, which is usually more challenging than determining object class, bounding box loss, etc. The use of the minimask (minimask) affects the accuracy of mask creation and in the default implementation consists of a resizing step before the masks are forwarded by the CNN during the training process.
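To make the size of the parameter space concrete, the sketch below enumerates the full factorial grid of the levels named in the text. The key `trainsize` is a hypothetical name for the training set size (200 or 1,000 images), and the string labels for `monitor` are placeholders. The 90 trained models are a subset of this grid, and not every combination is meaningful (for example, 20 epochs of head training cannot be combined with 15 epochs overall).

```python
from itertools import product

# Levels as described in the text; "trainsize" is a hypothetical parameter name.
grid = {
    "learningrate": [1e-3, 1e-4, 1e-5],
    "epochs.m": [1, 2, 3],
    "epochsoverall": [15, 50, 200],
    "masklossweight": [1, 10],
    "monitor": ["val_loss", "mask_val_loss"],
    "minimask": [True, False],
    "trainsize": [200, 1000],
}

# One dictionary per parameter combination.
settings = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(settings))  # 432
```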

The performance of these models for cob and ruler detection was evaluated by the IoU (Intersection over Union) score or Jaccard index [29], which is the most popular metric to evaluate the performance of object detectors. The IoU score between a predicted and a true bounding box is calculated by

$$IoU=\frac{\text{Area of Overlap}}{\text{Area of Union}}$$

The most common threshold for IoU is 50% or 0.5. With IoU values above 0.5, the predicted object is considered as true positive (TP), else as a false positive (FP). Precision is calculated by

$$P=\frac{\text{TP}}{{\text{TP}}+{\text{FP}}}$$
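The two formulas above can be illustrated with a short Python sketch. Masks are represented here as sets of pixel coordinates, which is an assumption made for readability; in practice, masks are usually stored as binary arrays.

```python
def mask_iou(mask_a, mask_b):
    """IoU of two binary masks given as sets of (row, col) pixel coordinates."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0


def precision(ious, threshold=0.5):
    """TP / (TP + FP), where a prediction is a TP if its IoU exceeds the threshold."""
    if not ious:
        return 0.0
    tp = sum(iou > threshold for iou in ious)
    return tp / len(ious)


truth = {(r, c) for r in range(10) for c in range(10)}     # 10 x 10 pixel object
pred = {(r, c) for r in range(10) for c in range(2, 12)}   # same shape, shifted by 2 columns
iou = mask_iou(truth, pred)
print(round(iou, 3))          # 0.667 (80 shared pixels / 120 pixels in the union)
print(precision([iou, 0.3]))  # 0.5 (one TP, one FP)
```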

The average precision (AP) was calculated by averaging $P$ over all ground-truth objects of all classes in comparison to their predicted boxes, as demonstrated in various challenges and improved network architectures [14, 26, 57].

Following the primary challenge metric of the COCO dataset [44], the goodness of our trained models was also scored by $AP@\left[.5:.95\right]$, sometimes also just called AP, which is the average AP over different IoU thresholds from 50 to 95% in 5% steps. In contrast to usual object detection models where IoU/AP metrics are calculated for boxes, in the following IoU relates to the masks [55], because this explores the performance of instance segmentation. We performed an ANOVA on the scores of the 90 models to evaluate the individual impact of the parameters on the $AP@\left[.5:.95\right]$ score. Logit transformation was applied to meet the assumptions of homogeneity of variance and normal distribution (Additional file 1: Figure S4). Model selection was carried out including the parameters learningrate (${10}^{-3},{10}^{-4},{10}^{-5}$), epochs.m (1: only heads; 2: 20 epochs heads; 3: 10 epochs heads; for the rest all model layers trained), epochsoverall (15, 50, 200), masklossweight (1, 10), monitor (val loss, mask val loss) and minimask (yes, no). All two-way interactions were also included in the model, dropping non-significant interactions first and then non-significant main effects if none of their interactions were significant.
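A minimal sketch of the score and of the logit transformation is given below. It follows the simplified definition used in this section (precision averaged over ten IoU thresholds); the official COCO evaluation additionally integrates precision over recall, which is not reproduced here. The IoU values are hypothetical.

```python
import math


def ap_over_thresholds(ious):
    """Mean precision over the IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [0.5 + 0.05 * k for k in range(10)]
    per_threshold = [sum(iou > t for iou in ious) / len(ious) for t in thresholds]
    return sum(per_threshold) / len(per_threshold)


def logit(p):
    """Map a score in (0, 1) to the real line before fitting the linear model."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Back-transform a value on the logit scale to the (0, 1) scale."""
    return 1.0 / (1.0 + math.exp(-x))


ious = [0.97, 0.91, 0.84, 0.62]  # hypothetical mask IoUs of four predictions
score = ap_over_thresholds(ious)
print(round(score, 3))           # 0.725
print(round(logit(score), 3))
```

Means estimated on the logit scale are mapped back with `inv_logit`, which is what back-transformation of least-squares means refers to.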

These results allow us to formulate the following final model to describe the contributions of the parameters to Mask R-CNN performance:

$${y}_{ijh}=\mu +{b}_{i}+{v}_{j}+{k}_{h}+{\left(vk\right)}_{jh}+{e}_{ijh}$$

where $\mu$ is the general effect, ${b}_{i}$ the effect of the $i$-th minimask, ${v}_{j}$ the effect of the $j$-th overall number of epochs, ${k}_{h}$ the effect of the $h$-th training set size, ${\left(vk\right)}_{jh}$ the interaction effect between the number of epochs and the training set size and ${e}_{ijh}$ the random deviation associated with ${y}_{ijh}$. We calculated ANOVA tables, back-transformed lsmeans and contrasts (confidence level of 0.95) for the significant influencing variables. As the last step of model training, we set up a workflow with the best model as judged by its $AP@\left[.5:.95\right]$ score and performed random checks of whether objects were detected correctly.

Workflow for model updating with new pictures

To investigate the updating ability of Mask R-CNN on different maize cob image datasets, we additionally annotated 150 images (50 training, 100 validation images) from each of the ImgCross and ImgDiv datasets. For ImgCross, the high resolution of $3888\times 2592$ pixels was maintained, but 75% of the images were rotated (25% by 90°, 25% by 180°, and 25% by 270°) to increase diversity. The corn cob spindles on these images were also labeled as cobs and the colored squares were labeled as rulers. The ImgDiv images were left at their original resolution and annotated with the cob and ruler classes.
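The rotation scheme for ImgCross can be sketched as a random assignment of angles in equal proportions, so that one quarter of the images stays unrotated. The function name, the seed and the use of 100 placeholder identifiers are assumptions; the actual rotation of the pixel data (for example with an image library) is not shown.

```python
import random
from collections import Counter


def assign_rotations(image_ids, seed=0):
    """Assign 0, 90, 180 or 270 degrees so that each angle receives a quarter of the images."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # seeded only to make the sketch reproducible
    angles = (0, 90, 180, 270)
    return {img: angles[i % 4] for i, img in enumerate(ids)}


rotations = assign_rotations(range(100))  # 100 placeholder image identifiers
print(Counter(rotations.values()))
```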

The model weights of the best model (M104) obtained by training with ImgOld and ImgNew were used as initial weights and updated with ImgCross and ImgDiv images. Based on the statistical analysis, optimal levels of the main parameters were used and only the network heads were trained with a learning rate of ${10}^{-3}$ for 50 epochs without the minimask. Training was performed with different randomly selected sets (10, 20, 30, 40, and 50 images) to evaluate the influence of the number of images on the quality of model updating. For each training run, all models with an improvement step in validation loss were saved, and the $AP@\left[.5:.95\right]$ score was calculated for each of them. For comparison, all combinations of models were also trained with the standard COCO weights.

Statistical analysis of model updating results

To evaluate the influence of the dataset, the starting model, and the size of the training set, an ANOVA was performed on the $AP@\left[.5:.95\right]$ scores from all epochs and combinations. Logit transformation was applied to meet the assumptions of homogeneity of variance and normal distribution. Epoch was included as a covariate. Forward model selection was performed using the parameters dataset (ImgCross, ImgDiv), starting model (COCO, pre-trained maize model), and training set size (10, 20, 30, 40, 50). All two-way and three-way parameter interactions were included in the model. Because the three-way interaction was not significant, the significant two-way interactions and significant main effects were retained in the final model, which can be denoted as follows:

$${y}_{ijh}=\mu +{c}_{i}+{n}_{j}+{k}_{h}+{\left(cn\right)}_{ij}+{\left(nk\right)}_{jh}+{e}_{ijh}$$

ANOVA tables, back-transformed lsmeans and p-values (Additional file 1: Tables S7 and S8; confidence level of 0.95) for the significant influencing variables were calculated.

Post-processing of segmented images for automated measurements and phenotypic trait extraction

Images segmented with Mask R-CNN are post-processed (Fig. 9) with an automated pipeline to extract phenotypic traits of interest that are relevant either for maize yield (i.e. cob measurements) or for genebank phenomics (i.e. cob shape or color descriptors to differentiate between landraces). The Mask R-CNN model returns a list of labeled masks, which are separated into cob and ruler masks for subsequent analysis. Contour detection is applied to binarized ruler masks to identify individual black or white ruler elements, whose lengths in pixels are then averaged over the elements of a ruler to obtain a pixel value per cm for each image. Length and diameter of cob masks are then converted from pixel into cm values using the average ruler lengths. The cob masks are also used to calculate the mean RGB color of each cob. In contrast to a similar approach by Miller et al. [45], who sampled pixels from the middle third of cobs for RGB color extraction, we used the complete cob mask because kernel color was variable throughout the cob in highly diverse image data. We also used the complete cob mask to extract cob shape parameters that include asymmetry and ellipticity, similar to a previous study of avian eggs [58], which characterized egg shape diversity using the morphometric equations of Baker [6]. Since our image data contained a high diversity of maize cob shapes, we reasoned that shape parameters like asymmetry and ellipticity are useful for a morphometric description of maize cob diversity. For demonstration examples of symmetrical/asymmetrical and round/elliptical cobs please refer to Additional file 1: Figure S3. Overall, the following phenotypic traits were extracted from 19,867 cobs: diameter, length, aspect ratio (length/diameter), asymmetry, ellipticity and mean RGB color separated by red, green and blue channels. Our pipeline returned all cob masks as .jpg images for later analysis of additional parameters.
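The unit conversion and the simpler traits can be sketched in a few lines of Python. This is a minimal example under stated assumptions: each ruler element is taken to be 1 cm long (the element length is not given in the text), cob length and diameter in pixels are assumed to be already measured from the mask, and all function names and numbers are hypothetical. Asymmetry and ellipticity are omitted because the underlying equations are not reproduced in this section.

```python
def pixels_per_cm(ruler_element_lengths_px, element_length_cm=1.0):
    """Average length of the detected ruler elements in pixels, expressed per cm."""
    mean_px = sum(ruler_element_lengths_px) / len(ruler_element_lengths_px)
    return mean_px / element_length_cm  # assumes elements of known physical length


def cob_traits(length_px, diameter_px, mask_pixels_rgb, px_per_cm):
    """Convert pixel measurements to cm and average the color over the whole mask."""
    length_cm = length_px / px_per_cm
    diameter_cm = diameter_px / px_per_cm
    n = len(mask_pixels_rgb)
    mean_rgb = tuple(sum(p[ch] for p in mask_pixels_rgb) / n for ch in range(3))
    return {
        "length_cm": length_cm,
        "diameter_cm": diameter_cm,
        "aspect_ratio": length_cm / diameter_cm,
        "mean_rgb": mean_rgb,
    }


scale = pixels_per_cm([40, 42, 38, 40])  # 40.0 pixels per cm
traits = cob_traits(600, 160, [(200, 150, 50), (220, 170, 70)], scale)
print(traits)
```

With these hypothetical inputs, a cob of 600 × 160 pixels corresponds to 15 cm × 4 cm and an aspect ratio of 3.75.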

Quantitative comparison between Felzenszwalb-Huttenlocher segmentation, Window-CNN and Mask R-CNN

For quantitative comparisons between the three image segmentation methods, a subset of 50 images from ImgOld and 50 images from ImgNew was randomly selected. None of these images were included in the training data of Window-CNN or Mask R-CNN, so the subset is independent of the training data and overfitting issues were avoided. True measurements of cob length and diameter were obtained using the annotation tool VIA [13]. Individual cob dimensions per image could not be directly compared to predicted cob dimensions because Felzenszwalb-Huttenlocher segmentation and Window-CNN often placed multiple cobs in one box or split single cobs across multiple boxes. Therefore, the mean of the predicted cob width and length per image was calculated for each approach, which penalizes incorrectly predicted boxes. Pearson correlation was calculated between the true and predicted mean diameter and length of the cob per image, separately for the ImgOld and ImgNew sets.
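The following sketch shows why comparing per-image means penalizes incorrect boxes. All measurements are hypothetical: in the second image, one method merges two cobs into a single long box, which shifts the image mean and lowers the correlation with the true values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def image_means(per_image_values):
    """Mean of the cob measurements of each image."""
    return [sum(v) / len(v) for v in per_image_values]


# Hypothetical cob lengths (cm) for three images.
true_lengths = [[14.0, 15.0, 16.0], [10.0, 12.0], [18.0, 19.0, 20.0, 21.0]]
pred_accurate = [[14.2, 15.1, 15.9], [10.3, 11.8], [18.1, 19.2, 19.8, 21.3]]
pred_merged = [[14.2, 15.1, 15.9], [22.5], [18.1, 19.2, 19.8, 21.3]]

r_accurate = pearson(image_means(true_lengths), image_means(pred_accurate))
r_merged = pearson(image_means(true_lengths), image_means(pred_merged))
print(round(r_accurate, 3), round(r_merged, 3))
```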

Unsupervised clustering to detect images with high cob diversity

To identify genebank accessions with high phenotypic diversity in ImgOld and ImgNew images, we used two different unsupervised clustering methods. In the first approach, individual cob features (width, length, asymmetry, ellipticity, and mean RGB values) were scaled after their extraction from the images. The Z-score of each cob was calculated as $Z_{ij} = \frac{x_{ij} - \bar{X}_{j}}{S_{j}}$, where ${Z}_{ij}$ is the Z-score of the $i$-th cob in the $j$-th image, ${x}_{ij}$ is a measurement of the $i$-th cob of the $j$-th image, and $\bar{X}_{j}$ and ${S}_{j}$ are the mean and the standard deviation of the $j$-th image, respectively. The scaled dataset was analyzed using CLARA (Clustering LARge Applications), which is a multivariate clustering method suitable for large datasets, using the cluster R package [39]. The optimal cluster number was determined by the average silhouette method implemented in the R package factoextra [33].
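The within-image scaling can be sketched as follows for a single trait. The sample standard deviation (denominator $n-1$) is an assumption, since the text does not state which estimator was used, and images with a single cob have no defined standard deviation and would need to be handled separately. The clustering itself was performed in R and is not reproduced here.

```python
from statistics import mean, stdev


def z_scores(values):
    """Z-scores of one trait for the cobs of a single image."""
    m, s = mean(values), stdev(values)  # sample SD; assumes at least two cobs
    return [(x - m) / s for x in values]


lengths_cm = [12.0, 14.0, 16.0]  # hypothetical cob lengths in one image
print(z_scores(lengths_cm))      # [-1.0, 0.0, 1.0]
```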

In the second approach, we used the standard deviations of individual measurements within each image (${S}_{j}$) as input for clustering. The standard deviations of each image were centered and standardized so that the values obtained for all images were on the same scale. This dataset was then clustered with $k$-means and the number of clusters, $k$, was determined using the gap statistic [61], which compares the sum of squares within clusters to its expectation under a null reference distribution.
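In contrast to the first approach, the unit of analysis here is the image rather than the cob. A minimal sketch for one trait is shown below, again assuming the sample standard deviation and hypothetical measurements; the $k$-means clustering and the gap statistic are not reproduced.

```python
from statistics import mean, stdev


def standardized_image_sds(trait_by_image):
    """Within-image standard deviations, centered and scaled across images."""
    sds = [stdev(values) for values in trait_by_image]  # one SD per image
    m, s = mean(sds), stdev(sds)
    return [(sd - m) / s for sd in sds]


# Hypothetical cob lengths (cm) for three images with low, very low and high variation.
lengths_by_image = [[12.0, 14.0, 16.0], [10.0, 10.5, 11.0], [8.0, 14.0, 20.0]]
scaled = standardized_image_sds(lengths_by_image)
print([round(v, 3) for v in scaled])
```

The image with the most heterogeneous cobs receives the largest standardized value, which is the signal used to separate homogeneous from heterogeneous accessions.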

Availability of data and materials

Image files and annotations: http://doi.org/10.5281/zenodo.4587304. Deep learning model and manuals with codes for custom detections and model updating: https://gitlab.com/kjschmidlab/deepcob.

Abbreviations

$AP@\left[.5:.95\right]$ :: AP@[ IoU = 0.50:0.95], sometimes also called mAP
CLARA:: Clustering Large Applications
RPN:: Region Proposal Network

References

Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS et al. TensorFlow: large-scale machine learning on heterogeneous systems; 2015. http://tensorflow.org/.
Abu Alrob I, Christiansen JL, Madsen S, Sevilla R, Ortiz R. Assessing variation in peruvian highland maize: tassel, kernel and ear descriptors. Plant Genet Resour Newsltr. 2004;137:34–41.
Alkhudaydi T, Reynolds D, Griffiths S, Zhou Ji, De La Iglesia B, et al. An exploration of deep-learning based phenotypic analysis to detect spike regions in field conditions for UK bread wheat. Plant Phenomics. 2019;2019:7368761.
Araus JL, Cairns JE. Field high-throughput phenotyping: the new crop breeding frontier. Trends Plant Sci. 2014;19(1):52–61.
Argüeso D, Picon A, Irusta U, Medela A, San-Emeterio MG, Bereciartua A, Alvarez-Gila A. Few-shot learning approach for plant disease classification using images taken in the field. Comput Electron Agric. 2020;175:105542.
Baker DE. A geometric method for determining shape of bird eggs. Auk. 2002;119(4):1179–86.
Bradski G. The OpenCV Library. Dr. Dobb’s Journal of Software Tools; 2000.
Campos H, Caligari PDS. Genetic improvement of tropical crops. Berlin: Springer; 2017.
Cap QH, Suwa K, Fujita E, Uga H, Kagiwada S, Iyatomi H. An End-to-end practical plant disease diagnosis system for wide-angle cucumber images. Int J Eng Technol. 2018;7(4.11):106–11.
Chollet F et al. Keras; 2015. https://keras.io.
Czedik-Eysenberg A, Seitner S, Güldener U, Koemeda S, Jez J, Colombini M, Djamei A. The ‘PhenoBox’, a flexible, automated, open-source plant phenotyping solution. New Phytol. 2018;219(2):808–23.
Dias PA, Shen Z, Tabb A, Medeiros H. FreeLabel: a publicly available annotation tool based on freehand traces. arXiv:1902.06806 [cs], February; 2019.
Dutta A, Zisserman A. The VIA annotation software for images, audio and video. In: Proceedings of the 27th ACM international conference on multimedia. MM ’19. New York, NY, USA: ACM; 2019.
Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A. The pascal visual object classes (VOC) challenge. Int J Comput Vision. 2010;88(2):303–38.
Felzenszwalb PF, Huttenlocher DP. Efficient graph-based image segmentation. Int J Comput Vision. 2004;59(2):167–81.
Fuentes A, Yoon S, Kim S, Park D. A robust deep-learning-based detector for real-time tomato plant diseases and pests recognition. Sensors. 2017;17(9):2022.
Furbank RT, Tester M. Phenomics–technologies to relieve the phenotyping bottleneck. Trends Plant Sci. 2011;16(12):635–44.
Ganesh P, Volle K, Burks TF, Mehta SS. Deep orange: mask r-CNN based orange detection and segmentation. IFAC-PapersOnLine. 2019;52(30):70–5.
Gehan MA, Fahlgren N, Abbasi A, Berry JC, Callen ST, Chavez L, Doust AN, et al. PlantCV V2: image analysis software for high-throughput plant phenotyping. PeerJ. 2017;5(December):e4088.
Girshick R. Fast r-Cnn. In: Proceedings of the IEEE international conference on computer vision; 2015, p. 1440–48.
Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2014, p. 580–87.
Granier C, Vile D. Phenotyping and beyond: modelling the relationships between traits. Curr Opin Plant Biol. 2014;18:96–102.
Grobman A. Races of maize in Peru: their origins, evolution and classification. Vol. 915. National Academies; 1961.
Großkinsky DK, Svensgaard J, Christensen S, Roitsch T. Plant phenomics and the need for physiological phenotyping across scales to narrow the genotype-to-phenotype knowledge gap. J Exp Bot. 2015;66(18):5429–40.
He K, Gkioxari G, Dollár P, Girshick R. Mask r-CNN. In: Proceedings of the IEEE international conference on computer vision; 2017, p. 2961–69.
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016, p. 770–78.
Heerwaarden J van, Hufford MB, Ross-Ibarra J. Historical genomics of North American maize. In: Proceedings of the National Academy of Sciences, July; 2012, p. 201209275.
Houle D, Govindaraju DR, Omholt S. Phenomics: the next challenge. Nat Rev Genet. 2010;11(12):855–66.
Jaccard P. Étude Comparative de La Distribution Florale Dans Une Portion Des Alpes Et Des Jura. Bull Soc Vaudoise Sci Nat. 1901;37:547–79.
Jeong YS, Lee HR, Baek JH, Kim KH, Chung YS, Lee CW. Deep learning-based rice seed segmentation for phenotyping. J Korea Ind Inform Syst Res. 2020;25(5):23–9.
Jiang Yu, Li C, Rui Xu, Sun S, Robertson JS, Paterson AH. DeepFlower: a deep learning-based approach to characterize flowering patterns of cotton plants in the field. Plant Methods. 2020;16(1):156.
Jin X, Zarco-Tejada PJ, Schmidhalter U, Reynolds MP, Hawkesford MJ, Varshney RK, Yang T, et al. High-throughput estimation of crop traits: a review of ground and aerial phenotyping platforms. IEEE Geosci Remote Sens Mag. 2020;9(1):200–31.
Kassambara A, Mundt F. Factoextra: extract and visualize the results of multivariate data analyses. R Package Version. 2020;1:7.
Kistler L, Maezumi SY, Gregorio de Souza J, Przelomska NAS, Malaquias Costa F, Smith O, Loiselle H, et al. Multiproxy evidence highlights a complex evolutionary legacy of maize in South America. Science. 2018;362(6420):1309–13.
Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst. 2012;25:1097–105.
LeCun Y, Jackel LD, Boser B, Denker JS, Graf HP, Guyon I, Henderson D, Howard RE, Hubbard W. Handwritten digit recognition: applications of neural network chips and automatic learning. IEEE Commun Mag. 1989;27(11):41–6.
LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44.
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL. Microsoft COCO: common objects in context. In: European conference on computer vision; 2014, p. 740–55. Springer.
Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K. Cluster: cluster analysis basics and extensions; 2019.
Makanza R, Zaman-Allah M, Cairns JE, Eyre J, Burgueño J, Pacheco Á, Diepenbrock C, et al. High-throughput method for ear phenotyping and kernel weight estimation in maize using ear digital imaging. Plant Methods. 2018;14(1):49.
Mascher M, Schreiber M, Scholz U, Graner A, Reif JC, Stein N. Genebank genomics bridges the gap between the conservation of crop diversity and plant breeding. Nat Genet. 2019;51(7):1076–81.
Matsuoka Y, Vigouroux Y, Goodman MM, Sanchez J, Buckler E, Doebley J. A single domestication for maize shown by multilocus microsatellite genotyping. Proc Natl Acad Sci. 2002;99(9):6080–4.
Messmer R, Fracheboud Y, Bänziger M, Vargas M, Stamp P, Ribaut J-M. Drought stress and tropical maize: QTL-by-environment interactions and stability of QTLs across environments for yield components and secondary traits. Theor Appl Genet. 2009;119(5):913–30.
Metrics of COCO Dataset. n.d. https://cocodataset.org/#detection-eval.
Miller ND, Haase NJ, Lee J, Kaeppler SM, de Leon N, Spalding EP. A robust, high-throughput method for computing maize ear, cob, and kernel attributes automatically from images. Plant J. 2017;89(1):169–78.
Mir RR, Reynolds M, Pinto F, Khan MA, Bhat MA. High-throughput phenotyping for crop improvement in the genomics era. Plant Sci. 2019;282:60–72.
Nguyen GN, Norton SL. Genebank phenomics: a strategic approach to enhance value and utilization of crop germplasm. Plants. 2020;9(7):817.
O’Mahony N, Campbell S, Carvalho A, Harapanahalli S, Hernandez GV, Krpalkova L, Riordan D, Walsh J. Deep learning vs. traditional computer vision. In: Science and information conference, p. 128–44. Springer; 2019.
Ortiz R, Crossa J, Franco J, Sevilla R, Burgueño J. Classification of Peruvian highland maize races using plant traits. Genet Resour Crop Evol. 2008;55(1):151–62.
Ortiz R, Crossa J, Sevilla R. Minimum resources for phenotyping morphological traits of maize (zea Mays l.) genetic resources. Plant Genet Resour. 2008;6(3):195–200.
Ortiz R, Taba S, Tovar VH, Mezzalama M, Xu Y, Yan J, Crouch JH. Conserving and enhancing maize genetic resources as global public goods—a perspective from CIMMYT. Crop Sci. 2010;50(1):13–28.
Ortiz R, Sevilla R. Quantitative descriptors for classification and characterization of highland peruvian maize. Plant Genet Resourc Newsl. 1997;110:49–52.
Peng B, Li Y, Wang Y, Liu C, Liu Z, Tan W, Zhang Y, et al. QTL analysis for yield components and kernel-related traits in maize across multi-environments. Theor Appl Genet. 2011;122(7):1305–20.
R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2020.
Ren S, He K, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell. 2016;39(6):1137–49.
Romero Navarro J, Alberto MW, Burgueño J, Romay C, Swarts K, Trachsel S, Preciado E, et al. A study of allelic diversity underlying flowering-time adaptation in maize landraces. Nat Genet. 2017;49(3):476–80.
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, et al. ImageNet large scale visual recognition challenge. Int J Comput Vision. 2015;115(3):211–52.
Stoddard MC, Yong EH, Akkaynak D, Sheard C, Tobias JA, Mahadevan L. Avian egg shape: form, function, and evolution. Science. 2017;356(6344):1249–54.
Su WH, Zhang J, Yang C, Page R, Szinyei T, Hirsch CD, Steffenson BJ. Automatic evaluation of wheat resistance to Fusarium head blight using dual Mask-RCNN deep learning frameworks in computer vision. Remote Sens. 2021;13(1):26.
Tardieu F, Cabrera-Bosquet L, Pridmore T, Bennett M. Plant phenomics, from sensors to knowledge. Curr Biol. 2017;27(15):R770–83.
Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Ser B. 2001;63(2):411–23.
Ubbens J, Cieslak M, Prusinkiewicz P, Stavness I. The use of plant models in deep learning: an application to leaf counting in rosette plants. Plant Methods. 2018;14(1):6.
Van Rossum G, Drake FL. Python 3 reference manual. Scotts Valley: CreateSpace; 2009.
Voulodimos A, Doulamis N, Doulamis A, Protopapadakis E. Deep learning for computer vision: a brief review. Comput Intell Neurosci. 2018. https://doi.org/10.1155/2018/7068349.
Wallace JG, Rodgers-Melnick E, Buckler ES. On the road to breeding 4.0: unraveling the good, the bad, and the boring of crop quantitative genomics. Annu Rev Genet. 2018;52(1):421–44.
van der Walt S, Schönberger JL, Nunez-Iglesias J, Boulogne F, Warner JD, Yager N, Gouillart E, Yu T. Scikit-image: image processing in Python. PeerJ. 2014;2(June):e453.
Wang Y, Yao Q, Kwok JT, Ni LM. Generalizing from a few examples: a survey on few-shot learning. ACM Comput Surv. 2020;53(3):1–34.
Warman C, Fowler JE. Custom built scanner and simple image processing pipeline enables low-cost, high-throughput phenotyping of maize ears. bioRxiv 2019;780650.
Wilkes G. Corn, strange and marvelous: but is a definitive origin known. In: Smith CW, Betran J, Runge ECA, editors. Corn: origin, history, technology, and production. Hoboken: Wiley; 2004. p. 3–63.
Xu H, Bassel GW. Linking genes to shape in plants using morphometrics. Annu Rev Genet. 2020;54(1):417–37.
Yang S, Zheng L, He P, Wu T, Sun S, Wang M. High-throughput soybean seeds phenotyping with convolutional neural networks and transfer learning. Plant Methods. 2021;17(1):1–17.
Yu Y, Zhang K, Yang L, Zhang D. Fruit detection for strawberry harvesting robot in non-structural environment based on Mask-RCNN. Comput Electron Agric. 2019;163:104846.
Zhao T, Yang Y, Niu H, Wang D, Chen Y. Comparing U-Net convolutional network with Mask R-CNN in the performances of pomegranate tree canopy segmentation. In: Multispectral, hyperspectral, and ultraspectral remote sensing technology, techniques and applications VII, 10780:107801J. International Society for Optics and Photonics; 2018.

Acknowledgements

We are grateful to Gilberto Garcia for scanning and photographing the maize genebank accessions at UNALM, Emilia Koch for annotating the images, and Hans-Peter Piepho for statistical advice.

Funding

Open Access funding enabled and organized by Projekt DEAL. This work was funded by the Gips Schüle Foundation Award to K.S. and by the KWS SEED SE Capacity Development Projekt Peru grant to R.B. and K.S. We acknowledge support by the High Performance and Cloud Computing Group at the Zentrum für Datenverarbeitung of the University of Tübingen, the state of Baden-Württemberg through bwHPC and the German Research Foundation (DFG) through Grant No INST 37/935–1 FUGG.

Author information

Authors and Affiliations

Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Stuttgart, Germany
Lydia Kienbaum, Miguel Correa Abondano & Karl Schmid
Universidad Nacional Agraria La Molina (UNALM), Lima, Peru
Raul Blas
Computational Science Lab, University of Hohenheim, Stuttgart, Germany
Karl Schmid

Authors

Lydia Kienbaum
Miguel Correa Abondano
Raul Blas
Karl Schmid

Contributions

LK and KS designed the study. LK performed the image analysis, implemented Felzenszwalb-Huttenlocher segmentation, Window-CNN and Mask R-CNN on the datasets, developed the model updating and carried out the statistical analyses. MCA conducted the multivariate analysis of phenotypic cob data. RB coordinated and designed the acquisition of the maize photographs. LK and KS wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Karl Schmid.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Additional Tables and Figures.

Additional file 2.

Additional Text.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Kienbaum, L., Correa Abondano, M., Blas, R. et al. DeepCob: precise and high-throughput analysis of maize cob geometry using deep learning with an application in genebank phenomics. Plant Methods 17, 91 (2021). https://doi.org/10.1186/s13007-021-00787-6

Received: 17 March 2021
Accepted: 30 July 2021
Published: 21 August 2021
DOI: https://doi.org/10.1186/s13007-021-00787-6
