TasselNet: counting maize tassels in the wild via local counts regression network
 Hao Lu^{1},
 Zhiguo Cao^{1},
 Yang Xiao^{1},
 Bohan Zhuang^{2} and
 Chunhua Shen^{2}
Received: 29 June 2017
Accepted: 11 September 2017
Published: 1 November 2017
Abstract
Background
Accurately counting maize tassels is important for monitoring the growth status of maize plants. This tedious task, however, is still mainly done by manual effort. In the context of modern plant phenotyping, automating this task is required to meet the need for large-scale analysis of genotype and phenotype. In recent years, computer vision technologies have experienced a significant breakthrough due to the emergence of large-scale datasets and increased computational resources. Naturally, image-based approaches have also received much attention in plant-related studies. Yet a fact is that most image-based systems for plant phenotyping are deployed under controlled laboratory environments. When transferring the application scenario to unconstrained in-field conditions, intrinsic and extrinsic variations in the wild pose great challenges for accurate counting of maize tassels, which goes beyond the ability of conventional image processing techniques. This calls for more robust computer vision approaches to address in-field variations.
Results
This paper studies the in-field counting problem of maize tassels. To our knowledge, this is the first time that a plant-related counting problem has been considered using computer vision technologies under an unconstrained field-based environment. With 361 field images collected in four experimental fields across China between 2010 and 2015 and corresponding manually-labelled dotted annotations, a novel Maize Tassels Counting (MTC) dataset is created and will be released with this paper. To alleviate the in-field challenges, a deep convolutional neural network-based approach termed TasselNet is proposed. TasselNet achieves good adaptability to in-field variations by modelling the local visual characteristics of field images and regressing the local counts of maize tassels. Extensive results on the MTC dataset demonstrate that TasselNet outperforms other state-of-the-art approaches by large margins and achieves the overall best counting performance, with a mean absolute error of 6.6 and a mean squared error of 9.6 averaged over 8 test sequences.
Conclusions
TasselNet achieves robust in-field counting of maize tassels with a relatively high degree of accuracy. Our experimental evaluations also suggest several good practices for practitioners working on maize-tassel-like counting problems. It is worth noting that, though counting errors have been greatly reduced by TasselNet, in-field counting of maize tassels remains an open and unsolved problem.
Background
We consider the problem of counting maize tassels from images captured in the field using computer vision. Maize tassels are the male flowers of maize plants. The emergence of tassels indicates the arrival of the reproductive stage. During this stage, the total tassel number is an important cue to monitor the growth status of maize plants. It is closely related to the growth stage [1] and yield potential [2]. In practice, counting maize tassels still mainly depends on human efforts, which is inefficient and fallible. Such a tedious task should be replaced by machines in modern plant phenotyping.

Counting maize tassels in the wild, however, is highly challenging for the following reasons:

- Maize tassels emerge suddenly and vary significantly in shape and size as plants grow over time;
- Different cultivars of maize plants exhibit different appearance variations, such as colour and texture;
- Illumination changes dramatically under different weather conditions, especially on sunny days;
- Wind, imaging angle and perspective distortions cause various pose variations;
- Occlusions occur frequently, which makes counting difficult even for a human expert;
- The cluttered background makes the visual patterns of maize tassels diverse and misleading;
- Image quality degrades because of dust or raindrops on the camera lens;
- Textural patterns also change substantially with different flowering statuses.
Though efforts have been made to tackle the above problems with a moderate degree of success, the precision of the state-of-the-art tassel detection method is still below 50% [2]. This may be largely due to an inherent limitation of the non-maximum suppression mechanism within object detection [7]: it cannot appropriately distinguish overlapping objects. Such a mechanism poses problems for accurate maize tassel detection because overlaps between different tassels are common in the field. We have to ask: is object detection the best way to count maize tassels? From a Computer Vision point of view, the objective of object detection is to localise individual instances and output their corresponding bounding boxes. Since the locations of objects are identified, it is easy to derive the number of instances. However, the number of instances actually has nothing to do with their locations. If one only cares about estimating the total number of instances, the problem belongs to another important research topic in Computer Vision: object counting. In this paper, we show that it is better to formulate the task of maize tassel counting as a typical counting problem, rather than a detection one. In fact, object detection is generally harder to solve than object counting.
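To make the non-maximum suppression limitation concrete, the following minimal Python sketch (not part of the original pipeline; the box coordinates and the 0.5 overlap threshold are illustrative assumptions) shows how greedy NMS collapses two genuinely distinct but overlapping detections into one, undercounting by construction:

```python
def iou(a, b):
    # boxes as (x1, y1, x2, y2); intersection-over-union of two boxes
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    # greedy non-maximum suppression: keep a box only if it does not
    # overlap (IoU >= thresh) with any higher-scoring kept box
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < thresh for j in keep):
            keep.append(i)
    return keep

# two distinct but heavily overlapping tassels: NMS keeps only one
boxes = [(0, 0, 10, 10), (2, 0, 12, 10)]
scores = [0.9, 0.8]
kept = nms(boxes, scores, thresh=0.5)
```

With a lower threshold both boxes would survive, but in dense tassel scenes no single threshold cleanly separates duplicate detections from true overlapping neighbours.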
Nevertheless, object counting remains a challenging task [8, 9] in both the Plant Science and Computer Vision communities. Three editions of the Leaf Counting Challenge have been held in conjunction with the Computer Vision Problems in Plant Phenotyping workshops (CVPPP2014 [10]/CVPPP2015 [11]/CVPPP2017 [12]), aiming to showcase visual challenges for plant phenotyping. Many efforts have also been made in recent years in Computer Vision to improve the counting precision of crowds [13, 14], cells [15, 16], cars [17, 18], and animals [19]. However, little attention has been paid to plant-related counting tasks. To our knowledge, only two published papers have considered counting problems relating to plants. Giuffrida et al. [20] proposed a learning-based approach to count leaves in rosette plants. Rahnemoonfar and Sheppard [21] presented a deep simulated learning approach to count tomatoes in images. A limitation is that both papers only report results on potted plants, which is far from field-based scenarios. In contrast, our experiments use images captured in an unconstrained in-field environment, leading to a more challenging situation and a more reasonable experimental evaluation.
According to the taxonomy of [22], existing object counting approaches can be classified into three categories: counting by clustering, counting by detection, and counting by regression. Counting-by-clustering approaches often rely on the extraction of motion features (see [23] for example), which is not applicable to plants because the motion of plants is almost unobservable within a limited time. In addition, counting-by-detection approaches [24, 25] tend to suffer in crowded scenes with significant occlusions, so this type of method is also not a good choice for our problem. In fact, the transductive principle suggests never solving a harder problem than the target application necessitates [26]. Accordingly, recent counting-by-regression models [13, 15, 17] have demonstrated that it is indeed unnecessary to detect or segment individual instances when estimating their counts. In particular, the key component of modern counting-by-regression approaches is the density map introduced by Lempitsky and Zisserman [15]. Given dot annotations, objects in an image are described by a density map. Each object is assigned a density that sums to 1, so the total number of objects can be recovered by summing over the whole density map. Overlapping objects are naturally taken into account in this paradigm.
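The density-map idea of Lempitsky and Zisserman [15] can be sketched in a few lines of numpy (a minimal illustration; the image size, dot locations and Gaussian bandwidth below are arbitrary assumptions): each annotated dot is replaced by a Gaussian normalised to sum to 1, so integrating the map recovers the object count even when the Gaussians of neighbouring objects overlap.

```python
import numpy as np

def density_map(shape, dots, sigma=8.0):
    """Build a ground-truth density map from dot annotations.

    Each dot contributes a 2-D Gaussian normalised to sum to exactly 1
    (the in-image normalisation also handles truncation at borders),
    so dmap.sum() equals the number of annotated objects.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dmap = np.zeros(shape)
    for (y, x) in dots:
        g = np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2.0 * sigma ** 2))
        dmap += g / g.sum()
    return dmap

# two annotated tassels in a 64 x 64 image: the map integrates to 2
dmap = density_map((64, 64), [(20, 20), (40, 45)], sigma=8.0)
```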
Further, counting-by-regression approaches can be divided into two subcategories: global regression [13, 20, 27] and local regression [14, 15, 19, 28]. Some early attempts regress the global image count directly via either Gaussian process regression [13] or regression forests [27]. Chen et al. [29] estimate the local image count using a multi-output ridge regression model. Lempitsky and Zisserman [15], however, choose to regress the local density map, which is found to be more effective than regressing just the global/local image count. At that time, although a moderate degree of counting accuracy was achieved, the performance was limited by the power of the feature representation. This circumstance eased in the era of deep learning, when features could be learnt and adjusted for a specific problem. The first deep counting approach can be found in [14], where the problem is addressed by regressing a local density map with deep networks. In fact, most subsequent deep counting approaches follow this paradigm [18, 19, 30]. More recently, Cohen et al. [28] present a somewhat different idea that regresses the local subimage count with deep networks. We also take inspiration from Cohen et al. [28]. Readers can refer to Sindagi and Patel [31] for a comprehensive survey of recent advances of deep networks in counting problems.
To validate the effectiveness of the proposed approach, a novel Maize Tassels Counting (MTC) dataset is constructed and will be released together with this paper. The MTC dataset contains 361 images chosen from 16 image sequences. These sequences were collected from 2010 to 2015, covering 4 different experimental fields across China. All challenges described in Fig. 1 are present in this dataset. The number of maize tassels per image varies between 0 and around 100. Following the standard annotation used in object counting problems [15], a single dot is manually assigned to each maize tassel. We hope this dataset can serve as a benchmark for evaluating in-field counting approaches and draw the attention of practitioners working in this area to these in-field challenges.
Extensive evaluations are performed on the MTC dataset. Experimental results demonstrate that TasselNet outperforms other state-of-the-art methods and reduces counting errors by large margins. Moreover, based on the experimental results, we also suggest several good practices for in-field counting problems. Overall, the contributions of this paper are threefold:

- A novel counting problem of maize tassels whose sizes are self-changing over time. To the best of our knowledge, this is the first time that a plant-related counting problem is considered under unconstrained field conditions;
- A challenging MTC dataset with 361 field images and corresponding manually-labelled dotted annotations;
- TasselNet: an effective deep CNN-based solution for in-field counting of maize tassels via local counts regression.
Methods
Experimental fields and imaging devices
Maize tassels counting dataset
Given 16 independent time-series image sequences, images captured from the tasselling stage to the flowering stage are considered in our MTC dataset. In particular, according to the variability each sequence presents, 8–45 images are manually chosen from each sequence. If extrinsic conditions, such as the weather or the wind, change dramatically, more images are chosen from one day; otherwise only 1 or 2 images are chosen. This sampling strategy is designed to avoid repetitive samples as much as possible, because images captured on the same day usually do not exhibit many variations, and for an effective computer vision approach the ability to model diverse data variability is much more important than blindly fitting a large number of repetitive samples. In all, 361 field images are chosen to construct the MTC dataset. The MTC dataset is divided into a training set, a validation set and a test set. The training set and validation set share the same image sequences, while the test set uses different image sequences to enable a reasonable evaluation. This intentional setting is motivated by the fact that images in one sequence are often highly correlated; it is thus inappropriate to place them in both the training and test stages. Table 1 summarises the MTC dataset. Overall, we have 186 images for training and validation and 175 images for testing.
Training set (train), validation set (val) and test set (test) settings of the MTC dataset
| Sequence | Num | Cultivar | train | val | test |
|---|---|---|---|---|---|
| Zhengzhou2010 | 37 | Jundan No. 20 | \(\checkmark\) | \(\checkmark\) |  |
| Zhengzhou2011 | 24 | Jundan No. 20 |  |  | \(\checkmark\) |
| Zhengzhou2012 | 22 | Zhengdan No. 958 | \(\checkmark\) | \(\checkmark\) |  |
| Taian2010_1 | 30 | Wuyue No. 3 | \(\checkmark\) | \(\checkmark\) |  |
| Taian2010_2 | 32 | Wuyue No. 3 |  |  | \(\checkmark\) |
| Taian2011_1 | 21 | Nongda No. 108 | \(\checkmark\) | \(\checkmark\) |  |
| Taian2011_2 | 19 | Nongda No. 108 |  |  | \(\checkmark\) |
| Taian2012_1 | 41 | Zhengdan No. 958 | \(\checkmark\) | \(\checkmark\) |  |
| Taian2012_2 | 23 | Zhengdan No. 958 |  |  | \(\checkmark\) |
| Taian2013_1 | 8 | Zhengdan No. 958 | \(\checkmark\) | \(\checkmark\) |  |
| Taian2013_2 | 8 | Zhengdan No. 958 |  |  | \(\checkmark\) |
| Gucheng2012 | 15 | Jidan No. 32 | \(\checkmark\) | \(\checkmark\) |  |
| Gucheng2014 | 45 | Zhengdan No. 958 |  |  | \(\checkmark\) |
| Jalaid2015_1 | 12 | Tianlong No. 9 | \(\checkmark\) | \(\checkmark\) |  |
| Jalaid2015_2 | 12 | Tianlong No. 9 |  |  | \(\checkmark\) |
| Jalaid2015_3 | 12 | Tianlong No. 9 |  |  | \(\checkmark\) |
Local counts regression network
In this section we describe our proposed local counts regression network and show how it effectively addresses the in-field counting problem of maize tassels.
The high-level idea of counting by regression is simple: given an image I and a regression target T, the goal is to seek some regression function F so that \(T\approx F(I)\). Standard solutions either explicitly regress the raw count of an image [13] (T is the global count) or implicitly regress the density map of an image [15] (T becomes a density map, and the count is acquired by integrating over the entire density map). However, as we will show in our experiments, neither solution is effective for maize tassel counting. The reason may boil down to the heterogeneity of maize tassels. As shown in Fig. 4, maize tassels exhibit uncertain poses and varying sizes, making them hard to describe with only a global image representation or a density map derived from dotted annotations alone. Indeed, this is what makes maize tassel counting different from other standard counting problems.
Regression target
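As a sketch of how local-count regression targets can be derived (assuming the common recipe of integrating a Gaussian-smoothed ground-truth density map over each subimage; the toy density map and window sizes below are illustrative):

```python
import numpy as np

def count_targets(dmap, r, stride):
    """Local-count regression targets: for every r x r subimage sampled
    on a regular grid, the target is the integral of the ground-truth
    density map inside that window."""
    h, w = dmap.shape
    return {(y, x): float(dmap[y:y + r, x:x + r].sum())
            for y in range(0, h - r + 1, stride)
            for x in range(0, w - r + 1, stride)}

# toy density map with a single unit of mass at pixel (2, 2)
dmap = np.zeros((8, 8))
dmap[2, 2] = 1.0
targets = count_targets(dmap, r=4, stride=4)
```

Unlike a per-pixel density target, the window integral stays well defined even when the precise spatial extent of each tassel is ambiguous.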
Network architecture
The network architecture closely relates to the model capacity, and the model capacity is a key factor affecting counting performance. Motivated by the leading role of CNNs in Computer Vision, in this paper we evaluate three typical network architectures: a low-capacity 4-layer model identical to the seminal LeNet architecture [35], a medium-capacity 7-layer model similar to the AlexNet architecture [32], and a high-capacity 16-layer model sharing the same spirit as the VGG-VD-16Net [34].
Loss function
Merging and normalizing subimage counts
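A plausible sketch of this merging step (our assumption of the scheme; the exact construction is described in this section): each predicted subimage count is spread uniformly over its window, and every pixel is then normalised by the number of windows covering it, so overlapping predictions are averaged rather than double-counted.

```python
import numpy as np

def merge_subimage_counts(pred_counts, positions, r, image_shape):
    """Merge local counts predicted for (possibly overlapping) r x r
    subimages into a single count map whose sum estimates the image count."""
    acc = np.zeros(image_shape)
    cov = np.zeros(image_shape)
    for c, (y, x) in zip(pred_counts, positions):
        acc[y:y + r, x:x + r] += c / float(r * r)  # spread count uniformly
        cov[y:y + r, x:x + r] += 1                 # coverage for averaging
    return acc / np.maximum(cov, 1)

# non-overlapping tiling of an 8 x 8 image: the merged map sums to the
# total of the four local counts
merged = merge_subimage_counts(
    [1.0, 1.0, 1.0, 1.0], [(0, 0), (0, 4), (4, 0), (4, 4)], 4, (8, 8))
```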
Implementation and learning details
We implement TasselNet based on MatConvNet [36]. Original high-resolution images are resized to 1/8 of their original size to reduce the computational burden. During training, we densely crop \(r\times r\) subimages with a stride of \(s_r\) from the 186 images belonging to the training and validation sequences of the MTC dataset. These subimages are randomly shuffled; 90% of them are used for training, and the rest for validation. Before feeding the samples into the network, each subimage is preprocessed by mean subtraction (the mean is computed from the training subset).
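The size of the training set produced by dense cropping follows directly from the image size, r and \(s_r\). A small helper (the 360 x 480 image dimensions below are hypothetical; MTC images vary):

```python
def num_crops(h, w, r, stride):
    # number of r x r crops on a dense grid with the given stride
    ny = (h - r) // stride + 1
    nx = (w - r) // stride + 1
    return ny * nx

# hypothetical resized image with the default r = 32 and stride r/4 = 8
n = num_crops(360, 480, 32, 8)
```

This illustrates how a few hundred field images can yield the hundreds of thousands of training subimages reported in the experiments.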
Note that no data augmentation is performed when reporting the results, because the field-based conditions already cover various scenarios (the diversity of the training data can be guaranteed). One may further improve the network performance with random rotation, flipping and cropping of training images. It should also be kept in mind that the ground truth counts may change accordingly.
The parameters of the convolution kernels are initialised with the improved Xavier method [37]. Standard stochastic gradient descent is used to optimise the network parameters. The learning rate is initially set to 0.01, decreased by a factor of 10 after 5 epochs, and further decreased by a factor of 10 after another 10 epochs. Thus, we train TasselNet for 25 epochs in all. To allow the gradient to back-propagate easily from the output layer to the input layer, we add a batch normalisation layer [38] after each convolutional layer, before the ReLU. The training time of TasselNet varies from half a day to 2 days depending on the number of training samples and the network architecture used. Prediction takes about 2.5 s per image (Matlab 2016a, OS: Ubuntu 14.04 64-bit, CPU: Intel E5-2630 2.40 GHz, GPU: Nvidia GeForce GTX TITAN X, RAM: 64 GB).
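The learning-rate schedule described above can be written compactly (epochs 1-indexed):

```python
def learning_rate(epoch):
    """Step schedule from the text: 0.01 for the first 5 epochs,
    divided by 10 after epoch 5 and again after epoch 15 (25 epochs total)."""
    if epoch <= 5:
        return 0.01
    if epoch <= 15:
        return 0.001
    return 0.0001
```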
Default parameters setting used in our experiments
| Parameter | Remark | Value |
|---|---|---|
| Network architecture |  | AlexNet-like |
| Loss function |  | \(\ell _1\) |
| r | Subimage size | 32 |
| \(s_r\) | Sampling stride during training | r/4 |
| \(s_e\) | Sampling stride during prediction | r/4 |
| \(\sigma\) | Gaussian kernel parameter | 8 |
Results and discussion
Evaluation metric
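The two metrics reported throughout are the mean absolute error (MAE) and the mean squared error (MSE) of the per-image counts. A hedged sketch of their computation (we assume, as is common in the counting literature [13, 18], that "MSE" denotes the root of the mean squared error; the exact definitions are given in this section):

```python
import math

def mae(gt_counts, pred_counts):
    # mean absolute counting error over a set of images
    n = len(gt_counts)
    return sum(abs(g - p) for g, p in zip(gt_counts, pred_counts)) / n

def mse(gt_counts, pred_counts):
    # root of the mean squared counting error (commonly reported as
    # "MSE" in counting papers); penalises large per-image errors more
    n = len(gt_counts)
    return math.sqrt(sum((g - p) ** 2 for g, p in zip(gt_counts, pred_counts)) / n)
```

MAE reflects average accuracy, while MSE additionally reflects the robustness of the estimates to occasional large errors.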
Choices of different network architectures, number of training samples, loss functions, Gaussian kernel parameters, and subimage sizes
Here we perform extensive evaluations to justify our design choices. Notice that, in principle, specific design choices should be justified on the validation set. However, since we enforce the test set to consist of different sequences, the validation set exhibits a substantially different data distribution, and validating design choices on it would be suboptimal. Instead, as a preliminary study, we directly report the counting performance on the test set to see how variations of these design choices affect the final counting performance. Although this is a slight abuse of protocol, we demonstrate later that TasselNet with any of these design choices outperforms the other baseline approaches by large margins. Below we follow the default parameter settings when reporting experimental results unless a specific design choice is declared.
Network architecture
Comparison of different network architectures for maize tassels counting on the test set of MTC dataset
Each cell reports MAE/MSE.

| Network | Zhengzhou2011 | Taian2010_2 | Taian2011_2 | Taian2012_2 | Taian2013_2 | Gucheng2014 | Jalaid2015_2 | Jalaid2015_3 | Overall |
|---|---|---|---|---|---|---|---|---|---|
| LeNet | 4.4/5.4 | 6.3/8.0 | 2.9/3.7 | 6.4/7.9 | 4.9/5.8 | 3.8/5.0 | 16.3/17.0 | 28.7/33.0 | 7.2/11.3 |
| AlexNet | 4.9/6.1 | 5.2/6.6 | 2.5/2.9 | 4.8/5.8 | 4.0/5.0 | 5.3/6.5 | 16.0/16.6 | 20.7/25.2 | 6.6/9.6 |
| VGG-VD-16Net | 2.1/2.7 | 10.6/12.4 | 13.1/15.9 | 5.5/10.0 | 4.3/5.4 | 10.0/11.3 | 10.7/11.2 | 20.8/24.9 | 9.3/12.4 |
Further, it is worth noting that there exist some recent network architectures, such as ResNets [39] and DenseNets [40], that exhibit more powerful modelling ability than the three baseline architectures presented in this paper. One may obtain better counting performance using such advanced architectures. We leave these explorations open at present.
Number of training samples
Counting performance with different numbers of training samples (\(N_{train}\)) on the MTC dataset

| \(N_{train}\) | MAE | MSE |
|---|---|---|
| \(2.37\times 10^4\) | 9.5 | 14.2 |
| \(9.13\times 10^4\) | 8.5 | 13.4 |
| \(3.56\times 10^5\) | 6.6 | 9.6 |
| \(1.41\times 10^6\) | 6.5 | 10.8 |
Loss function
Comparison of different loss functions for maize tassels counting on the MTC dataset
| Loss | MAE | MSE |
|---|---|---|
| Huber (\(\delta =0.1\)) | 8.5 | 12.2 |
| Huber (\(\delta =1\)) | 7.5 | 10.5 |
| Huber (\(\delta =10\)) | 7.3 | 10.0 |
| \(\ell _2\) | 7.3 | 10.3 |
| \(\ell _1\) | 6.6 | 9.6 |
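For reference, the three regression losses compared above, written per-sample (a standard formulation; \(\delta\) matches the Huber parameter in the table):

```python
def l1_loss(pred, target):
    # absolute error: constant gradient, robust to outlying counts
    return abs(pred - target)

def l2_loss(pred, target):
    # squared error: heavily penalises large residuals
    return (pred - target) ** 2

def huber_loss(pred, target, delta=1.0):
    # quadratic near zero, linear in the tails; interpolates l2 and l1
    e = abs(pred - target)
    if e <= delta:
        return 0.5 * e * e
    return delta * (e - 0.5 * delta)
```

The constant gradient of the \(\ell _1\) loss makes it less sensitive to occasional grossly mismatched windows, consistent with it performing best in the table.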
Gaussian kernel parameter
Comparison of different Gaussian kernel parameters \(\sigma\) for maize tassels counting on the MTC dataset

| \(\sigma\) | MAE | MSE |
|---|---|---|
| 4 | 7.0 | 11.3 |
| 8 | 6.6 | 9.6 |
| 12 | 7.6 | 10.9 |
Subimage sizes
Comparison of different subimage sizes for maize tassels counting on the MTC dataset
| \(r\times r\) | MAE | MSE |
|---|---|---|
| \(16\times 16\) | 9.9 | 13.4 |
| \(32\times 32\) | 6.6 | 9.6 |
| \(64\times 64\) | 6.8 | 10.8 |
| \(96\times 96\) | 6.9 | 11.5 |
Comparison with the state of the art

JointSeg [41]: JointSeg is the state-of-the-art tassel segmentation method. The number of objects can easily be inferred from its segmentation results. We further perform some morphological operations as post-correction to reduce segmentation noise. This approach can be viewed as a counting-by-segmentation baseline. It is not specially designed for a counting problem, but the comparison somewhat justifies whether our problem could be addressed by a simple image processing technique.

mTASSEL [2]: mTASSEL is the state-of-the-art tassel detection approach designed specifically for maize tassels. It uses multi-view representations to characterise the visual characteristics of tassels and achieve robust detection. This is a counting-by-detection baseline.

GlobalReg [42]: GlobalReg directly regresses the global count of an image. Off-the-shelf fully-connected deep activations extracted from a pretrained model are used as a holistic image representation. The global image feature is then linearly mapped to a global object count by ridge regression. This is a global counting-by-regression baseline.

DensityReg [15]: DensityReg is the seminal work that proposed the idea of density map regression. It predicts a count density for every pixel by optimising the so-called MESA distance. This is a global density-based counting-by-regression baseline.

Counting CNN (CCNN) [18]: CCNN is a state-of-the-art object counting approach. It treats the local density map as the regression target and also uses an AlexNet-like CNN architecture. This is a local density-based counting-by-regression baseline.

TasselNet outperforms other baseline approaches in 7 out of 8 test sequences and achieves the overall best counting performance: its MAE and MSE are significantly lower than those of other competitors.

The poor performance of JointSeg and mTASSEL implies that the problem of in-field counting of maize tassels cannot be solved by simple colour-cue-based image segmentation or standard object detection.

Even simple global regression achieves counting performance comparable to mTASSEL, which uses bounding-box-level annotations. This suggests that it is better to formulate the maize tassel counting problem in a counting-by-regression manner.

Regressing the global density map can also reduce the counting error effectively. However, it is hard to extend this idea to the deep CNN-based paradigm, because there is currently no dataset with thousands of labelled image samples that would make the learning of deep networks tractable, especially in plant-related scenarios. Hence, DensityReg cannot enjoy the benefits of deep CNNs, and its performance may be limited by the power of the feature representation.

The performance of CCNN even falls behind the global regression baseline. In experiments we observe that CCNN performs poorly when given an image with just a few tassels of different types. Compared to regressing local counts as in TasselNet, CCNN needs to fit harsher pixel-level ground truth density, so it likely suffers from the vague definition of the density map caused by different tassel sizes. This may explain why local density regression does not work well for objects with varying sizes such as maize tassels.

Qualitative results in Fig. 10 show that TasselNet gives reasonable approximations to the ground truth density maps. In most cases, the estimated counts are close to the ground truth counts. However, there are also circumstances in which TasselNet cannot give an accurate prediction. The last row in Fig. 10 shows three failure cases: (1) when the image is captured under extremely strong illumination, highlighted regions of leaves contribute several fake responses; (2) if maize tassels present long-tailed shapes in images, the long-tailed parts only receive partial local counts, resulting in underestimation; (3) extremely crowded scenes are also beyond the ability of TasselNet. To alleviate these issues, one may consider adding extra training data that contain extremely crowded scenarios. Alternatively, since the training sequences and test sequences exhibit more or less different data distributions, it may be possible to use domain adaptation [43] to bridge the last few percent of difference between sequences. We leave these as future explorations of this work.
Based on the evaluations above, we suggest the following good practices for maize-tassel-like counting problems:

1. Try the idea of counting by regression if the objects exhibit significant occlusions.
2. Try local counts regression if the physical size of objects varies dramatically.
3. Use a relatively small subimage size so that a sufficient number of training samples can be sampled.
4. It is safe to use a moderately complex deep model.
5. Try the \(\ell _1\) loss first to achieve a robust regression.
Mean absolute errors (MAE) and mean squared errors (MSE) for maize tassels counting on the test set of MTC dataset
Each cell reports MAE/MSE.

| Method | Zhengzhou2011 | Taian2010_2 | Taian2011_2 | Taian2012_2 | Taian2013_2 | Gucheng2014 | Jalaid2015_2 | Jalaid2015_3 | Overall |
|---|---|---|---|---|---|---|---|---|---|
| JointSeg | 20.9/23.2 | 46.6/47.9 | 16.4/19.7 | 25.1/29.8 | 6.5/8.0 | 7.3/10.5 | 27.8/29.1 | 53.2/61.3 | 24.2/31.6 |
| mTASSEL | 9.8/14.9 | 18.6/22.1 | 11.6/12.7 | 5.3/7.8 | 13.1/16.6 | 31.1/35.3 | 16.2/18.0 | 46.6/51.0 | 19.6/26.1 |
| GlobalReg | 19.0/21.5 | 23.0/24.7 | 14.1/16.8 | 13.5/15.7 | 19.6/25.2 | 19.5/21.7 | 11.2/13.7 | 42.1/45.4 | 19.7/23.3 |
| DensityReg | 16.1/20.2 | 9.9/10.7 | 9.2/11.7 | 10.8/12.7 | 20.2/23.7 | 9.4/10.5 | 7.2/7.9 | 23.5/26.9 | 11.9/14.8 |
| CCNN | 21.3/23.3 | 28.9/31.6 | 12.4/16.0 | 12.6/15.3 | 18.9/23.7 | 21.6/24.1 | 9.6/12.4 | 39.5/46.4 | 21.0/25.5 |
| TasselNet | 4.9/6.1 | 5.2/6.6 | 2.5/2.9 | 4.8/5.8 | 4.0/5.0 | 5.3/6.5 | 16.0/16.6 | 20.7/25.2 | 6.6/9.6 |
Conclusions
In this paper, we rethink the nature of the in-field maize tassel counting problem and formulate it as an object counting task. A tailored MTC dataset with 361 field images captured over 6 years and corresponding manually-labelled dotted annotations is constructed. An effective deep CNN-based solution, TasselNet, is presented to effectively count maize tassels via local counts regression. We show that local counts regression is particularly suitable for counting problems whose ground truth density maps cannot be precisely defined. Extensive experiments are conducted to justify the effectiveness of our proposition. Results show that TasselNet achieves state-of-the-art performance and outperforms previous baseline approaches by large margins.
For future work, we will continue to enrich the MTC dataset, because training data, and especially their diversity, are always key to good performance. In addition, we will explore the feasibility of improving the counting performance via domain adaptation, because the adaptation of object counting models still remains an open question. In-field counting of maize tassels is a challenging problem, not only because of the unconstrained natural environment but also because of the self-changing nature of plant growth. We hope this paper will attract the interest of both the Plant Science and Computer Vision communities and inspire further studies that advance our knowledge and understanding of the problem.
Declarations
Authors' contributions
HL proposed the idea of counting maize tassels via local counts regression, implemented the technical pipeline, conducted the experiments, analysed the results, and drafted the manuscript. ZG, YX and CS cosupervised the study and contributed in writing the manuscript. BZ helped to design the experiments and provided technical support and computational resources for efficient model training. All authors read and approved the final manuscript.
Acknowledgements
The authors would like to thank XiuShen Wei for useful discussion.
Competing interests
The authors declare that they have no competing interests.
Availability of data and materials
The MTC dataset and other supporting materials are available online at: https://sites.google.com/site/poppinace/.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Funding
This work was supported in part by the Special Scientific Research Fund of Meteorological Public Welfare Profession of China under Grant GYHY200906033 and in part by the National Natural Science Foundation of China under Grant 61502187.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
 Ye M, Cao Z, Yu Z. An imagebased approach for automatic detecting tasseling stage of maize using spatiotemporal saliency. In: Proceedings of eighth international symposium on multispectral image processing and pattern recognition; 2013. p. 89210. International Society for Optics and Photonics. doi:10.1117/12.2031024.
 Lu H, Cao Z, Xiao Y, Fang Z, Zhu Y, Xian K. Fine-grained maize tassel trait characterization with multi-view representations. Comput Electron Agric. 2015;118:143–58. doi:10.1016/j.compag.2015.08.027.
 Guo W, Fukatsu T, Ninomiya S. Automated characterization of flowering dynamics in rice using field-acquired time-series RGB images. Plant Methods. 2015;11(1):7. doi:10.1186/s13007-015-0047-9.
 Yang W, Guo Z, Huang C, Duan L, Chen G, Jiang N, Fang W, Feng H, Xie W, Lian X, et al. Combining high-throughput phenotyping and genome-wide association studies to reveal natural genetic variation in rice. Nat Commun. 2014. doi:10.1038/ncomms6087.
 Gage JL, Miller ND, Spalding EP, Kaeppler SM, de Leon N. TIPS: a system for automated image-based phenotyping of maize tassels. Plant Methods. 2017;13(1):21. doi:10.1186/s13007-017-0172-8.
 Fiorani F, Schurr U. Future scenarios for plant phenotyping. Annu Rev Plant Biol. 2013;64:267–91. doi:10.1146/annurev-arplant-050312-120137.
 Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D. Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell. 2010;32(9):1627–45. doi:10.1109/TPAMI.2009.167.
 Minervini M, Scharr H, Tsaftaris SA. Image analysis: the new bottleneck in plant phenotyping [applications corner]. IEEE Signal Process Mag. 2015;32(4):126–31. doi:10.1109/MSP.2015.2405111.
 Ali S, Nishino K, Manocha D, Shah M. Modeling, simulation and visual analysis of crowds: a multidisciplinary perspective. In: Ali S, Nishino K, Manocha D, Shah M, editors. Modeling, simulation and visual analysis of crowds, vol. 11. New York: Springer; 2013. doi:10.1007/978-1-4614-8483-7_1.
 Tsaftaris SA, Scharr H (2014) Computer vision problems in plant phenotyping (CVPPP). https://www.plant-phenotyping.org/CVPPP2014. Accessed 25 Sept 2017.
 Tsaftaris SA, Scharr H, Pridmore T (2015) Computer vision problems in plant phenotyping (CVPPP). https://www.plant-phenotyping.org/CVPPP2015. Accessed 25 Sept 2017.
 Tsaftaris SA, Scharr H, Pridmore T (2017) Computer vision problems in plant phenotyping (CVPPP). https://www.plant-phenotyping.org/CVPPP2017. Accessed 25 Sept 2017.
 Chan AB, Liang ZSJ, Vasconcelos N. Privacy preserving crowd monitoring: counting people without people models or tracking. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR). IEEE; 2008. p. 1–7. doi:10.1109/CVPR.2008.4587569.
 Zhang C, Li H, Wang X, Yang X. Cross-scene crowd counting via deep convolutional neural networks. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR); 2015. p. 833–841. doi:10.1109/cvpr.2015.7298684.
 Lempitsky V, Zisserman A. Learning to count objects in images. In: Advances in neural information processing systems (NIPS); 2010. p. 1324–1332. http://papers.nips.cc/paper/4043-learning-to-count-objects-in-images.
 Xie W, Noble JA, Zisserman A. Microscopy cell counting and detection with fully convolutional regression networks. Comput Methods Biomech Biomed Eng Imaging Vis. 2016. doi:10.1080/21681163.2016.1149104.
 Arteta C, Lempitsky V, Noble JA, Zisserman A. Interactive object counting. In: Proceedings of European conference on computer vision (ECCV). Springer; 2014. p. 504–518. doi:10.1007/978-3-319-10578-9_33.
 Oñoro-Rubio D, López-Sastre RJ. Towards perspective-free object counting with deep learning. In: Proceedings of European conference on computer vision (ECCV). Springer; 2016. p. 615–629. doi:10.1007/978-3-319-46478-7_38.
 Arteta C, Lempitsky V, Zisserman A. Counting in the wild. In: Proceedings of European conference on computer vision (ECCV). Springer; 2016. p. 483–498. doi:10.1007/978-3-319-46478-7_30.
 Giuffrida MV, Minervini M, Tsaftaris SA. Learning to count leaves in rosette plants. In: Proceedings of British Machine Vision Conference Workshops (BMVCW); 2015. doi:10.5244/c.29.cvppp.1.
 Rahnemoonfar M, Sheppard C. Deep count: fruit counting based on deep simulated learning. Sensors. 2017;17(4):905. doi:10.3390/s17040905.
 Loy CC, Chen K, Gong S, Xiang T. Crowd counting and profiling: methodology and evaluation. In: Modeling, simulation and visual analysis of crowds. New York: Springer; 2013. p. 347–382. doi:10.1007/978-1-4614-8483-7_14.
 Rabaud V, Belongie S. Counting crowded moving objects. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), vol. 1. IEEE; 2006. p. 705–711. doi:10.1109/cvpr.2006.92.
 Li M, Zhang Z, Huang K, Tan T. Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection. In: Proceedings of international conference on pattern recognition; 2008. p. 1–4. doi:10.1109/icpr.2008.4761705.
 Dollar P, Wojek C, Schiele B, Perona P. Pedestrian detection: an evaluation of the state of the art. IEEE Trans Pattern Anal Mach Intell. 2012;34(4):743–61. doi:10.1109/TPAMI.2011.155.
 Vapnik VN, Vapnik V. Statistical learning theory, vol. 1. New York: Wiley; 1998.
 Fiaschi L, Köthe U, Nair R, Hamprecht FA. Learning to count with regression forest and structured labels. In: Proceedings of international conference on pattern recognition (ICPR). IEEE; 2012. p. 2685–2688.
 Cohen JP, Lo HZ, Bengio Y. Count-ception: counting by fully convolutional redundant counting. arXiv preprint; 2017.
 Chen K, Loy CC, Gong S, Xiang T. Feature mining for localised crowd counting. In: Proceedings of British Machine Vision Conference (BMVC), vol. 1; 2012. p. 3. doi:10.5244/c.26.21.
 Zhang Y, Zhou D, Chen S, Gao S, Ma Y. Single-image crowd counting via multi-column convolutional neural network. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR); 2016. p. 589–597. doi:10.1109/cvpr.2016.70.
 Sindagi VA, Patel VM. A survey of recent advances in CNN-based single image crowd counting and density estimation. Pattern Recognit Lett. 2017. doi:10.1016/j.patrec.2017.07.007.
 Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems (NIPS); 2012. p. 1097–1105.
 Lu H, Cao Z, Xiao Y, Fang Z, Zhu Y. Toward good practices for fine-grained maize cultivar identification with filter-specific convolutional activations. IEEE Trans Autom Sci Eng. 2016. doi:10.1109/TASE.2016.2616485.
 Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. CoRR. 2014;abs/1409.1556.
 LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324. doi:10.1109/5.726791.
 Vedaldi A, Lenc K. MatConvNet: convolutional neural networks for MATLAB. In: Proceedings of ACM international conference on multimedia; 2015. p. 689–692. doi:10.1145/2733373.2807412.
 He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of IEEE international conference on computer vision (ICCV); 2015. p. 1026–1034. doi:10.1109/iccv.2015.123.
 Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of international conference on machine learning (ICML); 2015.
 He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR); 2016. doi:10.1109/cvpr.2016.90.
 Huang G, Liu Z, Weinberger KQ, van der Maaten L. Densely connected convolutional networks. In: IEEE conference on computer vision and pattern recognition (CVPR); 2016.
 Lu H, Cao Z, Xiao Y, Li Y, Zhu Y. Region-based colour modelling for joint crop and maize tassel segmentation. Biosyst Eng. 2016;147:139–50. doi:10.1016/j.biosystemseng.2016.04.007.
 Tota K, Idrees H. Counting in dense crowds using deep features. In: CRCV; 2015.
 Lu H, Cao Z, Xiao Y, Zhu Y. Two-dimensional subspace alignment for convolutional activations adaptation. Pattern Recognit. 2017;71:320–36. doi:10.1016/j.patcog.2017.06.010.