Image capture
Wheat images were captured with an Epson Perfection V330 (Seiko Epson Corporation, Suwa, Japan) and B. distachyon images with a Canon CanoScan LiDE 700 F (Canon Inc, Tokyo, Japan), both of which are consumer grade flatbed scanners (<$250 AUD). To standardise image capture, scanning was managed through VueScan (Hamrick Software, http://www.hamrick.com), which supports flatbed scanners from a wide range of manufacturers. All images were scanned at 300 dpi with no colour adjustment or cropping applied. For wheat scanning, grains were spread onto a glass bottomed tray for ease of collection, while for B. distachyon, seeds were spread on an overhead transparency film both to avoid scratching the scanner glass and to allow the seeds to be easily collected. Since the wheat seed was bulked from field trial material, a non-uniform subsample of seed was scattered from a seed packet. The operator judged the appropriate amount of seed so that grains did not touch excessively. The number of seeds per image ranged from 382 to 985 with a mean value of 654. For B. distachyon, seeds were assessed from single spikes from individual plants and all seeds from a spike were measured. The average number of seeds per scan was 18. To maximise contrast at the border of each seed, either a piece of black cardboard was placed, or a matte black box was upturned, over the scanning surface, minimising reflection and shadow. All wheat images used to compare methods are available online [33].
To allow standardisation of colour measurements to the CIELAB colourspace, a Munsell ColorChecker Mini card (X-Rite Corp., MI, USA) was scanned under the same settings as the seed, and used within GrainScan to generate conversion parameters for the colour information measured by the flatbed scanner.
Image analysis
The image analysis workflow in GrainScan is as follows. A grayscale image is derived from the scanned colour image by averaging the Red and Green channels, since these provide the greatest contrast for seeds considered. Preprocessing is applied to simplify the image prior to segmentation. The functions used in this simplification are mostly connected component (or attribute) morphological operators [34]. These operators are used in preference to older structuring element based morphological functions because they are contour-preserving and there is more selectivity in the way the image is modified. The preprocessing steps include Gaussian smoothing to reduce noise, an attribute closing based on width (0.3 × Min grain width, a variable accessible to the user) to fill in the grain crease, a morphological thinning based on elongation to remove any scratches in the background, an attribute opening based on width (0.7 × Min grain width) to remove thin debris and an attribute opening based on length (0.7 × Min grain length) to remove thick debris.
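The first step and the size thresholds used in preprocessing can be illustrated with a short sketch. This is not GrainScan code: the function names and the nested-list image representation are assumptions made for illustration, and only the channel averaging and the factors 0.3 and 0.7 are taken from the description above.

```python
def red_green_grayscale(rgb_image):
    """Average the red and green channels of each pixel; blue is ignored.

    rgb_image is a list of rows, each row a list of (r, g, b) tuples.
    """
    return [[(r + g) / 2 for (r, g, _b) in row] for row in rgb_image]


def preprocessing_sizes(min_grain_width, min_grain_length):
    """Size thresholds for the attribute filters, using the factors in the text."""
    return {
        "closing_width": 0.3 * min_grain_width,    # fills the grain crease
        "opening_width": 0.7 * min_grain_width,    # removes thin debris
        "opening_length": 0.7 * min_grain_length,  # removes thick debris
    }


# Hypothetical 1 x 2 pixel image and hypothetical minimum grain dimensions.
gray = red_green_grayscale([[(200, 100, 50), (0, 0, 255)]])
sizes = preprocessing_sizes(10, 20)
```

Because the blue channel is discarded, a pure blue pixel maps to zero in the grayscale image, whatever its blue intensity.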
Because flatbed scanners have uniform lighting and the scanner background provides good contrast with the grain colour, there is no need for sophisticated segmentation techniques. The grains can be separated from the background through simple global thresholding. This threshold is determined using an automated thresholding method, based on a bivariate histogram of input grey level versus gradient, as it is more reliable than methods based on the simple image histogram and is used in image normalisation [35]. Touching grains are separated using a common binary object splitting technique based on finding the troughs between regional maxima in the smoothed distance transform. To remove any small regions created by the grain splitting step, a filtering based on the connected component area (0.5 × Min grain width × Min grain length) is then performed.
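The final area filter can be sketched as follows. This is a minimal pure-Python illustration rather than the GrainScan implementation: the function name, the use of 4-connectivity, and the assumption that both minimum grain dimensions are expressed in pixels are choices made here, not details given in the text.

```python
from collections import deque


def filter_small_regions(mask, min_grain_width, min_grain_length, factor=0.5):
    """Keep only connected foreground regions with area >= factor * width * length.

    mask is a list of rows of 0/1 values; regions are 4-connected (an assumption).
    """
    rows, cols = len(mask), len(mask[0])
    min_area = factor * min_grain_width * min_grain_length
    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill to collect one connected region.
            region = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Copy the region to the output only if it is large enough.
            if len(region) >= min_area:
                for y, x in region:
                    out[y][x] = 1
    return out


# Hypothetical mask: one 9-pixel region and one isolated pixel.
mask = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 1, 0, 0],
]
cleaned = filter_small_regions(mask, min_grain_width=2, min_grain_length=4)
```

With these values the area threshold is 0.5 × 2 × 4 = 4 pixels, so the 9-pixel region is kept and the isolated pixel is removed.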
Individual grains are labelled and measurements made of their size and colour. The dimension measurements are area, perimeter, and surrogates for length and width – the major and minor axes of the best fit ellipse (called majellipse and minellipse respectively). These surrogates are quick to compute and tend to be more robust to noise (small bumps and dents) in the segmented grain boundary which can cause problems with algorithms that measure the exact length and width. The dimension units are converted from pixels to millimetres (mm) based on the input Scanner resolution in dots per inch (dpi).
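The unit conversion follows directly from the definition of dpi (one inch is 25.4 mm). The sketch below shows the arithmetic; the function names are illustrative and are not taken from GrainScan.

```python
MM_PER_INCH = 25.4


def pixels_to_mm(length_px, dpi):
    """Convert a length (e.g. perimeter or ellipse axis) from pixels to mm."""
    return length_px * MM_PER_INCH / dpi


def pixel_area_to_mm2(area_px, dpi):
    """Convert an area from square pixels to square millimetres."""
    return area_px * (MM_PER_INCH / dpi) ** 2


# At the 300 dpi used in this study, one pixel spans 25.4 / 300 mm.
pixel_size_mm = pixels_to_mm(1, 300)
```

Lengths scale with the pixel size and areas with its square, so an error in the resolution entered by the user affects area measurements more strongly than length measurements.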
The software has two independent options for the analysis of colour. One option is to make the colour measurements for each grain in CIELAB values rather than the raw RGB values measured by the scanner. To use the colour calibration option, the image of a calibrated colour checker card must first be analysed using the ColourCalibration software. This software locates the card, segments each of the colour swatches, extracts the mean RGB values for each swatch, and determines the transformation matrix, RGB2Lab, by linear regression between the measured RGB values and the supplied CIELAB values for each swatch. For convenience, the transformation matrix is saved as two images, one containing the 3×3 matrix and one the 3×1 offset (with filename suffixes of *RGB2Labmat.tif and *RGB2Laboff.tif respectively). By inputting this transformation matrix into the GrainScan software, colour measurements made within each labelled grain can be converted from raw RGB values to calibrated L*, a*, and b* values.
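Applying the calibration amounts to an affine transformation of each RGB measurement, as sketched below. The function name is hypothetical and the matrix and offset values are invented for illustration only; real values would come from the regression on the colour checker card.

```python
def rgb_to_lab(rgb, matrix, offset):
    """Apply Lab = matrix . rgb + offset, with a 3x3 matrix and a 3-element offset."""
    return tuple(
        sum(m * v for m, v in zip(row, rgb)) + off
        for row, off in zip(matrix, offset)
    )


# Made-up coefficients, NOT a real scanner calibration.
example_matrix = [
    [0.5, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]
example_offset = [1.0, -1.0, 0.0]

lab = rgb_to_lab((10, 20, 30), example_matrix, example_offset)
```

Each output value is one row of the matrix multiplied by the RGB vector, plus the corresponding offset, which is why the calibration can be stored compactly as a 3×3 matrix and a 3×1 offset.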
The second colour analysis option is to detect the grain crease and to make additional colour measurements in the non-crease region and, if present, the crease region. The crease detection is performed on each grain by finding the shortest path along the long axis of the grain after mean filtering preferentially along this axis to suppress intensity variability unrelated to the crease. The resulting dimension and colour measurements are saved to a Results sub-directory in Comma Separated Values (CSV) format. To permit visual inspection of the segmentation results, the labelled grain image and optionally the labelled crease image are saved (with filename suffixes of *.grainLbl.tif and *.creaseLbl.tif respectively). Overlay images with each labelled grain, or crease, overlaid in a different colour on the input image are also saved (with filename suffixes of *.grainOvr.jpg and *.creaseOvr.jpg respectively, Figure 5).
Comparison to other methods
To compare the image analysis algorithms for size parameters, scanned images were processed with both GrainScan and SmartGrain [29]. Output from these systems was compared to results from a SeedCount system, which was used as a standard for size parameters. SeedCount measurements were taken according to the manufacturer's instructions. To compare the colour measurements determined by GrainScan and SeedCount, output from both was compared to measurements taken by a Minolta CR-400 chroma meter (Konica Minolta Sensing, Osaka, Japan), an industry standard device for CIE L*, a* and b* values.
Experimental design
Grain samples were collected from a field trial of a diverse mapping population grown in Leeton, New South Wales. For GrainScan and SmartGrain, seed was scanned from 300 field plots, each of which corresponded to a different genotype. It is important to note that no field replicates of any of the genotypes were available in this study. Prior to scanning, seed was cleaned by a vacuum separator to remove chaff. Packets of seed from each plot were tested using an experimental design in which a proportion (p = 0.4) of the packets was tested with replication. Thus 120 packets were tested twice and the remaining 180 were tested once. This equated to a total of 420 scans, which were conducted by a single operator in 14 batches. Each batch comprised 30 scans done sequentially. Replication was achieved for a packet by tipping out seeds and scanning to obtain the first image, then tipping the seeds back into the packet for a subsequent scan. The second image for any packet was always obtained from a different batch to the first image. Thus the design was a p-replicate design [36] with batches as blocks. The SeedCount method was tested on 150 packets, 45 of which were tested with replication, making a total of 195 images. The experimental design was similar to that for GrainScan and SmartGrain in the sense of involving batches (13 batches with 15 images per batch). Colorimeter (Minolta) measurements were not taken according to a p-replicate design with a blocking structure, but were taken in duplicate for the 300 packets that were included for GrainScan and SmartGrain.
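The scan and batch counts quoted above follow from simple arithmetic, which the sketch below reproduces. The function name and the returned field names are illustrative; only the packet numbers and batch sizes come from the text.

```python
def p_rep_design(n_packets, n_replicated, scans_per_batch):
    """Summarise a partially replicated design where replicated packets are scanned twice."""
    total_scans = n_packets + n_replicated  # one extra scan per replicated packet
    n_batches, leftover = divmod(total_scans, scans_per_batch)
    return {
        "p": n_replicated / n_packets,          # proportion of packets replicated
        "single": n_packets - n_replicated,     # packets scanned once only
        "total_scans": total_scans,
        "n_batches": n_batches,
        "leftover": leftover,                   # scans not filling a complete batch
    }


grainscan = p_rep_design(n_packets=300, n_replicated=120, scans_per_batch=30)
seedcount = p_rep_design(n_packets=150, n_replicated=45, scans_per_batch=15)
```

For GrainScan and SmartGrain this gives p = 0.4, 180 singly scanned packets and 420 scans in 14 complete batches; for SeedCount it gives 195 images in 13 complete batches, corresponding to a replicated proportion of 0.3.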
Data analysis
Analyses were conducted using the ASReml-R package [37] in the R statistical computing environment [38]. For the size data, the analysis commenced with the fitting of a separate mixed model for each trait and method. Since the SeedCount and the SmartGrain methods produce a single value per packet, mean values of the GrainScan data were used to allow comparisons between methods. Each model included random effects for packets and batches. The separate analyses for each method were used to obtain a measure of accuracy for each, defined in terms of the correlation between the predicted packet effects and the true (unknown) packet effects. The data for the different methods were then combined in a multivariate analysis. The mixed model included a separate mean for each method, random packet effects for each method, random batch effects for each method and a residual for each method. The variance model used for the random packet effects was a factor analytic model [39], which allows for a separate variance for each method and separate correlations between pairs of methods. The other variance models were commensurate with the structure of the experiment. In particular, we note that correlations between the GrainScan and SmartGrain methods were included for the batch and residual effects, since these methods were used on the same experimental units (images). The multivariate analysis provides residual maximum likelihood (REML) estimates of the correlations between the true (unknown) packet effects for different methods. It also provides best linear unbiased predictions (BLUPs) of the packet effects for each method.
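For orientation, the single-method model described above can be written schematically as follows. The notation is introduced here for illustration and does not appear in the original analysis; the normality and independence assumptions shown are the conventional ones for a REML analysis and are stated as assumptions rather than as the exact variance structure that was fitted.

```latex
% y_{ij}: measurement (packet mean) for packet i scanned in batch j
% \mu: overall mean, u_i: random packet effect, b_j: random batch effect
\begin{align}
  y_{ij} &= \mu + u_i + b_j + e_{ij}, \\
  u_i &\sim N(0, \sigma_u^2), \quad
  b_j \sim N(0, \sigma_b^2), \quad
  e_{ij} \sim N(0, \sigma_e^2), \\
  \text{accuracy} &= \operatorname{cor}(\hat{u}_i, u_i),
\end{align}
% where \hat{u}_i is the predicted (BLUP) packet effect.
```

In the multivariate analysis each method has its own mean and its own packet, batch and residual effects, and the quantities of interest are the correlations between the packet effects of different methods.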
For colour measurements, comparisons were made between the complete GrainScan output, GrainScan output for seeds where no crease was detected (abbreviated GSncd), GrainScan output for the non-crease portion of seeds where a crease was detected (abbreviated GSwc), SeedCount and Minolta colorimeter. Since SeedCount and the Minolta methods produce a single value per packet, mean values of the GrainScan data were used to make comparisons between methods.
Initially a separate mixed model analysis was conducted for the data for each trait for each method apart from Minolta. Measurements using the latter were not derived using a design or replication structure as per the other methods and so could not be assessed in the same way. Each model included random effects for packets and batches. The data for the different methods (including Minolta) were then combined in a multivariate analysis. The mixed model was analogous to that used for the seed size analyses.
Brachypodium size analysis was only performed with GrainScan, so no comparisons with other methods were performed.