Comparison and extension of three methods for automated registration of multimodal plant images

With the introduction of high-throughput multisensory imaging platforms, the automation of multimodal image analysis has become the focus of quantitative plant research. Due to a number of natural and technical reasons (e.g., inhomogeneous scene illumination, shadows, and reflections), unsupervised identification of relevant plant structures (i.e., image segmentation) represents a nontrivial task that often requires extensive human-machine interaction. Registration of multimodal plant images enables the automated segmentation of 'difficult' image modalities, such as visible light or near-infrared images, using the segmentation results of image modalities that exhibit higher contrast between plant and background regions (such as fluorescence images). Furthermore, registration of different image modalities is essential for the assessment of a consistent multiparametric plant phenotype, where, for example, chlorophyll and water content as well as disease- and/or stress-related pigmentation can be studied simultaneously at a local scale. To automatically register thousands of images, efficient algorithmic solutions for the unsupervised alignment of two structurally similar but, in general, nonidentical images are required. For the establishment of image correspondences, different algorithmic approaches based on different image features have been proposed. The particularity of plant image analysis lies, however, in the large variability of shapes and colors of different plants measured at different developmental stages from different views. While adult plant shoots typically have a unique structure, young shoots may have a nonspecific shape that can often hardly be distinguished from background structures. Consequently, it is not clear a priori what image features and registration techniques are suitable for the alignment of various multimodal plant images.
Furthermore, dynamically measured plants may exhibit nonuniform movements that require the application of nonrigid registration techniques. Here, we investigate three common techniques for registration of visible light and fluorescence images that rely on finding correspondences between (i) feature-points, (ii) frequency domain features, and (iii) image intensity information. The performance of the registration methods is validated in terms of robustness and accuracy, measured by a direct comparison with manually segmented images of different plants. Our experimental results show that all three techniques are sensitive to structural image distortions and require additional preprocessing steps, including structural enhancement and characteristic scale selection. To overcome the limitations of the conventional approaches, we develop an iterative algorithmic scheme that performs both rigid and slightly nonrigid registration of high-throughput plant images in a fully automated manner.


Introduction
In the last decade, multisensory camera systems have become indispensable tools for the high-throughput screening of quantitative plant traits upon perturbation of environmental and/or molecular-genetic factors. Multimodal screening facilities enable plant scientists to generate large quantities of image data, including visible light (VIS), fluorescence (FLU), near-infrared (NIR) and 3D images, that are typically analyzed separately from each other. Some image modalities, such as visible light or near-infrared images, exhibit low contrast between plant and background image regions, which complicates the automated identification of plant structures (i.e., image segmentation). The limited efficiency of existing manual and semi-automated approaches to image segmentation has been identified as the major bottleneck of quantitative plant phenotyping pipelines [1]. Combining low-contrast modalities with high-contrast image modalities (e.g., fluorescence images) by means of multimodal image registration can help to overcome the limitations of unimodal image processing. Once aligned, the binary mask of a segmented FLU image can be applied for the extraction of plant regions in optically more heterogeneous VIS images. Consequently, multimodal image registration is an important tool for the automation of plant image analysis and quantitative trait derivation from high-throughput phenotyping data.
Multimodal image alignment begins with the establishment of mutual correspondences between two structurally similar but nonidentical images. Due to the large variability in the optical appearance of different plants, as well as of the same plant in different image modalities, it is not evident what kind of image features and registration algorithms can be universally applied for the alignment of different multimodal plant images.
Differences in spatial camera resolution, position and orientation can, in general, be modeled by a combination of scaling, translations, and rotations. A plethora of methods for image registration has been developed in the past, particularly in the context of biomedical image analysis [2][3][4][5][6]. Depending on the type of image features or intrinsic algorithmic principles, different categorizations of registration techniques have been suggested in the literature. Here, we rely on the algorithm-focused classification of registration methods into three major groups: (i) feature-point, (ii) frequency domain and (iii) intensity-based techniques.
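The combination of scaling, translation and rotation mentioned above is exactly the 2D similarity transform used throughout this study. As a minimal illustration (a pure-Python sketch, not part of the MATLAB pipeline used in the paper), a similarity transform with scale s, rotation angle alpha and translation (tx, ty) can be composed and applied to a point as follows:

```python
import math

def similarity_transform(s, alpha, tx, ty):
    """Return a function mapping (x, y) to its scaled, rotated, translated image."""
    c, si = s * math.cos(alpha), s * math.sin(alpha)
    def apply(x, y):
        # [x'] = s * [cos a  -sin a] [x] + [tx]
        # [y']       [sin a   cos a] [y]   [ty]
        return (c * x - si * y + tx, si * x + c * y + ty)
    return apply

# Example: pure translation leaves orientation and scale unchanged
t = similarity_transform(1.0, 0.0, 5.0, -2.0)
print(t(1.0, 1.0))  # -> (6.0, -1.0)
```

Differences in camera resolution map to s, differences in camera orientation to alpha, and differences in camera position to (tx, ty).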
Methods based on the matching of feature-points (FPs) are applied when corresponding image regions exhibit local structural similarity. Pairwise correspondences between two sets of feature-points are then used for the calculation of geometrical transformations. Common approaches for the detection of feature-points are based on edges and corners (e.g., FAST [7], Shi and Tomasi [8], Harris operators [9], SUSAN [10]), blob detection (e.g., MSER [11], DoG, DoH), structure tensors, and generalized feature descriptors (e.g., SURF [12], HOG, SIFT [13]). The main limitation of FP methods is the difficulty of finding a sufficient number of corresponding points in similar but nonidentical images of different modalities [14].
Another prominent approach to image alignment relies on finding correspondences in the frequency domain. For example, Fourier or Fourier-Mellin phase correlation (PC) techniques make use of the Fourier-shift theorem, which reformulates the problem of finding a shift in Cartesian or polar coordinates as a phase shift between Fourier transforms [15][16][17]. A closer analysis of PC methods shows that they basically perform correlations of all image structures that contribute to the synchronization of Fourier phases, such as edges and corners [18]. Previous works reported that PC is surprisingly robust with respect to statistical structural image noise [19][20][21]. This remarkable feature of PC originates from the insensitivity of inverse Fourier integrals to distortions of just a few spectral bands, such as high- or low-frequency noise [22]. However, PC is also known to be less accurate in the presence of multiple structurally similar patterns or considerable structural dissimilarities such as nonrigid image transformations. The necessity of additional preprocessing steps, including image filtering and scaling, for improved performance of multimodal image registration using PC was repeatedly reported in the literature [23,24]. Downscaling to a proper size appears to improve the robustness and accuracy of image registration by suppressing modality-specific high-frequency noise, which effectively enhances image similarity [25].
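The Fourier-shift theorem underlying PC can be illustrated in one dimension. The following pure-Python sketch (an illustration only, not the Fourier-Mellin implementation used by MATLAB's imregcorr) recovers a circular shift between two signals from the peak of the normalized cross-power spectrum, which retains only the phase information:

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n)) / n
            for k in range(n)]

def phase_correlation_shift(a, b):
    """Recover d such that a(k) = b(k - d) (circularly) from the PC peak."""
    A, B = dft(a), dft(b)
    # normalized cross-power spectrum: magnitude discarded, phase kept
    R = [x * y.conjugate() / (abs(x * y.conjugate()) or 1.0) for x, y in zip(A, B)]
    r = [v.real for v in idft(R)]          # ideally a delta peak at k = d
    return max(range(len(r)), key=r.__getitem__)

signal = [0, 0, 1, 3, 1, 0, 0, 0]
shifted = signal[-2:] + signal[:-2]        # circular shift by 2
print(phase_correlation_shift(shifted, signal))  # recovers the shift: 2
```

In 2D the same principle applies per axis, and the Fourier-Mellin extension recovers rotation and scale by resampling the spectrum to log-polar coordinates, where they also become shifts.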
As an alternative to landmarks and frequency domain features, intensity-based methods rely on the maximization of global image similarity measures such as the normalized cross-correlation (NCC) [26,27] or the mutual information (MI) [28][29][30][31][32]. As a dimensionless quantity characterizing structural image similarity, the mutual information has the considerable advantage of being independent of differences between image intensity functions and histograms [33]. This property makes MI-based registration particularly suitable for the alignment of images that exhibit partial structural similarity but different image intensity levels.
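The key property of MI, its independence of absolute intensity levels, can be illustrated with a small sketch. Assuming discrete intensity values, MI is computed from the joint intensity histogram of the two images; two images related by a one-to-one intensity remapping yield maximal MI even though their intensity values differ completely (this is a toy illustration, not the Mattes MI estimator used later in the paper):

```python
import math
from collections import Counter

def mutual_information(img_a, img_b):
    """MI from the joint intensity histogram of two equally sized images (flat lists)."""
    n = len(img_a)
    joint = Counter(zip(img_a, img_b))      # joint intensity histogram
    pa, pb = Counter(img_a), Counter(img_b) # marginal histograms
    return sum((c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in joint.items())

a = [0, 0, 1, 1]
b = [5, 5, 9, 9]   # same structure, completely different intensity levels
print(mutual_information(a, b))  # maximal for a one-to-one mapping: log(2) ~ 0.693
```

NCC, by contrast, compares the intensity values themselves and therefore degrades when the two modalities map the same structure to unrelated gray levels.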
The above registration techniques were previously applied for the alignment of medical, microscopic and aerial images. Applications of image registration in the context of multimodal plant image analysis are, however, relatively scarce [34][35][36]. Structural differences between images of different modalities, the presence of nonuniform image motion and blurring make the alignment of multimodal plant images a challenging task. Here, we investigate the performance of three registration methods by a direct comparison with manually segmented FLU and VIS images of different plants. The developed algorithmic scheme is, however, not limited to FLU/VIS images and can, in principle, be applied to the coregistration of other modalities (e.g., near-infrared, 3D projection images) as well. Our experimental results show the limitations of conventional approaches when they are straightforwardly applied to the registration of FLU/VIS plant images. Extensions of the conventional algorithmic schemes are presented that improve the robustness and accuracy of image registration for the automated processing of large quantities of image data in the context of high-throughput plant phenotyping.

Image acquisition and preprocessing
Time-series of visible light (VIS) and fluorescence (FLU) top-/side-view images of developing Arabidopsis, wheat and maize shoots were acquired from high-throughput measurements over more than two weeks using LemnaTec-Scanalyzer3D high-throughput phenotyping platforms (LemnaTec GmbH, Aachen, Germany). Figure 1 and Table 1 give an overview of the image data modalities and formats used in this study. To assess the robustness and accuracy of image registration, investigations were performed with both original (i.e., unsegmented) and manually segmented FLU/VIS images, the latter representing ideally filtered data free of any background structures. Manual segmentation was performed using supervised global thresholding of the background regions, followed by manual removal of any remaining structural artifacts. Since fluorescence and visible light cameras generate images of different dimensions (i.e., FLU: 2D grayscale, VIS: 3x2D color images), the original RGB visible light images were converted to grayscale. In addition to grayscale intensity images, registration was performed with edge-magnitude images that were calculated as suggested by [37]. Before registration was applied, FLU images were resampled to the same spatial resolution as the VIS images, which improves the robustness of the image alignment algorithms, as shown in Fig. 2a. Furthermore, to study the effects of the characteristic image scale on algorithmic performance, registration was applied to both originally sized and equidistantly downscaled images, where downscaling effectively performs progressive low-pass smoothing. No further preprocessing steps were used, with the exception of top-view Arabidopsis images, where the contrasting blue mat was eliminated prior to image registration.
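The low-pass smoothing effect of downscaling can be sketched with a simple box-average filter (an illustration of the principle; the study itself uses MATLAB resizing routines, whose interpolation details may differ):

```python
def downscale(img, f):
    """Box-average an HxW grayscale image (list of lists) by an integer factor f."""
    h, w = len(img) // f, len(img[0]) // f
    # each output pixel averages an f x f block, suppressing high-frequency detail
    return [[sum(img[r * f + i][c * f + j] for i in range(f) for j in range(f)) / f**2
             for c in range(w)] for r in range(h)]

img = [[0, 0, 8, 8],
       [0, 0, 8, 8],
       [4, 4, 0, 0],
       [4, 4, 0, 0]]
print(downscale(img, 2))  # -> [[0.0, 8.0], [4.0, 0.0]]
```

Averaging over blocks removes modality-specific fine detail while preserving the coarse plant silhouette, which is what makes the two modalities more similar at reduced scales.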

Image registration using built-in and extended MATLAB functions
Image registration was performed using the following three groups of registration routines, as provided with the MATLAB 2018a Image Analysis toolbox (The MathWorks, Inc., Natick, Massachusetts, United States):
• For feature-point matching, several different edge-, corner- and blob-detectors were used. In addition to built-in MATLAB functions that rely on one particular feature detector, an integrative multifeature generator was introduced that merges the results of different feature-point detectors.
• Image registration based on frequency domain features relies on the MATLAB imregcorr function, which performs Fourier-Mellin phase correlation of the corresponding spectral image transforms. To assess the reliability of the calculated image transformations, a fixed threshold on the maximum PC peak height (i.e., H > 0.03) was used, as suggested in [16]. Transformations obtained with H < 0.03 typically indicate a failure of PC registration, for example, due to low or missing structural similarity between the two images.
• The third method of image registration is based on maximization of the Mattes mutual information between two images using the MATLAB imregister function [30,31].
All registration methods were applied to determine a global rigid transformation including rotation, scaling and translation, which corresponds to the 'similarity' option of the MATLAB transformation routines; see the overview in Table 2.

Evaluation of image registration
To evaluate the results of image registration, two criteria for characterizing the robustness and accuracy of image alignment are used.

Success rate of image registration
To assess the robustness of image registration, the success rate (SR) is calculated as the ratio between the number of successfully performed image registrations (n_s) and the total number of registered image pairs (n):

SR = n_s / n. (1)

Image registration was defined as successful when the components of the transformation matrix lay within a range of admissible values of translation (|T| < 300 pixels), rotation (|cos(α)| < 0.15) and scaling (S ∈ [0.75, 1.25]). Geometrical transformations that do not fit in this range were treated as a failure of image registration.
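The admissibility check and Eq. 1 can be sketched as follows (a Python illustration; the ranges are copied verbatim from the criteria above, and treating |T| as the Euclidean norm of the translation vector is an assumption):

```python
import math

def is_admissible(tx, ty, alpha, s):
    """Admissible ranges as stated in the text: |T| < 300 px,
    |cos(alpha)| < 0.15, S in [0.75, 1.25]."""
    return (math.hypot(tx, ty) < 300
            and abs(math.cos(alpha)) < 0.15
            and 0.75 <= s <= 1.25)

def success_rate(transforms):
    """SR = n_s / n over a list of (tx, ty, alpha, s) tuples."""
    return sum(is_admissible(*t) for t in transforms) / len(transforms)

# Hypothetical example: the first transform is admissible, the second
# fails the translation bound, so SR = 1/2
print(success_rate([(10, 5, math.pi / 2, 1.0),
                    (400, 0, math.pi / 2, 1.0)]))  # -> 0.5
```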

Accuracy of image registration
The second criterion is constructed to quantify the accuracy of image registration. For this purpose, the geometrical transformations acquired for a pair of FLU/VIS images are applied to the manually segmented images, and the overlap ratio (OR) between the area of VIS plant regions covered by the registered FLU image (a_r) and the total area of manually segmented plant regions (a) in the VIS image is calculated, as shown in the scheme of evaluation of image registration in Fig. 2:

OR = a_r / a. (2)

This asymmetric definition of OR, which considers only VIS images, was used because the primary goal of FLU/VIS registration is the segmentation of plant regions in VIS images.
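Eq. 2 can be computed directly on binary masks; a Python sketch with hypothetical toy masks (1 = plant pixel, 0 = background):

```python
def overlap_ratio(vis_mask, flu_mask_registered):
    """OR = a_r / a: fraction of the manually segmented VIS plant area (a)
    covered by the registered FLU mask (a_r). Masks are equally sized 0/1 grids."""
    a = sum(v for row in vis_mask for v in row)
    a_r = sum(v and f for vrow, frow in zip(vis_mask, flu_mask_registered)
              for v, f in zip(vrow, frow))
    return a_r / a

vis = [[1, 1, 0],
       [1, 1, 0]]
flu = [[1, 0, 0],
       [1, 1, 1]]
print(overlap_ratio(vis, flu))  # 3 of 4 VIS plant pixels covered -> 0.75
```

Note the asymmetry: FLU pixels falling outside the VIS plant region do not lower OR, consistent with the definition above.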

Experimental results
First, the built-in MATLAB routines for feature-point (FP)-, phase correlation (PC)- and intensity (INT)-based image registration were applied for the alignment of original (i.e., unscaled, unfiltered) FLU and VIS images of developing Arabidopsis, wheat and maize shoots. The results of this first feasibility test show a superior success rate of INT registration in comparison to the FP- and PC-based approaches; see Table 3. However, the accuracy of INT registration exhibits substantial variations among different plant species.
To dissect possible causes of the reduced robustness and accuracy of image registration methods applied to original FLU/VIS images, a systematic analysis of the effects of structural image enhancement and scaling was performed. Figure 3 gives an overview of the preprocessing conditions that were evaluated with respect to the image registration outcome.

Table 2 Overview of the three groups of image alignment methods, i.e., feature-point (FP) matching, phase correlation (PC) and image intensity/mutual information (INT) based registration, together with the image features and corresponding MATLAB functions used for the calculation of pairwise image correspondences
All methods are used with the 'similarity' option, which restricts the class of possible image transformations to a combination of global rotation, scaling and translation.

To dissect the effects of the characteristic image scale on the results of image registration, equidistant downscaling of FLU/VIS images with scaling factors in the range [0.3, 1.0] was applied. Figures 5 and 6 summarize the success rate and overlap ratio calculations for time-series of developing Arabidopsis, wheat and maize shoots. As seen in the FP/PC diagrams of Fig. 5a, the FP and PC methods exhibit reduced success rates of registration for originally sized and moderately downscaled images. Background filtering in manually segmented images significantly improves the success rate of FP and PC registration; see Fig. 5b. Among these techniques, INT registration shows the most robust performance in terms of SR.
Complementary plots of registration accuracy in Fig. 6, measured using Eq. 2, indicate, however, that a formally successful image alignment within the range of admissible transformations is not always associated with a good overlap between registered and manually segmented (ground-truth) plant areas. In particular, the exceptionally high SR values of INT-based registration (Fig. 5) are not accompanied by high OR. Further, one can see that some plant images (e.g., Arabidopsis, top view) can generally be aligned more accurately than others (e.g., wheat and maize, side view). Thereby, the deviation of registered plant areas from the ground-truth data is larger for original images than for manually segmented plants, cf. Fig. 6a versus b. From Fig. 7c, it is clearly visible that some plants (e.g., Arabidopsis) can generally be registered more accurately by one single registration step than others, and that background elimination decisively improves the accuracy of FLU/VIS registration.

A closer analysis of cases with low OR revealed several possible causes of inaccurate FLU/VIS alignment, including repeated patterns (e.g., multiple similar leaves) and nonuniform image motion due to inertial movements of leaves. Different registration methods exhibit different tolerance levels with respect to these distortions.

Depending on image preprocessing, registration algorithms may calculate quite different image transformations. Figure 9 shows the component distributions of the transformation matrix assessed with different registration techniques and preprocessing conditions (i.e., scaling factors, background filtering). As one can see, the values of scaling, rotation and translation undergo considerable variations that correspond to both optimal and suboptimal FLU/VIS image alignments, such as those shown in Fig. 8. At first glance, the dependency of registration on structural image content and preprocessing appears to be disadvantageous. However, it turns out to be a very helpful feature.
Here, we exploit the variability of geometrical transformations resulting from optimal and suboptimal image registration to construct an integrated registration mask that allows for a piecewise approximation of nonuniformly moving plant regions that otherwise could not be completely covered by a single-step registration; see Fig. 8e.
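The integration step can be sketched as a pixelwise union of the FLU masks obtained under different preprocessing conditions (an assumption for illustration; the exact integration rule used by the authors may differ in detail):

```python
def integrated_mask(masks):
    """Pixelwise union (logical OR) of FLU masks registered under different
    preprocessing conditions: a piecewise cover of nonuniformly moving regions."""
    return [[int(any(m[r][c] for m in masks)) for c in range(len(masks[0][0]))]
            for r in range(len(masks[0]))]

# Hypothetical toy masks: each single-step registration covers a different leaf
m1 = [[1, 1, 0, 0]]
m2 = [[0, 0, 1, 1]]
print(integrated_mask([m1, m2]))  # -> [[1, 1, 1, 1]]
```

Each single-step registration locks onto the plant part that dominates its particular feature representation and scale, so their union approximates the nonuniformly moved plant piecewise without an explicit nonrigid model.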
Computational costs of pairwise image registration depend essentially on the image size, the type of registration method and diverse algorithmic parameters. To demonstrate the above-described parameter-dependent performance of the FP/PC/INT registration techniques for automated alignment of multimodal plant images, a GUI software tool with examples of plant images is provided for direct download from our homepage; a screenshot is shown in Fig. 10. While the performance of the image registration algorithms was primarily evaluated with FLU and VIS images, our exemplary tests show that they are also applicable to fusion of other image modalities, e.g., FLU/NIR or VIS/NIR. Examples of FLU, VIS and NIR plant images are included in our online file repository.

Conclusion
Multimodal image registration opens new possibilities for the automation of image segmentation and analysis in high-throughput plant phenotyping. Using image registration, the result of a straightforward FLU image segmentation can, for example, be applied to automatically detect plant regions in optically more heterogeneous visible light images. Furthermore, the spatial alignment of different image modalities paves the way for a consistent assessment of a multiparametric plant phenotype, including information on local chlorophyll/water content and disease-/stress-related pigmentation. Our experimental results using three common registration techniques (FP, PC, and INT) show that the robustness and accuracy of FLU/VIS image alignment undergo substantial variations depending on the plant species, the interplay between background and plant intensities, and the image preprocessing conditions. In general, background filtering, structural enhancement and downscaling significantly improve the performance of FLU/VIS image registration. However, none of the methods and preprocessing conditions offers universal advantages that would guarantee optimal results of single-step registration applied to arbitrary image data. On the basis of the insights gained in this study, we conclude that a combination of different registration techniques, scaling levels and image representations (i.e., grayscale and color-edge) yields significantly more robust and accurate results than single-step image alignment using one particular method and/or one particular image preprocessing filter. We began this study with the assumption of global rigid image transformations. However, it turned out that FLU/VIS images may exhibit nonuniform motion due to uncorrelated inertial movements of tillers and leaves after relocation or rotation of plant carriers during stepwise image acquisition.
Integration of multiple registration results obtained for different preprocessing conditions into one single integrated mask allows this problem to be overcome by constructing a piecewise approximation of nonuniform image motion, which otherwise would require the application of significantly more expensive nonrigid registration.
The basic approach to automated alignment of plant images using a combination of feature detectors and preprocessing conditions presented in this work was evaluated with fluorescence and visible light images, but it can, in principle, be applied to the coregistration of other image modalities, e.g., near-infrared images.