Image-based methods for phenotyping growth dynamics and fitness components in Arabidopsis thaliana

Background The model species Arabidopsis thaliana has extensive resources to investigate intraspecific trait variability and the genetic bases of ecologically relevant traits. However, the cost of equipment and software required for high-throughput phenotyping is often a bottleneck for large-scale studies, such as mutant screening or quantitative genetics analyses. Simple tools are needed for the measurement of fitness-related traits, like relative growth rate and fruit production, without investment in expensive infrastructures. Here, we describe methods that enable the estimation of biomass accumulation and fruit number from the analysis of rosette and inflorescence images taken with a regular camera. Results We developed two models to predict plant dry mass and fruit number from the parameters extracted with the analysis of rosette and inflorescence images. Predictive models were trained by sacrificing growing individuals for dry mass estimation, and manually measuring a fraction of individuals for fruit number at maturity. Using a cross-validation approach, we showed that quantitative parameters extracted from image analysis predicts more 90% of both plant dry mass and fruit number. When used on 451 natural accessions, the method allowed modeling growth dynamics, including relative growth rate, throughout the life cycle of various ecotypes. Estimated growth-related traits had high heritability (0.65 < H2 < 0.93), as well as estimated fruit number (H2 = 0.68). In addition, we validated the method for estimating fruit number with rev5, a mutant with increased flower abortion. Conclusions The method we propose here is an application of automated computerization of plant images with ImageJ, and subsequent statistical modeling in R. It allows plant biologists to measure growth dynamics and fruit number in hundreds of individuals with simple computing steps that can be repeated and adjusted to a wide range of laboratory conditions. It is thus a flexible toolkit for the measurement of fitness-related traits in large populations of a model species. Electronic supplementary material The online version of this article (10.1186/s13007-018-0331-6) contains supplementary material, which is available to authorized users.


Background
Relative growth rate (RGR) and fruit number are two essential parameters of plant performance and fitness [1][2][3]. Proper estimation of RGR is achieved with the destructive measurement of plant biomass across several individuals sequentially harvested [4,5]. However, sequential harvesting is space and time consuming, which makes this approach inappropriate for large-scale studies. Furthermore, it is problematic for evaluating measurement error, as well as to compare growth dynamics and fitness-related traits, like fruit production, on the same individuals. Thus, a variety of platforms and equipment have been developed in the last decade for highthroughput phenotyping of plant growth from image analysis, specifically in crops [6][7][8][9][10] and in the model species A. thaliana [11][12][13][14]. Because commercial technologies are powerful but generally expensive [6,8,11,13], low-cost methods have been proposed, for instance

Open Access
Plant Methods *Correspondence: franc.vasseur@gmail.com; weigel@weigelworld.org 1 Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany Full list of author information is available at the end of the article to estimate rosette expansion rate from sequential imaging of A. thaliana individuals [14][15][16]. These methods can be adapted to a variety of lab conditions, but they do not allow the quantification of complex traits like biomass accumulation, RGR and fruit production.
Strong variation in RGR has been reported across and within plant species [17][18][19][20][21][22], which has been assumed to reflect the inherent diversity of strategies to cope with contrasting levels of resource availability [3,23,24]. For instance, species from resource-scarce environments generally show a lower RGR than species from resourcerich environments, even when they are grown in nonlimiting resource conditions [25,26]. Ecophysiological studies [18,26] have shown that plant RGR depends on morphological traits (e.g. leaf mass fraction, leaf dry mass per area) and physiological rates (e.g. net assimilation rate) that differ between species, genotypes, or ontogenetic stages. For instance, plants become less efficient to accumulate biomass as they get larger and older, resulting in a decline of RGR during ontogeny [4]. This is due to developmental and allometric constraints such as selfshading and increasing allocation of biomass to supporting structures, like stems, in growing individuals.
To assess plant performance, response to environment, or genetic effects, it is important to link individual's growth trajectory to productivity, yield or reproductive success. However, while several methods have been proposed to estimate growth dynamics from image analysis [8,[11][12][13][14][15][16], methodologies for automated, high-throughput phenotyping of fruit number per plant remain surprisingly scarce [27,28]. Yet, the analysis of inflorescence images in A. thaliana could offer a valuable tool to connect growth dynamics and plant fitness. Because of its small size, inflorescences can easily be collected, imaged and analyzed with simple equipment. Furthermore, the genetic resources available in this species enable largescale analyses (mutants screening, quantitative trait loci mapping and genome-wide association studies). For instance, the recent analysis of 1135 natural accessions with complete genomic sequences [29] allows conducting large comparative analysis of phenotypic variation within the species [30,31].
With the methods proposed here, we aimed at developing flexible and customizable tools based on the automated computerization and analysis of plant images to estimate fruit number and growth dynamics, including RGR throughout the life cycle. We focused on A. thaliana because it is a widely used model in plant science and also increasingly being used in ecology, although the same approach could be performed on other rosetteshaped species. The estimation of biomass accumulation was semi-invasive, as it requires sacrificing some individuals to train a predictive model. This approach considerably reduced the number of plants needed to estimate RGR during ontogeny, from seedling establishment to fruiting. Furthermore, the estimation of fruit number from automated image analysis of A. thaliana inflorescences could greatly help link growth variation to plant performance and fitness, in various genotypes and environmental conditions.

Estimation of biomass accumulation, RGR and growth dynamics Description
The method for growth analysis requires a set of plants on which we want to non-destructively measure dry mass, and a set of individuals harvested to train a predictive model (Fig. 1). In the case study presented here, we evaluated the method on 472 genotypes of A. thaliana grown in trays using a growth chamber equipped with Raspberry Pi Automated Phenotyping Array (hereafter RAPA) built at the Max Planck Institute (MPI) of Tübingen. We partitioned the whole population (n = 1920) in two subpopulations: the focal population (n = 960) on which growth dynamics (and fruit production) were measured, and the training population (n = 960) on which a predictive model of plant dry mass was developed.
Individuals of the focal population were daily photographed during ontogeny (Fig. 1a), and harvested at the end of reproduction when the first fruits (siliques) were yellowing (stage 8.00 according to Boyes et al. [32]). Topview images were manually taken during the first 25 days of plant growth (Additional file 6: Fig. S1). Plants of the training population were harvested at 16 days after germination (DAG), dried and weighed for building a predictive model of rosette biomass with top-view images (Fig. 1b). Predictive models were trained and evaluated with a cross-validation approach (Fig. 1c). Once a predictive model has been chosen and validated, rosette dry mass can be non-destructively estimated on all individuals of the focal population, which allows modeling growth trajectory, biomass accumulation and RGR throughout the plant life cycle.

Implementation
We developed an ImageJ [33] macro (Additional file 1) to extract shape descriptors of the rosette from tray or individual pot images (Fig. 1a). The macro guides users in the different steps of image analysis to label plant individuals, perform segmentation and measure rosette shape descriptors. It processes all images (trays or individual pots) present in an input folder, and returns shape descriptors of individual rosettes in an output folder defined by users. Shape descriptors include individual rosette area (RA) and π ×Major axis length 2 . Rosette area and perimeter can be converted into cm 2 and cm, respectively, by measuring the area and perimeter of a surface calibrator defined by users. Predictive models of plant dry mass from shape descriptors were tested against measurements in the training population (R code in Additional file 2). Depending on the training population size, we observed variable prediction accuracy for different models, as measured by the coefficient of correlation (r 2 ) between measured and predicted rosette dry mass in individuals not used to train the model (Fig. 1c). LASSO and RIDGE models reached high prediction accuracy even with very small training population size (< 20 individuals). However, with a minimum of 50 training individuals, lm and RIDGE/LASSO performed equally, with a prediction accuracy > 90%. Using stepwise regression, we showed that using only rosette area and circularity as predictors in a simple linear model framework can reach high prediction accuracy (r 2 = 0.91, Fig. 1d). Thus, the final equation we used to estimate rosette dry mass from rosette pictures was Fig. 1d).

Application
From estimated rosette dry mass during the ontogeny and final rosette dry mass measured at the end of the life cycle (maturity), we modeled sigmoid growth curves of biomass accumulation (mg), M(t), for all individuals in the focal population with a three-parameter logistic function [4,34] ( Fig. 2a, b), as in Eq. 1: where A, B and t inf are the parameters characterizing the shape of the curve, which differ between individuals depending on the genotypes and/or environmental conditions. A is the upper asymptote of the sigmoid curve, which was measured as rosette dry mass (mg) at maturity. The duration of growth was estimated as the time in days between the beginning of growth after vernalization (t 0 ) and maturity. B controls the steepness of the curve, as the inverse of the exponential growth coefficient r (r = 1/B). t inf is the inflection point that, by definition, corresponds to the point where the rosette is half the final dry mass. Both B and t inf were estimated for every (1) individual by fitting a logistic growth function to the data in R (Additional file 3). Growth dynamics variables were computed from the fitted parameters, such as GR(t), the derivative of the logistic growth function (Fig. 2c, d), as in Eq. 2: and the relative growth rate (mg d −1 g −1 ), RGR (t), measured as the ratio GR(t)/M (t) (Fig. 2e, f ), as in Eq. 3: Comparing growth traits measured at t inf , i.e. when GR is maximal for all individuals [4], revealed important variations between accessions ( Fig. 2g-i), with an important part of phenotypic variance accounted by genetic variability, as measured by broad-sense heritability (H 2 = 0.93, 0.90 and 0.65 for M(t inf ), GR(t inf ) and RGR (t inf ), respectively). To evaluate the robustness of the method, we repeated an experiment on 18 accessions selected for their highly contrasted phenotypes (Additional file 6: Fig. S2). Results showed a good correlation between the rosette dry mass at the inflection point estimated in the first experiment and the dry mass destructively measured in the second experiment (r 2 = 0.67; Additional file 6: Fig. S3a).

Estimation of fruit number from inflorescence images Description
The method to estimate fruit number from inflorescence images requires to manually counting fruits on a fraction of individuals in order to train predictive models (Fig. 3). All individuals were harvested at the same stage, when the first fruits started to dry. Inflorescence and rosette of individuals of the focal population were separated and both photographed (Fig. 3a). Fruits were manually counted on the inflorescence images of 352 out of 856 plants harvested (Fig. 3b). In parallel, we analyzed the inflorescence skeletons of all the 856 harvested plants with a dedicated ImageJ macro (Additional file 4). Using the skeleton descriptors computed with the macro and manual measurements in the population subset, we evaluated the accuracy of different models to predict the number of fruit per individual (Fig. 3c), and applied the best model to the whole focal population.

Implementation
For all images present in the input folder, the "RAPA-macro_InflorescenceSkeleton.txt" macro (Additional file 4) automatically performs image segmentation, skeletonization, and computation of 2D skeleton parameters of inflorescence (Fig. 3a). 2D skeletons analysis with ImageJ returns nine vectors of variables for each plant (described in Fig. 3), which were automatically saved as.xls files by the macro (in an output folder defined by user). The sums of these nine vectors per individual were used as nine predictors of fruit number. Using the same approach as for estimating rosette dry mass, we tested different models and different training population size with cross-validation (R code in Additional file 5). As for rosette dry mass, results showed that the nine skeletons descriptors predict > 90% of fruit number in 100 individuals not used to train the model (Fig. 3c). With a training population size > 30 individuals, lm performed equally than LASSO and RIDGE regressions. As for dry mass estimation, quadratic models performed poorly. For small training population size, LASSO and RIDGE regressions reached higher prediction accuracy than linear or quadratic models. Using stepwise regression, we showed that the best model to estimate fruit number in a linear model framework is: Fruit Nb = 0.181 × Nb actual junctions + 0.003 ×Nb slab pixels + 0.226 × Nb triple points (cross-validation r 2 = 0.91, Fig. 3d).

Application
The model to estimate fruit number from inflorescence images was applied on all individuals of the focal population (Fig. 4a). We measured a relatively high broadsense heritability for fruit production across accessions (H 2 = 0.68), compared to H 2 estimates of morphological and physiological traits measured in previous studies [35]. In addition, fruit number estimated from image analysis was well correlated with fruit number manually counted on 18 genotypes phenotyped in a second experiment (r 2 = 0.70; Additional file 6: Fig. S3b). To further validate the method, we applied the predictive model on an independent set of inflorescence images taken at the Compared to wild-type Col-0, rev5 produced less fruits due to the effect of the mutation on branching pattern and flower development [36]. This was well captured by the predictive model (Fig. 4b), yet trained on the natural accessions.

Discussion
Arabidopsis thaliana is the most widely used plant species in molecular biology, ecology and evolution, but we still largely ignore how growth dynamics is related to individual performance and fitness [37]. This is mainly because traits like RGR and fruit number remain difficult to measure in large scale experiments. Thus, our goal was to develop a set of tools for biologists to analyze these traits with low-cost equipment. Rather than the development of a new methodology or algorithm, we propose an application guide for the implementation of image computerization with free software (R, ImageJ). From simple top-view imaging of rosette and inflorescence, we built robust predictive models of plant dry mass and fruit number. Based on a semi-invasive approach and two computing steps-one to analyze images with ImageJ and one to model data with R -, the method allows a single experimenter to simultaneously measure biomass accumulation, RGR and fruit production over thousands of plants.
For rosette-shaped species like A. thaliana, top view pot or tray imaging can easily be done in any laboratory or facilities. In this study, we used pictures of trays manually taken during ontogeny with a regular camera. The same approach has been proposed in low-cost systems for high-throughput phenotyping in A. thaliana, using projected rosette area to measure growth during several hours or days [14][15][16]. Comparatively, our method allows measuring the absolute and relative rate of biomass accumulation during the whole life cycle of a plant. The time lapse and frequency of tray imaging is important for proper fitting of the growth curve. We used daily imaging during the 25 first days of growth after vernalization, although growth curves can be fitted with only one picture every 2-3 days. The ImageJ macro we developed here automatically processes tray images when plants are young and do not overlap. When they become too large (20-25 DAG in our study), the macro offers the possibility to spatially separate plants (manual mode). We estimated that, on a desktop computer, the macro takes approximately 20-25 s per tray (30 individuals) when running on automatic mode, and between 1 and 2 min in manual mode (depending on the number and amplitude of corrections to make).
The semi-invasive approach drastically reduces the number of replicates necessary to measure the growth dynamics, or the time needed for manual measurement of fruit number. Furthermore, it allows experimenter to compute biomass accumulation non-destructively until the end of the life cycle, and thus, to compare growth and reproductive success on the same individuals. We showed that the method is robust and reproducible across experiments. Moreover, the model for fruit prediction correctly predicted the decrease in rev5 due to flower abortion in a complete independent experiment. However, we recommend making a new predictive model of plant biomass with cross-validation for each experiment (code example available in Supplementary File 2), specifically if growth conditions change, as the relationship between Prediction of fruit number (mean ± 95% CI) from model trained on accessions and applied to rev5 mutant and Col-0 wild-type (n = 5). Results are compared to observed fruit number manually counted at harvesting rosette morphology and rosette biomass is expected to differ depending on genotypes and environments. Furthermore, our approach for estimating growth dynamics was powerful in A. thaliana, a rosette-shaped species for which size can be estimated from on 2D images. Although our method must be efficient in other rosetteshaped species, biomass estimation in plants with complex 3D architecture requires more sophisticated image analysis. A recent study in maize offers a nice example of 3D reconstruction and biomass prediction with a dedicated phenotyping platform [8]. The same limitations hold true for the estimation of fruit number: our imagebased method can only be performed on species with inflorescences that can be imaged on a 2D plan.
In this study, we propose flexible methods and customizable tools for researchers to characterize plant phenotype in their own facilities. This should lower the barriers of high-throughput phenotyping, and help dissect the relationships between growth dynamics and reproductive success in various laboratory conditions. Methods were developed for A. thaliana, which is the favorite model in plant genetics and molecular biology, and it is also becoming a model in evolutionary biology and ecology [30,31,[37][38][39]. We hope that these tools will encourage researchers to analyze complex traits and fitness components in various conditions and genotypes, thus participating to the effort to better understand the physiological bases of plant adaptation.

Plant material
472 natural accessions of A. thaliana were selected from the initial germplasm of the 1001 Genomes project [29] (http://1001g enome s.org/; Additional file 7: Table S1). Seeds used in this study were obtained from parental plants propagated under similar conditions in greenhouse. All seeds were stored overnight at − 80 °C and surface-sterilized with 100% ethanol before sowing. A transgenic line of A. thaliana affecting in branching pattern and fruit production was used: rev5, which is a strong ethyl-methylsulfonate (A260 V) knock-out mutation of REVOLUTA in the Col-0 background [36].

Growth conditions
Plants were cultivated in hydroponics, on inorganic solid media (rockwool cubes) watered with nutrient solution [40]. Four replicates of 472 accessions were grown, with pots randomly distributed in 64 trays of 30 pots each. Seeds were sowed on 3.6 cm × 3.6 cm × 3 cm depth rockwool cubes (Grodan cubes, Rockwool International A/S, Denmark) fitted in 4.6 cm (diameter) × 5 cm (depth) circular pots (Pöppelmann GmbH and Co., Germany). The pots were covered with a black foam disk pierced in the center (5-10 mm hole manually made with a puncher). Before sowing, the dry rockwool cubes were watered with 75% strength nutrient solution. The chemical composition of the nutrient solution was obtained from Conn et al. [40].
After sowing, trays were incubated for 2 days in the dark at 4 °C for seed stratification, and then transferred for 6 days to 23 °C for germination. After germination, all plants, having two cotyledons, were vernalized at 4 °C during 41 days to maximize the flowering of all the different accessions. Plants were thinned to one plant per pot, and trays were moved to the RAPA room, set to 16 °C with a temperature variability of close to ± 0.1 °C, air humidity at 65%, and 12 h day length, with a PPFD of 125-175 µmol m −2 s −1 provided by a 1:1 mixture of Cool White and Gro-Lux Wide Spectrum fluorescent lights (Luxline plus F36 W/840, Sylvania, Germany). All trays were randomly positioned in the room, and watered every day with 100% strength nutrient solution.
Replicates 1 and 2 (the focal population, n = 960) were harvested when the first fruits started to dry. Due to germination failure, mortality or missing data, only 451 accessions were phenotyped for growth, and 441 for fruit number. Replicates 3 and 4 (the training population, n = 960) were harvested at 16 DAG for dry mass measurement.
A second experiment was performed on a set of 18 contrasted accessions (Additional file 6: Fig. S2), grown in the same conditions. Three replicates per genotype were harvested at the estimated inflection point for rosette dry mass measurement (inflection point estimated from the first experiment), and five replicates were harvested at the end of the life cycle for manual fruit counting.
rev5 and Col-0 were cultivated in the Center for Plant Molecular Biology (ZMBP, University of Tübingen, Germany). Plants were grown on standard soil (9:1 soil and sand) under controlled conditions: in long days (16 h day; 8 h night), low light (70-80 µE m −2 s −1 ) and an ambient temperature of 21 °C (see [41] for details).

Plant imaging and harvesting
All trays were manually imaged every day during the first 25 days after vernalization with a high-resolution camera (Canon EOS-1, Canon Inc., Japan). Individual labeling (i.e. genotype, replicate and date of measurement) was performed with ImageJ [33] during the image analysis process with the "RAPAmacro_RosetteShape.txt" macro. Image segmentation was performed on rosette and inflorescence, after inverting images and adjusting color saturation between 35 and 255. However, it is important to note that color threshold for segmentation depends on light conditions during imaging, and thus, needs to be adjusted by users on a set of template images. To clean segmented images, undesirable dots that remained after segmentation were removed with the 'Remove outliers' function in ImageJ. After segmentation, inflorescence skeletonization and 2D skeleton analysis were automatically performed with the corresponding functions in ImageJ (see code in Additional file 4). Skeletons were not pruned for loops. Extracted rosette shape and inflorescence skeleton parameters were automatically saved as.xls files.
Plants in the training population were harvested at 16 days after vernalization, rosette were dried at 65 °C for three days, and separately weighed with a microbalance (XA52/2X, A. Rauch GmbH, Graz, Austria). All individual rosette parameters extracted after segmentation were saved as.xls files, each row corresponding to a specific date, tray label and pot coordinates.
At the end of the life cycle, inflorescence and rosette of the focal population were harvested and separately photographed. They were dried at 65 °C for at least three days, and weighed with a microbalance (XA52/2X). In the experiment at ZMBP, whole plants of rev5 and Col-0 were photographed at the end of the life cycle (first yellowish fruits) by taking side pictures of each pot separately (n = 5).

Statistical analyses
Different predictive models were evaluated for both the estimation of rosette dry mass and fruit number. We notably compared linear models, quadratic models-where each predictor was fitted as a two-order polynomial function, RIDGE and LASSO regressions (Additional files 2 and 5). Prediction accuracy was tested by cross-validation on 100 individuals not used to train the model, using the Pearson's coefficient of correlation (r 2 ) between observed and predicted trait values. For each model, we tested prediction accuracy according to training population size across 100 random permutations of the training dataset (R code in Additional files 2 and 5). Training population size varied between 10 and 250 for dry mass estimation, and between 10 and 120 for fruit number estimation.
Stepwise regression, using step function in R, was used to identify the best model, with minimum predictors, of rosette dry mass and fruit number.
Non-linear fitting of individual growth curves (Eq. 1) were performed with the nls function in R (Additional file 3). Since some plants germinated during or, for a few, after vernalization, we considered the first day of growth (t 0 ) for each individual of the focal population as the day at which it had a minimum size. For convenience, we used the size of the biggest measured plant across all individuals at the first day of growth following vernalization, which corresponded to a plant with first true leaves just emerged. Growth was expressed as a function of days after germination (DAG, starting at t 0 ). This procedure allowed for normalization of growth trajectories from the same starting point between individuals that differ in germination speed [42]. Growth dynamics variables were computed from the fitted parameters, such as absolute growth rate, GR(t), the derivative of the logistic growth function (Eq. 2), and RGR (t) (Eq. 3).
Broad-sense heritability (H 2 ) was calculated with a Bayesian approach implemented in a MCMCglmm model performed in R, considering the accession as a random factor, as: where y is trait of interest in individual k of genotype i, G i is accession i, and e ik is the residual error. H 2 was calculated at the proportion of genotypic variance ( σ 2 G ) over total variance ( σ 2 G + σ 2 E ):

Additional files
Additional file 1. ImageJ macro used to extract rosette shape descriptors from top-view tray or pot images.
Additional file 2. R code used to predict rosette dry mass from rosette shape descriptors, with cross-validation approach to train and test different models and training population size.
Additional file 3. R code used to model sigmoid growth curves and growth dynamics (M(t), GR(t), and RGR (t)) from predicted rosette dry mass during ontogeny and measured rosette dry mass at maturity.
Additional file 4. ImageJ macro used to extract inflorescence skeleton descriptors from top-view images of plant inflorescence.
Additional file 5. R code used to predict fruit number from inflorescence skeleton descriptors, with cross-validation approach to train and test different models and training population size.
Additional file 6: Figure S1. The RAPA facility. Entrance view of the growth chamber with a zoom on the camera installed between light tubes (top-left panel). On the right is the setup to water the plants and take manual tray picture. Figure S2. Representation of the 18 accessions phenotyped in the second experiment. Nine phenotypic groups represented by the purple circles (three groups of RGR and three groups of growth duration) were selected, each containing two accessions. Figure  S3. Inter-experiment reproducibility of rosette dry mass and fruit number estimation. Measured across 18 contrasted accessions. (a) Pearson's coefficient of correlation (r 2 ) between rosette dry mass M estimated at the inflection point t inf in the first experiment and rosette dry mass M measured at t inf in the second experiment. (b) r 2 between the number of fruits estimated in the first experiment and the number of fruits measured in the second experiment. Additional file 7: Table S1. List of the 451 accessions phenotyped (n = 2), with measured and estimated traits, and fitted model parameters of Eq. 1.