Image-based methods for phenotyping growth dynamics and fitness in large plant populations

Background With the development of next-generation sequencing technologies, high-throughput phenotyping has become the new bottleneck of quantitative genetics analyses. The model species Arabidopsis thaliana offers extensive resources to investigate intraspecific trait variability and the genetic bases of ecologically relevant traits, such as growth dynamics and reproductive allocation. However, reproducible and cost-effective methods need to be developed for the measurement of growth and especially fitness related traits in large populations. Here we describe image-based methods that can be adapted to a wide range of laboratory conditions, and enable the reliable estimation of biomass accumulation and fruit production in thousands of A. thaliana individuals. Results We propose a semi-invasive approach, where part of a population is used to predict plant biomass from image analysis. The other part of the population is daily imaged during three weeks, then harvested at the end of the life cycle where rosette and inflorescence are separately imaged. We developed ImageJ macros and R codes for image segmentation, 2D skeletonization and subsequent statistical analysis. First, ontogenetic growth is modelled from estimated and measured dry mass for all individuals with non-linear regressions, from which the dynamics of absolute growth rate (GR) and relative growth rate (RGR) are calculated. Second, analysis of the 2D inflorescence skeleton allows the estimation of fruit production, an important component of plant fitness. Our method was evaluated across 451 natural accessions of A. thaliana. Cross-validation revealed that our image-based method allows predicting approximately 90% of biomass variation and 70% of fruit production. Furthermore, estimated traits - like measured traits - showed high heritabilities and inter-experiment reproducibility. Conclusions We propose a flexible toolkit for the measurement of growth and fitness related traits in large plant populations. It is based on simple imaging, making the method reproducible at low cost in different facilities. However, as manual imaging of large plant populations can quickly become a limiting factor, we also describe an automated high-throughput imaging coupled with micro-computers that enables large phenotypic screening for genome-wide association studies and stress experiments.

populations can quickly become a limiting factor, we also describe an automated highthroughput imaging coupled with micro-computers that enables large phenotypic screening for genome-wide association studies and stress experiments.

Background
Plants become less efficient to accumulate biomass as they get larger and older, resulting in a decline of the relative growth rate (RGR) in the course of ontogeny [1][2][3]. This is due to developmental constraints such as increasing allocation of biomass to supporting structures like stems, and self-shading due overlapping leaves in larger individuals. Strong variation of growth has been reported across plant species [1,[4][5][6][7][8][9][10][11], which has been assumed to reflect the inherent diversity of plant strategies to cope with contrasting levels of resource availability [1,5,8,12,13]. For instance, species from resource-scarce environments generally show a reduced speed of biomass accumulation compared to species from resource-rich environments, even when they are placed in the same, non-limiting conditions [2,14]. The same observation has been made within species [5,15]. However, the genetic bases of plant growth dynamics remain unclear, due to the complexity of the genetic architecture of underlying traits [16,17], and thus, to the necessity of phenotyping very large plant populations.
To compare growth dynamics between genotypes or treatments, several methods have been proposed to compute RGR from a small numbers of observations [7, 18,19].
These methods are valuable to investigate changes in growth over a short period of time, for instance in response to environmental stress. However, sequential harvesting of many individuals during ontogeny is space and time consuming. Furthermore, it is problematic for evaluating standard deviation and measurement error across genotypes, as the same individual is not followed during ontogeny. Ecophysiological and ecological studies to make such comparison [20]. Because of its selfing mating system and its small size, numerous individuals can be grown in relatively small facilities, and experiments are reproducible indefinitely on the same genotypes [21]. In 2016, the genetic resources available for this species have reached a milestone with the complete sequencing of 1,135 natural accessions with publicly available seeds to conduct comparative analysis (http://1001genomes.org/) [22].
Many high-throughput platforms have been developed in the last decades for plant phenotyping in crops [23][24][25][26][27], and in A. thaliana [10,[28][29][30][31][32][33]. However, most of them are (i) either commercial or require the purchase of expensive materials [23,25,[27][28][29]31] but see [34], (ii) not suitable for the analysis of growth phenotypes from germination to fitness parameters [30,32,34,35], or (iii) do not allow the quantification of complex traits like biomass accumulation and RGR [30,36]. Thus, we aimed at developing flexible and easily reproducible methods to measure individual growth dynamics and fruit production across large populations of A. thaliana. Our pipeline is based on manual or automated imaging, and subsequent analysis with ImageJ [37] and R [38]. The method was evaluated on a set of 451 worldwide accessions.

Plant imaging and population sub-sampling
In our study, we used a hydroponic system where plants are cultivated in trays on inorganic solid media (rockwool cubes) and nutrients are daily provided through a solution [39] (Fig.   1a; Table S1). This system allows plants to have an unlimited source of nutrients for growth, as well as a highly controlled environment to avoid pests and herbivores. Furthermore, the hydroponic system also enables the measurement of the weight of the root system.
The individual plant growth was followed using a code system: the genotype sowed in each pot was defined before the beginning of the experiment by the label of the tray (e.g., T01, T02, etc.) and the coordinate of the pot in each tray (e.g., A1, C5, D8, etc.).
The label of the tray was placed at the top-left corner, which defined the coordinate of the first position (A1) on each tray (Fig. 1b). Four replicates of 480 genotypes were grown, with pots randomly distributed in 64 trays of 30 pots each (Fig. 1b) For modelling growth, we partitioned the whole population into two subpopulations: replicates 1 and 2 of every genotype represent the focal population on which growth dynamics and fruit production were measured. They were harvested at the end of reproduction when the first fruits (siliques) were yellowing (stage 8.00 according to Boyes We then developed a predictive model of rosette dry mass from the five traits extracted from rosette images with ImageJ. Generalized linear models (glm) were evaluated with a cross-validation approach on the training population (Additional File 2).
Prediction accuracy was measured as the coefficient of correlation (r 2 ) between measured and predicted rosette dry mass. Results showed that we can explain ca. 90% (P < 0.001) of rosette dry mass with the model that included rosette area, perimeter, circularity, aspect ratio and roundness. However, as removing rosette circularity, aspect ratio and roundness did not impact the prediction accuracy (r 2 = 0.91 when replicate 3 is used to train the model, Fig. 3a; and r 2 = 0.90 when replicate 4 is used to train the model, Fig. 3b), these traits were therefore removed in subsequent analyses. Thus, the final equation we used to estimate rosette dry mass from rosette pictures, in the 25 first days following vernalization, is given by Eq. 1 of Table 1, where the three coefficients (given in Table S2) were all significant (P = 0.03 for the intercept, P < 10 -4 for both the coefficients of rosette area and rosette perimeter).

Fitting non-linear model of individual growth dynamics in the focal population
Since some plants germinated during or, for a few, after vernalization, we considered the first day of growth (t 0 ) for each individual of the focal population (replicates 1 and 2) as the day at which it had a minimum size. For convenience, we used the size of the biggest measured plant across all individuals at the first day of growth following vernalization.
This corresponded to a plant with first true leaves just emerged (2-3 mm each, Additional File 2). This procedure allowed for normalization of growth trajectories from the same starting point between individuals that differ in germination speed. We used the predictive model established above on the training population to estimate rosette dry mass on the focal population during the 25 first days following vernalization (Fig. 4). Previous studies have shown that leaf and whole-plant growth follow a sigmoid curve in A. thaliana [41][42][43]. We thus used a three-order logistic function f of rosette dry mass (M) in the course of time t (Eq. 2 in Table 1) [3] as: where A, B and t i are the parameters characterizing the shape of the curve, which are expected to differ between individuals depending on genotypes and/or environmental conditions. A is the upper asymptote of the sigmoid curve, which was measured as rosette dry mass (mg) at maturity (Fig. 4). B controls the steepness of the curve, as the inverse of the exponential growth coefficient r (r = 1/B). t i is the inflection point that, by definition, corresponds to the time point where the rosette is at 50% of its final dry mass, and where the absolute growth rate (GR, mg d -1 ) is maximal [3]. Both B and t i were estimated by fitting a logistic growth function on every individual in R (Table S2; Additional File 2). We then estimated several dynamical variables from the fitted parameters ( Fig. 4; Table 1; Table S2), such as absolute growth rate (GR(t), the derivative of the logistic growth function, mg d -1 ; Eq. 3 in Table 1) and RGR(t) (GR(t) / M (t), mg d -1 g -1 ; Eq. 4 in Table 1).  Table 1) are given in Table S2.

Robustness of the method and reproducibility of trait measurement
Our analysis of trait variation focuses on eight phenotypic traits: five were measured (growth duration, final rosette and root dry mass, root and reproductive allocation), and three were estimated (GR and RGR at the inflection point, number of siliques). Trait values for all individuals are available in Table S3. They were all strongly variable across the 451 accessions (Fig. 6). The part of phenotypic variance accounted by genetic variation across individuals (broad-sense heritabilities, H 2 ) for all traits were higher than 0.70, except for root allocation, a measured trait, with a H 2 of 0.40 (Table S4) Table S4). To evaluate the reproducibility of trait value estimated across genotypes with our method, we repeated a second experiment on 18 accessions selected for their highly contrasted phenotypes (Fig. S3). We used nine phenotypic groups, each containing two accessions, spanning the range of growth duration and RGR variation (three groups of growth duration and three groups of RGR, see Table S5 and  Fig. 7b).

Discussion
Biomass accumulation is a key physiological parameter related to individual performance and stress response [1,44]. applicable in any facilities (e.g., growth chambers, greenhouses). Our method developed for A. thaliana has the advantage of being cost-effective and flexible to any facilities. From simple imaging, it allows a single experimenter to simultaneously measure rosette biomass accumulation, RGR and fruit production over thousands of plants. Importantly, all the growth parameters measured in this study varied strongly between individuals, notably between early and late flowering plants (Fig. 4, Table S3). A. thaliana is the favorite model in plant genetics and molecular biology, and it is also becoming a model in evolutionary biology and ecology [9,21,45]. Our approach therefore offers new avenues for dissecting the links between growth and fitness, and the genetic and evolutionary bases of phenotypic variation. The same method could be applicable to other species, but the 3D architecture in non-rosette species requires more sophisticated image analysis. A recent study in maize offers a nice example of high-throughput 3D reconstruction of plants and biomass prediction [25].

Technical considerations about high-throughput imaging and analysis
In this study, we took advantage of the RAPA system to collect plant images at highthroughput during growth. Additionally, we took pictures of the trays with a regular camera every day when we watered the plants. Measurements of rosette morphology (area, perimeter and shape descriptors) were the same on manual images and RAPA images.

Reproducible and low-cost methods for genetic and ecophysiological studies
The semi-invasive approach based on statistical modelling and cross-validation to convert rosette morphology into biomass drastically reduces the number of replicates necessary to measure the dynamics of biomass accumulation. Here, we presented the analysis of eight growth trait across 451 accessions, for which we measured high heritabilities using only four replicates per genotype (Table S4). The reproducibility of our method is supported by the second experiment where 18 contrasted accessions were phenotyped in the same condition (Fig. 7). Importantly, we voluntarily chose 18 accessions that maximize differences in growth duration and RGR to reproduce the most extreme phenotypes ( Taking advantage of its flexibility, we hope that our method can help future studies to dissect the relationships between genetic diversity, environment and plant performance. Genomes project [22] (http://1001genomes.org/; a list of the accessions used in this study is available in Table S4) for monitoring vegetative growth and fitness in the RAPA system at MPI-Tübingen. Seeds used in this study were obtained from parental plants propagated under similar conditions in greenhouse. All seeds were stored overnight at -80 °C and surface-sterilized with 100% ethanol before sowing.

Growth conditions
Plants were cultivated on a semi-hydroponic system. Seeds were sowed on 3.6 cm x 3.6 cm x 4 cm depth rockwool cubes (Grodan cubes, Rockwool International A/S, Denmark) fitted in 4.6 cm (diameter) x 5 cm (depth) circular pots (Pöppelmann GmbH and Co., Germany).
The pots were covered with a black circle pierced in the centre (5-10 mm hole manually made with a puncher). Before sowing, the dry rockwool cubes were weighed with a microbalance. Then, they were watered by dipping them completely into 75% strength nutrient solution in order to achieve full humidification and fertilization. The chemical composition of the nutrient solution was obtained from Conn et al. [39] (also available in Table S2).
After sowing, trays were incubated for two days in the dark at 4 °C for seed stratification, and then transferred for six days to 23 °C for germination. After six days, individual was estimated as the difference in the weight of the dry rockwool cube before and after plant growth. Root and reproductive allocation were measured as the ratio of root and inflorescence dry mass, respectively, over total (rosette + roots + inflorescence) plant dry mass. For the training population harvested at 16 days after vernalization, rosette were dried at 65 °C for three days, and separately weighed with a microbalance (XA52/2X).

Statistical analyses
All coefficients of correlation (r 2 ) used to estimate the prediction accuracy of our crossvalidation approaches were obtained from Pearson's coefficients between estimated and measured data, using the cor.test function in R. Non-linear fitting of the logistic growth functions (Eq. 2 in Table 1) were performed on every individual with the nls function in R (detailed code in Additional File 2). Broad-sense heritabilities (H 2 ) were calculated with a Bayesian approach implemented in a MCMCglmm model performed in R, considering the accession as a random factor, as: where y is any trait of interest after logarithmic transformation for the individual k of genotype i, G i is the identifier of the accession i, and e ik is the residual error associated with every individual. H 2 was calculated at the proportion of genotypic variance ( σ G 2 ) over total variance ( σ G 2 +σ e 2 ):  Table 1) • RGR: relative growth rate (mg d -1 g -1 ) measured as the ratio between GR and M for each time point throughout ontogeny.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and material
The datasets supporting the conclusions of this article are included within the article and its additional files. Correspondence and requests for materials should be addressed to weigel@weigelworld.org or franc.vasseur@gmail.com .             TableS2_ModelParameters.xlsx"). Parameters of the models fitted, corresponding to equations 1 (tab 1 in file), 2 to 4 (tab 2 in file), and 5 (tab 3 in file) of Table 1.