### Plant material and dataset composition

The pipeline was tested on a dataset from an experiment conducted in 2017 involving a set of 60 commercial maize hybrids representative of breeding history in Europe during the last 60 years. This material covers a wide range of plant architecture, growth and development, leading to an appreciable variability of performances in the field [^{33}]. The experiment was conducted in the PhenoArch phenotyping platform (Fig. 1A) hosted at the M3P (Montpellier Plant Phenotyping Platforms) [^{6}].

Briefly, plants were sown in 9L pots filled with a 30:70 (v/v) mixture of a clay and organic compost. Two levels of soil water content were imposed: (i) retention capacity (WW, soil water potential of −0.05 MPa) and (ii) water deficit (WD, soil water potential of −0.3 MPa). Each combination of genotype and water treatment was replicated 7 times, 4 with early harvesting (until 12 visible leaves stage, ~ 40 days after plant emergence) and 3 with late harvesting (until ~ 55 days after plant emergence, i.e. ~ 15 days after panicle emergence), resulting in a total of 840 plants. Greenhouse temperature was maintained at 25 ± 3 °C during the day and 20 °C during the night. Details of experimental growing conditions can be found at [^{34}].

RGB images (2048 × 2448 pixels) were taken daily for each plant with twelve side views from 30° rotational difference (Fig. 1B), using the imaging units of the PhenoArch platform. Each unit is composed of a cabin involving an RGB camera (Grasshopper3, Point Grey Research, Richmond, BC, Canada) equipped with 12.5–75 mm TV zoom lens (Pentax, Ricoh Imaging, France) and LED illumination (5050–6500 K colour temperature). Images were captured while the plant was rotating at constant rate (20 rpm) using a brushless motor (Rexroth, Germany).

### 3D reconstruction and organ segmentation

Phenotrack3D requires as input a time-series of 3D reconstructed plants with stem and leaves individually segmented. Several existing methods allow to segment maize or sorghum plants [^{35}, ^{31}, ^{36}]. In our study, we apply Phenomenal [^{16}] on single time-point data to reconstruct 3D volumes of the plants (Fig. 1C), extract 3D skeletons (Fig. 1D) and 3D segmentations of plant volumes into individual plant organs (stem, leaves) (Fig. 1E). Briefly, Phenomenal estimates a 3D plant volume by applying a space carving algorithm [^{37}] on a regular grid of voxels, so that the projection of the plant volume matches all plant silhouettes of the multi-view image stack. It then searches iteratively among voxels the longest shortest paths connecting plant base to the most distant voxel, removing at each iteration all voxels intercepted by a sweeping perpendicular plane along the path. The result is a set of paths, the 3D skeleton, with each path being associated to a set of voxels representing individual leaves. The only exception is for the first path found, that contains both the stem and a leaf. This path is therefore further segmented, by finding stem tip, computed as the highest minimum of the voxel interception curve (i.e. the number of voxels intercepted by the sweeping plane as a function of curvilinear abscissa).

Two post-processing steps have been designed to improve the quality of the segmentation. First, we extrapolate midribs up to the leaf tips to reconstruct part of the leaves that have been shortened during the 3D reconstruction. Second, due to severe imprecision of the location of stem tip detection at late developmental stages, Phenomenal was complemented with a new stem tip detection method based on deep-learning.

We post-processed the 3D skeleton output by projecting it on the corresponding 2D binary images, to extend the leaf tips which are often shortened during the reconstruction. To that end, 2D binary images extracted with Phenomenal were also skeletonized, and the segments of both skeletons were matched to find an extension path for each segment of the original 3D skeleton. This step is illustrated in Fig. 2, and was repeated for each of the 12 camera angles. For a given 2D binary image, each 3D skeleton branch is projected then associated with the closest branch of the 2D skeleton of the binary image to identify the best extension path to the leaf tip (Fig. 2D; green lines). The corrected length of the 3D skeleton branch is finally computed as the median of all extension paths found for this branch over the 12 camera angles.

To precisely locate the stem tip, which is of particular importance for Phenotrack3D, we trained an object detection model to detect collars on the 2D images, like in [^{38}]. To that end, we used a YOLOv4 deep-learning model [^{39}] (see Additional file 1 for training details). The detected collars were used to define the stem height as the highest collar height among the 12 side-view images. The leaves with an insertion point below the stem tip were defined as ligulated leaves (Fig. 3).

From this point, the plant is represented by a time-series of 3D observations, from which stem height can be extracted at each date and smoothed over time. Finally, time points with abnormal stem shapes were removed. These points were detected by first constructing a median stem, obtained as the polyline joining the (x, y) median position of all stem polyline points in the time-series, grouped by discrete z coordinates. Then, all time points with a stem polyline whose directed Hausdorff distance from the median was greater than 10 cm were removed from the time-series. (Additional file 3).

### Time-lapse tracking of 3D organs

We track the ligulated leaves using a multiple sequence alignment algorithm. First, at each time point \(t\) we build a leaf sequence \({P}_{t}\) composed of feature vectors representing key characteristics of the segmented ligulated leaves. From this representation, we define a cost function for comparing leaves pairwise, and derive a sequence alignment algorithm to establish the correspondences between two successive leaves sequences. We then propagate this procedure to determine ligulated leaf tracks along the whole time-course. Each organ track is then identified by a leaf rank, which corresponds both to its order of emergence and its location along the stem. Finally, we track backwards these leaves from the moment of their ligulation to the moment of their emergence at the top of the stem and get the whole 3D + t reconstruction.

#### Pairwise sequence alignment

The detected ligulated leaves of a date \(t\) are ordered from the bottom to the top of the plant, i.e. by ascending leaf rank, in a leaf sequence \({S}_{t}\). The alignment of two sequences can be defined as a set of gap placements at the beginning, end, or between elements of these sequences, resulting in two new sequences of the same length with no gaps facing each other [^{32}]. Pairwise sequence alignment algorithms are designed to identify the alignment that optimises an alignment score between two sequences, often defined as the sum of the scores associated with each pair of matched elements, plus gap penalties [40].

Each ligulated leaf is described using a feature vector computed from geometrical features that are assumed to be constant over time: insertion height \(h\) *(*mm*),* length \(l\) *(*mm*) a*nd azimuth \(\alpha\) (\(\alpha\) ∈ [−π, π]). These features are extracted from the segmented leaves [^{16}] and concatenated into a feature vector \(v\in {R}^{4}\) summarizing the leaf morphology:

$$v\, = \,\left[ {v^{i} } \right]_{1 \le i \le 4} \, = \,\left[ {\cos \,(\alpha ),\,\sin \,(\alpha ),\,w_{h} \cdot h,\,w_{l} \cdot l} \right]$$

(1)

Weights \({w}_{h}=0.03\) and \({w}_{l}=0.004\) were fine-tuned to scale features and adjust their relative importance. Each pair of matched leaves is associated to a cost \({c}_{vv}\) equal to the euclidean distance between the feature vectors \({v}_{1}\) and \({v}_{2}\) of those two leaves:

$$c_{vv} \left( {v_{1} ,v_{2} } \right) = \sqrt {\mathop \sum \limits_{1 \le i \le 4} \left( {v_{1}^{i} - v_{2}^{i} } \right)^{2} }$$

(2)

A gap penalty parameter \(g\) is used to penalise the addition of each of the \(n\) gaps placed in the alignment of sequences \({S}_{t1}\) and \({S}_{t2}\). We define \(g\) from \(\overline{{c }_{adj}}=4.23\), the average value of \({c}_{vv}\)*(*\({v}_{i}\), \({v}_{j}\)*)* for all couples of topologically adjacent leaves \(i\) and \(j\) in the sequences \({S}_{t}\) of our dataset, and \({w}_{gap}=3\) a weight:

$$g = w_{gap} .{ }\overline{{c_{adj} }}$$

(3)

Using a different parameter value for the terminal gaps has proven effective in aligning sequences of different lengths [^{41}]. A weight \({w}_{tml}=0.2\) is therefore used to lower the penalty of each of the \({n}_{tml}\) terminal gaps, since leaves are expected to appear and disappear successively over time. The optimal alignment between \({S}_{t1}\) and \({S}_{t2}\) is then defined as the one that minimises the global alignment cost \(C\). We define \(I\) as the set of indexes of sequences \({S}_{t1}\) and \({S}_{t2}\) with a match (i.e. without gap). With *v*_{i} and *vʹ*_{I} the vectors associated respectively with the \(i\)-th elements of the sequences \({P}_{t1}\) and \({P}_{t2}\):

$$C\left( {S_{t1} ,S_{t2} } \right) = (n - n_{tml}) .g + n_{tml} .w_{tml} .g + \mathop \sum \limits_{i \in I} c_{vv} \left( {v_{i} ,v^{\prime}_{i} } \right)$$

(4)

To find the optimal alignment between \({S}_{t1}\) and \({S}_{t2}\), the Needleman-Wunsch (NW) algorithm [42] has been adapted to consider the terminal gap weight. This algorithm is based on dynamic programming and guarantees an optimal solution for pairwise alignment.

#### Multiple sequence alignment

The initial ordering of each sequence gives a relative leaf rank to each leaf that may differ from their absolute rank, due to the fall of senescent leaves during plant development or due to segmentation errors (Fig. 4B). The alignment of all the leaves sequences in the time-series is achieved using a progressive method [^{32}] to perform the multiple alignment task. It consists in a succession of pairwise alignments of the sequences \({S}_{t}\) using the NW algorithm, in ascending temporal order. A profile is defined as an alignment of several sequences treated as a unique sequence of columns [^{43}]. Let \({\Omega }_{1 -> k}\) be the profile constituted of the alignment of the first \(k\) sequences in the time-series \((S{)}_{k=1...T}\), each sequence containing \(n\) elements after the addition of possible gaps. At each time point \(t>1\), the sequence \({S}_{t}\) is aligned with the profile \({\Omega }_{1 -> t-1}\), resulting in a new profile \({\Omega }_{1 -> t}\). Aligning \({S}_{t}\) with \({\Omega }_{1 -> t-1}\) requires a sequence-profile cost function \(c\) that we adapt from \({c}_{vv}\) [44]. Let \(\omega\) be a column of \({\Omega }_{1 -> t-1}\) of length \(t-1\) containing \(t-1-k\) gaps, and \(k\) leaves associated to the feature vectors {\({v}_{1}, \dots ,{v}_{k}\}\). Let *vʹ* be the feature vector associated with a leaf observation present in the sequence to align \({S}_{t}\). Then the cost \(c\) of the match between \(\omega\) and *vʹ* is defined as:

$$c\left( {{\upomega },v^{\prime}} \right) = \frac{1}{k}.\mathop \sum \limits_{1 \le i \le k} c_{vv} \left( {v_{i} ,v^{\prime}} \right)$$

(5)

Sequences are aligned progressively until obtaining the final profile \({\Omega }_{1 -> T}\) that aligns all the sequences in \((S{)}_{t=1..T}\), yielding a first estimation of the leaf ranks (Fig. 4C).

Finally, each tracked leaf, except the first and last, is deleted if \({n}_{k}<({n}_{k-1}+{n}_{k+1})/4\), \({n}_{k}\) being the number of times that the \({k}^{th}\) leaf appears in the time-series, to cope with possible segmentation errors (Fig. 4D).

#### Backwards tracking of growing leaves from ligulated ones

The final tracking step consists of predicting the rank of the detected growing leaves (Fig. 4E). Unlike ligulated leaves, growing leaves cannot be ordered a priori by their topology since they all emerge from the same point, and their geometry is not constant. However, it can be assumed that the shape, orientation and position of a leaf evolves smoothly over time during its growth phase, with only minor changes for an observation frequency of 24 h. Leaves can therefore be tracked backwards by associating leaf observations sharing a similar shape, from the ligulated stage to the leaf emergence. To that end, a metric \(D\) is used to quantify the dissimilarity of two growing leaves, based on the distance between their central line, given as a 3D polyline. Each 3D polyline is converted to a set of \(n=20\) points [\({pl}_{1},\dots ,{pl}_{n}]\) regularly spaced along the polyline. With \(d({p}_{1},{p}_{2})\) the euclidean distance between two points, we define the distance between two polylines such as:

$$D\left( {pl, pl^{\prime}} \right) = \mathop \sum \limits_{1 \le i \le n} d\left( {pl_{i} , pl^{\prime}_{i} } \right)$$

(6)

The algorithm tracks backwards the growth trajectory of each leaf, starting from rank 1, and finishing by the last rank. For each processed leaf, the algorithm works iteratively, from the starting point (i.e. when the leaf is ligulated) to the ending point (i.e. at leaf emergence). At each tracking step the algorithm computes the metric \(D\) between the last associated polyline and every remaining non-ligulated leaves, and selects the minimum.

#### Computation of phenotypic traits

Various phenotypic traits are extracted from the 3D + t plant reconstruction, to quantify (i) rank-based phenotypes, (ii) plant development, (iii) individual leaf development.

**Rank-based phenotype** \({x}_{pf}\) describes the variation of a morphological variable \(x\) of leaves (e.g. length, insertion height, azimuth, etc.) as a function of their position on the stem (rank). For each rank \(r\), \({x}_{pf}(r)\) represents the value of \(x\) for the leaf \(r\) once it has reached ligulation. \({x}_{pf}(r)\) is calculated as the median of the values of \(x\) associated with the ligulated leaves along the time course. Leaf observations exceeding 20 day_{20 °C} after ligulation are not considered. Here we consider the case of leaf length \(({l}_{pf}\)) and leaf insertion height \(({h}_{pf}\)).

**Plant development** is quantified at any date through the following traits:

-Stem height \({h}_{s}\) corresponds to the height of the highest collar and is directly extracted from the plant reconstruction.

-Visible leaf stage \({n}_{vis}\) corresponds to the rank of the latest emerging leaf. Let \({r}_{vis}(t)\) be the maximum rank among observed leaves at date \(t\), and \({t}_{med}(r)\) the median of time points \(t\) where \({r}_{vis}(t) = r\). We define the emergence timing \({t}_{vis}\) of the \(r\)-th leaf such as:

$$t_{vis} \left( r \right) = \frac{{t_{med} \left( {r - 1} \right) + t_{med} \left( r \right)}}{2} \left( {r \in N + } \right)$$

(7)

We deduce \({n}_{vis}\) such that \({n}_{vis}\left(t\right)={t}_{vis}^{ -1}(t)\) for each emergence timing \(t\). Finally, \({n}_{vis}\) is extended to any value of \(t\) by linear interpolation.

-Ligulated leaf stage \({n}_{lig}\) corresponds to the rank of the last ligulated leaf. Using the same method as for \({n}_{vis}\), it is deduced from ligulation timing \({t}_{lig}\) such as:

$$t_{ lig} \left( r \right) = h_{s}^{ - 1} \left( {h_{pf} \left( {r - 1} \right)} \right) \left( {r \in N + } \right)$$

(8)

Here, a piecewise constant interpolation is used to restrict \({n}_{lig}\) to integer values.

**Leaf growth** is given by the successive length value of the observed \(r\)-th leaf until ligulation.

### Validation with manual measurements

Ground-truth data was manually collected on a randomly selected subset of plants with late harvesting. This validation data was then used to evaluate tracking performance, and the accuracy of the phenotypic traits obtained with the pipeline.

Leaf ranks were annotated on 30 plants at each time point using the images (10980 annotations). Segmented leaves corresponding to artefacts (e.g. the ear of maize) were not annotated.

A subset of 10 plants were randomly selected for manual measurements on images. Leaf lengths and leaf insertion heights were measured at the ligulation stage for all ranks (113 and 173 annotations respectively). Additional leaf lengths measurements were also performed for leaf ranks 6 and 9 during their whole growth phase (234 annotations). Stem height was annotated for all time points (369 annotations).

Ligulated and visible leaf stages were measured in the greenhouse on the 355 plants with late harvesting, with an average of 7 time points per plant (2289 and 1891 annotations respectively). Ligulated leaf stage is given by an integer, while visible leaf stage is given by a real number: for example, a value of 7.4 means that the last visible leaf is of rank 7, and has reached 40% of its growth.

The phenotypic traits were compared with ground-truth observation using the following metrics: bias, root-mean-square error (RMSE), mean absolute percentage error (MAPE) and coefficient of determination (R^{2}).

The pipeline conception and the data analysis were performed with Python.