### Greenhouse setup and data collection

Data presented and used in this work is coming from a previous study that has been published [26]. A summary of the greenhouse setup and data collection of the materials and methods are presented here. Sesame was grown in 64 custom-made rhizoboxes. Each rhizobox was filled with 1550g of inert calcine clay (Turface Athletics, Buffalo Grove, IL, USA), henceforth referred to as soil. A single seed from one of four non-dehiscent sesame cultivars, all provided by Sesaco Inc. (Sesaco32, Sesaco35, Sesaco38 and Sesaco40, referred to as S32, S35, S38 and S40 thereafter), was planted per box. Four soil water content treatments were implemented: 60, 80, 100 and \(120\%\) of the soil water holding capacity, corresponding to 688, 918, 1147 and 1376 mL of water per rhizobox, respectively. The soil and water were mixed thoroughly together before filling the rhizoboxes to promote homogeneity of the soil water content throughout. The top of each rhizobox was covered with Press’N Seal Cling Wrap (Glad, Oakland, CA, USA) to minimize water evaporation. No water was added afterwards throughout the experiment with the assumption that soil water content, in the absence of evaporation, stayed relatively constant throughout the 16–21 days of each run experiment. A small hole was pierced in the plastic wrap upon seedling emergence to allow for leaf and stem growth. The two factors were arranged in a complete randomized design with 4 replications in a greenhouse where the daily temperature was maintained between 25 and 35 degrees Celsius.

Four runs of 64 rhizoboxes were completed. Run 1 and 2 were prepared with soil and water only, while run 3 and 4 were fertilized with 1.51*mg* of 15-5-15 + Ca + Mg Peters Excel mix fertilizer (ICL Specialty Fertilizers, Summerville SC) and 0.29*g* of ammonium sulfate were dissolved in the water applied to each rhizobox prior to mixing with the soil. Baking soda was added as needed to neutralize the fertilizer solution to a pH of 6. Plants were grown for a duration of 21 days after planting (DAP) for the runs without fertilizer and 16 DAP for the runs with added fertilizer due to their fast growth.

On the last day of the experiment, all rhizoboxes were scanned with a Plustek OpticSlim 1180 A3 flatbed scanner (Plustek Inc., Santa Fe, CA, USA) to generate the raw images with the size \(6800\times 4676\) at a resolution of 400 Dots Per Inch (DPI). Rhizoboxes were all set on the scanner the same way using a physical guide so that all rhizoboxes had the same position in the scanned images. For each run, images from seeds that did not germinate or seedlings that died during the experiment were not considered in the analysis; 11 images were thus removed from Run 1, 10 from Run 2, 11 from Run 3 and 5 from Run 4. The total number of analyzed images was 107 Runs 1 and 2, and 113 for Run 3 and 4. Representative RGB image samples for each cultivar and water level are shown in Additional file 3: Figure S3 and Additional file 4: Figure S4. The roots from all analyzed images were hand-traced with \(\hbox {WinRHIZOTron}^{\hbox {TM}}\) (Regent Instruments Inc., Quebec, Canada) and total root length (TRL) was obtained.

### Spatial and Texture Analysis of Root SystEm distribution with Earth mover’s Distance (STARSEED)

The overall pipeline of our proposed method is illustrated in Fig. 1 and explained in more detail in Algorithm 1. STARSEED consisted of five major steps. The first step was to take the input images and perform preprocessing to isolate the roots pixels from the background to focus on the architecture. For step 2, we divided the image into equal-sized bins by generating local regions of the image to improve computational efficiency and incorporate spatial information. Step 3 involved extracting texture features to describe the root pixels within each bin and generate a compact representation of the image (i.e., signature). The fourth step used EMD to measure the magnitude and direction of changes between each root signature. Lastly, we projected the pairwise distance matrix to perform qualitative and quantitative analysis of the root images and their corresponding treatment. Details and rationale behind each step of this algorithm are provided in the following subsections.

*Image Preprocessing—Step 1* The initial step of the proposed approach was to perform preprocessing of the images to isolate the root pixels from the background. Raw images were manually traced using \(\hbox {WinRHIZO}^{\hbox {TM}}\) Tron software to generate the labels for root pixels in each image. The black and white binary images were reconstructed based on these labels, in which pixels that corresponded to roots were assigned a value of *1* in the mask and the non-root pixels were assigned a value of *0*. The binary images were then cropped and calibrated, and downsampled by a factor of eight through average pooling and smoothed by a Gaussian filter (\(\sigma =1\)) to mitigate noise before extracting texture features.

*Sub-region Generation and Feature/Signature Extraction—Steps 2 & 3* In order to apply EMD to characterize and compare RSDs using STARSEED, we constructed a signature representation, a composition of texture features computed spatially, for each image. Since texture is undefined at a single point [28], a local neighborhood needed to be identified to compute the texture features comprising the signature. In our approach, we divided each image into bins of equal sizes using a grid, each bin serving as the local neighborhood. Within each bin, texture features were computed to locally characterize the RSD. The size of the grid/bins provides trade-offs between computational efficiency and localization. For example, a larger grid yielding large bins corresponds to less computation cost, but each texture feature is computed over a larger area, resulting in a loss of localization. The computational time for the method scaled linearly with the number of bins used (e.g., more bins corresponded to increased computational time) and this information is captured in Additional file 1: Figure S1 and Additional file 2: Figure S2. For the proposed method, the grid size that corresponded to the maximum CH score was used to further analyze STARSEED as discussed in the following Experiments and Results section.

Each bin of the image was represented through a cluster representative. The cluster representative consisted of spatial coordinates (i.e., horizontal and vertical position of the bin center) and a weight to create a \(\mathbb {R}^3\) vector to represent a local spatial bin of the image. The weight given to each bin was computed by the texture feature extracted from the pixels in the bin. An example of steps 2 and 3 is shown in Fig. 2. We calculated three feature values in each bin: percentage of root pixels, fractal dimension, and lacunarity. To calculate the percentage of root pixels, we computed the total number of root pixels present in each bin divided by the area (i.e., total number of pixels) of that same bin. The percentage of root pixels provides direct insight into the distribution of roots in the image. If a region of the image has denser roots, the percentage of root pixels in the bins present in that region will be higher. The final representation of the image was a signature, which was the set of clusters from the image. By constructing our signature in this manner, we incorporated spatial (locally at the bin level and globally over the whole image) and texture information, to represent the root distributions.

*Novel Earth Mover’s Distance Application—Step 4* EMD can be used to effectively evaluate certain characteristics between different distributions, including images [24]. Images are comprised of pixels and these pixels can be clustered or assigned to meaningful groups based on shared characteristics such as spatial location. The set of these clusters are used to form a *signature*, a more compact representation of the image to increase computational efficiency [24]. Given an image with *C* clusters, the signature representation, *P*, is given by Eq. 1 where \(\textbf{p}_i\) is the cluster representative and \(w_{\textbf{p}_i}\) is the weight of the cluster:

$$\begin{aligned} P = \{(\textbf{p}_1,w_{\textbf{p}_1}),...,(\textbf{p}_C,w_{\textbf{p}_C})\}. \end{aligned}$$

(1)

Typically, the cluster representative is a feature vector and the weight is the percentage of pixels in a cluster expressing that feature [24]. While the selection of the cluster representative is application dependent, defining the cluster representative as the information/descriptor (i.e., features) and importance (i.e., weight) provides a clear interpretation for EMD within disparate areas of an image. In our proposed STARSEED method, the signature representations were generated as mentioned in Step 3.

Once the signature representations are constructed, we used EMD to generate a pairwise distance matrix \(\varvec{D}\) between each pair of RSD signatures, choosing one RSD signature as the reference within each pair. After determining the magnitude of changes between two clusters, the minimal flow matrix, \(\textbf{F}\), is used to compute EMD [24]. Given two image signatures, *P* and *Q*, with *S* and *T* representatives respectively, EMD is formulated in Eq. 2 where \(d_{ij}\) is the ground distance between the centroid of regions \(\textbf{p}_i\) and \(\textbf{q}_j\) and \(f_{ij}\) is the optimal flow between \(\textbf{p}_i\) and \(\textbf{q}_j\):

$$\begin{aligned} EMD(P,Q) = \frac{\sum _{i=1}^{S}\sum _{j=1}^{T}d_{ij}f_{ij}}{\sum _{i=1}^{S}\sum _{j=1}^{T}f_{ij}} \end{aligned}$$

(2)

EMD captures the dissimilarity between spatial distributions: larger values indicate more dissimilarity or “work” to move the defined “earth” feature to the cluster representative. EMD allows one to measure the global change between two spatial distributions (i.e., distance measure) as well as local changes between the two sources of information through the flow matrix, \(\varvec{F}\). If we can translate a 2D root image into a spatial distribution of the values for a given texture feature calculated from this image, EMD would then be able to measure the changes in this distribution from one root system image to the next. Consequently, EMD would provide a two-level comparison between these distributions: one at the global scale (i.e., the whole distribution or the whole image) that we can term “holistic” through the EMD measure, and one at the local scale through the flow matrix that shows the distinct magnitude and direction of changes between each image.

In our experiments, we first computed EMD between each pair of images, separately between Runs 1–2 and Runs 3–4. Regions containing no roots were removed from our calculations to mitigate the impact of background on our results. Therefore, we were comparing root signatures of different sizes. In order to not favor smaller signatures, the normalization factor was added to the EMD calculation as shown in Eq. 2. Any distance can be used to define the ground distance, \(d_{ij}\) [24]. To consider the root spatial distribution, we compared the center location of each spatial bin by selecting the Euclidean distance as our ground distance to compute \(d_{ij}\).

*Projection Using MDS—Step 5* Once the distances between each image signature representation were computed to generate the pairwise distance matrix, we can use a dimensionality reduction approach to project the matrix into two dimensions. We projected the data into two dimensions to qualitatively and quantitatively evaluate the pairwise distance matrix. We used multi-dimensional scaling (MDS) [27] as this method had been shown to work well with EMD to identify patterns among images that share some characteristics [24, 29].

To illustrate step 5 in the algorithm, Fig. 3 provides a visualization of the EMD matrix projection in a 2D space through MDS for Run 3 and Run 4 with the percentage of root pixels as the feature calculated for the highest number of spatial bins (2000 in each image). The output from the MDS method is an \(\mathbb {R} ^{N \times 2}\) embedding matrix, *E*, that is a projection of the pairwise distance matrix in the Euclidean space. Once *E* is computed, the embedding is then used in Step 6 to calculate Calinski-Harabasz (CH) index [30] to quantitatively assess relationships among the root images when considering the treatment values (cultivar and water levels).

*Assessment of Relationships for RSD - Steps 6* Lastly, we calculated the CH [30] index to measure the intra-cluster similarity (i.e., RSD representatives of the same cultivar or water level) and inter-cluster dissimilarity (i.e., RSD representatives of different cultivar or water level) across treatment levels in each pair of runs. By using the EMD approach, we were able to score the differences between each treatment level, highlighting: (a) the magnitude and (b) the location of the RSD dissimilarities. Ideally, samples that belong to the same cluster should be “close” (i.e., smaller EMD distances or intra-cluster variance) while samples from different clusters should be “well-separated” (i.e., larger EMD distances or inter-cluster variance). When the clusters are dense and well-separated, the CH index is larger. Finally, a non-parametric Kruskal-Wallis test was performed on the CH scores within each treatment for each pair of runs across features, and then separately for each feature across cultivars or across water levels. When the test was rejected (\(p<.05\)), Dunnett’s test with Bonferroni’s adjustment was used to compare treatment levels with one another.