Plant growth and data acquisition
The 32 pots involved in the experiment were placed into 32 carts on the conveyor system within a Smarthouse at the Australian Plant Phenomics Facility, University of Adelaide, where they occupied two lanes by 16 positions. There were four replicates of each treatment and a randomized complete-block design was used to assign the treatments. The plants were imaged daily from 17 to 51 DAP using RGB cameras. From these images the PSA of the plant was obtained by summing the areas as measured (in kilopixels or kpixels) from two side views at an angular separation of 90° and a view from above [2]. This resulted in 1120 PSA values, which are used as a measure of plant biomass, having been shown to be related to plant fresh weight for numerous species [2, 12, 32,33,34]. Note that watering was unintentionally interrupted for 3 days (39–41 DAP inclusive).
Calculating growth rates
The AGR and RGR between two time points, tj followed by tk, can be calculated as follows [14]:
$$AGR_{{\left( {t_{j} ,t_{k} } \right)}} = \frac{{PSA_{{t_{k} }} - PSA_{{t_{j} }} }}{{t_{k} - t_{j} }}\;\;\;{\text{and}}\;\;RGR_{{\left( {t_{j} ,t_{k} } \right)}} = \frac{{\ln \left( {PSA_{{t_{k} }} } \right) - \ln \left( {PSA_{{t_{j} }} } \right)}}{{t_{k} - t_{j} }}$$
(1)
If there are observations at several time points between tj and tk, it can be proved that the weighted mean of the AGRs and RGRs for all pairs of observed time points in the interval tj to tk, is given by the formulae in (1), when the weight for each GR being averaged is the time range for each GR. Such a GR is referred to as an interval GR and those for successive time points are called continuous GRs. That is, the AGR and RGR for a time interval from tj to tk that covers several subintervals are given by Eq. (1). The formulae can be adapted to traits other than PSA. Thus, if a WUI has been obtained for a set of time points, then the continuous and interval WUIs can be derived from the AGR formula. This method avoids the problematic assumption of a homogeneous GR for the interval, as required for the use of linear regression as outlined in Paine et al. [15].
The growth rates can also be calculated using the first derivatives of the smoothing splines. However, not all software provides them and they are more complicated to use, especially for obtaining interval GRs. Further, differencing is the only way to calculate observed or raw GRs.
SET procedures
A SET-based analysis, as exemplified for the tomato data, involved the six steps shown in Fig. 7. The first five of these steps, the SET, amount to data preparation for the remaining analysis step. In the SET, the raw data is explored, smoothed and cleaned to obtain a data set that has had transient deviations from trend removed by the smoothing and has had outlying plants that can be identified as being unambiguously flawed, removed They are graphics intensive, as is also advocated by others [15].
The SET was done using growthPheno [24] and nlme [35], both of which are packages for the R statistical computing environment [36]. The functions used from these packages are those that allow the simultaneous processing of all plants from an experiment. They require a single column for (i) a plant identifier, (ii) the times for each observation, (iii) each of the factors specifying the treatment associated with an observation, and (iv) each of the traits. The package growthPheno has options that allow the specification of the handling of missing values. While the traits that we are using as examples are botanical in nature, the software is not restricted to processing this type of trait.
Import, select and derive longitudinal data
The data from processing the images of the plants in the 32 carts in the tomato experiment over DAP 17–51 was supplied in an Excel file and imported into R [36]. The trait of interest is the projected shoot area (PSA); the PSA values were used to derive the continuous absolute growth rates (AGRs) and relative growth rates (RGRs) for adding to the data, a continuous growth rate (GR) being a GR obtained for each time of imaging. Based on Eq. (1), the continuous AGR was calculated by taking the difference between successive PSA values and the continuous RGR calculated by taking the difference between the logarithms of successive values, the time difference being one day; a value for the first imaging time is not produced for either GR.
Exploratory analysis
Profile plots for PSA, PSA AGR and PSA RGR have been produced for the exploratory analysis, a profile plot for a trait containing a line for the trait values over time for each cart.
Data smoothing
The PSA values are to be smoothed either by fitting natural cubic smoothing splines or by fitting a three-parameter logistic function to the data for each cart. For smoothing using splines, there is a choice of two methods: (i) log-smoothing for which splines are fitted to the natural logarithms of the PSA values for each cart and then backtransforming the fitted values, i.e. by taking the exponentials of the fitted values; (ii) direct smoothing, for which the spline is fitted to the untransformed data. In addition, the smoothing DF must be specified. The equation for a three-parameter logistic function is given by
$$PSA_{t} = {{\phi_{1} } \mathord{\left/ {\vphantom {{\phi_{1} } {\left\{ {1 + \exp \left[ {{{ - \left( {t - \phi_{2} } \right)} \mathord{\left/ {\vphantom {{ - \left( {t - \phi_{2} } \right)} {\phi_{3} }}} \right. \kern-\nulldelimiterspace} {\phi_{3} }}} \right]} \right\}}}} \right. \kern-\nulldelimiterspace} {\left\{ {1 + \exp \left[ {{{ - \left( {t - \phi_{2} } \right)} \mathord{\left/ {\vphantom {{ - \left( {t - \phi_{2} } \right)} {\phi_{3} }}} \right. \kern-\nulldelimiterspace} {\phi_{3} }}} \right]} \right\}}},$$
where ϕ1 is the upper asymptote, ϕ2 is the time at which half the asymptote is reached and ϕ3 is a scale parameter that is approximately the time elapsed between reaching half and three-quarters of the asymptote. It is noted that, if there is missing data for one or more time points, imputed smoothed values can be obtained for them.
To choose the smoothing method and smoothing DF for the tomato experiment, median-deviations plots were generated for smoothed data obtained from the PSA, PSA AGR and PSA RGR by (i) both direct and log smoothing in combination with four, five, six and 12 smoothing DF, and (ii) the fitting of a three-parameter logistic curve.
The median deviations plots for PSA are in Fig. 5, while the complete set is in Additional file 1: Figures S1–S3. For these plots, the median deviations are obtained by (i) calculating the deviation, observed minus smoothed value, for each plant at each time point, and (ii) calculating the median of the deviations at a single time point for the plants to be plotted in a single pane of the graph. One would expect the magnitude of the deviations to increase as the smoothing DF decreases, because there is, as ever, a trade-off between the amount of smoothing and the magnitude of the deviations from the observed trend.
The median deviations of the observed PSA values from directly smoothed PSA values for four, five and six smoothing DF deviate markedly from the observed data, especially prior to DAP 31; the median deviations for the logistic also have larger negative values up to DAP 23, especially for the +AMF treatment. The deviations in this earlier period have many values that are in excess of 10% of median smooth PSA values. It seems that direct smoothing with these DF underestimates the initial trend and the logistic overestimates it. Log smoothing with smoothing DF equal to four or five and the logistic tend to produce the largest deviations post DAP 25, while the logistic unsurprisingly produces large deviations post DAP 47. In these latter periods overestimation seems to be occurring. On the other hand, Additional file 1: Figures S2 and S3, reveal that the PSA AGR and PSA RGR are underestimated prior to DAP 23, particularly when direct smoothing is used with small smoothing DF. This is to be expected given that the PSA values range from 1.75 to 185.50 kpixels.
Taking into account all of the plots presented in Additional file 1, log smoothing with six smoothing degrees of freedom is chosen for representing the PSA trend over DAPs 17–51. This differs from the previously reported analysis [11], where direct smoothing with six smoothing DF were used because the statistical analyses focused on DAP 27–43. To confirm this choice, a comparison of the results of direct and log smoothing for PSA, PSA AGR and PSA RGR when six smoothing DF are used is made using Additional file 1: Figures S4–S6, with the comparison for PSA AGR and PSA RGR also shown in Fig. 6. These profile plots have the feature that the medians of the data for each pane of the plot are included as a black line and the outer “whiskers” are included as dashed black lines. The lower (upper) outer whisker is the median minus (plus) 1.5 times the interquartile range, the interquartile range being the 75th quartile minus the 25th quartile of the data. Points outside the outer whiskers are regarded as potential outliers. The largest difference between the two smoothing schemes occurred with sPSA RGR, where direct smoothing with small smoothing DF is clearly over-estimating the RGR in the first 3 days. The effect of the unintentional watering interruption from DAP 39–43 on subsequent growth is apparent, especially in the GRs. This is a situation in which segmented smoothing may have been appropriate. However, the smoothed trend lines for log smoothing with six DF appear to follow adequately the overall trend through this period. Besides, smoothing a short interval, such as DAP 43–51, is problematic and in this case there is a large decrease in the PSA AGR for DAP 44, especially for −AMF.
Data cleaning
In the tomato experiment, there was a plant in the +AMF treatment that was obviously much slower growing than other plants (see Additional file 1: Figures S4 and S5, where this plant is outside the lower outer whisker at times). It was a small plant whose AGR was low, but whose RGR was similar to other plants and so does not show in Fig. 6. It had low AMF root colonization and a random mutated shoot phenotype, which could explain why its behaviour was consistent with a plant that was not inoculated with AMF. As before [11], we have omitted the plant from the analyses reported here.
Extracting per-cart growth traits
The plots in Fig. 2 and the Log-6 plots in Fig. 6 and Additional file 1: Figures S4–S6 were examined by statisticians and researchers to subjectively identify time intervals during each of which growth dynamics appear to be homogeneous. The vertical dashed lines in Additional file 1: Figures S4–S6 and in Fig. 6 mark the chosen intervals: DAP 18–22 was a period of increasing AGR; DAP 22–27 was a period of even faster increases in the AGR; DAP 27–33 was a period during which the AGR peaked; DAP 33–39 shows a relatively constant rate of decrease in the AGR; DAP 39–43 is the period of restricted watering and followed by a 2-day recovery period; DAP 43–51 was a period in which the AGR is flat but fluctuating wildly. Traits can be formed from the values of sPSA, sPSA AGR and sPSA RGR for time-interval end points and the mean values of the sPSA AGR and sPSA RGR over each of the time intervals. Thus, there are potentially 33 extracted traits:
- 1.
sPSA, sPSA AGR and sPSA RGR at DAPs 18, 22, 27, 33, 39, 43, 51;
- 2.
the sPSA AGR and sPSA RGR for each of the intervals DAP 18–22, 22–27, 27–33, 33–39, 39–43, 43–51.
Analyzing per-cart growth traits
Of the 33 extracted traits, those for sPSA AGR and sPSA RGR at DAP end points will be omitted, leaving 19 extracted traits to be analyzed. The per-cart model fitting involves fitting a mixed model to each of these traits and incorporates terms to account for (i) the spatial variation that affected the experiment and (ii) the effect of treatments on the phenotypic response. The mixed model is of the following form:
$${\mathbf{y}} = {\mathbf{X{\varvec{\upbeta}}}} + {\mathbf{Zu}} + {\mathbf{e}}$$
(2)
where y is the response vector of values for the trait being analysed, β is the vector of fixed effects, u is the vector of random effects, and e is the vector of residual effects. Both X and Z are design matrices, corresponding to β and u, respectively.
For the maximal model for a per-cart trait for the tomato experiment, the fixed-effect vector β is partitioned into subvectors as follows: \(\left[ {\begin{array}{*{20}c} \mu & {{{\varvec{\upbeta}}}_{{\text{B}}} } & {{{\varvec{\upbeta}}}_{{\text{Z}}} } & {{{\varvec{\upbeta}}}_{{\text{A}}} } & {{{\varvec{\upbeta}}}_{{{\text{ZA}}}} } \\ \end{array} } \right]\), where μ is the overall mean parameter and the β subvectors are, respectively, the subvectors of Block (B) parameters, Zn (Z) parameters, AMF (A) parameters, and parameters for Zn-AMF combinations (ZA). There are no random effects and the residual vector e is assumed to be normally distributed with mean vector 0 and variance σ2I32, where σ2 is the residual variance and I32 is the identity matrix of order 32 i.e. the residuals are independently distributed with the same variance for all. To check this latter assumption residual-versus-fitted-values and normal probability plots were obtained for all traits (see Additional file 1: Figures S7–S25). There was no evidence of substantial departure from the equal variance and normality assumptions.
The R packages asreml [37] and asremlPlus [38] were used to fit the mixed model to the 19 extracted traits. Wald F-statistics, with Kenward and Roger [39] calculation of their denominator degrees of freedom, were obtained to assess the significance of the fixed effects involving Zn and AMF. These packages were also used to produce predicted values and their standard errors, the latter used to compute LSD (α = 0.05) values for comparing pairs of predicted values.
An additional analysis was performed for each of sPSA, sPSA AGR and sPSA RGR. For each, their extracted traits were combined and a joint analysis conducted.
The following tips are offered, based on our experience with using the SET method:
- 1.
When the smoothing DF are two, a straight line is fitted, and when the smoothing DF equals the number of observed time points the fitted curve goes through all of the observed points ([40], Section 3.3.4). Hence, the smoothing DF are usually greater than two and less than the number of observed time points, the actual value depending on the amount of smoothing desired.
- 2.
When looking for bias arising from the smoothing of individuals, perhaps using the median deviations plots, pay particular attention to either end of the observed range of times and when there are sudden changes in trend.
- 3.
Collect data before and after the times during which the interest in growth is concentrated. Always smooth all of the data for a plant, even if interest is restricted to a subinterval of the observed times. One reason for this is to avoid the bias in the smoothing that can occur at either end of the period of observation.
- 4.
When examining deviations plots it is expected that there will be periodic patterns in the deviations, these being due to the removed transient effects.
- 5.
Facet profile plots so that differences between the facets are maximized.
- 6.
Choose intervals based on the growth dynamics, it not being necessary for the intervals to be equal in size. Although the original trait and its AGR and RGR should be examined, it often happens that the AGR is the most important guide. Focus on broad patterns in the trends.
- 7.
Examine the profile plots for likely sources of differential variance and for anomalous individuals.
Longitudinal analysis
The aim of the longitudinal analysis was to obtain estimates of the trend over time of the PSA, PSA AGR and PSA RGR for each combination of Zn and AMF by fitting a model to describe the behaviour of the complete set of values for one of the traits, without any pre-smoothing of the data. Only the first two and the fourth steps of a SET analysis were relevant to the longitudinal analysis, because it was carried out on the raw, cleaned data. It has two components: an analysis and a prediction component. For the analysis component, both the PSA and the logarithm of the PSA are analysed, the analysis of ln(PSA) only being necessary if the estimation of the PSA RGR is required. For the prediction component, the predicted values over time are obtained for the PSA and ln(PSA) from the results of the analysis component. The estimates of the trends over time are obtained for (i) the PSA AGR, using the predicted values from the analysis of the PSA, and for (ii) the PSA RGR, using the predicted values from the analysis of the ln(PSA). For each GR, the predicted values for consecutive DAPs are differenced. Calculating the GRs in this way maintains a consistency between a GR and its parent trait and allows the calculation of the standard errors and LSD values from those for the parent trait.
In the analysis component, as in the per-cart analysis, splines were used to describe the trend. Variation at the cart (subject) level was modelled using (i) RRMSs, in which random nonlinear trends between individual carts were specified by fitting natural cubic smoothing splines with random intercepts and slopes, with a knot per observed DAP and homogeneous splines fitted so that the amount of smoothing was the same for all carts, (ii) unequal cart variance between DAPs, and (iii) first-order autoregressive correlation between different DAPs. An FRMS was used to describe the trend over time for each combination of Zn and AMF; it employed heterogeneous splines, that is, splines for which the amount of smoothing was allowed to differ between the combinations. However, the same number of equally spaced knots was specified for these splines, although four analyses were conducted with one of 10, 15, 20 or 35 knots.
The mixed model on which the analysis of these two traits is based is of the form given in Eq. (2). The maximal model for this analysis includes a subject-specific RRMS, a subject being a Block-Cart combination. It has the fixed-effect vector β partitioned into subvectors as follows: \(\left[ {\begin{array}{*{20}c} \mu & {{{\varvec{\upbeta}}}_{{\text{B}}} } & {{{\varvec{\upbeta}}}_{{\text{Z}}} } & {{{\varvec{\upbeta}}}_{{\text{A}}} } & {{{\varvec{\upbeta}}}_{{{\text{ZA}}}} } & {{{\varvec{\upbeta}}}_{{\text{D}}} } & {{{\varvec{\upbeta}}}_{{{\text{DB}}}} } & {{{\varvec{\upbeta}}}_{{{\text{DZ}}}} } & {{{\varvec{\upbeta}}}_{{{\text{DA}}}} } & {{{\varvec{\upbeta}}}_{{{\text{DZA}}}} } \\ \end{array} } \right]\), where μ is the overall mean parameter and the β subvectors are, respectively, the subvectors of Block (B) parameters, Zn (Z) parameters, AMF (A) parameters, parameters for Zn-AMF combinations (ZA), DAP (D) parameters, parameters for DAP-Zn combinations (DZ), parameters for DAP-AMF combinations (DA), and parameters for DAP-Zn-AMF combinations (DZA). The incidence matrix X is partitioned to conform to the partition of β. That is, the maximal fixed model involves a full three-factor interaction model for the factors Zn, AMF and DAPs, which provides an unsmoothed representation of the differences in the trend over the DAPs for the different Zn-AMF combinations. To specify the RRMS, and using “xD” to signify a centred, numeric covariate for the categorical factor DAPs, the random effects vector u is partitioned as \(\left[ {\begin{array}{*{20}c} {{\mathbf{u}}_{{{\text{BC}}}} } & {{\mathbf{u}}_{{{\text{BCxD}}}} } & {{\mathbf{u}}_{{\text{BCspl(xD)}}} } \\ \end{array} } \right]\), where uBC is the subvector of Block-Cart random effects, uBCxD is the subvector of random DAP slopes over xDAP for each Block-Cart combinations and uBCspl(xD) is the subvector of random spline coefficients of the spline basis functions for xDAP for each Block-Cart combination. The incidence matrix Z is partitioned to conform to the partition of u. The vector \(\left[ {\begin{array}{*{20}c} {{\mathbf{u}}_{{{\text{BC}}}} } & {{\mathbf{u}}_{{{\text{BCxD}}}} } \\ \end{array} } \right]\) is assumed to be normally distributed with mean vector 0 and variance \({{\varvec{\Sigma}}}_{{{\text{xD}}}} \otimes {\mathbf{I}}_{32}\), where \({\varvec{\Sigma}}\)xD is 2 × 2 matrix specifying the variances and covariance of the intercepts and slopes and I32 is an identity matrix of order 32, the number of Block-Cart combinations. The vector uBCspl(xD) is assumed to be normally distributed with mean vector 0 and variance \(\sigma_{{\text{BCspl(xD)}}}^{2} {\mathbf{G}}_{{\text{BCspl(xD)}}}\), where GBCspl(xD) is a matrix derived from the knot points for the fitted spline. The residual vector e is assumed to be normally distributed with mean vector 0 and variance \({\varvec{\Sigma}}\)1120, where, for y ordered by Block-Cart then DAP, all elements are zero except for 32 diagonal blocks, one for each Block-Cart combination, The variance matrix for the ith Block-Cart, \({\varvec{\Sigma}}_{i}\), is 35 × 35 and allows for different variances for different DAPs and first-order autoregressive correlation between DAPs i.e. correlation that decreases according to a power law as the number of intervening DAPs between a pair of DAPs increases. Formally,
$${{\varvec{\Sigma}}}_{i} = \left[ {\begin{array}{*{20}c} {\sigma_{1} } & 0 & \cdots & \cdots & 0 & 0 \\ 0 & {\sigma_{2} } & \cdots & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & {} & \vdots & \vdots \\ \vdots & \vdots & {} & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & \cdots & {\sigma_{34} } & 0 \\ 0 & 0 & \cdots & \cdots & 0 & {\sigma_{35} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} 1 & {\rho^{1} } & \cdots & \cdots & {\rho^{33} } & {\rho^{34} } \\ {\rho^{1} } & 1 & \cdots & \cdots & {\rho^{32} } & {\rho^{33} } \\ \vdots & \vdots & \ddots & {} & \vdots & \vdots \\ \vdots & \vdots & {} & \ddots & \vdots & \vdots \\ {\rho^{33} } & {\rho^{32} } & \cdots & \cdots & {\sigma_{34} } & {\rho^{1} } \\ {\rho^{34} } & {\rho^{33} } & \cdots & \cdots & {\rho^{1} } & {\sigma_{35} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {\sigma_{1} } & 0 & \cdots & \cdots & 0 & 0 \\ 0 & {\sigma_{2} } & \cdots & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & {} & \vdots & \vdots \\ \vdots & \vdots & {} & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & \cdots & {\sigma_{34} } & 0 \\ 0 & 0 & \cdots & \cdots & 0 & {\sigma_{35} } \\ \end{array} } \right]$$
where \(\sigma_{j}\) is the cart standard deviation on the jth DAP and \(\rho^{{\left| {t_{k} - t_{j} } \right|}}\) is the correlation between DAPs tk and tj, being the correlation between consecutive DAPs, ρ, raised to the power equal to the number of DAPs between the two DAPs. In terms of variation, this model allows for Block differences, random variation between carts, random variation between carts in the curved trend over the DAPs that each follows and random deviations from this trend on a particular DAP for a particular cart; this last variation varies from one DAP to the next and there is correlation between observations on different DAPs that is strongest for consecutive DAPs.
The model selection strategy for a response proceeded in stages as follows:
- 1.
Select the model to describe the spatial and temporal variation present in the response The maximal mixed model was fitted to the response. Then tests of whether the random model could be simplified were carried out: is unequal variance between times needed? Is the autocorrelation significant? Can the spline component for variation in time trends between carts be removed to leave a random linear term? If variance components were estimated to be very close to zero, they were either removed, if they were single-component terms, or fixed at \(1 \times 10^{ - 4}\), if they were one of the DAP variances. The latter is justified in that it is indicating that the variation in variance for that DAP is derived entirely from variance in the curved trend. Unsmoothed predictions for the combinations of Zn, AMF and DAPs are obtained from this model.
- 2.
Explore the amount of smoothing to apply to the FRMS describing the time trend of each combination of Zn and AMF For this, the three-factor model for Zn, AMF and DAPs was reparameterized to specify an FRMS that includes fixed linear and random spline terms, based on xDAP a centred, numeric covariate for the DAPs, and random deviations terms for DAPs; heterogeneous spline terms for each combination of Zn and AMF were specified leading to separate variance components for each combination and hence different amounts of smoothing for them. That is, the partition of β was modified to \(\left[ {\begin{array}{*{20}c} \mu & {{{\varvec{\upbeta}}}_{{\text{B}}} } & {{{\varvec{\upbeta}}}_{{{\text{ZA}}}} } & {{{\varvec{\upbeta}}}_{{{\text{xDZA}}}} } \\ \end{array} } \right],\) where the βZA contains the intercept parameters for each combination of Zn and AMF and βxDZA contains the fixed slope parameters over xDAP for each combination of Zn and AMF. The random vector u is partitioned into \(\left[ {\begin{array}{*{20}c} {{\mathbf{u}}_{{\text{het(ZA)spl(xD)}}} } & {{\mathbf{u}}_{{{\text{ZAD}}}} } & {{\mathbf{u}}_{{{\text{BC}}}} } & {{\mathbf{u}}_{{{\text{BCxD}}}} } & {} \\ \end{array} {\mathbf{u}}_{{\text{BCspl(xD)}}} } \right]\), where uhet(ZA)spl(xD) is the subvector of random spline coefficients of the spline basis functions for xDAP with heterogeneous variance components for each Zn-AMF combination and uZAD is the subvector of random deviations from the fitted curved trend for each Zn-AMF-DAPs combinations The degree of smoothing was altered by fitting the FRMS with 10, 15, 20 or 35 equally-spaced knots for the Zn by AMF by spl(xDAP) term, uZAspl(xD). For each fit, a test for the significance of the random deviations was conducted.
- 3.
Choose the number of knots by obtaining the predictions for different amounts of smoothing for each trait and calculate GRs from each set of predictions The predictions for the observed values of xDAP in combination with the levels of Zn and AMF were obtained, along with the predictions and LSD (5%) for comparing predictions at the same DAP. The predictions and half-LSD intervals were plotted for each of the four knot numbers and subjectively compared. The calculation of the predicted growth rates were obtained by taking differences between consecutive predictions of either the PSA or the ln(PSA). The LSD (5%) for the predicted growth rates were calculated from those of the corresponding primary response, PSA or ln(PSA). Also, to examine the fit of these (smoothed) predictions, plots of the trend deviations were produced, the trend deviations being the difference between the unsmoothed predictions and the smoothed predictions. These are analogous to the median deviations used in the SET.
- 4.
Look to simplify the FRMS describing the effect of Zn and AMF on the trend over DAPs Firstly, for the chosen number of knots, a test of significance was conducted to determine if the ZAD random deviations were significant; if they were not significant, they were dropped from the model. Then tests of significance were made to compare models with heterogeneous spline terms for each Zn-AMF combination, heterogeneous spline terms for each AMF level, heterogeneous spline terms for each Zn level and a single, homogeneous spline term across all Zn-AMF combinations. If none of the models with heterogeneous spline terms was significant, then a third parameterization of the model was fitted, this one incorporating interactions between (i) Zn, (ii) AMF, and (iii) the Zn by AMF interaction with xDAP and spl(xDAP) and DAPs, where each spline term has a single, homogeneous spline component. In this case the partition of β was modified to \(\left[ {\begin{array}{*{20}c} \mu & {{{\varvec{\upbeta}}}_{{\text{B}}} } & {{{\varvec{\upbeta}}}_{{\text{Z}}} } & {{{\varvec{\upbeta}}}_{{\text{A}}} } & {{{\varvec{\upbeta}}}_{{{\text{ZA}}}} } & {{{\varvec{\upbeta}}}_{{{\text{xDZ}}}} } & {{{\varvec{\upbeta}}}_{{{\text{xDZA}}}} } & {{{\varvec{\upbeta}}}_{{{\text{xDZA}}}} } \\ \end{array} } \right]\) and that for the random vector u to \(\left[ {\begin{array}{*{20}c} {{\mathbf{u}}_{{\text{Zspl(xD)}}} } & {{\mathbf{u}}_{{\text{Aspl(xD)}}} } & {{\mathbf{u}}_{{\text{ZAspl(xD)}}} } & {{\mathbf{u}}_{{{\text{ZD}}}} } & {{\mathbf{u}}_{{{\text{AD}}}} } & {{\mathbf{u}}_{{{\text{ZAD}}}} } & {{\mathbf{u}}_{{{\text{BC}}}} } & {{\mathbf{u}}_{{{\text{BCxD}}}} } & {} \\ \end{array} {\mathbf{u}}_{{\text{BCspl(xD)}}} } \right]\). The difference between this and the previous parameterization is that main effect terms for Zn and AMF have been included for each of the intercepts, slopes and splines. For each response, a test of significance was conducted to determine if the ZAD random deviations were significant; if they were not significant, they were dropped from the model and tests of the ZD and MD random deviations conducted, dropping any nonsignificant deviations. On completion of these tests, a test was conducted to establish whether the ZAspl(xD) spline term was significant; if it was not significant, it was dropped from the model and tests of the Zspl(xD) and Mspl(xD) spline terms conducted, dropping any nonsignificant spline terms. The predictions from the fitted models were obtained for the Zn-AMF combinations over the observed DAPs and the growth rates calculated from them.
Again, the R packages asreml [37] and asremlPlus [38] were used to fit the mixed models to both response variables. Testing for variance terms used Restricted Maximum Likelihood Ratio Tests (REMLRT), the calculation of the p-value being adjusted when the test involved a variance component constrained to be nonnegative [21]. Wald F-statistics, with Kenward and Roger [39] calculation of their denominator degrees of freedom, were obtained to assess the significance of the fixed effects. These packages were also used to produce predicted values and their standard errors, the latter used to compute LSD (α = 0.05) values for comparing pairs of predicted values.
To investigate the adequacy of the fitted models, residual versus-fitted-values, boxplots of the residual for each DAP and normal probability plots of the residuals were obtained and are presented in Additional file 3: Figures S10–S15. The residual versus-fitted-values and residual boxplots appear to be satisfactory, there not being any evidence of heterogeneity of variance. However, the normal probability plots indicate that the normality assumption underlying the longitudinal analysis is not met. On the other hand, the shape made by the residuals in the plot indicates that the data are symmetrically distributed. It is concluded that the results of the analyses will be approximately correct, especially given the large number of observed values in each analysis.