Digital imaging of root traits (DIRT): a high-throughput computing and collaboration platform for field-based root phenomics

Background: Plant root systems are key drivers of plant function and yield. They are also under-explored targets to meet global food and energy demands. Many new technologies have been developed to characterize crop root system architecture (CRSA). These technologies have the potential to accelerate progress in understanding the genetic control and environmental response of CRSA. Putting this potential into practice requires new methods and algorithms to analyze CRSA in digital images. Most prior approaches have focused solely on the estimation of root traits from images, yet no integrated platform exists that allows easy and intuitive access to trait extraction and analysis methods from images combined with storage solutions linked to metadata. Automated high-throughput phenotyping methods are increasingly used in laboratory-based efforts to link plant genotype with phenotype, whereas similar field-based studies remain predominantly manual and low-throughput.

Description: Here, we present an open-source phenomics platform “DIRT”, as a means to integrate scalable supercomputing architectures into field experiments and analysis pipelines. DIRT is an online platform that enables researchers to store images of plant roots, measure dicot and monocot root traits under field conditions, and share data and results within collaborative teams and the broader community. The DIRT platform seamlessly connects end-users with large-scale compute “commons” enabling the estimation and analysis of root phenotypes from field experiments of unprecedented size.

Conclusion: DIRT is an automated high-throughput computing and collaboration platform for field-based crop root phenomics. The platform is accessible at http://dirt.iplantcollaborative.org/ and hosted on the iPlant cyber-infrastructure using high-throughput grid computing resources of the Texas Advanced Computing Center (TACC).
DIRT is a high-volume central repository and high-throughput RSA trait computation platform for plant scientists working on crop roots. It enables scientists to store, manage and share crop root images with metadata and to compute RSA traits from thousands of images in parallel. It makes high-throughput RSA trait computation available to the community with just a few button clicks. As such, it enables plant scientists to spend more time on science rather than on technology. All stored and computed data is easily accessible to the public and the broader scientific community. We hope that easy data accessibility will attract new tool developers and spur creative data usage that may even be applied to other fields of science. Electronic supplementary material The online version of this article (doi:10.1186/s13007-015-0093-3) contains supplementary material, which is available to authorized users.


Background
Global food demand is projected to double by the year 2050 [1,2]. Meeting this increased demand requires significant improvements in crop yield and the development of crop plants adapted to water-stress [3] and low fertility soils [4,5]. Breeding more efficient roots is increasingly recognized as a high-priority target to achieve yield improvements [6] because roots are essential for nutrient and water uptake [7][8][9]. Yet, little is known regarding the relationship between root system architecture (RSA) and crop function with few examples linking root phenotype with genotype and phenotypic advantages under given field conditions [10][11][12].
Developing new crop varieties includes both laboratory- and field-based studies [13,14]. Field studies that characterize the RSA of mature field-grown crops, in particular, involve laborious manual tasks that limit the achievable sample size. Extending field-based studies and sample sizes is a widely shared goal for future phenotyping scenarios [15,16]. Indeed, phenotyping rather than genotyping is recognized as the bottleneck limiting advances [17,18], given inexpensive next-generation sequencing technologies that have paved the way for characterizing the genotypes of diversity panels of thousands of recombinant inbred lines [19]. In response, a number of national and international efforts, including the International Plant Phenotyping Network, have established "plant phenomics" centers to quantify plant phenotypes and their genetic origin [20].
Similarly, despite some successes, there are relatively few publicly available root phenotyping datasets [21]. The large datasets that are available are predominantly derived from laboratory-based root phenotyping platforms. Laboratory studies benefit from increased levels of control and, at least in a few cases, have identified loci with candidate genes underlying RSA in early root development [22,23]. However, growth containers used in these studies, filled with real or artificial soil [24][25][26][27], limit observations spatially and temporally to small or immature root systems [28,29].
Establishing a link between RSA and genotypes requires the measurement of root phenotypes [30], often derived from automatic analysis of two-dimensional and three-dimensional digital images [31][32][33][34][35][36][37][38][39]. A comprehensive overview of existing software for root image analysis is maintained at http://plant-image-analysis.org [40]. The scope of this software collection is impressive, in that individual tools provide different degrees of computational automation, ranging from manual through semiautomatic to fully automatic. However, none of these tools provides an integrated platform that can (a) associate root images with environmental and phenotypic metadata, (b) provide seamless access to scalable supercomputing resources for non-technical users and (c) share information within a collaborative team and the plant science community.
To address these issues we have developed DIRT. The DIRT platform provides a number of major functionalities that enable researchers to: (a) manage root image collections and metadata; (b) interactively calibrate measurement pipelines; (c) compute crop root traits on scalable high-throughput compute platforms; and (d) analyze the results of computations. Broadly, DIRT enables researchers to process thousands of root images through the pipeline with custom parameters and to view and analyze the computed RSA output associated with the raw images. Thus, our platform makes scalable high-throughput computational platforms available to researchers without technical expertise.

Utility
DIRT addresses the phenotyping bottleneck within the computational plant sciences by providing a single platform to meet the demands of data access and storage, exchange and sharing, and image-based high-throughput root phenotyping [41]. The DIRT platform enables users to organize and share images as datasets per experiment (Fig. 1a) and to run image processing algorithms on the datasets, such that computed root trait values can be downloaded directly from the user interface (Fig. 1b). Visual quality control is implemented as a calibration tool for the masking threshold needed to separate the root from the background (Fig. 1c) and as the possibility to inspect all intermediate image processing steps (Fig. 1d). The algorithms deployed on DIRT have been specifically designed and tested on two-dimensional images taken of root systems in the field. By focusing on crop root traits, DIRT also overcomes the time-consuming manual measurement processes involved in Shovelomics [42], while enabling measurements of manually inaccessible traits such as the dominant root tissue angle. Overall, DIRT is a unique root phenotyping platform, accessible to everybody via an interactive web-based interface without the need to install software locally on a computer.
The RSA trait computation pipeline available in DIRT is fully automated and includes automatic estimation of 78 traits in total (see Additional file 1: Section S3). Traits are categorized into common traits for all root system architectures, monocot traits, dicot traits and traits for excised root samples. We provide a separate, optional threshold calibration tool that allows the researcher to select a representative image from the marked collection and compute binary image masks using different segmentation threshold values. Within this calibration workflow, the user selects the most appropriate value by visually checking the image mask.
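The calibration idea described above can be illustrated with a minimal Python sketch. It is not DIRT's segmentation code: it assumes a bright root on a dark background and a plain global gray-value threshold, and the names `binary_mask`, `foreground_fraction` and `calibration_masks` are hypothetical. The sketch computes one binary mask per candidate threshold, so that the candidates can be compared side by side.

```python
from typing import Dict, List, Sequence

GrayImage = List[List[int]]  # rows of 8-bit gray values (0-255)


def binary_mask(gray: GrayImage, threshold: int) -> List[List[int]]:
    """Mark pixels brighter than `threshold` as root (1), all others as background (0).

    Assumption: the root is brighter than the background.
    """
    return [[1 if px > threshold else 0 for px in row] for row in gray]


def foreground_fraction(mask: List[List[int]]) -> float:
    """Fraction of pixels labeled as root; a crude aid when comparing masks."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


def calibration_masks(gray: GrayImage, thresholds: Sequence[int]) -> Dict[int, List[List[int]]]:
    """Compute one candidate mask per threshold for visual comparison."""
    return {t: binary_mask(gray, t) for t in thresholds}


if __name__ == "__main__":
    # Tiny synthetic image standing in for a representative root image.
    image = [
        [10, 12, 200, 11],
        [9, 180, 210, 14],
        [8, 13, 190, 90],
    ]
    for t, mask in calibration_masks(image, [50, 100, 195]).items():
        print(t, round(foreground_fraction(mask), 2), mask)
```

In the platform itself the final choice is visual: the user inspects the masks and picks the threshold whose mask best separates the root from the background.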
Fig. 1 Major DIRT functionalities. a A cowpea root dataset annotated with experiment parameters and location and shared with three other members (names were replaced with red bars). b The overview of the computed cowpea data set shown in (a). The computation parameters are shown along with icons of the image mask. Computed traits and entered image metadata can be downloaded as Excel-compatible .csv files. c A user can visually choose the best threshold parameter to separate the root from the background. d Each of the images in the computation shown in (b) can be assessed in detail. Every image processing step can be followed visually per image and compared to the original image and the computed traits

In response to community requests, the original trait computation pipeline in DIRT was extended. The current pipeline includes previously unpublished algorithms to measure traits such as top and bottom angle in monocots (see Additional file 1: Section S3). The pipeline is best used by following the DIRT imaging protocol to process 2D root images. In brief, a washed root is imaged against a dark, diffusely reflecting background that contains a light-colored circle of known diameter. Additionally, a barcode, QR-code or simple text can be placed above the root for automatic identification to be associated with trait computations (see Additional file 1: Figure S2). On completion of the computation, masked images, computed traits, and the corresponding CSV and RSML files [43] populate the computation view tab. See Additional files 2 and 3 for examples of produced CSV and RSML files.
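A circle of known diameter in the image can serve as a scale reference for converting pixel measurements into physical units. The following Python sketch illustrates that conversion under stated assumptions: the circle has already been segmented, its area in pixels is known, and it appears undistorted in the image. The function names are hypothetical and the sketch does not reproduce DIRT's own implementation.

```python
import math


def circle_diameter_px(area_px: float) -> float:
    """Diameter in pixels of a circle with the given pixel area (A = pi * d^2 / 4)."""
    return 2.0 * math.sqrt(area_px / math.pi)


def mm_per_pixel(circle_area_px: float, circle_diameter_mm: float) -> float:
    """Image scale derived from the segmented reference circle."""
    if circle_area_px <= 0 or circle_diameter_mm <= 0:
        raise ValueError("circle area and diameter must be positive")
    return circle_diameter_mm / circle_diameter_px(circle_area_px)


def to_mm(length_px: float, scale_mm_per_px: float) -> float:
    """Convert a length measured in pixels to millimeters."""
    return length_px * scale_mm_per_px


if __name__ == "__main__":
    # Hypothetical numbers: a 25 mm circle that covers about 7854 pixels.
    scale = mm_per_pixel(circle_area_px=7854.0, circle_diameter_mm=25.0)
    print(f"scale: {scale:.4f} mm/px, 320 px = {to_mm(320, scale):.1f} mm")
```

Deriving the diameter from the area rather than from a bounding box makes the estimate less sensitive to a few misclassified pixels on the circle boundary.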
DIRT was designed to give researchers full control over their data, whether they work individually or as part of collaborative teams. We therefore implemented sharing options in which each newly created collection is private by default. The owner of a collection can share data and computed results privately with one or many collaborators via the platform's web interface, or publish collections and computations publicly under a chosen Creative Commons license. Furthermore, DIRT enables different functions based on user access rights. The owners of data can edit, upload, download and delete images and corresponding metadata. Metadata can be associated with whole experiments or data sets to document experiment conditions (e.g. FAO soil type, GPS location, soil moisture content). The association is made either by uploading a CSV file containing the metadata or by entering the metadata via a web form directly in the web browser. In addition to the suggested standard experiment parameters, a dynamic form allows the documentation of nonstandard parameters such as nitrogen content per depth level. Similarly, each root image can be annotated manually or by uploading a pre-formatted CSV file with specific metadata (e.g. genotype, dry biomass) and may contain RSML files of manual measurements to annotate the image, e.g. from RootNav [44] (Additional file 1: Section S6.3.7).
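To make the per-image metadata upload concrete, the following Python sketch parses a CSV file and indexes the metadata by image name. The column names (`image_name`, `genotype`, `dry_biomass_g`) are hypothetical. The format DIRT actually expects is documented in Additional file 1 and is not reproduced here.

```python
import csv
import io
from typing import Dict


def load_image_metadata(csv_text: str, key_column: str = "image_name") -> Dict[str, Dict[str, str]]:
    """Map each image name to its metadata columns.

    Assumption: one row per root image, identified by `key_column`.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or key_column not in reader.fieldnames:
        raise ValueError(f"missing required column: {key_column}")
    metadata: Dict[str, Dict[str, str]] = {}
    for row in reader:
        key = (row[key_column] or "").strip()
        if not key:
            continue  # skip rows that do not name an image
        metadata[key] = {
            col: (val or "").strip()
            for col, val in row.items()
            if col is not None and col != key_column
        }
    return metadata


if __name__ == "__main__":
    example = (
        "image_name,genotype,dry_biomass_g\n"
        "root_001.jpg,G1,12.5\n"
        "root_002.jpg,G2,9.8\n"
    )
    for name, fields in load_image_metadata(example).items():
        print(name, fields)
```

Values are kept as strings because nonstandard parameters may carry arbitrary units. Any conversion to numbers would be a separate, explicit step.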
DIRT is hosted publicly on the iPlant cyber-infrastructure [45,46], leveraging its cloud data storage and the Advanced Agave API to communicate with the Texas Advanced Computing Center (TACC) for high-throughput computation of stored root images. It is built as a multi-tiered application consisting of a web server, a database server, iPlant's data store, middleware and grid computing. The core middleware components are the PHP modules interfacing the database, the iPlant data store and the grid-computing environment. DIRT's web interface is developed using the widely adopted open-source content management system Drupal (http://drupal.org). DIRT's graphical interfaces (Fig. 1) are accessible via standard web browsers and hide from the user how root images and their metadata are organized and stored in a MySQL database and iPlant's data store. The image-processing pipeline is developed in Python and runs on TACC. The trait computation pipeline is abstracted from the computational resources and from the aggregation and sharing of images. Hence, developers can extend DIRT by incorporating new pipelines adapted to distinct imaging and experiment conditions (see Additional file 1: Section S7.3). The DIRT source code and installation instructions are available for download from the DIRT website (see Additional file 1: Section S7.2) to facilitate the use of private supercomputing resources by the plant science community. As a proof of concept we have also released an installation of DIRT at Georgia Tech (http://dirt.biology.gatech.edu) that uses Georgia Tech's high-performance computing environment; instructions for a local installation of DIRT on proprietary computing resources are given in Additional file 1: Section S7.3. Altogether, DIRT is a unique root phenotyping platform that is accessible to non-technical users via an interactive web-based interface.

Design and implementation
In this section we describe the high-level view of the system to give insight into the extensibility and sustainability rationale underlying the platform design. DIRT is a multi-tiered online platform developed with the Drupal framework [47]. Drupal is an open-source content management system and framework made up of a software stack that can be used to build content-rich web applications. Figure 2 shows the three-tier architecture: the client tier constitutes the user interfaces in the web browser, the processing tier encompasses the Drupal modules and the image processing pipeline, and the storage tier consists of the database and file systems. Figure 1 shows a high-level overview of the interfaces available to DIRT users. In particular, the functional specifications of DIRT were defined to meet the demands of field root phenotyping. In the following we detail the content, component and deployment models of the DIRT system to inform developers about our extensions to Drupal.

Content model
The content model is best described as a class association model that defines the storage architecture for contents with different attributes (e.g. root images, collections, virtual collections, metadata). A class association diagram in the Unified Modeling Language (UML) [50] is a type of static structure diagram that describes the structure of a system by showing the system's contents or classes, their attributes, operations and relationships. Figure 3 is a class association diagram of DIRT's contents or classes, depicting the major attributes or fields of the contents and their relationships.
Within the Drupal framework, each content type has a set of common attributes:
• NID (Node ID): Every node or content in the Drupal system has a unique ID assigned, irrespective of the content type.
• Title: Every node or content in the system is required to have a title.
• UID: Every node or content in the system is explicitly tied to its creator, i.e. the user of the system who created it.
• Status: Every node or content in the system has one of two states, published or unpublished. This feature ensures that content is kept offline until it is valid and complete enough to be taken online.
• Created and changed: Timestamps record when a content or node was created and last changed.
• VID (version ID): Every node or content in the system maintains its version information. If enabled, all changes to a content or node are stored and maintained.
In addition to Drupal's common attributes, DIRT content types require custom attributes to meet the system's requirement specifications. Here we describe these content types briefly:
• Calibrated mask image contains attributes to associate an image with the multiple image masks created during the calibration of an original root image, and an attribute referencing the original root image in the system database.
• Computation references a marked collection, an RSA trait computation pipeline, the pipeline parameters and the traits available in a pipeline. Furthermore, Computation contains an attribute to define its visibility. A computation also contains a field of type file to link to a CSV file containing the computed RSA trait values of the referenced marked collection.
• DIRT Output defines the output produced for each raw root image by the RSA trait computation pipeline. It contains attributes to refer to a computation and the original root image. Additionally, the content type contains attributes to refer to the image mask of the original image, each RSA trait value and the output RSML file.
• Image processing pipeline has attributes for the pipeline parameters and each available trait.
• License defines attributes for the licenses supported by the DIRT platform. The License content type is associated with the computation and root image collection content types.
• Marked collection has attributes that describe a list of root images.
• Metadata has attributes that refer to a root image collection and a file that links a pre-formatted CSV file.
• Root refers to an original root image within a root image collection. Hence, the attributes hold a reference to a root image, a root image collection and each associated metadata entry.
• Root image collection has the attributes collection visibility, collection membership, collection license and all collection metadata.

Fig. 3 (caption, partial) The symbols at the end of these lines represent the association type and the text on these lines represents the attributes of the association
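The content model can be pictured with a few plain Python dataclasses. This is only an illustrative sketch: DIRT implements these content types as Drupal nodes (PHP and MySQL), not as Python classes. All field names and default values below are assumptions chosen to mirror the attributes listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Common attributes shared by every content type (Drupal node)."""
    nid: int                 # unique node ID
    title: str               # required title
    uid: int                 # ID of the user who created the node
    published: bool = False  # status: published or unpublished
    created: int = 0         # creation timestamp (Unix seconds, assumed)
    changed: int = 0         # last-change timestamp (Unix seconds, assumed)
    vid: int = 1             # version ID


@dataclass
class RootImageCollection(Node):
    visibility: str = "private"  # collections are private by default
    members: List[int] = field(default_factory=list)  # user IDs with access
    license: Optional[str] = None
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Root(Node):
    image_path: str = ""                  # reference to the original root image
    collection_nid: Optional[int] = None  # owning root image collection
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class MarkedCollection(Node):
    root_nids: List[int] = field(default_factory=list)  # list of root images


@dataclass
class Computation(Node):
    marked_collection_nid: Optional[int] = None
    pipeline_nid: Optional[int] = None
    parameters: Dict[str, float] = field(default_factory=dict)
    traits: List[str] = field(default_factory=list)
    visibility: str = "private"
    traits_csv: Optional[str] = None  # path to the CSV file of computed traits


if __name__ == "__main__":
    collection = RootImageCollection(nid=1, title="Cowpea 2014", uid=7)
    root = Root(nid=2, title="root_001", uid=7, image_path="root_001.jpg",
                collection_nid=collection.nid)
    marked = MarkedCollection(nid=3, title="calibration set", uid=7, root_nids=[root.nid])
    job = Computation(nid=4, title="run 1", uid=7, marked_collection_nid=marked.nid,
                      parameters={"threshold": 10.0}, traits=["median width"])
    print(collection, root, marked, job, sep="\n")
```

References between content types are modeled here as node IDs rather than object references, which matches how a relational database would store the associations.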

Component model
The DIRT platform consists of three major components:
1. Web server component: the Drupal components, including core, community-contributed and custom DIRT modules, that orchestrate the whole platform in concert. The content model described in the previous section is designed and implemented using these module types.
2. RSA trait computation component: the Python code used for the trait computation, which is deployed to both the web server and the grid computing node to meet the calibration and trait computation system specifications, respectively.
3. Interface component: the shell scripts that reside on the web server and the grid-computing node to interface between DIRT and the grid job scheduler.
In accordance with the Drupal architecture guidelines, DIRT is modular and every process in DIRT involves several components or modules. In Fig. 4 we show the components and their interactions in DIRT for the RSA trait computation process. The computation process in DIRT involves the user interface component, rules component, workflow component, custom DIRT components and core components. The process starts whenever new content of type "Computation" is created. The user provides a computation name, selects a "Marked Collection" and the RSA trait computation pipeline, provides the respective pipeline parameters and selects the traits to be computed. Clicking the "Save" button in the computation interface notifies the rules engine to trigger two DIRT workflows. The first workflow starts the DIRT job submission module as a background process to run the RSA pipeline on the grid-computing environment. The background process receives the configuration details of the grid job, updates the database system, changes the computation status and, if the computation started successfully, notifies the user about the computation status via email. In the second workflow, the background process schedules the DIRT job status check module to run in the background every 10 min until the job completes or terminates. On each run, the module queries the grid for the job status. If the job is completed, the computed output is transferred to the web server, the database is updated with the computed values, DIRT output contents are created and the user is notified. Each step in these workflows is in turn associated with other sub-modules or components located across different software nodes of the platform.
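The status-check workflow amounts to periodic polling until the job reaches a terminal state. The Python sketch below illustrates that pattern in a simplified form. It uses a blocking loop, whereas DIRT schedules a background module from Drupal. The status names and the `check_status` callable are hypothetical stand-ins for the real scheduler query.

```python
import time
from typing import Callable, Optional

# Hypothetical status names; the real grid scheduler may use different ones.
TERMINAL_STATES = {"FINISHED", "FAILED", "STOPPED"}


def wait_for_job(
    check_status: Callable[[], str],
    interval_s: float = 600.0,  # 10 min, as in the workflow described above
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `check_status` until the job completes or terminates.

    Returns the last observed status. `sleep` is injectable so that the
    loop can be exercised without actually waiting.
    """
    checks = 0
    while True:
        status = check_status()
        checks += 1
        if status in TERMINAL_STATES:
            return status
        if max_checks is not None and checks >= max_checks:
            return status  # give up and report the last non-terminal status
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated scheduler responses instead of a real grid query.
    responses = iter(["PENDING", "RUNNING", "FINISHED"])
    final = wait_for_job(lambda: next(responses), interval_s=0.0)
    print("job ended with status:", final)
```

On a terminal status, the platform would then transfer the output, update the database and notify the user. Those steps are omitted from the sketch.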

Deployment model
The deployment model is the static view of the run-time configuration of the processing nodes and all executed components. The deployment model defines the distribution of all DIRT components across different physical nodes in terms of folder structures and access rights. Deployment is largely automated. We therefore refer the reader to Additional file 1: Section S7 for detailed practical information.

Discussion and conclusion
DIRT is designed as a community platform. As such, we collected 10 public data sets that are available to every iPlant user. These initial data sets contain 4894 images of field-grown roots excavated with the Shovelomics technique. Four of these data sets were published on DIRT before the publication of their related projects. Furthermore, we expect the content volume to grow rapidly through additions from the plant science community. This expectation relies on the observed growth in DIRT users. At the time of publication we counted 31 users from 14 institutions, and we are confident that our users will follow the open science example of sharing open data, source code and documentation. For example, in Figs. 5 and 6 we show a typical community contribution, where a public maize dataset (Fig. 5) is used to compare and validate the manual Shovelomics traits against automatically computed DIRT traits (Fig. 6). In the given maize example, previously unavailable traits were added to DIRT (Root Top Angle, Root Bottom Angle) and subsequently validated by the DIRT user community. The data set and its stored computation results were shared on the website (http://dirt.iplantcollaborative.org/content/maize-validationset) along with a reference to the validation presented in this paper. Overall, the contributed validation showed excellent results by confirming the known correlations of stem diameter (R² = 0.69, p < 0.0001), median width (R² = 0.88, p < 0.0001) and maximum width (R² = 0.83, p < 0.0001), as well as establishing new correlations for the previously unpublished traits root top angle (R² = 0.87, p < 0.0001) and root bottom angle (R² = 0.75, p < 0.0001). Details on other public data sets can be found in Additional file 1: Section S5.
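A comparison of this kind, between manual measurements and automatically computed values of one trait, can be reproduced with a few lines of Python. The sketch assumes that R² denotes the squared Pearson correlation coefficient, which equals the coefficient of determination of a simple linear regression. The paper does not spell out the exact procedure, and the numbers in the example are invented purely for illustration.

```python
from typing import Sequence


def r_squared(manual: Sequence[float], computed: Sequence[float]) -> float:
    """Squared Pearson correlation between paired measurements."""
    n = len(manual)
    if n != len(computed) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_m = sum(manual) / n
    mean_c = sum(computed) / n
    s_mc = sum((m - mean_m) * (c - mean_c) for m, c in zip(manual, computed))
    s_mm = sum((m - mean_m) ** 2 for m in manual)
    s_cc = sum((c - mean_c) ** 2 for c in computed)
    if s_mm == 0 or s_cc == 0:
        raise ValueError("R^2 is undefined when one variable is constant")
    return (s_mc * s_mc) / (s_mm * s_cc)


if __name__ == "__main__":
    # Invented example values, not data from the maize validation set.
    manual_width_cm = [10.2, 12.5, 9.8, 14.1, 11.0]
    dirt_width_cm = [10.0, 12.9, 9.5, 13.6, 11.4]
    print(f"R^2 = {r_squared(manual_width_cm, dirt_width_cm):.3f}")
```

The p-values reported above would require an additional significance test, which is outside the scope of this sketch.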
In our experience, the simple excavation and imaging protocol enables 2-3 persons to phenotype 500-700 common bean roots per day in soil with high clay content. Here, the limiting factors are soil properties such as clay content or compactness that impede root excavation, whereas sandy soils allow fast and easy root excavation. So far, we have not reached the limits of the computing resources. However, the growing community of DIRT users will increase the computational load and will eventually reveal the limits of the current system.
We presented DIRT as an open online platform that stores and organizes root image data sets, executes RSA trait estimations and documents performed computations on root image data sets. DIRT allows contributions from the whole root phenotyping community, including users and developers, and enables sharing and documentation of experiments. We encourage users to submit images taken with the DIRT imaging protocol to make use of all DIRT features. However, proprietary imaging protocols are often supported, albeit with limitations. Additionally, our efforts to make DIRT an open-source, transparent and freely accessible tool will enable further development and adaptation of the platform in response to research demands for free public data sets [21]. Overall, DIRT is a unique computational resource that promotes automated, yet researcher-independent, root phenotyping as a response to the demands of researchers working under field conditions, to discover novel links between root morphology and the plant genome.

Fig. 4 Component diagram showing the components involved in the RSA trait computation process on the DIRT platform. In UML [50], a component diagram represents the structural relationships between the components that form larger subsystems. A component is considered an autonomous, encapsulated unit within a software system that provides one or more interfaces

Fig. 5 Screenshot from the DIRT web application. The screenshot shows the root collection overview tab for a maize validation data set collected at the Ukulima Root Biology Center in South Africa. At the top, the main menu is visible; it contains all functionality to manage root images, create marked collections, run computations and perform the threshold calibration. Individual root images are shown below, along with an informal description of the dataset, an accompanying Creative Commons license and the location of the root excavation

Availability and requirements
DIRT is freely accessible and usable at http://dirt.iplantcollaborative.org. In the spirit of open-source development, we have hosted DIRT on iPlant's cyber-infrastructure, which is open to the public. All source code is available on the DIRT GitHub repository (https://github.com/abucksch/DIRT) and on the DIRT website (http://dirt.iplantcollaborative.org/about-us?qt-about_us_quicktabs=2#qt-about_us_quicktabs). A user manual is included as part of Additional file 1.