Phylogenomic synteny network analyses reveal ancestral transpositions of auxin response factor genes in plants

Background Auxin response factors (ARFs) have long been a research focus and represent a class of key regulators of plant growth and development. Integrated phylogenomic synteny network analyses were able to provide novel insights into the evolution of the ARF gene family. Results Here, more than 3500 ARFs collected from plant genomes and transcriptomes covering major streptophyte lineages were used to reconstruct the broad-scale family phylogeny, where the early origin and diversification of ARF in charophytes was delineated. Based on the family phylogeny, we proposed a unified six-group classification system for angiosperm ARFs. Phylogenomic synteny network analyses revealed the deeply conserved genomic syntenies within each of the six ARF groups and the interlocking syntenic relationships connecting distinct groups. Recurrent duplication events, such as those that occurred in seed plants, angiosperms, core eudicots and grasses contributed to the expansion of ARF genes which facilitated functional diversification. Ancestral transposition activities in important plant families, including crucifers, legumes and grasses, were unveiled by synteny network analyses. Ancestral gene duplications along with transpositions have profound evolutionary significance which may have accelerated the functional diversification process of paralogues. Conclusions The broad-scale family phylogeny in combination with the state-of-art phylogenomic synteny network analyses not only allowed us to infer the evolutionary trajectory of ARF genes across distinct plant lineages, but also facilitated to generate a more robust classification regime for this transcription factor family. Our study provides insights into the evolution of ARFs which will enhance our current understanding of this important transcription factor family.


Background
The plant hormone auxin, or indole-3-acetic acid, controls many physiological and developmental processes in land plants including but not limited to organogenesis, tissue differentiation, apical dominance, gravitropism as reviewed previously [1,2]. Completion of the genomes of the moss Physcomitrella patens, [3,4], the liverwort Marchantia polymorpha [5] and the lycophyte Selaginella moellendorffii [6] revealed that many core functional proteins required for auxin biosynthesis, perception, and signaling were present in the early-diverging land plant lineages. Comprehensive evolutionary studies also suggested that the auxin molecular regulatory network evolved at the latest in the common ancestor of embryophytes [7,8] and the auxin response factor (ARF) genes evolved from

Open Access
Plant Methods *Correspondence: cmx2009920734@gmail.com; jzhang@hkbu.edu.hk 1 State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong, China 4 CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China Full list of author information is available at the end of the article the charophyte ancestors [9]. A recent review demonstrated that with the exception of the PGP/ABCB auxin transporters, homologues of all the other core components for hormonal control of physiology and development by auxin could be identified in P. patens [10]. Changes in auxin perception and signaling that occurred through evolution could have generated the diversification of plant forms that occurred during the past ~ 474-515 million-year history of the land plants [11], which eventually led to the complex vegetative innovations that shape the modern terrestrial and freshwater ecosystems [2].
Auxin response factors, as core components in auxin signaling, have long been a focus of plant signaling research [12]. The 23 ARFs identified in the Arabidopsis thaliana genome were phylogenetically clustered into three subfamilies (Clades A, B and C) which were subsequently divided into seven groups (ARF9, ARF1, ARF2, ARF3/4, ARF6/8, ARF5/7 and ARF10/16/17), a classification that was well supported by ARF genes from other angiosperms and representative non-flowering lineages [13]. Generally, ARF proteins can be functionally divided into transcriptional activators (ARF5-8 and 19 in A. thaliana) and repressors (remaining ARFs in A. thaliana) with well-characterized functional domain architectures [13,14]. ARFs bind to the auxin response elements (AuxRE: TGT CTC ) in the promoter region of downstream auxininducible genes [15] and function in combination with Aux/IAA repressors, which dimerize with ARF activators in an auxin-regulated manner [14,16]. Unlike ARF activators, few reports have demonstrated that ARF repressors are able to interact with other ARF proteins or Aux/IAA proteins [17]. Recent work revealed a newly identified mechanism whereby the IAA32 and IAA34 transcriptional repressors are stabilized by the transmembrane kinase 1 (TMK1) at the concave side of the apical hook of the kinase to regulate ARF gene expression and ultimately inhibit growth [18].
In most of the well-established transcription factor annotation procedures, such as those implemented by the PlnTFDB [19], PlantTFDB [20], iTAK [21] and TAPScan [22], ARFs were identified using two signature domains: the B3 (PF02362) domain and the auxin-response (PF06507) domain, although some ARF proteins (e.g. ARF23 in A. thaliana) may be truncated and lack the auxin-response domain [13,14,23]. Finet et al. [13] established a robust and comprehensive phylogenetic framework for the ARF gene families, however ARF genes from non-flowering plants were under-represented. Comprehensive annotation of transcription factors covering distinctive plant clades demonstrated that a number of plant specific transcription factor families (including ARF) evolved in streptophytic algae [9,22,24], suggesting an earlier origin of ARF than that proposed by Finet and colleagues [13].
Compared to conventional gene family studies that focus on one or a limited number of species of interest [25][26][27], phylogenetic studies on a broader scale that include multiple plant lineages were able to generate more robust insights into the evolutionary process that gave rise to the modern assemblage of a target gene family [13,28]. The inclusion of genomic synteny data provides important information that impacts the determination of the evolutionary past of a gene family, especially when the gene family of interest evolved in parallel with ancestral genome duplication events [29]. The conventional genomic block alignment that connects orthologues, retained on genomic syntenic blocks, worked well for a limited number of species [29,30], but a network approach was more effective when multiple genomes were included in the synteny analyses [31,32]. A comprehensive genomic synteny network can be constructed using nodes to represent the target genes and associated adjacent genomic blocks and the network edges (connecting lines) to represent syntenic relationships [29,32]. The recently established phylogenomic synteny network methodology was able to integrate and summarize genomic synteny relationships to uncover and place genomic events (e.g. ancient tandem duplications, lineage-specific transposition activities) into the evolutionary past of a target gene family [31,32].
In this study, we collected more than 3500 ARF members to generate a comprehensive gene-family phylogeny with the aim of filling evolutionary gaps in the non-flowering plants and splitting the long branches present in the current phylogeny [13]. We propose an updated model for the evolution of ARF family that covered the major streptophytic clades that was based on the six-group classification system we proposed for the ARF genes in angiosperms. Phylogenomic synteny network analyses of angiosperm genomes revealed the deep positional conservation of ARF members within each of the six groups. Detailed individual synteny network analyses together with phylogenetic reconstructions for the six ARF groups revealed their distinctive evolutionary histories. Ancestral duplication events in angiosperms, and subsequent WGDs in eudicots and monocots have contributed to the expansion of ARF members. Ancestral lineage-specific transpositions in important angiosperm families such as cucifers, legumes and grasses were also unveiled. Together, the results presented here add to our current understanding of the evolutionary process that established ARF genes in plants. We also expect this broadscale evolutionary framework could help direct future functional studies that further explore the interplay between auxin signaling and the evolution of land plants.

Evolution of auxin response factors in streptophytic algae
To generate a broad-scale phylogenetic profile for ARF genes in plants, we collected a total of 3502 ARF homologues in the streptophytes. ARFs were present in all major clades of streptophytes including charophytes, hornworts, liverworts, mosses, lycophytes, ferns and seed plants (Additional file 1: Fig. S1). In chlorophytes, the Auxin-response domain was not detected although some chlorophyte genes did contain the B3 domain. Genes containing both the B3 (PF02362) and the Auxinresponse (PF06507) domains were identified in streptophytic algae (charophytes). This was consistent with the observation that a number of plant-specific transcription factors evolved in streptophytic algae [22]. Charophytes represented a paraphyletic clade encompassing successive sister lineages to the land plants [22,33]. We identified ARF homologues in species that are found in three charophyte orders: Zygnematales, Coleochaetales and Chlorokybales, but ARF homologues were not identified in the transcriptomes of Charales and Klebsormidiales. However, the presence of ARF in charophytes was affirmed by the Chara braunii (Charales) genome [24]. The identification of the single ARF gene in Chlorokybus atmophyticus (Chlorokybaceae) suggests that the origin of ARF genes probably trace back to the root position of streptophytes (Additional file 1: Fig. S1). This observation suggests an earlier origin of ARF gene than those reported earlier [13,22] and consistent with a previous study [8].

Broad-scale phylogenetic profile of ARFs in plants
Overall, the numbers of ARF genes in individual angiosperm genomes are greater than those in the individual genomes of non-flowering plants and the 'recent' polyploids, such as Glycine max, possess conspicuously more ARF genes than other plants (Additional file 1: Fig. S2). The inclusion of homologues identified from the 1KP transcriptome database provided a comprehensive atlas for the ARF family phylogeny. Overall, the broad-scale phylogeny of ARFs generated in this analysis was closely in parallel with the phylogenetic relationships among plant lineages (Fig. 1) derived from large-scale phylotranscriptomic study [34]. The phylogenetic tree generated from the ARF gene collection was rooted by the ARF gene from Chlorokybus atmophyticus, an early diverging charophyte, and exhibited a consistent tree topology with that reported previously [13]. The incorporation of transcriptomic data from non-flowering plants enabled long evolutionary branches to be split. The phylogenetic analyses also provided robust evidence that angiosperm ARFs could be separated clearly into three major subfamilies (Clade A, B and C; consistent with previously reported groupings [13]). The three major subfamilies encompassed the six groups (designated as Group I through VI) in this study (Fig. 1). In comparison to the classification proposed by Finet et al. [13], the Group-I ARFs contained the ARF1 and ARF9 subfamilies (which were likely to have derived from an ancient angiosperm-wide duplication) and Group II through VI correspond to the ARF 2, ARF 3/4, ARF 6/8, ARF 5/7 and ARF 10/16/17 subfamilies (Additional file 1: Table S1), respectively.
Groups I, II and III were clustered in the subfamily clade-B, groups IV and V in clade-A and group VI in clade-C. The reconstructed family phylogeny suggested clade-C as a sister group to clades A and B. ARFs from the charophytes (Zygnematales and Coleochaetales) were separated into clade-C and clade-B failing to partition into a basal mono-or para-phyletic clade, which suggested an ancient diversification of ARF genes within the charophytes. The family phylogeny also revealed that each of the six angiosperm ARF family groups were located with gymnosperm ARF genes as the closest sister lineage. The tree branches of gymnosperm ARF genes are conspicuously shorter than those for angiosperms ( Fig. 1 and Additional file 1: Fig. S3), which suggested lower amino acid substitutional rates and higher levels of protein sequence conservation in gymnosperm ARF genes, likely a result of longer generation times that are common in the gymnosperms [35].
Clade-A contains ARF genes that cover all major embryophyte clades and contains ARF genes of group-III together with orthologues from gymnosperms and ferns. The ARF genes from seed plants and ferns were separated into two major clades which are sister to each other which constituted a tree topology that was consistent with two child clades derived from an ancient duplication. While lycophyte ARF genes were placed outside of and sister to the large duplication clade shared by ferns and seed plants. ARF genes from hornworts were identified as basal-most in clade-A, followed by genes from mosses and liverworts.
Clade-B was the most diversified lineage containing the angiosperm group I and II genes and along the gymnosperm orthologous genes delineated a conspicuous seedplant duplication (the event) [36]. However, ARF genes from hornworts, liverworts and ferns were mixed into this large duplication clade (Fig. 1). We hypothesize that they might be derived from convergent evolution, though the possibilities of horizonal gene transfer or sequence contaminations cannot be eliminated. Genes from ferns, mosses, liverworts and lycophytes were placed as successive sister lineages to this duplication clade.
Clade-C was situated as the basal clade with a relatively simple phylogenetic profile and contains genes from every major plant lineage (from charophytes to angiosperms, Fig. 1). This configuration updated the evolutionary model in which clade-C ARFs were absent in gymnosperms [13].
The broad-scale phylogenetic analyses in this study established a robust and unified six-group classification system for angiosperm ARF genes, which is consistent with previous phylogenetic and domain architecture studies [13,14]. The relative phylogenetic positions of other land plant lineages were also clarified ( Fig. 1), providing a consistent phylogenetic framework for subsequent synteny network analyses.

Evolutionary trajectory of ARFs augmented with current genomic and transcriptomic data
In concordance with the phylogenetic analyses described by Finet et al. [13], we augmented the evolutionary trajectory of ARF family in plants with gene sequences from the currently available genomic and transcriptomic data. The resulting phylogenetic trajectory path (Fig. 2) suggested that the three ARF subfamilies (clades A, B and C) were likely diversified through an ancient duplication in the charophytes, which is consistent to the evolutionary trajectory proposed previously [7,8]. Tree uncertainties and unresolved land plant phylogenies were also reflected in the ARF gene-family phylogeny, leaving some of the evolutionary processes elusive.
All of the ARF transcriptional activators (ARF 5-8 and 19 in A. thaliana) were clustered in the clade-A subfamily. Within clade-A the ARF genes were wellconserved in all land plant lineages and appear to have experienced a conspicuous ancient duplication event that occurred in the ancestor of ferns and seed plants. This ancient duplication generated groups IV (ARF6 and 8 in A. thaliana) and V (ARF5, 7 and 19 in A. thaliana) in the angiosperms and the corresponding sister groups in gymnosperms and ferns (Fig. 2). The ARF genes in bryophytes (including hornworts, liverworts and mosses) and lycophytes were outside of this duplication. The ARF genes in clade-A also exhibited a gene tree topology consistent with the 'hornwort-sister' land The broad-scale family phylogeny of ARF genes in plants. The broad-scale family phylogeny of ARF genes in different plant lineages estimated using IQ-TREE maximum-likelihood algorithm. Branches representing ARF genes from different plant clades were colored and six conspicuous groups for angiosperm ARF genes were obtained and labeled on the tree (Groups I through VI). Gene tree structure and the three subfamilies (denoted as clades A, B and C) were consistent with that reported by Finet and colleagues with more ARF genes identified from plant genomic or transcriptomic datasets. Green circles indicate branch bootstrap support values larger than 80% plant phylogeny in contrast to the 'bryophytes-monophyletic' phylogeny [11,34]. The evolutionary well-conserved aspect of the ARF activator genes indicates an early genetic foundation for auxin signaling networks in the embryophytes [10].
In clade-B, unlike clade-A, ARF genes from hornworts were not found densely populated at the basal position of the subtree and some hornwort ARFs were found clustered with angiosperm ARF genes, making the evolution of clade-B ARFs in hornworts elusive. The trajectory analysis suggests two ancient duplications in clade-B, an embryophyte duplication shared by mosses, liverworts and tracheophytes that occurred before the diversification of groups I/II and group III, and a seed plant duplication that generated the groups: I and II. However, the close sister groups for group III were only found in the gymnosperms and ferns which suggested there were gene losses in mosses, liverworts and lycophytes (Fig. 2).
The subfamily of clade-C is well-conserved, covering all streptophytic lineages, and generated the simplest phylogenetic profile (Fig. 2), containing the group-VI angiosperm ARF genes (the ARF 10, 16, and 17 in A. thaliana).
Hornwort ARF genes were placed as direct sisters to the vascular plants (tracheophytes) and the ARF genes of mosses and liverworts were placed at the base of the subtree (Fig. 1), generating discrepancies in the gene tree topology and the phylogeny of early land plant lineages.

Phylogenomic synteny network analyses of ARF genes
The broad-scale phylogenetic analyses suggested some subtree topologies that are consistent with the occurrence of ancient gene duplications but genomic synteny analyses are required to provide more substantive evidence [37]. The recently established synteny network approach, taking advantages of accumulated plant genomes, was able to provide such substantive evidence for ancient evolutionary processes of a specific gene family [31,32]. Applying this approach, we conducted a phylogenomic syntenic network analyses for ARF genes using a collection of available plant genomes (Additional file 1: Fig. S2). Syntenic ARF genes (syntelogs) were observed in some non-flowering plants (e.g. a lycophyte and a moss), but all represented in-paralogues which were considered to have derived from lineage-specific duplications. The ARF genes identified in angiosperm genomes were the primary target of the analysis and used as anchors to construct the genomic synteny network.
Among the 1227 annotated angiosperm ARF genes containing valid B3 and Auxin-response domains (Additional file 2: Table S2), 1096 (89.3%) were detected to be located within genomic synteny regions that demonstrated genomic collinear relationships with at least one other ARF gene, and a total of 18,511 syntenic connections among ARF genes were detected (Fig. 3a,  b). Consistent with the family phylogeny described previously, most of the genome syntenic connections were observed within each of the six groups. ARF genes from distinctive ARF groups were syntenically connected (Fig. 3a), for example, ARF genes from group VI were connected to ARF genes from group III/IV/V and group III ARF genes with group I. The ARF synteny network analyses uncovered a total of 82 inter-group connections (Fig. 3a).
In the ARF gene synteny network, we detected 96 ARF genes that did not pass our ARF identification procedure but were demonstrated to be homologous and syntenic to the annotated ARF genes. These syntelogs were further inspected and most contained truncated B3 and/ or Auxin-response domains or lacking either or both of these signature domains. These truncated or pseudogenes that were retained in the syntenic genomic blocks were not incorporated in the phylogenetic analyses, however, we were able to assign and label them into one of the six angiosperm ARF gene groups by aligning them to classified angiosperm genes. In this way, both intact (total 1096) and truncated (total 96) ARF genes involved in the synteny network were classified. The classification for these truncated genes was considered reliable because of the distant phylogenetic relationships among the six groups (Fig. 1). This may suggest that using genomic syntenic relationships could be a robust approach for detecting pseudogenes retained in the syntenic genomic blocks and which exhibit significant local sequence identity with intact functional paralogues.
ARF genes from each group were found in separate and distinct syntenic communities in the initial synteny network visualization (Fig. 3b). The ARF synteny network was further dissected to find subnetwork communities by the use of clique percolation clustering at k = 3 implemented in CFinder v2.0.6 [38]. A total of 25 communities (numbered 0 through 24) (nodes clustered within a subnetwork usually possess more connections in its community than with nodes in other communities) were obtained (Fig. 4). Among the 1192 ARF syntelogs that were extracted from the synteny network database, 1128 (94.6%) were identified in the 25 network communities, other syntelogs that had a single syntenic connection or were not involved in a clique (at k = 3) were excluded. For example, among the 22 ARF genes in Arabidopsis thaliana, 17 members were clustered in 13 synteny network communities (Fig. 4a). The chromosome-level genome assemblies represented the best material for genome synteny analyses, but some plant genome assemblies currently available are still highly fragmented. For example, in the Malus domestica (apple) genome, only one ARF gene was clustered in the synteny network because the genome assembly version we obtained from Phytozome database and that was used in our synteny network construction was fragmented (approximately 881.3 Mb Fig. 3 Genomic synteny analyses of ARF genes among angiosperms genomes. a Maximum-likelihood gene tree for the ARF gene family with genomic syntenic relationships between the genes. Each connecting line located inside the inverted circular gene tree (implemented in iTOL) indicates a syntenic relationship between two ARF genes (syntelogs). The connecting lines are colored in congruence with the six angiosperm ARF groups. b Synteny network of the ARF gene family using detected syntenic relations extracted from the genome synteny network database, using nodes representing ARF loci and edges (connecting lines) representing syntenic relationships. Colors of the nodes represented the six groups of ARF genes in angiosperms and size of each node indicates its connectivity (bigger nodes have more connections). The synteny network were clustered and visualized using the 'Fruchterman Reingold' layout implemented in Gephi arranged in 122,107 scaffolds) (Fig. 4a). However, the network approach using multiple plant genomes appeared to be error-tolerant and the results were unaffected by the inclusion of a few fragmented genomes [31].
Species compositions for each of the 25 synteny network communities (Fig. 4a) indicate that network communities 4 and 5 are angiosperm-wide, containing ARF genes from monocots, eudicots and Amborella, Community 23, on the other hand, only contains ARF genes from monocots and community 24 is solely confined to ARF genes from eudicots. Other communities are lineage specific such as community 21 which only contains ARF genes from grasses, communities 13 and 14 that are specific to legumes, and communities 16 and 22 that are specific to the genus Brassica.
Subnetwork communities were separately visualized, using node colors to depict different plant lineages and node shapes, to delineate ARF genes from the different classification groups (Fig. 4b). Community 0 (labeled as 'VI-16-45') consisted of ARF members from group-VI, with a total of 16 nodes and 45 connections within the community. Some syntenic communities contained ARF genes from multiple groups. Community 5 was recognized as the largest community with 226 nodes and 4742 connections, and nodes in this community were primarily ARF genes from group-VI and group-IV, with a minority of members from group-III (3 nodes) and group-V (1 nodes). The mixed group communities suggest the existence of ancient tandem duplications [31], where duplicated paralogues were likely lost in the ancestral genome such that ancient tandem paralogues are not seen in most current plant genomes, but synteny network analyses reflect them as multigroup communities. Consistent with this hypothesis, tandem ARF genes from distant groups were rarely present in the genome of a single species used in the analysis (Additional file 1: Table S1). To illustrate this, the ARF gene (scaffold00029187) from Amborella was classified as a member of group-IV, but it had a syntenic connection with group-VI ARF genes from Oryza sativa (LOC_Os10g33940), Oropetium thomaeum (Oropetium_20150105_02810A) and Phaseolus vulgaris (Phvul_003G075800). This could be explained by the occurrence of an ancient tandem gene duplication that was generated prior to the separation of groups VI and IV. Following the speciation of basal angiosperms and eudicots plus monocots, the group-VI member was lost in Amborella, and the group-IV member was lost in the ancestor of monocots and eudicots resulting in the syntenic relationship seen between group-VI and group-IV ARF genes. The inter-group genomic syntenic . Selected basal lineages including Vitis vinifera (sister to other rosids), Amaranthus hypochondriacus (sister to asterids), and Amborella trichopoda (the basal angiosperm) were depicted as red nodes in the communities. Each network community was presented following the 'ARF clades'-'number of nodes'-'number of connections' nomenclature system and some communities contain genes from multiple ARF groups Gao et al. Plant Methods (2020) 16:70 connections not only provided evidence for ancient gene duplications followed by lineage-specific gene losses, but also suggested that modern ARF genes evolved from a common ancestor present in the streptophytes.

Evolutionary characteristics for each of the six groups of ARFs in angiosperms
The global phylogenetic and synteny network analyses generated a robust six-group classification system for ARF genes and indicated pervasive intra-group syntenic relationships. To elaborate the evolutionary processes within each of the six groups, individual phylogenetic trees for angiosperm genes in each of the six groups were estimated separately and syntenic connections within each network community were mapped onto the six gene trees [39]. Along with the ARF genes identified from Phytozome plant genomes, the ARF genes from basal ANA grade angiosperms and magnoliids were incorporated in the phylogenetic analyses, however most of these ARF gene sequences were derived from transcriptomes and thus did not provide syntenic information. The number of angiosperm (including eudicots, monocots, magnoliids and ANA grade) ARF genes in each of the six groups ranged from 190 (group II) to 318 (group I). Below we describe the primary evolutionary characteristics for the six ARF groups separately. Group-I: This group represented the largest clade (containing 318 angiosperm ARF gene members) of the six groups (Fig. 5a). An evident angiosperm-wide duplication (delineated as groups IA and IB) was identified from the tree topology with the three relevant bootstrap values supporting the duplication branch and the two child clades greater than 95%. Both IA and IB clades include genes from monocots, eudicots, magnoliids and basal angiosperm lineages. The single ARF gene member from Amborella was placed as sister to both the IA and IB duplication clades, suggesting that the ARF Fig. 5 Phylogenetic and synteny network analyses for each of the six groups of ARFs in angiosperms. Maximum-likelihood trees for each of the six ARF groups were constructed, genes from different species groups were colored using different colors and genes detected in syntenic genomic blocks (syntelogs) were connected using curved lines. The syntenic connections belonging to different synteny network communities were plotted using different colors. Synteny network communities were numbered according to that depicted in Fig. 2b. Inferred ancestral transposition activities were indicated by red arrows gene duplication likely occurred after the separation of Amborella from other angiosperms.
Network communities associated with group-I included angiosperm-wide communities 4 (116 nodes) and 2 (65 nodes) (Fig. 4b), which align to groups IA and IB (Fig. 5a), respectively. Group IA was consistent with the designation ARF9 and group IB with the designation ARF1 in A. thaliana as reported previously [13]. The number of ARF genes included in group IA was greater than in group IB, particularly for the ARF genes from superrosids. The core-eudicot duplication [40] may have contributed to the expansion of the family, but some ARF genes from the magnoliids were included in the duplication clade with a bootstrap supporting value for the duplication node lower than 70%. In addition, some lineage-specific network communities for ARF genes in group-I were observed: communities 10, 11, 13, 14 and 16 are small communities containing the ARF genes from superrosids (Fig. 4b) and rendered as monophyletic clades in the phylogenetic analyses (Fig. 5a). The species composition analysis (Fig. 4a) for these lineage-specific communities indicated ancestral transposition activities in the Brassicaceae (communities 10, 11 and 16) and legumes (communities 13 and 14).
Group-II: Group-II was the smallest group (190 angiosperm ARF genes) and synteny network analyses revealed two main communities, 19 and 8, as depicted in Fig. 5b. Community 8 contains 26 nodes with ARF genes from only eudicots and Amborella, clustered with a group of magnoliid genes, that formed a paraphyletic clade at the basal position. While the nodes in community 19 were angiosperm-wide, and ARF genes from grasses were separated into two clades, one clade following the ARF genes in community 8 and the other clustered with the other monocots. However, the ARF genes clustered in each of the two grass clades did not share syntenic connections (Fig. 5b), and the two basal species (Aquilegia and Amborella) were included in both communities (Fig. 4b), suggesting the genome context (e.g. regulatory elements and adjacent genes) were altered for the ARF genes in the two communities. The nodes clustered in community 19 may correspond to an ancient tandem duplication in the ancestor of angiosperms as a clade of ARF genes from the grasses were evidently separated from other nodes, indicative of more intra than inter connections (Fig. 4b).
Group-III: Group-III contains 216 ARF genes incorporated into network communities 17 and 24 in the synteny network analyses (Figs. 4b and 5c). The phylogenetic profile for group-III genes identified them as forming two well-separated monophyletic clades (delineated as IIIA and IIIB). The group-IIIA (community 24) contains ARF genes from only eudicots and magnoliids and group IIIB (community 17) is angiosperm-wide and recognized as group-IIIB. The species composition analysis of group-IIIA encompassed a core-eudicot duplication shared by superrosids and superasterids, although a magnoliids ARF gene was clustered in this group. ARF genes from basal eudicots are recognized as sister to this duplication clade. Similarly, a duplication clade shared by ARF genes from grasses and one gene from pineapple was conspicuous and likely contributed to the generation of more ARF gene members in group-IIIB in the grasses.
Group-IV: Group-IV contains 282 angiosperm ARF genes that were contained in six major network communities 5, 9, 18, 15, 3 and 6, with community 5 being the largest community and containing genes from multiple groups (Fig. 4b). Network communities 5 and 9 are angiosperm-wide and 18 contains ARF genes from only eudicots. The remaining three communities (3, 6 and 15) were smaller and none formed a high-confidence monophyletic clade which in turn does not support the possibility of ancestral lineage specific transpositions. By aligning the genomic synteny connections onto the phylogenetic profile, two evident clusters of ARF genes in this group were recognized (Fig. 5d). Communities 5, 6 and 15 were clustered into one group and communities 9 and 18 were clustered into another. Both groups were recognized as angiosperm-wide groups suggesting an angiosperm-wide duplication within group-IV, although the duplication topology cannot be easily deduced from the gene tree. In community 9, a cluster of monocot ARF genes contained more connections within the cluster and were separated from other nodes (Fig. 4b), suggesting the possibility of further rounds of gene duplications and losses in the evolutionary past of ARF genes in group IV in angiosperms.
Group-V: Group-V contains a total of 287 angiosperm ARF genes clustered in four synteny communities, 12, 20, 23 and 1 (Fig. 5e), among which communities 12 and 20 were angiosperm-wide, and communities 23 and 1 contain small numbers of monocot ARF genes. By integrating the synteny network and phylogenetic profile analyses, three subgroups could be identified (VA, VB and VC), and consistent with the community network analyses, the nodes in community 12 were phylogenetically separated into two subgroups (groups-VA and -VC). Nodes in communities 23 and 1 were recognized in one monophyletic clade (group-VB). An ancestral transposition in the ancestor of commelinids (including grasses, pineapple and banana genes) was evident in communities 23 and 1 (Fig. 4a, b), while an ARF gene from Spirodela polyrhiza, which is sister to commelinids, was syntenically clustered in community 20.
Group-VI: The group-VI included 295 ARF genes integrated into five syntenic network communities 5, 7, 0, 21 and 22 (Fig. 5f ). Community 5 was angiosperm-wide, community 7 encompassed primarily ARF genes from eudicot and Amborella, community 0 contained ARF genes from monocots and Amborella, and communities 21 and 22 were solely comprised of ARF genes from grasses and crucifers, respectively (Fig. 4a). Mapping the syntenic connections on the phylogenetic tree, the monophyletic clades in grasses (community 21) and crucifers (community 22) were generated and provided phylogenomic evidence for ancestral transposition activities in these two lineages. The ARF genes clustered in community 5 were phylogenetically separated into two distinct clades with some ARF genes from grasses were placed in a basal position in the group-VI phylogeny, while the nodes in community 7 were well-clustered in the family phylogeny.
In the phylogenomic synteny network analyses we employed the maximum-likelihood gene tree generated by IQ-TREE in which more evolutionary models were implemented. We also attempted to reconstruct the ARF gene family phylogenies using RAxML (Additional file 1: Fig. S4), which generated alternative tree topologies, nevertheless, the syntenic community patterns remained constant and the major duplication clades and transposition activities could be consistently captured. Tree uncertainties may make some of the evolutionary processes that generated the ARF gene family elusive, but the synteny network approach appears robust and uncovered evolutionary details and provided more clues for future experimental studies. For example, ARF genes were recurrently duplicated and transposed in specific lineages which suggests that the functions of these transposed genes might reveal novel regulatory elements that were captured in their altered genomic context. The transpositions that we indicated to have occurred in crucifers, legumes, commelinids and grasses were tightly associated with ancestral polyploidy events [41], which generated more possibilities in the gene regulatory network. The ancestral gene duplication together with transpositions could have greatly contributed to the expansion of the auxin regulatory network which would have had important implications in the understanding of the evolutionary processes of current land plants.

Conclusions
In this study, we generated a high-confidence broadscale family phylogeny for ARF genes from augmented genomic and transcriptomic data, from which we summarized the evolutionary history of this focal transcription factor in streptophytes. Based on the family phylogeny, we proposed a six-group classification regime for angiosperm ARF genes. Group IV contains the ARF activators and these genes are well-conserved in all land plant clades. The Group IV subfamily phylogeny also supported the 'hornwort-sister' hypothesis. Genomic synteny network analyses revealed highly conserved genomic syntenies among angiosperm ARF gene loci and within each of the six ARF gene groups. CFinder clique analyses of the ARF gene synteny network identified 25 subnetwork communities, which were further projected onto the six subfamily phylogenies. The analyses suggest that ancient duplications and transpositions have greatly contributed to the diversification of ARF genes in angiosperms. Ancestral lineage-specific transpositions of ARF genes were unveiled in crucifers, legumes, commelinids and grasses in groups I, V and VI, which were considered to have contributed to the functional diversification of gene members within a family [42]. Future studies focusing on non-angiosperm specific lineages should benefit from the evolutionary framework used in this study, especially when more genomes in these plant lineages become available [43].

Family phylogeny construction
To generate reliable sequence alignments for the collected ARF gene-family members, boundaries of the B3 and Auxin-response domains were identified by aligning each of the protein sequences onto the two HMM profiles using hmmalign v3.2.1 [47,48]. Alignments of the two domains were separately refined using muscle v3.8.1551 [49] and concatenated to generate a broadscale sequence alignment for ARF genes. Columns in the alignment with more than 20% gaps were removed using Phyutility v2.2.6 [50].
IQ-TREE v1.6.8 [51] software was employed to reconstruct the maximum likelihood (ML) gene tree. For the obtained broad-scale amino acid alignment, the JTT + R9 model was the best-fit evolutionary model selected by ModelFinder [52] under Bayesian Information Criterion. The SH-aLRT test and ultrafast bootstrap [53] analyses with 1000 replicates were conducted in IQ-TREE to obtain the supporting values for each internal node of the tree. The obtained maximum-likelihood gene trees were visualized and edited using FigTree v1.4.4 (http://tree. bio.ed.ac.uk/softw are/figtr ee/) and iTOL v4.3 (https :// itol.embl.de) [54]. Maximum-likelihood trees for each of the six angiosperm ARF clades using IQ-TREE (including model-selection procedure) were also reconstructed to infer potential duplication nodes by analyzing the detailed clade-specific phylogenies.
The phylogenetic analyses for each of the six ARF groups were performed using both IQ-TREE v1.6.8 [51] and RAxML v8.2.12 [55]. The model selection procedure was performed within IQ-TREE based on the Bayesian information criterion (BIC) and for RAxML analyzes we used the '-m PROTGAMMAAUTO' model with 500 bootstrap replicates. All trees were inspected, but the IQ-TREE algorithm produced better bootstrap support overall for branches ( Fig. 5 and Additional file 1: Fig. S4). Each of the six ARF groups contained multiple synteny network communities and syntenic connections in different communities were plotted using different colors as implemented in the iTOL v4.3 [54].

Genomic synteny network construction
To unveil the genomic syntenic relationships among plants, protein sequences for each of the 52 angiosperm genomes were compared with each other and themselves using Diamond v0.9.22.123 software [56] with an e-value cutoff at 1e − 5. In this way, blastp tables for a total of 52 × 51/2 + 52 = 2704 whole proteome comparisons were generated. Only the top five non-self blastp hits were retained as input for the MCScanX [57] analyses. The ARF gene associated syntenic genomic blocks were extracted and visualized in Cytoscape v3.7.0 [58] and Gephi v0.9.2 [59]. Some ARF syntelogs were truncated or demonstrate absence of signature domains and were not included in our phylogenetic analyses. These truncated ARF genes were classified and labelled (clade I through VI) by comparing with those phylogenetically classified ARF genes. The phylogeny of angiosperm species and the associated paleopolyploidy events were redrawn based on a tree reported earlier by Van de Peer et al. [41] and the APG IV system [60] with minor modifications: the tetraploidy event in cucurbitaceae [61], the pentaploidy of the cotton genome [62], the fern genome duplications [45], the ancestral duplication events in mosses [4,63] and in Caryophyllales [64], were included in the tree.
The ARF syntenic networks were analyzed using CFinder v2.0.6 [38] utilizing the unweighted CPM algorithm and no time limit. All possible k-clique (from 3 to 21) communities were identified for the complete ARF gene syntenic network. We used k = 3 as the clique community threshold and in this scenario one ARF gene (node) involved in a subnetwork community should have at least two connections (edge) with other nodes in the community. Increasing k values made the communities smaller and more disintegrated but also more connected. For illustration purposes, we used different nodal shapes to represent the members from the six ARF groups and different colors to depict specific plant lineages using the Cytoscape v3.7.0 software [58]. For each of the 25 communities, the species composition of the syntelogs were counted and a heatmap was generated using the pheatmap v1.0.10 (https ://githu b.com/raivo kolde /pheat map) package implemented in the R statistical environment.
Additional file 1: Fig. S1. Plant lineages screened for ARF homologues. Fig. S2. Number of Auxin Response Factor genes identified from each of the plant genomes. Fig. S3. Terminal branch length comparison between angiosperm and gymnosperm ARF genes. Fig. S4. Phylogenic and synteny network analyses for each of the six groups of ARFs in angiosperms. Table S1. Annotation and classification of ARF genes in Arabidopsis thaliana.
Additional file 2: Table S2. List and classification of ARF genes analyzed in angiosperms.