E-mail: cwd3 psu. Contact information for working groups is provided in the authorship details.

Amborella trichopoda is strongly supported as the single living species of the sister lineage to all other extant flowering plants, providing a unique reference for inferring the genome content and structure of the most recent common ancestor (MRCA) of living angiosperms. Sequencing the Amborella genome, we identified an ancient genome duplication predating angiosperm diversification, without evidence of subsequent, lineage-specific genome duplications. Comparisons between Amborella and other angiosperms facilitated reconstruction of the ancestral angiosperm gene content and gene order in the MRCA of core eudicots. [...] angiosperm. Transposable elements in Amborella are ancient and highly divergent, with no recent transposon radiations.

The oldest angiosperm fossils date from [...] to [...] million years ago (Ma), but the crown age for the angiosperms has been estimated to be at least [...] Ma (2-7). [...]ical dominance before the end of the Cretaceous. Angiosperms provide the vast majority of human food and contribute [...] sequestration. Understanding angiosperm evolution and diversification is therefore essential to [...] of the origin and early diversification of angiosperms (8). Most phylogenetic analyses exam[...]

Genome Assembly and Annotation

The genome of Amborella was sequenced and assembled using a whole-genome shotgun approach that combined more than 23 Gb of single-[...] Our assembly comprises [...] scaffolds totaling [...] estimate of [...] Mb (17, 18), with a mean scaffold length of [...] kb, an N50 length of 4.[...] S2. Ninety percent of the assembled genome is contained in scaffolds larger than 1.[...] (20), fluorescence in situ hybridization (FISH), and whole-genome optical mapping [...]

Many of the resulting gene models included very long introns relative to other annotated genomes [for example, mean intron length is [...] bp in Amborella, compared to [...], [...], and [...] bp in Arabidopsis thaliana, grape (Vitis vinifera), and Norway spruce (Picea abies), respectively]. Annotated high-confidence protein-coding gene models occupied [...] Mb [...] A conservative estimate of 17,[...] alternatively spliced protein isoforms [...] in monocots and eudicots (23) and has been hypothesized to play an important regulatory role in eukaryotic genomes, distinct from the silencing of transposons. Whereas gene body methylation is not seen in mosses or lycophytes (25), bisulfite sequence mapping indicates that gene [...] S5), suggesting that it is an ancestral feature found in the MRCA of flowering plants.

Intragenomic syntenic analysis of Amborella pro[...] An Amborella versus Amborella structural comparison shows numerous, duplicate colinear [...] Syntenic blocks contain an average of 10 homeologous gene pairs, and the longest block contains 23 gene pairs.
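The assembly statistics quoted above (an N50 scaffold length, and the scaffold size above which ninety percent of the assembly is contained) are both instances of the general Nx statistic. The following is a minimal sketch of how such a value is usually computed. The function name `nx` and the toy lengths are illustrative assumptions and are not taken from the Amborella assembly.

```python
def nx(lengths, percent=50):
    """Return the Nx statistic for a collection of scaffold lengths.

    Nx is the length L such that scaffolds of length >= L together hold
    at least `percent` percent of all assembled bases (N50 for percent=50).
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length


# Hypothetical toy scaffold lengths, not Amborella data.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(nx(toy_scaffolds, 50))  # 70
print(nx(toy_scaffolds, 90))  # 30
```

With the toy input, the two largest scaffolds already hold half of the 300 assembled bases, so the N50 is 70. Five scaffolds are needed to reach ninety percent, so the N90 is 30.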
[...] ancient genome structure and gene family evolution. Specifically, comparisons of the Amborella genome [...]

Evidence Modeler (22) was used to integrate gene annotations, producing 26,[...] Additionally, 17,[...] gene models [...]

[...] (17). Despite the different histories of whole-genome duplication (WGD; paleopolyploidy) (7, 26-[...]) [...] common ancestor of most eudicots (26-28, [...]) [...] If the Amborella WGD revealed in this study was [...] and Vitis. Instead, structural analysis shows a clear [...] S6 to S8), indicating that the WGD detected in Amborella is not lineage-specific and likely occurred in an ancient common ancestor of the two species, thereby confirming that the divergence of Amborella predates gamma [...] In summary, Amborella genome structure demonstrates no evidence of WGD since this lineage diverged from the rest of the angiosperms at least [...] Ma. However, analyses indicate that paralogous gene copies associated with the epsilon WGD resulted from duplication shortly before the diversification of all living angiosperms (7). [...] extant angiosperms and for resolving the timing [...]

[...] the duplications discovered through phylogenomics, we manually curated six large duplicated blocks (Fig. [...]). Phylogenetic analysis of syntenic gene pairs from these large blocks supports the placement of the epsilon genome [...] gene pairs), especially when compared to older gene family expansions shared across angiosperm or seed plant lineages (orthogroups with at [...] The age distribution of the pre-angiosperm gene duplications is bimodal (fig. S17), with the two peaks corresponding to the same ancestral angiosperm epsilon and [...] Zeta has escaped syntenic detection in [...] arrangements that have accumulated since this hypothesized ancient event more than [...] Ma.

Ancestral Gene Order in Core Eudicots

We combined scaffold-level information from Amborella with chromosome-level data from the eudicot rosid lineages of grape (V. [...] These three species were chosen because they have retained structurally similar genomes and clear patterns of paralogy among syntenic gene copies (fig. [...] enabling us to assign most genes to one of seven groups of three homeologous chromosomes or segments (26, 27, 30, [...]). A comprehensive analy[...]sists. Figure 2C shows the orthologous gene alignments between one of [...]
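The synteny comparisons described above rest on finding runs of homologous gene pairs that keep the same order in two genomic regions. The sketch below shows one simple way such colinear blocks can be chained from anchor pairs. It is a greedy simplification for illustration only, not the method used in the study, and the function name, the `max_gap` and `min_pairs` parameters, and the toy coordinates are all assumptions.

```python
def colinear_blocks(anchor_pairs, max_gap=5, min_pairs=3):
    """Group homologous gene pairs into forward colinear blocks.

    anchor_pairs: iterable of (i, j), the gene-order index of a gene on
    region A and of its homolog on region B.
    A block is extended while the next pair advances on both regions by
    at most `max_gap` genes; blocks with fewer than `min_pairs` pairs
    are discarded. Only same-orientation runs are detected.
    """
    blocks = []
    current = []
    for i, j in sorted(anchor_pairs):
        if current:
            prev_i, prev_j = current[-1]
            if 0 < i - prev_i <= max_gap and 0 < j - prev_j <= max_gap:
                current.append((i, j))
                continue
            # The run is broken: keep it only if it is long enough.
            if len(current) >= min_pairs:
                blocks.append(current)
        current = [(i, j)]
    if len(current) >= min_pairs:
        blocks.append(current)
    return blocks


# Hypothetical anchor pairs, not Amborella data.
pairs = [(1, 10), (2, 11), (4, 13), (20, 3), (21, 4), (22, 5), (40, 40)]
blocks = colinear_blocks(pairs)
print(len(blocks))                   # 2
print(max(len(b) for b in blocks))   # 3
```

Summary figures such as the mean number of gene pairs per block and the size of the longest block follow directly from the returned list. A production tool would also handle inverted blocks and tolerate stray anchors inside a run, which this greedy pass does not.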
Amborella is sister to all other extant angiosperms. An overview of land plant phylogeny is shown, including the relationships among major lineages of angiosperms. Representatives with sequenced genomes are shown for most lineages (scientific names in parentheses); however, basal angiosperms (all of which lack genome sequences except for Amborella) and nonflowering plant lineages are indicated by their larger group names. Hypothesized polyploidy events in land plant evolution are overlaid on the phylogeny with symbols. The red star indicates the common ancestor of angiosperms and the evolutionary timing of the epsilon WGD (7). The evolutionary timing of zeta (7) and gamma (26, 27, 82) polyploidy events are shown with empty and purple stars, respectively. The peach, cacao, and grape genomes (purple text) were used with the Amborella genome to reconstruct the gene order in the pre-gamma core eudicot (Fig. [...]). Additional polyploidy events are indicated with ellipses. Events supported by genome-scale synteny analyses are filled, whereas those supported only with frequency distributions of paralogous gene pairs (Ks) or phylogenomic [...]

Taxa labeled in the figure: Tomato (Solanum lycopersicum), Sunflower (Helianthus annuus), Arabidopsis (Arabidopsis thaliana), Papaya (Carica papaya), Cacao (Theobroma cacao), Poplar (Populus trichocarpa), Soybean (Glycine max), Peach (Prunus persica), Grape (Vitis vinifera), Sacred lotus (Nelumbo nucifera), Banana (Musa acuminata), Rice (Oryza sativa), Maize (Zea mays). Group and event labels: Eudicots, Monocots, Magnoliids, Austrobaileyales, Basal angiosperms, Gamma, Epsilon.

This analysis, which uses Amborella as an outgroup to the three eudicot genomes, would not have been possible without Amborella or another as yet undiscovered non-eudicot genome that retains a large amount of syntenic signal. The prevalence of WGDs in monocots and other basal angiosperms (34-37) limits the possibility of identifying such genomes.
Similar patterns for all seven ancestral core eudicot chromosomes (17) illustrate the utility of the Amborella genome for reconstructing ancestral genomes within the angiosperms, thus clarifying the divergence of subgenomes after WGD events. In the case of the reconstructed core eudicot ancestor, tracking the syntenically retained descendant blocks in the three rosids reveals a consistent pattern of subgenome dominance (fig. [...]). This pattern, which governs the fractionation likelihood of gene triplets generated by the gamma event, is not evident from the [...]

Ancestral Angiosperm Gene Content

[...] We further merged the orthogroups into super-orthogroup clusters representing more inclusively circumscribed gene families. The broader circumscription of super-orthogroups allows for the clustering of more divergent homologs, thus increasing the likelihood that they represent truly distinct gene families.
Phylogenetic analyses of super-orthogroups can help to root orthogroup phylogenies and resolve the relationships among related orthogroups. We estimated the ancestral gene content at key nodes in land plant phylogeny and modeled the changes of orthogroups occurring along each branch (Fig. [...]). The largest changes in gene family content appear to have occurred evolutionarily recently along terminal branches, or are shared among closely related taxa, such as within the tomato (Solanaceae) or crucifer (Brassicaceae) families. Large numbers of orthogroup gains were also inferred along the deeper branches leading to all angiosperms ([...] new orthogroups using parsimony reconstruction) and to grasses ([...] new orthogroups) (Fig. [...]). However, because this analysis does not include genome sequences from ferns and gymnosperms, it cannot distinguish between orthogroups originating with euphyllophytes (ferns plus seed plants), seed plants, or angiosperms; consequently, all of these orthogroups are reconstructed along the stem branch leading to angiosperms. We sorted the inferred gene set of the recently published Norway [...] classification, and manually reevaluated the origin of orthogroups around the MRCAs of seed plants and [...]

Indeed, many gene lineages with genes inferred to have specific stamen (39), carpel (39), and ovule (40) functions apparently arose after [...]

[...] VND7 and NAC[...] also first appeared at this time, even though Amborella does not produce vessels, but only tracheids (see below).

Fig. [...] (A) High-resolution analysis of Amborella-Amborella intragenomic syntenic regions putatively derived from the ancestral angiosperm epsilon WGD. Note the series of colinear genes between the two regions. Intragenomic syntenic regions from Amborella are shown when scaffolds are compared and appear as a series of colinear genes between the two regions. (B) Macrosynteny and microsynteny between genomic regions in Amborella and grape. Top: Macrosynteny patterns between grape and Amborella and within Amborella scaffolds (only scaffolds 1 to [...] are shown). Each Amborella region aligns with up to three regions in grape that resulted from the gamma hexaploidization event in early core eudicots [...]. Syntenic regions within the Amborella genome were derived from the epsilon WGD before the origin of all extant angiosperms (7). An exemplar set of blocks, showing two homeologous Amborella regions derived from this early WGD, aligns to three distinct grape regions (derived from gamma), with eight parallel regions in total. Bottom: Microsynteny is shown among the eight regions noted above. Blocks represent genes with orientation on the same strand (blue) or reverse strand (green); shades represent matching gene pairs. (C) Gene order alignments between one of the seven hypothesized ancestral core eudicot chromosomes (blue bar), the three post-hexaploidization copies of this chromosome for peach, cacao, and grape (chromosomes descending from it; top of figure), and a subset of the Amborella scaffolds (green, bottom of figure). Similar configurations were obtained for the other six ancestral chromosomes.
Analyses of [...] lineages of genes involved in reproductive, regulatory, and developmental processes. GO classifications associated with pollen-pistil interaction and epigenetic modification were enriched in orthogroups arising on the branch leading to seed [...] regulation of gene expression and of cellular, biochemical, and metabolic processes as well as genes involved in various developmental processes. These include genes involved in carpel development (CRABS CLAW), endosperm development (AGL62), stem cell maintenance in meristems [...] reproductive processes. Orthogroups containing genes with specific functions in vessel formation [...]

GO annotations related to reproduction (flower development, reproductive developmental process, pollination, and similar terms), including MADS-box gene lineages (see below), were overrepresented in this set of orthogroups. [...] those with expression elicited by herbivory.

[...]volved in floral timing and initiation (CO, SOC1, [...]). Orthogroups for other major components of the floral regulatory pathway are older still, with core components of the pathway present in the ancestral vascular plant (for example, LFY, phytochromes, CLV, SKP1, GA1, [...]). After the origin of angiosperms, new genes originated or were recruited to refine or more narrowly parse functions associated with flower development [...] This pattern is consistent with the observation that the floral organ transcriptional program is canalized (entrained) in eudicots relative to the less organ-constrained transcriptomes of earlier-diverging, less [...] Once a functional flower evolved, genetic innovations related to reproductive biology con[...]

Gene Family Expansions in Angiosperms

Expansions of many gene families are evident in Amborella, and phylogenetic analyses indicate that such expansions occurred in the ancestral angiosperm, accompanying innovations associated with angiosperm origin.

MADS-Box Genes

MADS-box transcription factors are among the most important regulators of flower development.
The Amborella genome encodes 36 MADS-box genes (table S19) (17), fewer than in other angiosperms [...] S19 and S[...] The Amborella framework [...] These data support the hypothesis that duplication and diversification of floral MADS-box genes likely occurred before the origin of extant angiosperms, despite being tightly associated with the origin of the flower. [...] S20) (45-47) have orthologs in Amborella, suggesting that they were likely present in the earliest angiosperms and were subsequently lost in eudicots or monocots, respectively. [...] than those in gymnosperms [...] We conducted a comprehensive series of yeast two-hybrid assays [...] The protein-protein interaction (PPI) patterns in Amborella (fig. S21) are generally consistent with those in other angiosperms, and show clear differences from those in gymnosperms. For [...] represent duplicate lineages in early angiosperms, arising after the divergence from the gymnosperms. [...] and 4B. B function is essential for the development of petals and other petal-like organs, which represent one of the most prominent novel floral features and exhibit extraordinary diversity in form; [...]quence and expression patterns, likely have been crucial for functional innovations in the regulatory [...]

Fig. [...] Ancestral reconstruction of gene family content in land plants. [...] an analogous likelihood-based analysis is provided in table S[...]. Taxa labeled in the figure include Physcomitrella patens, Selaginella moellendorffii, Fragaria vesca, Populus trichocarpa, Theobroma cacao, Carica papaya, Solanum lycopersicum, and Solanum tuberosum.

Five GSK3 loci that were present in the ancestral angiosperm have subsequently diversified among major angio[...] detected only in Amborella (fig. [...]). Thus, among flowering plants, Amborella alone may contain all the GSK3 gene lineages that arose before the origin of extant angiosperms, underscoring the importance [...] angiosperm genome [...]

[...] monocots and eudicots and exhibit specific features of corresponding seed storage proteins in basal angiosperms and gymnosperms. These proteins are embedded in the [...] across the tree of life [...] The 11S legumin-type globulins are widespread across the seed plant phylogeny [for example, (54-56)]. Three distinct 11S legumin-type globulins have been identified in [...] Comparisons of the Amborella globulin-coding gene sequences to other seed plants revealed that key cysteine residues [...]

Terpene Synthase Genes

Terpenoids constitute the largest class of plant secondary metabolites and play important roles in plant ecological interactions [...] The Amborella TPS family contains more than 30 members [...] seed plants (59), is also absent in Amborella (fig. [...]). This indicates that the occurrence and diversification of this subfamily likely happened after the divergence of Amborella from other angiosperms, although its presence or absence in other basal angiosperms still needs to be established. [...]duction of C15 terpenoids, which are involved in diverse biological processes including the production of floral scents used to attract pollinators.
In contrast, a conserved residue [...] Globally, both structural (fig. S26) and phylogenetic (fig. S27) [...]

[...] the subsequent radiation of flowering plants.

[...] (61), facilitating water transport and mechanical support in xylem [...] Most gymnosperms (cycads, [...] S25) appears to re[...] gymnosperms and other angiosperms. However, the underlying genes of lignin precursor biosynthesis [...]

[...] eukaryotes. In contrast to their low copy numbers [...]

Fig. [...] Amborella as the reference for understanding the molecular developmental genetics of flower evolution. (A) A schematic diagram showing the evolutionary history of floral MADS-box genes. Note that all of the eight major gene lineages existed in the MRCA of extant angiosperms. (B) Evolutionary [...] heterodimers. In gymnosperms, the proteins of B-class genes can only form homodimers or semi-homodimers (that is, heterodimers formed by products of recently duplicated genes), whereas in the MRCA of extant angiosperms, they gained the ability to form heterodimers between members of different lineages.

[...] S29, A to H). Amborella contains all of the [...] TE families. As in the Norway spruce genome (38), the average age of identifiable transposable elements (TEs) [...] of the repeat landscape, comprising 2.[...] Similar to Sorghum [...] were highly degraded, with highly divergent sequences and missing terminal inverted repeats, again suggesting the persistence of identifiable elements over millions of years. The lack of recent transposon activity in the Amborella genome may be due to very effective silencing or the loss of active transposases.

Fig. [...] Classification and in[...]posons in the Amborella genome. [...] 40 million years after insertion (Wicker et al. [...]) [...]gence between their terminal repeats. A large class of Gypsy LTR retrotransposons with annotated [...] experienced the most recent burst of activity 0.[...] (C) Although some LTR transposon families have been active over the last 5 million years (for example, the large Gypsy cluster), the estimated insertion dates for the majority of elements are more than 10 Ma. See (17) for median insertion dates for each cluster (table S29s). Plot axis: Frequency.

Most of these families (19 of them) are broadly conserved in other angiosperms, whereas 8 have evidence suggestive of later losses during angiosperm diversification. The other 63 miRNA families appear to [...] plant miRNAs. In contrast, none of the conserved miRNAs were 23 to 24 nt in size.

Population Genomics and Conservation Implications

Amborella is restricted to wet tropical forests on isolated slopes of New Caledonia. The genomes of 12 individuals of Amborella, sampled from nearly all known populations, were resequenced [...] Genetic variation among Amborella popula[...]

However, the genome exhibits significant among-locus and among-scaffold variance in allelic variation (fig. [...]). [...] The frequency [...] Increased LD may contribute to the size and persistence of genomic regions affected by selective sweeps, if they have occurred. Further analyses, with greater population sampling, are needed to distinguish the relative roles of selection, inbreeding, and other processes in shaping variation within this endemic species.

[...]tially Markovian coalescent (PSMC) (76) model, which has recently been applied to plant genomes [...] PSMC analysis of all 14 Amborella individuals, including the reference genome, the cultivated Bonn specimen, and the 12 locality-specific exemplars (Fig. [...]) [...] years, including one as recent as [...] years ago, represented by individual NCNAA (Fig. [...]). At [...] Amborella may therefore have undergone a [...]tribution. These results are consistent with an independent analysis and extensive sampling of the 12 populations using microsatellite loci [...]

Population genomic analyses tell a tale of dy[...] Despite its restricted distribution, Amborella maintains substantial genetic diversity, with substructure among four population clusters. As ongoing effects of an expanding human population (for example, mining operations, fires, urbanization, and invasive species introduction) threaten the unique flora of [...]

Fig. [...] Population genomic diversity in Amborella. (A) [...] Grande-Terre, New Caledonia, the reference genome (Santa Cruz), and an additional cultivated individual (Bonn), indicated in the color panel (right) and with bootstrap clouds for each genome analyzed co-plotted in green. A vertical bar is drawn over the plot at about [...] years before present to indicate the timing of species-wide decline of effective population size, interpreted as a genetic bottleneck. (B) [...] clusters of 12 individuals from natural populations.

Conclusions

The phylogenetic position, conservation of genome structure, and absence of a lineage-specific polyploidy event have made the Amborella genome a unique and valuable reference that facilitates interpretation of major genomic events in flowering [...] Amborella has enabled the identification of an ancestral gene set for angiosperms of [...] functions, eventually leading to modern flowering plants. As the only extant member of an ancient [...]

[...] Yeast two-hybrid analysis of MADS-box proteins in Amborella was used to [...]

[...] number of colinear genes per window size to define putative syntenic regions.
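The LTR retrotransposon insertion dates discussed in the transposable element analysis are tied in the text to divergence between an element's terminal repeats. The two repeats are identical at insertion and then accumulate substitutions independently, so a divergence K corresponds to an age of roughly K / (2r) for a substitution rate r per site per year. The following is a minimal sketch of that calculation. The Jukes-Cantor correction, the function names, the toy sequences, and the rate value are all assumptions made for illustration and are not taken from the study.

```python
import math


def p_distance(ltr5, ltr3):
    """Proportion of differing sites between two aligned LTR sequences.

    Columns containing a gap or an ambiguous base are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            mismatches += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance (substitutions per site)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor model")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_age_years(ltr5, ltr3, rate_per_site_per_year):
    """Estimate insertion age as K / (2r): both LTRs diverge independently."""
    k = jukes_cantor(p_distance(ltr5, ltr3))
    return k / (2 * rate_per_site_per_year)


# Hypothetical aligned LTR fragments and a placeholder rate (not from the paper).
age = insertion_age_years("ACGTACGTAC", "ACGTACGTAA", 1e-8)
print(round(age))
```

The estimate scales inversely with the assumed rate, so absolute dates from this kind of calculation are only as reliable as the rate chosen for the lineage.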
This pattern, which governs Grape chr2 the fractionation likelihood of gene triplets gen- Gra erated by the gamma event, is not evident from the 15 3. Amborella scf9 Ancestral Angiosperm Gene Content 6.
We further merged the orthogroups into super-orthogroup clusters representing more inclusively circumscribed gene families The broader circumscription of super- orthogroups allows for the clustering of more diver- gent homologs, thus increasing the likelihood that they represent truly distinct gene families. Phylo- genetic analyses of super-orthogroups can help to root orthogroup phylogenies and resolve the relationships among related orthogroups.
We estimated the ancestral gene content at key nodes in land plant phylogeny and modeled the changes of orthogroups occurring along each branch Fig. The largest changes in gene family content appear to have occurred evolutionarily recently along terminal branches, or Fig. A High-resolution analysis of Amborella-Amborella intra- are shared among closely related taxa, such as with- genomic syntenic regions putatively derived from the ancestral angiosperm epsilon WGD.
Note the in the tomato Solanaceae or crucifer Brassicaceae series of colinear genes between the two regions. Intragenomic syntenic regions from Amborella are families. Large numbers of orthogroup gains were shown when scaffolds are compared and appear as a series of colinear genes between the two regions. B also inferred along the deeper branches leading to Macrosynteny and microsynteny between genomic regions in Amborella and grape.
Top: Macrosynteny all angiosperms new orthogroups using par- patterns between grape and Amborella and within Amborella scaffolds only scaffolds 1 to are simony reconstruction and to grasses new shown. Each Amborella region aligns with up to three regions in grape that resulted from the gamma hexaploidization event in early core eudicots Syntenic regions within the Amborella genome were orthogroups Fig.
How- derived from the epsilon WGD before the origin of all extant angiosperms 7. An exemplar set of blocks, ever, because this analysis does not include ge- showing two homeologous Amborella regions derived from this early WGD, aligns to three distinct grape nome sequences from ferns and gymnosperms, it regions derived from gamma , with eight parallel regions in total.
Bottom: Microsynteny is shown among cannot distinguish between orthogroups originat- the eight regions noted above. Blocks represent genes with orientation on the same strand blue or ing with euphyllophytes ferns plus seed plants , reverse strand green ; shades represent matching gene pairs.
C Gene order alignments between one of seed plants, or angiosperms; consequently, all of the seven hypothesized ancestral core eudicot chromosomes blue bar , the three post-hexaploidization these orthogroups are reconstructed along the stem copies of this chromosome for peach, cacao, and grape chromosomes descending from it top of figure , branch leading to angiosperms. We sorted the in- and a subset of the Amborella scaffolds green, bottom of figure.
Similar configurations were obtained for ferred gene set of the recently published Norway the other six ancestral chromosomes. Indeed, many gene lineages with genes VND7 and NAC also first appeared at this classification, and manually reevaluated the origin inferred to have specific stamen 39 , carpel 39 , time, even though Amborella does not produce of orthogroups around the MRCAs of seed plants and ovule 40 functions apparently arose after vessels, but only tracheids see below.
Analyses of volved in floral timing and initiation CO, SOC1, those with expression elicited by herbivory. Orthogroups lineages of genes involved in reproductive, regu- GO annotations related to reproduction flower for other major components of the floral regulatory latory, and developmental processes.
GO classi- development, reproductive developmental process, pathway are older still, with core components of the fications associated with pollen-pistil interaction pollination, and similar terms , including MADS- pathway present in the ancestral vascular plant and epigenetic modification were enriched in or- box gene lineages see below , were overrepre- for example, LFY, phytochromes, CLV, SKP1, GA1, thogroups arising on the branch leading to seed sented in this set of orthogroups.
After the origin of angiosperms, new genes regulation of gene expression and of cellular, bio- originated or were recruited to refine or more nar- Gene Family Expansions in Angiosperms chemical, and metabolic processes as well as genes rowly parse functions associated with flower devel- Expansions of many gene families are evident in involved in various developmental processes.
This pattern is consistent with the observation Amborella, and phylogenetic analyses indicate that These include genes involved in carpel develop- that the floral organ transcriptional program is cana- such expansions occurred in the ancestral angio- ment CRABS CLAW , endosperm development lized entrained in eudicots relative to the less organ- sperm, accompanying innovations associated with AGL62 , stem cell maintenance in meristems constrained transcriptomes of earlier-diverging, less angiosperm origin.
Once a functional flower evolved, genetic in- reproductive processes. Orthogroups containing novations related to reproductive biology con- genes with specific functions in vessel formation MADS-Box Genes MADS-box transcription factors are among the most important regulators of flower development. Ancestral recon- Physcomitrella patens The Amborella genome encodes 36 MADS-box struction of gene family Selaginella moellendorffii genes table S19 17 , fewer than in other angio- content in land plants.
S19 and S The Amborella framework Fragaria vesca an analogous likelihood- These data support the hypothesis that duplication Populus trichocarpa and diversification of floral MADS-box genes like- based analysis is provided in table S Theobroma cacao ly occurred before the origin of extant angiosperms, Carica papaya despite being tightly associated with the origin of the flower. S20 45—47 have orthologs Solanum lycopersicum in Amborella, suggesting that they were likely present in the earliest angiosperms and were subsequently Solanum tuberosum lost in eudicots or monocots, respectively.
Five GSK3 monocots and eudicots and exhibit specific fea- than those in gymnosperms We conducted a loci that were present in the ancestral angiosperm tures of corresponding seed storage proteins in comprehensive series of yeast two-hybrid assays have subsequently diversified among major angio- basal angiosperms and gymnosperms.
The protein-protein interaction PPI detected only in Amborella fig. Thus, among Terpene Synthase Genes patterns in Amborella fig. S21 are generally con- flowering plants, Amborella alone may contain all Terpenoids constitute the largest class of plant sec- sistent with those in other angiosperms, and show the GSK3 gene lineages that arose before the origin ondary metabolites and play important roles in plant clear differences from those in gymnosperms.
For of extant angiosperms, underscoring the impor- ecological interactions The represent duplicate lineages in early angiosperms, angiosperm genome Amborella TPS family contains more than 30 mem- arising after the divergence from the gymnosperms. These proteins are embedded in the seed plants 59 , is also absent in Amborella fig. This indicates that the occurrence and 4B.
B function is essential for the development across the tree of life The 11S legumin-type diversification of this subfamily likely happened of petals and other petal-like organs, which rep- globulins are widespread across the seed plant phy- after the divergence of Amborella from other angio- resent one of the most prominent novel floral fea- logeny [for example, 54—56 ].
Three distinct 11S sperms, although its presence or absence in other tures and exhibit extraordinary diversity in form; legumin-type globulins have been identified in basal angiosperms still needs to be established.
Comparisons of duction of C15 terpenoids, which are involved in quence and expression patterns, likely have been the Amborella globulin-coding gene sequences to diverse biological processes including the pro- crucial for functional innovations in the regulatory other seed plants revealed that key cysteine residues duction of floral scents used to attract pollinators.
In contrast, a conserved residue the subsequent radiation of flowering plants. Globally, both 61 , facilitating water transport and mechanical eukaryotes. In contrast to their low copy numbers structural fig. S26 and phylogenetic fig. S27 and support in xylem Most gymnosperms cycads, A B Fig. Amborella as the reference for understanding the molecular heterodimers. In gymnosperms, the proteins of B-class genes can only form developmental genetics of flower evolution. A A schematic diagram show- homodimers or semi-homodimers that is, heterodimers formed by products of ing the evolutionary history of floral MADS-box genes.
Note that all of the eight recently duplicated genes , whereas in the MRCA of extant angiosperms, they major gene lineages existed in the MRCA of extant angiosperms. B Evolutionary gained the ability to form heterodimers between members of different lineages. S29, A to H. Amborella contains all of the 40 million years after insertion Wicker et al. S25 appears to re- gence between their terminal repeats.
TE families. A large class of gymnosperms and other angiosperms. However, As in the Norway spruce genome 38 , the av- Gypsy LTR retrotransposons with annotated the underlying genes of lignin precursor biosyn- erage age of identifiable transposable elements TEs experienced the most recent burst of activity 0. Classification and in- of the repeat landscape, comprising 2. Similar to Sorghum posons in the Amborella genome.
(C) Although some LTR transposon families have been active over the last 5 million years (for example, the large Gypsy cluster), the estimated insertion dates for the majority of elements are more than 10 Ma. See (17) for median insertion dates for each cluster (table S29s).

[...] were highly degraded, with highly divergent sequences and missing terminal inverted repeats, again suggesting the persistence of identifiable elements over millions of years. The lack of recent transposon activity in the Amborella genome may be due to very effective silencing or the loss of active transposases.

Most of these families (19 of them) are broadly conserved in other angiosperms, whereas 8 have evidence suggestive of later losses during angiosperm diversification. The other 63 miRNA families appear to [...] plant miRNAs. In contrast, none of the conserved miRNAs were 23 to 24 nt in size. The frequency [...]

Population Genomics and Conservation Implications

Amborella is restricted to wet tropical forests on isolated slopes of New Caledonia. The genomes of 12 individuals of Amborella, sampled from nearly all known populations, were resequenced [...]

[...] years, including one as recent as [...] years ago, represented by individual NCNAA (Fig. [...]). At [...]

[...] sequentially Markovian coalescent (PSMC) (76) model, which has recently been applied to plant genomes [...]. PSMC analysis of all 14 Amborella individuals, including the reference genome, the cultivated Bonn specimen, and the 12 locality-specific exemplars (Fig. [...]) [...]

However, the genome exhibits significant among-locus and among-scaffold variance in allelic variation (fig. [...]). Increased LD may contribute to the size and persistence of genomic regions affected by selective sweeps, if they have occurred. Further analyses, with greater population sampling, are needed to distinguish the relative roles of selection, inbreeding, and other processes in shaping [...] variation within this endemic species.

Genetic variation among Amborella popula[...] Amborella may therefore have undergone a [...]tribution.
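Levels of genetic variation among resequenced individuals are often summarized with nucleotide diversity (π), the average number of pairwise differences per site. The text does not say which measures the authors computed, so the sketch below is only one common choice, shown on toy data. The function name and the input format (equal-length aligned haplotype strings) are assumptions made for illustration.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average pairwise differences per site across aligned sequences (pi)."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    total_differences = 0
    pair_count = 0
    for a, b in combinations(seqs, 2):
        # Count sites at which this pair of sequences differs.
        total_differences += sum(x != y for x, y in zip(a, b))
        pair_count += 1
    return total_differences / (pair_count * length)
```

For the toy alignment `["ACGT", "ACGA", "ACGT"]` the three pairs differ at 1, 0 and 1 sites, giving π = 2 / (3 × 4). Real resequencing data would need handling of missing genotypes and heterozygous calls, which this sketch omits.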
These results are consistent with an independent analysis and extensive sampling of the 12 populations using microsatellite loci [...].

Population genomic analyses tell a tale of dy[...]. Despite its restricted distribution, Amborella maintains substantial genetic diversity, with substructure among four population clusters. As ongoing effects of an expanding human population (for example, mining operations, fires, urbanization, and invasive species introduction) threaten the unique flora of [...]

Population genomic diversity in Amborella. [...] Grande-Terre, New Caledonia, the reference genome (Santa Cruz), and an additional cultivated individual (Bonn), indicated in the color panel (right) and with bootstrap clouds for each genome analyzed co-plotted in green. A vertical bar is drawn over the plot at about [...] years before present to indicate the timing of species-wide decline of effective population size, interpreted as a genetic bottleneck. [...] clusters of 12 individuals from natural populations.

Conclusions

The phylogenetic position, conservation of genome structure, and absence of a lineage-specific polyploidy event have made the Amborella genome a unique and valuable reference that facilitates interpretation of major genomic events in flowering [...]. Amborella has enabled the identification of an ancestral gene set for angiosperms of [...] functions, eventually leading to modern flowering plants. As the only extant member of an ancient lineage, Amborella provides a unique window into the earliest events in angiosperm evolution.

Materials and Methods

Sequencing and Assembly

Plant material for the reference genome sequence was obtained from a plant in cultivation since [...] at the University of California at Santa Cruz [...] and Sanger sequenced BAC end sequence reads were filtered to remove organellar contaminants, reads of short length or poor quality, artificial duplicates, and chimeras. After filtering, the read collection was pooled and assembled with the Roche Newbler assembler V2. [...]

Protein-coding genes, transposons, and endoge[...] package [...] Initial gene model and transposon [...] The PASA annotation pipeline (79) was used to identify and classify alternative splicing events by aligning Newbler assembled and Sanger [...] Seq assemblies. Three small RNA libraries and [...] including miRNAs, phased siRNAs, and hetero[...] All resulting [...]

[...] number of colinear genes per window size to define putative syntenic regions. These regions were subsequently compared and confirmed using the [...]. Blocks determined to represent the pan-angiosperm duplication event were further studied using phylogenomic methods to ascertain whether duplication patterns on trees concurred with a region-wide duplication model.

Scaffolds containing up to 10 orthologous and paralogous genes in common syntenic con[...] Each of the subgenomes mapped to virtually the whole length of the appropriate reconstructed chromosome. The reconstructed genes show a much clearer pattern of pan-rosid fractionation bias in extant genomes than is apparent without evidence derived from the Amborella [...]

A global plant gene family classification was [...] spruce genome and a large collection of EST as[...] We analyzed the evolutionary history of gain and loss of orthogroups and estimated the gene families present in the MRCA of living angiosperms using both parsimony and likelihood methods. Genome-wide analyses were performed, as well as more focused studies of genes with roles in [...]tions in angiosperms and seed plants, we performed maximum likelihood phylogenetic analysis [...] Gene duplications were scored on the [...]

Yeast two-hybrid analysis of MADS-box proteins in Amborella was used to identify heterodimeric PPIs found only in angiosperms. Proteomic and phylogenetic analysis of seed storage globulin proteins validated protein-coding gene models as well as examined protein features that separate angiosperms from earlier land plant lineages.

Population Genomics

To assess the levels and patterns of genetic variation within A. [...] mapped to the reference genome using BWA. We used basic population genetic measures to infer levels of diversity and applied the PSMC model, originally applied to human and other mammalian genomes, to study the effective population size (Ne) of Amborella over time. Genetic divergence among [...]

Darwin, in Letter to Hooker, F. Darwin, A. Seward, Eds. John Murray, London, UK, [...]
Bell, D. Soltis, P. Soltis, The age and diversification of the angiosperms re-revisited. [...]
Friis, K. Pedersen, P. Crane, Cretaceous angiosperm flowers: Innovation and evolution in plant reproduction. [...]
4. Hilu, D. Quandt, Land plant evolutionary timeline: Gene effects are secondary to fossil constraints in relaxed clock estimation of age and substitution rates. [...]
5. Doyle, Molecular and fossil evidence on the origin of angiosperms. Earth Planet Sci. [...]
Zhang, L. Zeng, H. Shan, H. Ma, Highly conserved low-copy nuclear genes as effective markers for phylogenetic analyses in angiosperms. New Phytol. [...]
7. Jiao et al. [...] Nature [...], 97 [...]
Soltis, C. Bell, S. Kim, P. [...] early evolution of angiosperms. [...]
Jansen et al. [...] identifies genome-scale evolutionary patterns. [...]

Pos[...] Six of the largest syn[...]
A Zeiss Axio Imager. M2 fluores- used for manual curation of syntenic duplicates Moore, C. Bell, P.