
13.2: A State-Dependent Model of Diversification - Biology





The models that we will consider in this chapter include trait evolution and associated lineage diversification. In the simplest case, we can consider a model where the character has two states, 0 and 1, and diversification rates depend on those states. We need to model the transitions among these states, which we can do in an identical way to what we did in Chapter 7 using a continuous-time Markov model. We express this model using two rate parameters, a forward rate q01 and a backwards rate q10.
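The two transition rates on their own define the two-state continuous-time Markov model. The minimal Python sketch below evaluates that model's standard closed-form transition probabilities P(t). It is an illustration, not code from the original chapter, and the function name is hypothetical.

```python
import math

def two_state_transition_probs(q01, q10, t):
    """P(t) for a two-state continuous-time Markov model.

    Entry [i][j] is the probability of being in state j after time t,
    given a start in state i. q01 is the forward rate (0 -> 1) and
    q10 is the backward rate (1 -> 0).
    """
    total = q01 + q10
    if total == 0:
        return [[1.0, 0.0], [0.0, 1.0]]  # no changes are possible
    decay = math.exp(-total * t)
    p01 = q01 / total * (1.0 - decay)
    p10 = q10 / total * (1.0 - decay)
    return [[1.0 - p01, p01],
            [p10, 1.0 - p10]]

# On long branches the rows approach the stationary frequencies
# q10/(q01 + q10) for state 0 and q01/(q01 + q10) for state 1.
print(two_state_transition_probs(q01=0.05, q10=0.05, t=10.0))
```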

We now consider the idea that diversification rates might depend on the character state. We assume that species with character state 0 have a speciation rate λ0 and an extinction rate μ0. Species in state 1 have potentially different rates of both speciation (λ1) and extinction (μ1). That is, when the character evolves, it changes the rates of speciation and/or extinction of the lineages. Together with the two transition rates, this gives a six-parameter model (Maddison et al. 2007). We assume that daughter lineages inherit their parent's character state; that is, character states do not change at speciation.

It is straightforward to simulate evolution under our state-dependent model of diversification. We proceed in the same way as we did for birth-death models, by drawing waiting times. The difference is that each waiting time is now the time to the next event of any kind: a character state change, a speciation, or an extinction. In particular, imagine that there are n lineages present at time t, and that k of these lineages are in state 0 (and n − k are in state 1). The waiting time to the next event follows an exponential distribution with rate parameter:

\[\rho = k\left(q_{01} + \lambda_0 + \mu_0\right) + (n - k)\left(q_{10} + \lambda_1 + \mu_1\right) \tag{13.1}\]

This equation says that the total rate of events is the sum of the rates of all events that can happen to lineages in state 0 (state change to 1, speciation, or extinction) and of the analogous events that can happen to lineages in state 1. Once we have a waiting time, we assign an event type with probability proportional to its rate. For example, the probability that the event is a character state change from 0 to 1 is:

\[p_{q_{01}} = \dfrac{k \cdot q_{01}}{\rho} \tag{13.2}\]

And the probability that the event is the extinction of a lineage with character state 1 is:

\[p_{\mu_1} = \dfrac{(n - k) \cdot \mu_1}{\rho} \tag{13.3}\]

And so on for the other four possible events.
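One step of this procedure can be written as a minimal Python sketch. It computes the total rate ρ and the probabilities of all six event types, then draws a waiting time and an event. The function name, the event labels, and the example counts (n = 10, k = 4, with the Figure 13.1 parameter values) are hypothetical choices for illustration.

```python
import random

def total_rate_and_event_probs(n, k, q01, q10, lam0, lam1, mu0, mu1):
    """Return (rho, probs) for n lineages of which k are in state 0.

    rho is the total event rate. probs maps each of the six event
    types to the probability that the next event is of that type.
    """
    n1 = n - k  # lineages in state 1
    rates = {
        "change 0->1": k * q01,      # only state-0 lineages can change 0 -> 1
        "speciation in 0": k * lam0,
        "extinction in 0": k * mu0,
        "change 1->0": n1 * q10,
        "speciation in 1": n1 * lam1,
        "extinction in 1": n1 * mu1,
    }
    rho = sum(rates.values())
    if rho <= 0:
        raise ValueError("no events are possible: total rate is zero")
    return rho, {event: r / rho for event, r in rates.items()}

rng = random.Random(1)
rho, probs = total_rate_and_event_probs(10, 4, 0.05, 0.05, 0.2, 0.8, 0.05, 0.05)
wait = rng.expovariate(rho)  # exponential waiting time with rate rho
event = rng.choices(list(probs), weights=list(probs.values()))[0]
print(f"rho = {rho:.2f}; next event after {wait:.3f} time units: {event}")
```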

Once we have picked an event in this way, we can randomly assign it to one of the lineages in the appropriate state, with each lineage equally likely to be chosen. We then proceed forwards in time until we have a dataset with the desired size or total time depth.
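The full forward procedure is a Gillespie-style stochastic simulation. The minimal Python sketch below assumes the six parameters are supplied in a dictionary with hypothetical key names. To stay short, it tracks only the character states of living lineages rather than the full tree. Recording the tree in Figure 13.1 would also require storing parent and daughter lineages and branch times.

```python
import random

def simulate_state_dependent(params, t_max, max_taxa, start_state=0, seed=None):
    """Forward-time simulation of state-dependent diversification.

    Only the character states of living lineages are tracked (no tree
    topology). Stops when the clade reaches max_taxa lineages, reaches
    time t_max, or goes extinct. Returns (stop_time, states).
    """
    rng = random.Random(seed)
    rates = {"change": {0: params["q01"], 1: params["q10"]},
             "speciation": {0: params["lambda0"], 1: params["lambda1"]},
             "extinction": {0: params["mu0"], 1: params["mu1"]}}
    events = [(s, kind) for s in (0, 1) for kind in rates]  # six event types
    states = [start_state]
    t = 0.0
    while states and len(states) < max_taxa:
        counts = {0: states.count(0), 1: states.count(1)}
        # Rate of each event type summed over all lineages in that state
        weights = [counts[s] * rates[kind][s] for s, kind in events]
        rho = sum(weights)  # total event rate
        if rho == 0:
            return t_max, states  # nothing can ever happen again
        t += rng.expovariate(rho)
        if t >= t_max:
            return t_max, states
        s, kind = rng.choices(events, weights=weights)[0]
        # Apply the event to one lineage in state s, chosen uniformly
        idx = rng.choice([i for i, x in enumerate(states) if x == s])
        if kind == "change":
            states[idx] = 1 - s
        elif kind == "speciation":
            states.append(s)  # daughters inherit the parent's state
        else:
            states.pop(idx)   # extinction removes the lineage
    return t, states

params = {"q01": 0.05, "q10": 0.05, "lambda0": 0.2, "lambda1": 0.8,
          "mu0": 0.05, "mu1": 0.05}  # parameter values from Figure 13.1
t, states = simulate_state_dependent(params, t_max=50.0, max_taxa=200, seed=42)
print(f"stopped at t = {t:.2f}: {states.count(0)} lineages in state 0, "
      f"{states.count(1)} in state 1")
```

Choosing the event type first and then a lineage uniformly within that state mirrors the description above. It is equivalent to choosing a lineage in proportion to its total rate and then an event in proportion to that lineage's rates.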

An example simulation is shown in Figure 13.1. As you can see, under these model parameters the impact of character states on diversification is readily apparent. In the next section we will figure out how to extract that information from our data.

Figure 13.1. Simulation of character-dependent diversification. Data were simulated under a model where the diversification rate of state 0 (red) is substantially lower than that of state 1 (black; model parameters q01 = q10 = 0.05, λ0 = 0.2, λ1 = 0.8, μ0 = μ1 = 0.05). Image by the author, can be reused under a CC-BY-4.0 license.


State-dependent diversification with HiSSE

BiSSE and MuSSE are powerful approaches for testing the association of a character with diversification rate heterogeneity. However, BiSSE has been shown to be prone to falsely identifying a positive association when diversification rate shifts are correlated with a character not included in the model (Maddison and FitzJohn 2015; Rabosky and Goldberg 2015). One way to reduce the risk of falsely associating a character with diversification rate heterogeneity is to incorporate a second, unobserved character into the model, i.e., a Hidden State-Dependent Speciation and Extinction (HiSSE) model (see, for example, Beaulieu and O’Meara 2016). Changes in the unobserved character’s state represent background diversification rate changes that are not correlated with the observed character. See the schematic overview of the HiSSE model below, and Table 2 for an explanation of the HiSSE model parameters. Now let’s set up and run a HiSSE analysis in RevBayes.

We will keep this tutorial brief and assume that you have worked through the State-dependent diversification with BiSSE and MuSSE tutorial.

A schematic overview of the HiSSE model. Each lineage has an observed binary state associated with it: state 0 (blue) or state 1 (red). There is also a second, unobserved (hidden) binary character with states A or B. The HiSSE model jointly describes the evolution of both characters, so a lineage must be in one of four combined states: 0A, 0B, 1A, or 1B. We estimate separate speciation and extinction rates for each of these four states. Note that just as BiSSE can easily be extended to MuSSE, RevBayes allows you to extend HiSSE models beyond binary observed and unobserved characters.
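The combined state space can be made concrete with a minimal Python sketch that builds a transition-rate matrix over 0A, 0B, 1A, and 1B. This is an illustration only, not the RevBayes or hisse package parameterization. For simplicity it assumes one rate per direction for the observed character (q01, q10) and one for the hidden character (qAB, qBA). It also assumes the two characters never change at the same instant. All rate values in the example are hypothetical.

```python
from itertools import product

def hisse_rate_matrix(q01, q10, qAB, qBA):
    """Transition-rate matrix over the four combined HiSSE states.

    Uses one rate per direction for the observed character (q01, q10)
    and for the hidden character (qAB, qBA). Simultaneous changes in
    both characters are set to zero. Rows sum to zero, as in any
    continuous-time Markov model.
    """
    states = [obs + hid for obs, hid in product("01", "AB")]  # 0A, 0B, 1A, 1B
    obs_rate = {("0", "1"): q01, ("1", "0"): q10}
    hid_rate = {("A", "B"): qAB, ("B", "A"): qBA}
    Q = [[0.0] * len(states) for _ in states]
    for i, (o1, h1) in enumerate(states):
        for j, (o2, h2) in enumerate(states):
            if o1 != o2 and h1 == h2:    # observed character changes
                Q[i][j] = obs_rate[(o1, o2)]
            elif o1 == o2 and h1 != h2:  # hidden character changes
                Q[i][j] = hid_rate[(h1, h2)]
            # dual changes (both characters at once) stay at zero
        Q[i][i] = -sum(Q[i])
    return states, Q

# Hypothetical rates; each combined state would also get its own lambda and mu
states, Q = hisse_rate_matrix(q01=0.05, q10=0.05, qAB=0.01, qBA=0.01)
for state, row in zip(states, Q):
    print(state, [round(x, 3) for x in row])
```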





Detecting the Dependence of Diversification on Multiple Traits from Phylogenetic Trees and Trait Data

In: Systematic Biology, Vol. 68, No. 2, 03.2019, pp. 317–328.

This work was financially supported by Consejo Nacional de Ciencia y Tecnologia (CVU 385304 L.H.-A.), the Netherlands Organisation for Scientific Research (NWO-VICI grant awarded to R.S.E.) and the Faculty of Science and Engineering and the Groningen Institute for Evolutionary Life Sciences at the University of Groningen (Adaptive Life Program P.V.E.).

Species diversification may be determined by many different variables, including the traits of the diversifying lineages. The state-dependent speciation and extinction (SSE) framework contains methods to detect the dependence of diversification on these traits. For the analysis of traits with multiple states, MuSSE (multiple-states dependent speciation and extinction) was developed. However, MuSSE and other SSE models have been shown to yield false positives, because they cannot separate differential diversification rates from dependence of diversification on the observed traits. The recently introduced method HiSSE (hidden-state-dependent speciation and extinction) resolves this problem by allowing a hidden state to affect diversification rates. Unfortunately, HiSSE does not allow traits with more than two states, and, perhaps more interestingly, the simultaneous action of multiple traits on diversification. Herein, we introduce an R package (SecSSE: several examined and concealed states-dependent speciation and extinction) that combines the features of HiSSE and MuSSE to simultaneously infer state-dependent diversification across two or more examined (observed) traits or states while accounting for the role of a possible concealed (hidden) trait. Moreover, SecSSE also has improved functionality when compared with its two “parents.” First, it allows for an observed trait being in two or more states simultaneously, which is useful for example when a taxon is a generalist or when the exact state is not precisely known. Second, it provides the correct likelihood when conditioned on nonextinction, which has been incorrectly implemented in HiSSE and other SSE models. To illustrate our method, we apply SecSSE to seven previous studies that used MuSSE, and find that in five out of seven cases, the conclusions drawn based on MuSSE were premature. We test with simulations whether SecSSE sacrifices statistical power to avoid the high Type I error problem of MuSSE, but we find that this is not the case: for the majority of simulations where the observed traits affect diversification, SecSSE detects this.



Hidden state models improve the adequacy of state-dependent diversification approaches using empirical trees, including biogeographical models

The state-dependent speciation and extinction (SSE) models have recently been criticized due to their high rates of “false positive” results and many researchers have advocated avoiding SSE models in favor of other “non-parametric” or “semi-parametric” approaches. The hidden Markov modeling (HMM) approach provides a partial solution to the issues of model adequacy detected with SSE models. The inclusion of “hidden states” can account for rate heterogeneity observed in empirical phylogenies and allows detection of true signals of state-dependent diversification or diversification shifts independent of the trait of interest. However, the adoption of HMM into other classes of SSE models has been hampered by the interpretational challenges of what exactly a “hidden state” represents, which we clarify herein. We show that HMM models in combination with a model-averaging approach naturally account for hidden traits when examining the meaningful impact of a suspected “driver” of diversification. We also extend the HMM to the geographic state-dependent speciation and extinction (GeoSSE) model. We test the efficacy of our “GeoHiSSE” extension with both simulations and an empirical data set. On the whole, we show that hidden states are a general framework that can usually distinguish heterogeneous effects of diversification attributed to a focal character.

Author contributions

JMB and BCO conceived the hidden-Markov approach applied to SSE models. DSC, BCO, and JMB designed simulations, derived models, implemented model averaging and R code. DSC conducted simulations. JMB conducted empirical analyses. DSC, BCO, and JMB wrote the manuscript.


Hidden state models improve state-dependent diversification approaches, including biogeographical models

The state-dependent speciation and extinction (SSE) models have recently been criticized due to their high rates of “false positive” results. Many researchers have advocated avoiding SSE models in favor of other “nonparametric” or “semiparametric” approaches. The hidden Markov modeling (HMM) approach provides a partial solution to the issues of model adequacy detected with SSE models. The inclusion of “hidden states” can account for rate heterogeneity observed in empirical phylogenies and allows for reliable detection of state-dependent diversification or diversification shifts independent of the trait of interest. However, the adoption of HMM has been hampered by the interpretational challenges of what exactly a “hidden state” represents, which we clarify herein. We show that HMMs in combination with a model-averaging approach naturally account for hidden traits when examining the meaningful impact of a suspected “driver” of diversification. We also extend the HMM to the geographic state-dependent speciation and extinction (GeoSSE) model. We test the efficacy of our “GeoHiSSE” extension with both simulations and an empirical dataset. On the whole, we show that hidden states are a general framework that can distinguish heterogeneous effects of diversification attributed to a focal character.

Table S1: Description of scenarios and parameter values used to simulate the data.

Table S2: Additional 17 models used in empirical study of conifers.

Table S3: Description of scenarios and parameter values used to simulate phylogenetic trees and range distributions under the GeoSSE+extirpation model.

Figure S1: Scheme of the transition rates between rate classes (RC0 to RC4) used for the simulation scenarios with multiple rate classes (Sims B, C, and D).

Figure S2: Proportion of widespread lineages on trees simulated under scenarios E and F.

Figure S3: Summary of model support for simulated scenarios A to D.

Figure S4: Summary of model support for simulated scenarios E to H.

Figure S5: Accuracy of turnover and extinction fraction estimates for simulation scenarios A to D.

Figure S6: Accuracy of net diversification estimates for simulation scenarios A to D.

Figure S7: Results for relative net diversification rates and Akaike model weights (AICw) for simulation scenarios B and C.

Figure S8: Distribution of Akaike weights for the model set fitted to simulation scenarios ext_A to ext_D.

Figure S9: Distribution of parameter values across 100 simulation replicates for each of the scenarios ext_A to ext_D.





Compare the rate estimates

Compare the rates estimated when activity time is the focal character with those estimated when solitariness is the focal character. You can do this by opening both log files in the same Tracer window. If you gave the corresponding parameters the same names in both analyses, you can compare their estimates directly in Tracer by highlighting both files.
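Tracer is the intended tool here. As a complement, a minimal Python sketch can print posterior means side by side for parameters that share a name in both analyses. It assumes each log is a tab-separated text file with a header row of parameter names, which is what RevBayes model monitors typically write. The file paths and the 25% burn-in are hypothetical choices.

```python
import csv

def posterior_means(lines, burnin=0.25):
    """Posterior mean of every numeric column in a tab-separated MCMC log.

    `lines` is any iterable of text lines (e.g. an open file). The first
    `burnin` fraction of samples is discarded before averaging.
    """
    rows = list(csv.DictReader(lines, delimiter="\t"))
    kept = rows[int(len(rows) * burnin):]
    columns = list(rows[0].keys()) if rows else []
    means = {}
    for column in columns:
        try:
            values = [float(row[column]) for row in kept]
        except (TypeError, ValueError):
            continue  # skip columns that are not numeric
        if values:
            means[column] = sum(values) / len(values)
    return means

def compare_logs(path_a, path_b, burnin=0.25):
    """Print posterior means for parameters with the same name in both logs."""
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        a, b = posterior_means(fa, burnin), posterior_means(fb, burnin)
    for name in sorted(set(a) & set(b)):
        print(f"{name:<20} {a[name]:>12.4f} {b[name]:>12.4f}")

# Hypothetical file names; substitute the logs written by your two analyses
# compare_logs("output/activity_time.log", "output/solitariness.log")
```

Posterior means alone hide uncertainty, so Tracer's marginal density plots remain the better way to judge whether estimates actually differ.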

Explore the estimates of the various parameters. Are any different? Are any the same?

Why do you think you might be seeing this pattern?


References

Freeling M, Scanlon MJ, Fowler JE. Fractionation and subfunctionalization following genome duplications: mechanisms that drive gene content and their consequences. Curr Opin Genet Dev. 201535:110–8. https://doi.org/10.1016/j.gde.2015.11.002.

Soltis PS, Soltis DE. Ancient WGD events as drivers of key innovations in angiosperms. Curr Opin Plant Biol. 201630:159–65. https://doi.org/10.1016/j.pbi.2016.03.015.

Tank DC, Eastman JM, Pennell MW, Soltis PS, Soltis DE, Hinchliff CE, et al. Nested radiations and the pulse of angiosperm diversification: increased diversification rates often follow whole genome duplications. New Phytol. 2015207(2):454–67. https://doi.org/10.1111/nph.13491.

Van de Peer Y, Mizrachi E, Marchal K. The evolutionary significance of polyploidy. Nat Rev Genet. 201718(7):411–24. https://doi.org/10.1038/nrg.2017.26.

Zhang K, Wang XW, Cheng F. Plant polyploidy: origin, evolution, and its influence on crop domestication. Horticultural Plant J. 20195:231–9.

Cheng F, Wu J, Cai X, Liang J, Freeling M, Wang X. Gene retention, fractionation and subgenome differences in polyploid plants. Nat Plants. 20184:258–68.

Jackson S, Chen ZJ. Genomic and expression plasticity of polyploidy. Curr Opi Plant Biol. 201013(2):153–9. https://doi.org/10.1016/j.pbi.2009.11.004.

Renny-Byfield S, Gong L, Gallagher JP, Wendel JF. Persistence of subgenomes in paleopolyploid cotton after 60 my of evolution. Mol Biol Evol. 201532(4):1063–71. https://doi.org/10.1093/molbev/msv001.

Cheng F, Sun C, Wu J, Schnable J, Woodhouse MR, Liang JL, et al. Epigenetic regulation of subgenome dominance following whole genome triplication in Brassica rapa. New Phytologist. 2016211:288–99.

Li AL, Liu DC, Wu J, Zhao XB, Hao M, Geng SF, et al. mRNA and small RNA transcriptomes reveal insights into dynamic homoeolog regulation of allopolyploid heterosis in nascent hexaploid wheat. Plant Cell. 201426(5):1878–900. https://doi.org/10.1105/tpc.114.124388.

Thomas BC, Pedersen B, Freeling M. Following tetraploidy in an Arabidopsis ancestor, genes were removed preferentially from one homeolog leaving clusters enriched in dose-sensitive genes. Genome Res. 200616(7):934–46. https://doi.org/10.1101/gr.4708406.

Wang JL, Tian L, Lee HS, Wei NE, Jiang HM, Watson B, et al. Genomewide nonadditive gene regulation in Arabidopsis allotetraploids. Genetics. 2006172(1):507–17. https://doi.org/10.1534/genetics.105.047894.

Alger EI, Edger PP. One subgenome to rule them all: underlying mechanisms of subgenome dominance. Curr Opin Plant Biol. 202054:108–13. https://doi.org/10.1016/j.pbi.2020.03.004.

Freeling M, Woodhouse MR, Subramaniam S, Turco G, Lisch D, Schnable JC. Fractionation mutagenesis and similar consequences of mechanisms removing dispensable or less-expressed DNA in plants. Curr Opin Plant Biol. 201215:131–9.

Edger PP, Smith R, McKain MR, Cooley AM, Vallejo-Marin M, Yuan YW, et al. Subgenome dominance in an interspecific hybrid, synthetic allopolyploid, and a 140-year-old naturally established neo-allopolyploid monkeyflower. Plant Cell. 201729(9):2150–67. https://doi.org/10.1105/tpc.17.00010.

Cheng F, Wu J, Fang L, Sun SL, Liu B, Lin K, et al. Biased gene fractionation and dominant gene expression among the subgenomes of Brassica rapa. PLos One. 20127(5):e36442. https://doi.org/10.1371/journal.pone.0036442.

Pfeifer M, Kugler KG, Sandve SR, Zhan BJ, Rudi H, Hvidsten TR, et al. Genome interplay in the grain transcriptome of hexaploid bread wheat. Science. 2014345(6194):1250091. https://doi.org/10.1126/science.1250091.

Bird KA, VanBuren R, Puzey JR, Edger PP. The causes and consequences of subgenome dominance in hybrids and recent polyploids. New Phytol. 2018220(1):87–93. https://doi.org/10.1111/nph.15256.

Chalhoub B, Denoeud F, Liu SY, Parkin IAP, Tang HB, Wang XY, et al. Early allopolyploid evolution in the post-Neolithic Brassica napus oilseed genome. Science. 2014345(6199):950–3. https://doi.org/10.1126/science.1253435.

Levin DA, Soltis DE. Factors promoting polyploid persistence and diversification and limiting diploid speciation during the K-Pg interlude. Curr Opin Plant Biol. 201842:1–7. https://doi.org/10.1016/j.pbi.2017.09.010.

Salman-Minkov A, Sabath N, Mayrose I. Whole-genome duplication as a key factor in crop domestication. NatPlants. 20162(8). https://doi.org/10.1038/nplants.2016.115.

Vanneste K, Maere S, Van de Peer Y. Tangled up in two: a burst of genome duplications at the end of the Cretaceous and the consequences for plant evolution. Philos Trans R Soc B Biol Sci. 2014369(1648):20130353. https://doi.org/10.1098/rstb.2013.0353.

Leitch AR, Leitch IJ. Genomic plasticity and the diversity of polyploid plants. Science. 2008320(5875):481–3. https://doi.org/10.1126/science.1153585.

Cheng F, Sun RF, Hou XL, Zheng HK, Zhang FL, Zhang YY, et al. Subgenome parallel selection is associated with morphotype diversification and convergent crop domestication in Brassica rapa and Brassica oleracea. Nat Genet. 201648(10):1218–24. https://doi.org/10.1038/ng.3634.

Renny-Byfield S, Rodgers-Melnick E, Ross-Ibarra J. Gene Fractionation and Function in the Ancient Subgenomes of Maize. Mol Biol Evol. 201734(8):1825–32. https://doi.org/10.1093/molbev/msx121.

Wang M, Tu L, Lin M, Lin Z, Wang P, Yang Q, et al. Asymmetric subgenome selection and cis-regulatory divergence during cotton domestication. Nat Genet. 201749(4):579–87. https://doi.org/10.1038/ng.3807.

Liu YC, Du HL, Li PC, Shen YT, Peng H, Liu SL, et al. Pan-Genome of Wild and Cultivated Soybeans. Cell. 2020182:162.

Song JM, Guan ZL, Hu JL, Guo CC, Yang ZQ, Wang S, et al. Eight high-quality genomes reveal pan-genome architecture and ecotype differentiation of Brassica napus. Nat Plants. 20206:34.

Yang X, Lee WP, Ye K, Lee C. One reference genome is not enough. Genome Biol. 201920(1):104. https://doi.org/10.1186/s13059-019-1717-0.

Yu JY, Golicz AA, Lu K, Dossa K, Zhang YX, Chen JF, et al. Insight into the evolution and functional characteristics of the pan-genome assembly from sesame landraces and modern cultivars. Plant Biotechnol J. 201917:881–92.

Zhang L, Cai X, Wu J, Liu M, Grob S, Cheng F, et al. Improved Brassica rapa reference genome by single-molecule sequencing and chromosome conformation capture technologies. Horticulture Res. 20185(1):50. https://doi.org/10.1038/s41438-018-0071-9.

Alonge M, Wang X, Benoit M, Soyk S, Pereira L, Zhang L, et al. Major Impacts of Widespread Structural Variation on Gene Expression and Crop Improvement in Tomato. Cell. 2020182:145–61 e123.

Golicz AA, Bayer PE, Barker GC, Edger PP, Kim H, Martinez PA, et al. The pangenome of an agronomically important crop plant Brassica oleracea. Nat Commun. 20167(1):13390. https://doi.org/10.1038/ncomms13390.

Hubner S, Bercovich N, Todesco M, Mandel JR, Odenheimer J, Ziegler E, et al. Sunflower pan-genome analysis shows that hybridization altered gene content and disease resistance. Nat Plants. 20195(1):54–62. https://doi.org/10.1038/s41477-018-0329-0.

Maretty L, Jensen JM, Petersen B, Sibbesen JAN, Liu SY, Villesen P, et al. Sequencing and de novo assembly of 150 genomes from Denmark as a population reference. Nature. 2017548:87.

Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R. The microbial pan-genome. Curr Opin Genet Dev. 200515(6):589–94. https://doi.org/10.1016/j.gde.2005.09.006.

Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial "pan-genome". Proc Natl Acad Sci U S A. 2005102(39):13950–5. https://doi.org/10.1073/pnas.0506758102.

Gao L, Gonda I, Sun HH, Ma QY, Bao K, Tieman DM, et al. The tomato pan-genome uncovers new genes and a rare allele regulating fruit flavor. Nat Genet. 201951:1044.

Zhao Q, Feng Q, Lu HY, Li Y, Wang A, Tian QL, et al. Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice. Nat Genet. 201850:279.

Jiao WB, Schneeberger K. Chromosome-level assemblies of multiple Arabidopsis genomes reveal hotspots of rearrangements with altered evolutionary dynamics. Nat Commun. 202011(1):989. https://doi.org/10.1038/s41467-020-14779-y.

Gordon SP, Contreras-Moreira B, Woods DP, Marais DLD, Burgess D, Shu SQ, et al. Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure. Nat Commun. 20178(1):2184. https://doi.org/10.1038/s41467-017-02292-8.

Nagaharu U. Genome analysis in Brassica with special reference to the experimental formation of B. napus and peculiar mode of fertilization. Jpn J Bot. 19357:389–452.

Wang XW, Wang HZ, Wang J, Sun RF, Wu J, Liu SY, et al. The genome of the mesopolyploid crop species Brassica rapa. Nat Genet. 201143(10):1035–U1157. https://doi.org/10.1038/ng.919.

Cheng F, Mandakova T, Wu J, Xie Q, Lysak MA, Wang XW. Deciphering the diploid ancestral genome of the mesohexaploid Brassica rapa. Plant Cell. 201325(5):1541–54. https://doi.org/10.1105/tpc.113.110486.

Lye ZN, Purugganan MD. Copy number variation in domestication. Trends Plant Sci. 201924(4):352–65. https://doi.org/10.1016/j.tplants.2019.01.003.

Wu J, Wei K, Cheng F, Li S, Wang Q, Zhao J, et al. A naturally occurring InDel variation in BraA.FLC.b (BrFLC2) associated with flowering time variation in Brassica rapa. BMC Plant Biol. 201212(1):151. https://doi.org/10.1186/1471-2229-12-151.

Belser C, Istace B, Denis E, Dubarry M, Baurens FC, Falentin C, et al. Chromosome-scale assemblies of plant genomes using nanopore long reads and optical maps. Nat Plants. 2018;4(11):879–87. https://doi.org/10.1038/s41477-018-0289-4.

Li PR, Su TB, Zhao XY, Wang WH, Zhang DS, Yu YJ, Bayer PE, Edwards D, Yu SC, Zhang FL. Assembly of the non-heading pak choi genome and comparison with the genomes of heading Chinese cabbage and the oilseed yellow sarson. Plant Biotechnol J. 2021. https://doi.org/10.1111/pbi.13522.

Boutte J, Maillet L, Chaussepied T, Letort S, Aury JM, Belser C, et al. Genome size variation and comparative genomics reveal intraspecific diversity in Brassica rapa. Front Plant Sci. 2020;11. https://doi.org/10.3389/fpls.2020.577536.

Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, Durand NC, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356(6333):92–5. https://doi.org/10.1126/science.aal3327.

Cai X, Wu J, Liang J, Lin R, Zhang K, Cheng F, et al. Improved Brassica oleracea JZS assembly reveals significant changing of LTR-RT dynamics in different morphotypes. Theor Appl Genet. 2020;133(11):3187–99. https://doi.org/10.1007/s00122-020-03664-3.

Sun SL, Zhou YS, Chen J, Shi JP, Zhao HM, Zhao HN, et al. Extensive intraspecific gene order and gene structural variations between Mo17 and other maize genomes. Nat Genet. 2018;50:1289.

Teale WD, Paponov IA, Palme K. Auxin in action: signalling, transport and the control of plant growth and development. Nat Rev Mol Cell Biol. 2006;7(11):847–59. https://doi.org/10.1038/nrm2020.

Murat F, Louis A, Maumus F, Armero A, Cooke R, Quesneville H, et al. Understanding Brassicaceae evolution through ancestral genome reconstruction. Genome Biol. 2015;16(1):262. https://doi.org/10.1186/s13059-015-0814-y.

Cheng F, Liang JL, Cai CC, Cai X, Wu J, Wang XW. Genome sequencing supports a multi-vertex model for Brassiceae species. Curr Opin Plant Biol. 2017;36:79–87. https://doi.org/10.1016/j.pbi.2017.01.006.

Liu SY, Liu YM, Yang XH, Tong CB, Edwards D, Parkin IAP, Zhao MX, Ma JX, Yu JY, Huang SM, et al. The Brassica oleracea genome reveals the asymmetrical evolution of polyploid genomes. Nat Commun. 2014;5:3930. https://doi.org/10.1038/ncomms4930.

Perumal S, Koh CS, Jin L, Buchwaldt M, Higgins EE, Zheng C, et al. A high-contiguity Brassica nigra genome localizes active centromeres and defines the ancestral Brassica genome. Nat Plants. 2020;6(8):929–41. https://doi.org/10.1038/s41477-020-0735-y.

Zhang X, Yue Z, Mei S, Qiu Y, Yang X, Chen X, et al. A de novo genome of a Chinese radish cultivar. Horticultural Plant J. 2015;1:155–64.

Gao LW, Lyu SW, Tang J, Zhou DY, Bonnema G, Xiao D, et al. Genome-wide analysis of auxin transport genes identifies the hormone responsive patterns associated with leafy head formation in Chinese cabbage. Sci Rep. 2017;7:42229. https://doi.org/10.1038/srep42229.

Schnable JC, Springer NM, Freeling M. Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc Natl Acad Sci U S A. 2011;108(10):4069–74. https://doi.org/10.1073/pnas.1101368108.

Paterson AH, Wendel JF, Gundlach H, Guo H, Jenkins J, Jin DC, et al. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature. 2012;492:423.

Emery M, Willis MMS, Hao Y, Barry K, Oakgrove K, Peng Y, et al. Preferential retention of genes from one parental genome after polyploidy illustrates the nature and scope of the genomic conflicts induced by hybridization. PLoS Genet. 2018;14(3):e1007267. https://doi.org/10.1371/journal.pgen.1007267.

Byrne ME. Networks in leaf development. Curr Opin Plant Biol. 2005;8(1):59–66. https://doi.org/10.1016/j.pbi.2004.11.009.

Husbands AY, Chitwood DH, Plavskin Y, Timmermans MCP. Signals and prepatterns: new insights into organ polarity in plants. Genes Dev. 2009;23(17):1986–97. https://doi.org/10.1101/gad.1819909.

Kidner CA, Timmermans MCP. Mixing and matching pathways in leaf polarity. Curr Opin Plant Biol. 2007;10(1):13–20. https://doi.org/10.1016/j.pbi.2006.11.013.

Townsley BT, Sinha NR. A new development: evolving concepts in leaf ontogeny. Annu Rev Plant Biol. 2012;63:535–62.

Ge Y, Ramchiary N, Wang T, Liang C, Wang N, Wang Z, et al. Mapping quantitative trait loci for leaf and heading-related traits in chinese cabbage (Brassica rapa L. ssp pekinesis). Horticulture Environ Biotechnol. 2011;52:494–501.

Inoue T, Kubo N, Kondo T, Hirai M. Detection of quantitative trait loci for heading traits in Brassica rapa using different heading types of Chinese cabbage. J Horticultural Sci Biotechnol. 2015;90(3):311–7. https://doi.org/10.1080/14620316.2015.11513188.

Allen GC, Flores-Vergara MA, Krasnyanski S, Kumar S, Thompson WF. A modified protocol for rapid DNA isolation from plant tissues using cetyltrimethylammonium bromide. Nat Protoc. 2006;1:2320–5.

Pendleton M, Sebra R, Pang AW, Ummat A, Franzen O, Rausch T, et al. Assembly and diploid architecture of an individual human genome via single-molecule technologies. Nat Methods. 2015;12(8):780–6. https://doi.org/10.1038/nmeth.3454.

Grob S, Schmid MW, Grossniklaus U. Hi-C Analysis in Arabidopsis Identifies the KNOT, a Structure with Similarities to the flamenco Locus of Drosophila. Mol Cell. 2014;55:678–93.

Zimin AV, Puiu D, Luo MC, Zhu TT, Koren S, Marcais G, et al. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm. Genome Res. 2017;27(5):787–92. https://doi.org/10.1101/gr.213405.116.

Waterhouse RM, Seppey M, Simao FA, Manni M, Ioannidis P, Klioutchnikov G, et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol Biol Evol. 2018;35:543–8.

Durand NC, Shamim MS, Machol I, Rao SSP, Huntley MH, Lander ES, et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 2016;3(1):95–8. https://doi.org/10.1016/j.cels.2016.07.002.

Robinson JT, Turner D, Durand NC, Thorvaldsdottir H, Mesirov JP, Aiden EL. Juicebox.js provides a cloud-based visualization system for Hi-C data. Cell Syst. 2018;6:256.

Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, et al. Versatile and open software for comparing large genomes. Genome Biol. 2004;5(2):R12. https://doi.org/10.1186/gb-2004-5-2-r12.

Ou S, Su W, Liao Y, Chougule K, Agda JRA, Hellinga AJ, et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 2019;20(1):275. https://doi.org/10.1186/s13059-019-1905-y.

Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 2009;Chapter 4:Unit 4.10.

Besemer J, Borodovsky M. GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res. 2005;33(Web Server):W451–4. https://doi.org/10.1093/nar/gki487.

Birney E, Clamp M, Durbin R. GeneWise and genomewise. Genome Res. 2004;14(5):988–95. https://doi.org/10.1101/gr.1865504.

Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–U130. https://doi.org/10.1038/nbt.1883.

Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK, Hannick LI, et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31(19):5654–66. https://doi.org/10.1093/nar/gkg770.

Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008;9(1):R7. https://doi.org/10.1186/gb-2008-9-1-r7.

Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, et al. InterPro: the integrative protein signature database. Nucleic Acids Res. 2009;37(Database):D211–5. https://doi.org/10.1093/nar/gkn785.

Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20(1):238. https://doi.org/10.1186/s13059-019-1832-y.

Chen CJ, Chen H, Zhang Y, Thomas HR, Frank MH, He YH, et al. TBtools: an integrative toolkit developed for interactive analyses of big biological data. Mol Plant. 2020;13:1194–202.

Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35(Web Server):W265–8. https://doi.org/10.1093/nar/gkm286.

Ou SJ, Jiang N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 2018;176(2):1410–22. https://doi.org/10.1104/pp.17.01310.

Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005;33(2):511–8. https://doi.org/10.1093/nar/gki198.

Talavera G, Castresana J. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol. 2007;56(4):564–77. https://doi.org/10.1080/10635150701472164.

Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–3. https://doi.org/10.1093/bioinformatics/btu033.

Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w(1118) iso-2 iso-3. Fly. 2012;6(2):80–92. https://doi.org/10.4161/fly.19695.

Chen SF, Zhou YQ, Chen YR, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:884–90.

Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60. https://doi.org/10.1093/bioinformatics/btp324.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–9. https://doi.org/10.1093/bioinformatics/btp352.

Eggertsson HP, Jonsson H, Kristmundsdottir S, Hjartarson E, Kehr B, Masson G, et al. Graphtyper enables population-scale genotyping using pangenome graphs. Nat Genet. 2017;49:1654.

Kronenberg ZN, Fiddes IT, Gordon D, Murali S, Cantsilieris S, Meyerson OS, et al. High-resolution comparative analysis of great ape genomes. Science. 2018;360:1085.

Garrison E, Siren J, Novak AM, Hickey G, Eizenga JM, Dawson ET, et al. Variation graph toolkit improves read mapping by representing genetic variation in the reference. Nat Biotechnol. 2018;36:875.

Goel M, Sun HQ, Jiao WB, Schneeberger K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 2019;20(1):277. https://doi.org/10.1186/s13059-019-1911-0.

Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015;12(4):357–U121. https://doi.org/10.1038/nmeth.3317.

Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 2019;20(1):278. https://doi.org/10.1186/s13059-019-1910-1.

Marcais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. MUMmer4: A fast and versatile genome alignment system. PLoS Comput Biol. 2018;14(1):e1005944. https://doi.org/10.1371/journal.pcbi.1005944.

Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–100.

Cheng F, Wu J, Fang L, Wang XW. Syntenic gene analysis between Brassica rapa and other Brassicaceae species. Front Plant Sci. 2012;3. https://doi.org/10.3389/fpls.2012.00198.

Akey JM, Zhang G, Zhang K, Jin L, Shriver MD. Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 2002;12(12):1805–14. https://doi.org/10.1101/gr.631202.

Xu X, Liu X, Ge S, Jensen JD, Hu FY, Li X, et al. Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes. Nat Biotechnol. 2012;30(1):105–U157. https://doi.org/10.1038/nbt.2050.

Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, et al. Genome-wide detection and characterization of positive selection in human populations. Nature. 2007;449(7164):913–U912. https://doi.org/10.1038/nature06250.

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8. https://doi.org/10.1093/bioinformatics/btr330.

Cai X, Chang L, Zhang T, Chen H, Zhang L, Lin R, et al. Impacts of allopolyploidization and structural variation on intraspecific diversification in Brassica rapa. Dataset NCBI. 2021. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA730930.

Zhang Z, Zhao WM, Xiao JF, Bao YM, Wang F, Hao LL, et al. Database resources of the BIG Data Center in 2019. Nucleic Acids Res. 2019;47:D8–D14.

Cai et al. Genome assemblies and annotations of Brassica rapa accessions. 2021. https://doi.org/10.6084/m9.figshare.14571297.v1.

Review history

The review history is available as Additional file 6.

Peer review information

Wenjing She was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.


Hidden state models improve state-dependent diversification approaches, including biogeographical models

The state-dependent speciation and extinction (SSE) models have recently been criticized due to their high rates of “false positive” results. Many researchers have advocated avoiding SSE models in favor of other “nonparametric” or “semiparametric” approaches. The hidden Markov modeling (HMM) approach provides a partial solution to the issues of model adequacy detected with SSE models. The inclusion of “hidden states” can account for rate heterogeneity observed in empirical phylogenies and allows for reliable detection of state-dependent diversification or diversification shifts independent of the trait of interest. However, the adoption of HMM has been hampered by the interpretational challenges of what exactly a “hidden state” represents, which we clarify herein. We show that HMMs in combination with a model-averaging approach naturally account for hidden traits when examining the meaningful impact of a suspected “driver” of diversification. We also extend the HMM to the geographic state-dependent speciation and extinction (GeoSSE) model. We test the efficacy of our “GeoHiSSE” extension with both simulations and an empirical dataset. On the whole, we show that hidden states are a general framework that can distinguish heterogeneous effects of diversification attributed to a focal character.
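The abstract's pairing of hidden-state models with model averaging can be illustrated with a minimal sketch. It assumes the widely used Akaike-weight scheme, w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2) with Δ_i = AIC_i - min(AIC), and assumes that each fitted model supplies, for a given tip, the marginal probability of each of its (observed, hidden) states plus a rate for each state. The model set, AIC scores, probabilities, and rates are hypothetical placeholders rather than values from the study, and the function names (`akaike_weights`, `expected_tip_rate`, `model_average`) are illustrative. This is one common way to average rates across models, not necessarily the authors' exact procedure.

```python
import math
from typing import Dict


def akaike_weights(aic: Dict[str, float]) -> Dict[str, float]:
    """Akaike weights: w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
    where delta_i = AIC_i - min(AIC)."""
    best = min(aic.values())
    rel = {name: math.exp(-(score - best) / 2.0) for name, score in aic.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


def expected_tip_rate(state_probs: Dict[str, float],
                      state_rates: Dict[str, float]) -> float:
    """Rate at one tip under one model: the marginal probability of each
    (observed, hidden) state times that state's rate, summed over states."""
    return sum(p * state_rates[s] for s, p in state_probs.items())


def model_average(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weight each model's estimate by its Akaike weight and sum."""
    return sum(weights[m] * v for m, v in values.items())


# Hypothetical inputs for a single tip observed in character state 1.
# Labels such as "1A"/"1B" mean observed state 1 in hidden class A/B.
aic = {"BiSSE": 210.0, "HiSSE": 204.0, "CID-2": 205.0}
tip_inputs = {
    # BiSSE has no hidden classes, so the tip is simply in state "1".
    "BiSSE": ({"1": 1.0}, {"1": 0.20}),
    # HiSSE: the tip's hidden class is uncertain, so probability is split.
    "HiSSE": ({"1A": 0.25, "1B": 0.75}, {"1A": 0.10, "1B": 0.30}),
    # CID-2: rates differ between hidden classes A and B but are shared by
    # observed states 0 and 1 within a class (only state-1 entries needed here).
    "CID-2": ({"1A": 0.5, "1B": 0.5}, {"1A": 0.05, "1B": 0.25}),
}

weights = akaike_weights(aic)
per_model = {m: expected_tip_rate(p, r) for m, (p, r) in tip_inputs.items()}
averaged_rate = model_average(per_model, weights)

for m in aic:
    print(f"{m}: weight={weights[m]:.3f}, tip rate={per_model[m]:.3f}")
print(f"model-averaged tip rate: {averaged_rate:.3f}")
```

Because the weights sum to one, the averaged rate always falls between the smallest and largest per-model estimates, and a poorly supported model (here the hypothetical BiSSE fit, with about 3% of the weight) contributes little. Averaging this way lets a trait-independent hidden-class model such as CID-2 absorb rate heterogeneity that a BiSSE-only analysis might otherwise attribute to the focal character.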

Table S1: Description of scenarios and parameter values used to simulate the data.

Table S2: Seventeen additional models used in the empirical study of conifers.

Table S3: Description of scenarios and parameter values used to simulate phylogenetic trees and range distributions under the GeoSSE+extirpation model.

Figure S1: Scheme of the transition rates between rate classes (RC0 to RC4) used for the simulation scenarios with multiple rate classes (Sims B, C, and D).

Figure S2: Proportion of widespread lineages on trees simulated under scenarios E and F.

Figure S3: Summary of model support for simulated scenarios A to D.

Figure S4: Summary of model support for simulated scenarios E to H.

Figure S5: Accuracy of turnover and extinction fraction estimates for simulation scenarios A to D.

Figure S6: Accuracy of net diversification estimates for simulation scenarios A to D.

Figure S7: Results for relative net diversification rates and Akaike model weights (AICw) for simulation scenarios B and C.

Figure S8: Distribution of Akaike weights for the model set fitted to simulation scenarios ext_A to ext_D.

Figure S9: Distribution of parameter values across 100 simulation replicates for each of the scenarios ext_A to ext_D.
