General mixed Yule–coalescent model

The general mixed Yule–coalescent model (GMYC), also called the generalized mixed Yule–coalescent model, is a method in molecular phylogenetics and systematics for inferring putative species boundaries from DNA sequence data. It is most often applied to a single genetic locus and requires as input an ultrametric phylogenetic tree, that is, a tree in which all tips are equidistant from the root because branch lengths are scaled to time. The method rests on the premise that deeper branches in a gene tree reflect diversification among species (a macroevolutionary process), whereas shallower branches near the tips reflect the coalescence of lineages within species (a microevolutionary process).

GMYC was introduced in 2006 and became one of the earliest widely used tree-based approaches to molecular species delimitation. Later work refined the likelihood framework, proposed variants that allow more than one threshold shift across the tree, and extended the approach to Bayesian phylogenetic inference. It has been widely used in DNA barcoding, biodiversity surveys and studies of poorly known groups, but has also been criticized for the limitations inherent in single-locus inference and for sensitivity to sampling design and tree reconstruction methods.

Background

Species boundaries in taxonomy and systematics have traditionally been drawn on the basis of morphology. Molecular data added another line of evidence, particularly in groups with few diagnostic morphological characters or large numbers of undescribed forms.[1] GMYC emerged during the growth of DNA-based species delimitation methods as a way to extract species hypotheses from a dated gene tree rather than from a fixed sequence-distance cut-off.[2][3]

The two ideas combined by GMYC come from different parts of evolutionary theory. The Yule process is a simple branching model used to represent diversification among species.[2][3] The process is named after the British statistician Udny Yule, whose 1925 work introduced branching models to the study of evolution.[4][5] Coalescent theory, by contrast, describes how sampled gene lineages within populations trace back to common ancestors through time.[2][3] GMYC assumes that these two processes leave different signatures in the timing of branching events and that the change between them can be estimated from an ultrametric tree.[2][6]

History

The method was introduced by Joan Pons and colleagues in 2006 in a study of sequence-based species delimitation in beetles.[2] That formulation used a single threshold to separate older branches interpreted as between-species diversification from younger branches interpreted as within-species coalescence.[2]

In 2009, Michael T. Monaghan and colleagues proposed a modified version that allowed the transition to vary among lineages rather than forcing the whole tree to share one global threshold.[7] In 2012, Noah Reid and Bryan Carstens published a Bayesian implementation, usually called bGMYC, that uses Markov chain Monte Carlo methods to integrate over uncertainty in tree topology, branch lengths and model parameters.[8] In 2013, Tomochika Fujisawa and Timothy Barraclough published a revised treatment of GMYC, clarified the underlying likelihood framework and evaluated the method with simulations across a wider range of conditions.[3]

Method

GMYC begins with a DNA sequence alignment and an ultrametric gene tree, usually inferred from a single locus and sampled from multiple individuals across one or more putative species.[2][3] The model searches for a "threshold" time at which the branching rate shifts, choosing the point that maximises the likelihood of the observed branching times. The fitted model, in which older branches follow a speciation-like process and younger branches follow a within-species coalescent process, is then compared against a null model in which a single branching process explains the entire tree.[2][3]
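The comparison between the two models is commonly framed as a likelihood-ratio test. The following Python sketch shows only the arithmetic of such a test, not the fitting of either model. The function name and the log-likelihood values are hypothetical, and the use of two degrees of freedom is an assumption made for illustration; implementations have differed in how many parameters they count.

```python
import math

def likelihood_ratio_test(loglik_null, loglik_gmyc):
    """Return (LR statistic, p-value) for a null model nested in a mixed model.

    Assumption: the statistic is compared with a chi-square distribution
    with 2 degrees of freedom, whose survival function is exp(-x / 2).
    """
    lr = 2.0 * (loglik_gmyc - loglik_null)
    lr = max(lr, 0.0)  # guard against small negative values from optimiser noise
    p_value = math.exp(-lr / 2.0)
    return lr, p_value

# Made-up log-likelihoods, for illustration only
lr, p = likelihood_ratio_test(loglik_null=410.0, loglik_gmyc=418.5)
print(f"LR = {lr:.2f}, p = {p:.6f}")
```

A small p-value would suggest that the tree is better described by a shift in branching rate than by a single process; it says nothing about whether the resulting clusters correspond to biological species.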

Clusters of tips connected by branching events younger than the threshold are interpreted as belonging to the same species-level entity, while deeper splits are interpreted as divergences between such entities.[2][6] In the original, single-threshold version, the whole tree shares one such transition. Multiple-threshold variants allow different parts of the tree to shift at different times, which may better fit some data sets with uneven divergence histories or different effective population sizes.[7][3]
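The single-threshold rule for reading clusters off a tree can be sketched in Python as follows. This is a minimal illustration rather than any package's implementation: the `Node` class, the `delimit` function, the toy tree and the threshold values are all hypothetical, and in a real analysis the threshold is estimated from the data rather than supplied by hand.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Node of an ultrametric tree; `age` is time before present (tips are 0)."""
    name: str = ""
    age: float = 0.0
    children: list = field(default_factory=list)

def tips(node):
    """Collect the names of all tips below a node."""
    if not node.children:
        return [node.name]
    return [t for child in node.children for t in tips(child)]

def delimit(node, threshold):
    """Cut the tree at `threshold`.

    A tip, or an internal node younger than the threshold, heads one
    cluster; older nodes are treated as splits between entities.
    """
    if not node.children or node.age < threshold:
        return [sorted(tips(node))]
    clusters = []
    for child in node.children:
        clusters.extend(delimit(child, threshold))
    return clusters

# Toy tree: two shallow pairs joined at age 0.6, plus a long-branch singleton
tree = Node(age=1.0, children=[
    Node(age=0.6, children=[
        Node(age=0.05, children=[Node("a"), Node("b")]),
        Node(age=0.08, children=[Node("c"), Node("d")]),
    ]),
    Node("e"),
])

print(delimit(tree, threshold=0.3))  # [['a', 'b'], ['c', 'd'], ['e']]
print(delimit(tree, threshold=0.7))  # [['a', 'b', 'c', 'd'], ['e']]
```

Moving the threshold from 0.3 to 0.7 merges the two shallow pairs into one entity, which illustrates why the estimated position of the threshold largely determines the number of putative species.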

Although the method is often described as species delimitation, GMYC outputs are usually treated as hypotheses rather than as final taxonomic decisions. Studies using the method commonly compare its results with morphological, geographical and additional molecular evidence, or with other delimitation approaches, before formally recognising species.[6][9]

Applications

GMYC is frequently used as a rapid "first pass" to estimate diversity in large datasets, particularly when morphological differences are subtle or taxonomic expertise is limited.[6][1] Beyond its original application to beetles,[2] it has been used in a species inventory on Madagascar,[7] in a study of Philippine round-leaf bats,[9] and in a survey of butterflies.[6]

Because it can be run on a single locus, GMYC has been attractive in DNA barcoding and other projects where many samples are available but multilocus or genomic data are not.[6][10] It has also been used in phylogeographic and community-level studies as a way to convert sequence variation into operational species hypotheses for downstream analyses.[7][8]

Limitations and comparison with other methods

GMYC has several well-known limitations, which fall broadly into two groups: errors in the estimated tree and violations of the model's biological assumptions. Because the method relies on a single locus, the resulting gene tree may not accurately reflect the true species history because of incomplete lineage sorting.[9][11] Furthermore, factors such as population structure, uneven sampling, or recent radiations can lead to "over-splitting," where the model identifies individual populations as distinct species.[3][10]

Some studies have found that the single-threshold model is more reliable than the multiple-threshold variant under a range of simulated conditions, even though the latter can fit some data sets better.[3] Other work has shown that GMYC performs poorly on species-poor data sets and can be sensitive to how the tree was inferred.[12][13] For these reasons, later reviews have generally treated GMYC as a useful exploratory method rather than a stand-alone solution to species delimitation.[1][10]

GMYC is commonly compared with other single-locus tree-based methods such as the Poisson tree processes approach, as well as with multilocus methods based on the multispecies coalescent.[11][1] Comparative studies have often found that no one method is best in every scenario, but that GMYC tends to perform less reliably than multilocus coalescent methods when divergence is shallow, gene flow is ongoing or the input tree is poorly supported.[11][10]

Software

Maximum-likelihood GMYC analyses have been distributed in the R package splits.[14] GMYC has also been implemented in web-based and comparative species-delimitation workflows, including interfaces associated with the development of Poisson tree processes (PTP) methods.[15] The Bayesian implementation is known as bGMYC.[8] In 2021, the package P2C2M.GMYC was introduced to assess whether empirical data sets violate the assumptions of the model.[16]

References

  1. ^ a b c d Flot, Jean-François (2015). "Species delimitation's coming of age". Systematic Biology. 64 (6): 897–899. doi:10.1093/sysbio/syv071.
  2. ^ a b c d e f g h i j Pons, Joan; Barraclough, Timothy G.; Gomez-Zurita, Jesus; Cardoso, Anabela; Duran, Daniel P.; Hazell, Steaphan; Kamoun, Sophien; Sumlin, William D.; Vogler, Alfried P. (2006). "Sequence-based species delimitation for the DNA taxonomy of undescribed insects". Systematic Biology. 55 (4): 595–609. doi:10.1080/10635150600852011.
  3. ^ a b c d e f g h i Fujisawa, Tomochika; Barraclough, Timothy G. (2013). "Delimiting species using single-locus data and the generalized mixed Yule coalescent approach: a revised method and evaluation on simulated data sets". Systematic Biology. 62 (5): 707–724. doi:10.1093/sysbio/syt033. PMC 3739884. PMID 23681854.
  4. ^ Yule, G.U. (1925). "A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S." Philosophical Transactions of the Royal Society of London. Series B, Containing Papers of a Biological Character. 213 (402–410): 21–87. doi:10.1098/rstb.1925.0002.
  5. ^ Yates, Frank (1952). "George Udny Yule 1871–1951". Obituary Notices of Fellows of the Royal Society. 8 (21): 308–323. doi:10.1098/rsbm.1952.0020.
  6. ^ a b c d e f Talavera, Gerard; Dincă, Vlad; Vila, Roger (2013). "Factors affecting species delimitations with the GMYC model: insights from a butterfly survey". Methods in Ecology and Evolution. 4 (12): 1101–1110. doi:10.1111/2041-210X.12107.
  7. ^ a b c d Monaghan, Michael T.; Wild, Ruth; Elliot, Miranda; Fujisawa, Tomochika; Balke, Michael; Inward, Daegan J.G.; Lees, David C.; Ranaivosolo, Ravo; Eggleton, Paul; Barraclough, Timothy G.; Vogler, Alfried P. (2009). "Accelerated species inventory on Madagascar using coalescent-based models of species delineation". Systematic Biology. 58 (3): 298–311. doi:10.1093/sysbio/syp027.
  8. ^ a b c Reid, Noah M.; Carstens, Bryan C. (2012). "Phylogenetic estimation error can decrease the accuracy of species delimitation: a Bayesian implementation of the general mixed Yule-coalescent model". BMC Evolutionary Biology. 12 (1): 196. doi:10.1186/1471-2148-12-196. PMC 3503838. PMID 23031350.
  9. ^ a b c Esselstyn, Jacob A.; Evans, Ben J.; Sedlock, Jodi L.; Anwarali Khan, Faisal Ali; Heaney, Lawrence R. (2012). "Single-locus species delimitation: a test of the mixed Yule–coalescent model, with an empirical application to Philippine round-leaf bats". Proceedings of the Royal Society B: Biological Sciences. 279 (1743): 3678–3686. doi:10.1098/rspb.2012.0705. PMC 3415896. PMID 22764163.
  10. ^ a b c d Dellicour, Simon; Flot, Jean-François (2018). "The hitchhiker's guide to single-locus species delimitation". Molecular Ecology Resources. 18 (6): 1234–1246. doi:10.1111/1755-0998.12908. hdl:2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/277278.
  11. ^ a b c Luo, Arong; Ling, Cheng; Ho, Simon Y. W.; Zhu, Chao-Dong (2018). "Comparison of methods for molecular species delimitation across a range of speciation scenarios". Systematic Biology. 67 (5): 830–846. doi:10.1093/sysbio/syy011. PMC 6101526. PMID 29462495.
  12. ^ Dellicour, Simon; Flot, Jean-François (2015). "Delimiting species-poor data sets using single molecular markers: a study of barcode gaps, haplowebs and GMYC". Systematic Biology. 64 (6): 900–908. doi:10.1093/sysbio/syu130.
  13. ^ Tang, Cuong Q.; Humphreys, Aelys M.; Fontaneto, Diego; Barraclough, Timothy G. (2014). "Effects of phylogenetic reconstruction method on the robustness of species delimitation using single‐locus data". Methods in Ecology and Evolution. 5 (10): 1086–1094. doi:10.1111/2041-210X.12246. PMC 4374709.
  14. ^ "gmyc: Optimizes genetic clusters using the generalized mixed Yule coalescent approach". rdrr.io. Retrieved 5 March 2026.
  15. ^ Zhang, Jiajie; Kapli, Paschalia; Pavlidis, Pavlos; Stamatakis, Alexandros (2013). "A general species delimitation method with applications to phylogenetic placements". Bioinformatics. 29 (22): 2869–2876. doi:10.1093/bioinformatics/btt499. PMC 3810850. PMID 23990417.
  16. ^ Fonseca, Emanuel M.; Duckett, Drew J.; Carstens, Bryan C. (2021). "P2C2M.GMYC: An R package for assessing the utility of the Generalized Mixed Yule Coalescent model". Methods in Ecology and Evolution. 12 (3): 487–493. doi:10.1111/2041-210X.13541.