The branch-site test of positive selection is surprisingly robust but lacks power under synonymous substitution saturation and variation in GC

Mol Biol Evol. 2013 Jul;30(7):1675-86. doi: 10.1093/molbev/mst062. Epub 2013 Apr 4.

Walid H Gharib ¹, Marc Robinson-Rechavi

Affiliation

¹ Department of Ecology and Evolution, Biophore, Lausanne University, Lausanne, Switzerland.

PMID: 23558341
PMCID: PMC3684852
DOI: 10.1093/molbev/mst062

Abstract

Positive selection is widely estimated from protein-coding sequence alignments by the nonsynonymous-to-synonymous rate ratio ω. Increasingly elaborate codon models are used in a likelihood framework for this estimation. Although there is widespread concern about the robustness of ω estimation, more effort is needed to assess this robustness, especially in the context of complex models. Here, we focused on the branch-site codon model and investigated its robustness on a large set of simulated data. First, we investigated the impact of sequence divergence. We found evidence that the synonymous substitution rate (dS) is underestimated for values as small as 0.5, with a slight increase in false positives for the branch-site test. When dS increases further, its underestimation worsens, but false positives decrease. Interestingly, the detection of true positives follows a similar distribution, peaking at intermediate values of dS. Thus, high dS is more of a concern for a loss of power (false negatives) than for false positives of the test. Second, we investigated the impact of GC content. We found no significant difference in false positives between high-GC (up to ∼80%) and low-GC (∼30%) genes. Moreover, neither shifts of GC content on a specific branch nor major shifts in GC along the gene sequence generate many false positives. Our results confirm that the branch-site test is very conservative.
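The branch-site test, as typically run in codeml, is a likelihood ratio test: an alternative model in which a class of sites may have ω > 1 on the foreground branch is compared with a null model in which that ω is fixed to 1. The sketch below computes the statistic 2ΔlnL and a p-value, assuming the conservative χ² distribution with 1 degree of freedom that is commonly used for this test. The log-likelihood values are hypothetical placeholders, not results from this study.

```python
import math


def branch_site_lrt(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """Return (2*delta lnL, p-value) for a branch-site likelihood ratio test.

    Assumes a chi-square null distribution with 1 degree of freedom, for which
    the upper-tail probability P(X > x) equals erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    # Imperfect numerical optimisation can give a slightly negative statistic.
    stat = max(stat, 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods for the null (omega fixed to 1) and alternative fits.
stat, p = branch_site_lrt(lnl_null=-10234.71, lnl_alt=-10230.12)
print(f"2dlnL = {stat:.2f}, p = {p:.4f}")
```

Using χ²₁ rather than a less conservative mixture distribution makes the test reject less often, which is consistent with the conservative behaviour described above.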

Keywords: adaptive evolution; base composition; codon model.

Figures

Fig. 1.

Saturation of dS under purifying selection (ω = 0.1). Median expected dS (x axis) is plotted against median observed dS estimated with the branch-site model (y axis). The bony vertebrates branch "α" is shown in red, the mammalian branch "β" in yellow, and the euteleostei branch "γ" in purple. Each dot corresponds to one divergence test, in which the initial tree length was multiplied by a factor ranging from 0.1 to 512. The gray line shows the expected values. Both plots show the same data; the lower plot is a zoomed-in view.
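Divergence was varied by multiplying the initial tree length by factors from 0.1 up to 512. A minimal sketch of that scaling step, assuming the tree is stored as a Newick string with numeric branch lengths after colons. The example tree is hypothetical, and since only the endpoints 0.1 and 512 are given, the doubling series of factors in between is also an assumption.

```python
import re

# Matches a branch length such as ":0.25", ":2" or ":1e-07" in a Newick string.
BRANCH_LENGTH = re.compile(r":([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def scale_newick(newick: str, factor: float) -> str:
    """Multiply every branch length by `factor` (rounded to 6 significant digits)."""
    def _scale(match):
        return ":" + format(float(match.group(1)) * factor, ".6g")
    return BRANCH_LENGTH.sub(_scale, newick)


def tree_length(newick: str) -> float:
    """Sum of all branch lengths in a Newick string."""
    return sum(float(x) for x in BRANCH_LENGTH.findall(newick))


# Hypothetical tree and scaling series (0.1, 1, 2, 4, ..., 512).
tree = "((human:0.1,mouse:0.1):0.2,(zebrafish:0.3,medaka:0.3):0.1);"
factors = [0.1] + [2 ** k for k in range(10)]
for f in factors:
    scaled = scale_newick(tree, f)
    print(f, round(tree_length(scaled), 6))
```

Scaling every branch by the same factor keeps the tree topology and relative branch lengths fixed, so only overall divergence changes between tests.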

Fig. 2.

Saturation of dS under positive selection (ω = 12). Median expected dS (x axis) is plotted against median observed dS estimated with the branch-site model (y axis). The bony vertebrates branch "α" is shown in red, the mammalian branch "β" in yellow, and the euteleostei branch "γ" in purple. Each dot corresponds to one divergence test, in which the initial tree length was multiplied by a factor ranging from 0.1 to 512. The gray line shows the expected values. Both plots show the same data; the lower plot is a zoomed-in view.
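Observed dS falls below the expected value as divergence grows. A simplified illustration of why, using the Jukes-Cantor (JC69) nucleotide model as a stand-in for the codon model used in this study (an assumption made for clarity): the expected proportion of differing sites p approaches 3/4, so at high distance a small error in p is amplified into a large error in the estimated distance.

```python
import math


def jc_expected_p(d: float) -> float:
    """Expected proportion of differing sites after distance d under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc_distance(p: float) -> float:
    """JC69 distance estimate from an observed proportion p (infinite for p >= 0.75)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def amplification(p: float) -> float:
    """Derivative d(distance)/dp: how much an error in p is magnified."""
    return 1.0 / (1.0 - 4.0 * p / 3.0)


for d in (0.1, 0.5, 1.0, 2.0, 4.0):
    p = jc_expected_p(d)
    print(f"d={d:<4} p={p:.4f} roundtrip={jc_distance(p):.4f} "
          f"amplification={amplification(p):.1f}")
```

The amplification factor equals exp(4d/3), so it grows exponentially with divergence; with finite sequence length the estimates become noisy and biased well before p reaches its 3/4 ceiling.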

Fig. 3.

Power of the branch-site model against sequence divergence under positive selection (ω = 12). Power (percentage of true positives) is plotted against various expected parameter values. The bony vertebrates branch "α" is shown in red, the mammalian branch "β" in yellow, and the euteleostei branch "γ" in purple. Each dot corresponds to one divergence test, in which the initial tree length was multiplied by a scaling factor; results are shown after a 10% FDR correction. The vertical black lines mark branches with a dS of 0.5.
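Power and false-positive rates are reported after a 10% FDR correction. The caption does not name the procedure; the sketch below assumes the Benjamini-Hochberg step-up method, a standard way to control the false discovery rate across many tests, and returns a rejection flag per test.

```python
def benjamini_hochberg(p_values: list[float], fdr: float = 0.10) -> list[bool]:
    """Rejection flags under the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that the k-th smallest p-value <= k / m * fdr.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            k_max = rank
    # Reject the k_max smallest p-values, even if some of them exceed their own threshold.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from branch-site tests on simulated alignments.
pvals = [0.001, 0.2, 0.03, 0.5, 0.012]
flags = benjamini_hochberg(pvals, fdr=0.10)
print(flags, f"{100 * sum(flags) / len(flags):.0f}% rejected")
```

Because the procedure is step-up, a p-value slightly above its own rank threshold is still rejected when a larger p-value further down the ranking passes.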

Fig. 4.

Power of the branch-site model against sequence divergence under purifying selection and neutral evolution. The x axis shows the factor by which the tree length was multiplied, and the y axis shows the percentage of false positives under ω = 0.1, ω = 0.5, and ω = 1, from the upper to the lower panel. The bony vertebrates branch "α" is shown in red, the mammalian branch "β" in yellow, and the euteleostei branch "γ" in purple. Each dot corresponds to one divergence test, in which the initial tree length was multiplied by a factor from 0.1 to 512 (shown on the x axis). The black line shows the 10% FDR threshold.

Fig. 5.

Power of the branch-site model against sequence divergence under positive selection. The x axis shows the factor by which the tree length was multiplied, and the y axis shows the percentage of true positives under ω = 2, ω = 6, and ω = 12, from the upper to the lower panel. The bony vertebrates branch "α" is shown in red, the mammalian branch "β" in yellow, and the euteleostei branch "γ" in purple. Each dot corresponds to one divergence test, in which the initial tree length was multiplied by a factor from 0.1 to 512 (shown on the x axis). The black line shows the 10% FDR threshold.
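The divergence figures report, for each tree-length multiplier, the percentage of simulated data sets in which the test rejects the null. A minimal tally, assuming each simulation result is recorded as a hypothetical (multiplier, truly_selected, rejected) tuple: data sets simulated without positive selection give the false-positive rate, and those simulated with it give power.

```python
from collections import defaultdict


def rejection_rates(results):
    """Percentage of rejections per (multiplier, truly_selected) group.

    results: iterable of (multiplier, truly_selected, rejected) tuples.
    For truly_selected=False the value is a false-positive rate;
    for truly_selected=True it is power (true-positive rate).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [rejections, total]
    for multiplier, truly_selected, rejected in results:
        entry = counts[(multiplier, truly_selected)]
        entry[0] += int(rejected)
        entry[1] += 1
    return {key: 100.0 * rej / total for key, (rej, total) in counts.items()}


# Hypothetical outcomes (rejection decided after FDR correction).
sims = [(1, False, False), (1, False, True), (1, True, True), (1, True, True),
        (64, False, False), (64, False, False), (64, True, False), (64, True, True)]
for key, pct in sorted(rejection_rates(sims).items()):
    print(key, f"{pct:.0f}%")
```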

Fig. 6.

Power of the branch-site model under different GC content and GC shifts. From left to right, the panels show the rate of false positives under purifying selection and under neutral evolution (lower left); the lower-right panel shows the rate of true positives. The bony vertebrates branch "α" is shown in red, the mammalian branch "β" in yellow, and the euteleostei branch "γ" in purple.
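The GC experiments vary overall GC content (from about 30% to about 80%) and introduce GC shifts on a branch or along the gene. For reference, a sketch of how GC content, third-codon-position GC (GC3, a commonly used measure, not necessarily the one used here), and windowed GC along a gene can be computed; the functions and inputs are illustrative.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return gc / acgt if acgt else 0.0


def gc3(cds: str) -> float:
    """GC content at third codon positions of an in-frame coding sequence."""
    return gc_content(cds[2::3])


def gc_windows(seq: str, window: int, step: int) -> list[float]:
    """GC content in sliding windows, to detect shifts in GC along a gene."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


# Hypothetical coding sequence whose second half is GC-rich.
cds = "ATGAAATTTAAAGATTCA" + "GCCGGCGCGCCCGGGCTG"
print(f"GC={gc_content(cds):.2f} GC3={gc3(cds):.2f}", gc_windows(cds, 9, 9))
```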

Fig. 7.

Effect of GC variation on the branch-site model. Power of the site models (M2a vs. M1a and M7 vs. M8a) under various GC contents and GC shifts. Linked dots show false-positive rates; single dots show true-positive rates.
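The site-model comparisons are also likelihood ratio tests. For M1a vs. M2a, the alternative model adds two free parameters, and the statistic is conventionally compared with a χ² distribution with 2 degrees of freedom, whose upper tail has the closed form exp(-x/2). A sketch with hypothetical log-likelihoods; the degrees of freedom for the M7 vs. M8a comparison depend on the parameter difference between those two models and are not assumed here.

```python
import math


def lrt_df2(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """2*delta lnL and its chi-square (2 df) p-value, using P(X > x) = exp(-x / 2)."""
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return stat, math.exp(-stat / 2.0)


# Hypothetical log-likelihoods for M1a (null) and M2a (alternative).
stat, p = lrt_df2(lnl_null=-5012.40, lnl_alt=-5008.95)
print(f"M1a vs M2a: 2dlnL = {stat:.2f}, p = {p:.4f}")
```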

Fig. 8.

Effect of positive selection on nearby branches. (a) The red line (VertVsVert) shows the detection of positive selection with the vertebrates branch "α" as the foreground branch, when positive selection is simulated on the same branch "α". The light (VertVsEute) and dark (VertVsMamm) purple lines show the detection of positive selection on the euteleostei branch "γ" and the mammalian branch "β," respectively, for the same data with positive selection simulated on branch "α". (b, c) As in (a), but with positive selection simulated on the mammalian branch "β" or the euteleostei branch "γ," respectively. In all panels, each of the three branches "α," "β," and "γ" was tested in turn as the foreground branch.

Fig. 9.

Schematic tree of the data set showing the foreground branches tested. The bony vertebrates branch "α" is shown in red, the mammalian branch "β" in yellow, and the euteleostei branch "γ" in purple.
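In codeml (PAML), the foreground branch for a branch-site test is usually marked in the tree file with a "#1" label. A minimal sketch that turns one named internal node into such a mark and drops the other internal labels. The clade names stand in for branches "α," "β," and "γ" and are hypothetical, as is the tree itself.

```python
import re


def mark_foreground(newick: str, clade: str) -> str:
    """Replace internal-node label `clade` with a codeml-style ' #1' mark.

    Assumes internal nodes are labelled directly after their closing
    parenthesis, e.g. '(human,mouse)Mammalia'. Remaining internal labels
    are stripped so that only the chosen branch carries a mark.
    """
    target = re.compile(r"\)" + re.escape(clade) + r"(?=[,);:])")
    if not target.search(newick):
        raise ValueError(f"internal node {clade!r} not found")
    marked = target.sub(") #1", newick)
    # Remove any other internal-node label directly following ')'.
    return re.sub(r"\)[A-Za-z_]\w*(?=[,);:])", ")", marked)


# Hypothetical tree: alpha = Osteichthyes, beta = Mammalia, gamma = Euteleostei.
tree = "(shark,((human,mouse)Mammalia,(zebrafish,medaka)Euteleostei)Osteichthyes);"
for name in ("Osteichthyes", "Mammalia", "Euteleostei"):
    print(name, mark_foreground(tree, name))
```

Running one analysis per marked tree mirrors testing each of the three branches in turn as the foreground.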
