Phylo-rs: an extensible phylogenetic analysis library in rust

BMC Bioinformatics volume 26, Article number: 197 (2025)

Abstract

Background

The advent of next-generation and long-read sequencing technologies has provided an ever-increasing wealth of phylogenetic data that require specially designed algorithms to decipher the underlying evolutionary relationships. As large-scale data become increasingly accessible, there is a concomitant need for efficient computational libraries that facilitate the development and dissemination of specialized algorithms for phylogenetic comparative biology.

Results

We introduce Phylo-rs: a fast, extensible, general-purpose library for phylogenetic analysis and inference written in the Rust programming language. Phylo-rs leverages a combination of speed, memory-safety, and native WebAssembly support offered by Rust to provide a robust set of memory-efficient data structures and elementary phylogenetic algorithms. Phylo-rs focuses on the efficient and convenient deployment of software aimed at large-scale phylogenetic analysis and inference. Scalability analysis against popular libraries shows that Phylo-rs performs comparably or better on key algorithms. We utilized it to assess the phylogenetic diversity of influenza A virus in swine, identifying virus groups that are undergoing evolutionary expansion that could be targeted for control through multivalent vaccines. Additionally, we used Phylo-rs to enhance phylogenetic inference by visualizing tree space from Markov chain Monte Carlo (MCMC) Bayesian analysis, efficiently computing approximately five billion tree pair distances to evaluate convergence and select MCMC runs for genomic epidemiology.

Conclusion

Phylo-rs enables the design and implementation of cutting-edge software for phylogenetic analysis, thereby facilitating the application and dissemination of theoretical advancements in biology. Phylo-rs is available under an open-source license on GitHub at https://github.com/sriram98v/phylo-rs, with documentation available at https://docs.rs/phylo/latest/phylo/.

Background

Phylogenetic trees, or phylogenies, are fundamental to evolutionary biology as they represent hypotheses about the relationships between different taxonomic groups, benefiting diverse disciplines from agronomy [1] and conservation biology [2] to medical sciences [3] and epidemiology [4]. Recent advances in next-generation and long-read sequencing technologies [5, 6] have improved access to large-scale genomic data and phylogenies. The scale of these data and phylogenetic trees necessitates efficient and effective computational libraries that implement specialized algorithms to analyze phylogenies and uncover hidden statistics and relationships between taxonomic groups [7].

Current phylogenetic libraries have, at times, struggled to keep pace with the demands of large-scale phylogenetic analysis. Existing libraries often trade runtime efficiency against ease of development, depending on the chosen language. Libraries like Dendropy [8], TreeSwift [9], phytools [10] and ape [11] offer simple and intuitive syntax at the cost of the efficiency, low-level control, and functionality necessary for large-scale phylogenetic analysis. In contrast, libraries like Genesis [12], CompactTree [13] and Gotree [14] offer memory and runtime efficiency but lack the memory-safety and security features of modern programming languages [15, 16].

Rust is a modern programming language that leverages speed and memory-safety with high-level syntactical features. Rust is compiled with LLVM [17], providing optimal speed with a low memory footprint. Additionally, Rust supports automatic type inference at compile time, reducing the verbosity of written code. The key feature of Rust is the concept of ownership and borrowing of variables, which enables Rust to infer the lifetime of data stored in memory automatically. This eliminates the overhead of online memory management and completely eradicates common memory errors such as segmentation faults. Concomitantly, ownership enforces thread-safety, preventing race conditions in multi-threaded code. These features make Rust attractive for applications in bioinformatics.

We introduce Phylo-rs, a versatile phylogenetic library that provides an extensible foundation of data structures and algorithms for phylogenetic analysis and inference implemented in Rust [18]. Phylo-rs utilizes Rust’s modern programming language features, delivering high-performance software while ensuring memory-safety and maintainable code. Additionally, Phylo-rs provides native WebAssembly (WASM) support, offering a highly portable and compact compilation target for software [19]. This enables access to software written using Phylo-rs on web browsers, eliminating system compatibility issues and narrowing the gap between cutting-edge research and practical application. To our knowledge, Phylo-rs is the first comprehensive phylogenetic analysis library written in Rust.

This paper is organized as follows. The Implementation section outlines the library’s internal structure and highlights additional features that assist researchers in building and deploying high-performance software for phylogenetic research. The Results section provides a comparative analysis that emphasizes the efficiency of Phylo-rs relative to other popular libraries. Following that, the Experimental evaluation section details two computation-intensive phylogenetic analyses using real experimental data, demonstrating the utility and applicability of Phylo-rs. Finally, in the Conclusions and discussion section, we summarize the work presented in this article and discuss future directions for the development of this library.

Implementation

At a high level, phylogenies in Phylo-rs are implemented as Rust ‘traits’ that describe their behavior and functionality while making no assumptions about how they are represented in memory. Any data structure (a ‘struct’ in Rust) can therefore represent a phylogeny. A struct needs to implement only a few basic methods to gain access to several iterators, operators, and functions, including tree traversals, simulations, distance metrics, edit operations, and file I/O. These traits can be extended by other user-defined traits, enabling seamless extensions to existing methods and convenient implementation of new algorithms, as shown in Fig. 1.
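The trait-based design described above can be illustrated with a minimal sketch. It does not reproduce the Phylo-rs API: the trait `RootedTree`, its methods, and the struct `VecTree` are hypothetical names chosen for illustration. The point is only that a struct supplying two small accessors receives a traversal and a leaf count as provided methods.

```rust
/// Hypothetical minimal trait (not the Phylo-rs trait): an implementor
/// supplies two accessors and receives the methods below for free.
pub trait RootedTree {
    /// Identifier of the root node.
    fn root(&self) -> usize;
    /// Children of `node`, in order.
    fn children(&self, node: usize) -> &[usize];

    /// Provided method: post-order traversal (children before their parent).
    fn postorder(&self) -> Vec<usize> {
        let mut order = Vec::new();
        // An explicit stack avoids deep recursion on tall trees.
        let mut stack = vec![(self.root(), false)];
        while let Some((node, expanded)) = stack.pop() {
            if expanded {
                order.push(node);
            } else {
                stack.push((node, true));
                // Push in reverse so the leftmost child is visited first.
                for &child in self.children(node).iter().rev() {
                    stack.push((child, false));
                }
            }
        }
        order
    }

    /// Provided method: number of leaves (nodes without children).
    fn num_leaves(&self) -> usize {
        self.postorder()
            .into_iter()
            .filter(|&n| self.children(n).is_empty())
            .count()
    }
}

/// One possible in-memory representation: an adjacency list rooted at node 0.
pub struct VecTree {
    pub children: Vec<Vec<usize>>,
}

impl RootedTree for VecTree {
    fn root(&self) -> usize {
        0
    }
    fn children(&self, node: usize) -> &[usize] {
        &self.children[node]
    }
}
```

Any other representation, for example a pointer-based or succinct tree, would gain the same provided methods by implementing the two accessors.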

Fig. 1

A trait dependency graph showing how behavior is shared between objects that build up to a phylogenetic tree. Meta tree nodes, stat tree nodes, and weighted tree nodes extend the behavior of a rooted tree node to manipulate the meta-annotation, stat-annotation, and weight-annotation of a node, respectively. Similarly, a rooted meta tree and a rooted stat tree extend the behaviors of a rooted tree, and finally, a phylogenetic tree extends the behavior of all the defined trees

Phylo-rs eliminates redundant memory usage by yielding references instead of deep copies. Phylo-rs enforces memory-safety at compilation, which secures software from memory vulnerabilities. Memory-safety is ensured in Phylo-rs by assigning object lifetimes; tree components are retained in memory for as long as the tree itself, eliminating memory-related errors or vulnerabilities.
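As a rough illustration of yielding references rather than deep copies, the sketch below returns an iterator of borrowed leaf labels whose lifetime is tied to the tree. The types `Node` and `Tree` and the method `leaf_labels` are assumptions made for this example and are not taken from Phylo-rs.

```rust
/// Hypothetical node and tree types used only for this illustration.
pub struct Node {
    pub label: String,
    pub children: Vec<usize>,
}

pub struct Tree {
    pub nodes: Vec<Node>,
}

impl Tree {
    /// Yields leaf labels as `&str` borrowed from the tree; no label is cloned.
    /// The lifetime `'a` ties every yielded reference to the borrow of `self`,
    /// so the compiler rejects code that drops the tree while labels are in use.
    pub fn leaf_labels<'a>(&'a self) -> impl Iterator<Item = &'a str> + 'a {
        self.nodes
            .iter()
            .filter(|node| node.children.is_empty())
            .map(|node| node.label.as_str())
    }
}
```

Collecting the iterator into a `Vec<&str>` stores only pointers into the tree, and attempting to drop the tree while that vector is still used is a compile-time error rather than a runtime fault.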

Classical analyses of phylogenies require the pairwise comparison of trees using established metrics such as the Robinson-Foulds metric [20], cophenetic distances [21], and cluster affinity distance [22]. Phylo-rs offers functions that implement the most efficient known algorithm [23] to compute these distances.
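For readers unfamiliar with the Robinson-Foulds metric, the following sketch computes its rooted, cluster-based form as the size of the symmetric difference between the cluster sets of two trees. It is a simple set-based version for at most 64 taxa, not the efficient algorithm used by Phylo-rs, and the names `Tree`, `clusters`, `robinson_foulds`, and `two_cherries` are hypothetical.

```rust
use std::collections::HashSet;

/// Rooted tree in an arena: node 0 is the root and a leaf stores its taxon index.
pub struct Tree {
    pub children: Vec<Vec<usize>>,
    pub taxon: Vec<Option<u32>>,
}

/// Collects the non-trivial clusters (leaf sets below internal, non-root nodes)
/// as bitmasks. Assumes taxon indices are below 64.
pub fn clusters(tree: &Tree) -> HashSet<u64> {
    fn visit(tree: &Tree, node: usize, out: &mut HashSet<u64>) -> u64 {
        if let Some(t) = tree.taxon[node] {
            return 1u64 << t; // a leaf contributes a single bit
        }
        let mut mask = 0u64;
        for &child in &tree.children[node] {
            mask |= visit(tree, child, out);
        }
        if node != 0 {
            out.insert(mask); // the root cluster is shared by all trees
        }
        mask
    }
    let mut out = HashSet::new();
    visit(tree, 0, &mut out);
    out
}

/// Rooted Robinson-Foulds distance: clusters present in exactly one tree.
pub fn robinson_foulds(a: &Tree, b: &Tree) -> usize {
    clusters(a).symmetric_difference(&clusters(b)).count()
}

/// Helper for the example: builds the four-taxon tree ((a,b),(c,d)).
pub fn two_cherries(a: u32, b: u32, c: u32, d: u32) -> Tree {
    Tree {
        children: vec![vec![1, 2], vec![3, 4], vec![5, 6], vec![], vec![], vec![], vec![]],
        taxon: vec![None, None, None, Some(a), Some(b), Some(c), Some(d)],
    }
}
```

With taxa numbered 0 to 3, the trees ((0,1),(2,3)) and ((0,2),(1,3)) share no non-trivial cluster, so their distance is 4, while any tree is at distance 0 from itself.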

Many phylogenetic inference algorithms employ tree edit operations [24, 25] in algorithms aimed at inferring the optimal phylogenetic history of a set of taxa. In line with that, Phylo-rs provides traits to perform tree edit operations such as Subtree Pruning and Regrafting [20], Tree Bisection and Reconnection [26], and Nearest Neighbor Interchange [27].
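To make the simplest of these edit operations concrete, the sketch below performs a rooted Nearest Neighbor Interchange on an arena-based tree by swapping a child of an internal node with that node's sibling. The `Tree` layout and the `nni` signature are assumptions for illustration and are not the Phylo-rs traits.

```rust
/// Hypothetical arena tree: `parent[i]` and `children[i]` describe node `i`
/// and are assumed to be consistent with each other.
pub struct Tree {
    pub parent: Vec<Option<usize>>,
    pub children: Vec<Vec<usize>>,
}

impl Tree {
    /// Swaps `grandchild` (a child of `node`) with `sibling` (another child of
    /// the parent of `node`). Returns `false` and leaves the tree unchanged
    /// when the three nodes do not describe a valid interchange.
    pub fn nni(&mut self, node: usize, grandchild: usize, sibling: usize) -> bool {
        let parent = match self.parent[node] {
            Some(p) => p,
            None => return false, // the root has no sibling to exchange with
        };
        if sibling == node
            || self.parent[sibling] != Some(parent)
            || self.parent[grandchild] != Some(node)
        {
            return false;
        }
        let gi = self.children[node].iter().position(|&c| c == grandchild);
        let si = self.children[parent].iter().position(|&c| c == sibling);
        let (gi, si) = match (gi, si) {
            (Some(g), Some(s)) => (g, s),
            _ => return false,
        };
        // Exchange the two subtrees in place.
        self.children[node][gi] = sibling;
        self.children[parent][si] = grandchild;
        self.parent[sibling] = Some(node);
        self.parent[grandchild] = Some(parent);
        true
    }
}
```

Applied to ((A,B),C) with the internal node above A and B, exchanging A with C yields ((C,B),A). Only four entries change, regardless of the size of the subtrees involved.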

Phylo-rs supports the widely used Newick [28] encoding for phylogenies, including constructing and translating trees from live streams of ASCII data over web-based and multi-threaded ports. Phylo-rs implements a Newick trait that can be extended to cloud-based applications. The Newick trait can also be extended to support numerous file formats, such as the Nexus format, without making any metadata structure specifications.
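As background on the Newick encoding, the following is a minimal recursive-descent parser sketch for unquoted labels and optional branch lengths. It is not the Phylo-rs Newick trait: `Node`, `parse_newick`, and the helper functions are hypothetical, and quoted labels, comments, and very deep trees (which would exhaust the call stack) are deliberately left out.

```rust
/// Node of a parsed tree: label (possibly empty), optional branch length,
/// and the arena indices of its children.
#[derive(Debug)]
pub struct Node {
    pub label: String,
    pub length: Option<f64>,
    pub children: Vec<usize>,
}

/// Parses a string such as "((A:1,B:2)ab:0.5,C);" into an arena of nodes and
/// returns the arena together with the index of the root.
pub fn parse_newick(text: &str) -> Result<(Vec<Node>, usize), String> {
    let bytes = text.trim().as_bytes();
    let mut nodes = Vec::new();
    let mut pos = 0;
    let root = parse_subtree(bytes, &mut pos, &mut nodes)?;
    if bytes.get(pos) != Some(&b';') {
        return Err(format!("expected ';' at byte {}", pos));
    }
    Ok((nodes, root))
}

fn parse_subtree(s: &[u8], pos: &mut usize, nodes: &mut Vec<Node>) -> Result<usize, String> {
    // Skip whitespace before a subtree.
    while s.get(*pos).map_or(false, |b| b.is_ascii_whitespace()) {
        *pos += 1;
    }
    let mut children = Vec::new();
    if s.get(*pos) == Some(&b'(') {
        loop {
            *pos += 1; // skip '(' on the first pass, ',' afterwards
            children.push(parse_subtree(s, pos, nodes)?);
            match s.get(*pos) {
                Some(&b',') => continue,
                Some(&b')') => {
                    *pos += 1;
                    break;
                }
                _ => return Err(format!("expected ',' or ')' at byte {}", *pos)),
            }
        }
    }
    let label = take_token(s, pos);
    let length = if s.get(*pos) == Some(&b':') {
        *pos += 1;
        let raw = take_token(s, pos);
        let value = raw
            .parse::<f64>()
            .map_err(|e| format!("bad branch length '{}': {}", raw, e))?;
        Some(value)
    } else {
        None
    };
    nodes.push(Node { label, length, children });
    Ok(nodes.len() - 1)
}

/// Reads characters up to the next structural symbol and trims whitespace.
fn take_token(s: &[u8], pos: &mut usize) -> String {
    let start = *pos;
    while *pos < s.len() && !b"(),:;".contains(&s[*pos]) {
        *pos += 1;
    }
    String::from_utf8_lossy(&s[start..*pos]).trim().to_string()
}
```

Because children are pushed before their parent, the arena is in post-order and the root is always the last node.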

Phylo-rs is furnished with an intuitive tree-like struct that implements all the traits of phylogenies, which is fully detailed in the official Phylo-rs documentation. Phylo-rs documents the trade-offs for every method, providing links to alternative methods that achieve the same results differently, where possible. Traits are automatically tested using the standard tree struct via continuous integration and are benchmarked at every stable release.

Phylo-rs is equipped with additional features to enable researchers to implement algorithms for large-scale analysis seamlessly. Each feature can be enabled or disabled at compilation time, depending on the infrastructure of the target hardware.

Multi-threading: Phylo-rs delivers multi-thread support by parallelizing its iterators while guaranteeing data-race freedom. Analyses that require independent computations for each vertex of a phylogeny can be executed simultaneously. Data parallelism can be highly beneficial in large-scale studies where phylogenies with tens of thousands of taxa can be analyzed efficiently by sharing the computational workload between numerous CPUs.

Single Instruction, Multiple Data: Phylo-rs permits parallelization of bit-level operations on single-CPU environments through the use of Single Instruction, Multiple Data (SIMD). SIMD has been frequently used to improve application performance in a variety of fields [29, 30], with cases achieving a 10x speedup [29]. Phylo-rs utilizes SIMD when inferring and enumerating bipartitions of the taxa induced by a phylogeny. Phylo-rs computes the overlap between two clusters through parallelized bit-level operations on the same core by representing clusters as bit-strings.

WASM: Phylo-rs achieves platform interoperability, ease of use, and effortless distribution by supporting WASM as a compilation target. WASM is a compact binary instruction format for stack-based virtual machines [19] and can be called from JavaScript via Node.js or as a command line interface application. With WASM support, Phylo-rs has three major advantages over other analytical libraries. Firstly, Phylo-rs is safe: code runs in a sandboxed virtual environment, which protects users from damage caused by malicious code. Secondly, Phylo-rs is fast, as low-level code generated by compilers is optimized ahead of time, allowing the code to fully utilize machine hardware. Further, WASM supplies users with efficient tools that overcome the inefficient runtimes traditionally seen with sandboxed applications. Thirdly, Phylo-rs is very portable: code compiled to WASM targets a single architecture designed for the Web and can run across various browsers, operating systems, and hardware types.

As such, WASM is an excellent alternative to standard Graphical User Interface applications and provides a robust platform for disseminating bioinformatic tools and applications [31, 32]. User interfaces can be standardized using any modern web browser, reducing the redundant graphical overhead of installed applications. Analytic tools written with Phylo-rs can be shared as web apps with built-in graphical interfaces and intuitive visualizations using modern graphical libraries.
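Returning to the multi-threading and bit-string features described above, the sketch below represents clusters as bit-strings, computes their overlap with word-wise AND and population count, and distributes independent overlap computations across scoped threads. It is an illustration under stated assumptions rather than the Phylo-rs implementation: `Cluster` and `overlaps_parallel` are hypothetical, the standard library's `std::thread::scope` stands in for whatever parallel iterators the library uses, and no explicit SIMD intrinsics appear (the compiler may or may not vectorize the word loop).

```rust
use std::thread;

/// A set of taxa stored as a bit-string: bit `i` is set when taxon `i` is present.
pub struct Cluster {
    words: Vec<u64>,
}

impl Cluster {
    /// Creates an empty cluster with room for `num_taxa` taxa.
    pub fn new(num_taxa: usize) -> Self {
        Cluster { words: vec![0; (num_taxa + 63) / 64] }
    }

    /// Adds a taxon; panics if `taxon` is beyond the allocated words.
    pub fn insert(&mut self, taxon: usize) {
        self.words[taxon / 64] |= 1u64 << (taxon % 64);
    }

    /// Number of taxa shared by both clusters: one AND and one popcount per word.
    pub fn overlap(&self, other: &Cluster) -> u32 {
        self.words
            .iter()
            .zip(&other.words)
            .map(|(a, b)| (a & b).count_ones())
            .sum()
    }
}

/// Overlap of `query` with every cluster in `others`, split across scoped threads.
/// Each thread writes to a disjoint chunk of the output, so no locking is needed.
pub fn overlaps_parallel(query: &Cluster, others: &[Cluster], threads: usize) -> Vec<u32> {
    let mut out = vec![0u32; others.len()];
    let threads = threads.max(1);
    let chunk = ((others.len() + threads - 1) / threads).max(1);
    thread::scope(|scope| {
        for (inputs, outputs) in others.chunks(chunk).zip(out.chunks_mut(chunk)) {
            scope.spawn(move || {
                for (cluster, slot) in inputs.iter().zip(outputs.iter_mut()) {
                    *slot = query.overlap(cluster);
                }
            });
        }
    });
    out
}
```

The borrow checker accepts this only because the output chunks are disjoint and the inputs are shared immutably, which is the data-race freedom guarantee mentioned above expressed at the type level.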

Results

We present a scalability analysis highlighting the memory and runtime performance of Phylo-rs relative to popular libraries, namely, Dendropy [8], Gotree [14], TreeSwift [9], Genesis [12], CompactTree [13], and ape [11]. We also include phylotree, another phylogenetic library written in Rust.

Each comparison was performed on an Intel(R) Core(TM) i7-10700K 3.80GHz CPU running Arch Linux v6.6.28-2-lts and was executed on a single thread. In Sect. Runtime analysis, we present a scalability analysis of the runtime of the implementation of several key algorithms on a fixed set of simulated trees described in Comparative Analysis. Then in Sect. Memory analysis, we evaluate the memory efficiency of the competing libraries on trees with increasing sizes. We provide all scripts used for testing the tools and creating the plots shown here in the scalability directory of the official repository.

Runtime analysis

We compare Phylo-rs with other popular phylogenetic libraries using a runtime analysis that contrasts the mean runtime of six foundational algorithms commonly employed in phylogenetic analyses [24, 25]: (i) computing the Robinson-Foulds metric (RF), (ii) retrieving the Least Common Ancestor (LCA), (iii) tree traversals in pre- and post-order for vertices (VT) and edges (ET), (iv) subtree extraction and contraction (TC), (v) simulating random trees using the Yule evolutionary model (YTS), and (vi) applying the Nearest Neighbor Interchange (NNI) operation.
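As a reminder of what one of the benchmarked operations computes, the sketch below retrieves the Least Common Ancestor of two nodes by walking parent pointers. It is the naive method, linear in the tree height, and is not claimed to be the algorithm used by Phylo-rs or any benchmarked library; the function name and the array-based tree layout are assumptions.

```rust
/// Least common ancestor of `u` and `v` in a rooted tree given by parent
/// pointers (`None` for the root) and precomputed node depths.
pub fn lca(parent: &[Option<usize>], depth: &[usize], mut u: usize, mut v: usize) -> usize {
    // Lift the deeper node until both nodes are at the same depth.
    while depth[u] > depth[v] {
        u = parent[u].expect("a node below the root has a parent");
    }
    while depth[v] > depth[u] {
        v = parent[v].expect("a node below the root has a parent");
    }
    // Climb in lockstep until the two paths meet.
    while u != v {
        u = parent[u].expect("a node below the root has a parent");
        v = parent[v].expect("a node below the root has a parent");
    }
    u
}
```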

We conducted 1000 iterations for each implementation with a precision of \(\pm 12\) ns on randomly simulated phylogenetic trees with varying numbers of taxa, starting from 200 and going up to 10000 taxa in increments of 200. We exclude libraries that did not provide an implementation. The runtimes for each algorithm were recorded internally using the respective programming language time utilities; all runtimes were calculated with the Rust standard library for Phylo-rs and phylotree, Python timeit for Dendropy [8] and TreeSwift [9], the system time utility for ape [11] and Gotree [14], and the chrono library for CompactTree [13] and Genesis [12], using identical trees for each implementation of the same algorithm. Benchmarking with timeit and R scripts entails an overhead of approximately 2 ms for loading the binaries and virtual environments, which was excluded from the recorded runtimes.
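A mean runtime of the kind described above can be measured with the Rust standard library roughly as follows. This is a generic sketch, not the benchmarking script from the repository; `mean_runtime` is a hypothetical helper.

```rust
use std::time::{Duration, Instant};

/// Runs `f` the given number of times and returns the mean wall-clock time
/// per call, measured with the monotonic clock behind `Instant`.
pub fn mean_runtime<F: FnMut()>(iterations: u32, mut f: F) -> Duration {
    assert!(iterations > 0, "at least one iteration is required");
    let start = Instant::now();
    for _ in 0..iterations {
        f();
    }
    start.elapsed() / iterations
}
```

When the measured closure produces a value that is otherwise unused, passing the result through `std::hint::black_box` helps keep the optimizer from removing the work being timed.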

Fig. 2

A scalability analysis showing the runtime of A computing tree traversals, B computing Least Common Ancestor, C computing Nearest Neighbor Interchange, D simulating trees under the Yule evolutionary model, E contracting a tree by some subset of taxa, and F computing the Robinson-Foulds distance for pairs of randomly generated trees under the Yule speciation model. Phylo-rs shows similar runtimes to phylotree in simulating trees under the Yule evolutionary model, and outperforms all competing libraries in all other implementations. Phylo-rs shows a near 100x speedup in tree traversals and least common ancestor retrievals, and a 10x speedup in computing the Robinson-Foulds distance, Nearest Neighbor Interchange, and tree contraction

The results of the runtime scalability analysis are summarized in Fig. 2. Notably, Phylo-rs computes the RF distance between a pair of trees faster than all the competing libraries, achieving a 10x speedup compared to Dendropy and phylotree. Phylo-rs shows a consistent 100x speedup in tree traversals compared to all the competing libraries. In the simulation of Yule trees, Phylo-rs maintains a similar runtime to that of phylotree, but is still slower than Gotree by a factor of 10. Phylo-rs shows a 100x speedup in performing an NNI operation on a tree compared to Gotree. Phylo-rs can also compute the LCA for a pair of nodes faster than all competing libraries.

Additionally, Fig. 2 shows that the implementations of all key algorithms in Phylo-rs outperform the implementations of the same algorithms in Dendropy and TreeSwift. Lastly, Fig. 2 indicates that there are more methods natively implemented in Phylo-rs than those compared in the previous section. These operations are fundamental components of many popular algorithms used in practice, including maximum likelihood estimation [24, 25] and Bayesian inference [33]. The improved runtimes indicate that Phylo-rs can significantly reduce the time required to perform large-scale phylogenetic analyses, making it a more efficient choice for researchers and practitioners. By providing a broader range of fundamental operations, Phylo-rs can be easily integrated into various workflows and pipelines. This makes it appealing for researchers and practitioners working on diverse phylogenetic tasks.

Fig. 3

Memory utilization for reading trees from files containing Newick strings with randomly generated topology. In all cases, all libraries except ape and CompactTree require more memory to read and store a tree. Relatively, Phylo-rs maintains a significantly smaller footprint than a majority of the competing libraries. Note that Dendropy and TreeSwift are unable to read trees once the number of taxa exceeds 50K and 100K, respectively, because the depth of these trees exceeds the Python recursion depth limit; they were hence omitted for larger trees

Memory analysis

We compare the memory utilization of Phylo-rs with that of competing libraries using a scalability analysis in which each library reads Newick-encoded trees of varying sizes from plain-text files. We conducted 1000 iterations for each implementation on randomly simulated phylogenetic trees with varying numbers of taxa, starting from 1000 taxa and going up to 1 million taxa. All benchmarks were conducted using the GNU time utility, where we recorded the mean memory utilization of each library over 1000 iterations.

Figure 3 highlights the mean memory utilized by each library in reading a Newick-encoded tree from a plain-text file. Phylo-rs maintains a low memory footprint relative to all the compared libraries except CompactTree. More popular libraries like TreeSwift and Dendropy exhibit high memory utilization. Notably, TreeSwift and Dendropy cannot read trees with over 50K and 100K taxa, respectively, as these trees exceed the Python recursion depth limit; they were therefore excluded from the analysis at those sizes.

In summary, Phylo-rs exhibits competitive performance and utility compared to other popular libraries, with more standard algorithms available out-of-the-box and without sacrificing runtime or memory efficiency. Furthermore, Phylo-rs offers more flexibility with its platform interoperability and WASM support, making it an attractive alternative for developing and disseminating large-scale phylogenetic analysis tools.

Experimental evaluation

We demonstrate the utility of Phylo-rs using two examples of computationally demanding phylogenetic analyses with real experimental data. All results and corresponding visualizations presented in this section can be reproduced on a typical desktop PC by following the instructions in the official GitHub repository at https://github.com/sriram98v/phylo-rs.

Quantifying phylogenetic diversity for influenza A virus control

We quantified the phylogenetic diversity (PD) [34] of the H1 subtype influenza A virus (IAV) in swine collected between the years 2015 and 2022. The H1 subtype of swine IAV in the United States has at least 11 genetically distinct clades of viruses [35]. Controlling IAV transmission relies upon vaccination and designing optimal vaccination strategies requires a detailed analysis of the genetic diversity of the circulating viruses [36, 37].

Fig. 4

Visualization of variation in phylogenetic diversity of the H1 subtype influenza A virus (IAV) collected between the years 2015 and 2022. The phylogenetic clades 1B.2.1 and 1A.1.1.3 demonstrated an almost linear increase in phylogenetic diversity across the years indicating evolution of the pathogen with increases in genetic diversity that may reduce the efficacy of vaccine control strategies. The phylogenetic clades 1B.2.2.2 and 1A.4 demonstrated a decline in phylogenetic diversity, suggesting that vaccine control measures may be designed with a single antigenic component to effectively prevent infection and transmission. The phylogenetic diversity of a tree at each year was computed as the Faith Index [34] implemented in Phylo-rs

To quantify diversity dynamics, we downloaded all 8241 publicly available IAV hemagglutinin (HA) sequences collected between 2015 and 2022 from the USDA IAV in swine surveillance system. All sequences were classified into one of the named swine IAV clades using octoFLU v.1.0.0 [35, 38]. We aligned the nucleotide sequences with mafft v.7.525 and inferred a maximum likelihood tree using IQ-Tree v2.2.6 [25] under the generalized time-reversible (GTR) substitution model with empirical base frequencies and five free-rate categories [39]. We computed PD for each named clade detected within each year using Phylo-rs and visualized the resulting dynamics in Fig. 4. These data indicate that the 1B.2.1 and 1A.1.1.3 clades demonstrated a steady increase in PD across the years, whereas other clades, e.g., 1B.2.2.1 and 1A.3.3.2, fluctuated. The steady increase in PD in the 1B.2.1 and 1A.1.1.3 clades represents a significant challenge for control strategies: vaccines designed to reflect circulating genetic and antigenic diversity may not work adequately, as a strain selected as a vaccine antigen in 2016 may not reflect the diversity in the clade in 2018 [37]. In addition, this analysis identified clades with low PD, which may be susceptible to removal through the use of targeted vaccines that are focused on the genetic diversity observed within these clades. A benefit of using PD to track diversity is that clades may be driven to extinction with a reduction in total genetic diversity and the subsequent minimization of reassortment and antigenic drift [40].
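The PD computation referred to above can be sketched as follows, assuming the rooted form of Faith's index in which the PD of a set of taxa is the total length of all branches on the paths from those taxa to the root. Whether Phylo-rs uses this rooted convention is an assumption here, and `faith_pd` with its array-based tree layout is a hypothetical helper rather than the library function.

```rust
use std::collections::HashSet;

/// Rooted Faith's PD of `taxa`: sum of the lengths of all branches lying on a
/// path from any selected taxon to the root, each branch counted once.
/// `parent[i]` is the parent of node `i` (`None` for the root) and
/// `branch_len[i]` is the length of the branch joining node `i` to its parent.
pub fn faith_pd(parent: &[Option<usize>], branch_len: &[f64], taxa: &[usize]) -> f64 {
    let mut counted = HashSet::new();
    let mut total = 0.0;
    for &leaf in taxa {
        let mut node = leaf;
        // Climb towards the root, stopping at the first branch already counted.
        while let Some(p) = parent[node] {
            if !counted.insert(node) {
                break;
            }
            total += branch_len[node];
            node = p;
        }
    }
    total
}
```

Tracking a clade through time then amounts to calling such a function once per year with the taxa of that clade sampled up to, or within, that year, depending on the study design.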

Fig. 5

UMAP embedding of the phylogenetic tree space explored by 6 independent MCMC runs. All runs were conducted under the same conditions. Each color represents the trees from a single run, where the green star indicates the starting tree and the red star indicates the final tree. The distances between the trees were computed using the Robinson-Foulds metric as implemented in Phylo-rs

Visualizing phylogenetic tree space

Phylogenetic tree spaces are often complex with many local optima, which confounds the phylogenetic inference [25, 33]. A standard approach to searching the tree space for an optimal phylogeny is to sample the tree space using multiple Markov chain Monte Carlo (MCMC) Bayesian analyses [25, 33], resulting in several samples of the tree space. The samples produced by each analysis can then be visualized by computing all pairwise distances between the sampled trees and embedding them into a 2- or 3-dimensional Euclidean space. A single MCMC analysis can produce upwards of 10000 trees, making the computation of pairwise distances infeasible in large-scale studies involving hundreds of taxa. Phylo-rs makes the computation of all pairwise distances feasible even on large datasets with thousands of taxa and tens of thousands of sampled trees due to its innate speed and in-built multi-threading.

We tested the visualization of the tree space explored in multiple MCMC runs using an analysis conducted to assess the emergence and spread of highly pathogenic avian influenza (HPAI) H5N1 viruses in dairy cattle in the US [41, 42]. Ten independent MCMC runs were conducted with BEAST v1.10.4 [33] on a set of 587 influenza A virus hemagglutinin H5N1 clade 2.3.4.4b sequences sampled from dairy cattle, poultry, peridomestic mammals, and wild birds. Each run consisted of a single Markov chain lasting 50 million generations, sampled every 5000 steps. This resulted in 10001 sampled trees in each run and 100010 trees in total. We computed all pairwise Robinson-Foulds metrics between the sampled trees using Phylo-rs on a workstation with an Intel(R) Xeon(R) w7-2475X 4.8GHz CPU running Ubuntu 20.04.3 LTS. The computation was conducted with 40 threads, taking 32 h to calculate the distances for approximately 5 billion tree pairs.
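The figure of approximately five billion pairs follows directly from the sample sizes, assuming that "all pairwise" refers to unordered pairs of distinct trees:

\[
\binom{100010}{2} = \frac{100010 \times 100009}{2} = 5{,}000{,}950{,}045 \approx 5 \times 10^{9}.
\]

Over 32 h (115,200 s), this corresponds to a throughput of roughly \(4.3 \times 10^{4}\) tree pairs per second in aggregate, or about \(1.1 \times 10^{3}\) pairs per second per thread if the load is spread evenly over the 40 threads.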

To simplify visualization, we omitted four runs that did not converge [41] and removed the first 20% of trees as the burn-in from the remaining six runs. We then embedded the distances between the remaining 48,000 trees into a 2-dimensional space using UMAP (Fig. 5). Each independent MCMC run formed a continuous line in the resulting embedding. All runs except for run 10 appear to have traversed a similar subspace of trees, while run 10 clusters separately from the other runs. For downstream analysis, this visualization allowed us to discard run 10 and combine the other log files to improve the effective sample size of the analysis, as it indicated that the remaining independent runs converged on the same distribution.

Such a visualization approach can be a powerful aid in understanding the phylogenetic tree space explored by Bayesian or maximum likelihood tree inference software [43, 44]. However, due to potential distortions resulting from dimensionality reduction, the visualization should be used in conjunction with quantitative approaches, such as assessment of effective sample size (ESS), e.g., to assess the convergence of different MCMC chains accurately.

Conclusions and discussion

Phylo-rs is a general-purpose phylogenetic analysis library written in Rust. By leveraging the Rust programming language’s memory-safety features and speed, Phylo-rs offers a variety of advanced phylogenetic algorithms and functionality. Phylo-rs fosters the dissemination of complex software for phylogenetic analysis, bridging the gap between theoretical advancement and practical implementation. Phylo-rs is available under an open-source license on GitHub at https://github.com/sriram98v/phylo-rs, with documentation at https://docs.rs/phylo/latest/phylo/.

Support for PhyloXML and PhyloJSON file formats will be included in the future. Further, tree simulations under the Birth-Death and Coalescent evolutionary models will be added in the near future. Phylo-rs will extend bindings to other languages, such as R and Python, and implement tree traits on highly memory-efficient structures provided by libraries such as ts-kit [45].

Availability and requirements

Project name: Phylo-rs

Project home page: https://crates.io/crates/phylo

Operating system(s): Platform independent

Programming language: Rust

Other requirements: Rust 1.85.0

License: MIT License

Any restrictions to use by non-academics: None

Data availability

The trees used in the phylogenetic diversity analysis are available at https://github.com/sriram98v/phylo-rs/tree/main/examples/phylogenetic-diversity, and the tree files used in the MCMC visualization analysis are available from the authors at https://doi.org/10.5281/zenodo.15213504.

Abbreviations

WASM: WebAssembly

SIMD: Single instruction multiple data

LCA: Least common ancestor

NNI: Nearest neighbor interchange

PD: Phylogenetic diversity

IAV: Influenza A virus

HA: Hemagglutinin

GTR: General time-reversible

MCMC: Markov chain Monte Carlo

HPAI: Highly pathogenic avian influenza

ESS: Effective sample size

References

  1. Bai S-S, Zhang H-B, Jing H, et al. Identification of genetic locus with resistance to take-all in the wheat-psathyrostachys huashanica keng introgression line h148. J Integr Agric. 2021;20(12):3101–13.

  2. Pipins S, Baillie JE, Bowmer A, et al. Advancing edge zones to identify spatial conservation priorities of tetrapod evolutionary history. Nat Commun. 2024;15(1):7672.

  3. Li L, Xie W, Zhan L, et al. Resolving tumor evolution: a phylogenetic approach. J Nat Cancer Center. 2024;4(2):97–106.

  4. Featherstone LA, Zhang JM, Vaughan TG, et al. Epidemiological inference from pathogen genomes: a review of phylodynamic models and applications. Virus Evol. 2022;8(1):045.

  5. Modi A, Vai S, Caramelli D, et al. The illumina sequencing protocol and the novaseq 6000 system. In: Bacterial Pangenomics: Methods and Protocols, 2021;pp. 15–42. Springer, Berlin

  6. Wang M, Fu A, Hu B, et al. Nanopore targeted sequencing for the accurate and comprehensive detection of sars-cov-2 and other respiratory viruses. Small. 2020;16(32):2002169.

  7. Wang Y, Zhao Y, Bollas A, et al. Nanopore sequencing technology, bioinformatics and applications. Nat Biotechnol. 2021;39(11):1348–65.

  8. Moreno MA, Holder MT, Sukumaran J. Dendropy 5: a mature python library for phylogenetic computing. 2024; arXiv:2405.14120

  9. Moshiri N. Treeswift: a massively scalable python tree package. SoftwareX. 2020;11:100436.

  10. Revell LJ. phytools 2.0: an updated r ecosystem for phylogenetic comparative methods (and other things). PeerJ. 2024;12:16505.

  11. Paradis E, Schliep K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in r. Bioinformatics. 2019;35(3):526–8.

  12. Czech L, Barbera P, Stamatakis A. Genesis and gappa: processing, analyzing and visualizing phylogenetic (placement) data. Bioinformatics. 2020;36(10):3263–5.

  13. Moshiri N. Compacttree: a lightweight header-only c++ library and python wrapper for ultra-large phylogenetics. Gigabyte 2025, (2025)

  14. Lemoine F, Gascuel O. Gotree/goalign: toolkit and go API to facilitate the development of phylogenetic workflows. NAR Gen Bioinform. 2021;3(3):075.

  15. Perkel JM. Why scientists are turning to rust. Nature. 2020;588:185.

  16. Fulton KR, Chan A, Votipka D, et al. Benefits and drawbacks of adopting a secure programming language: Rust as a case study. In: Seventeenth Symposium on Usable Privacy and Security (SOUPS 2021), 2021;pp. 597–616

  17. Li C, Jiao J. Llvm framework: Research and applications. In: 2023 19th International conference on natural computation, fuzzy systems and knowledge discovery (ICNC-FSKD). IEEE, 2023;pp. 1–6

  18. Klabnik S, Nichols C. The rust programming language. No Starch Press, (2023)

  19. Haas A, Rossberg A, Schuff DL, et al. Bringing the web up to speed with webassembly. In: Proceedings of the 38th ACM SIGPLAN Conference on programming language design and implementation. PLDI 2017, pp. 185–200. Association for Computing Machinery, New York, NY, USA 2017. https://doi.org/10.1145/3062341.3062363

  20. Yamada K, Chen Z-Z, Wang L. Improved practical algorithms for rooted subtree prune and regraft (rspr) distance and hybridization number. J Comput Biol. 2020;27(9):1422–32.

  21. Cardona G, Mir A, Rosselló F, et al. Cophenetic metrics for phylogenetic trees, after sokal and rohlf. BMC Bioinform. 2013;14:1–13.

  22. Moon J, Eulenstein O. The cluster affinity distance for phylogenies. In: Bioinformatics research and applications: 15th International Symposium, ISBRA 2019, Barcelona, Spain, June 3–6, 2019, Proceedings 15, 2019;pp. 52–64. Springer

  23. Górecki P, Markin A, Eulenstein O. Cophenetic distances: A near-linear time algorithmic framework. In: International Computing and Combinatorics Conference, 2018;pp. 168–179. Springer

  24. Kozlov AM, Darriba D, Flouri T, et al. Raxml-ng: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics. 2019;35(21):4453–5.

  25. Minh BQ, Schmidt HA, Chernomor O, et al. Iq-tree 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37(5):1530–4.

  26. Kelk S, Linz S, Meuwese R. Deep kernelization for the tree bisection and reconnection (TBR) distance in phylogenetics. J Comput Syst Sci. 2024;142:103519.

  27. Collienne L, Gavryushkin A. Computing nearest neighbour interchange distances between ranked phylogenetic trees. J Math Biol. 2021;82(1):8.

  28. Felsenstein J. Inferring phylogenies. Sinauer Associates; 2004.

  29. Gao Y, Liu Y, Ma Y, et al. abPOA: an SIMD-based C library for fast partial order alignment using adaptive band. Bioinformatics. 2021;37(15):2209–11.

  30. Gangavarapu K, Ji X, Baele G, et al. Many-core algorithms for high-dimensional gradients on phylogenetic trees. Bioinformatics. 2024;40(2):030.

  31. Kramer A, Turakhia Y, Corbett-Detig R. Shusher: private browser-based placement of sensitive genome samples on phylogenetic trees. J Open Source Softw. 2021;6(66):3677.

  32. Aksamentov I, Roemer C, Hodcroft EB, et al. Nextclade: clade assignment, mutation calling and quality control for viral genomes. J Open Source Softw. 2021;6(67):3773.

  33. Suchard MA, Lemey P, Baele G, et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 2018;4(1):016.

  34. Chao A, Chiu C-H, Jost L. Phylogenetic diversity measures and their decomposition: a framework based on hill numbers. Biodiver Conserv Phylogenet Systemat. 2016;14:141–72.

  35. Anderson TK, Chang J, Arendsee ZW, et al. Swine influenza A viruses and the tangled relationship with humans. Cold Spring Harb Perspect Med. 2021;11(3):038737.

  36. Neveau MN, Zeller MA, Kaplan BS, et al. Genetic and antigenic characterization of an expanding H3 influenza A virus clade in US swine visualized by Nextstrain. mSphere. 2022;7(3):00994–21.

  37. Markin A, Wagle S, Grover S, et al. PARNAS: objectively selecting the most representative taxa on a phylogeny. Syst Biol. 2023;72(5):1052–63.

  38. Chang J, Anderson TK, Zeller MA, et al. octoFLU: automated classification for the evolutionary origin of influenza A virus gene sequences detected in US swine. Microbiol Resource Announc. 2019;8(32):10–1128.

  39. Barba-Montoya J, Tao Q, Kumar S. Using a GTR+Γ substitution model for dating sequence divergence when stationarity and time-reversibility assumptions are violated. Bioinformatics. 2020;36(Supplement-2):884–94.

  40. Markin A, Macken CA, Baker AL, et al. Revealing reassortment in influenza A viruses with TreeSort. bioRxiv, 2024;2024–11.

  41. Nguyen T-Q, Hutter C, Markin A, et al. Emergence and interstate spread of highly pathogenic avian influenza A(H5N1) in dairy cattle in the United States. Science. 2025;388:eadq0900. https://doi.org/10.1126/science.adq0900

  42. Anderson T, Hutter CR, Markin A, Nguyen T. Flu-crew/dairy-cattle-hpai-2024: Data and code from: emergence and interstate spread of highly pathogenic avian influenza A(H5N1) in Dairy Cattle in the United States. https://doi.org/10.5281/zenodo.15213504

  43. Khodaei M, Owen M, Beerli P. Geodesics to characterize the phylogenetic landscape. PLoS ONE. 2023;18(6):0287350.

  44. Wilgenbusch JC, Huang W, Gallivan KA. Visualizing phylogenetic tree landscapes. BMC Bioinform. 2017;18:1–12.

  45. Kelleher J, Thornton KR, Ashander J, et al. Efficient pedigree recording for fast population genetics simulation. PLoS Comput Biol. 2018;14(11):1006581.

Acknowledgements

We are grateful for the comments on the manuscript provided by Dr. Paweł Górecki, Dr. Geng Ding, and Paige Falor and code reviews by Sanket Wagle.

Funding

This work was supported in part by the United States Department of Agriculture (USDA), Agricultural Research Service (ARS project numbers 5030-32000-231-000-D, 5030-32000-231-111-I, 3022-32000-018-017-S, 5030-32000-231-095-S, and 5030-32000-231-103-A) and with federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services (Contract No. 75N93021C00015). The funding sources had no role in study design, data collection, and interpretation, or the decision to submit the work for publication. Mention of trade names or commercial products in this article is solely to provide specific information and does not imply recommendation or endorsement by the USDA. USDA is an equal opportunity provider and employer.

Author information

Authors and Affiliations

  1. Department of Computer Science, Iowa State University, Ames, IA, 50011, USA

    Sriram Vijendran & Oliver Eulenstein

  2. National Animal Disease Center, Agricultural Research Service, United States Department of Agriculture, Ames, IA, 50010, USA

    Tavis Anderson & Alexey Markin

Authors
  1. Sriram Vijendran
  2. Tavis Anderson
  3. Alexey Markin
  4. Oliver Eulenstein

Contributions

Conceptualization, S.V.; methodology, S.V.; software, S.V.; validation, S.V., A.M.; writing, review, and editing, S.V., A.M., T.A., O.E. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Sriram Vijendran or Oliver Eulenstein.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Vijendran, S., Anderson, T., Markin, A. et al. Phylo-rs: an extensible phylogenetic analysis library in rust. BMC Bioinformatics 26, 197 (2025). https://doi.org/10.1186/s12859-025-06234-w

  • DOI: https://doi.org/10.1186/s12859-025-06234-w

BMC Bioinformatics

ISSN: 1471-2105
