Roary: rapid large-scale prokaryote pan genome analysis

Bioinformatics. 2015 Nov 15;31(22):3691-3. doi: 10.1093/bioinformatics/btv421. Epub 2015 Jul 20.

Andrew J Page ¹, Carla A Cummins ¹, Martin Hunt ¹, Vanessa K Wong ², Sandra Reuter ³, Matthew T G Holden ⁴, Maria Fookes ¹, Daniel Falush ⁵, Jacqueline A Keane ¹, Julian Parkhill ¹

Affiliations

¹ Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge.
² Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge; Department of Medicine, University of Cambridge, Cambridge.
³ Department of Medicine, University of Cambridge, Cambridge.
⁴ School of Medicine, University of St. Andrews, North Haugh, St Andrews.
⁵ College of Medicine, Swansea University, Swansea, UK.

PMID: 26198102
PMCID: PMC4817141
DOI: 10.1093/bioinformatics/btv421

Andrew J Page et al. Bioinformatics. 2015.

Abstract

A typical prokaryote population sequencing study can now consist of hundreds or thousands of isolates. Interrogating these datasets can provide detailed insights into the genetic structure of prokaryotic genomes. We introduce Roary, a tool that rapidly builds large-scale pan genomes, identifying the core and accessory genes. Roary makes construction of the pan genome of thousands of prokaryote samples possible on a standard desktop without compromising on the accuracy of results. On a single CPU, Roary can produce a pan genome consisting of 1000 isolates in 4.5 hours using 13 GB of RAM, with further speedups possible using multiple processors.
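
To make the core/accessory distinction concrete, the following is a minimal sketch in Python. It is not Roary's implementation (Roary is written in Perl and its clustering pipeline is not described in this abstract). The function name `split_pan_genome`, the `core_fraction` parameter, and the toy gene and isolate names are all hypothetical, chosen only for illustration. The sketch assumes genes have already been grouped into clusters and simply labels a cluster "core" when it occurs in at least a chosen fraction of isolates.

```python
from typing import Dict, Set, Tuple


def split_pan_genome(
    gene_clusters: Dict[str, Set[str]],
    n_isolates: int,
    core_fraction: float = 1.0,
) -> Tuple[Set[str], Set[str]]:
    """Partition gene clusters into core and accessory sets.

    gene_clusters maps a cluster name to the set of isolates carrying it.
    A cluster counts as core when it is present in at least
    core_fraction of the isolates (hypothetical threshold, default 100%).
    """
    core: Set[str] = set()
    accessory: Set[str] = set()
    for gene, isolates in gene_clusters.items():
        if len(isolates) >= core_fraction * n_isolates:
            core.add(gene)
        else:
            accessory.add(gene)
    return core, accessory


# Toy presence/absence data for three made-up isolates.
clusters = {
    "gyrA": {"iso1", "iso2", "iso3"},
    "blaTEM": {"iso2"},
    "recA": {"iso1", "iso2", "iso3"},
    "tetA": {"iso1", "iso3"},
}

core, accessory = split_pan_genome(clusters, n_isolates=3)
print(sorted(core))       # ['gyrA', 'recA']
print(sorted(accessory))  # ['blaTEM', 'tetA']
```

The pan genome in this toy case is the union of both sets (four clusters). Lowering `core_fraction` below 1.0 would let clusters missing from a few isolates still count as core.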

Availability and implementation: Roary is implemented in Perl and is freely available under an open source GPLv3 license from http://sanger-pathogens.github.io/Roary

Contact: roary@sanger.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. Effect of dataset size on the wall time of multiple applications. Only analyses that completed within 2 days and 60 GB of RAM are shown.
