2017 Jul 15;33(14):2202-2204.
doi: 10.1093/bioinformatics/btx153

GenomeScope: fast reference-free genome profiling from short reads

Gregory W Vurture et al. Bioinformatics. 2017.

Abstract

Summary: GenomeScope is an open-source web tool to rapidly estimate the overall characteristics of a genome, including genome size, heterozygosity rate and repeat content from unprocessed short reads. These features are essential for studying genome evolution, and help to choose parameters for downstream analysis. We demonstrate its accuracy on 324 simulated and 16 real datasets with a wide range in genome sizes, heterozygosity levels and error rates.

Availability and implementation: http://genomescope.org, https://github.com/schatzlab/genomescope.git

Contact: mschatz@jhu.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.
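
As a rough illustration of the kind of input the summary above refers to, the sketch below builds a k-mer profile (a histogram of how many distinct k-mers occur at each coverage) from raw reads and derives a naive genome size estimate from it. This is not GenomeScope's model: it ignores sequencing errors, heterozygosity, repeats and reverse complements, and the function names `kmer_profile` and `naive_genome_size` are hypothetical, chosen only for this example.

```python
from collections import Counter


def kmer_profile(reads, k=21):
    """Return {coverage: number of distinct k-mers seen that many times}."""
    counts = Counter()
    for read in reads:
        # Slide a window of width k across the read and count every k-mer.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    # Histogram of the per-k-mer counts: this is the k-mer profile.
    return Counter(counts.values())


def naive_genome_size(profile, min_coverage=1):
    """Estimate genome size as total k-mer instances / main peak coverage.

    Assumption for this sketch: the most populated coverage bin at or above
    min_coverage is the single-copy (homozygous) peak.
    Returns (estimated_size, peak_coverage).
    """
    kept = {c: n for c, n in profile.items() if c >= min_coverage}
    peak = max(kept, key=lambda c: kept[c])
    total_kmers = sum(c * n for c, n in kept.items())
    return total_kmers / peak, peak
```

On real data, low-coverage bins are dominated by sequencing errors, so `min_coverage` would typically be raised above 1; the threshold to use is dataset dependent and is left as a parameter here.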

Figures

Fig. 1
(A) GenomeScope heterozygosity, total genome size, and unique genome size estimates: (left) twenty-seven simulated A. thaliana datasets with varying amounts of heterozygosity, sequencing error or read duplication; (middle) ten synthetic mixtures of real E. coli sequencing data; and (right) six genuine plant and animal sequencing datasets: L. calcarifer (Asian seabass), D. melanogaster (fruit fly), M. undulatus (budgerigar), A. thaliana Col-Cvi F1 (thale cress), P. bretschneideri (pear), C. gigas (Pacific oyster). Also displayed are the true simulated values (Simulated), the results from a mapping and variant calling pipeline (Mapping), and a whole genome alignment (DnaDiff) where available. (B) GenomeScope k-mer profile plot of the A. thaliana dataset showing the fit of the GenomeScope model (black) to the observed k-mer frequencies (blue). The unusual peak of very high frequency k-mers (∼10× coverage) was determined to be highly enriched for organelle sequences.
