
Release Information: SILVA 92

Version 92 of the SSU and LSU databases, released on 1 October 2007

        SSU                  LSU
Parc    504,295 (+42,472)    91,556 (+5867)
Ref     208,801 (+11,911)    7480 (+578)

Information about former releases can be found here.

Sequence Retrieval and Processing

                                      SSU          LSU
candidates                            1,108,767    432,598
< 300 bases                           477,916      304,710
> 2% ambiguities                      8677         2295
> 2% homopolymers                     20,330       4937
> 5% vector contamination             15,700       2362
rejected by SINA                      80,765       22,819
alignment quality and bpscore < 30    6825         4717

Sequences have been retrieved from EMBL Release 92 (September 2007) using a complex keyword search procedure. Cross checks with RDP II indicated no loss of primary data. For most of the sequences rejected by the new SINAligner, manual inspection showed that they were either not ribosomal RNA sequences or that the remaining aligned sequence fragments were below 300 bases.
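The pre-alignment thresholds listed in the table above can be sketched as a simple check. This is a minimal illustration only, not the SILVA pipeline: the `Candidate` record, its field names, and the assumption that the percentages are already computed for each sequence are all hypothetical. The SINA rejection and the alignment quality/bpscore cutoff are left out because they depend on the aligner itself.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the row labels of the retrieval table.
MIN_LENGTH = 300             # bases
MAX_AMBIGUITY_PCT = 2.0      # percent ambiguous bases
MAX_HOMOPOLYMER_PCT = 2.0    # percent homopolymers
MAX_VECTOR_PCT = 5.0         # percent vector contamination


@dataclass
class Candidate:
    """Hypothetical record; field names are illustrative, not SILVA's."""
    accession: str
    length: int
    ambiguity_pct: float
    homopolymer_pct: float
    vector_pct: float


def rejection_reasons(c: Candidate) -> List[str]:
    """Return every threshold the candidate fails (empty list means it passes)."""
    reasons = []
    if c.length < MIN_LENGTH:
        reasons.append("< 300 bases")
    if c.ambiguity_pct > MAX_AMBIGUITY_PCT:
        reasons.append("> 2% ambiguities")
    if c.homopolymer_pct > MAX_HOMOPOLYMER_PCT:
        reasons.append("> 2% homopolymers")
    if c.vector_pct > MAX_VECTOR_PCT:
        reasons.append("> 5% vector contamination")
    return reasons
```

Returning all failed thresholds rather than only the first is a design choice of this sketch; the table does not say whether a sequence is counted under one filter or several.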

Basic statistics for the SILVA databases, release 92

                 SSU Parc    SSU Ref    LSU Parc    LSU Ref
Version          92          92         92          92
Total            504,295     208,801    91,556      7480
Bacteria         407,652     168,131    7234        4119
Archaea          21,977      8249       157         150
Eukaryota        73,189      32,421     84,165      3215
Cultured #       19,738      16,055     7340        810
Typestrains #    11,523      10,249     3430        611

# according to straininfo.net

Growth of the ribosomal RNA databases since 1992

Blue: RDP II, orange: SILVA SSUParc based on the EMBL release 92

Length Distribution (SSU & LSU)

Red: raw data, black: the quality checked & aligned SSUParc sequences
Red: raw data, black: the quality checked & aligned LSUParc sequences

Sequence quality in relation to length in SSUParc 90

Known Bugs

  • Around 1000 SSU entries have no Pintail values

Future Developments

Similarity-based search and aligner functionality is planned for the end of 2007. The SEED and Ref databases require further extension and curation.

Small Subunit rRNA Database

SSU Parc (Web database & ARB file) contains all aligned sequences with an alignment quality value and a basepair score of 30 or above. All sequences with a Pintail value < 50 or an alignment quality value < 75 have been assigned to color group 1 in ARB (red). No further curation has been applied.
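The two rules in this paragraph (inclusion in SSU Parc and assignment to color group 1) can be written down as predicates. This is a hedged sketch: the function and parameter names are hypothetical, and the handling of entries without a Pintail value (see Known Bugs) is an assumption, since the release notes do not say how such entries are colored.

```python
from typing import Optional


def in_ssu_parc(align_quality: float, bp_score: float) -> bool:
    """Both the alignment quality and the basepair score must be 30 or above."""
    return align_quality >= 30 and bp_score >= 30


def in_color_group_1(pintail: Optional[float], align_quality: float) -> bool:
    """Flag sequences that need attention (shown red in ARB).

    Assumption: a missing Pintail value (None) does not trigger the flag
    by itself; only the alignment quality is evaluated in that case.
    """
    low_pintail = pintail is not None and pintail < 50
    return low_pintail or align_quality < 75
```

For example, `in_color_group_1(40, 90)` and `in_color_group_1(80, 60)` are both flagged, while `in_color_group_1(80, 90)` is not.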

To create SSU Ref (ARB file), all sequences below 1,200 bases for Bacteria and Eukarya, below 900 bases for Archaea, or with an alignment quality value below 50 have additionally been removed from SSUParc. A guide tree was calculated by adding all sequences to the tree_1200 of SILVA release 91, which is based on tree_1000 from the ssujan04 release. For tree calculation, highly variable positions were removed for Bacteria, Archaea, and Eukarya with the respective position variability filters. Phyla and most of the classes for Bacteria and Archaea have been organized according to Bergey's taxonomic outline. After manual inspection of the tree, around 190 sequences have been removed due to long branches. Position variability filters for Bacteria, Archaea, and Eukarya have been calculated and added to the dataset.

Please note that sequences with an alignment quality value below 75 also need further attention. All sequences with a Pintail value < 50 or an alignment quality value < 75 have been assigned to color group 1 in ARB (red). Before the alignment is used for extensive phylogenetic reconstructions, all sequences should be checked carefully.
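The length and quality cutoffs for SSU Ref can be summarized in a small lookup. This sketch covers only the automatic cutoffs stated above; the manual removal of long-branch sequences is not modeled, and the function name and domain labels used as dictionary keys are assumptions.

```python
# Minimum sequence length (bases) per domain for SSU Ref, as stated in the text.
SSU_REF_MIN_LENGTH = {
    "Bacteria": 1200,
    "Eukarya": 1200,
    "Archaea": 900,
}

SSU_REF_MIN_ALIGN_QUALITY = 50


def keep_in_ssu_ref(domain: str, length: int, align_quality: float) -> bool:
    """Return True if an SSU Parc sequence passes the SSU Ref cutoffs.

    Raises KeyError for a domain outside the three listed above.
    """
    long_enough = length >= SSU_REF_MIN_LENGTH[domain]
    return long_enough and align_quality >= SSU_REF_MIN_ALIGN_QUALITY
```

An archaeal sequence of 950 bases with quality 80 is kept, whereas a bacterial sequence of the same length is removed.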

Large Subunit rRNA Databases

LSU Parc (Web database & ARB file) contains all aligned sequences with an alignment quality value and a basepair score of 30 or above. No further curation has been applied; only a guide tree has been added by the most parsimonious addition of 5867 sequences to the LSUParc guide tree from SILVA 91.

Additionally, for LSU Ref (ARB file), all sequences below 1,900 bases have been removed, a guide tree was calculated based on the tree_1900 of SILVA release 91, and basic filters have been added. All sequences with an alignment quality value < 75 have been assigned to color group 1 in ARB (red). Please note that the SEED consisted of only around 2,800 sequences, so there is no guarantee that well-aligned close relatives were always available. We recommend additional manual curation before using it for extensive phylogenetic reconstructions.

Alternative Names

All names of validly described species in the SSU and LSU databases have been checked for changes (basonyms, synonyms and orthographical corrections) against the DSMZ "Nomenclature up to date" catalogue (http://www.dsmz.de/download/bactnom/names.txt) released in September 2007.

Alternative Taxonomies

Besides the EMBL Taxonomy, alternative classifications taken from the greengenes and the RDP II projects are now available in SILVA. On the webpage, the user can switch between them using the Taxonomy menu. In ARB, the different taxonomies can be found in the fields embl_tax, gg_tax and rdp_tax for EMBL, greengenes and RDP II, respectively. The corresponding *_name fields show the respective sequence name for each entry. Please note that both greengenes and RDP II provide only a subset of the sequences hosted by SILVA. If no taxonomic mapping to greengenes or RDP II is available for an entry, it is marked as "unclassified" in that taxonomy and the respective sequence name equals the EMBL name. For the LSU datasets, no alternative taxonomies are available.
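The fallback rule for missing mappings can be illustrated as follows. The field names embl_tax, gg_tax and rdp_tax come from the text; the expansions embl_name, gg_name and rdp_name of the *_name pattern, the dictionary representation of an entry, and the function name are assumptions made for this sketch.

```python
from typing import Dict, Tuple

TAX_FIELDS = {"EMBL": "embl_tax", "greengenes": "gg_tax", "RDP II": "rdp_tax"}
# Assumed expansion of the "*_name" pattern mentioned in the text.
NAME_FIELDS = {"EMBL": "embl_name", "greengenes": "gg_name", "RDP II": "rdp_name"}


def taxonomy_and_name(entry: Dict[str, str], source: str) -> Tuple[str, str]:
    """Return (taxonomy, sequence name) for one entry under the chosen source.

    If the source has no mapping for the entry, the taxonomy is
    "unclassified" and the name falls back to the EMBL name.
    """
    tax = entry.get(TAX_FIELDS[source])
    if not tax:
        return "unclassified", entry.get(NAME_FIELDS["EMBL"], "")
    return tax, entry.get(NAME_FIELDS[source], "")
```

With a made-up entry that carries only EMBL and RDP II data, asking for "greengenes" returns `("unclassified", <EMBL name>)`.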

Cultured and Type strains

Type strain information has been added to the field strain and is indicated by [T] or [t]. For the SSU datasets two mappings are now available, one based on the RDP II 9.54 [T] dataset and one provided by straininfo.net [t]. Furthermore, based on straininfo.net, a [c] was assigned for cultivated strains and a [g] for genomes. See the FAQs for how to search for them. LSU datasets contain only the straininfo.net information.
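Once the strain field has been exported as text, the markers can be picked out with a regular expression. This is a minimal sketch under the assumption that the markers appear literally as bracketed single letters somewhere in the field; the example strain value in the test is made up, and searching inside ARB itself is covered by the FAQs rather than by this code.

```python
import re
from typing import Set

# Marker meanings as described in the text.
MARKERS = {
    "T": "type strain (RDP II 9.54)",
    "t": "type strain (straininfo.net)",
    "c": "cultivated strain (straininfo.net)",
    "g": "genome (straininfo.net)",
}

_MARKER_RE = re.compile(r"\[([Ttcg])\]")


def strain_markers(strain_field: str) -> Set[str]:
    """Return the set of marker letters found in a strain field value."""
    return set(_MARKER_RE.findall(strain_field))


def is_type_strain(strain_field: str) -> bool:
    """True if either type strain mapping ([T] or [t]) marks the entry."""
    return bool(strain_markers(strain_field) & {"T", "t"})
```

The match is case-sensitive on purpose, because [T] and [t] refer to different sources.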

Quality Values

The length and colours of the bars give a first indication of the sequence and alignment quality as well as the risk of sequence anomalies based on Pintail analysis. After downloading the sequences as an ARB file, sequences that need attention can be selected by searching for low quality (alignment, sequence) or Pintail values in the corresponding ARB database fields. A full description of the colour code and all database fields available in the ARB files can be found in the FAQ section. Given the rich set of sequence-associated information that comes with every SILVA sequence, user-designed subdatabases can be generated easily.

SEED

All rRNA sequences have been aligned based on a completely manually re-checked SEED alignment of 51,601 rRNA sequences for SSU and 2,868 rRNA sequences for LSU. The SSU alignment is based on the official ssu_jan04 release of the ARB Project. The SSU SEED alignment has been considerably improved for Archaea by manual addition of more than 1,000 sequences by Katrin Knittel. All SSU Eukaryotic sequences (18S) have been cross-checked by Wolfgang Ludwig before their addition to the SEED. Most of the bacterial sequences have also undergone a curation process carried out by the SILVA Team. We would rate our SSU SEED alignment for all Bacteria and Archaea as good and for Eukarya as reasonable.

The LSU alignment was provided by Wolfgang Ludwig and has not been released before. It was cross-checked by the SILVA Team before being used as the SEED for automatic alignment. Bacteria and Archaea can be rated as good. The Eukaryotes definitely need further attention.

Update Cycle

For all four datasets, ARB change files are available in the download section. These files contain only sequences that are new or whose accession number has changed between SILVA release 91 and 92.


provided to you by

DSMZ Digital Diversity



© DSMZ 2025