Release information: SILVA 102

Version 102 of the SSU and LSU databases released in February 2010

	SSU		LSU
Parc	1,246,462	(+ 250,715)	180,344	(+ 19,327)
Ref	460,783	(+ 50,876)	16,966	(+ 2,540)
HSM	116,389	Human skin microbiome

Information about former releases can be found here.

Sequence Retrieval and Processing

	SSU	LSU
candidates (total)	2,494,470	629,496
RNAmmer	39,266	5,876
< 300 bases	765,570	366,540
> 2% ambiguities	15,574	4,245
> 2% homopolymers	35,621	9,979
> 5% vector contamination	18,899	14,465
rejected by SINA	94,707	19,941
sequence or alignment quality or bp score < 30	76,156	21,796

Sequences were retrieved from EMBL Release 102 (Dec 09) using a complex keyword search procedure and a sequence-based search with RNAmmer profiles. Cross-checks with RDP II indicated no loss of primary data. Most of the sequences rejected by the new SINA aligner were either classified as non-rRNA sequences by manual inspection, or their remaining aligned fragments were shorter than 300 bases.
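The length, ambiguity and homopolymer thresholds from the table above can be sketched as a small filter. This is only an illustration, not the SILVA pipeline itself: the function names are made up, and the definition of a homopolymer as a run of at least five identical bases is an assumption, since this page does not state how the percentage is computed. Vector contamination screening is left out because it requires a reference database.

```python
import re

MIN_LENGTH = 300            # sequences below 300 bases are rejected
MAX_AMBIGUITY_PCT = 2.0     # more than 2% ambiguous bases is rejected
MAX_HOMOPOLYMER_PCT = 2.0   # more than 2% homopolymer bases is rejected
MIN_HOMOPOLYMER_RUN = 5     # ASSUMPTION: shortest run counted as a homopolymer

# Matches any base repeated MIN_HOMOPOLYMER_RUN or more times in a row.
_RUN = re.compile(r"(.)\1{%d,}" % (MIN_HOMOPOLYMER_RUN - 1))


def ambiguity_pct(seq):
    """Percentage of bases that are not A, C, G, T or U."""
    seq = seq.upper()
    if not seq:
        return 0.0
    ambiguous = sum(1 for base in seq if base not in "ACGTU")
    return 100.0 * ambiguous / len(seq)


def homopolymer_pct(seq):
    """Percentage of bases that lie inside homopolymer runs."""
    seq = seq.upper()
    if not seq:
        return 0.0
    in_runs = sum(len(m.group(0)) for m in _RUN.finditer(seq))
    return 100.0 * in_runs / len(seq)


def rejection_reason(seq):
    """Return the first failed criterion, or None if the sequence passes."""
    if len(seq) < MIN_LENGTH:
        return "< 300 bases"
    if ambiguity_pct(seq) > MAX_AMBIGUITY_PCT:
        return "> 2% ambiguities"
    if homopolymer_pct(seq) > MAX_HOMOPOLYMER_PCT:
        return "> 2% homopolymers"
    return None
```

The checks run in the order of the table rows, so each sequence is reported under the first criterion it fails.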

Basic statistics for the SILVA databases, release 102

	SSU Parc	SSU Ref	LSU Parc	LSU Ref
Version	102	102	102	102
Total	1,246,462	460,783	180,344	16,966
Bacteria	1,043,182	391,167	15,272	9,984
Archaea	55,127	19,260	270	258
Eukaryota	134,351	50,356	164,802	6,724
Cultured #	28,891	23,197	14,156	3,602
Typestrains #	10,824	10,747	382	368

# according to straininfo.net and the Living Tree Project

Growth of the ribosomal RNA databases since 1992

Blue: RDP II, orange: SILVA SSUParc based on the EMBL release 102

Length Distribution (SSU & LSU)

Red: raw data, black: the quality checked & aligned SSUParc sequences

Red: raw data, black: the quality checked & aligned LSUParc sequences

Sequence quality in relation to length in SSUParc 90

New in Release 102

Webpage
- The rewrite of the Browser and Search is ongoing; see http://beta.arb-silva.de
Taxonomy
- The classifications in the reference guide trees (SSURef/LSURef) were completely revised and improved, taking into account information from Bergey's Trust, Euzéby's LPSN, specialized databases and the literature. For details, see the FAQ.
- Taxonomic paths in ARB and export files have been normalized. All taxonomic levels are now separated by a semicolon (;).
ARB files
- The more than 116,000 full-length sequences of the Human skin microbiome project have been separated from the SSURef dataset.
Pipeline
- several improvements
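
Because all taxonomic levels are now separated by a semicolon, a path from an export file can be split with a plain string operation. The following is a minimal sketch; the function names and the example paths are illustrative and not taken from a SILVA file.

```python
def split_tax_path(path):
    """Split a semicolon-separated taxonomic path into its levels.

    Empty levels (for example from a trailing ';') are dropped.
    """
    return [level.strip() for level in path.split(";") if level.strip()]


def is_within(path, ancestor):
    """True if `path` lies at or below the taxonomic path `ancestor`."""
    levels = split_tax_path(path)
    prefix = split_tax_path(ancestor)
    return levels[:len(prefix)] == prefix


# Hypothetical example path, for illustration only.
example = "Bacteria;Proteobacteria;Gammaproteobacteria;"
print(split_tax_path(example))
print(is_within(example, "Bacteria;Proteobacteria"))
```

Comparing whole levels rather than raw string prefixes avoids false matches between names that merely start with the same characters.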

Known Bugs

SSUParc: 121 sequences have no Pintail values

Small Subunit rRNA Database

SSU Parc (Web database & ARB file) contains all aligned sequences with an alignment quality value, a basepair score or a sequence quality equal to or above 30. All sequences with a Pintail value < 50 or an alignment quality value < 75 have been assigned to color group 1 in ARB (red). All Living Tree Project typestrains have been assigned to color group 2 in ARB (light blue). No further sequence curation has been applied.
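
The two color group rules can be written down as a short sketch. It is not ARB code, and the function and parameter names are made up. Two assumptions are marked in the comments: this page does not say which group applies when both rules match, so the sketch reports every matching group, and a missing Pintail value (see Known Bugs) is treated as not triggering the Pintail rule.

```python
PINTAIL_MIN = 50         # Pintail value < 50 -> color group 1 (red)
ALIGN_QUALITY_MIN = 75   # alignment quality < 75 -> color group 1 (red)


def matching_color_groups(pintail, align_quality, is_ltp_typestrain):
    """Return the ARB color groups whose rule matches this sequence.

    ASSUMPTION: all matching groups are returned, because the precedence
    between group 1 and group 2 is not stated in the release notes.
    ASSUMPTION: pintail=None (no Pintail value) does not trigger group 1.
    """
    groups = []
    low_pintail = pintail is not None and pintail < PINTAIL_MIN
    if low_pintail or align_quality < ALIGN_QUALITY_MIN:
        groups.append(1)   # needs attention (red)
    if is_ltp_typestrain:
        groups.append(2)   # Living Tree Project typestrain (light blue)
    return groups
```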

To create SSU Ref (ARB file), all sequences below 1,200 bases for Bacteria and Eukarya, below 900 bases for Archaea, or with an alignment quality value below 50 have additionally been removed from SSUParc. A guide tree was calculated by adding all sequences to the tree_1200 of SILVA release 100, which is based on tree_1000 from the ssujan04 release. For tree calculation, highly variable positions were removed for Bacteria, Archaea and Eukarya with the respective position variability filters. The trees for Bacteria and Archaea have been organized mainly based on Bergey's taxonomic outline, LPSN and the literature. After manual inspection of the tree, some sequences have been removed due to long branches. Position variability filters for Bacteria, Archaea and Eukarya have been calculated and added to the dataset. Please take into account that sequences with an alignment quality value below 75 also need further attention. All sequences with a Pintail value < 50 or an alignment quality value < 75 have been assigned to color group 1 in ARB (red). All Living Tree Project typestrains have been assigned to color group 2 in ARB (light blue). Before using the alignment for extensive phylogenetic reconstructions, all sequences should be checked carefully.
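
The automatic part of the SSU Ref selection (length per domain and alignment quality) can be sketched as follows. The names are illustrative, the domain labels follow the statistics table on this page, and the manual removal of long-branch sequences is not covered.

```python
# Minimum sequence length in bases, per domain, for SSU Ref.
MIN_LENGTH_BY_DOMAIN = {
    "Bacteria": 1200,
    "Eukaryota": 1200,
    "Archaea": 900,
}
MIN_ALIGN_QUALITY = 50   # sequences with alignment quality below 50 are removed


def keep_for_ssu_ref(domain, length, align_quality):
    """True if a SSUParc sequence passes the automatic SSU Ref criteria."""
    if domain not in MIN_LENGTH_BY_DOMAIN:
        # ASSUMPTION: the release notes do not say how sequences without
        # a domain assignment are handled, so the sketch refuses to guess.
        raise ValueError("unknown domain: %r" % (domain,))
    long_enough = length >= MIN_LENGTH_BY_DOMAIN[domain]
    return long_enough and align_quality >= MIN_ALIGN_QUALITY
```

A sequence that passes this check but has an alignment quality value below 75 still ends up in color group 1 and needs further attention.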

Large Subunit rRNA Databases

LSU Parc (Web database & ARB file) contains all aligned sequences with an alignment quality value, a basepair score or a sequence quality equal to or above 30. No further curation has been applied; only a guide tree has been added by the most parsimonious addition of around 22,000 sequences to the LSUParc guide tree from SILVA 100.

Additionally, for LSU Ref (ARB file) all sequences below 1,900 bases have been removed, a guide tree was calculated based on the tree_1900 of SILVA release 100, and basic filters have been added. The trees for Bacteria and Archaea have been organized mainly based on Bergey's taxonomic outline, LPSN and the literature. All sequences with an alignment quality value < 75 have been assigned to color group 1 in ARB (red). Please take into account that the SEED consisted of only around 2,800 sequences and there is no guarantee that well-aligned close relatives have always been available. We recommend additional manual curation before using it for extensive phylogenetic reconstructions.

Alternative Names

All names of validly described species in the SSU and LSU databases have been checked for changes (basonyms, synonyms and orthographical corrections) against the DSMZ "Nomenclature up to date" catalogue (http://www.dsmz.de/download/bactnom/names.txt) released in November 2009.

Alternative Taxonomies

Besides the EMBL Taxonomy, alternative classifications taken from the greengenes and the RDP II projects are also available in SILVA. On the webpage, the user can switch between them using the Taxonomy menu. In ARB, the different taxonomies can be found in the fields embl_tax, gg_tax and rdp_tax for EMBL, greengenes and RDP II, respectively. The corresponding *_name fields show the respective sequence name for each entry. Please take into account that both greengenes and RDP II provide only a subset of the sequences hosted by SILVA. If no taxonomic mapping to greengenes or RDP II was available, the sequences are assigned as "unclassified" and the respective sequence name equals the EMBL name. For the LSU datasets, there are no alternative taxonomies available.

Cultured and Type strains

Type strain and cultured information has been added to the field strain and is indicated by [T] and [C]. Several sources have been used to compile the information: the Straininfo.net bioportal, the Ribosomal Database Project II (10.17) and the Living Tree Project, which provides manually curated information compliant with Euzéby's "List of Prokaryotic names with Standing in Nomenclature".

Strain Identifiers

Source	Information	Tag	Datasets
EMBL	Typestrains	(t)	SSU, LSU
EMBL	Genomes	e[G]	SSU, LSU
Straininfo.net	Cultured	s[C]	SSU, LSU
Straininfo.net	Typestrains	s[T]	SSU, LSU
Living Tree Project	Typestrains (curated)	l[T]	SSU
RDP II	Typestrains	r[T]	SSU

The identifiers can be used for data retrieval by searching in the strain field; see the FAQ.
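
As an illustration of searching the strain field for these identifiers, the sketch below looks up the tags from the table in a strain string. The exact layout of the strain field is not described on this page, so the example values are hypothetical and the tags are found by simple substring search.

```python
# Tag -> (source, information), copied from the Strain Identifiers table.
STRAIN_TAGS = {
    "(t)": ("EMBL", "Typestrains"),
    "e[G]": ("EMBL", "Genomes"),
    "s[C]": ("Straininfo.net", "Cultured"),
    "s[T]": ("Straininfo.net", "Typestrains"),
    "l[T]": ("Living Tree Project", "Typestrains (curated)"),
    "r[T]": ("RDP II", "Typestrains"),
}


def tags_in(strain_field):
    """Return the known tags present in a strain field, in table order."""
    return [tag for tag in STRAIN_TAGS if tag in strain_field]


def is_typestrain(strain_field):
    """True if any source marks the entry as a type strain."""
    return any(
        STRAIN_TAGS[tag][1].startswith("Typestrains")
        for tag in tags_in(strain_field)
    )


# Hypothetical strain field value, for illustration only.
print(tags_in("DSM 30083 s[T] l[T]"))
```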

Genome

Genome information is provided by EMBL.

Detailed information about the corresponding identifiers and target databases can be found in the Strain Identifiers table above.

The identifiers can be used for data retrieval by searching in the strain field; see the FAQ.

Quality Values

The length and colours of the bars give a first indication of the sequence and alignment quality as well as the risk of sequence anomalies based on Pintail analysis. After downloading the sequences as an ARB file, sequences that need attention can be selected by searching for low quality (alignment, sequence) or Pintail values in the corresponding ARB database fields. A full description of the colour code and all database fields available in the ARB files can be found in the FAQ section. Given the rich set of sequence-associated information that comes with every SILVA sequence, user-designed sub-databases can be easily generated.

SEED

All rRNA sequences have been aligned based on a completely manually re-checked SEED alignment of 56,354 rRNA sequences for SSU and 2,868 rRNA sequences for LSU. The SSU alignment is based on the official ssu_jan04 release of the ARB Project. The SSU SEED alignment has been considerably improved for Archaea by manual addition of more than 1,000 sequences by Katrin Knittel. All SSU Eukaryotic sequences (18S) have been cross-checked by Wolfgang Ludwig before their addition to the SEED. Most of the bacterial sequences have also undergone a curation process carried out by the SILVA Team. We would rate our SSU SEED alignment for all Bacteria and Archaea as good and for Eukarya as reasonable.

The LSU alignment was provided by Wolfgang Ludwig and had not been released before SILVA. It was cross-checked by the SILVA Team before being used as the SEED for automatic alignment. Bacteria and Archaea can be rated as good. The Eukaryotes definitely need further attention.

RNAmmer

RNAmmer is a computational predictor for the major rRNA species (SSU, LSU) from all three domains of life. The program uses hidden Markov models trained on data from the European ribosomal RNA database project. SILVA runs the profiles of RNAmmer on all sequence entries of the EMBL archive to complement the existing predictions. All predictions are marked with RNAmmer in the ann_src_field. More information about RNAmmer can be found in the paper.

Update Files

Update files are no longer provided. Because the SILVA pipeline is constantly being improved, we recommend always taking the latest version of SILVA and updating it with your personal sequences. The difference between SILVA and your own database can be easily determined using the ARB Merge Tool.
