BLAST stands for Basic Local Alignment Search Tool and was developed by Altschul et al. (1990). It is a very fast search algorithm that is used to separately search protein or DNA sequence databases. BLAST is best used for sequence similarity searching, rather than for motif searching.
A fairly complete online guide to BLAST searching can be found in the NCBI BLAST Help Manual. CGD has a separate help document for the BLAST results page, and documentation about WU-BLAST2 is also available.
BLAST searches offered by CGD allow users to compare any query sequence to Candida sequence datasets. To search other fungal sequences, use SGD's Fungal BLAST tool. To search other datasets, NCBI BLAST can be used.
First, enter the sequence that you would like to compare in Step 1, "Enter your query sequence."
The sequence can be entered directly in the box provided. (Alternatively, the sequence may be uploaded from a text file using the "Optional: upload a local sequence TEXT file" section near the bottom of the page.)
In Step 2, "Select one or more target genomes," select the organism dataset(s) against which your sequence will be compared.
More than one target dataset may be available for a given organism. For example, for C. albicans SC5314, we provide the option to search against Assembly 21 (haploid, chromosome-level genome assembly) or Assembly 19 (diploid, contig-level assembly).
In Step 3, "Select Target Sequence Dataset," choose the type of dataset against which your sequence will be compared. Options include: genomes (chromosomal or contig sequences), genes (ORFs plus any intronic sequence), coding sequences (ORFs, with intronic sequence removed), proteins (translations of ORF sequences), genomic sequence of non-coding features (including intronic sequence), and sequence of non-coding features with intronic sequences removed. The selection available for an individual organism will depend on the annotation available for the organism, and some types of dataset are not available for some of the organisms included.
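The difference between the "genes" dataset (ORFs plus intronic sequence) and the "coding sequences" dataset (introns removed) can be illustrated by splicing introns out of a gene sequence. This is a minimal sketch: the sequence and intron coordinates are made up, and the 0-based, half-open coordinate convention is an assumption of the example, not necessarily the one CGD uses.

```python
from typing import List, Tuple


def remove_introns(gene_seq: str, introns: List[Tuple[int, int]]) -> str:
    """Splice out introns given as 0-based, half-open (start, end) intervals."""
    pieces = []
    pos = 0
    for start, end in sorted(introns):
        if not (pos <= start <= end <= len(gene_seq)):
            raise ValueError(f"invalid or overlapping intron: {(start, end)}")
        pieces.append(gene_seq[pos:start])  # exon preceding this intron
        pos = end                           # skip over the intron
    pieces.append(gene_seq[pos:])           # final exon
    return "".join(pieces)


# Made-up example: exon ATGAAG, intron GTATGTTTAG, exon CCCTAA.
gene = "ATGAAG" + "GTATGTTTAG" + "CCCTAA"
coding = remove_introns(gene, [(6, 16)])
```

Here coding is "ATGAAGCCCTAA": the gene-level sequence keeps the intron, while the coding-level sequence joins the two exons.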
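In Step 4, "Choose Appropriate BLAST Program," select the type of search to run. CGD offers these BLAST programs to accommodate different types of searches:
BLASTN: nucleotide query against a nucleotide dataset
BLASTP: protein query against a protein dataset
BLASTX: translated nucleotide query against a protein dataset
TBLASTN: protein query against a translated nucleotide dataset
TBLASTX: translated nucleotide query against a translated nucleotide dataset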
Program options are limited by Query Sequence type and Target Sequence Dataset choice. We try to guess your Query Sequence type from its text content. If that guess is wrong, you can override it using the radio-button selection option for "DNA" or "protein" located below the program selection pull-down menu.
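The query-type guess and the program restrictions can be sketched as follows. The 90% nucleotide-letter threshold in guess_query_type is a hypothetical heuristic, not CGD's actual rule. The program table follows the standard BLAST program definitions; treating the genome, gene, and coding datasets as "nucleotide" and the protein dataset as "protein" is an assumption based on the Step 3 dataset descriptions.

```python
# Characters treated as nucleotide-like when guessing the query type.
NUCLEOTIDE_CHARS = set("ACGTUN")

# (query type, target dataset type) -> BLAST programs that can compare them.
PROGRAMS = {
    ("DNA", "nucleotide"): ["BLASTN", "TBLASTX"],
    ("DNA", "protein"): ["BLASTX"],
    ("protein", "nucleotide"): ["TBLASTN"],
    ("protein", "protein"): ["BLASTP"],
}


def guess_query_type(text: str, min_fraction: float = 0.9) -> str:
    """Return 'DNA' if most letters look like nucleotides, else 'protein'."""
    # Ignore FASTA header lines, then keep letters only.
    body = "".join(ln for ln in text.splitlines() if not ln.startswith(">"))
    letters = [c for c in body.upper() if c.isalpha()]
    if not letters:
        raise ValueError("no sequence letters found")
    nucleotide_like = sum(c in NUCLEOTIDE_CHARS for c in letters)
    return "DNA" if nucleotide_like / len(letters) >= min_fraction else "protein"


def allowed_programs(query_type: str, target_type: str) -> list:
    """Programs compatible with the query type and the target dataset type."""
    return PROGRAMS[(query_type, target_type)]
```

For example, guess_query_type(">q1\nATGCTGAAA") returns "DNA", and allowed_programs("DNA", "nucleotide") then offers BLASTN and TBLASTX. A heuristic like this can misclassify short protein sequences rich in A, C, G, and T, which is why a manual override is useful.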
NOTE
For BLASTX and TBLASTX searches:
You may choose an alternate genetic code to use for query translation. The default code for CGD is 12: Alternative Yeast Nuclear Code, which is appropriate for DNA sequences from C. albicans and most related species. A different code may be needed for mitochondrial DNA sequences, or for C. glabrata sequences: C. glabrata uses the standard code, Translation Table 1, for translation of its nuclear genome. See the Non-standard Genetic Codes Help Page for more details.
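The effect of the translation table can be seen in a minimal translator. The amino-acid strings follow NCBI's compact notation for Tables 1, 4, and 12, with codons ordered TTT, TTC, TTA, TTG, TCT, and so on (base order T, C, A, G). The function name translate and the choice to emit X for ambiguous codons are assumptions of this sketch. The key differences: CTG encodes serine under Table 12 but leucine under Table 1, and TGA encodes tryptophan under Table 4 but is a stop codon under Table 1.

```python
BASE_ORDER = "TCAG"

# NCBI compact amino-acid strings; codon index = 16*first + 4*second + third,
# using the position of each base in BASE_ORDER. '*' marks a stop codon.
AA_BY_TABLE = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    4: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    12: "FFLLSSSSYY**CC*WLLLSPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
}


def translate(dna: str, table: int = 12) -> str:
    """Translate a DNA string codon by codon with the given table (default 12)."""
    aas = AA_BY_TABLE[table]
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if any(base not in BASE_ORDER for base in codon):
            protein.append("X")  # ambiguous base: amino acid unknown
            continue
        idx = (16 * BASE_ORDER.index(codon[0])
               + 4 * BASE_ORDER.index(codon[1])
               + BASE_ORDER.index(codon[2]))
        protein.append(aas[idx])
    return "".join(protein)
```

For example, translate("CTG", 1) returns "L" while translate("CTG", 12) returns "S", which is why translating a C. albicans query with the standard code can introduce amino-acid mismatches.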
In Step 5, you may submit the query, or clear the form to re-enter data.
Note: Two additional, optional sections of the BLAST submission form allow (1) query submission using a text file and (2) customization of BLAST parameters.
1) Sequences can be submitted for a BLAST search in two different ways. The sequence can be uploaded from a local text file with FASTA, GCG, or RAW formatting, or the sequence can be typed or pasted into the Query Sequence window. (Note: The contents of an uploaded sequence file will not be displayed in the Query Sequence window of the search page.)
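To use the Upload Local File option, choose your sequence file in the "Optional: upload a local sequence TEXT file" section near the bottom of the search page.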
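The three accepted upload formats can be told apart with a simple heuristic. This sketch is an illustration, not CGD's parser, and its detection rules are assumptions: text whose first line starts with ">" is treated as FASTA (only the first record is read), text containing a line ending in ".." is treated as GCG with the sequence following that line, and anything else is treated as RAW. In every case, position numbers, spaces, and punctuation are discarded.

```python
def read_query_text(text: str) -> str:
    """Extract one sequence from FASTA, GCG, or RAW text (heuristic sketch)."""
    lines = text.splitlines()
    if lines and lines[0].lstrip().startswith(">"):
        body = []
        for line in lines[1:]:
            if line.startswith(">"):
                break  # this sketch reads only the first FASTA record
            body.append(line)
    else:
        # GCG: the sequence follows the first line that ends with "..".
        divider = next(
            (k for k, line in enumerate(lines) if line.rstrip().endswith("..")), None
        )
        body = lines if divider is None else lines[divider + 1:]  # None -> RAW
    # Keep letters only, dropping position numbers, spaces, and punctuation.
    return "".join(c for c in "".join(body) if c.isalpha()).upper()
```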
2) Other options are also available, including adding a note to the BLAST output or receiving the results by email.
Changing other search parameters can also change the outcome of the BLAST search:
You may choose to allow (default) or disallow gapped alignments using the Yes/No option on the interface.
BLAST searches are subject to filtering. A filter masks low-complexity (repetitive) regions of the query, so that the BLAST search returns fewer spurious matches and, ideally, more informative results. For nucleic acid query sequences, the "dust" filter is the default; for all other searches, the "seg" filter is the default. You can turn filtering off using the On/Off option on the interface.
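The idea behind "dust"-style filtering of nucleotide queries can be illustrated with a simplified sketch: score each window by how often its three-letter words (triplets) repeat, and mask high-scoring windows with N. The window size of 64 and the threshold of 2.0 are illustrative assumptions, and this is not the exact DUST implementation used by CGD's BLAST. The "seg" filter for proteins uses a different, composition-based measure and is not shown.

```python
from collections import Counter


def triplet_score(window: str) -> float:
    """Simplified DUST-style score: higher means more repeated triplets."""
    triplets = [window[i:i + 3] for i in range(len(window) - 2)]
    n = len(triplets)
    if n < 2:
        return 0.0
    counts = Counter(triplets)
    # Each triplet seen c times contributes c*(c-1)/2 repeated pairs.
    return sum(c * (c - 1) / 2 for c in counts.values()) / (n - 1)


def mask_low_complexity(seq: str, window: int = 64, threshold: float = 2.0) -> str:
    """Replace every base in a low-complexity window with 'N' (illustrative values)."""
    seq = seq.upper()
    masked = list(seq)
    for start in range(max(len(seq) - window + 1, 1)):
        win = seq[start:start + window]
        if triplet_score(win) > threshold:
            for k in range(start, start + len(win)):
                masked[k] = "N"
    return "".join(masked)
```

A poly-A stretch such as "A" * 64 is masked entirely, while a short sequence with no repeated triplets is left unchanged.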
The Expect threshold ("E") reflects the number of matches expected to be found by chance. A match whose E-value is greater than the Expect threshold will not be reported. The default E threshold is 10. Decreasing the E threshold increases the stringency of the search, so fewer matches are reported; increasing the E threshold decreases the stringency and results in more matches being reported.
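The E-value of an alignment is commonly computed with the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where S is the alignment score, m and n are the query and database lengths, and K and lambda are statistical parameters of the scoring system. In the sketch below, the K and lambda values and the database size are placeholders; real values depend on the matrix, gap costs, and BLAST implementation. The report helper simply applies the threshold rule described above.

```python
import math


def expect_value(score: float, m: int, n: int, k: float, lam: float) -> float:
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * score)


def report(hits, e_threshold: float = 10.0):
    """Names of hits whose E-value does not exceed the Expect threshold."""
    return [name for name, e in hits if e <= e_threshold]


# Placeholder parameters: 300-residue query, 10-million-residue database.
e_strong = expect_value(score=100, m=300, n=10_000_000, k=0.041, lam=0.267)
e_weak = expect_value(score=80, m=300, n=10_000_000, k=0.041, lam=0.267)
hits = [("hitA", e_strong), ("hitB", e_weak), ("hitC", 25.0)]
```

With these placeholder numbers, report(hits) keeps hitA and hitB at the default threshold of 10, while report(hits, 1e-3) keeps only hitA, showing how a lower E threshold makes the search more stringent.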
The default scoring matrix used is BLOSUM62; however, other matrices may be selected from the pull-down menu provided on the interface.
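A scoring matrix assigns a score to each pair of aligned residues, and the score of an ungapped alignment is the sum of those pair scores. The sketch below uses only a handful of BLOSUM62 entries (the full matrix covers all 20 amino acids plus ambiguity codes); pairs outside this subset raise KeyError rather than being guessed.

```python
# A small subset of BLOSUM62 entries; the matrix is symmetric.
BLOSUM62_SUBSET = {
    ("A", "A"): 4, ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("K", "K"): 5, ("R", "R"): 5, ("W", "W"): 11,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1, ("K", "R"): 2,
    ("A", "R"): -1, ("A", "W"): -3,
}


def pair_score(a: str, b: str) -> int:
    """Look up a residue pair in either order; KeyError if not in the subset."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def ungapped_score(s1: str, s2: str) -> int:
    """Sum of pair scores for two equal-length, gap-free aligned segments."""
    if len(s1) != len(s2):
        raise ValueError("ungapped alignment requires equal lengths")
    return sum(pair_score(a, b) for a, b in zip(s1, s2))
```

For example, aligning KIV with RLV scores 2 + 2 + 4 = 8: the conservative substitutions K/R and I/L still contribute positive scores, which is what lets BLOSUM62 detect distant similarity.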
The number of alignments displayed on the results page is customizable.
The user can also change the word length (W). BLAST first searches for a perfect match of at least the word length; once such a match is found, it tries to extend it into a high-scoring segment pair (HSP). The default W value for BLASTN is 11; for all other programs the default is 3. If the word length is less than 11, the query sequence must be shorter than 5000 bp.
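The seed-and-extend idea can be sketched for nucleotide sequences: index every word of length W in the subject, look up each query word to find exact seed matches, then extend each seed in both directions without gaps until the running score falls too far below the best score seen (an "X-drop" rule). The match/mismatch scores (+1/-3) and the X-drop value of 5 are illustrative assumptions, not the CGD or WU-BLAST defaults, and real BLAST implementations add further heuristics.

```python
def find_seeds(query: str, subject: str, w: int):
    """All (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def extend_seed(query, subject, i, j, w, match=1, mismatch=-3, xdrop=5):
    """Ungapped X-drop extension; returns (query_start, query_end, score)."""
    best = w * match  # score of the exact seed word itself
    # Extend to the right of the word.
    cur, right, k = best, 0, 0
    while i + w + k < len(query) and j + w + k < len(subject):
        cur += match if query[i + w + k] == subject[j + w + k] else mismatch
        k += 1
        if cur > best:
            best, right = cur, k
        elif best - cur > xdrop:
            break
    # Extend to the left, starting from the best score found so far.
    cur, left, k = best, 0, 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        cur += match if query[i - k - 1] == subject[j - k - 1] else mismatch
        k += 1
        if cur > best:
            best, left = cur, k
        elif best - cur > xdrop:
            break
    return i - left, i + w + right, best
```

A larger W produces fewer seeds and a faster but less sensitive search; a smaller W produces more seeds to extend, which is one reason short word lengths are restricted to shorter queries.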
If a query sequence is short (less than about 30 residues), the user may want to lower the Cutoff Score ("S"), which makes the criterion for reporting matches less stringent.
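One reason short queries may need a lower cutoff is that their maximum achievable score is small. Under BLOSUM62, each of the 20 standard amino acids scores highest against itself, and gaps only subtract from a score, so the sum of a query's self-scores is an upper bound on any alignment score it can produce. This sketch assumes BLOSUM62 and the 20 standard residues only; the cutoff values in the tests are hypothetical.

```python
# BLOSUM62 self-scores (diagonal) for the 20 standard amino acids.
BLOSUM62_DIAGONAL = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9, "Q": 5, "E": 5, "G": 6, "H": 8, "I": 4,
    "L": 4, "K": 5, "M": 5, "F": 6, "P": 7, "S": 4, "T": 5, "W": 11, "Y": 7, "V": 4,
}


def max_possible_score(query: str) -> int:
    """Upper bound on any BLOSUM62 alignment score for this protein query."""
    return sum(BLOSUM62_DIAGONAL[r] for r in query.upper())


def can_reach_cutoff(query: str, cutoff_s: int) -> bool:
    """False means no alignment of this query could ever reach the cutoff S."""
    return max_possible_score(query) >= cutoff_s
```

For example, a 10-residue query of alanines can score at most 40, so any cutoff above 40 would suppress every match regardless of how similar the database sequence is.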
A note on translation tables:
In C. albicans, nuclear-encoded proteins are translated using Translation Table 12 (Alternative Yeast Nuclear), whereas mitochondrially encoded proteins are translated using Translation Table 4 (Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate Mitochondrial; Mycoplasma; Spiroplasma). For BLAST searches in which a nucleotide dataset must be translated (TBLASTN and TBLASTX), the CGD BLAST tool uses Table 4 to translate the mitochondrial dataset and Table 12 to translate the datasets containing nuclear genes that are available as BLAST target datasets in CGD. When the user enters a nucleotide query sequence for a search that requires translating it (BLASTX and TBLASTX), a translation table must also be chosen for the query. To handle such queries accurately, the user should set the "Query translation table" parameter to the table appropriate for the query sequence. By default, the query is translated using the same table that is appropriate for the dataset being searched (i.e., Table 4 if the query is being searched against the mitochondrial dataset, and Table 12 in all other cases). However, if, for example, an S. cerevisiae nucleotide sequence is used in a BLASTX or TBLASTX search, translation Table 1 (the Standard table) should be selected. Please see NCBI's Taxonomy browser and Translation Table web page for more information about alternate translation tables.
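The default rule for choosing the query translation table can be written as a small function. The function name and the dataset label "mitochondrial" are hypothetical; the logic follows the paragraph above: only BLASTX and TBLASTX translate the query, an explicit user choice takes precedence, and otherwise Table 4 is used against the mitochondrial dataset and Table 12 in all other cases.

```python
from typing import Optional

TRANSLATED_QUERY_PROGRAMS = {"BLASTX", "TBLASTX"}


def query_translation_table(program: str, target_dataset: str,
                            user_table: Optional[int] = None) -> Optional[int]:
    """Pick the table used to translate the query (None if it is not translated)."""
    if program.upper() not in TRANSLATED_QUERY_PROGRAMS:
        return None                 # BLASTN, BLASTP, TBLASTN: query not translated
    if user_table is not None:
        return user_table           # an explicit user choice takes precedence
    # Default: match the table used for the target dataset.
    return 4 if target_dataset == "mitochondrial" else 12
```

For an S. cerevisiae nucleotide query, passing user_table=1 overrides the default of 12, matching the recommendation above.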