dictyBase Help: BLAST Searches

Description

BLAST stands for Basic Local Alignment Search Tool and was developed by Altschul et al. (1990). It is a very fast heuristic algorithm for searching protein or DNA databases for sequence similarity. A fairly complete online guide to BLAST searching can be found in the NCBI BLAST Help Manual.

BLAST searches offered by dictyBase allow users to compare any query sequence to D. discoideum sequence data sets. To search any other (non-Dicty) data sets, NCBI BLAST can be used.

Using BLAST

dictyBase offers these five BLAST programs to accommodate different types of searches:

  1. BLASTN compares a nucleotide query sequence against a nucleotide sequence dataset.
  2. BLASTP compares an amino acid query sequence against a protein sequence dataset.
  3. BLASTX compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence dataset.
  4. TBLASTX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence dataset.
  5. TBLASTN compares a protein query sequence against a nucleotide sequence dataset dynamically translated in all six reading frames (both strands).
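To make the "six-frame translation" used by BLASTX, TBLASTX, and TBLASTN concrete, here is a minimal sketch in Python. It assumes the standard genetic code and uses illustrative names (`six_frame_translations`, frame labels such as `+1` and `-3`) that are not part of dictyBase or BLAST itself; it is meant to show the idea, not how BLAST implements it internally.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard table; '*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(seq):
    """Translate both strands in all three reading frames (hypothetical labels)."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Each of the six resulting protein strings is what gets compared against the protein (or translated nucleotide) dataset, which is why a nucleotide query can find protein-level similarity on either strand.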

Databases

  • Coding Sequences (CDS)

    This database contains the best-quality coding sequence available for each gene. If a gene has a curated gene model, the database contains that sequence. Genes that are not yet curated are represented by the gene prediction of the Sequencing Center. In addition, if a gene that is in GenBank has not been mapped to the genome, the sequence from GenBank is contained in the database. If a gene has more than one transcript, all transcripts are represented.

  • Genomic Sequences

    This database contains the genomic sequences of all genes described in the Coding Sequences (CDS) section above. The genomic sequence is defined as the gene sequence containing all exons and introns plus 1,000 base pairs at each end (5' and 3'). Because gene density in Dictyostelium is quite high, the genomic sequence of one gene can overlap with that of its neighbor, resulting in two partial hits in a BLAST search. Note also that the 1,000 flanking base pairs are included only when available, which might not be the case at the end of a contig or for a non-mapped GenBank record.

  • Protein Sequences

    This is the protein translation of the DNA "Coding Sequences (CDS)".

  • EST Sequences

    This database contains EST sequences from the Japanese Sequencing Project as obtained from GenBank, and additional EST sequences contributed by H. Urushihara by direct submission to dictyBase.

  • Full Chromosomes 1, 2, 3, 4, 5, 6, M

    The entries in this database are the full-length chromosomes in dictyBase. In addition to chromosomes 1, 2, 3, 4, 5, 6, and M (mitochondrial), this includes 'floating contigs', which are long stretches of DNA that have been sequenced but have not yet been placed into an assembly. These contigs are in two large arbitrary concatemers, 2F and 3F, from chromosomes 2 and 3, respectively.
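The Genomic Sequences entry above describes each record as the gene plus up to 1,000 base pairs of flanking sequence, clipped where the contig ends. The following Python sketch shows why two neighboring genes can both be hit by the same query. The coordinate convention (1-based, inclusive) and the function names are assumptions made for illustration; they do not describe how dictyBase builds its datasets.

```python
FLANK = 1000  # base pairs added at each end, as described for Genomic Sequences


def genomic_window(gene_start, gene_end, contig_length, flank=FLANK):
    """Return the (start, end) of a gene plus flanks, clipped to the contig.

    Coordinates are assumed to be 1-based and inclusive.
    """
    if not (1 <= gene_start <= gene_end <= contig_length):
        raise ValueError("gene coordinates must lie within the contig")
    start = max(1, gene_start - flank)           # fewer than 1,000 bp if near the start
    end = min(contig_length, gene_end + flank)   # fewer than 1,000 bp if near the end
    return start, end


def windows_overlap(a, b):
    """True if two inclusive (start, end) windows share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Two hypothetical neighboring genes on a 9,400 bp contig.
    gene_a = genomic_window(5000, 6500, contig_length=9400)
    gene_b = genomic_window(7800, 9000, contig_length=9400)
    print(gene_a, gene_b, windows_overlap(gene_a, gene_b))
```

In this made-up example the two windows share the region between the genes, so a query matching that region would be reported once for each gene, and the second gene receives only 400 base pairs of 3' flank because the contig ends there.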

Options

  • The E-Value sets the stringency of a BLAST search. A lower E-Value increases the stringency; a higher E-Value decreases it (useful when short and/or very repetitive sequences are submitted). The default is 0.1, which means that no alignment with an E-Value higher than 0.1 is displayed.
  • The Number of alignments to show determines how many alignments are displayed.
  • The default Word Size is 11 nucleotides for DNA and 3 amino acids for proteins. Increasing the Word Size increases the minimal length of an identical match required.
  • The Matrix option selects the scoring matrix; the default is a general-purpose matrix. A BLOSUM matrix assigns a probability score for each position in an alignment that is based on the frequency with which that substitution is known to occur among consensus blocks within related proteins. BLOSUM62, the default, is among the best of the available matrices for detecting weak similarities. Other supported options are PAM30, BLOSUM80, and BLOSUM45. Adjustments to the matrix may be in order when a search for very distant relatives of the query is being performed.
  • Filtering is ON by default and filters the query sequence for low-complexity regions. In a protein search, low-complexity regions appear as X's in the alignment, while in a nucleotide search they appear as n's. The score and E-Value of a match may be affected slightly by filtering, since it effectively shortens the query length. The DUST algorithm is used for nucleotide queries and the SEG algorithm for protein queries. For A/T-rich or other repetitive Dictyostelium sequences, turning Filtering OFF might be desirable.
  • The default Gapped Alignment reports the best local alignments and is suitable for most applications. However, an ungapped search may be desirable when hits that align to the entire length of the query are most interesting. An ungapped search can be specified by checking the 'False' option.
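As a rough illustration of two of the options above, the Python sketch below shows how an E-Value cutoff removes hits from a result list and how Word Size controls whether two sequences share an exact seed match at all. The hit records and function names are hypothetical, and the seed check covers only the exact-match case described for Word Size; it is not BLAST's actual seeding code.

```python
def filter_hits(hits, max_evalue=0.1):
    """Keep hits whose E-value does not exceed the cutoff, best (lowest) first."""
    kept = [hit for hit in hits if hit["evalue"] <= max_evalue]
    return sorted(kept, key=lambda hit: hit["evalue"])


def shares_exact_word(query, subject, word_size):
    """True if some stretch of word_size identical residues occurs in both sequences."""
    query_words = {
        query[i:i + word_size] for i in range(len(query) - word_size + 1)
    }
    return any(
        subject[i:i + word_size] in query_words
        for i in range(len(subject) - word_size + 1)
    )


if __name__ == "__main__":
    # Hypothetical hits; identifiers and E-values are invented for illustration.
    hits = [
        {"id": "hit_a", "evalue": 1e-30},
        {"id": "hit_b", "evalue": 0.5},
        {"id": "hit_c", "evalue": 0.02},
    ]
    print([hit["id"] for hit in filter_hits(hits)])

    query = "ACGTACGTTTGACCA"
    subject = "GGGACGTACGTTTGAAA"
    print(shares_exact_word(query, subject, word_size=11))
    print(shares_exact_word(query, subject, word_size=13))
```

With the default cutoff of 0.1, the hit with an E-Value of 0.5 is dropped. The two example sequences share a 12-nucleotide identical stretch, so they seed a match at Word Size 11 but not at 13, which is why a larger Word Size makes a search faster but less sensitive to short matches.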

BLAST at NCBI

In addition, a 'BLAST at NCBI' button links out directly to NCBI BLAST with the protein sequence pasted into the query window.

Accessing the BLAST Search Page

BLAST can be accessed by selecting the hypertext link on the menu bar at the top of all dictyBase WWW pages or through a link in the "Associated Sequences" section on each gene page.

Sequences for a BLAST search can be submitted by typing or pasting a sequence into the Query Sequence window. When the BLAST page is accessed from the "Associated Sequences" section on a gene page, the sequence of that locus is pasted automatically into the window.

Supported by NIH (NIGMS and NHGRI)