May 2013 Notice: The most recent Greengenes database and taxonomy updates are now found at Taxonomic information on this site is deprecated and should be used with caution.



Classify a batch of sequences against multiple taxonomies.
Use this tool to classify your set of aligned 16S rRNA sequences, to find near-neighbors, or both. If your sequences are not yet aligned in the greengenes 7,682-character format, then first align them. Each correctly aligned sequence in your uploaded file will be compared to the prokMSA to find near-neighbors using Simrank. Sequence divergence from near-neighbors will be calculated using the DNAML option of DNADIST (PHYLIP package). The Lane mask (Lane, 1991) will be used to restrict calculations to 1,287 conserved columns (lanes) of aligned characters. A report listing the best matching taxa from multiple taxonomies will be emailed to you. Please contact before uploading files containing more than a few hundred sequences.
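The masking step described above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: the helper names `apply_mask` and `p_distance`, the 8-column mask, and the toy sequences are all hypothetical, and the distance shown is a plain proportion of differing sites rather than the F84 distance that DNADIST computes.

```python
def apply_mask(aligned_seq, mask):
    """Keep only the alignment columns where the mask has a '1'."""
    if len(aligned_seq) != len(mask):
        raise ValueError("sequence and mask must have the same length")
    return "".join(base for base, keep in zip(aligned_seq, mask) if keep == "1")


def p_distance(seq_a, seq_b):
    """Fraction of compared columns at which two aligned sequences differ.

    Columns where either sequence has a gap or an ambiguous base are skipped.
    This is NOT the F84 distance; it is a simpler stand-in for illustration.
    """
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return differing / compared


# Toy 8-column alignment and mask (assumed values; the real alignment
# has 7,682 columns, of which the Lane mask keeps 1,287).
mask = "11011011"
query = "ACG-TACG"
reference = "ACGTTTCA"

masked_query = apply_mask(query, mask)
masked_reference = apply_mask(reference, mask)
divergence = p_distance(masked_query, masked_reference)
```

Masking before comparison means that hypervariable or poorly aligned columns never contribute to the divergence estimate, whichever distance measure is applied afterwards.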

Other notes:
  • The 16S rRNA gene base frequencies of A, C, G, and T used by DNADIST (F84 distance) will be 0.2537, 0.2317, 0.3167, and 0.1979, respectively.
  • Transition:Transversion Ratio is assumed to be 2.0.
  • Taxonomy reference sequences used for classification will be those deemed as non-chimeric (divergence ratio < 1.10).
  • Your sequence may match different reference sequences in different taxonomic outlines. This is because the major 16S rRNA gene collections include non-identical sets of genes in their outlines. Thus, the nearest neighbor from the entire greengenes collection (found using a "compare" tool) may be a closer match than the nearest neighbor from the more limited list of sequences from NCBI, for instance.
"Automatic taxonomic classification" is a way of getting a general idea of where a 16S rRNA gene sequence (perhaps downloaded from GenBank, or from a new bacterium you have grown in your lab) fits into an existing taxonomic outline. The tool does not reconstruct trees but instead finds a reference sequence in a previously constructed tree that is similar to the one submitted by the user. If the similarity is high enough, then the taxonomy of the reference sequence is applied to the submitted sequence.
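The idea of transferring taxonomy from the nearest reference can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `classify`, the data layout, the accession-style identifiers, and in particular the `max_divergence` cutoff of 0.03 are hypothetical, since this page does not state the threshold the tool applies.

```python
def classify(neighbors, reference_taxonomy, max_divergence=0.03):
    """Return the taxonomy of the closest reference, or None if too distant.

    neighbors: list of (reference_id, divergence) pairs for one query.
    reference_taxonomy: dict mapping reference_id to a taxonomy string.
    max_divergence: assumed cutoff, not taken from the tool itself.
    """
    if not neighbors:
        return None
    best_ref, best_div = min(neighbors, key=lambda pair: pair[1])
    if best_div > max_divergence:
        return None  # no reference is similar enough to transfer taxonomy
    return reference_taxonomy.get(best_ref)


# Toy data (hypothetical identifiers and divergences).
taxonomy = {
    "ref_A": "Bacteria; Proteobacteria; Gammaproteobacteria",
    "ref_B": "Bacteria; Firmicutes; Bacilli",
}
close_hit = classify([("ref_A", 0.012), ("ref_B", 0.210)], taxonomy)
no_hit = classify([("ref_A", 0.150), ("ref_B", 0.210)], taxonomy)
```

Because each taxonomic outline covers a different set of reference sequences, running the same lookup against each outline's own reference list can return a different nearest neighbor for the same query.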
Aligned fasta file to upload:

Each record in the file should contain an aligned 16S rRNA gene sequence exactly 7,682 characters in length.
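Before uploading, it may help to check the length requirement locally. The following Python sketch assumes a plain FASTA file; the function names `read_fasta` and `find_bad_records` are hypothetical, and the in-memory example stands in for a real file.

```python
import io

EXPECTED_LENGTH = 7682  # alignment width required for each record


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_bad_records(handle, expected_length=EXPECTED_LENGTH):
    """Return (header, length) for every record with the wrong length."""
    return [
        (header, len(seq))
        for header, seq in read_fasta(handle)
        if len(seq) != expected_length
    ]


# In-memory example: one correctly sized record and one that is too short.
example = io.StringIO(">good\n" + "-" * 7682 + "\n>short\nACGT\n")
bad_records = find_bad_records(example)
```

For a file on disk, the same check would be `find_bad_records(open("my_aligned.fasta"))`, where the file name is a placeholder.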
Taxonomy options:
Select the taxonomic nomenclature you prefer:

Delivery options:
Email address to send results (required):
  • Last Database Update: October 2, 2011 1:41PM
  • 1,049,116 aligned 16S rDNA records >1,250 nt