Classify a batch of sequences against multiple taxonomies.
|
Use this tool to classify a set of aligned 16S rRNA sequences, to find
their near-neighbors, or both.  If your sequences are not yet aligned in the greengenes
7,682-character format, align them first.
Each correctly aligned sequence in your uploaded file will be compared to the prokMSA to find near-neighbors
using Simrank. Sequence divergence from
near-neighbors will be calculated using the DNAML option of DNADIST (PHYLIP package). The
Lane mask (Lane, 1991) will be used to restrict calculations to 1,287 conserved columns (lanes) of aligned characters.
A report listing the best-matching taxa from multiple taxonomies will be emailed.
Please contact tdesantis@lbl.gov before
uploading files containing more than a few hundred sequences.
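The alignment check and column masking described above can be sketched as follows. This is only an illustration: the function names are hypothetical, and representing the mask as a string of '0' and '1' characters (one per alignment column) is an assumption, not a description of how the tool stores the Lane mask.

```python
GREENGENES_WIDTH = 7682  # alignment width stated in the text above


def is_greengenes_aligned(seq, width=GREENGENES_WIDTH):
    """Return True if the sequence has exactly the expected aligned width."""
    return len(seq) == width


def apply_column_mask(seq, mask):
    """Keep only the characters of seq whose mask position is '1'.

    The '0'/'1' string encoding of the mask is an assumption for this sketch.
    """
    if len(seq) != len(mask):
        raise ValueError("sequence and mask must have the same length")
    return "".join(ch for ch, keep in zip(seq, mask) if keep == "1")


# Toy example with a 5-column alignment instead of 7,682 columns.
toy_mask = "10110"
print(apply_column_mask("AC-GT", toy_mask))  # prints "A-G"
```

With the real mask, the masked sequence would contain 1,287 characters, and only those columns would enter the distance calculation.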
Other notes:
- The A, C, G, T base frequencies of the 16S rRNA gene used by DNADIST (F84 distance) will be 0.2537, 0.2317, 0.3167, 0.1979, respectively.
- The transition:transversion ratio is assumed to be 2.0.
- Taxonomy reference sequences used for classification will be those deemed non-chimeric (divergence ratio < 1.10).
- Your sequence may match different reference sequences in different taxonomic outlines.
This is because the major 16S rRNA gene collections include non-identical sets of genes in their outlines.
Thus, the nearest neighbor from the entire greengenes collection
(found using a "compare" tool) may be a closer match
than the nearest neighbor from the more limited list of sequences from NCBI, for instance.
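To give a feel for what an F84 distance with the base frequencies listed above looks like, here is a minimal sketch using the standard closed-form F84 estimate. It is not the tool's code, and it is not PHYLIP's implementation: the closed form estimates transitions and transversions from the observed differences rather than fixing the ratio at 2.0, so its values may differ from DNADIST output. The function names are hypothetical.

```python
import math

# Base frequencies quoted in the notes above.
FREQS = {"A": 0.2537, "C": 0.2317, "G": 0.3167, "T": 0.1979}
PURINES = {"A", "G"}


def count_differences(s1, s2):
    """Count comparable sites, transitions, and transversions.

    Columns where either sequence has a gap or ambiguity code are skipped.
    """
    sites = transitions = transversions = 0
    for a, b in zip(s1.upper(), s2.upper()):
        if a not in FREQS or b not in FREQS:
            continue
        sites += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    return sites, transitions, transversions


def f84_distance(s1, s2, freqs=FREQS):
    """Closed-form F84 distance estimate (substitutions per site)."""
    sites, ts, tv = count_differences(s1, s2)
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = ts / sites, tv / sites
    pa, pc, pg, pt = freqs["A"], freqs["C"], freqs["G"], freqs["T"]
    pr, py = pa + pg, pc + pt  # purine and pyrimidine frequencies
    a = pc * pt / py + pa * pg / pr
    b = pc * pt + pa * pg
    c = pr * py
    term1 = 1 - p / (2 * a) - (a - b) * q / (2 * a * c)
    term2 = 1 - q / (2 * c)
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for this estimate")
    return -2 * a * math.log(term1) + 2 * (a - b - c) * math.log(term2)
```

For closely related sequences the estimate is only slightly larger than the raw proportion of differing sites; the correction grows as sequences diverge.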
"Automatic taxonomic classification" is a way of getting a general idea of where a 16S rRNA gene sequence (perhaps downloaded from GenBank, or from a new bacterium you have grown in your lab) fits into an existing taxonomic outline. The tool does not reconstruct trees but instead finds a reference sequence in a previously constructed tree that is similar to the one submitted by the user. If the similarity is high enough, then the taxonomy of the reference sequence is applied to the submitted sequence.
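The idea of transferring taxonomy from the nearest reference, separately for each taxonomic outline, can be sketched as below. This is a simplified illustration under stated assumptions: the function name, the data layout, the reference identifiers, and the 0.03 divergence cutoff are all hypothetical, since the actual threshold used by the tool is not given here.

```python
def classify_by_nearest_reference(divergences, taxonomies, max_divergence=0.03):
    """Assign a lineage per taxonomy from the closest reference in that outline.

    divergences: dict mapping reference id -> divergence from the query.
    taxonomies:  dict mapping taxonomy name -> {reference id: lineage string}.
    max_divergence: hypothetical cutoff; the real tool's threshold is not stated.
    Returns a dict mapping taxonomy name -> (reference id, lineage, divergence),
    or None when no reference in that outline is close enough.
    """
    result = {}
    for name, outline in taxonomies.items():
        # Only references that belong to this outline can be matched.
        candidates = {r: d for r, d in divergences.items() if r in outline}
        if not candidates:
            result[name] = None
            continue
        best = min(candidates, key=candidates.get)
        if candidates[best] <= max_divergence:
            result[name] = (best, outline[best], candidates[best])
        else:
            result[name] = None
    return result


# Hypothetical toy data: three references, three outlines.
divergences = {"ref1": 0.01, "ref2": 0.02, "ref3": 0.20}
taxonomies = {
    "outline_A": {"ref1": "Bacteria; Firmicutes"},
    "outline_B": {"ref2": "Bacteria; Proteobacteria", "ref3": "Bacteria"},
    "outline_C": {"ref3": "Bacteria"},
}
print(classify_by_nearest_reference(divergences, taxonomies))
```

In the toy data the query matches ref1 in one outline and ref2 in another, which mirrors the note above that different outlines contain non-identical sets of genes; the third outline yields no classification because its only reference is too divergent.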