Align a batch of sequences against the Greengenes 16S rRNA gene database using NAST
Use this tool to align your set of 16S rRNA gene sequences with NAST, to find near-neighbors, or both. Each query sequence in your uploaded file will be searched for 16S rRNA gene sequences and aligned according to a Core Set of alignment templates. You can even upload a whole genome in fasta format, and NAST will find, extract, and align the 16S rRNA genes for you. Additionally, Simrank (an N-mer comparison tool) will be used to find the nearest neighbors (non-chimeric) and the nearest isolates for each of your sequences from the entire Greengenes database. The query sequences can then be returned by email, aligned or unaligned, with or without the near-neighbor sequences in supplementary files.
Please do not upload files containing more than 500 sequences. If you have a larger project, break it up into several files, or contact tdesantis@lbl.gov to make alternate arrangements. This version of NAST aligns at a rate of ~10 sequences per minute.
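For a project larger than the 500-sequence limit, the file can be split locally before upload. The following is a minimal Python sketch, not part of Greengenes; the function names `read_fasta_records` and `split_fasta` are hypothetical, and it assumes a plain-text fasta file in which every record starts with a ">" header line.

```python
from typing import Iterator, List

MAX_RECORDS = 500  # upload limit stated on this page


def read_fasta_records(text: str) -> Iterator[str]:
    """Yield each fasta record (header plus sequence lines) as one string."""
    record: List[str] = []
    for line in text.splitlines():
        # A new header closes the record collected so far.
        if line.startswith(">") and record:
            yield "\n".join(record) + "\n"
            record = []
        if line.strip():  # skip blank lines
            record.append(line)
    if record:
        yield "\n".join(record) + "\n"


def split_fasta(text: str, max_records: int = MAX_RECORDS) -> List[str]:
    """Return fasta strings that each hold at most max_records records."""
    records = list(read_fasta_records(text))
    return [
        "".join(records[i:i + max_records])
        for i in range(0, len(records), max_records)
    ]
```

Each returned string ends with a newline and can be written to its own file, for example `part_1.fasta`, `part_2.fasta`, and so on, then uploaded as separate jobs.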
Server status:
0 NAST alignment job(s) currently underway.
Fasta file to upload:

Please avoid any odd characters (parentheses, for example) in your file name. Be sure there is a return (newline) after the final nucleotide in your file.
Be sure that none of your sequence records are named "0" (the number zero).
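The upload requirements above can be checked locally before submitting. This is a rough Python sketch under stated assumptions: `check_upload` and `record_ids` are hypothetical helpers, "odd characters" is interpreted as anything other than letters, digits, ".", "_" and "-", and a record's name is taken to be the first whitespace-delimited token after ">". The server may apply different rules.

```python
import re
from typing import List

# Assumed definition of a "safe" file name; the page only says to avoid odd characters.
SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")
MAX_RECORDS = 500  # upload limit stated on this page


def record_ids(text: str) -> List[str]:
    """Return the first token of every fasta header line."""
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            ids.append(tokens[0] if tokens else "")
    return ids


def check_upload(file_name: str, text: str) -> List[str]:
    """Return a list of problems found; an empty list means no problem was detected."""
    problems = []
    if not SAFE_NAME.match(file_name):
        problems.append("file name contains odd characters")
    if not text.endswith("\n"):
        problems.append("no newline after the final nucleotide")
    ids = record_ids(text)
    if "0" in ids:
        problems.append('a sequence record is named "0"')
    if len(ids) > MAX_RECORDS:
        problems.append("more than %d sequences" % MAX_RECORDS)
    return problems
```

Calling `check_upload("my(seqs).fasta", text)` on a file without a trailing newline would report both the file name and the missing newline.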

Batch size for NAST:
Tells NAST how many records to align at once. Bigger batches (max. 100) help the whole job complete sooner, but smaller batches (min. 1) give more frequent feedback to the screen while the job is running. This parameter has no effect on alignment quality.
Significant match requirements:
  • Minimum length:
    Uploaded sequences that do not align to a "template" sequence over at least this many bases will not be included in the output.
  • Minimum percent identity:
    Uploaded sequences that do not share at least this similarity to a "template" sequence will not be included in the output.
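The two match requirements act as a combined filter: a sequence is kept only if it meets both thresholds. The sketch below illustrates that logic in Python. It is not the NAST implementation; the `TemplateHit` type, its field names, and the threshold values in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    """Hypothetical summary of how one uploaded sequence matched its template."""
    aligned_length: int      # bases aligned to the template
    percent_identity: float  # similarity to the template, 0 to 100


def passes_requirements(hit: TemplateHit, min_length: int, min_identity: float) -> bool:
    """Keep a sequence only when it satisfies both the length and identity minimums."""
    return hit.aligned_length >= min_length and hit.percent_identity >= min_identity
```

With illustrative thresholds of 1,250 bases and 75 percent identity, `passes_requirements(TemplateHit(1400, 92.5), 1250, 75.0)` keeps the sequence, while a hit with `aligned_length=800` would be left out of the output.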
Files you desire:
  • Tab-delimited text file summarizing the alignment fate of each sequence. It is a plain-text file, but it is given the ".xls" extension ("xyz_NAST.xls", for example) so that it opens conveniently in spreadsheet applications. See more notes below.
  • Sequence file of uploaded sequences successfully aligned; "xyz_NAST.fasta", for example.
  • Sequence file of uploaded sequences not able to be aligned according to user requirements; "xyz_NASTnot.fasta", for example.
  • Sequence file containing nearest-neighbor non-chimeric sequences for each sequence in upload (redundant neighbors removed). File will be named "xyz_nn_NAST.fasta", for example.
  • Sequence file containing near-neighbor non-chimeric sequences from named isolates (redundant neighbors removed). File will be named "xyz_nni_NAST.fasta", for example.
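Because the summary table is plain tab-delimited text despite its ".xls" extension, it can also be read without a spreadsheet application. A minimal Python sketch follows; the helper name `read_nast_table` is hypothetical, and since this page does not say whether the file starts with a header row, the rows are returned as-is.

```python
import csv
from typing import List


def read_nast_table(path: str) -> List[List[str]]:
    """Read a tab-delimited summary table into a list of rows (lists of strings)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [row for row in csv.reader(handle, delimiter="\t")]
```

For example, `read_nast_table("xyz_NAST.xls")` returns one list per line of the table, with one string per column.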
Formatting options:
remove common alignment gap characters (returned sequences will contain an equal number of characters)
remove all alignment gap characters (returned sequences will be unequal in length)
do not remove alignment gap characters (returned sequences will be 7,682 characters)
is my preferred file format.
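The difference between the first two gap options can be illustrated with a short Python sketch. It is not the Greengenes code: the function names are hypothetical, and it assumes that "-" and "." are the alignment gap characters. Removing common gaps drops only the columns that are gaps in every sequence, so the sequences stay equal in length; removing all gaps returns each sequence unaligned.

```python
from typing import Dict

GAP_CHARS = "-."  # assumed gap characters


def remove_all_gaps(seqs: Dict[str, str]) -> Dict[str, str]:
    """Strip every gap character; results may differ in length."""
    table = str.maketrans("", "", GAP_CHARS)
    return {name: seq.translate(table) for name, seq in seqs.items()}


def remove_common_gaps(seqs: Dict[str, str]) -> Dict[str, str]:
    """Drop only the columns that are gaps in all sequences; lengths stay equal."""
    if len({len(seq) for seq in seqs.values()}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    keep = [
        i for i, column in enumerate(zip(*seqs.values()))
        if any(char not in GAP_CHARS for char in column)
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in seqs.items()}
```

Given the two aligned sequences `A--C-G` and `AT-C--`, removing common gaps drops the third and fifth columns only, while removing all gaps leaves three bases in each sequence.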
Delivery:
The files requested above will be compressed into one email attachment. Greengenes uses the tgz format, which in our tests could be opened on Mac OS X, Windows XP, and UNIX-like platforms without special software.
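The tgz attachment is a gzip-compressed tar archive, so it can also be unpacked from a script. Below is a minimal sketch using Python's standard `tarfile` module; the helper name `unpack_results` and the file names in the usage note are hypothetical. It writes only regular files and ignores any directory paths stored in the archive.

```python
import tarfile
from pathlib import Path
from typing import List


def unpack_results(archive_path: str, dest_dir: str) -> List[str]:
    """Extract the regular files of a .tgz archive into dest_dir; return their names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            # Keep only the base file name so nothing is written outside dest_dir.
            target = dest / Path(member.name).name
            source = archive.extractfile(member)
            target.write_bytes(source.read())
            written.append(target.name)
    return sorted(written)
```

For example, `unpack_results("results.tgz", "nast_output")` would place the returned files in a `nast_output` folder and list their names.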
Email address to send results (required):


Notes on contents of returned table:
A table will accompany your NAST-aligned sequences if selected above. The columns of this table are as follows:
  • candidate sequence ID: The name of your sequence.
  • candidate nucleotide count: The number of bases in your sequence.
  • errors: A description of an error encountered when NASTing a particular sequence.
  • template ID: The prokMSA_id (a.k.a. gg_id) of the sequence used as the alignment template.
  • BLAST percent identity to template: Percentage calculated along HSP only.
  • longest insertion relative to template: Largest local misalignment produced in order to preserve the global alignment.
  • candidate span aligned: If only a sub-section of the candidate sequence could be aligned to the template, you see the span positions here.
  • candidate nucleotide count post-NAST: Should always be 7,682; that is the point of NAST.
  • unaligned length: Count of bases making it into the aligned sequence.
  • count of single nucleotide 7mers or longer Nmers: Just a helpful alert to odd homopolymers.
  • non-ACGT nucleotide count
  • non-ACGT nucleotide percent
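The last three columns are simple per-sequence statistics. The Python sketch below shows one plausible way to compute them for an unaligned sequence. It is an illustration only: the function names are hypothetical, the homopolymer count is assumed to count each maximal run of a single nucleotide that is at least 7 bases long, and the exact counting rules used by Greengenes may differ.

```python
import re
from typing import Tuple


def homopolymer_count(seq: str, min_run: int = 7) -> int:
    """Count maximal runs of one repeated nucleotide that are at least min_run long."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    return sum(1 for _ in pattern.finditer(seq.upper()))


def non_acgt_stats(seq: str) -> Tuple[int, float]:
    """Return (count, percent) of characters other than A, C, G and T."""
    bases = seq.upper()
    count = sum(1 for base in bases if base not in "ACGT")
    percent = 100.0 * count / len(bases) if bases else 0.0
    return count, percent
```

Gap characters would be counted as non-ACGT by this sketch, so it should be applied to the unaligned sequence.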
 
  • Last Database Update: October 2, 2011 1:41PM
  • 1049116 aligned 16S rDNA records >1250nt