Alignment
The Greengenes
alignment tool takes the gene sequences you submit and compares them
with the 16S rRNA gene database, Greengenes. While the entire
database contains over 80,000 genes, a Core Set of around 10,000
representative genes is used for alignment purposes.
Specifically, the alignment tool uses the Greengenes algorithm NAST
(Nearest Alignment Space Termination) to compare the bases in your
sequences with those in the Greengenes database and match them up
wherever there are similarities. Where differences are found, they are
noted (in the Excel report), and dashes are inserted so that the
candidate sequence can align to a reference sequence. It is the NAST
algorithm that creates the 7,682-character standardized expansion of
your FASTA file. This 7,682-character format is the standard on
Greengenes and is required before a user's sequence can
be checked for chimeric structure and subsequently classified.
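As a rough illustration of the fixed-width format described above, the following Python sketch checks that an aligned sequence has 7,682 columns and separates bases from inserted dashes. The function name `summarize_aligned` and the toy sequence are hypothetical and only serve the example; this is not part of NAST or of the Greengenes tools.

```python
ALIGNED_LENGTH = 7682  # column count of the aligned format, as stated in the text
GAP = "-"              # dashes mark inserted alignment gaps


def summarize_aligned(aligned: str) -> dict:
    """Return base and gap counts for one aligned sequence (hypothetical helper)."""
    if len(aligned) != ALIGNED_LENGTH:
        raise ValueError(
            f"expected {ALIGNED_LENGTH} columns, got {len(aligned)}"
        )
    gaps = aligned.count(GAP)
    return {
        "bases": len(aligned) - gaps,          # columns that hold a base
        "gaps": gaps,                          # columns that hold a dash
        "ungapped": aligned.replace(GAP, ""),  # the sequence with dashes removed
    }


# Toy sequence invented for illustration: 200 bases padded out to 7,682 columns.
toy = "-" * 100 + "ACGT" * 50 + "-" * (ALIGNED_LENGTH - 300)
info = summarize_aligned(toy)
print(info["bases"], info["gaps"])
```

Removing the dashes recovers the bases in their original order, which is why the aligned file can be seen as an expansion of the submitted FASTA sequence.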
Due to the
structure of the ribosome, some portions of the gene are more conserved, meaning that very few mutations accumulate within these sections over time. Other portions, called hypervariable regions,
are more subject to the accumulation of insertion, deletion, and
substitution mutations. (For a brief review of point mutations, see the Mutations page.)
Although NAST does not compare only the conserved
regions and ignore the hypervariable ones, the two kinds of region can be
easily discerned in the aligned FASTA file. You will see sections
of bases separated by long series of dashes. As a working
assumption, regions dense with bases represent conserved regions
and regions dense with dashes represent hypervariable regions.
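The visual inspection described above can be approximated in a few lines of Python. This is a minimal sketch of the heuristic only: the function names and the `min_run` threshold of 20 dashes are assumptions chosen for illustration, not values used by NAST or Greengenes.

```python
import re


def long_gap_runs(aligned: str, min_run: int = 20) -> list:
    """Return (start, end) column spans of dash runs at least min_run long."""
    pattern = re.compile(r"-{%d,}" % min_run)
    return [m.span() for m in pattern.finditer(aligned)]


def base_blocks(aligned: str, min_run: int = 20) -> list:
    """Return (start, end) spans of the base-dense blocks between long dash runs."""
    blocks = []
    pos = 0
    for start, end in long_gap_runs(aligned, min_run):
        if start > pos:
            blocks.append((pos, start))
        pos = end
    if pos < len(aligned):
        blocks.append((pos, len(aligned)))
    return blocks


# Short toy string invented for illustration (a real aligned sequence has 7,682 columns).
toy = "ACGTACGT" + "-" * 30 + "GGCC-TTAA" + "-" * 25 + "ACGT"
print(long_gap_runs(toy))
print(base_blocks(toy))
```

Short gaps, such as the single dash inside `GGCC-TTAA`, stay inside a base-dense block; only runs of at least `min_run` dashes split the sequence. The spans are a reading aid under the assumption stated in the text, not a definitive map of conserved and hypervariable regions.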
Steps in Using Alignment Tool
1. FASTA Files
2. Significant Match
3. Returned Files
4. Formatting Options
5. Delivery Options