NAST Excel Report
How to Read a Spreadsheet on the NAST Alignment
As discussed on the Alignment page, the sequence you submitted has been stretched out to 7,682 bases as it was aligned to various identifiable portions of a template sequence in the Greengenes catalog. When a sequence is submitted for alignment, the user receives back an Excel file that explains the alignment results. The columns of that file are as follows:

- A) This is simply the name of the sequence that was aligned, the unique ID.
- B) The number of nucleotides submitted to Greengenes (the sequence length).
- C) Any error message will show up here. For example, you would receive an error message if the sequence was too short or did not match anything in the database.
- D) Template ID - this is the identifier, in the Greengenes catalog, of the template sequence that matches your submitted sequence.
- E) BLAST - this is a comparison of the sequences in columns A and D, given as a percent. A low number here would be a Low Quality Indicator.
- F) This number is the length of the longest insertion relative to the template.
- G) Candidate Span Aligned - this reports that the sequence was aligned from base #X all the way to base #Y. A good alignment will show that most, if not all, of the submitted sequence could be aligned.
- H) Candidate Characters After Alignment - A sequence was submitted and as much of it as possible was aligned. This value is the difference between the two numbers in column G.
- I) This is the actual number of nucleotide bases that were aligned.
- J) This is the count of 7mers, or single nucleotides repeated at least 7 times. If a sequence reads ...ACGTTTTTTTTTTCACGA..., the repeating 'T's are a low quality indicator. If there are only one or two of these 7mers in the whole aligned sequence, it is not necessarily bad. If there are 5 to 8 or more, it is most likely a Low Quality Indicator for the sequence. In other words, once the sequence is submitted for classification, you will be much less sure of the accuracy of that identification.
- K) Non-ACGT nucleotide count - If a large number of the nucleotides counted were not A, C, G, or T, it is another Low Quality Indicator. When FinchTV or some other program transforms the chromatogram into actual nucleotide bases, it is sometimes unsure of an assignment because the peak on the chromatogram is ambiguous. In such instances an 'N' or 'R' may be reported. Again, if there are a lot of these in a sequence, it is a Low Quality Indicator.
- L) This number is the same value as reported in column K, but represented as a percent of all of the nucleotide bases submitted for alignment.
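
To make the quality indicators in columns J, K, and L more concrete, here is a minimal Python sketch of how similar values could be computed for a raw sequence. It is only an illustration, not the code Greengenes uses: the function names are hypothetical, each uninterrupted run of 7 or more identical bases is assumed to count once, and the percent is assumed to be taken over the full submitted length.

```python
import re


def count_homopolymer_runs(seq, min_len=7):
    """Count runs of one nucleotide (A, C, G, or T) repeated at least min_len times.

    Assumption: each maximal run is counted once, however long it is.
    """
    # One base followed by at least (min_len - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return sum(1 for _ in pattern.finditer(seq.upper()))


def non_acgt_count(seq):
    """Count characters that are not A, C, G, or T (for example N or R)."""
    return sum(1 for base in seq.upper() if base not in "ACGT")


def non_acgt_percent(seq):
    """Express the non-ACGT count as a percent of the submitted length."""
    if not seq:
        return 0.0
    return 100.0 * non_acgt_count(seq) / len(seq)


if __name__ == "__main__":
    example = "ACG" + "T" * 10 + "CACGA" + "NNR"
    print("7mer runs:", count_homopolymer_runs(example))
    print("non-ACGT count:", non_acgt_count(example))
    print("non-ACGT percent: %.1f" % non_acgt_percent(example))
```

A sequence with many such runs or a high non-ACGT percent would, by the guidance above, be treated with less confidence when it is later classified.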