NAST Excel Report
How to Read a Spreadsheet on the NAST Alignment
    As discussed on the Alignment page, the sequence you submitted has been stretched out to 7,682 bases as it was aligned to identifiable portions of a template sequence in the Greengenes catalog. When a sequence is submitted for alignment, the user receives an Excel file that explains the alignment results.
NAST Aligned Spreadsheet

  • A)  This is simply the name of the sequence that was aligned, the unique ID.
  • B)  The number of nucleotides submitted to Greengenes (the sequence length).
  • C)  Any error message appears here. You would receive one if the sequence was too short or did not match anything in the database.
  • D) Template ID - this is the identification, in the Greengenes catalog, of the template sequence that matches your submitted sequence.
  • E)  BLAST - this compares the submitted sequence (column A) with its template (column D), reported as a percent. A low number here is a Low Quality Indicator.
  • F)  This number is the longest insertion relative to the template.
  • G) Candidate Span Aligned - this reports that the sequence was aligned from base #X through base #Y. A good alignment will show that most, if not all, of the submitted sequence could be aligned.
  • H)  Candidate Characters After Alignment - A sequence was submitted and as much of it as possible was aligned.  This value is the difference between the two numbers in "G".
  • I)  This is the actual number of nucleotide bases that were aligned.  
  • J)  This is the count of 7mers, that is, single nucleotides repeated at least 7 times in a row. If a sequence reads ...ACGTTTTTTTTTTCACGA..., the repeating 'T's are a Low Quality Indicator. Only one or two of these 7mers in the whole aligned sequence is not necessarily bad. Five or more most likely mark the sequence as low quality. In other words, once the sequence is submitted for classification, you will be much less sure of the accuracy of the identification.
  • K)  Non-ACGT nucleotide count - A large number of nucleotides that are not A, C, G, or T is another Low Quality Indicator. When FinchTV or another program converts the chromatogram to nucleotide bases, it is sometimes unsure of an assignment because the peak on the chromatogram is ambiguous. In such instances an 'N' or 'R' may be reported. Again, a lot of these in a sequence is a Low Quality Indicator.
  • L)  This number is the same value as reported in column K, but represented as a percent of all of the nucleotide bases submitted for alignment.
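To get a feel for the quality indicators in columns J, K, and L, you can compute similar numbers yourself on an unaligned sequence. The Python sketch below is only an illustration, not the Greengenes implementation: the function names are made up for this example, and it assumes that each unbroken run of 7 or more identical bases counts once. Greengenes may count these differently.

```python
import re


def count_homopolymer_runs(seq, min_run=7):
    """Count runs in which one base (A, C, G, or T) repeats at least min_run times.

    Assumption: each unbroken run counts once, however long it is.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    return sum(1 for _ in pattern.finditer(seq.upper()))


def non_acgt_stats(seq):
    """Return (count, percent) of characters that are not A, C, G, or T.

    Assumption: seq is the submitted, unaligned sequence. Gap characters
    would be counted as non-ACGT, so remove them before calling this.
    """
    seq = seq.upper()
    count = sum(1 for base in seq if base not in "ACGT")
    percent = 100.0 * count / len(seq) if seq else 0.0
    return count, percent


# Example modeled on the one in J): a run of ten T's, plus two ambiguous bases.
example = "ACG" + "T" * 10 + "CACGA" + "NR"
runs = count_homopolymer_runs(example)
ambiguous, ambiguous_pct = non_acgt_stats(example)
print(runs, ambiguous, ambiguous_pct)
```

For this 20-base example the sketch finds one repeat run and two ambiguous bases ('N' and 'R'), which is 10 percent of the sequence.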