Consensus


Glossary

7mer - a term derived from the word oligomer.  You're probably familiar with the term polymer, a molecule composed of many repeating subunits.  "Poly" means many.  "Oligo" means few.  Here, an oligomer refers to a short nucleotide sequence made up of only a few bases.  A 7mer is a run of seven consecutive bases, such as ACTTGAT.  You may see 10mer, 15mer, etc.  The number as a prefix simply defines how many bases the oligomer contains.

ARB – ARB is a UNIX-based software package that can create a local database of nucleic acid sequences, analyze those sequences, and build trees from them.

BLAST – Basic Local Alignment Search Tool is an algorithm that finds the best local alignment between two sequences and then compares them for similarities and differences.

Conserved – The conserved regions of a gene are those parts that, over time, have accumulated very few mutations/changes in the nucleotide sequence.

Core Set – The Core Set is a group of about 10,000 sequences within the complete Greengenes database.  It is a set of sequences representative of most of the major prokaryotic taxa.

eukaryote - an organism whose DNA is enclosed within a nuclear membrane and whose cytoplasm contains organelles.

FASTA – A FASTA file is a text file in which each sequence is given as a unique identifier followed by the actual nucleotide bases of the sequence.  It is the file format either returned from the sequencing lab or created by calling the bases from a chromatogram.
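As a rough illustration of that layout, the sketch below reads FASTA-formatted text into (identifier, sequence) pairs.  It assumes the common convention that each identifier line begins with ">" and that the bases may wrap over several lines; the function name `parse_fasta` and the sample records are made up for this example.

```python
def parse_fasta(text):
    """Return a list of (identifier, sequence) pairs from FASTA-formatted text."""
    records = []
    header = None   # identifier of the record currently being read
    chunks = []     # sequence lines collected for that record
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new identifier line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical sample data, not taken from Greengenes.
example = """>seq1 hypothetical clone A
ACGTACGTAC
GGTTAACC
>seq2 hypothetical clone B
TTGACCA
"""
records = parse_fasta(example)
for identifier, sequence in records:
    print(identifier, len(sequence))
```

Joining the wrapped lines gives one continuous string of bases per identifier, which is the form most downstream steps expect.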

GenBank – GenBank is the National Institutes of Health’s (NIH) database of genomic sequences.  It is a repository of all publicly available DNA sequences.

Hypervariable – The hypervariable regions of a gene are those portions that, over time, have been more tolerant of mutations, and have therefore accumulated more changes within the nucleotide sequence.
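One informal way to see the difference between conserved and hypervariable positions is to count how many different bases appear in each column of a set of aligned sequences.  The sketch below does this for a tiny made-up alignment; the function name `column_variability` and the choice to ignore gap characters ("-") are assumptions for illustration, not a Greengenes method.

```python
def column_variability(aligned):
    """For each alignment column, count the distinct bases observed (gaps ignored)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    counts = []
    for i in range(length):
        bases = {seq[i] for seq in aligned if seq[i] != "-"}
        counts.append(len(bases))
    return counts


# Hypothetical aligned sequences: only the fifth column differs.
aligned = [
    "ACGTACGA",
    "ACGTTCGA",
    "ACGTGCGA",
    "ACGTCCGA",
]
variability = column_variability(aligned)
print(variability)
```

Columns with a count of 1 behave like a conserved region in this toy data, while the column where all four bases appear behaves like a hypervariable one.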

JGI – The Joint Genome Institute is a Department of Energy (DOE) lab, funded through the Office of Biological and Environmental Research in DOE's Office of Science.  It is a collaborative effort to sequence genomes, both for public and DOE projects.

NAST – Nearest Alignment Space Termination is the Greengenes algorithm that matches submitted sequences against the Greengenes database to look for similarities and then aligns the submitted sequences based on those similarities.

NCBI – The National Center for Biotechnology Information is a subdivision of the National Institutes of Health.  It maintains databases (one of which is GenBank), develops software, and works on bioinformatics as well.

OTU - An Operational Taxonomic Unit is a defined level that taxonomists use to discuss or compare organisms.  In the context of the taxonomies used by Greengenes, an OTU refers to the terminal level at which a given taxonomy classifies the sequences.  One taxonomy might classify all the way down to the specific strain, while another might stop at sub-order.

phenotype - the observable, physical characteristics of an organism.  For microorganisms, the term harkens back to classifying them by characteristics such as Gram-positive or Gram-negative staining, the shape of the cell (bacillus or coccus), etc.

prokaryote - an organism lacking a defined nucleus and other organelles.

prokMSA – an old name for Greengenes

putative chimera – a sequence that Greengenes/Bellerophon deems most likely to be a chimera because it has found strong similarities between portions of the sequence and other submitted sequences.

Simrank – Simrank is an algorithm that compares two nucleic acid sequences by the oligomers they have in common.
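To give a feel for comparing sequences by oligomers in common, here is a minimal sketch that collects the distinct oligomers of each sequence and reports what fraction of the query's oligomers also occur in the reference.  The oligomer length of 7, the scoring formula, and the function names are assumptions chosen for illustration; this is not the actual Simrank implementation.

```python
def kmers(seq, k=7):
    """Return the set of distinct oligomers of length k found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_fraction(query, reference, k=7):
    """Fraction of the query's distinct k-mers that also occur in the reference."""
    query_kmers = kmers(query.upper(), k)
    reference_kmers = kmers(reference.upper(), k)
    if not query_kmers:
        return 0.0  # query is shorter than k, so there is nothing to compare
    return len(query_kmers & reference_kmers) / len(query_kmers)


# Hypothetical short sequences that differ near the end.
score = shared_kmer_fraction("ACGTACGTAC", "ACGTACGTTT")
print(score)
```

Because only shared oligomers are counted, no alignment is needed, which is what makes this style of comparison quick for ranking many candidate sequences.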
