Consensus

Create a de-replicated set of sequences
Would you like a smaller version of the greengenes data set for a special purpose? No problem. Choose the branches of the taxonomic tree you are interested in, then return to this page to reduce the number of sequences without sacrificing diversity. All you have to do is answer a few questions and greengenes will do the rest.
My Interest List will be emptied and then refilled with only the de-replicated records.

Only sequences that are not putative chimeras are available for consideration by this tool. Any sequence with a divergence ratio >= 1.1 and fragment-to-parent similarity > 90% will not be considered as a representative. This may change in the future.
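
The exclusion rule above can be sketched as a simple predicate. This is only an illustration of the two stated thresholds; the function and argument names are hypothetical, and how greengenes computes the divergence ratio and fragment-to-parent similarity is not described on this page.

```python
def is_putative_chimera(divergence_ratio, fragment_to_parent_similarity_pct):
    """Return True when a sequence meets both stated exclusion thresholds.

    Assumption: similarity is given as a percentage (0-100).
    """
    return divergence_ratio >= 1.1 and fragment_to_parent_similarity_pct > 90


# Hypothetical values, for illustration only.
candidates = {"seqA": (1.2, 95.0), "seqB": (1.0, 99.0), "seqC": (1.1, 90.0)}
eligible = [name for name, (ratio, sim) in candidates.items()
            if not is_putative_chimera(ratio, sim)]
print(eligible)
```

Note that both conditions must hold for a sequence to be excluded, and the similarity comparison is strict (exactly 90% does not trigger it).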

Large clustering jobs will NOT return all genes (a bug that does not seem to occur on 64-bit Solaris). Contact us if you would like to comment on this tool.

1. Input

Read clusters from file:

Example file format:
CLUSTER1 103993
CLUSTER2 6079
CLUSTER3 145025,141640,9686,145027,141642
CLUSTER4 65406,46170,94721,28317,133543,136732,136731
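
If you are preparing or checking a cluster file before uploading it, a small parser along these lines may help. It is a minimal sketch that assumes each line holds a cluster name, whitespace, and then a comma-separated list of numeric sequence IDs, as in the example above; the function name is hypothetical and the tool's actual parsing rules are not documented here.

```python
def parse_clusters(text):
    """Parse 'NAME id1,id2,...' lines into {cluster name: [sequence IDs]}."""
    clusters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: name and ID list are separated by spaces or a tab.
        name, id_field = line.split(None, 1)
        clusters[name] = [int(x) for x in id_field.split(",") if x.strip()]
    return clusters


example = """CLUSTER1 103993
CLUSTER2 6079
CLUSTER3 145025,141640,9686,145027,141642
"""
clusters = parse_clusters(example)
print(len(clusters["CLUSTER3"]))
```

A non-numeric ID raises `ValueError`, which is a convenient way to catch a malformed line before submitting the file.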
                     

Have greengenes cluster the sequences (Inactive)

Which sequence set do you wish to de-replicate?
Sequences in My Interest List
Sequence IDs from file
All available sequences (coming soon)

What are your conditions for sequences to be in the same cluster?
When they have over ___ % identity via megablast along a stretch of over ___ base pairs.
Ranges allowed: 75 to 100% (but poor performance expected below 93%) and 1200 to 1550 bp
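
The allowed ranges can be checked before submitting a job. The sketch below only encodes the limits stated above; the function name and return shape are hypothetical and are not part of the greengenes tool.

```python
def check_cluster_conditions(percent_identity, min_length_bp):
    """Validate clustering thresholds against the stated ranges.

    Returns (errors, slow), where `slow` flags identity values below 93%,
    for which poor performance is expected.
    """
    errors = []
    if not 75 <= percent_identity <= 100:
        errors.append("identity must be between 75 and 100 percent")
    if not 1200 <= min_length_bp <= 1550:
        errors.append("stretch length must be between 1200 and 1550 bp")
    slow = percent_identity < 93
    return errors, slow


print(check_cluster_conditions(97, 1300))
```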

2. How would you like to choose a representative from each cluster?
Parameter    Importance
Choose earliest deposited sequence (compares years only)
Choose published sequence
Choose a sequence from an isolate
Choose a sequence with low ambiguity
Choose a long sequence
Choose sequence with hand-curated alignment
Choose sequence with limited small gap intrusions
Choose sequence within the GOLD database
Forget the parameters and just choose randomly!
I want speed. Skip picking a representative.
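
One way to think about the importance settings is as weights in a simple score. The sketch below is an assumption for illustration, not the method greengenes uses: it sums the importance of every criterion a record satisfies and picks the highest-scoring record in each cluster. All field names and weights are hypothetical.

```python
def pick_representative(records, importance):
    """Return the record with the highest weighted criteria score.

    records: list of dicts with boolean criterion flags (hypothetical fields).
    importance: dict mapping criterion name -> numeric importance.
    Ties go to the record that appears first in the list.
    """
    def score(record):
        return sum(weight for name, weight in importance.items()
                   if record.get(name))
    return max(records, key=score)


importance = {"published": 3, "isolate": 2, "hand_curated": 1}
cluster = [
    {"id": 145025, "published": False, "isolate": True, "hand_curated": True},
    {"id": 141640, "published": True, "isolate": True, "hand_curated": False},
    {"id": 9686, "published": False, "isolate": False, "hand_curated": False},
]
print(pick_representative(cluster, importance)["id"])
```

Setting every importance to zero makes all scores equal, so the first record is returned; that is not the same as the random option listed above.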

Password to enable special features:

Email address to send results (required):
(Please report bugs. This tool is still under development.)
 
  • Last Database Update: October 19, 2009 11:26AM
  • 398522 aligned 16S rDNA records >1250nt