Special administration page for approved curators only.
This interface overrides the following fields in the greengenes database:
prokMSAname, db_name, remark, warning, and aligned_seq.
Directions to prepare your file:
1. Begin an ARB session.
2. Export the database in BoulderIO format using the prokMSA.eft export filter.
3. Compress the exported file with gzip, for example: "gzip my_file.pMSA".
4. Upload the resulting .pMSA.gz file with the following form.
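If you want to check an export before uploading it, steps 3 and 4 can also be prepared from Python. The sketch below compresses the file much like "gzip -k my_file.pMSA" (it keeps the original) and then reads the compressed records back. It assumes the flat BoulderIO layout of one TAG=VALUE pair per line, with each record terminated by a line containing only "="; nested records are not handled, a repeated tag keeps only its last value, and both function names are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def compress_pmsa(src: str) -> Path:
    """Write <src>.gz next to the original file and return its path."""
    src_path = Path(src)
    dst_path = src_path.with_name(src_path.name + ".gz")
    with open(src_path, "rb") as f_in, gzip.open(dst_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst_path


def read_boulder_records(gz_path):
    """Yield one dict per record from a gzipped, flat BoulderIO stream.

    Assumption: one TAG=VALUE pair per line, records end with a line '='.
    """
    record = {}
    with gzip.open(gz_path, "rt") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line == "=":
                if record:
                    yield record
                record = {}
            elif "=" in line:
                tag, value = line.split("=", 1)
                record[tag] = value  # a repeated tag overwrites the earlier value
    if record:  # tolerate a missing final '=' line
        yield record
```

For example, counting the records in "my_file.pMSA.gz" and confirming that each one carries an aligned_seq tag is a quick way to catch an export made with the wrong filter.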
What will happen:
- The fields prokMSAname, db_name, remark, and warning will be overwritten
  to match the corresponding fields from the upload.
- The field aligned_seq_phil will be overwritten to match the aligned_seq field from
  the upload.
- The aligned_seq from the upload is compared to the existing aligned_seq
  in the greengenes database. The uploaded sequence is considered an improvement
  if ALL FOUR of the following conditions are met:
    1. The alignment of the uploaded sequence differs from that of the existing sequence.
    2. The unaligned sequences are identical, or one is an exact sub-sequence of the other.
    3. The new aligned sequence has at least as many bases at "lane mask" positions as the existing sequence.
    4. The new aligned sequence has no more bases intruding into small alignment gaps
       than the existing sequence.
- If the conditions are met (the alignment is improved) or "force" is invoked:
    - aligned_seq from the upload overwrites the greengenes aligned_seq.
    - unaligned_length, non_ACGT_count, and non_ACGT_percent are recalculated and written to the corresponding fields.
    - span_aligned is set to 1..unaligned_length.
    - The greengenes fields template and blast_perc_ident_to_template are NOT altered.
    - A list of all affected alignment positions is recorded in the manual_overwrites_table.
    - I or F is written to the status field to indicate that the alignment was Improved or Forced, respectively.
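The comparison and update steps above can be sketched in Python as follows. This is not the tool's own code. It assumes that "-" and "." are the only gap characters, that the lane mask is a string of "0"/"1" flags aligned column for column with aligned_seq, that small alignment gaps can be given as a set of 0-based column indices (small_gap_columns is a hypothetical parameter), and that "exact sub-sequence" means a contiguous substring. The record dict and its affected_positions key are hypothetical stand-ins for the greengenes record and the manual_overwrites_table. The plain copy of prokMSAname, db_name, remark, warning, and aligned_seq_phil is left out.

```python
from typing import Dict, Iterable

# Assumption: '-' and '.' are the only gap characters in aligned_seq.
GAP_CHARS = frozenset("-.")


def ungap(aligned: str) -> str:
    """Remove gap characters to recover the unaligned sequence."""
    return "".join(c for c in aligned if c not in GAP_CHARS)


def bases_at(aligned: str, columns: Iterable[int]) -> int:
    """Count non-gap characters at the given 0-based alignment columns."""
    return sum(1 for i in columns if i < len(aligned) and aligned[i] not in GAP_CHARS)


def is_improvement(old: str, new: str, lane_mask: str,
                   small_gap_columns: Iterable[int]) -> bool:
    """Check the four conditions; lane_mask is assumed to be a '0'/'1' string."""
    mask_columns = [i for i, flag in enumerate(lane_mask) if flag == "1"]
    gap_columns = list(small_gap_columns)
    old_u, new_u = ungap(old), ungap(new)
    return (
        old != new  # 1. the alignments differ
        # 2. identical, or one is a contiguous sub-sequence of the other;
        #    an empty existing sequence never passes (first alignments need force)
        and bool(old_u) and bool(new_u) and (old_u in new_u or new_u in old_u)
        # 3. at least as many bases at lane-mask columns
        and bases_at(new, mask_columns) >= bases_at(old, mask_columns)
        # 4. no more bases intruding into small-gap columns
        and bases_at(new, gap_columns) <= bases_at(old, gap_columns)
    )


def apply_upload(record: Dict, new_aligned: str, lane_mask: str,
                 small_gap_columns: Iterable[int], force: bool = False) -> Dict:
    """Return an updated copy of record (a hypothetical dict of greengenes fields)."""
    old_aligned = record.get("aligned_seq", "")
    gap_columns = list(small_gap_columns)
    improved = is_improvement(old_aligned, new_aligned, lane_mask, gap_columns)
    updated = dict(record)
    if not (improved or force):
        return updated  # alignment fields are left as they are
    unaligned = ungap(new_aligned)
    non_acgt = sum(1 for c in unaligned.upper() if c not in "ACGT")
    updated["aligned_seq"] = new_aligned
    updated["unaligned_length"] = len(unaligned)
    updated["non_ACGT_count"] = non_acgt
    # Assumption: the percentage is taken over the unaligned length.
    updated["non_ACGT_percent"] = 100.0 * non_acgt / len(unaligned) if unaligned else 0.0
    updated["span_aligned"] = f"1..{len(unaligned)}"
    # template and blast_perc_ident_to_template are intentionally not touched.
    width = max(len(old_aligned), len(new_aligned))
    # Hypothetical stand-in for the rows written to manual_overwrites_table.
    updated["affected_positions"] = [
        i for i in range(width) if old_aligned[i:i + 1] != new_aligned[i:i + 1]
    ]
    updated["status"] = "I" if improved else "F"
    return updated
```

Calling apply_upload a second time with the same alignment leaves the record unchanged unless force is set, in which case the status becomes F; this mirrors the behavior described under "Be aware".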
Be aware:
- There is no established policy yet for coordinating multiple curators. Therefore, if you change an alignment, it may be changed again
  when a new Core Set of templates is established and the whole database is realigned against those templates. The best way to use this
  tool to improve the global greengenes MSA is to improve the alignments of the Core Set templates.
- If an improved sequence is sent through this tool twice, you may see unexpected behavior. The first time, the improvement is
  detected, the record is updated, and the status is set to I (improved). If the same sequence is fed through again, no improvement
  is seen because it exactly matches the existing alignment. But if force was set for that batch, the alignment is overwritten again
  and the status is set to F (forced). To preserve the meaning of the status field, avoid overwriting the same record multiple times.
- Uploading a sequence that is being aligned for the first time requires force to write it to the database, because the new
  alignment will not pass condition 2 above.
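Before forcing a batch, one way to follow the advice about repeated overwrites is to set aside records whose uploaded alignment already equals the stored one: those can only fail the first improvement condition (the alignments must differ) and, under force, would flip their status from I to F. This minimal sketch assumes you have a snapshot of the stored alignments keyed by some record identifier; the key, the snapshot, and the split_batch name are all hypothetical, since this page does not describe how to obtain them.

```python
from typing import Dict, List, Tuple


def split_batch(stored: Dict[str, str],
                batch: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Separate a batch into records worth submitting and exact resubmissions.

    stored and batch map a hypothetical record key to an aligned sequence.
    """
    # Identical alignments would be no-ops, or forced to status F.
    skipped = sorted(k for k, aligned in batch.items() if stored.get(k) == aligned)
    to_submit = {k: v for k, v in batch.items() if k not in skipped}
    return to_submit, skipped
```

Records that appear in skipped can simply be left out of the file you upload with force.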