Emboss 1.2.0 now available
ableasby at hgmp.mrc.ac.uk
Tue Aug 15 23:01:29 UTC 2000
Here are the changes in 1.2.0
Note the need to reindex blast databases with dbiblast if you use
those indexes.
1. Indexing and format changes
As a result of much feedback, mainly people saying this or that
database can't be indexed, version 1.2.0 now handles the NCBI
format cleanly.
a) All libraries and applications using NCBI format adhere to the
specifications in ftp://ncbi.nlm.nih.gov/blast/db/README
N.B. This means that DBIBLAST uses NCBI format only. You will
have to re-dbiblast any existing blast databases you use
in EMBOSS. Why? ....
EMBOSS no longer tries to guess what format a setdb/pressdb/formatdb
database used to be in. Once the process has happened it is
just another NCBI format database.
This has the advantage that compatibility can be maintained and
means that all NCBI databases can be indexed using the given keys.
The other indexing programs (dbiflat, dbigcg and dbifasta) allow
you to index the more friendly formats (excuse the bias).
b) dbifasta has extended options. It can now cope with
    >db id ...
formats (e.g. KABAT); "dbid" becomes a format for the method emblcd.
Also, its NCBI option, as above, now adheres strictly to
their syntax. This allows databases such as pdbseq to be indexed
correctly.
If you wish to index non-NCBI databases using their familiar keys then
don't use dbiblast.
Numerous bugs in this area (all of the failure kind) have been
squashed and hopefully not too many introduced.
2. New applications in 1.2.0
a) primersearch (Val Curwen)
searches DNA sequences for matches with primer pairs; it reads in
primer pairs from input files, searches them against any sequence(s)
specified and reports all potential amplimers. You
can specify a maximum percent mismatch level; for example, 10%
mismatch on a primer of length 20bp means that the program will
classify a primer as matching a sequence if at least 18 of the 20
bases match.
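
The mismatch arithmetic above can be sketched in a few lines of Python.
This is only an illustration, not the primersearch source: the function
names are made up, and rounding the allowed mismatch count down is an
assumption (the program itself may round differently).

```python
def max_mismatches(primer_length, mismatch_percent):
    """Mismatches tolerated for a primer (hypothetical helper).

    Assumption: the allowed count is rounded down to a whole base.
    """
    return (primer_length * mismatch_percent) // 100


def primer_matches(primer, site, mismatch_percent):
    """True if `site` matches `primer` within the allowed mismatch level.

    Plain base-by-base comparison; ambiguity codes are not handled here.
    """
    if len(primer) != len(site):
        return False
    mismatches = sum(1 for a, b in zip(primer.upper(), site.upper()) if a != b)
    return mismatches <= max_mismatches(len(primer), mismatch_percent)


primer = "ACGTACGTACGTACGTACGT"            # 20 bases
two_off = "TTGTACGTACGTACGTACGT"           # 2 mismatches, 18 of 20 match
three_off = "TTTTACGTACGTACGTACGT"         # 3 mismatches, 17 of 20 match

print(max_mismatches(20, 10))
print(primer_matches(primer, two_off, 10))
print(primer_matches(primer, three_off, 10))
```

With a 10% level, a 20bp primer tolerates two mismatches, so the second
site is accepted and the third is rejected.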
b) megamerger (Gary Williams)
takes two overlapping sequences and merges them into one sequence. It
could thus be regarded as the opposite of what splitter does. The
sequences can be very long. The program does a match of all sequence
words of size 20 (by default). It then reduces this to the minimum
set of overlapping matches by sorting the matches in order of size
(largest size first) and then for each such match it removes any
smaller matches that overlap. The result is a set of the longest
ungapped alignments between the two sequences that do not overlap with
each other. If the two sequences are identical in their region of
overlap then there will be one region of match and no mismatches.
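
The reduction step described above (sort by size, largest first, then
drop smaller matches that overlap) can be sketched as below. This is a
rough Python illustration and not the megamerger code: the Match type
and function names are hypothetical, the word-matching step is assumed
to have been done already, and treating two matches as overlapping when
they share positions in either sequence is an assumption.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """An ungapped match: start in sequence A, start in sequence B, length."""
    start_a: int
    start_b: int
    length: int


def overlaps(x: Match, y: Match) -> bool:
    # Assumption: matches overlap if their ranges intersect in A or in B.
    in_a = x.start_a < y.start_a + y.length and y.start_a < x.start_a + x.length
    in_b = x.start_b < y.start_b + y.length and y.start_b < x.start_b + x.length
    return in_a or in_b


def reduce_matches(matches: List[Match]) -> List[Match]:
    """Keep the longest matches, removing smaller ones that overlap them."""
    kept: List[Match] = []
    for m in sorted(matches, key=lambda m: m.length, reverse=True):
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    # Return in order along sequence A for readability.
    return sorted(kept, key=lambda m: m.start_a)


candidates = [
    Match(0, 100, 50),    # longest match, kept
    Match(10, 110, 30),   # lies inside the first match, removed
    Match(60, 200, 25),   # overlaps nothing kept, so it stays
]
result = reduce_matches(candidates)
print(result)
```

If the two sequences are identical across their overlap, the input to
this step would already contain one long match, and every shorter match
inside it is discarded.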
Rgds
Alan