Emboss 1.2.0 now available

ableasby at hgmp.mrc.ac.uk
Tue Aug 15 23:01:29 UTC 2000


Here are the changes in 1.2.0.
Note that you need to reindex BLAST databases with dbiblast if you use
those indexes.

1. Indexing and format changes

As a result of much feedback, mainly people saying this or that
database can't be indexed, version 1.2.0 now handles the NCBI
format cleanly.

a) The library and all applications using NCBI format now adhere to
   the specifications in ftp://ncbi.nlm.nih.gov/blast/db/README
   N.B. This means that DBIBLAST uses NCBI format only. You will
   have to re-dbiblast any existing blast databases you use
   in EMBOSS. Why? ....

   EMBOSS no longer tries to guess what format a setdb/pressdb/formatdb
   database used to be in. Once a database has been through one of
   those programs it is just another NCBI format database.
   This has the advantage that compatibility can be maintained, and it
   means that all NCBI databases can be indexed using the given keys.
   The other indexing programs (dbiflat, dbigcg and dbifasta) allow
   you to index the more friendly formats (excuse the bias).

b) dbifasta has extended options. It can now cope with header lines
   of the form
         >db id ...
   (e.g. KABAT); "dbid" becomes a format for the method emblcd.
   Also, its NCBI option, as above, now adheres strictly to their
   syntax. This allows databases such as pdbseq to be indexed
   correctly.

If you wish to index non-NCBI databases using their familiar keys then
don't use dbiblast.

Numerous bugs in this area (all of the failure kind) have been
squashed and hopefully not too many introduced.


2. New applications in 1.2.0

a) primersearch (Val Curwen)

searches DNA sequences for matches with primer pairs; it reads in
primer pairs from input files, searches them against any sequence(s)
specified and reports all potential amplimers.  You
can specify a maximum percent mismatch level; for example, 10%
mismatch on a primer of length 20bp means that the program will
classify a primer as matching a sequence if at least 18 of the 20
bases match.
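
The mismatch arithmetic above can be sketched in a few lines of
Python. This is only an illustration of the rule as described, not
the primersearch source: the function names are hypothetical, and it
assumes that the allowed number of mismatches is rounded down and
that the primer is compared base by base against a site of the same
length.

```python
def max_mismatches(primer_length: int, mismatch_percent: int) -> int:
    """Allowed mismatches for a primer (assumption: rounded down)."""
    return (primer_length * mismatch_percent) // 100


def primer_matches(primer: str, site: str, mismatch_percent: int) -> bool:
    """True if `site` matches `primer` within the mismatch level."""
    if len(primer) != len(site):
        return False
    # Count positions where the two sequences differ, ignoring case.
    mismatches = sum(1 for p, s in zip(primer.upper(), site.upper()) if p != s)
    return mismatches <= max_mismatches(len(primer), mismatch_percent)


primer = "ACGTACGTACGTACGTACGT"        # 20 bases
two_off = "TTGTACGTACGTACGTACGT"       # 2 mismatches, 18 of 20 match
three_off = "TTATACGTACGTACGTACGT"     # 3 mismatches, 17 of 20 match

print(max_mismatches(20, 10))
print(primer_matches(primer, two_off, 10))
print(primer_matches(primer, three_off, 10))
```

With a 10% level the 20-base primer tolerates two mismatches, so the
first site is accepted and the second is rejected.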

b) megamerger  (Gary Williams)

takes two overlapping sequences and merges them into one sequence.  It
could thus be regarded as the opposite of what splitter does.  The
sequences can be very long. The program does a match of all sequence
words of size 20 (by default).  It then reduces these matches to a
minimal non-overlapping set: it sorts the matches in order of size
(largest first) and then, for each match in turn, removes any
smaller matches that overlap it. The result is a set of the longest
ungapped alignments between the two sequences that do not overlap with
each other. If the two sequences are identical in their region of
overlap then there will be one region of match and no mismatches.
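
The reduction step described above can be sketched roughly as
follows. This is a minimal Python illustration under stated
assumptions, not the megamerger code itself: the Match tuple and the
function names are hypothetical, each match is taken to be an
ungapped region given by its start in each sequence and its length,
and two matches are treated as overlapping if their regions overlap
in either sequence.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    start_a: int   # start of the match in the first sequence
    start_b: int   # start of the match in the second sequence
    length: int    # length of the ungapped match


def _ranges_overlap(s1: int, l1: int, s2: int, l2: int) -> bool:
    """True if half-open ranges [s1, s1+l1) and [s2, s2+l2) intersect."""
    return s1 < s2 + l2 and s2 < s1 + l1


def overlaps(m: Match, n: Match) -> bool:
    """Assumption: an overlap in either sequence counts."""
    return (_ranges_overlap(m.start_a, m.length, n.start_a, n.length)
            or _ranges_overlap(m.start_b, m.length, n.start_b, n.length))


def reduce_matches(matches: List[Match]) -> List[Match]:
    """Keep the longest matches, dropping smaller ones that overlap them."""
    kept: List[Match] = []
    # Visit matches largest first; a match survives only if it does not
    # overlap a larger match that has already been kept.
    for cand in sorted(matches, key=lambda m: m.length, reverse=True):
        if not any(overlaps(cand, k) for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda m: m.start_a)


matches = [Match(0, 100, 50), Match(40, 140, 20), Match(60, 160, 30)]
reduced = reduce_matches(matches)
print(reduced)
```

Here the 20-base match starting at position 40 overlaps the 50-base
match and is dropped, while the 30-base match starting at position 60
is kept.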


Rgds
Alan





