EMBOSS 1.3.0

ableasby at hgmp.mrc.ac.uk
Thu Aug 17 22:18:25 UTC 2000


EMBOSS 1.3.0 contains two new applications, the EMBOSSRC environment
variable directive (see the administrator's guide) and a few minor
bugfixes.


1) Vectorstrip  (Val Curwen)

   vectorstrip strips vector sequence from the ends of sequences of
   interest. For example, if a fragment has been cloned into a vector
   and then sequenced, the sequence may contain vector data, e.g. from
   the cloning polylinker, at its 5' and 3' ends. vectorstrip removes
   these contaminating regions and outputs trimmed sequence ready for
   input into another application.
   
   vectorstrip is suitable for use with low quality sequence data as it
   can allow for mismatches between the sequence and the vector patterns
   provided. You can specify the maximum level of mismatch expected.
   
   Vector data can either be provided in a file or interactively. If
   presented in a file, vectorstrip will search all input sequences with
   all vectors listed in that file. The intention is that the user can
   maintain a single file for use with vectorstrip, containing all the
   linker sequences commonly used in the laboratory.
   
   The two patterns for each vector are searched separately against the
   sequence. Once the search is completed, each of the hits of the 5'
   sequence is paired with each of the hits of the 3' sequence and the
   resulting subsequences are output. For example, if the 5' sequence
   matches the sequence from (a) position 30-60 and (b) position 70-100,
   and the 3' sequence matches from 150-175, then two subsequences will
   be output: from 61-149, and from 101-149. The lower the quality of the
   sequence, the more likely multiple hits become if nonzero mismatches
   are accepted.
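
   The pairing step can be sketched in Python as below. This is only an
   illustration of the arithmetic in the example above, not vectorstrip's
   own code: the function name and the (start, end) tuple layout are
   hypothetical, positions are 1-based and inclusive, and the check that
   skips a 3' hit lying upstream of a 5' hit is an assumption.

```python
def paired_subsequences(five_prime_hits, three_prime_hits):
    """Pair every 5' hit with every 3' hit.

    Each hit is a (start, end) tuple, 1-based and inclusive.
    Returns the (start, end) of the sequence left between each pair.
    """
    regions = []
    for _, five_end in five_prime_hits:
        for three_start, _ in three_prime_hits:
            start, end = five_end + 1, three_start - 1
            # Assumption: ignore pairs where nothing lies between the hits.
            if start <= end:
                regions.append((start, end))
    return regions


# The example from the text: two 5' hits, one 3' hit.
print(paired_subsequences([(30, 60), (70, 100)], [(150, 175)]))
# [(61, 149), (101, 149)]
```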
   
   Default behaviour is to report only the best matches between the
   vector patterns and the sequence. This means that if you specify a
   maximum mismatch level of 10%, but the vector patterns match the
   sequence with zero mismatches, the search will stop and the program
   will output only these "best" matches. If there are no perfect
   matches, the program will try searching again allowing 1 mismatch,
   then 2, and so on until either the patterns match the sequence or the
   maximum specified mismatch level is exceeded. You can tell vectorstrip
   to show all possible matches up to your specified maximum level.
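
   A minimal Python sketch of this "best matches first" strategy follows.
   It is an illustration under stated assumptions, not the program's
   implementation: it searches a single pattern, counts substitutions
   only, and the names find_hits, best_hits, max_percent and all_matches
   are hypothetical.

```python
def find_hits(sequence, pattern, max_mismatches):
    """Return (start, end, mismatches) for every window of the sequence
    that differs from the pattern in at most max_mismatches positions.
    Positions are 1-based and inclusive; only substitutions are counted."""
    hits = []
    width = len(pattern)
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= max_mismatches:
            hits.append((i + 1, i + width, mismatches))
    return hits


def best_hits(sequence, pattern, max_percent, all_matches=False):
    """Search with 0, 1, 2, ... mismatches and stop at the first level
    that gives any hit, unless all_matches is set."""
    # Assumption: the percentage is converted to a whole number of bases.
    limit = len(pattern) * max_percent // 100
    if all_matches:
        return find_hits(sequence, pattern, limit)
    for allowed in range(limit + 1):
        hits = find_hits(sequence, pattern, allowed)
        if hits:
            return hits
    return []


seq = "TTACGTACGTTTACGAACGTTT"
print(best_hits(seq, "ACGTACGT", 25))                    # best matches only
print(best_hits(seq, "ACGTACGT", 25, all_matches=True))  # everything up to the limit
```

   With the default setting only the perfect match is reported, even
   though a second copy of the pattern with one mismatch lies further
   along the sequence; setting all_matches reports both.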

2) Diffseq  (Gary Williams)

   diffseq takes two overlapping, nearly identical sequences and reports
   the differences between them, together with any features that overlap
   with these regions. GFF files of the differences in each sequence are
   also produced.
   
   diffseq should be of value when looking for SNPs, differences between
   strains of an organism and anything else that requires the differences
   between sequences to be highlighted.
   
   The sequences can be very long. The program finds all matching
   sequence words of size 10 (by default). It then reduces these to the
   minimum set of non-overlapping matches by sorting the matches in
   order of size (largest first) and, for each match in turn, removing
   any smaller matches that overlap it. The result is a set of the longest
   ungapped alignments between the two sequences that do not overlap with
   each other. The mismatched regions between these matches are reported.
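
   The reduction step can be sketched in Python as below. This is a
   simplified illustration, not diffseq's code: a match is a hypothetical
   (start_a, start_b, length) tuple with 1-based starts, two matches are
   assumed to overlap if they share positions in either sequence, and the
   kept matches are assumed to lie in the same order in both sequences.

```python
def overlaps(m1, m2):
    """True if two matches share positions in either sequence."""
    a1, b1, n1 = m1
    a2, b2, n2 = m2
    in_a = a1 < a2 + n2 and a2 < a1 + n1
    in_b = b1 < b2 + n2 and b2 < b1 + n1
    return in_a or in_b


def reduce_matches(matches):
    """Keep the longest matches first, dropping any smaller match that
    overlaps one already kept. Returns the kept matches in position order."""
    kept = []
    for match in sorted(matches, key=lambda m: m[2], reverse=True):
        if not any(overlaps(match, k) for k in kept):
            kept.append(match)
    return sorted(kept)


def mismatched_regions(kept):
    """Return the ((start_a, end_a), (start_b, end_b)) ranges lying
    between consecutive kept matches. A range whose start exceeds its
    end is empty, as happens on one side of an insertion."""
    regions = []
    for (a1, b1, n1), (a2, b2, _) in zip(kept, kept[1:]):
        regions.append(((a1 + n1, a2 - 1), (b1 + n1, b2 - 1)))
    return regions


matches = [(1, 1, 50), (40, 40, 12), (52, 52, 30)]
kept = reduce_matches(matches)
print(kept)                      # [(1, 1, 50), (52, 52, 30)]
print(mismatched_regions(kept))  # [((51, 51), (51, 51))]
```

   Here the short match starting at position 40 is dropped because it
   overlaps the longer one, leaving a single-base difference at position
   51 in both sequences.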


Alan