EMBOSS 1.3.0
ableasby at hgmp.mrc.ac.uk
Thu Aug 17 22:18:25 UTC 2000
EMBOSS 1.3.0 contains two new applications, along with the EMBOSSRC
environment variable directive (see the administrator's guide) and a
few minor bugfixes.
1) Vectorstrip (Val Curwen)
vectorstrip is intended to be useful for stripping vector sequence
from the ends of sequences of interest. For example, if a fragment has
been cloned into a vector and then sequenced, the sequence may contain
vector data, e.g. from the cloning polylinker, at the 5' and 3' ends of
the sequence. vectorstrip will remove these contaminating regions and
output trimmed sequence ready for input into another application.
vectorstrip is suitable for use with low quality sequence data as it
can allow for mismatches between the sequence and the vector patterns
provided. You can specify the maximum level of mismatch expected.
Vector data can either be provided in a file or interactively. If
presented in a file, vectorstrip will search all input sequences with
all vectors listed in that file. The intention is that the user can
maintain a single file for use with vectorstrip, containing all the
linker sequences commonly used in the laboratory.
The two patterns for each vector are searched separately against the
sequence. Once the search is completed, each of the hits of the 5'
sequence is paired with each of the hits of the 3' sequence and the
resulting subsequences are output. For example, if the 5' sequence
matches the sequence at (a) positions 30-60 and (b) positions 70-100,
and the 3' sequence matches from 150-175, then two subsequences will
be output: from 61-149, and from 101-149. The lower the quality of the
sequence, the more likely multiple hits become if nonzero mismatches
are accepted.
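
The pairing rule above can be sketched in a few lines of Python. This is
only an illustration of the arithmetic, not vectorstrip's own code; the
function name paired_subsequences and the (start, end) tuple layout are
assumptions made for the example, with positions 1-based and inclusive
as in the text.

```python
from itertools import product


def paired_subsequences(five_hits, three_hits):
    """Pair every 5' hit with every 3' hit.

    Hits are (start, end) tuples, 1-based and inclusive. The returned
    subsequence runs from the base after the 5' hit to the base before
    the 3' hit. Pairs that leave no sequence in between are skipped
    (an assumption; the announcement does not cover that case).
    """
    results = []
    for (_, end5), (start3, _) in product(five_hits, three_hits):
        start, end = end5 + 1, start3 - 1
        if start <= end:
            results.append((start, end))
    return results


five_prime = [(30, 60), (70, 100)]   # hits of the 5' pattern
three_prime = [(150, 175)]           # hits of the 3' pattern
print(paired_subsequences(five_prime, three_prime))
# prints [(61, 149), (101, 149)]
```
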
Default behaviour is to report only the best matches between the
vector patterns and the sequence. This means that if you specify a
maximum mismatch level of 10%, but the vector patterns match the
sequence with zero mismatches, the search will stop and the program
will output only these "best" matches. If there are no perfect
matches, the program will try searching again allowing 1 mismatch,
then 2, and so on until either the patterns match the sequence or the
maximum specified mismatch level is exceeded. You can tell vectorstrip
to show all possible matches up to your specified maximum level.
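
A minimal sketch of the "best matches first" search described above,
under stated assumptions: mismatches are counted as substitutions only
(no insertions or deletions), and the percentage limit is converted to
a whole number of mismatches by rounding down. The names find_hits and
best_hits are hypothetical and do not come from the vectorstrip source.

```python
def find_hits(seq, pattern, max_mismatches):
    """Return 1-based inclusive (start, end) for every window of seq
    that differs from pattern in at most max_mismatches positions."""
    n = len(pattern)
    hits = []
    for i in range(len(seq) - n + 1):
        diff = sum(a != b for a, b in zip(seq[i:i + n], pattern))
        if diff <= max_mismatches:
            hits.append((i + 1, i + n))
    return hits


def best_hits(seq, pattern, max_mismatch_pct):
    """Try 0 mismatches, then 1, then 2, ... and stop at the first
    level that gives any hit. Returns (level, hits), or (None, [])
    if nothing matches within the limit."""
    limit = len(pattern) * max_mismatch_pct // 100  # assumed rounding
    for level in range(limit + 1):
        hits = find_hits(seq, pattern, level)
        if hits:
            return level, hits
    return None, []


seq = "AAGGATCCTTTTGGTTCCAA"
print(best_hits(seq, "GGATCC", 20))   # prints (0, [(3, 8)])
print(find_hits(seq, "GGATCC", 1))    # prints [(3, 8), (13, 18)]
```

The second call corresponds to asking for all matches up to the maximum
level rather than only the best ones.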
2) Diffseq (Gary Williams)
diffseq takes two overlapping, nearly identical sequences and reports
the differences between them, together with any features that overlap
with these regions. GFF files of the differences in each sequence are
also produced.
diffseq should be of value when looking for SNPs, differences between
strains of an organism and anything else that requires the differences
between sequences to be highlighted.
The sequences can be very long. The program does a match of all
sequence words of size 10 (by default). It then reduces this to the
minimum set of overlapping matches by sorting the matches in order of
size (largest size first) and then for each such match it removes any
smaller matches that overlap. The result is a set of the longest
ungapped alignments between the two sequences that do not overlap with
each other. The mismatched regions between these matches are reported.
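
The reduction step described above can be sketched roughly as follows.
This is an illustration of the description only, not diffseq's actual
implementation. It assumes each word match is a tuple (start1, start2,
length) with 0-based starts, that "overlap" means overlapping in either
sequence, and that the surviving matches appear in the same order in
both sequences. All function names are hypothetical.

```python
def overlaps(a, b):
    """True if matches a and b overlap in either sequence."""
    a1, a2, alen = a
    b1, b2, blen = b
    in_seq1 = a1 < b1 + blen and b1 < a1 + alen
    in_seq2 = a2 < b2 + blen and b2 < a2 + alen
    return in_seq1 or in_seq2


def minimal_matches(matches):
    """Visit matches from longest to shortest and keep a match only
    if it does not overlap one already kept."""
    kept = []
    for m in sorted(matches, key=lambda m: m[2], reverse=True):
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    return sorted(kept)


def mismatched_regions(kept):
    """Regions between consecutive kept matches, as half-open
    (start, end) ranges in sequence 1 and sequence 2."""
    regions = []
    for (a1, a2, alen), (b1, b2, _) in zip(kept, kept[1:]):
        regions.append(((a1 + alen, b1), (a2 + alen, b2)))
    return regions


matches = [(0, 0, 50), (40, 40, 15), (51, 51, 60), (45, 45, 8)]
kept = minimal_matches(matches)
print(kept)                      # prints [(0, 0, 50), (51, 51, 60)]
print(mismatched_regions(kept))  # prints [((50, 51), (50, 51))]
```

In this toy input the two surviving matches are separated by a single
base at 0-based position 50 in each sequence, which is the kind of
region that would be reported as a difference.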
Alan