[EMBOSS] fuzzpro oddity
pmr at ebi.ac.uk
Fri Mar 13 14:41:45 UTC 2009
Steve Taylor wrote:
> I have an odd problem. I am trying to search a multi-fasta set of
> proteins. If I do:
> fuzzpro -sequence invertebrate.protein.faa -pattern CXXXXFYPXXXXXW
> -stdout -auto
> I get:
> Error: Sequence is not a protein
> fuzzpro -sequence invertebrate.protein.faa -pattern CXXXXFYPXXXXX
> -stdout -auto
> returns results.
> Any thoughts why? Is this a cryptic way of saying it can't find the
> motif or some other problem?
Nothing to do with the pattern. There is a strange sequence there:
>gi|170590912|ref|XP_001900215.1| hypothetical protein Bm1_43765
This sequence has a 'J', which is a mass-spec ambiguity code for "I or L"
and has somehow crept into a translation (perhaps with an ambiguous
codon - there are several possibilities).
EMBOSS 3.0.0 refuses to read it. EMBOSS 4.0.0 also fails.
EMBOSS 5.0.0 and 6.0.0 understand J and should be able to process it and
convert it to X.
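To check which records in a file carry a 'J', a small script along these
lines might help. It is only a sketch in Python, not part of EMBOSS: the
function name is made up for illustration, and it assumes a plain FASTA
file where header lines start with '>'.

```python
def find_records_with_residue(lines, residue="J"):
    """Return (header, count) for each FASTA record containing `residue`.

    Hypothetical helper, not an EMBOSS function. `lines` is any iterable
    of text lines, for example an open file handle.
    """
    hits = []
    header = None
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: store the previous one if it had hits.
            if header is not None and count:
                hits.append((header, count))
            header = line[1:]
            count = 0
        else:
            # Count the residue case-insensitively in sequence lines only.
            count += line.upper().count(residue.upper())
    # Do not forget the last record in the file.
    if header is not None and count:
        hits.append((header, count))
    return hits
```

Passing it an open handle on invertebrate.protein.faa and printing each
returned header would list the records to look at.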
As for the difference between the patterns - they both fail, but without
the W it gives some results before it reaches the bad sequence.
As you are reporting the results to stdout, it is not so easy to spot ...
but just before the tail of the report I get the "Error: Sequence is not
a protein" line (as the report goes to stdout and the error goes to
stderr, you may not see it in exactly the same place).
Solutions are: edit the J (and any others) to X in the file
and, of course, update your EMBOSS installation to 6.0.0.
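For the first option, the edit could be scripted rather than done by hand.
The following is a rough Python sketch under the same assumption of a plain
FASTA file; the function name is hypothetical, and it deliberately leaves
header lines alone so that a 'J' in a sequence name is not changed.

```python
def mask_residue(lines, residue="J", replacement="X"):
    """Yield FASTA lines with `residue` replaced in sequence lines only.

    Hypothetical helper, not an EMBOSS function. Header lines (starting
    with '>') pass through unchanged; upper and lower case are both
    replaced, preserving case.
    """
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            line = line.replace(residue.upper(), replacement.upper())
            line = line.replace(residue.lower(), replacement.lower())
            yield line
```

Writing the yielded lines to a new file (rather than over the original)
keeps the downloaded data intact in case the edit needs checking.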
As to the missing second message about the bad sequence ... if this were the
only sequence in the file it would issue a message because it can read
nothing. When reading through a file with many sequences it assumes the
first failure is end of file. We need to do something about that - such
as adding the sequence name to the message so you know where it stopped.
Thanks for the report - it was fun to look into.