[EMBOSS] fuzzpro oddity

Steve Taylor stephen.taylor at imm.ox.ac.uk
Fri Mar 13 15:08:27 UTC 2009


Thanks Peter.

Nice detective work. I wonder if anybody at NCBI/RefSeq is reading this? That is where the source file came from... I wonder how BLAST indexing handles not having a >, for example?

Steve

>>
>> I have an odd problem. I am trying to search a multi-fasta set of 
>> proteins. If I do:
>>
>> fuzzpro -sequence invertebrate.protein.faa -pattern CXXXXFYPXXXXXW 
>> -stdout -auto
>>
>> I get:
>>
>> Error: Sequence is not a protein
>>
>> fuzzpro -sequence invertebrate.protein.faa -pattern CXXXXFYPXXXXX 
>> -stdout -auto
>>
>> returns results.
>>
>> Any thoughts why? Is this a cryptic way of saying it can't find the 
>> motif or some other problem?
> 
> 
> Nothing to do with the pattern. There is a strange sequence there:
> 
> >gi|170590912|ref|XP_001900215.1| hypothetical protein Bm1_43765 [Brugia malayi]
> MAAQKERLTGDIYJESDIRQKSALSSSATVPSPQMNSQASRSASERQNIWEHRLGIRAPEQNSEQKKYWEYRNIYHIPVP
> QGIEFWEDEDKKRWEMINIGGLDESEANRQIKKAKLQLARERQQENRGSRTPQTTHIFFIISLICFGLQIVLAAICIGFC
> IYQIFNNSQIEAGIAFLLLALMLLIGAAGGIFSALKRSENLAICTAVYNVTSAVGIIVAIINLYSFRVGQSGNLSAFIPI
> AGVVALVQNFNKLS
> 
> This sequence has a 'J', which is a mass-spec ambiguity code for "I or L"
> and has somehow crept into a translation (perhaps with an ambiguous
> codon - there are several possibilities).
> 
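
To locate records like this one before running a search, the file can be scanned for residue letters outside an accepted alphabet. The following is only a minimal sketch: the function name and the `ACCEPTED` set are assumptions for illustration, not the alphabet any particular EMBOSS version uses, so adjust the set to suit your installation.

```python
from typing import Dict, Iterable, Optional, Set

# Assumption: the 20 standard amino acids plus the ambiguity codes B, Z, X,
# the stop symbol '*' and the gap '-'. This is NOT taken from EMBOSS itself.
ACCEPTED: Set[str] = set("ACDEFGHIKLMNPQRSTVWYBZX*-")


def find_unexpected_residues(
    lines: Iterable[str], accepted: Set[str] = ACCEPTED
) -> Dict[str, Set[str]]:
    """Map each FASTA record ID to the residue letters not in `accepted`."""
    hits: Dict[str, Set[str]] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token of the header.
            current = line[1:].split()[0] if len(line) > 1 else ""
            continue
        if current is None:
            continue  # sequence data before any header; skip it
        bad = set(line.upper()) - accepted
        if bad:
            hits.setdefault(current, set()).update(bad)
    return hits
```

Called as `find_unexpected_residues(open("invertebrate.protein.faa"))`, it returns a dictionary keyed by record ID, so an empty result means nothing outside the chosen alphabet was found.
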
> EMBOSS 3.0.0 refuses to read it. EMBOSS 4.0.0 also fails.
> 
> EMBOSS 5.0.0 and 6.0.0 understand J and should be able to process it and
> convert it to X
> 
> As for the difference between the patterns - they both fail, but without
> the W it gives some results before it reaches the bad sequence.
> 
> As you are reporting the results to stdout, it is not so easy to spot ...
> but just before the tail of the report I get the "Error: Sequence is not
> a protein" line (as one goes to stdout and the other to stderr, you may
> not see it in exactly the same place).
> 
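
Because the report and the error message travel on different streams, capturing them separately makes the error easier to spot. This is a hedged sketch using Python's standard `subprocess` module; the helper name `run_and_split` is hypothetical, and the command shown in the comment is simply the one from the original question.

```python
import subprocess
from typing import List, Tuple


def run_and_split(cmd: List[str]) -> Tuple[str, str, int]:
    """Run `cmd` and return (stdout text, stderr text, exit code) separately."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout, result.stderr, result.returncode


# Example invocation (requires EMBOSS to be installed and on the PATH):
# report, errors, code = run_and_split([
#     "fuzzpro", "-sequence", "invertebrate.protein.faa",
#     "-pattern", "CXXXXFYPXXXXXW", "-stdout", "-auto",
# ])
# print(errors)  # any "Error: ..." lines appear here, apart from the report
```
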
> Solutions are: edit the J (and any others) to X in the file
> and, of course, update your EMBOSS installation to 6.0.0.
> 
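
The first suggestion (editing J to X) can be scripted so that header lines are left alone. The sketch below is one possible way to do it, not an EMBOSS tool; the function name and the output file name in the usage comment are made up for illustration.

```python
from typing import Iterable, Iterator


def mask_residue(
    lines: Iterable[str], old: str = "J", new: str = "X"
) -> Iterator[str]:
    """Yield FASTA lines with `old` replaced by `new` in sequence lines only."""
    for line in lines:
        if line.startswith(">"):
            yield line  # header lines may legitimately contain the letter
        else:
            # Replace both upper- and lower-case forms, preserving case.
            yield line.replace(old, new).replace(old.lower(), new.lower())


# Usage (the output file name is hypothetical):
# with open("invertebrate.protein.faa") as src, \
#         open("invertebrate.protein.masked.faa", "w") as dst:
#     dst.writelines(mask_residue(src))
```

Writing to a new file rather than editing in place keeps the original download intact for comparison.
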
> As to the missing second message about the bad sequence ... if this were
> the only sequence in the file it would issue a message because it can read
> nothing. When reading through a file with many sequences it assumes the
> first failure is end of file. We need to do something about that - such
> as adding the sequence name to the message so you know where it stopped.
> 
> Thanks for the report - it was fun to look into.
> 
> Peter
> 
> 
> 

