[EMBOSS] Problem with protein characters

Peter Rice pmr at ebi.ac.uk
Sat Jul 11 10:54:21 UTC 2009


Radwen ANIBA wrote:
> I'm trying to use some programs that come with the EMBOSS package to analyze
> some protein sequences, but I sometimes get this message:
> 
> Error: ajSeqTypeCheckIn: Sequence must be protein sequence without BZ U X or
> *: found bad character 'X'
> 
> Is there any way to force the program to accept these types of residues?

EMBOSS uses the type attribute of the input sequence (or seqset or 
seqall) to identify the type of the input sequence (nucleotide, protein, 
or any) and the characters that are allowed (gaps, stops, non-standard 
residues and ambiguity characters).

Your application is expecting "pureprotein". This is only used by 
applications unable to handle the ambiguity codes (it can be difficult 
to define what an algorithm should do with them).
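
To see which residues in a sequence would trigger the error, the check 
can be mimicked outside EMBOSS. This is only an illustrative Python 
sketch, not EMBOSS code: the function name find_pureprotein_violations 
is hypothetical, and the rejected set (B, Z, U, X and *) is taken from 
the error message quoted above.

```python
# Illustrative only: not part of EMBOSS.
# Rejected characters are taken from the quoted error message:
# "protein sequence without BZ U X or *".
PUREPROTEIN_REJECTED = set("BZUX*")


def find_pureprotein_violations(sequence):
    """Return 1-based (position, character) pairs for rejected characters."""
    return [
        (position, residue)
        for position, residue in enumerate(sequence.upper(), start=1)
        if residue in PUREPROTEIN_REJECTED
    ]


if __name__ == "__main__":
    # Hypothetical example sequence, for demonstration only.
    for position, residue in find_pureprotein_violations("MKTXAYIAB*"):
        print(f"position {position}: {residue}")
```

Running this over your sequences before passing them to the application 
shows how many residues are affected and where they are.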

The alternatives are:

protein - accepts all characters, converts stops to X
proteinstandard - converts U, O and J to X
stopproteinstandard - converts stops, U, O, J to X
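
The conversions listed above can be sketched in Python to show what each 
type does to a sequence. This only mirrors the rules as described in 
this message and is not the EMBOSS implementation: the names 
CONVERT_TO_X and apply_type_conversion are hypothetical, and the stop 
character is assumed to be '*'.

```python
# Illustrative only: mirrors the conversions as listed in this message,
# not the actual EMBOSS implementation.
# Assumption: a stop is written as '*'.
CONVERT_TO_X = {
    "protein": "*",               # converts stops to X
    "proteinstandard": "UOJ",     # converts U, O and J to X
    "stopproteinstandard": "*UOJ",  # converts stops, U, O and J to X
}


def apply_type_conversion(sequence, seq_type):
    """Replace the characters that seq_type converts with 'X'."""
    targets = CONVERT_TO_X[seq_type]
    return "".join(
        "X" if residue in targets else residue
        for residue in sequence.upper()
    )


if __name__ == "__main__":
    # Hypothetical example sequence, for demonstration only.
    for seq_type in CONVERT_TO_X:
        print(seq_type, apply_type_conversion("MKU*OJB", seq_type))
```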

"protein" is probably what you want. You need to be able to do something 
with the ambiguity codes X, B, Z and J and with the non-standard amino 
acids U (selenocysteine) and O (pyrrolysine).

Hope this helps

Peter Rice


