[EMBOSS] iep program for multiple protein sequences

Peter Rice pmr at ebi.ac.uk
Fri Sep 8 11:20:24 UTC 2006

Tao Song wrote:
> Hi,
>      Can the iep program, which calculates the isoelectric point of a
> protein, be used on a protein database? When asked for the input
> protein sequence I gave 'tsw' instead of 'tsw:laci_ecoli' and got an
> error that said 'sequence must be protein sequence without BZ U X
> or *: found bad character Z'. Can iep take only one protein sequence
> as input?

Your command does read the test swissprot database, but fails on an 
entry that is a sequence fragment with a Z ambiguity code.
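
To see which residues in an entry would trigger that error, a check
along these lines can help. This is only a minimal Python sketch, not
EMBOSS code: the function name is hypothetical, and the rejected
character set is taken from the error message quoted above.

```python
# Characters rejected for "pureprotein" input, per the quoted error message.
REJECTED = set("BZUX*")

def rejected_positions(seq):
    """Return (1-based position, character) for each rejected residue."""
    return [(i, c) for i, c in enumerate(seq.upper(), start=1)
            if c in REJECTED]

# A fragment with a Z ambiguity code is flagged at the Z position.
print(rejected_positions("MKZAL"))
```

An empty list means the sequence contains none of the rejected codes.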

For the next release, I have a patch that will convert B (Asp or Asn) 
to D/N and Z (Glu or Gln) to E/Q using the Dayhoff frequencies of 
naturally occurring amino acids. This will convert the first B or Z to 
a charged residue (as these are more common), the second to an 
uncharged residue, and so on. With this change in place iep can be 
modified to accept any protein sequence and will produce consistent 
results on ambiguity codes.
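
The conversion described above can be sketched roughly as follows.
This is an illustrative Python sketch, not the actual patch: the
function name is hypothetical, it assumes strict alternation between
the charged and uncharged residue, and it assumes B and Z are counted
separately. The real patch may differ on either point.

```python
from itertools import cycle

# Assumed order: charged residue first, then its uncharged amide.
AMBIGUITY = {"B": ("D", "N"), "Z": ("E", "Q")}

def resolve_ambiguity(seq):
    """Replace each B or Z with alternating charged/uncharged residues."""
    # One independent alternating iterator per ambiguity code (assumption).
    choices = {code: cycle(pair) for code, pair in AMBIGUITY.items()}
    return "".join(next(choices[c]) if c in choices else c
                   for c in seq.upper())

print(resolve_ambiguity("ABZBZK"))
```

Because the replacement depends only on the order of the ambiguity
codes in the sequence, the same input always gives the same converted
sequence.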

A question: we could try this fix as a general solution for all 
programs requiring "pureprotein" input, by converting any B or Z (or J) 
ambiguity code. Would this be useful? For iep the order of conversion 
does not matter and the converted sequence does not appear in the 
output, which may not hold for other programs, so I think a 
program-by-program solution is better.

Other programs insisting on "pureprotein" input are hmoment, octanol and 



