[EMBOSS] iep program for multiple protein sequences
Peter Rice
pmr at ebi.ac.uk
Fri Sep 8 11:20:24 UTC 2006
Tao Song wrote:
> Hi,
>
> I wonder whether the iep program, which calculates the isoelectric point
> of a protein, can be used for a protein database. When asked to input a
> protein sequence I gave 'tsw' instead of 'tsw:laci_ecoli', and I got an
> error that said 'sequence must be protein sequence without BZ U X or *:
> found bad character Z'. Can iep only take one protein sequence as input?
Your command does read the test swissprot database, but fails on an
entry that is a sequence fragment with a Z ambiguity code.
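
To see which entries would trip this check before running iep, something like the following could be used. It is only a sketch: the rejected character set is taken from the error message quoted above, and the function name impure_characters is made up for illustration, not part of EMBOSS.

```python
# Characters rejected according to the quoted error message:
# B, Z, U, X and '*'. (Assumption: nothing else is rejected.)
IMPURE = set("BZUX*")


def impure_characters(seq):
    """Return the sorted list of characters in seq that would fail the check."""
    return sorted(set(seq.upper()) & IMPURE)
```

An empty list means the sequence contains none of the rejected characters.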
For the next release, I have a patch that will convert B to D or N and
Z to E or Q, using the Dayhoff frequencies of naturally occurring amino
acids. It converts the first B or Z to the charged residue (D or E, as
these are more common), the second to the uncharged residue (N or Q),
and so on. With this change in place, iep can be modified to accept any
protein sequence and will produce consistent results on ambiguity codes.
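
As a rough illustration of the alternating conversion described above (not the actual patch, which is in C inside EMBOSS), here is a minimal Python sketch. It assumes strict one-to-one alternation and a separate counter for B and for Z; the real patch may weight the choice by the Dayhoff frequencies differently. The name resolve_ambiguity is hypothetical, and the input is assumed to be upper case.

```python
# Hypothetical sketch of the conversion, not the EMBOSS implementation.
# B = Asx (D or N), Z = Glx (E or Q); the charged residue is listed first.
CHOICES = {"B": ("D", "N"), "Z": ("E", "Q")}


def resolve_ambiguity(seq):
    """Replace each B or Z, alternating charged then uncharged residues."""
    seen = {"B": 0, "Z": 0}  # assumption: one counter per ambiguity code
    out = []
    for ch in seq:
        if ch in CHOICES:
            # Even occurrences (0, 2, ...) get the charged residue.
            out.append(CHOICES[ch][seen[ch] % 2])
            seen[ch] += 1
        else:
            out.append(ch)
    return "".join(out)
```

Because the replacement depends only on the order of the codes in the sequence, the same input always gives the same converted sequence, which is what makes the results consistent.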
A question: we could try this fix as a general solution for programs
requiring "pureprotein" input, by converting any B or Z (or J) ambiguity
code. Would this be useful? For iep the order does not matter and the
converted sequence does not appear in the output, but I think a
program-by-program solution is better.
Other programs insisting on "pureprotein" input are hmoment, octanol and
pepwindow.
regards,
Peter