pestfind: name collision

David Mathog mathog at mendel.bio.caltech.edu
Thu Jun 5 14:55:11 UTC 2003


> There is a name collision of an EMBOSS program called pestfind with
> the program pestfind that already existed (at least in Pise).

Yes, same problem here.

> I had to rename our original interface to Pestfind.
> (cf http://bioweb.pasteur.fr/seqanal/protein/intro-uk.html)
> However, it is very possible that the EMBOSS interface is equivalent
> or better.
> What do you think?

The original PESTFIND was a BASIC program by Martin Rechsteiner
with corrections by Bob Stellwagen.  I've never seen it run
(not having BASIC on any of my machines), but in 1995 I obtained
the code and ported it to ANSI C, translating it more or less
literally, line for line.  That version is still available here:

  ftp://saf.bio.caltech.edu/pub/software/molbio/pestfind.zip

At that time I left the name as "pestfind" because it was simply
a translation of the original program into a new language.  The
numeric results were intended to reproduce the original exactly
(and not to improve on them in any way).

Later a Perl version was written by M. Grabner.  I have not
seen the code, but according to this page:

http://emb1.bcc.univie.ac.at/embnet/tools/bio/PESTfind/about.htm

it was also a translation from the original BASIC.

The pestfind in EMBOSS is by Michael K. Schuster and Martin Grabner.
It appears to be a translation to C from MG's earlier Perl.

I have not analyzed the EMBOSS code extensively, but clearly
it differs in some ways from the original PESTFIND, since the
results are not quite the same.  A quick glance showed that
the original had a weight term, two of whose values were 186
and 163.  This bit of code seems to be missing from the EMBOSS
PESTFIND.  Perhaps these values are coded in there in some other
manner; for instance, they might be derived from an included
table and differ slightly from the integer values.


The results produced by drm_PESTFIND (mine) and emboss_PESTFIND
are (for instance):

drm_PESTFIND:

Potential PEST sequence 89-124 (flank_dist=34)
  HYTNPSQDVTVPCPVPSTPPTPSPSTPPTPSPSCCH
  The weight percent of PEDST is: 60.854301
  The hydrophobicity index is: 39.809540
  The PEST-FIND score is: 13.565096
---------------------------------

emboss_PESTFIND:

 Potential PEST motif with 34 amino acids between position 89 and 124.
    89 HYTNPSQDVTVPCPVPSTPPTPSPSTPPTPSPSCCH 124
       DEPST: 60.53 % (w/w)
       Hydrophobicity index: 39.60
       PEST score: 13.49

The differences are too large to be rounding errors - there's
some (slight) intrinsic difference in the code.
Part of this is probably due to the lack of an explicit "weight"
term in the EMBOSS pestfind, and part may be due to the
EMBOSS version correcting a bug in the original BASIC version
which I left (intentionally) intact in the C translation.
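
The rounding argument can be checked with a few lines of arithmetic.
The following is only a minimal sketch in Python, using the numbers
quoted above.  The names (drm, emboss, compare) are illustrative, and
the 0.005 tolerance is an assumption: it supposes that EMBOSS rounds
each value to the two decimals it prints.

```python
# Values copied from the two program outputs quoted above.
drm = {"weight_percent": 60.854301, "hydrophobicity": 39.809540, "score": 13.565096}
emboss = {"weight_percent": 60.53, "hydrophobicity": 39.60, "score": 13.49}

# Assumption: EMBOSS rounds to the two decimals it prints, so rounding
# alone could shift a value by at most half a unit in the second decimal.
MAX_ROUNDING_ERROR = 0.005


def compare(a, b, tolerance=MAX_ROUNDING_ERROR):
    """Return {name: (absolute difference, explainable by rounding alone?)}."""
    result = {}
    for name in a:
        diff = abs(a[name] - b[name])
        result[name] = (diff, diff <= tolerance)
    return result


if __name__ == "__main__":
    for name, (diff, rounding_only) in compare(drm, emboss).items():
        print(f"{name:15s} diff={diff:.6f} rounding_only={rounding_only}")
```

The three differences are roughly 0.32, 0.21 and 0.075, all well above
the 0.005 that two-decimal rounding could account for.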

The bottom line though is that:

1.  The EMBOSS version seems to be doing a slightly different
calculation than the original PESTFIND.  The results are
very close, and they may even be "better" by some criteria, but
they are not the same.

2.  There was already an earlier program with that name.
Usually with EMBOSS, a port of "program" into an EMBOSS version
is named "eprogram" in order to avoid exactly this
sort of confusion.

So the EMBOSS version should be renamed "epestfind".  

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


