changes to embprop.c, emowse and charge to cope with modified amino acids.

David Martin dmartin at bioinformatics.msiwtb.dundee.ac.uk
Wed Jul 4 10:45:17 UTC 2001


Dear all,

I'm writing this generally so that you can comment (and point out good
reasons why I shouldn't do this) before I go ahead and make these changes.
I am happy to do the work, just give me the nod and I'll send amended
files.

Background:

I have had some discussion with PMR over the last few days about changing
the emowse program to reflect how mass spec is actually done. The major
thrust of what I wanted was to be able to use a different amino acid data
file to reflect the different methods used in mass spec.

Problem:

Eamino.dat is hardcoded into the emboss libraries (and into charge.c) as
the source data file for the amino acid data. This is not readily
changeable at run time. Just modifying Eamino.dat and saving it in
.embossdata is a kludge that is not universally applicable (i.e. it is
impossible from a web interface).

Solution:

Have the datafile name passed as an advanced parameter on the command
line.

Changes suggested:

embprop.c

embPropAminoRead(void) should change to embPropAminoRead( AjPFile fp ) and
the code adjusted to cope (delete two lines dealing with file opening).

embPropCalcMolwt

This should call ajFatal at 'if (! propInit)'. Programs should call
embPropAminoRead with the AjPFile retrieved from the ACD parsing (defaults
to Eamino.dat of course).

See a related note about this method below.
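
To make the shape of the embprop.c change concrete, here is a rough
standalone sketch. It is not EMBOSS code: it uses plain stdio FILE*
instead of AjPFile, exit() instead of ajFatal, and a made-up two-column
"<letter> <mass>" file format rather than the real Eamino.dat layout.
The names prop_amino_read and prop_residue_mass are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

#define PROP_NRES 26                 /* one slot per letter A..Z (ASCII assumed) */

static double prop_mass[PROP_NRES];  /* residue masses, indexed by letter */
static int    prop_init = 0;         /* set once a data file has been read */

/* Read "<letter> <mass>" lines from an already-open file.
 * The caller chooses the file, so no file name is hardcoded here.
 * Returns the number of entries read. */
int prop_amino_read(FILE *fp)
{
    char   code;
    double mass;
    int    n = 0;

    assert(fp != NULL);

    for (int i = 0; i < PROP_NRES; i++)
        prop_mass[i] = 0.0;

    while (fscanf(fp, " %c %lf", &code, &mass) == 2) {
        if (isalpha((unsigned char)code)) {
            prop_mass[toupper((unsigned char)code) - 'A'] = mass;
            n++;
        }
    }
    prop_init = 1;
    return n;
}

/* Look up one residue mass. Stops the program if no data file has been
 * read yet, standing in for the proposed ajFatal at 'if (! propInit)'. */
double prop_residue_mass(char code)
{
    if (!prop_init) {
        fprintf(stderr, "amino acid data not initialised\n");
        exit(EXIT_FAILURE);
    }
    if (!isalpha((unsigned char)code))
        return 0.0;
    return prop_mass[toupper((unsigned char)code) - 'A'];
}
```

The point of the design is that the library no longer decides which file
to open; a command-line program, a web wrapper or a test can each hand in
whatever data file suits the method being used.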


emowse.c

EMOWSE needs to retrieve the file pointer from the command line and pass
it to embPropAminoRead (AjPFile datafile = ajAcdGetInFile("aadata");
embPropAminoRead(datafile);)



charge.c

Charge has its own reading routines that should be modified as above.


Further comments on molecular weight.
=====================================

At the end of embPropCalcMolwt the program adds the molecular weight of
water to the peptide.

Stylistically this is ugly, as the value is written as a float literal
rather than as a defined constant.

Scientifically it is less useful, as it eliminates the possibility of
terminal modifications.

I would suggest that this be amended as follows:

embPropCalcMolwt be retained with the same signature but call a new method

embPropCalcmolwtMod(char *s, ajint start, ajint end, double nmass, double
cmass);

using the constants PROPENZN_H and PROPENZC_OH (defined as 1.000 and
17.0153 respectively) so the default remains.

This new method has the content of the original Molwt method but with the
final sum being

return sum + nmass + cmass;

This then lets a program determine the N and C terminal groups.

Various constants could be defined and allowed as a list in an ACD file,
e.g. PROPENZN_ACETYL for acetylated N termini.
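
As a rough standalone illustration of the split (not EMBOSS code), the
sketch below uses plain C types, a three-residue lookup table with
approximate average residue masses, and treats 'end' as inclusive, which
is an assumption. PROPENZN_H and PROPENZC_OH carry the values given
above; the value shown for PROPENZN_ACETYL is only an approximate
illustrative figure, and calc_molwt / calc_molwt_mod are hypothetical
stand-ins for the real function names.

```c
#include <assert.h>
#include <stddef.h>

#define PROPENZN_H       1.000    /* default N-terminal group, value from this post */
#define PROPENZC_OH      17.0153  /* default C-terminal group, value from this post */
#define PROPENZN_ACETYL  43.045   /* illustrative, approximate acetyl group mass */

/* Tiny stand-in for the amino acid data table (approximate average
 * residue masses); unknown letters contribute nothing. */
static double residue_mass(char c)
{
    switch (c) {
    case 'G': return 57.0519;
    case 'A': return 71.0788;
    case 'S': return 87.0782;
    default:  return 0.0;
    }
}

/* Sum the residue masses of s[start..end] (end inclusive, by assumption)
 * and add caller-supplied N- and C-terminal group masses. */
double calc_molwt_mod(const char *s, int start, int end,
                      double nmass, double cmass)
{
    double sum = 0.0;

    assert(s != NULL && start >= 0 && start <= end);

    for (int i = start; i <= end; i++)
        sum += residue_mass(s[i]);

    return sum + nmass + cmass;
}

/* Same signature as before: the default termini add up to one water. */
double calc_molwt(const char *s, int start, int end)
{
    return calc_molwt_mod(s, start, end, PROPENZN_H, PROPENZC_OH);
}
```

A program wanting an acetylated N terminus would then call
calc_molwt_mod(seq, 0, len - 1, PROPENZN_ACETYL, PROPENZC_OH), while
existing callers of the old function are unaffected.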

Thoughts and comments?

..d



