[Biojava-l] GSoC:AAPropertiesComputation Updates

Scooter Willis HWillis at scripps.edu
Tue Jul 12 11:37:32 UTC 2011


Chuan

Currently, for nucleotides and amino acids we use a single instance of
each when building a ProteinSequence or DNASequence to minimize memory. We
will need an approach that allows an amino acid to have a modified mass
via a PTM designation and that takes care of creating/replacing the amino
acid with a new instance. This is complicated, so it may be better to
simply work from an expanded list of amino acids with known PTMs. We also
need a way to describe a PTM as part of a simple string so that a proper
ProteinSequence is created and still works in code that doesn't
understand that a PTM is represented, sequence alignment being one
example.
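
A rough sketch of the "expanded list" idea, to make it concrete. The class
names PtmCompound and ExpandedCompoundSet are made up for illustration and
are not existing BioJava classes; the point is only that each known PTM
gets one shared instance that remembers its unmodified base symbol.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a compound; NOT the BioJava AminoAcidCompound class. */
final class PtmCompound {
    final String shortName;  // e.g. "C" or "capC"
    final char baseSymbol;   // unmodified one-letter residue, for PTM-unaware code

    PtmCompound(String shortName, char baseSymbol) {
        this.shortName = shortName;
        this.baseSymbol = baseSymbol;
    }
}

/** Expanded compound list: one shared instance per residue or known PTM. */
final class ExpandedCompoundSet {
    private final Map<String, PtmCompound> byShortName = new LinkedHashMap<>();

    void register(String shortName, char baseSymbol) {
        byShortName.put(shortName, new PtmCompound(shortName, baseSymbol));
    }

    /** Returns the shared instance; nothing is allocated per sequence position. */
    PtmCompound get(String shortName) {
        PtmCompound compound = byShortName.get(shortName);
        if (compound == null) {
            throw new IllegalArgumentException("Unknown compound: " + shortName);
        }
        return compound;
    }

    /** Collapses residues to base symbols, e.g. for an aligner that ignores PTMs. */
    String toBaseSequence(List<String> shortNames) {
        StringBuilder sb = new StringBuilder();
        for (String name : shortNames) {
            sb.append(get(name).baseSymbol);
        }
        return sb.toString();
    }
}
```

With this layout a sequence only holds references, so memory use stays
close to the current single-instance scheme, and code that does not know
about PTMs can fall back to the base symbols.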


Each PTM type should have a short and a long name. The following example
of possible/known PTMs of Cysteine (not a complete list) forces the PTMs
to be children of the amino acid. The short name and name should come from
well-established mass spec services such as Mascot.

This way we should be able to create a ProteinSequence with a PTM by
doing this: ProteinSequence ps = new ProteinSequence("DE[capC]CK"); The
parser will need to change to support parsing PTMs. I have included the
compounds.xml file that I used in an internal project for code that finds
peptides in mass spec files. Once all this is in place, it should be easy
enough to add code to get the mass of a protein sequence. Andy Yates put
together the code that loads the current amino acid definitions from a
config file. We should probably look at modifying that process to support
loading a more complete list of possible amino acids and their
properties.
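
To show what the parser change might look like, here is a minimal
tokenizer sketch. PtmSequenceTokenizer is a hypothetical name, not part of
BioJava, and it assumes that a bracketed short name such as [capC] stands
for one complete (modified) residue rather than annotating the residue
next to it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; splits a PTM-annotated string into residue tokens. */
final class PtmSequenceTokenizer {

    /** Single letters become one token each; "[name]" becomes the token "name". */
    static List<String> tokenize(String seq) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < seq.length()) {
            char c = seq.charAt(i);
            if (c == '[') {
                int close = seq.indexOf(']', i + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("Unclosed '[' at index " + i);
                }
                if (close == i + 1) {
                    throw new IllegalArgumentException("Empty PTM name at index " + i);
                }
                tokens.add(seq.substring(i + 1, close)); // PTM short name
                i = close + 1;
            } else if (c == ']') {
                throw new IllegalArgumentException("Unmatched ']' at index " + i);
            } else {
                tokens.add(String.valueOf(c)); // ordinary one-letter residue
                i++;
            }
        }
        return tokens;
    }
}
```

Under that assumption "DE[capC]CK" yields the five tokens D, E, capC, C
and K, each of which would then be looked up in the compound set.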

Is the plan to load the physico-chemical properties from an XML file as
well? Are the code changes for amino acid properties going into Core? I
realized last week that an old email address I had turned off was
receiving the emails from Biojava-l, so I have been out of the loop on
current progress.

Thanks

Scooter 

<amino_acid symbol='C' short='Cys' name='Cysteine'>
	<molecular_formula>
		<element symbol='C' count='3'/>
		<element symbol='H' count='5'/>
		<element symbol='N' count='1'/>
		<element symbol='O' count='1'/>
		<element symbol='S' count='1'/>
	</molecular_formula>
	<modifications>
		<modification symbol='U' short='capC' name='Carboxamidylmethyl Cysteine'>
			<molecular_formula>
				<element symbol='C' count='5'/>
				<element symbol='H' count='8'/>
				<element symbol='N' count='2'/>
				<element symbol='O' count='2'/>
				<element symbol='S' count='1'/>
			</molecular_formula>
		</modification>
		<modification symbol='Z' short='pmC' name='Palmitoylated Cysteine'>
			<molecular_formula>
				<element symbol='C' count='19'/>
				<element symbol='H' count='35'/>
				<element symbol='N' count='1'/>
				<element symbol='O' count='2'/>
				<element symbol='S' count='1'/>
			</molecular_formula>
		</modification>
		<modification symbol='$' short='dsbC' name='Disulfide-bonded Cysteine'>
			<molecular_formula>
				<element symbol='C' count='3'/>
				<element symbol='H' count='4'/>
				<element symbol='N' count='1'/>
				<element symbol='O' count='1'/>
				<element symbol='S' count='1'/>
			</molecular_formula>
		</modification>
	</modifications>
</amino_acid>
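
As a sketch of how a mass could be derived from one of the
<molecular_formula> blocks above: the FormulaMass class below is
hypothetical, and the choice of monoisotopic element masses is my
assumption (an average-mass table could be swapped in the same way). It
uses only the standard JDK DOM parser.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper; sums element masses of one <molecular_formula> fragment. */
final class FormulaMass {
    // Monoisotopic masses in Da (assumption: monoisotopic, not average, is wanted).
    private static final Map<String, Double> MONOISOTOPIC = new HashMap<>();
    static {
        MONOISOTOPIC.put("C", 12.0);
        MONOISOTOPIC.put("H", 1.00782503);
        MONOISOTOPIC.put("N", 14.0030740);
        MONOISOTOPIC.put("O", 15.9949146);
        MONOISOTOPIC.put("S", 31.9720707);
    }

    /** Parses the fragment and returns sum(count * mass) over its direct <element> children. */
    static double mass(String formulaXml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(formulaXml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse formula XML", e);
        }
        double total = 0.0;
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE
                    || !"element".equals(node.getNodeName())) {
                continue; // skip whitespace text nodes and anything unexpected
            }
            Element element = (Element) node;
            Double atomMass = MONOISOTOPIC.get(element.getAttribute("symbol"));
            if (atomMass == null) {
                throw new IllegalArgumentException(
                        "No mass for element: " + element.getAttribute("symbol"));
            }
            total += atomMass * Integer.parseInt(element.getAttribute("count"));
        }
        return total;
    }
}
```

Feeding it the unmodified Cysteine formula (C3 H5 N1 O1 S1) gives the
residue mass; a peptide mass would then be the sum over residues plus one
water, which is left out here.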


 



On 7/11/11 11:36 PM, "Chuan Hock Koh" <kohchuanhock at gmail.com> wrote:

>Hi all,
>
>I am a student in this year's Google Summer of Code. We are working on
>developing tools and APIs for the calculation of physico-chemical
>properties via BioJava.
>
>We have come to the end of the second phase of this project. In the first
>two phases of the project, we developed the APIs, tested them, and wrote
>up documentation (in the Cookbook of BioWiki under "Physico-Chemical
>Properties Computation").
>
>We are writing this email to invite comments/suggestions on the project.
>The next few phases of the project can be found here (
>http://biojava.org/wiki/GSoC:AAPropertiesComputation#Timeline).
>
>Previously, it was suggested on this mailing list that we should allow
>users to define the mass of amino acids instead of always using the
>standard mass. We have done so and have documentation on how to do it
>here (
>http://biojava.org/wiki/BioJava:CookBook:AAPROP:xmlfiles).
>
>However, we face a problem regarding the definition of modified amino
>acids: the ProteinSequence class ensures that every symbol is defined in
>AminoAcidCompoundSet. This limits the possibilities for defining symbols
>for modified amino acids.
>
>
>Looking forward to hearing from you guys,
>
>AAPropertiesComputation Team
>Student: Chuan Hock KOH
>Mentors: Peter Troshin & Andreas Prlic
>
>-- 
>http://compbio.ddns.comp.nus.edu.sg/~ChuanHockKoh <http://compbio.ddns.comp.nus.edu.sg/~ChuanHockKoh/index.html>
>_______________________________________________
>Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/biojava-l

-------------- next part --------------
A non-text attachment was scrubbed...
Name: compounds.xml
Type: application/xml
Size: 10760 bytes
Desc: compounds.xml
URL: <http://lists.open-bio.org/pipermail/biojava-l/attachments/20110712/13137d1a/attachment.wsdl>

