[Biojava-dev] Introducing the GSoC project

Scooter Willis HWillis at scripps.edu
Mon May 9 19:18:07 UTC 2011


We should avoid hard-coding any data values in source and instead use a "config file" that is loaded by a class with an interface to query the values. That way it is easy either to replace the config file or to pass in the data through the proper interface. It is very easy to find multiple definitions of mass for atoms, depending on who did the weighing!
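
A rough sketch of what I mean, in Java. The names AtomicMassSource and PropertiesAtomicMassSource are made up for illustration and are not existing BioJava classes, and the two masses in the demo are placeholder values, not vetted data:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical query interface; callers never see where the data comes from. */
interface AtomicMassSource {
    double massOf(String symbol);
}

/** Hypothetical implementation backed by "symbol=mass" lines in a config file. */
final class PropertiesAtomicMassSource implements AtomicMassSource {
    private final Properties props = new Properties();

    PropertiesAtomicMassSource(Reader config) {
        try {
            props.load(config);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public double massOf(String symbol) {
        String raw = props.getProperty(symbol);
        if (raw == null) {
            throw new IllegalArgumentException("No mass configured for " + symbol);
        }
        return Double.parseDouble(raw.trim());
    }
}

final class AtomicMassConfigDemo {
    public static void main(String[] args) {
        // Placeholder values for illustration only; a real file would cite its source.
        String config = "# source: <document the data source here>\nH=1.008\nC=12.011\n";
        AtomicMassSource masses = new PropertiesAtomicMassSource(new StringReader(config));
        System.out.println("C = " + masses.massOf("C"));
    }
}
```

Swapping in a different table then means pointing the reader at another file, or supplying another implementation of the interface, without touching the calling code.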

We should always use double unless memory is a concern, which shouldn't be an issue here.
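
To make the float question concrete: a Java float carries roughly 7 significant decimal digits and a double roughly 15 to 16. A minimal check, with a made-up helper name (survivesFloat) and an illustrative input value, could look like this:

```java
final class FloatVersusDouble {
    /** True if x can be narrowed to float and widened back without any change. */
    static boolean survivesFloat(double x) {
        return (double) (float) x == x;
    }

    public static void main(String[] args) {
        double mass = 1.00794; // illustrative value only, not taken from the IUPAC table
        System.out.println("as double        : " + mass);
        System.out.println("via float        : " + (double) (float) mass);
        System.out.println("round trip exact?: " + survivesFloat(mass));
    }
}
```

Any tabulated value for which survivesFloat returns false would be silently altered if stored as a float.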

From: Andreas Prlic <andreas at sdsc.edu>
Date: Mon, 9 May 2011 15:10:59 -0400
To: biojava-dev <biojava-dev at lists.open-bio.org>
Cc: Chuan Hock Koh <kohchuanhock at gmail.com>, "Rose, Peter" <pwrose at ucsd.edu>, Scooter Willis <hwillis at scripps.edu>
Subject: Re: [Biojava-dev] Introducing the GSoC project

Hi,

Another task for refactoring: the -structure module contains an Element class, which should be moved to -core as well. It also needs an update, since some of the data in it is out of date, e.g. atomic mass. A source for up-to-date data is http://www.iupac.org/publications/pac/83/2/0359/ (Table 2).

From Peter Rose:

There are two complications with these data:

1. For 10 elements a range of weights is given. I suggest we use the average here.

2. The precision of some of the numbers may exceed float. If so, should we use double?
Going forward, we should document the source of the data in the source file.
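
As a sketch of point 1 together with the documentation suggestion, assuming a hypothetical AtomicWeightInterval class (not an existing BioJava type). The hydrogen bounds in the demo are quoted from memory and must be checked against Table 2 before use:

```java
/**
 * Hypothetical holder for an element whose atomic weight is given as a range.
 * Data source to document here: http://www.iupac.org/publications/pac/83/2/0359/ (Table 2)
 */
final class AtomicWeightInterval {
    final double lower;
    final double upper;

    AtomicWeightInterval(double lower, double upper) {
        if (lower > upper) {
            throw new IllegalArgumentException("lower bound exceeds upper bound");
        }
        this.lower = lower;
        this.upper = upper;
    }

    /** The average of the two bounds, used as the single representative weight. */
    double midpoint() {
        return lower + (upper - lower) / 2.0;
    }

    public static void main(String[] args) {
        // Bounds for hydrogen as recalled; verify against the cited table.
        AtomicWeightInterval hydrogen = new AtomicWeightInterval(1.00784, 1.00811);
        System.out.println("H midpoint = " + hydrogen.midpoint());
    }
}
```

Keeping both bounds, rather than storing only the average, leaves room to pick a different representative value later.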

Andreas



On Mon, May 9, 2011 at 4:13 AM, Scooter Willis <HWillis at scripps.edu> wrote:
Ah Fu

Some elements that model the attributes of the amino acids and the protein
should go in the core module. The obvious ones are mass and the
physico-chemical attributes of the amino acids. Algorithms that derive a
quantitative value from those attributes, where the value is not an
absolute constant, would go in the module you are developing. It would help
if you can identify the portions of the code that need to go in core
because they properly model the amino acid and protein and would be code
that others can use in their modules. I can assist with the code that would
go into core: I have had on my to-do list a way to properly handle PTM
modeling of amino acids, so that the mass and chemical composition would be
correct for those with a mass spectrometry interest. I have code in another
library for a project at work that deals with the PTM issue, but would
rather use an external, well-defined data model for PTMs. Last year's GSoC
project had a PTM element for PDB, and we may be able to use that as a
model for how to integrate PTM support into core.

Thanks

Scooter

On 5/9/11 6:39 AM, "Chuan Hock Koh" <kohchuanhock at gmail.com> wrote:

>Hi all,
>
>As Andreas has highlighted, I am one of the students for this year's Google
>Summer of Code.
>
>I will be implementing physico-chemical computation of protein sequences
>within the BioJava framework. I will soon be embarking on the coding of some
>basic properties such as molecular weight, instability index, isoelectric
>point etc.
>
>If you have any suggestions for any properties you would like to be
>implemented, we would love to hear from you. Please take a look at the
>following page for what we currently have in mind.
>http://biojava.org/wiki/GSoC:AAPropertiesComputation
>
>Looking forward to hearing from you guys!
>Ah Fu
>
>
>
>
>On Sat, May 7, 2011 at 10:22 AM, Andreas Prlic <andreas at sdsc.edu> wrote:
>
>> Hi -devs,
>>
>> I would like to kick off our work on this year's Google Summer of Code
>> by introducing Ah Fu and his project to the -dev list. Welcome, Ah Fu,
>> and we are looking forward to a fun summer working on this together
>> with you!
>>
>> In case anybody is interested in the project details, Ah Fu has
>> already set up a project page on the wiki:
>> http://biojava.org/wiki/GSoC:AAPropertiesComputation
>>
>> Similar to last year we are planning to track the weekly progress there.
>>
>> The plan is to keep important and high level discussions here on this
>> list, but keep details offline. We are also having a Skype call every
>> Thursday at 8 AM PST, in case anybody is interested in joining.
>>
>> Andreas
>>
>
>
>
>--
>http://compbio.ddns.comp.nus.edu.sg/~ChuanHockKoh
>_______________________________________________
>biojava-dev mailing list
>biojava-dev at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/biojava-dev




--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------



