[Biojava-dev] SVN

Tue Feb 10 18:20:09 UTC 2009

OK, I wasn't sure if that was right. But I've copied this back to the
list so it's out in the open now. :)

cheers,
Richard

Andreas Prlic wrote:
> Hi Guys,
> 
> May I suggest moving any general discussions to the biojava-dev
> mailing list? I think these issues are also interesting for other
> people out there...
> 
> Cheers,
> Andreas
> 
> 
> On Tue, Feb 10, 2009 at 8:57 AM, Richard Holland
> <holland at eaglegenomics.com> wrote:
>> The read-only subversion repository details are at:
>>
>>  http://www.biojava.org/wiki/CVS_to_SVN_Migration
>>
>> You can browse the repository at:
>>
>>  http://code.open-bio.org/svnweb/index.cgi/biojava
>>
>> All the code is in various branches underneath biojava-live. The other
>> trees are random bits and bobs; the only one that gets used regularly
>> is the bytecode tree.
>>
>> Generally our policy is that your first few contributions should be
>> patches emailed to an existing developer. After a few of those, once
>> we've got confidence in the code quality, we'll give you direct commit
>> access.
>>
>> I have to be up front here - I actually mooted to the other guys at
>> BioJava just two weeks ago that I was ready to move on and hand over
>> control of the project, as I simply don't have the time to dedicate to
>> it any more (plus I don't actually use it myself in any of my current
>> paid work). So far there have been no takers!
>>
>> It does sound like you've given the project some serious thought, much
>> more than I have myself in recent months. I'm not that knowledgeable in
>> the specific areas you're talking about and don't feel I'm the best
>> person to make the judgement, so I've copied the other three main
>> contributors on this email for their comments. I'm happy to go with
>> whatever they decide.
>>
>> cheers,
>> Richard
>>
>> Scooter Willis wrote:
>>> Richard
>>>
>>> I have access to dev.open-bio.org but did not get any Subversion
>>> details. Do you have only one source tree, or multiple versions? Can
>>> you provide any details on the policies or methodologies for doing
>>> code updates etc.?
>>>
>>> I was also giving BioJava3 some thought. Based on what I have read,
>>> the goal looks to be some major refactoring, clean-up and better
>>> overall design. I think one theme that would really drive good design
>>> and clean code is forcing every computationally intensive task to be
>>> grid aware. It would be like doing MPI, but with parallelization taken
>>> into consideration from the beginning, which makes it much easier. Amazon
>>> EC2 is the best game in town and others will follow but Amazon is
>>> sponsoring development projects with $100,000+ backing on an annual
>>> basis. They want applications that run on their grid. Could be a way to
>>> get development sponsored. Could also be a way to raise money for
>>> biojava as a percentage of runtime for those who want to use biojava on
>>> a cluster but don't want to go through the hassle/learning curve of
>>> deploying virtual machines. BioJava Webservices hosted by Eagle Genomics.
>>>
>>> I had always hoped that GigaSpaces would catch on, but the API is very
>>> specific to GigaSpaces. GridGain is also an interesting option: it
>>> uses annotations, but you quickly get into GridGain-specific code.
>>> http://www.terracotta.org/ looks to be a very interesting option: it
>>> extends threads at the VM level so they can be moved to other
>>> machines, and the memory model is the same across all instances. This
>>> way you can still code BioJava3 to take advantage of threads, so it
>>> will run on a dual quad-core machine, or on a cluster you can scale to
>>> 100+ machines by launching the application with the Terracotta batch
>>> file. The same code works in both types of deployment.
>>>
>>> The tree code I put together took 5 minutes on a single CPU when run
>>> against 1000+ HIV genome sequences. The code could easily be set up so
>>> that all pairwise distance comparisons run in parallel, with the
>>> number of worker threads to fire up given as a hint. If I wanted to
>>> use the quad processor on my box, I would indicate 4 for the number of
>>> threads and the job would finish in a minute+. Use Terracotta, select
>>> the grid option with 100 worker threads, and it would farm everything
>>> out without any additional work on the part of the user.
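
A minimal sketch of the thread-count hint described above, using only the
standard java.util.concurrent classes. It is not the tree code mentioned
in this thread and uses no BioJava API. The class name PairwiseDistances,
the methods compute and pDistance, and the simple mismatch-fraction
metric are all assumptions made for illustration; real code would plug in
its own distance measure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PairwiseDistances {

    // Placeholder metric (an assumption): fraction of mismatched positions
    // between two aligned, equal-length sequences.
    static double pDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("sequences must be aligned to equal length");
        }
        if (a.isEmpty()) {
            return 0.0;
        }
        int mismatches = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                mismatches++;
            }
        }
        return (double) mismatches / a.length();
    }

    // Fills a symmetric distance matrix, one task per row, on a fixed pool
    // of `threads` workers. The thread count is the caller's hint.
    static double[][] compute(List<String> seqs, int threads)
            throws InterruptedException, ExecutionException {
        int n = seqs.size();
        double[][] d = new double[n][n];
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int row = i;
                // Each task writes only cells (row, j) and (j, row) for
                // j > row, so no two tasks ever write the same cell.
                pending.add(pool.submit(() -> {
                    for (int j = row + 1; j < n; j++) {
                        double dist = pDistance(seqs.get(row), seqs.get(j));
                        d[row][j] = dist;
                        d[j][row] = dist;
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion and surface any failure
            }
        } finally {
            pool.shutdown();
        }
        return d;
    }

    public static void main(String[] args) throws Exception {
        List<String> seqs = Arrays.asList("ACGT", "ACGA", "TCGA");
        double[][] d = compute(seqs, 4); // 4 worker threads as the hint
        System.out.println(d[0][1] + " " + d[0][2] + " " + d[1][2]);
    }
}
```

Splitting the work by row keeps the tasks independent, so the same call
works with 1 thread or many. Rows near the top do more comparisons than
rows near the bottom, so finer-grained tasks would balance the load better
on large inputs.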
>>>
>>> You also have some interesting uses of the newer NVIDIA GPU and CUDA for
>>> super fast matrix/vector calculations that should be considered in
>>> BioJava3 design/vision.
>>>
>>> Everything should also have a web services component, which is fairly
>>> easy to do by creating a servlet to wrap an API call and then using
>>> annotations to generate the WSDL for it.
>>>
>>> If you place these constraints on BioJava3 the code should be fairly
>>> clean and well designed with a great deal of added functionality in the
>>> ability to scale.
>>>
>>> Thanks
>>>
>>> Scooter
>> --
>> Richard Holland, BSc MBCS
>> Finance Director, Eagle Genomics Ltd
>> M: +44 7500 438846 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
> 
