[Biojava-l] Questions about Summer of Code Project

Andy Yates ayates at ebi.ac.uk
Thu Apr 8 10:23:06 UTC 2010


Hi Singer,

To add a bit more information to Andreas' comments: Java has a very mature concurrency library (java.util.concurrent), which was introduced in version 1.5. BioJava is a 1.6 project, so I would expect any concurrent code to use it. Extensions are available, most notably the Google Guava project, the Actor model found in Scala (with more pure-Java implementations available) and the Map/Reduce paradigm first described in a white paper by Google. The big rules about concurrency are:

* Mutable objects are the work of the devil & should be avoided
* Tasks & Futures are quite lightweight things to produce; threads are not
* Multiple tasks can be given to a queue to be processed by a number of threads in a pool
* Assume a non-linear execution pipeline and attempt to pass messages/jobs into queues when data is processed
* Assume that things will fail
* Write your program with a view to being concurrent; do not force concurrency on an already written program
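
As a rough illustration of the rules above, here is a minimal sketch using only java.util.concurrent and written to compile on Java 1.6. The class PooledTasksSketch, the task GcCountTask, the method countAll and the -1 failure marker are all hypothetical names made up for this example; none of them are part of BioJava. Each task is immutable, many tasks share a small fixed pool of threads, and a failed task is recorded rather than allowed to bring down the whole run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PooledTasksSketch {

    // Hypothetical immutable task: one final field, no setters,
    // so an instance can safely be handed to any thread.
    static final class GcCountTask implements Callable<Integer> {
        private final String sequence;

        GcCountTask(String sequence) {
            this.sequence = sequence;
        }

        @Override
        public Integer call() {
            int count = 0;
            for (char c : sequence.toCharArray()) {
                if (c == 'G' || c == 'C') {
                    count++;
                }
            }
            return count;
        }
    }

    // Submits one cheap task per sequence to a fixed pool of threads and
    // collects the results in input order. A task that fails is recorded
    // as -1 (an arbitrary marker chosen for this sketch).
    static List<Integer> countAll(List<String> sequences, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<Future<Integer>>();
            for (String s : sequences) {
                futures.add(pool.submit(new GcCountTask(s)));
            }
            List<Integer> results = new ArrayList<Integer>();
            for (Future<Integer> f : futures) {
                try {
                    results.add(f.get());
                } catch (ExecutionException e) {
                    // Assume that things will fail: note it and carry on.
                    results.add(-1);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // prints [2, 4, 0]
        System.out.println(countAll(Arrays.asList("GATTACA", "GGCC", ""), 2));
    }
}
```

The pool size is independent of the number of tasks, so creating thousands of tasks is cheap while the number of threads stays small.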

Concurrent programs are very hard to write and normally fail because what they attempt to do is either too complex or too simple. Getting the balance right is hard but doable. I can also recommend Brian Goetz's Java Concurrency in Practice (http://www.javaconcurrencyinpractice.com/).

Andy

On 7 Apr 2010, at 20:30, Andreas Prlic wrote:

> Hi Singer,
> 
>> I had previously sent this, but was not part of the mailing list, so I
>> can only assume it got lost in a spam loop.
> 
> You need to be subscribed in order to be able to post...
> 
>> I was interested in applying for the All-Java Multiple Sequence
>> Alignment Google Summer of Code project.
> 
> Several students have expressed their interest in this project.
> Depending on the funding situation, at most one will be
> able to work on this... There is also a 2nd BioJava related project or
> you could propose your own ideas...
> http://biojava.org/wiki/Google_Summer_of_Code
> 
> 
>> I wanted to create a project
>> plan but had some questions about the package as it stands now.
>> 
>> 1. What exactly has changed with the transition to BioJava 3? From
>> what I've read on the BioJava 3 proposal page, it seems that the
>> changes are to the organization of the code. Additionally there are
>> some new standards to follow. Java 6 usage is desired, but I am unsure
>> which of the new features could be used in modifying pairwise
>> sequence alignments.
> 
> BioJava is more modular in version 3. There is a new module for
> working with sequences. The current alignment module is still based on
> the old version of BioJava though.
> 
>> 
>> 2. Is the Neighbor Joining Algorithm really the best for this? Are
>> other multiple alignments implementations desired? I have implemented
>> the neighbor joining algorithm very inefficiently in Python; it was
>> not particularly difficult.
> 
> NJ is a clustering technique, but there are also others.
> http://en.wikipedia.org/wiki/Neighbor-joining
> Another online lecture that might be useful is:
> http://www.mbio.ncsu.edu/MB451/lecture/trees/lecture.html
> 
>> This step seems like it will not take very
>> long. Additionally, regarding parallelism: I have no experience with it
>> in Java and will only have some experience with it in C. Will that be
>> an issue?
> 
> I have never written multi-threaded code in C, but I would guess it is
> much, much easier in Java...
> 
>> 3. Is there a specific paper with the exact algorithm that should be
>> implemented here?
> 
> We have only 3 months for this project so having a modular core
> algorithm that can be extended would be a priority. I recommend
> reading the ClustalW, T-Coffee and MUSCLE papers.
> 
>> General: Will use cases be provided? Will test data be provided? These
>> would both be useful in coding the test cases which seem to be coded
>> first.
> 
> I can provide plenty of data for that.
> 
> 
>> Additionally, I have access to my current Windows machine as well as
>> a Linux machine for testing, but no Mac. While in theory with Java,
>> if it works on one, then it works on another, and especially if
>> it works on Linux, it should be fine on Mac, should I be worried about
>> strange peculiarities?
> 
> From my experience Java works pretty well on any platform. There might
> be issues with user interfaces that require testing, but we are not
> going to do user interfaces here...
> 
> Andreas
> 
> 
>> 
>> Thanks,
>> Singer Ma
>> Harvey Mudd College 2011
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>> 
> 
> 
> 
> -- 
> -----------------------------------------------------------------------
> Dr. Andreas Prlic
> Senior Scientist, RCSB PDB Protein Data Bank
> University of California, San Diego
> (+1) 858.246.0526
> -----------------------------------------------------------------------
> 

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/