[Biojava-l] Questions about Summer of Code Project

Andreas Prlic andreas at sdsc.edu
Wed Apr 7 19:30:19 UTC 2010


Hi Singer,

> I had previously sent this, but was not part of the mailing list, so I
> can only assume it got lost in a spam loop.

You need to be subscribed in order to be able to post...

> I was interested in applying for the All-Java Multiple Sequence
> Alignment Google Summer of Code project.

Several students have expressed interest in this project.
Depending on the funding situation, at most one of them will be
able to work on it. There is also a second BioJava-related project, or
you could propose your own ideas:
http://biojava.org/wiki/Google_Summer_of_Code


> I wanted to create a project
> plan but had some questions about the package as it stands now.
>
> 1. What exactly has changed with the transition to BioJava 3? From
> what I've read on the BioJava 3 proposal page, it seems that the
> changes are to the organization of the code. Additionally there are
> some new standards to follow. Java 6 usage is desired, but I am unsure
> which of the new features could be used in modifying pairwise
> sequence alignments.

BioJava is more modular in version 3. There is a new module for
working with sequences. The current alignment module is still based on
the old version of BioJava though.

>
> 2. Is the Neighbor Joining Algorithm really the best for this? Are
> other multiple alignments implementations desired? I have implemented
> the neighbor joining algorithm very inefficiently in python, it was
> not particularly difficult.

NJ is a clustering technique, but there are also others.
http://en.wikipedia.org/wiki/Neighbor-joining
Another online lecture that might be useful is:
http://www.mbio.ncsu.edu/MB451/lecture/trees/lecture.html
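
To give a feel for the size of the clustering step, here is a minimal
sketch of a single neighbor-joining iteration: compute the Q value for
every pair of taxa and pick the pair with the smallest one. The class
and method names are made up for illustration and are not part of
BioJava; the input is assumed to be a symmetric distance matrix with a
zero diagonal.

```java
/** Illustrative sketch only; not BioJava code. */
final class NeighborJoiningStep {

    /**
     * Returns {i, j} with i < j, the pair of taxa that minimises
     * Q(i, j) = (n - 2) * d[i][j] - sum_k d[i][k] - sum_k d[j][k].
     * Assumes d is a symmetric n x n matrix with zeros on the diagonal.
     */
    static int[] pickPair(double[][] d) {
        int n = d.length;
        if (n < 3) {
            throw new IllegalArgumentException("need at least 3 taxa");
        }

        // Total distance from each taxon to all others.
        double[] rowSum = new double[n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                rowSum[i] += d[i][k];
            }
        }

        // Scan the upper triangle for the smallest Q value.
        int bestI = -1;
        int bestJ = -1;
        double bestQ = Double.POSITIVE_INFINITY;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double q = (n - 2) * d[i][j] - rowSum[i] - rowSum[j];
                if (q < bestQ) {
                    bestQ = q;
                    bestI = i;
                    bestJ = j;
                }
            }
        }
        return new int[] {bestI, bestJ};
    }
}
```

A full implementation would then replace the chosen pair with a new
internal node, recompute the distances to it, and repeat until only two
nodes are left.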

> This step seems like it will not take very
> long. Additionally, parallelism, I have no experience with parallelism
> in Java and will only have some experience with it in C, will that be
> an issue?

I have never written multithreaded code in C, but I would guess it is
much easier in Java.
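
As a rough illustration of what this can look like, the sketch below
scores all pairs of sequences on a fixed-size thread pool from
java.util.concurrent. Everything specific here is an assumption for the
example: the class name, the method names and above all the score()
function, which just counts identical positions and stands in for a
real pairwise alignment. It is written to compile under Java 6.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative sketch only; not BioJava code. */
final class ParallelPairwiseScores {

    /** Placeholder score: number of positions where both strings agree. */
    static int score(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int matches = 0;
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) == b.charAt(i)) {
                matches++;
            }
        }
        return matches;
    }

    /** Fills a symmetric score matrix; the diagonal is left at 0. */
    static int[][] scoreAll(final List<String> seqs, int threads) {
        int n = seqs.size();
        int[][] scores = new int[n][n];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one independent task per pair (i, j) with i < j.
            List<Future<int[]>> pending = new ArrayList<Future<int[]>>();
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    final int a = i;
                    final int b = j;
                    pending.add(pool.submit(new Callable<int[]>() {
                        public int[] call() {
                            return new int[] {a, b, score(seqs.get(a), seqs.get(b))};
                        }
                    }));
                }
            }
            // Collect results on the calling thread, so the matrix is
            // only ever written from one thread.
            for (Future<int[]> f : pending) {
                int[] r = f.get();
                scores[r[0]][r[1]] = r[2];
                scores[r[1]][r[0]] = r[2];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scoring", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("scoring task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return scores;
    }
}
```

Calling ParallelPairwiseScores.scoreAll(sequences, 4) would run the
comparisons on four worker threads. The pairwise comparisons are
independent of each other, which is why this part of a multiple
alignment is a natural candidate for parallel execution.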

> 3. Is there a specific paper with the exact algorithm that should be
> implemented here?

We have only 3 months for this project, so having a modular core
algorithm that can be extended would be a priority. I recommend
reading the ClustalW, T-Coffee and MUSCLE papers.

> General: Will use cases be provided? Will test data be provided? These
> would both be useful in coding the test cases which seem to be coded
> first.

I can provide plenty of data for that.


> Additionally, I have access to my current Windows machine as well as
> a Linux machine for testing, but no Mac. While in theory with Java,
> if it works on one, then it works on another, and especially if
> it works on Linux, it should be fine on Mac, should I be worried about
> strange peculiarities?

From my experience, Java works fine on any platform. There might
be issues with user interfaces that require testing, but we are not
going to build user interfaces here.

Andreas


>
> Thanks,
> Singer Ma
> Harvey Mudd College 2011
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>



-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------



