[Biojava-dev] phylo code

Thasso Griebel thasso at minet.uni-jena.de
Tue Aug 7 08:29:08 UTC 2007


Hi,

On the fire/event thing: you are right, it is not a good idea to
change this once the code has been released. But maybe a little
helper class with some static methods (like IOTools) would help.
Something like PhylipIO.readTree(File) or PhylipIO.readAlignment(File)
might do the job: the interfaces do not have to change, and it
avoids direct contact with the parser implementation as long as
standard PHYLIP datatypes are concerned. And, as far as I can see,
this would follow the way IO is usually done in BioJava.
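
To make the idea a bit more concrete, here is a rough sketch of such a
helper. Everything in it is an assumption for illustration only:
AlignmentHandler stands in for the existing listener interface (it is
not the real BioJava API), PhylipIO and readAlignment(Reader) are the
hypothetical facade, and the tiny parser only understands a simplified
sequential layout with whitespace-separated names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the parser callback interface (not the BioJava API).
interface AlignmentHandler {
    void startFile(int sequenceCount, int siteCount);
    void sequence(String name, String residues);
}

// Hypothetical facade: callers get plain data back and never touch the handler.
final class PhylipIO {
    private PhylipIO() { }

    /** Reads a simplified sequential PHYLIP alignment into a name -> residues map. */
    static Map<String, String> readAlignment(Reader in) throws IOException {
        final Map<String, String> alignment = new LinkedHashMap<String, String>();
        parse(new BufferedReader(in), new AlignmentHandler() {
            public void startFile(int sequenceCount, int siteCount) {
                // Nothing to prepare for a plain map.
            }
            public void sequence(String name, String residues) {
                alignment.put(name, residues);
            }
        });
        return alignment;
    }

    // Simplified parser: a header line "n m", then one "name residues" line per sequence.
    private static void parse(BufferedReader reader, AlignmentHandler handler)
            throws IOException {
        String header = reader.readLine();
        if (header == null) {
            throw new IOException("empty PHYLIP input");
        }
        String[] counts = header.trim().split("\\s+");
        if (counts.length != 2) {
            throw new IOException("malformed header: " + header);
        }
        int sequenceCount = Integer.parseInt(counts[0]);
        int siteCount = Integer.parseInt(counts[1]);
        handler.startFile(sequenceCount, siteCount);
        for (int i = 0; i < sequenceCount; i++) {
            String line = reader.readLine();
            if (line == null) {
                throw new IOException("expected " + sequenceCount + " sequences, found " + i);
            }
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length != 2) {
                throw new IOException("malformed sequence line: " + line);
            }
            // Blanks inside the residue block are only there for readability.
            handler.sequence(parts[0], parts[1].replaceAll("\\s+", ""));
        }
    }
}
```

The point is only that the handler stays an implementation detail: the
released interfaces are untouched, and user code calls one static
method.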

I can (and will) definitely work on the short-name map, but this
has to wait at least 5 weeks as I am going on vacation (yeah, 5 weeks
without a computer nearby!!!). But currently I am really interested
in a good PHYLIP parser, so I will work on that.
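
Just so it is clear what I have in mind for the short-name map, here is
a minimal sketch. The class name ShortNameMap and its methods are my
own placeholders, not existing BioJava code, and the "_1", "_2" suffix
scheme is only one possible way to make truncated names unique.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps long sequence names to unique names of at most
// 10 characters and back again. Not part of BioJava.
final class ShortNameMap {
    private static final int MAX_LENGTH = 10;

    private final Map<String, String> shortToLong = new HashMap<String, String>();
    private final Map<String, String> longToShort = new HashMap<String, String>();

    /** Returns a unique short name for the given name, creating one if needed. */
    String shorten(String longName) {
        String known = longToShort.get(longName);
        if (known != null) {
            return known;
        }
        String candidate = longName.length() <= MAX_LENGTH
                ? longName
                : longName.substring(0, MAX_LENGTH);
        int counter = 1;
        // On a clash, shorten the prefix further and append a numeric suffix.
        while (shortToLong.containsKey(candidate)) {
            String suffix = "_" + counter++;
            if (suffix.length() >= MAX_LENGTH) {
                throw new IllegalStateException("too many clashing names");
            }
            int keep = Math.min(longName.length(), MAX_LENGTH - suffix.length());
            candidate = longName.substring(0, keep) + suffix;
        }
        shortToLong.put(candidate, longName);
        longToShort.put(longName, candidate);
        return candidate;
    }

    /** Maps a short name read from a result file back to the original name. */
    String restore(String shortName) {
        String longName = shortToLong.get(shortName);
        return longName != null ? longName : shortName;
    }

    /** Forgets all mappings; to be called between one file and the next. */
    void clear() {
        shortToLong.clear();
        longToShort.clear();
    }
}
```

The writer would call shorten() for every label, and whatever reads the
PHYLIP results back in would call restore(). Names containing blanks
would need extra handling that is not shown here.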

One thing about JGraphT. I have used it a lot for phylogenetics,
and it's a great library. But you might consider adding a small
interface layer on top at some point to decouple the usage from the
library. This would allow BioJava-compatible tree implementations
without the need for JGraphT. JGraphT is great for all the simple,
and even the complex, jobs, but for me there is one little drawback.
You can use any Object as a graph vertex, which is great, but this
makes the underlying implementation rather complex. I do not
remember exactly, but I think they use a large set of HashMap
structures and some complex mapping mechanism to build up a graph.
There are situations where this makes it hard to extend. For
example, for one of our supertree algorithm implementations we
needed a MultiGraph, a graph with different edge types, and it was a
real pain to add that to the library without reimplementing
JGraphT's Graph interface completely. I think a Tree interface
layer on top would be simple and easy to add, and might be really
helpful at some point.
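
As a rough illustration of what I mean by an interface layer, here is a
small sketch. The names Tree, SimpleTree and Trees are invented for
this mail and are not taken from BioJava or JGraphT. SimpleTree is just
a plain implementation without JGraphT; a JGraphT-backed class could
implement the same interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical read-only view of a rooted tree, independent of any graph library.
interface Tree<V> {
    V getRoot();
    V getParent(V vertex);          // null for the root
    List<V> getChildren(V vertex);
    boolean isLeaf(V vertex);
}

// One possible implementation that does not depend on JGraphT.
final class SimpleTree<V> implements Tree<V> {
    private final V root;
    private final Map<V, V> parents = new HashMap<V, V>();
    private final Map<V, List<V>> children = new HashMap<V, List<V>>();

    SimpleTree(V root) {
        this.root = root;
        children.put(root, new ArrayList<V>());
    }

    void addChild(V parent, V child) {
        if (!children.containsKey(parent)) {
            throw new IllegalArgumentException("unknown parent: " + parent);
        }
        if (children.containsKey(child)) {
            throw new IllegalArgumentException("vertex already in tree: " + child);
        }
        children.get(parent).add(child);
        children.put(child, new ArrayList<V>());
        parents.put(child, parent);
    }

    public V getRoot() { return root; }

    public V getParent(V vertex) { return parents.get(vertex); }

    public List<V> getChildren(V vertex) {
        List<V> result = children.get(vertex);
        if (result == null) {
            throw new IllegalArgumentException("unknown vertex: " + vertex);
        }
        return Collections.unmodifiableList(result);
    }

    public boolean isLeaf(V vertex) { return getChildren(vertex).isEmpty(); }
}

// Algorithms are written against the interface only.
final class Trees {
    private Trees() { }

    /** Collects the leaves in depth-first, left-to-right order. */
    static <V> List<V> leaves(Tree<V> tree) {
        List<V> result = new ArrayList<V>();
        collect(tree, tree.getRoot(), result);
        return result;
    }

    private static <V> void collect(Tree<V> tree, V vertex, List<V> out) {
        if (tree.isLeaf(vertex)) {
            out.add(vertex);
            return;
        }
        for (V child : tree.getChildren(vertex)) {
            collect(tree, child, out);
        }
    }
}
```

Code such as Trees.leaves() never sees which library sits underneath,
so a different tree implementation can be swapped in later without
touching the algorithms.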

regards,
thasso

On 07.08.2007, at 09:48, Richard Holland wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Thanks for your feedback Thasso.
>
> The fire/events thing is certainly a misnomer - guilty as charged (I
> wrote the code...) - but I suppose I wasn't expecting the naming to
> matter much. I'll bear that in mind for future code. We can't really
> change the existing interfaces now as they've been released and it is
> not nice to users for us to change public interfaces that might
> already be in use.
>
> The PHYLIP format handler was written by Jim Balhoff. Jim - do you
> have any responses to Thasso's comments about the output options?
>
> I like the sound of your PHYLIP short-name map. You could definitely
> go ahead and contribute an update which implemented that. (Don't
> forget to make your code clear the map between one file and the next!)
>
> The tree model never materialised in the end. We chose instead to use
> the external JGraphT package to represent our trees. The exact way in
> which this is done will be documented by Boh-Yun as part of the GSOC
> project.
>
> cheers,
> Richard
>
>
> Thasso Griebel wrote:
>> Hi,
>>
>> thanks for the fast answer.
>>
>> So, I would really like to contribute to the BioJava Phylogeny
>> Package, but I will wait until the GSOC project is finished and I am
>> back from my vacation ;) For now, I just took a look at the io
>> package, especially the three classes concerning PHYLIP io.
>> Some personal feelings about the naming:
>> I wouldn't mention "fire" and "events" in the documentation, and I
>> would rename PHYLIPFileListener to something else, e.g.
>> PHYLIPFileHandler. For me, this was irritating. I started looking for
>> a fireXXX method immediately and automatically started thinking about
>> the event dispatch thread and synchronization. As far as I can see,
>> this is a direct handler delegate: the parser delegates some method
>> calls to a given handler callback method, no events are fired (there
>> is not even an Event class that could be fired) and no event
>> dispatching is involved.
>> Also, the io seems to focus on the output of alignments. What about
>> trees and distance/character matrices?
>> And a really important point is the name handling. The method
>>
>>   private static String formatSequenceLabel(String label)
>>
>> in PHYLIPFileFormat simply cuts names that are longer than 10
>> characters. I know that PHYLIP has this stupid limitation, but
>> cutting will not work in a lot of cases. If just two sequence names
>> are identical over the first ten characters, the resulting PHYLIP
>> file is corrupted and cannot be used by most of the PHYLIP
>> executables. Maybe the handler can do a renaming and keep track of
>> the map. Then one can use the parser to write a PHYLIP file, start
>> a PHYLIP method on that file, read in the results and do an automatic
>> remapping of the names?
>>
>> I also wanted to take a look at the tree model. Is this already in
>> CVS? I found biojava-live/src/org/biojavax/bio/phylo/tree, but
>> it was empty.
>>
>> regards,
>>
>> thasso
>>
>>
>>
>> On 01.08.2007, at 20:03, Richard Holland wrote:
>>
>>> Hi Thasso.
>>>
>>> Thanks for your interest! We always welcome all contributions of
>>> code and documentation to the BioJava project.
>>>
>>> Our GSOC project is well under way and has a plan as you have
>>> already noticed. The items on that plan are what was agreed with our
>>> student, Boh Yun Lee, and will be implemented by the time that GSOC
>>> is over (early September) - or at least, most of them will, and any
>>> that aren't will be flagged as such.
>>>
>>> If you have suggestions to make about code already committed, we
>>> would welcome them. Such feedback is a necessary part of the
>>> development cycle. Likewise we would welcome contributions of code
>>> not already mentioned as being planned on the GSOC pages.
>>>
>>> However, if your existing code implements something that is on the
>>> to-do list on the GSOC page, then the best thing would be to hang on
>>> till the end of the GSOC project then contribute any items that did
>>> not get implemented.
>>>
>>> cheers,
>>> Richard
>>>
>>> On Wed, August 1, 2007 6:19 pm, Thasso Griebel wrote:
>>>> hi,
>>>>
>>>> I recently explored the phylogeny code base in BioJava. I am
>>>> currently working on a phylogeny framework similar to Mesquite
>>>> (http://www.bio.informatik.uni-jena.de/epos) and I would like to
>>>> add some functionality using BioJava (the sequence stuff and the
>>>> nexus parser).
>>>>
>>>> I saw the PhyloSOC07 page
>>>> ( http://biojava.org/wiki/BioJava:PhyloSOC07 ) and I might be able
>>>> to help out with some of the methods, so is there a way to
>>>> contribute? Or is this restricted to participants of the Google
>>>> Summer of Code?
>>>>
>>>> I have code for n-consensus and Adams consensus methods and some
>>>> possible extensions for UPGMA (provide other distance computations,
>>>> i.e. WPGMA, single linkage and complete linkage) and NJ (outgroup
>>>> and rooting support). As our research group has a focus on
>>>> supertree construction, we have Java implementations for several
>>>> graph-based supertree methods (Aho's build, mincut, modified
>>>> mincut, ranked tree, ancestral build).
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Thasso
>>>>
>>>> --
>>>> Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
>>>> Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
>>>> Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
>>>> Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany
>>>>
>>>>
>>>> _______________________________________________
>>>> biojava-dev mailing list
>>>> biojava-dev at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>
>>>
>>> -- 
>>> Richard Holland
>>> BioMart (http://www.biomart.org/)
>>> EMBL-EBI
>>> Hinxton, Cambridgeshire CB10 1SD, UK
>>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGuCO04C5LeMEKA/QRAritAJ0eFpRIQ7YP/NROSfoToo/+4aTfEwCfbFUU
> Hbne6alMOuzmr8CxX/hqsfs=
> =U4gJ
> -----END PGP SIGNATURE-----

--
Dipl. Inf. Thasso Griebel-------------------Lehrstuhl fuer Bioinformatik
Office 3426--http://bio.informatik.uni-jena.de--Institut fuer Informatik
Phone +49 (0)3641 9-46454-----------Friedrich-Schiller-Universitaet Jena
Fax +49 (0)3641 9-46452----------Ernst-Abbe-Platz 2, 07743 Jena, Germany





