[Bioperl-l] tree building, analysis interfaces (long)

Shawn shawnh@fugu-sg.org
18 Sep 2002 23:39:02 +0800


Hi Jason,
	great mail.

On Wed, 2002-09-18 at 22:52, Jason Stajich wrote:
> Shawn/Elia -
> 
> Can we try and agree on a standard for tree building interfaces? - I'd
> like to respect the stream nature of some tree building -- parsimony may
> produce multiple trees and go with the next_tree.  This is not criticism
> so much as a plea for us to move in the same direction so these objects can
> really be generic building blocks, if you have better ideas feel free to
> provide counter-arguments.

I agree, the Phylip wrappers could be refactored. I like the way you
have done it. I sneaked a peek into some of the Bio::Tools::Phylo::PAML
code and boy, I like some of the neat iterator functions that you
implemented. 

> 
> I'd suggest we use the Bio::Factory::TreeFactoryI interface which Molphy
> and PAML result objects use.  create_tree is a little restrictive in that
> multiple trees can be produced by a method and it couples the running and
> the parsing in the same method which may not work so well in some systems.

Agree. Of course, I hadn't split the parser and wrapper for Phylip as
they essentially use TreeIO for that without needing any further
parsing. But we definitely want a common interface for these modules and
TreeFactory looks good.


> In case you get lost reading all of the below, basically I'd like to see
> us go to the Result object which implements the TreeFactory interface and
> has a next_tree method.  My vision of how execs should happen is
> 
> # setup object, init variables
> $obj->parameters({ .. } );
> 
> # run the app, get back a result status and result object
> my ($rc,$result) = $obj->run();
> 
> while( my $tree = $result->next_tree ) {
> }

Cool. One thing about iterators, and this is an aside: in terms of the
pipeline, I think I mentioned to you before that fetching of objects
is done through datahandlers, which are essentially method calls.
We don't really support "while" loops in the pipeline, and it really
doesn't make sense to do so. So is it worth having a get_all_xxx
method as an alternative for the xxxIO modules? Elia, what do you think?
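
To make the question concrete, here is a rough sketch of what I mean by
get_all_xxx: a method that drains the iterator into a list, so a
datahandler can fetch everything with a single method call. The
MockResult package and the get_all_trees name are made up for
illustration and are not existing BioPerl API; only next_tree comes
from the proposal above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a result object; not a real BioPerl class.
package MockResult;

sub new {
    my ($class, @trees) = @_;
    return bless { trees => [@trees] }, $class;
}

# Iterator style, as in the proposal: returns the next tree,
# or undef once the stream is exhausted.
sub next_tree {
    my ($self) = @_;
    return shift @{ $self->{trees} };
}

# Hypothetical get_all_xxx alternative: drain the iterator in one call.
sub get_all_trees {
    my ($self) = @_;
    my @all;
    while ( defined( my $tree = $self->next_tree ) ) {
        push @all, $tree;
    }
    return @all;
}

package main;

# Plain strings stand in for tree objects here.
my $result = MockResult->new('(A,B);', '((A,B),C);');
my @trees  = $result->get_all_trees;
print scalar(@trees), " trees fetched\n";
```

The catch is that the list version holds every tree in memory at once,
so it would sit alongside next_tree rather than replace it.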


> something up if this is too confusing, but let me know if you think this
> seems sane?

Very. I will find time to look at this if and when things settle down
here. We aren't using these modules that much right now, but I expect we
will once we start doing the ortholog pipelines. Feel free to modify as
you please :)

> 
> Incidentally, if those module names look really long and scary to you - I
> hope that the work that Martin is doing to build a general AnalysisFactory
> object will allow us to do (names subject to change of course)
> 
> my $factory = new Bio::Tools::Run::AnalysisFactory(-type => 'local');
> 
> my $protml = $factory->program('protml'); # or some other coding
> 
> $protml->parameter( .. );
> my ($rc,$results) = $protml->run();
> 
> AND substitute 'local' for 'ebi.ac.uk:novella', or 'pasteur.fr:pise' (or
> something more GRID-like) to run these analyses in their compute queues.
> Hence the need for fairly standard and simple interfaces to the
> applications and hiding of all the details in a result object IMHO.

Really looking forward to this.


shawn