[Bioperl-pipeline] Re: biopipe questions

Elia Stupka elia at fugu-sg.org
Fri Feb 21 12:26:51 EST 2003


> I see the pull mechanism function this way: The user selects where 
> (what node in the pipeline) the results (final data) should appear.  
> It is actually possible then to execute a *partial* pipeline.

Hmmm... it seems to me that way you could only get a partial pipeline 
that still ends in the same place. We are actually working towards 
models that can be partial at either end, and where data can be stored 
at each step of the pipeline, not just at the end...
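
Just to make that concrete, here is a toy sketch of what "pull to a 
node" reduces to. It is not our actual API; the dependency graph and 
function name below are invented for illustration:

    #!/usr/bin/perl
    # Hypothetical sketch: given a dependency graph of analyses, a "pull"
    # on a target node runs only that node and its ancestors.
    use strict;
    use warnings;

    # analysis => list of analyses it depends on (invented example graph)
    my %depends_on = (
        repeatmask => [],
        blast      => ['repeatmask'],
        genewise   => ['blast'],
        report     => ['genewise'],
    );

    sub analyses_to_run {
        my ($target, $seen) = @_;
        $seen ||= {};
        return () if $seen->{$target}++;
        my @upstream = map { analyses_to_run($_, $seen) }
                           @{ $depends_on{$target} };
        return (@upstream, $target);   # parents before the target itself
    }

    # "Pull" the blast results only: repeatmask and blast run, genewise does not
    print join(" -> ", analyses_to_run('blast')), "\n";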

> I can point you to some of the discussions we've had about that if 
> you'd like to hear more.

Sure

> Do you use BioPipe on a cluster to pass data between cluster CPUs 
> (treating each CPU in a cluster as a node in the pipeline, i.e., 
> "serial computing"), or are you talking about "mearly" doing parallel 
> processing within the context of a single pipeline node (e.g., one 
> node is BLAST that happens to run on a Beowulf cluster)?

"Merely" classic parallel computing, which is how we have been doing 
genome annotation, etc, by distributing a gazillion jobs over the 
cluster.
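
For what it's worth, the pattern is roughly the one below. This is just 
a sketch, not our actual runner code; the chunk files, queue name and 
worker script are made up. The idea is simply one LSF job per input 
chunk:

    #!/usr/bin/perl
    # Sketch of fanning out one job per input chunk via LSF's bsub.
    use strict;
    use warnings;

    my @chunks = glob("input_chunks/chunk_*.fa");

    foreach my $chunk (@chunks) {
        # -q: queue, -o: per-job log; the worker runs one analysis on one chunk
        system("bsub", "-q", "normal", "-o", "$chunk.out",
               "run_analysis.pl", $chunk) == 0
            or warn "submission failed for $chunk: $?\n";
    }
    print scalar(@chunks), " jobs submitted\n";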

> It would actually be great to be able to work with the data inside of 
> MySQL.  In Piper, we wanted to use DOM for everything, and rely on XML 
> only for filesystem storage.  This is because, as you know, it is very 
> time consuming to read/write from/to a flat file for every change in 
> the pipeline.

As I said, you can use both, though in reality reading/writing the XML 
takes about 1/10000th of the time of running a pipeline, so we are not 
too bothered by it.
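
The XML -> MySQL direction is conceptually no more than the sketch 
below. The module choices (XML::Simple, DBI) and the schema are 
illustrative only, not BioPipe's own loader:

    #!/usr/bin/perl
    # Rough sketch: parse a pipeline description and store the analyses
    # in a MySQL table. Element and column names are invented.
    use strict;
    use warnings;
    use XML::Simple;
    use DBI;

    my $xml = XMLin("pipeline.xml", ForceArray => ['analysis']);
    my $dbh = DBI->connect("dbi:mysql:database=pipeline;host=localhost",
                           "user", "pass", { RaiseError => 1 });

    my $sth = $dbh->prepare(
        "INSERT INTO analysis (logic_name, program) VALUES (?, ?)");

    foreach my $a (@{ $xml->{analysis} }) {
        $sth->execute($a->{logic_name}, $a->{program});
    }
    $dbh->disconnect;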

> Perhaps, though, the firewall doesn't mean that the front-end can't 
> use MySQL.  We could borrow the code that you use to work with 
> everything in MySQL (that means writing the front-end in Perl I guess) 
> and then generate the XML and pass it on to the back-end (BioPipe).

Yes
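
Something along these lines is all the front-end would need to do. It 
is only a sketch: the table and element names are invented, and 
DBI/XML::Writer are just my module picks, not a requirement. The 
front-end keeps its own MySQL store and only ever hands BioPipe a 
generated XML file:

    #!/usr/bin/perl
    # Sketch: a front-end reads the pipeline it has assembled in its own
    # MySQL tables and writes it out as XML for the BioPipe back-end.
    use strict;
    use warnings;
    use DBI;
    use XML::Writer;
    use IO::File;

    my $dbh = DBI->connect("dbi:mysql:database=frontend", "user", "pass",
                           { RaiseError => 1 });
    my $out = IO::File->new("> generated_pipeline.xml") or die $!;
    my $w   = XML::Writer->new(OUTPUT => $out, DATA_MODE => 1,
                               DATA_INDENT => 2);

    $w->startTag("pipeline");
    my $rows = $dbh->selectall_arrayref(
        "SELECT logic_name, program FROM user_analyses", { Slice => {} });
    foreach my $row (@$rows) {
        $w->startTag("analysis");
        $w->dataElement(logic_name => $row->{logic_name});
        $w->dataElement(program    => $row->{program});
        $w->endTag("analysis");
    }
    $w->endTag("pipeline");
    $w->end;
    $out->close;
    $dbh->disconnect;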

> For us at BiO, that part may mean using PHP/MySQL, as that is how we 
> do session management.  AFAIK, you can't (easily) mix 2 scripting 
> languages (e.g., Perl and PHP) in the same file.

Yep, definitely. We have no issue with using other languages, though 
because of our Perl bias you are likely to get more co-development, 
coding hands and help if you have more Perl.

> I also have a question about how BioPipe handles discovery and 
> awareness.  How does BioPipe "know" what resources are available on 
> the network?  Is there a directory service?

Not at the moment, though the idea is to play well with BioMOBY, which 
is being developed here at the BioHackathon, and have it take care of 
that (not done yet).

> And how does it handle nodes coming and going (e.g., what happens when 
> data is sent to a node that subsequently goes offline)?  Perhaps you 
> rely on the external scheduler for such things?

We have two failover systems. One is built into the scheduler, LSF, 
where you can set a retry count and it keeps trying different nodes 
until the job works. On top of that we also have a retry system in 
BioPipe itself, so either can be used; that way you still get retries 
even if you don't want to rely on the scheduler for them.
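
The BioPipe side of it is conceptually just the loop below. This is a 
sketch, not the real job-handling code; the retry limit and the 
submit command are stand-ins:

    #!/usr/bin/perl
    # Sketch of the application-level retry: keep resubmitting a failed
    # job until it succeeds or the maximum retry count is exceeded.
    use strict;
    use warnings;

    my $MAX_RETRIES = 3;

    sub run_with_retry {
        my ($job_cmd) = @_;
        for my $attempt (1 .. $MAX_RETRIES) {
            return 1 if system($job_cmd) == 0;     # job succeeded
            warn "attempt $attempt of $MAX_RETRIES failed for '$job_cmd'\n";
        }
        return 0;                                  # mark job as FAILED
    }

    run_with_retry("run_analysis.pl chunk_1.fa")
        or die "job exhausted its retries\n";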

> I thought you mentioned a BioHackathon happening a couple months ago. 
> Is this a different one?

No, this is the one; I might have mentioned it at the planning stage ;)

Elia
