[Bioperl-pipeline] Re: Changes to come (long long mail)

bala at tll.org.sg bala at tll.org.sg
Sun Aug 17 16:50:57 EDT 2003


Hi Shawn,

> est->
> 	Analysis: Run Blast against genome
> 		-> Chain_Output (with filter attached ) && (Output(store blast hit)
> {Optional})
> 			->Analysis(setup_est2genome)
> 	Analysis: Est2Genome-> Output(store gene)
>
>
>   We do not need to have some temporary blast hit database but we can
> still have it stored if we want to by attaching an additional output
> iohandler.

I think this approach will be very helpful as our analyses are becoming more and
more focused.......

>
> The Guts
> ---------------
>
> What I'm proposing is to have a grouping of rules.
>
> A rule group  means that I will chain a group of analysis in a single
> job.
>
> Sample rule table:
>
> +---------+---------------+---------+------+---------+
> | rule_id | rule_group_id | current | next | action  |
> +---------+---------------+---------+------+---------+
> |       1 |             1 |       1 |    2 | NOTHING |
> |       2 |             2 |       2 |    3 | CHAIN   |
> |       3 |             3 |       3 |    4 | NOTHING |
> +---------+---------------+---------+------+---------+
>
> Analysis1: InputCreate
> Analysis2: Blast
> Analysis3: SetupEst2Genome
> Analysis4: Est2Genome
>
> So here we have 3 rule groups. Each job will have its own rule group.
>
> For a single est input, it will create 3 jobs during the course of the
> pipeline execution.
> Job 1: Input Create (fetch all ests and create blast jobs)
> Job 2: Blast (blast est against database)
>              Output is chained to Analysis 3 (setup est2genome) using an
> IOHandler of type chain with a blast filter attached
> Job 3: Run Analysis 4 (est2genome) on the jobs created by Analysis 3
>
> Only between analysis 2 and 3 does chaining occur.
>
> If Job 2 fails, the blast and setup_est2genome analysis will have to be
> rerun.
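
If I read the rule table right, the grouping could be sketched roughly like
this (Python just for illustration; the helper and tuple layout are made up,
not Biopipe code -- only the table values come from your mail):

```python
# Illustrative sketch only -- the rule tuples mirror the sample rule
# table above; build_job_groups() is an invented helper, not Biopipe code.
rules = [
    # (rule_id, current_analysis, next_analysis, action)
    (1, 1, 2, "NOTHING"),
    (2, 2, 3, "CHAIN"),
    (3, 3, 4, "NOTHING"),
]

def build_job_groups(rules):
    """Group analysis ids into jobs: CHAIN keeps `next` in the
    current job, NOTHING starts a fresh job for `next`."""
    groups = [[rules[0][1]]]           # first job starts with analysis 1
    for _rule_id, _current, nxt, action in rules:
        if action == "CHAIN":
            groups[-1].append(nxt)     # chained: same job, output in memory
        else:
            groups.append([nxt])       # new job: output goes via IOHandler
    return groups

print(build_job_groups(rules))  # [[1], [2, 3], [4]]
```

which matches the three jobs you describe: InputCreate, Blast+SetupEst2Genome
chained, then Est2Genome.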
>
> You could imagine having multiple analyses chained within a rule_group.
>
> I have working code for this. The next thing that I'm still thinking
> about is a stronger form of datatype definition between the runnables,
> which is currently not strongly enforced. It will probably be based on
> Martin's (or Pise or EMBOSS) Analysis data definition interface. We can
> have this information either at the runnable layer, at the bioperl-run
> wrappers layer, or both.
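
For the datatype definition, a minimal compatibility check between chained
runnables might look like this (all the type names and the registry are
hypothetical, just to show the shape of the idea):

```python
# Hypothetical datatype registry -- the names are invented for
# illustration; the real definitions would live at the runnable or
# bioperl-run wrapper layer as discussed above.
DATATYPES = {
    "Blast":           {"in": "sequence",  "out": "blast_hit"},
    "SetupEst2Genome": {"in": "blast_hit", "out": "job_input"},
    "Est2Genome":      {"in": "job_input", "out": "gene"},
}

def chain_ok(upstream, downstream):
    """True if upstream's output datatype matches downstream's input."""
    return DATATYPES[upstream]["out"] == DATATYPES[downstream]["in"]

print(chain_ok("Blast", "SetupEst2Genome"))  # True
print(chain_ok("Blast", "Est2Genome"))       # False
```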
>
> Once this is done, we can have a hierarchical organization of the
> pipelines:
>
> - chaining analysis within rule groups
> - chaining rule groups ( add a rule_group relationship table)(defined
> within 1 xml)
>
> - chaining pipelines(add a meta_pipeline table) which means re-using
> different xmls
> as long as the inputs and outputs of first and last analysis of the
> pipelines match.
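
At the meta-pipeline level, the compatibility rule you describe reduces to
matching the ends of the two pipelines (again a sketch with invented names):

```python
# Sketch: a pipeline is a list of analyses, each declaring input and
# output datatypes (all names invented for illustration).
est2genome_pipeline = [
    {"name": "Blast",      "in": "est",       "out": "blast_hit"},
    {"name": "Est2Genome", "in": "blast_hit", "out": "gene"},
]
annotation_pipeline = [
    {"name": "Annotate",   "in": "gene",      "out": "report"},
]

def pipelines_chainable(first, second):
    """Two pipelines chain iff the output of `first`'s last analysis
    matches the input of `second`'s first analysis."""
    return first[-1]["out"] == second[0]["in"]

print(pipelines_chainable(est2genome_pipeline, annotation_pipeline))  # True
```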
>
>
> I would like some help with regards to this application definition
> interface if people are interested or have
> comments...

I would like to chip in.....and maybe after these changes we will have a very
up-to-date version of Biopipe with all the things we have done in the past
months...

> sorry for the long mail..if you got to reading this point.
>
> shawn
>
>

bala
