[Bioperl-pipeline] new feature

shawnh@worf.fugu-sg.org shawnh@worf.fugu-sg.org
Tue, 15 Oct 2002 01:16:24 +0800 (SGT)


Hi Kiran,
	
> 
> After you have started running the pipeline, somewhere downstream or after
> you have finished running, you realise that you have screwed up in some
> analysis parameters (which could be anywhere in the flow of the pipeline)
> and you need to rerun the pipeline from that stage onwards (because you
> wouldn't want to re-run the entire pipeline).
> This could easily happen a lot of times (like during our ciona-annotation, we
> got over it as we could meddle with the pipeline database, but we should get
> some better way)

Something I have thought a little about. It's not hard to have an option where
you can put a hold/pend on an analysis while the user checks the data before
allowing the pipeline to proceed.
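
To make the hold idea concrete, here is a rough sketch in Python. It is not the actual pipeline code: the Pipeline class, its hold/release methods and the analysis names other than genewise are made up for illustration. It only shows a runner that stops in front of a held analysis until someone releases it.

```python
class Pipeline:
    """Toy runner (hypothetical, not the bioperl-pipeline API)."""

    def __init__(self, analyses):
        self.analyses = list(analyses)  # analysis names in execution order
        self.held = set()               # analyses that must wait for the user
        self.completed = []             # analyses that have already run

    def hold(self, name):
        self.held.add(name)

    def release(self, name):
        self.held.discard(name)

    def run(self):
        """Run pending analyses in order; stop before the first held one.

        Returns the name of the analysis we paused at, or None when
        everything has run.
        """
        for name in self.analyses[len(self.completed):]:
            if name in self.held:
                return name
            self.completed.append(name)  # stand-in for really running the jobs
        return None


# Illustrative analysis names; only genewise comes from this thread.
pipeline = Pipeline(["repeatmasker", "blast", "genewise"])
pipeline.hold("genewise")
paused_at = pipeline.run()     # runs the first two, pauses before genewise
pipeline.release("genewise")   # user has checked the data
finished = pipeline.run()      # resumes from where it paused
```

In a real setup the held flag would presumably live in the pipeline database next to the analysis, so that it survives restarts.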

> 
> In terms of pipeline implementation, I can see the general abstraction as
> you are able to modify the analysis somehow (script etc) and the pipeline
> should be able to run from any particular analysis to another particular
> analysis (some portion of the entire flow). Not sure currently what it
> involves to implement (may involve moving some of the completed jobs back to
> job table, and more)
> but before that, we need to evolve a correct spec for this feature and one
> of us can look into this.


The general problem does not lie within the pipeline itself. It has to
do with storing the output and the ability to roll back. As we aim to
be flexible and able to write to any database, the inherent problem is this:
say I have run half my genewises, and at some point I discover a coordinate
error. Rerunning the analysis is no problem. But your output database has
genes, exons and supporting features: many tables that are filled by a single
store, something transparent to the pipeline. So I would imagine you need a
clear interface that your adaptors must adhere to. For every store method,
you need an unstore/delete that will roll back cleanly. So if we fix on a
couple of schemas (GFD/Ensembl currently) and develop an adaptor interface
for them, then when we want to rerun some analysis, we can call the
appropriate methods to remove the stored output.
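
Roughly what I have in mind, sketched in Python rather than in the pipeline's own code. All names here (OutputAdaptor, InMemoryGeneAdaptor, the table names) are invented for the example, and an in-memory dict stands in for a real GFD/Ensembl database. The point is only that one store fills several tables and the matching remove has to clean all of them.

```python
from abc import ABC, abstractmethod


class OutputAdaptor(ABC):
    """Hypothetical contract: every store() has a matching remove()."""

    @abstractmethod
    def store(self, analysis_id, genes):
        """Write the output of one analysis."""

    @abstractmethod
    def remove(self, analysis_id):
        """Delete everything store() wrote for that analysis."""


class InMemoryGeneAdaptor(OutputAdaptor):
    """Toy stand-in for a schema-specific adaptor."""

    def __init__(self):
        self.tables = {"gene": [], "exon": [], "supporting_feature": []}

    def store(self, analysis_id, genes):
        # A single store touches several tables at once.
        for gene in genes:
            self.tables["gene"].append((analysis_id, gene["name"]))
            for exon in gene["exons"]:
                self.tables["exon"].append((analysis_id, gene["name"], exon))
            for feat in gene.get("evidence", []):
                self.tables["supporting_feature"].append(
                    (analysis_id, gene["name"], feat))

    def remove(self, analysis_id):
        # Roll back every table; return the number of rows deleted.
        removed = 0
        for table, rows in self.tables.items():
            kept = [row for row in rows if row[0] != analysis_id]
            removed += len(rows) - len(kept)
            self.tables[table] = kept
        return removed


adaptor = InMemoryGeneAdaptor()
adaptor.store(1, [{"name": "g1", "exons": [(100, 200), (300, 400)],
                   "evidence": ["hit1"]}])
adaptor.store(2, [{"name": "g2", "exons": [(10, 50)]}])
removed_rows = adaptor.remove(1)  # wipe analysis 1 before rerunning it
```

Keying every stored row on the analysis is an assumption of the sketch; it is what lets remove work without the pipeline knowing anything about the schema.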


I can't really see how we can do it without enforcing this interface..

others?



shawn

> 
> so i am throwing this up for discussion.. your comments...
> 
> 
> kiran
> 
> 
> _______________________________________________
> bioperl-pipeline mailing list
> bioperl-pipeline@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-pipeline
> 

-- 
********************************
* Shawn Hoon
* http://www.fugu-sg.org/~shawnh
********************************