[Bioperl-pipeline] runnable parameters

Shawn shawnh@fugu-sg.org
13 Sep 2002 08:15:26 +0800


On Fri, 2002-09-13 at 01:02, Elia Stupka wrote:
> > Yup, the genewise runnable should do the trick. However, it's
> > debatable how inflated we want the runnable to be... but of course
> > the initial design of having runnables as modules is to allow for
> > things like this.
> 
> I have a feeling that it should take one optional input, and that
> input is a simple class that gives it the necessary info... hey, wait
> a minute, that class is a SeqFeature with subseqfeatures! It's just
> lots of starts, ends, strands! So we "just" need to figure out how to
> pass it into the genewise runnable... we come back to the fact of
> being able to pass things in memory, without writing to the
> database... unless of course we don't write all these starts, ends,
> strands to GFD temporarily....


Essentially, the question is: how does one submit a job with inputs to
a node without having the inputs fetched from the database? I can't
think of many ways, so this may be silly.

One option is for the pipeline manager, or some smart module, to bsub a
DataMonger runner.pl that takes a file as a parameter (this file path
is stored in some table). This script then parses the file, knowing how
to recreate the object from that minimal info. Then it does the
filtering and runs the genewise runnable directly. Of course it can
store the input/job information as well for logging purposes. The
bsubbing module must know how to create each 'file object'... so we
need an object->flat kind of thingy, akin to input create, which allows
more complex objects to be passed. Start-ends we can store as cigar
strings.
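
Something like this, maybe (completely untested sketch; flatten_feature
and recreate_feature are made-up names, and I'm using a dumb
start-end:strand encoding instead of real cigar strings, just to show
the idea):

    use Bio::SeqFeature::Generic;

    # flatten a feature plus its subfeatures into one line of
    # start-end:strand triples (parent first)
    sub flatten_feature {
        my ($feat) = @_;
        my @parts = map { join('-', $_->start, $_->end) . ':' . $_->strand }
                    ($feat, $feat->sub_SeqFeature);
        return join(',', @parts);
    }

    # recreate the feature on the node from that line
    sub recreate_feature {
        my ($line) = @_;
        my @feats;
        for my $part (split ',', $line) {
            my ($se, $strand) = split ':', $part;
            my ($start, $end) = split '-', $se;
            push @feats, Bio::SeqFeature::Generic->new(-start  => $start,
                                                       -end    => $end,
                                                       -strand => $strand);
        }
        my $parent = shift @feats;
        $parent->add_sub_SeqFeature($_, 'EXPAND') for @feats;
        return $parent;
    }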

I know passing files may not be too elegant, but that's what runnables
are doing anyway. So, some kind of FeatureI/O where we can do a
write_feature (GFF?), as sketched below. We can optimize the I/O
operations by batching a number of jobs to a single node. Besides,
reading the bioperl-list, it seems that a lot of the performance hit is
in method calls and a bit in object construction (think
GFD/VirtualContig).
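
For the write_feature bit we could probably lean on Bio::Tools::GFF
rather than invent our own format. Roughly (untested; $job_file and
@features here stand in for whatever the bsubbing module hands us):

    use Bio::Tools::GFF;

    # dump the features for a batch of jobs into one GFF file...
    my $out = Bio::Tools::GFF->new(-file        => ">$job_file",
                                   -gff_version => 2);
    $out->write_feature($_) for @features;

    # ...and read them back on the node
    my $in = Bio::Tools::GFF->new(-file        => $job_file,
                                  -gff_version => 2);
    while (my $feat = $in->next_feature) {
        # hand $feat to the genewise runnable
    }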

So if a job dies, we still have the file to rerun the job.

Of course the silly thing is that we are moving the database into flat
files... a no-no, right? Maybe GFD is the way to go. Comments?

On another note, one thing about GFD is that there are way too many
function calls and too much object construction, which is why it's
somewhat slow. Hmm... the FeatureI/O thing should work quite closely
with GFD.

Let's discuss more.



> > ok..off to bed. good nite!
> 
> Nice joke ;)

Oops, sorry about that. Are you ready to enlighten them, 'Sid'? ;)


shawn