[Bioperl-l] Re: Pipeline Input/Output refactoring plan

Michele Clamp michele@sanger.ac.uk
Wed, 13 Mar 2002 16:56:04 +0000 (GMT)


On Wed, 13 Mar 2002, Elia Stupka wrote:

>> We really like the IOAdaptor thing.  I'm not convinced about storing it in
>> the database like that though.  We were thinking more of having it in the
>> analysisprocess table (one input_adaptor column and one output_adaptor
>> column).  Presumably you want to run the same analysis on differently
>> shaped inputs, which may come from different databases, is that right?
>
>(will try to make sense, I am falling asleep)
>
>Ah, makes sense actually, I had just started a reply, but now I see what
>you mean and it makes total absolute sense, doh!
>
>> Now this has us foxed as we (well me especially) don't really understand
>> the biodb stuff.  Can you explain the reasoning behind this?
>
>Ah, this is just to have maximum flexibility, in many cases some of the
>columns could be null (redundant), but a case when they are all full would
>be:
>
>db_locator: hostuserblablabla-comparadb
>dbadaptor_module: Bio::EnsEMBL::Compara::DBAdaptor
>biodbadaptor_module: Bio::EnsEMBL::Compara::GenomeDB
>biodbname: homo_sapiens_3_26-protein
>IO_adaptor: Bio::EnsEMBL::Compara::DnaFragAdaptor
>IO_adaptor_method: "fetch_by_name"
>
>the same could apply if fetching stuff from bioperl-db, does it make more
>sense now?

Think so, yes.  I'm still not clear what the difference is between the
db_locator and the biodbname.


>
>> >The idea is not to leave anything hard coded in the runnable about how it
>> >should fetch its input and write its output.
>> 
>> Agreed - you mean the RunnableDB yes?
>
>Yes.
>
>> I was thinking more of one adaptor per input type i.e. there is no choice
>> of methods.  The number of input types we have is very small.
>
>Nah... here is where we would really like the new design to be radically
>different. You could be having as input a Family object and then run an
>aligner runnable on it, or even more weird stuff. This ties in with making
>it flexible enough for non-genome-wide applications, design tools, etc.

I'm not convinced you gain flexibility here - you still have to have
code to fetch stuff somewhere.  Your way, you have to specify the method
name as well as the module name; our way, we just have one module per
method name.   I'll go away and muse about this some more.
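
To make the two options concrete, here is a minimal sketch of both
dispatch styles.  It is written in Python purely for brevity (the real
pipeline is Perl), and every class, registry and function name in it is
hypothetical; none of this is the actual Ensembl or bioperl API.

```python
# Hypothetical stand-in for an adaptor module; NOT the real Ensembl API.
class DnaFragAdaptor:
    def fetch_by_name(self, name):
        return f"dnafrag:{name}"

    def fetch_by_dbID(self, dbid):
        return f"dnafrag#{dbid}"


# Option 1: the configuration stores both the adaptor name and the
# method name (like IO_adaptor + IO_adaptor_method in the example).
ADAPTORS = {"DnaFragAdaptor": DnaFragAdaptor}


def fetch_configured(io_adaptor, io_adaptor_method, input_id):
    adaptor = ADAPTORS[io_adaptor]()                # look up the module by name
    method = getattr(adaptor, io_adaptor_method)    # look up the method by name
    return method(input_id)


# Option 2: one adaptor per input type, with a single fixed method
# name ("fetch"), so the configuration only stores the adaptor name.
class DnaFragByNameInput:
    def fetch(self, input_id):
        return DnaFragAdaptor().fetch_by_name(input_id)


INPUT_ADAPTORS = {"DnaFragByNameInput": DnaFragByNameInput}


def fetch_fixed(input_adaptor, input_id):
    return INPUT_ADAPTORS[input_adaptor]().fetch(input_id)


if __name__ == "__main__":
    print(fetch_configured("DnaFragAdaptor", "fetch_by_name", "chr1"))
    print(fetch_fixed("DnaFragByNameInput", "chr1"))
```

Both calls fetch the same thing.  In option 1 a new way of fetching is
one more configuration row; in option 2 it is one more small module.
Either way the fetching code itself still has to exist somewhere.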

>
>> I'm worried about having to set up even more stuff in a database.  People
>> have enough trouble loading up an analysisprocess table as it is.  I would
>> like people to take a read-only ensembl/biosql database, build a RunnableDB
>> and point it at that database.  Actually we're ok here thinking about it -
>> apart from the IO_Adaptor table which I don't understand.
>
>I was thinking about that, and I think we could provide conf files or
>built-in types so that tables could get filled, so we have flexibility and
>user-friendliness at the same time. Also, for user-friendliness type stuff
>we are developing the java pipeline design stuff which would work fine as
>long as we have this flexible system.

The system would work with or without the IOTable, I think, so I'm happy
to go with this.


>> We have GeneAdaptors and FeatureAdaptors and PredictionAdaptors already
>> and these can be reused.
>
>Ah yah, what I meant was that if we want to store genes in bioperl rather
>than ensembl we need to do something about it, but it's no big deal.

Righto.  

>
>> There are strong noises for branching from some corners here :-)  
>
>That brings me to the other mail... bioperl or ensembl...

:-)

-- 
And so as the stripey-winged owl's genome of Fate 
is decoded by the great sequencer of Time,
and as the big grep of Eternity uses all the cpu of Destiny
I come to the end of the mail.