[Bioperl-l] Re: bio-pipeline schema in

Ewan Birney birney@ebi.ac.uk
Wed, 20 Mar 2002 08:33:36 +0000 (GMT)


On Wed, 20 Mar 2002, Elia Stupka wrote:

> Hello,
> 
> I've just committed our first stab at the schema for the new bio-pipeline.
> I will soon import the bioperl-pipeline CVS module, where we will work on
> the Perl port of the pipeline. We are trying both to please the generic
> bioperl user who might want to fetch some sequences from GenBank and run
> them through his own pipeline, and to gradually port some (hopefully all)
> of the ensembl pipeline to this schema.
> 
> The schema is in biosql-schema/sql/biopipelinedb-mysql.sql


Elia - I know this will frustrate you a little bit, but I do think you
should also repost this to the obda-l list, so the biojava and biopython
guys can have a poke at it and understand it. Quite how we manage
multi-language, multi-project development and somehow make progress, I'm
not sure, but we should try to make sure that biosql remains cross-project
and not just a "collection of schemas which different Bio* projects use".


Otherwise, this looks fine. (I think). I *still* don't get the
relationships between IO tables and the fact that there are no
RunnableDBs. Hmmmmm.


I guess with Jerm coming over to Hinxton in April this will shake out
then...


> 
> The schema is more or less as described the other day, save for a few
> changes. We have renamed the old IO table to datasource, and we have added
> a new table, because we realised there will be cases where we want to
> take multiple input ids (for example, two sequences to crossmatch). This
> new table takes over the name IO:
> 
> **********
> Table IO
> **********
> IO_id (internal id)
> datasource_id (foreign key to the datasource table, which has all the
> locator, adaptor, etc. stuff)
> IO_type (input or output)
> **********
> 
> The analysis table then keys off this IO table and has a runnable column
> (instead of module), so when the analysis object is passed to the
> RunnableDB, these two columns tell it the input adaptors, the output
> adaptor and the runnable to use.
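Here is one way to picture what a RunnableDB gets out of an analysis, sketched with Python dataclasses. The email does not say exactly how one analysis row links to several IO rows, so the `ios` tuple is an assumption. The adaptor and runnable names (SeqAdaptor, FeatureAdaptor, CrossMatch) are also hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Datasource:
    datasource_id: int
    adaptor: str  # hypothetical adaptor name


@dataclass(frozen=True)
class IO:
    IO_id: int
    datasource: Datasource
    IO_type: str  # "input" or "output"


@dataclass(frozen=True)
class Analysis:
    analysis_id: int
    runnable: str        # replaces the old "module" column
    ios: Tuple[IO, ...]  # assumption: how an analysis links to its IO rows


def resolve(analysis: Analysis) -> Tuple[List[str], str, str]:
    """Return (input adaptors, output adaptor, runnable) for a RunnableDB."""
    inputs = [io.datasource.adaptor for io in analysis.ios if io.IO_type == "input"]
    outputs = [io.datasource.adaptor for io in analysis.ios if io.IO_type == "output"]
    if len(outputs) != 1:
        raise ValueError(f"analysis {analysis.analysis_id}: expected one output IO")
    return inputs, outputs[0], analysis.runnable


if __name__ == "__main__":
    seqs = Datasource(1, "SeqAdaptor")
    features = Datasource(2, "FeatureAdaptor")
    crossmatch = Analysis(
        1, "CrossMatch",
        (IO(10, seqs, "input"), IO(11, seqs, "input"), IO(12, features, "output")),
    )
    print(resolve(crossmatch))
```

The single-output check mirrors the email's wording ("the output adaptor"). Multiple inputs are allowed.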
> 
> The LSFid column in the job table has been renamed queue_id, since we are
> planning to allow local use as well as PBS, etc.
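A backend-neutral queue_id means the job table no longer cares which system ran the job. The Python sketch below is hedged: `Queue`, `LocalQueue` and the `local-N` id format are all hypothetical. An LSF or PBS backend would return its own job id string through the same `submit` method.

```python
import itertools
from dataclasses import dataclass
from typing import Optional, Protocol


class Queue(Protocol):
    """Hypothetical interface every execution backend would implement."""

    def submit(self, command: str) -> str: ...


class LocalQueue:
    """Hypothetical local backend that only hands out sequential ids.

    A real backend would launch the command here; an LSF or PBS backend
    would return the batch system's own job id instead.
    """

    def __init__(self) -> None:
        self._ids = itertools.count(1)

    def submit(self, command: str) -> str:
        return f"local-{next(self._ids)}"


@dataclass
class Job:
    job_id: int
    command: str
    queue_id: Optional[str] = None  # backend-neutral, unlike the old LSFid


def submit(job: Job, queue: Queue) -> Job:
    """Submit a job and record whatever id the backend returned."""
    job.queue_id = queue.submit(job.command)
    return job


if __name__ == "__main__":
    q = LocalQueue()
    print(submit(Job(1, "hypothetical-command seq1"), q).queue_id)
```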
> 
> All column and table names follow the sane new-style table_id naming
> scheme.
> 
> The class column has been removed from both job and input_analysis, since
> that information is now encapsulated in the datasource table.
> 
> The input table now consists of an internal_id, a foreign key to the
> datasource table, and a name which corresponds to the identifier.
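Here is a small sqlite3 sketch of the reworked input table. It assumes the internal id column is called input_id, following the table_id convention, and adds a hypothetical adaptor column on datasource. Because the class column is gone, each input's adaptor is found by joining to datasource rather than being stored on the row.

```python
import sqlite3

SCHEMA = """
CREATE TABLE datasource (
    datasource_id INTEGER PRIMARY KEY,
    adaptor       TEXT NOT NULL  -- hypothetical column
);
CREATE TABLE input (
    input_id      INTEGER PRIMARY KEY,  -- internal id (column name assumed)
    datasource_id INTEGER NOT NULL REFERENCES datasource(datasource_id),
    name          TEXT NOT NULL         -- the identifier, e.g. a sequence id
);
"""


def build_db() -> sqlite3.Connection:
    """In-memory database with one datasource and two placeholder inputs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO datasource VALUES (1, 'SeqAdaptor')")
    conn.executemany(
        "INSERT INTO input (datasource_id, name) VALUES (?, ?)",
        [(1, "seq1"), (1, "seq2")],
    )
    return conn


def fetch_plan(conn: sqlite3.Connection):
    """Return (name, adaptor) pairs; the adaptor comes from datasource,
    since input rows no longer carry a class column."""
    return conn.execute(
        "SELECT i.name, d.adaptor FROM input AS i "
        "JOIN datasource AS d ON d.datasource_id = i.datasource_id "
        "ORDER BY i.input_id"
    ).fetchall()


if __name__ == "__main__":
    print(fetch_plan(build_db()))
```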
> 
> We are going to start coding the bioperl-pipeline modules, and the first
> three test cases we want to get working in the next few months are:
> 
> [preliminary: get one simple runnable like RepeatMasker working in the
> new schema] :)
> 
> 1) Have 3 genomes in ensembl schemas and run a pipeline to tblastx them
> all against all.
> 
> 2) Generate families between 3 genomes and store ClustalW alignments for
> them.
> 
> 3) Generate and store conserved syntenic regions between organisms (using
> our protein ensembl-compara stuff, which is already working) and run DBA
> (DnaBlockAligner) on the non-coding portions of the conserved segments.
> 
> etc.etc.etc. :)
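For test case 1, the set of tblastx jobs can be enumerated as ordered pairs of distinct genomes. This minimal Python sketch uses placeholder genome names. It assumes each genome is searched against each of the others and that self-comparisons are skipped. If the comparison were treated as symmetric, `itertools.combinations` would halve the job count.

```python
from itertools import permutations
from typing import List, Tuple


def all_against_all(genomes: List[str]) -> List[Tuple[str, str]]:
    """Every ordered (query, target) pair of distinct genomes."""
    return list(permutations(genomes, 2))


if __name__ == "__main__":
    # Placeholder genome names, not real ensembl database names.
    jobs = all_against_all(["genome_a", "genome_b", "genome_c"])
    for query, target in jobs:
        print(f"tblastx {query} vs {target}")
    print(len(jobs), "jobs")
```

With three genomes this yields six jobs.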
> 
> Elia

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------