[Open-bio-l] bio-pipeline schema in

Brian Gilman gilmanb@genome.wi.mit.edu
Wed, 20 Mar 2002 09:52:41 -0500 (EST)


Hello Elia,

	Have you spoken with Alex Rolfe at all? He just submitted a
proposal for a workflow system that we designed and implemented here at
WI. I think it would be great to try to integrate these two projects. He
can be contacted at arolfe@genome.wi.mit.edu.

				Best, 

					-B

-----------------------
Brian Gilman <gilmanb@genome.wi.mit.edu>
Sr. Software Engineer MIT/Whitehead Inst. Center for Genome Research
One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
phone +1 617  252 1069 / fax +1 617 252 1902


On Wed, 20 Mar 2002, Elia Stupka wrote:

> Hello,
> 
> I've just committed our first stab at the schema for the new bio-pipeline.
> I will soon import the bioperl-pipeline CVS module, where we will work on
> the Perl port of the pipeline, trying both to please the generic bioperl
> user, who might want to use it to fetch some sequences from GenBank and
> run them through his pipeline, and to gradually port some (hopefully all)
> of the ensembl pipeline to this schema.
> 
> The schema is in biosql-schema/sql/biopipelinedb-mysql.sql
> 
> The schema is more or less as described the other day, save a few changes.
> We have renamed the IO table to the datasource table, and we have added a
> new IO table, because we realised we will have cases where we want to
> take multiple input ids (for example, two sequences to crossmatch). The
> new IO table looks like this:
> 
> **********
> Table IO
> **********
> IO_id (internal id)
> datasource_id (foreign key to the datasource table which has all the
> locator, adaptor,etc. stuff)
> IO_type (input or output)
> **********
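> 
> As a rough sketch (the column types and keys here are illustrative only;
> the committed DDL in biosql-schema/sql/biopipelinedb-mysql.sql is the
> reference), that comes down to something like:
> 
>     CREATE TABLE IO (
>       IO_id          int(10) unsigned NOT NULL auto_increment,  -- internal id
>       datasource_id  int(10) unsigned NOT NULL,  -- FK to datasource (locator, adaptor, etc.)
>       IO_type        enum('INPUT','OUTPUT') NOT NULL,
>       PRIMARY KEY (IO_id),
>       KEY datasource_idx (datasource_id)
>     );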
> 
> Then the analysis table keys off this IO table and has a runnable column
> (instead of module), so that, via these two keys, when the analysis object
> is passed to the runnabledb it knows the input adaptors, the output
> adaptor, and the runnable to use.
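> 
> Again only as an illustration (the exact columns, and how many IO rows
> hang off one analysis, are defined in the committed schema), the idea is
> roughly:
> 
>     CREATE TABLE analysis (
>       analysis_id  int(10) unsigned NOT NULL auto_increment,
>       IO_id        int(10) unsigned NOT NULL,  -- FK into the IO table above
>       runnable     varchar(255) NOT NULL,      -- runnable module name (replaces "module")
>       PRIMARY KEY (analysis_id),
>       KEY IO_idx (IO_id)
>     );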
> 
> The LSFid column in the job table has been changed to queue_id, since we
> are planning to allow local use as well as PBS, etc.
> 
> All column and table names follow the sane new-style table_id naming
> scheme.
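> 
> So the job table ends up looking roughly like this (the analysis_id and
> status columns are just illustrative here; the real columns are in the
> committed schema):
> 
>     CREATE TABLE job (
>       job_id       int(10) unsigned NOT NULL auto_increment,
>       analysis_id  int(10) unsigned NOT NULL,  -- which analysis this job runs
>       queue_id     int(10) unsigned,           -- was LSFid; id in LSF, PBS or a local queue
>       status       varchar(40),                -- illustrative only
>       PRIMARY KEY (job_id)
>     );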
> 
> The class column has been removed from both job and input_analysis since
> that is all encapsulated in the datasource table.
> 
> The input table now consists of an internal id, a foreign key to the
> datasource table, and a name which corresponds to the identifier.
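> 
> i.e. roughly (types illustrative):
> 
>     CREATE TABLE input (
>       input_id       int(10) unsigned NOT NULL auto_increment,  -- internal id
>       datasource_id  int(10) unsigned NOT NULL,                 -- FK to datasource
>       name           varchar(255) NOT NULL,                     -- the identifier
>       PRIMARY KEY (input_id),
>       KEY datasource_idx (datasource_id)
>     );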
> 
> We are going to start coding the bioperl-pipeline modules, and the first
> three test cases we want to get working in the next few months are:
> 
> [preliminary: get one simple runnable like RepeatMasker working in the new
> schema] :)
> 
> 1) Have 3 genomes in ensembl schemas and run a pipeline to tblastx all
> against all.
> 
> 2) Generate families between the 3 genomes and store ClustalW alignments
> for them.
> 
> 3) Generate and store conserved syntenic regions between organisms (using
> our ensembl-compara protein stuff that is already working) and run DBA
> (DnaBlockAligner) on the non-coding portions of the conserved segments.
> 
> etc.etc.etc. :)
> 
> Elia
> 
> -- 
> ********************************
> * http://www.fugu-sg.org/~elia *
> * tel:    +65 874 1467         *
> * mobile: +65 90307613         *
> * fax:    +65 777 0402         *
> ********************************
> 
> 
> 
> 
> _______________________________________________
> Open-Bio-l mailing list
> Open-Bio-l@open-bio.org
> http://open-bio.org/mailman/listinfo/open-bio-l
>