[Open-bio-l] bio-pipeline schema in

Elia Stupka elia@fugu-sg.org
Thu, 21 Mar 2002 11:09:53 +0800 (SGT)


> 	Have you spoken with Alex Rolfe at all? He just submitted a
> proposal for a workflow system that we designed and implemented here
> at WI. I think it would be great to try to integrate these two
> projects. He can be contacted at arolfe@genome.wi.mit.edu.

Thanks, that sounds very interesting. I will mail him now.

Elia

> 

> 				Best, 
> 
> 					-B
> 
> -----------------------
> Brian Gilman <gilmanb@genome.wi.mit.edu>
> Sr. Software Engineer MIT/Whitehead Inst. Center for Genome Research
> One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
> phone +1 617  252 1069 / fax +1 617 252 1902
> 
> 
> On Wed, 20 Mar 2002, Elia Stupka wrote:
> 
> > Hello,
> > 
> > I've just committed our first stab at the schema for the new bio-pipeline.
> > I will soon import the bioperl-pipeline CVS module, where we will work on
> > the Perl port of the pipeline. We are trying to please the generic bioperl
> > user, who might want to use it to fetch some sequences from GenBank and run
> > them through his pipeline, while gradually porting some (hopefully all)
> > of the ensembl pipeline to this schema.
> > 
> > The schema is in biosql-schema/sql/biopipelinedb-mysql.sql
> > 
> > The schema was more or less described the other day, save a few changes.
> > We have renamed the IO table to the datasource table, and we have added
> > another table because we realised we will have cases where we want to
> > take multiple input ids (for example, two sequences to crossmatch). So we
> > have added a new IO table:
> > 
> > **********
> > Table IO
> > **********
> > IO_id (internal id)
> > datasource_id (foreign key to the datasource table, which has all the
> > locator, adaptor, etc. stuff)
> > IO_type (input or output)
> > **********
> > 
> > Then the analysis table keys off this IO table, and has a runnable column
> > (instead of module), so via these two keys, when the analysis object is
> > passed to the runnabledb, it knows the input adaptors, the output adaptor,
> > and the runnable to use.
> > 
> > The LSFid in the job table has been changed to queue_id since we are
> > planning to allow local use as well as PBS, etc.
> > 
> > All column and table names follow the sane new-style table_id naming
> > scheme.
> > 
> > The class column has been removed from both job and input_analysis since
> > that is all encapsulated in the datasource table.
> > 
> > The input table now consists of an internal_id, a foreign key to the
> > datasource table, and a name, which corresponds to the identifier.
> > 
> > We are going to start coding the bioperl-pipeline modules, and the first
> > three test cases we want to get working in the next few months are:
> > 
> > [Preliminary: get one simple runnable, like RepeatMasker, to work in the
> > new schema] :)
> > 
> > 1) Have 3 genomes in ensembl schemas and run a pipeline to tblastx all
> > against all.
> > 
> > 2) Generate families between 3 genomes and store clustalw alignments for
> > them.
> > 
> > 3) Generate and store conserved syntenic regions (using our protein
> > ensembl-compara stuff, which is already working) between organisms and run
> > DBA (DnaBlockAligner) on the non-coding portions of the conserved segments.
> > 
> > etc.etc.etc. :)
> > 
> > Elia
> > 
> > -- 
> > ********************************
> > * http://www.fugu-sg.org/~elia *
> > * tel:    +65 874 1467         *
> > * mobile: +65 90307613         *
> > * fax:    +65 777 0402         *
> > ********************************
> > 
> > 
> 
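
For anyone who wants to poke at the table layout described in the quoted
message, here is a minimal sketch using Python's built-in sqlite3 rather than
MySQL. It is not the contents of biopipelinedb-mysql.sql. The exact column
names for locator and adaptor, the analysis_io link table (one possible way
to let an analysis key off several IO rows), and all sample values are
assumptions made purely for illustration.

```python
import sqlite3

# Illustrative schema only; names marked "hypothetical" are not from the post.
SCHEMA = """
CREATE TABLE datasource (
    datasource_id INTEGER PRIMARY KEY,
    locator       TEXT NOT NULL,   -- hypothetical column name
    adaptor       TEXT NOT NULL    -- hypothetical column name
);
CREATE TABLE io (
    io_id         INTEGER PRIMARY KEY,
    datasource_id INTEGER NOT NULL REFERENCES datasource(datasource_id),
    io_type       TEXT NOT NULL CHECK (io_type IN ('input', 'output'))
);
CREATE TABLE analysis (
    analysis_id INTEGER PRIMARY KEY,
    runnable    TEXT NOT NULL
);
-- Hypothetical link table: the post does not say how analysis keys off IO.
CREATE TABLE analysis_io (
    analysis_id INTEGER NOT NULL REFERENCES analysis(analysis_id),
    io_id       INTEGER NOT NULL REFERENCES io(io_id),
    PRIMARY KEY (analysis_id, io_id)
);
"""


def build_demo():
    """Create an in-memory database with one two-input analysis."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO datasource VALUES (?, ?, ?)",
        [(1, "genome_a", "SeqAdaptor"),
         (2, "genome_b", "SeqAdaptor"),
         (3, "results", "FeatureAdaptor")],
    )
    conn.executemany(
        "INSERT INTO io VALUES (?, ?, ?)",
        [(1, 1, "input"), (2, 2, "input"), (3, 3, "output")],
    )
    conn.execute("INSERT INTO analysis VALUES (1, 'Crossmatch')")
    conn.executemany(
        "INSERT INTO analysis_io VALUES (?, ?)", [(1, 1), (1, 2), (1, 3)]
    )
    return conn


def resolve(conn, analysis_id):
    """Return (runnable, input adaptors, output adaptors) for an analysis."""
    runnable = conn.execute(
        "SELECT runnable FROM analysis WHERE analysis_id = ?", (analysis_id,)
    ).fetchone()[0]
    rows = conn.execute(
        """SELECT io.io_type, d.adaptor
           FROM analysis_io ai
           JOIN io ON io.io_id = ai.io_id
           JOIN datasource d ON d.datasource_id = io.datasource_id
           WHERE ai.analysis_id = ?
           ORDER BY io.io_id""",
        (analysis_id,),
    ).fetchall()
    inputs = [adaptor for io_type, adaptor in rows if io_type == "input"]
    outputs = [adaptor for io_type, adaptor in rows if io_type == "output"]
    return runnable, inputs, outputs
```

Calling resolve(build_demo(), 1) looks up the runnable plus the adaptors for
the two inputs and the single output, which is roughly the information the
quoted message says the analysis object carries to the runnabledb.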

-- 
********************************
* http://www.fugu-sg.org/~elia *
* tel:    +65 874 1467         *
* mobile: +65 90307613         *
* fax:    +65 777 0402         *
********************************