[Open-bio-l] bio-pipeline schema in
Elia Stupka
elia@fugu-sg.org
Thu, 21 Mar 2002 11:09:53 +0800 (SGT)
> Have you spoken with Alex Rolfe at all? He just submitted a
> proposal for a Workflow system that we designed and implemented here
> at WI. I think it would be great to try and integrate these two
> projects. He can be contacted at arolfe@genome.wi.mit.edu.
Thanks, sounds very interesting, will mail him now.
Elia
>
> Best,
>
> -B
>
> -----------------------
> Brian Gilman <gilmanb@genome.wi.mit.edu>
> Sr. Software Engineer MIT/Whitehead Inst. Center for Genome Research
> One Kendall Square, Bldg. 300 / Cambridge, MA 02139-1561 USA
> phone +1 617 252 1069 / fax +1 617 252 1902
>
>
> On Wed, 20 Mar 2002, Elia Stupka wrote:
>
> > Hello,
> >
> > I've just committed our first stab at the schema for the new bio-pipeline.
> > I will soon import the bioperl-pipeline CVS module, where we will work on
> > the Perl port of the pipeline. The aim is to please the generic bioperl
> > user, who might want to fetch some sequences from GenBank and run them
> > through a pipeline, while gradually porting some (hopefully all) of the
> > ensembl pipeline to this schema.
> >
> > The schema is in biosql-schema/sql/biopipelinedb-mysql.sql
> >
> > The schema was more or less described the other day, save a few changes.
> > We have renamed the old IO table to datasource, and we have added a new
> > table under the IO name, because we realised we will have cases where we
> > want to take multiple input ids (for example, two sequences to
> > crossmatch). The new IO table looks like this:
> >
> > **********
> > Table IO
> > **********
> > IO_id (internal id)
> > datasource_id (foreign key to the datasource table, which has all the
> > locator, adaptor, etc. stuff)
> > IO_type (input or output)
> > **********
> >
> > Then the analysis table keys off this IO table, and has a runnable column
> > (instead of module), so via these two keys when the analysis object is
> > passed to the runnabledb it knows the input adaptors, the output adaptor
> > and the runnable to use.
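
A minimal sketch of how these tables might fit together, written with Python's sqlite3 rather than the MySQL schema in the repository. Only IO_id, datasource_id, IO_type and the runnable column come from the description above. The locator and adaptor columns, the analysis_io link table, and all the example values are assumptions made for illustration, since the mail only says that the analysis table "keys off" the IO table.

```python
import sqlite3

# Hypothetical sketch, not the committed biopipelinedb schema.
SCHEMA = """
CREATE TABLE datasource (
    datasource_id INTEGER PRIMARY KEY,
    locator       TEXT NOT NULL,  -- assumed column: where the data lives
    adaptor       TEXT NOT NULL   -- assumed column: module that reads/writes it
);
CREATE TABLE IO (
    IO_id         INTEGER PRIMARY KEY,
    datasource_id INTEGER NOT NULL REFERENCES datasource(datasource_id),
    IO_type       TEXT NOT NULL CHECK (IO_type IN ('input', 'output'))
);
CREATE TABLE analysis (
    analysis_id INTEGER PRIMARY KEY,
    runnable    TEXT NOT NULL
);
-- Assumed link table so one analysis can have several inputs.
CREATE TABLE analysis_io (
    analysis_id INTEGER NOT NULL REFERENCES analysis(analysis_id),
    IO_id       INTEGER NOT NULL REFERENCES IO(IO_id)
);
"""


def resolve_analysis(conn, analysis_id):
    """Return (runnable, inputs, outputs); each entry is (adaptor, locator)."""
    row = conn.execute(
        "SELECT runnable FROM analysis WHERE analysis_id = ?", (analysis_id,)
    ).fetchone()
    if row is None:
        raise KeyError(analysis_id)
    found = {"input": [], "output": []}
    rows = conn.execute(
        """SELECT i.IO_type, ds.adaptor, ds.locator
           FROM analysis_io AS aio
           JOIN IO AS i ON i.IO_id = aio.IO_id
           JOIN datasource AS ds ON ds.datasource_id = i.datasource_id
           WHERE aio.analysis_id = ?
           ORDER BY i.IO_id""",
        (analysis_id,),
    )
    for io_type, adaptor, locator in rows:
        found[io_type].append((adaptor, locator))
    return row[0], found["input"], found["output"]


def demo():
    """Two input sequences and one output, as in the crossmatch example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO datasource VALUES (?, ?, ?)",
        [(1, "genome_a", "SeqAdaptor"),
         (2, "genome_b", "SeqAdaptor"),
         (3, "results", "FeatureAdaptor")],
    )
    conn.executemany(
        "INSERT INTO IO VALUES (?, ?, ?)",
        [(1, 1, "input"), (2, 2, "input"), (3, 3, "output")],
    )
    conn.execute("INSERT INTO analysis VALUES (1, 'Crossmatch')")
    conn.executemany(
        "INSERT INTO analysis_io VALUES (?, ?)", [(1, 1), (1, 2), (1, 3)]
    )
    return resolve_analysis(conn, 1)
```

Calling demo() resolves the single example analysis to its runnable name plus two input datasources and one output datasource, which is roughly the information a runnabledb would need under these assumptions.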
> >
> > The LSFid in the job table has been changed to queue_id since we are
> > planning to allow local use as well as PBS, etc.
> >
> > All column and table names follow the sane new-style table_id naming
> > scheme.
> >
> > The class column has been removed from both job and input_analysis since
> > that is all encapsulated in the datasource table.
> >
> > The input table is now made of an internal id, a foreign key to the
> > datasource table, and a name which corresponds to the identifier.
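
As a rough illustration of the input table, here is a small Python sqlite3 sketch. The column name input_id (following the table_id convention), the locator and adaptor columns on datasource, and the example rows are all assumptions, not taken from the committed schema.

```python
import sqlite3


def build_input_demo():
    """Create an in-memory database with hypothetical datasource/input rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE datasource (
            datasource_id INTEGER PRIMARY KEY,
            locator       TEXT NOT NULL,  -- assumed column
            adaptor       TEXT NOT NULL   -- assumed column
        );
        CREATE TABLE input (
            input_id      INTEGER PRIMARY KEY,  -- the internal id
            datasource_id INTEGER NOT NULL
                          REFERENCES datasource(datasource_id),
            name          TEXT NOT NULL         -- the identifier to fetch
        );
        """
    )
    conn.execute("INSERT INTO datasource VALUES (1, 'genome_a', 'SeqAdaptor')")
    conn.executemany(
        "INSERT INTO input VALUES (?, ?, ?)",
        [(1, 1, "contig_1"), (2, 1, "contig_2")],
    )
    return conn


def locate_input(conn, name):
    """Return (locator, adaptor) for an input name, or None if it is unknown."""
    return conn.execute(
        """SELECT ds.locator, ds.adaptor
           FROM input AS inp
           JOIN datasource AS ds ON ds.datasource_id = inp.datasource_id
           WHERE inp.name = ?""",
        (name,),
    ).fetchone()
```

With this layout an input row carries no class information of its own; everything needed to fetch it is reached through the datasource foreign key.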
> >
> > We are going to start coding the bioperl-pipeline modules, and the first
> > three test cases we want to get working in the next few months are:
> >
> > [preliminary: get one simple runnable like repeatmasker to work in the
> > new schema] :)
> >
> > 1) Have 3 genomes in ensembl schemas and run a pipeline to tblastx all
> > against all.
> >
> > 2) Generate families between 3 genomes and store clustalw alignments for
> > them.
> >
> > 3) Generate and store conserved syntenic regions (using our protein
> > ensembl-compara stuff already working) between organisms and run DBA
> > (DnaBlockAligner) on the non-coding portions of the conserved segments.
> >
> > etc.etc.etc. :)
> >
> > Elia
> >
> > --
> > ********************************
> > * http://www.fugu-sg.org/~elia *
> > * tel: +65 874 1467 *
> > * mobile: +65 90307613 *
> > * fax: +65 777 0402 *
> > ********************************
>
--
********************************
* http://www.fugu-sg.org/~elia *
* tel: +65 874 1467 *
* mobile: +65 90307613 *
* fax: +65 777 0402 *
********************************