[BioSQL-l] Affymetrix SQL for PostgreSQL

Allen Day allenday at ucla.edu
Thu May 1 16:54:02 EDT 2003


Hi,

Things are going well with the Chado/RAD merger.  So far I've managed to 
port the table and view create statements from Oracle over to PostgreSQL, 
and the table creates are also portable to MySQL using SQL::Translator.

Last week I loaded some of the Affymetrix MAGE-ML files that carry all of
the database cross-reference info for their probesets.  This week I've
started to gather our protocol data, which is a prerequisite to loading
any real data.

So... I can't yet give you an opinion on the RAD schema from a data
analyst's point of view.  From the loading and schema porting experience
I've had so far, though, it seems that both the Chado and RAD teams have
put a lot of thought into creating clear schemata.

Hopefully within a month or so I'll have some expression values loaded
into Chado/RAD and will be starting to use the db for analysis, and can
give some better feedback.

> The way I could envision a different design of a gene expression model
> in BioSQL is as a warehouse star-schema, where there'd be essentially
> one (or very few) analytical data tables, and all the rest is hosted by
> the existing biosql tables (i.e., mostly the term table). It would be
> understood then that people would host their expression data in another
> schema, and the biosql table(s) would be used as a warehouse only.

Ah, okay.  You could certainly strip the RAD schema down.  Right now the
Chado port is ~50 tables with a handful of views.

-Allen
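
A minimal sketch of the warehouse star-schema idea quoted above: one
analytical fact table whose dimension keys all point into a term table.
It uses Python's sqlite3 module only so that it runs anywhere.  The
cut-down `term` table, the names `expression_fact`, `probeset_term_id`,
`sample_term_id` and `signal`, and the helper functions are all
assumptions made for illustration, not the actual BioSQL or Chado/RAD
definitions.

```python
import sqlite3


def build_star_schema(conn):
    """Create a simplified term table plus one hypothetical fact table."""
    conn.executescript(
        """
        -- Stand-in for the BioSQL term table (heavily simplified).
        CREATE TABLE term (
            term_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL UNIQUE
        );
        -- Hypothetical fact table: every dimension is a term.
        CREATE TABLE expression_fact (
            probeset_term_id INTEGER NOT NULL REFERENCES term(term_id),
            sample_term_id   INTEGER NOT NULL REFERENCES term(term_id),
            signal           REAL    NOT NULL,
            PRIMARY KEY (probeset_term_id, sample_term_id)
        );
        """
    )


def term_id(conn, name):
    """Return the id of a term, creating the term if it is new."""
    conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT term_id FROM term WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def add_value(conn, probeset, sample, signal):
    """Store one expression value keyed by probeset and sample terms."""
    conn.execute(
        "INSERT INTO expression_fact VALUES (?, ?, ?)",
        (term_id(conn, probeset), term_id(conn, sample), signal),
    )


def mean_signal(conn, probeset):
    """Average signal for a probeset across samples (None if absent)."""
    row = conn.execute(
        """
        SELECT AVG(f.signal)
        FROM expression_fact f
        JOIN term t ON t.term_id = f.probeset_term_id
        WHERE t.name = ?
        """,
        (probeset,),
    ).fetchone()
    return row[0]
```

In this layout the raw expression data would still live in a separate
schema such as RAD; only summarized values would be copied into the
fact table for analysis.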




> Sounds great. Here are a few comments, for my $0.02 ...
> 
> There are probably as many expression data schemas out there as labs 
> hosting expression data. There are not that many big efforts attempting 
> to generalize, but there are some (GEO, ArrayExpress, GeneX, 
> RAD, SMD, and I'm sure a couple more).
> 
> If gene expression tables are to go into the 'official' BioSQL (everyone 
> can - and many will - have his/her own build, extended or otherwise), a 
> design that attempts to be generic and technology-agnostic would be most 
> attractive to me.
> 
> Given that gene expression has never been within the scope of BioSQL, 
> I'd prefer to take as much advantage of existing open-source schemas as 
> possible, since then the reality check has already happened and the 
> software support may come with it.
> 
> Lately GMOD/Chado faced a similar situation, and Allen, who I believe 
> took the lead on that project, settled on integrating the respective 
> parts of GUS/RAD.
> 
> Allen, how did that work out? Could we just build on your work and RAD?
> 
> Marc, what made you decide to disregard the big expression schemas? (No 
> offense whatsoever, I'm just curious.)
> 
> The way I could envision a different design of a gene expression model 
> in BioSQL is as a warehouse star-schema, where there'd be essentially 
> one (or very few) analytical data tables, and all the rest is hosted by 
> the existing biosql tables (i.e., mostly the term table). It would be 
> understood then that people would host their expression data in another 
> schema, and the biosql table(s) would be used as a warehouse only.
> 
> 	-hilmar
> 
> On Thursday, May 1, 2003, at 12:08  PM, Marc Colosimo wrote:
> 
> >
> > Since I couldn't easily find a good schema, I made my own based on
> > Affymetrix's GATC schema. My hope is that, as I develop it, it will
> > use parts of BioSQL to handle the non-array stuff (taxon, sequence
> > databases, etc.). I only have a few tables made, and they are not
> > normalized (one, I think, is actually best left de-normalized). Oh,
> > and I am keeping the MIAME stuff in mind.
> >
> > I have one script, almost finished, that loads CEL files. I just
> > have a few complex regexes to write/debug, and I need to add support
> > for bulk loading on a local machine (piping it to psql). Now that I
> > have played around with DBI, loading CDF files is next.
> >
> > If people are interested in the code to try it out, let me know.
> >
> > Marc
> >
> > _______________________________________________
> > BioSQL-l mailing list
> > BioSQL-l at open-bio.org
> > http://open-bio.org/mailman/listinfo/biosql-l
> >
> 


