[BioSQL-l] Affymetrix SQL for PostgreSQL

Marc Colosimo mcolosim at brandeis.edu
Fri May 2 09:02:58 EDT 2003


On Thu, 1 May 2003, Allen Day wrote:

> Hi,
> 
> Things are going well with the Chado/RAD merger.  So far I've managed to 
> port the table and view create statements from Oracle over to PostgreSQL, 
> and the table creates are also portable to MySQL using SQL::Translator.
> 
> I loaded some of the Affymetrix MAGE-ML files of all the database
> cross-reference info for their probesets last week.  This week I've started 
> to gather our protocol data, which is prerequisite to loading any real 
> data.
> 
> So... I can't give you any opinion as to how I've found the RAD schema to
> be from a data analyst's point of view yet.  From the loading and schema
> porting experience I've had so far though, it seems that both the Chado
> and RAD teams have put a lot of thought into creating clear schemata.
> 
> Hopefully within a month or so I'll have some expression values loaded
> into Chado/RAD and will be starting to use the db for analysis, and can
> give some better feedback.

I didn't know that GMOD was working on this. It really isn't on the web 
site and I didn't think Chado was for expression data. The projects listed 
for expression analysis really are not for microarray data (as I 
understand them).

> 
> > The way I could envision a different design of a gene expression model
> > in BioSQL is as a warehouse star-schema, where there'd be essentially
> > one (or very few) analytical data tables, and all the rest is hosted by
> > the existing biosql tables (i.e., mostly the term table). It would be
> > understood then that people would host their expression data in another
> > schema, and the biosql table(s) would be used as a warehouse only.
> 
> Ah, okay.  You could certainly strip the RAD schema down.  Right now the
> Chado port is ~50 tables with a handful of views.
> 

What is this RAD schema?

> 
> > Sounds great. Here are a few comments as for my $0.02 ...
> > 
> > There are probably as many expression data schemas out there as labs 
> > hosting expression data. There are not that many big efforts attempting 
> > a general schema, but there are some (GEO, ArrayExpress, GeneX, 
> > RAD, SMD, and I'm sure a couple more).
> > 

A month ago, I asked if there were any available schemas. I got a very 
short list. I searched for more, but the ones I found had no public 
schema. As a single person who is at the bench more than at the computer, 
writing and testing a big schema is not ideal. Affy's is very big and 
keeps track of a lot of information (it is a LIMS). I want, and I think 
other people would probably like, a Simple Oligo Database. I am aiming for 
very few tables, just the meat and potatoes with the gravy on the side.

> > If gene expression tables are to go into the 'official' BioSQL 
> > (everyone can - and many will - have his/her own, extended or 
> > whatever, build), a design that attempts to be generic and 
> > technology-agnostic would be most attractive to me.
> > 
> > Since gene expression has never been within the scope of BioSQL, 
> > I'd prefer to take as much advantage of existing open-source schemas as 
> > possible, since then the reality-check has already happened and the 
> > software support may come with it.
> > 
> > Lately GMOD/Chado faced a similar situation, and Allen, who I believe 
> > took the lead on that project, settled on integrating the respective 
> > parts of GUS/RAD.
> > 
> > Allen, how did that work out? Could we just build on your work and RAD?
> > 
> > Marc, what made you decide to disregard the big expression schemas? (No 
> > offense whatsoever, I'm just curious.)
> > 

I didn't find anything public (not because I didn't try). Links to sites 
are welcome.

> > The way I could envision a different design of a gene expression model 
> > in BioSQL is as a warehouse star-schema, where there'd be essentially 
> > one (or very few) analytical data tables, and all the rest is hosted by 
> > the existing biosql tables (i.e., mostly the term table). It would be 
> > understood then that people would host their expression data in another 
> > schema, and the biosql table(s) would be used as a warehouse only.
> > 
> > 	-hilmar
> > 
> > On Thursday, May 1, 2003, at 12:08  PM, Marc Colosimo wrote:

-marc
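
For readers following the thread, the warehouse star-schema idea quoted above (one analytical table, with everything else hosted by existing BioSQL tables such as the term table) might look roughly like the sketch below. This is only an illustration, not a design anyone in the thread proposed in detail: `expression_fact`, its columns, and the sample rows are hypothetical, the `term` and `bioentry` tables are cut down to a couple of columns, and SQLite is used only so the example runs without a server.

```python
import sqlite3


def build_star_schema(conn):
    """Create two simplified dimension tables and one hypothetical fact table."""
    conn.executescript(
        """
        -- Simplified stand-ins for existing BioSQL tables (real ones have more columns).
        CREATE TABLE term (
            term_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL UNIQUE
        );
        CREATE TABLE bioentry (
            bioentry_id INTEGER PRIMARY KEY,
            accession   TEXT NOT NULL UNIQUE
        );
        -- Hypothetical analytical (fact) table: one row per probe per condition.
        CREATE TABLE expression_fact (
            bioentry_id       INTEGER NOT NULL REFERENCES bioentry(bioentry_id),
            condition_term_id INTEGER NOT NULL REFERENCES term(term_id),
            value             REAL NOT NULL,
            PRIMARY KEY (bioentry_id, condition_term_id)
        );
        """
    )


def mean_by_condition(conn):
    """Return the average expression value per condition term name."""
    rows = conn.execute(
        """
        SELECT t.name, AVG(f.value)
        FROM expression_fact AS f
        JOIN term AS t ON t.term_id = f.condition_term_id
        GROUP BY t.name
        ORDER BY t.name
        """
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
build_star_schema(conn)

# Made-up sample data, purely for illustration.
conn.executemany("INSERT INTO term VALUES (?, ?)",
                 [(1, "control"), (2, "heat shock")])
conn.executemany("INSERT INTO bioentry VALUES (?, ?)",
                 [(1, "probe_A"), (2, "probe_B")])
conn.executemany("INSERT INTO expression_fact VALUES (?, ?, ?)",
                 [(1, 1, 10.0), (2, 1, 20.0), (1, 2, 30.0), (2, 2, 50.0)])

means = mean_by_condition(conn)
print(means)
```

The point of the layout is that the fact table carries only keys and a measured value; the experimental detail would stay in whatever separate schema a lab uses to host its expression data.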


