[BioSQL-l] Affymetrix SQL for PostgreSQL
Marc Colosimo
mcolosim at brandeis.edu
Fri May 2 09:02:58 EDT 2003
On Thu, 1 May 2003, Allen Day wrote:
> Hi,
>
> Things are going well with the Chado/RAD merger. So far I've managed to
> port the table and view create statements from Oracle over to PostgreSQL,
> and the table creates are also portable to MySQL using SQL::Translator.
>
> I loaded some of the Affymetrix MAGE-ML files of all the database
> crossreference info for their probesets last week. This week I've started
> to gather our protocol data, which is prerequisite to loading any real
> data.
>
> So... I can't give you any opinion as to how I've found the RAD schema to
> be from a data analyst's point of view yet. From the loading and schema
> porting experience I've had so far though, it seems that both the Chado
> and RAD teams have put a lot of thought into creating clear schemata.
>
> Hopefully within a month or so I'll have some expression values loaded
> into Chado/RAD and will be starting to use the db for analysis, and can
> give some better feedback.
I didn't know that GMOD was working on this. It really isn't on the web
site and I didn't think Chado was for expression data. The projects listed
for expression analysis really are not for microarray data (as I
understand them).
>
> > The way I could envision a different design of a gene expression model
> > in BioSQL is as a warehouse star-schema, where there'd be essentially
> > one (or very few) analytical data tables, and all the rest is hosted by
> > the existing biosql tables (i.e., mostly the term table). It would be
> > understood then that people would host their expression data in another
> > schema, and the biosql table(s) would be used as a warehouse only.
>
> Ah, okay. You could certainly strip the RAD schema down. Right now the
> Chado port is ~50 tables with a handful of views.
>
What is this RAD schema?
>
> > Sounds great. Here are a few comments as for my $0.02 ...
> >
> > There's probably as many expression data schemas out there as labs
> > hosting expression data. There's not that many big efforts making a
> > generalizing attempt, but there are some (GEO, ArrayExpress, GeneX,
> > RAD, SMD, and I'm sure a couple more).
> >
A month ago, I asked if there were any available schemas. I got a very
short list. I searched for more, but the ones I found had no public
schema. As a single person who is at the bench more than at the computer,
writing and testing a big schema is not ideal. Affy's is very big and
keeps track of a lot of information (it is a LIMS). I want, and I think
other people would probably like, a Simple Oligo Database. I am aiming for
very few tables, just the meat and potatoes with the gravy on the side.
> > If gene expression tables in the 'official' BioSQL (everyone can - and
> > many will - have his/her own, extended or whatever, build), a design
> > that attempts to be generic and technology agnostic would be most
> > attractive to me.
> >
> > Gene expression not having been within the scope of BioSQL yet ever,
> > I'd prefer to take as much advantage of existing open-source schemas as
> > possible, since then the reality-check has already happened and the
> > software support may come with it.
> >
> > Lately GMOD/Chado faced a similar situation, and Allen who I believe
> > took the lead on that project settled on integrating the respective
> > parts of GUS/RAD.
> >
> > Allen, how did that work out? Could we just build on your work and RAD?
> >
> > Marc, what made you decide to disregard the big expression schemas? (No
> > offense whatsoever, I'm just curious.)
> >
I didn't find anything public (not because I didn't try). Links to sites
are welcome.
> > The way I could envision a different design of a gene expression model
> > in BioSQL is as a warehouse star-schema, where there'd be essentially
> > one (or very few) analytical data tables, and all the rest is hosted by
> > the existing biosql tables (i.e., mostly the term table). It would be
> > understood then that people would host their expression data in another
> > schema, and the biosql table(s) would be used as a warehouse only.
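
As a rough illustration of the star-schema idea quoted above, here is a minimal sketch using Python's built-in sqlite3 module. It is only one possible reading: the `expression_fact` table, its columns, the helper functions, and the sample values are all hypothetical, and the `term` table shown is a cut-down stand-in rather than the real BioSQL definition.

```python
import sqlite3

# Hypothetical sketch: every name below except the idea of a shared
# "term" table is an assumption, not part of the actual BioSQL schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (              -- simplified stand-in for the biosql term table
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE expression_fact (   -- the single analytical (fact) table
    probe_term_id  INTEGER NOT NULL REFERENCES term(term_id),
    sample_term_id INTEGER NOT NULL REFERENCES term(term_id),
    signal         REAL NOT NULL,
    PRIMARY KEY (probe_term_id, sample_term_id)
);
""")

def term_id(name):
    """Return the id of a term, creating the term if it does not exist."""
    conn.execute("INSERT OR IGNORE INTO term (name) VALUES (?)", (name,))
    row = conn.execute("SELECT term_id FROM term WHERE name = ?", (name,))
    return row.fetchone()[0]

def load(probe, sample, signal):
    """Store one measurement; probe and sample are both just terms."""
    conn.execute(
        "INSERT INTO expression_fact VALUES (?, ?, ?)",
        (term_id(probe), term_id(sample), signal),
    )

def mean_signal(probe):
    """Average signal for one probe across all samples."""
    row = conn.execute(
        """SELECT AVG(f.signal)
           FROM expression_fact f
           JOIN term p ON p.term_id = f.probe_term_id
           WHERE p.name = ?""",
        (probe,),
    )
    return row.fetchone()[0]

# Made-up values, for illustration only.
load("probe_A", "sample_1", 120.5)
load("probe_A", "sample_2", 98.0)
load("probe_B", "sample_1", 15.25)
```

In this reading, all descriptive information (probes, samples, and so on) lives in the term table, and the fact table holds only keys and numbers. The raw experiment data would stay in whatever other schema a lab already uses.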
> >
> > -hilmar
> >
> > On Thursday, May 1, 2003, at 12:08 PM, Marc Colosimo wrote:
-marc