[BioSQL-l] Microarrays and BioSQL

Hilmar Lapp hlapp at gnf.org
Wed Apr 2 11:17:14 EST 2003


On Wednesday, April 2, 2003, at 10:47 AM, Marc Colosimo wrote:

> Like William, I am interested in building a database to sort our Affy
> microarray data. I came to the conclusion that I cannot use much of
> BioSQL to store our data. It came as a surprise to me that I cannot
> easily incorporate the simplest sequence data. Here is my problem:
>
> Each gene has at least 11 oligos for it. They have the same name, like
> 171720_x_at. I have files for the target sequence (500 bp), the
> sequence of each oligo and their positions, and a file that has
> descriptions of the probes. At a minimum I have 13 items, each with the
> same name.
>

The way I did this here is to treat Affy probesets as bioentries (you
read them in from the FASTA-formatted target sequences), with the
individual probes (oligos) being features on the probeset (you read
those in from the tab-delimited file and associate each one with its
probeset by an in-memory look-up while you're loading). Note that the
names for oligos are artificial, since oligos really have no identifier
(and neither do seqfeatures). Otherwise I leave those probeset
bioentries pretty bare, since they are nothing more than expression
reporters.
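The loading step described above might look roughly like the following
Python sketch. It is not the actual GNF loader: the tab-file layout
(probeset name, oligo sequence, 1-based start on the target), the FASTA
header convention (first token is the probeset name) and the
"probeset:probeN" naming scheme for oligos are assumptions for
illustration, and the records are plain dataclasses rather than real
bioentry/seqfeature objects.

```python
import io
from dataclasses import dataclass, field


@dataclass
class ProbeFeature:
    display_name: str  # artificial name: oligos have no identifier of their own
    start: int         # assumed 1-based start of the oligo on the target
    end: int
    sequence: str


@dataclass
class ProbesetEntry:
    name: str          # probeset name, e.g. 171720_x_at
    target_seq: str    # the target sequence read from FASTA
    features: list = field(default_factory=list)


def read_fasta(handle):
    """Yield (name, sequence) pairs; name is the first token of the header."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def load_probesets(fasta_handle, probe_tab_handle):
    # Probesets become bioentries, keyed by name for in-memory look-up.
    entries = {n: ProbesetEntry(n, s) for n, s in read_fasta(fasta_handle)}
    # Hypothetical tab layout: probeset <TAB> oligo sequence <TAB> start
    for line in probe_tab_handle:
        if not line.strip():
            continue
        probeset, oligo, start = line.rstrip("\n").split("\t")
        entry = entries[probeset]  # KeyError flags a probe with no target
        start = int(start)
        rank = len(entry.features) + 1
        entry.features.append(ProbeFeature(
            display_name=f"{probeset}:probe{rank}",  # hypothetical naming
            start=start,
            end=start + len(oligo) - 1,
            sequence=oligo,
        ))
    return entries


fasta = io.StringIO(">171720_x_at\nACGTACGTACGT\nACGTACGT\n")
probes = io.StringIO("171720_x_at\tACGTA\t1\n171720_x_at\tCGTAC\t2\n")
entries = load_probesets(fasta, probes)
print([f.display_name for f in entries["171720_x_at"].features])
```

Keeping the probesets in a dict means each probe line is attached with a
single look-up, which matches the "associate by look-up in memory while
loading" approach; for a full chip this trades memory for one pass over
each file.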

I then associate the target sequence bioentries with the (fully
annotated) transcript (also a bioentry) they supposedly target via a
bioentry_relationship. Note that this is computed content and is
subject to change with your current state of knowledge about
transcripts. There are also different algorithms for actually
establishing that relationship, e.g., just taking Affy's annotation,
blasting against UniGene, or mapping both UniGene and the target
sequences to the genome and then going for co-location. The first is
the easiest but also the worst, because it is dated; we chose to
recompute ourselves.
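Because these probeset-to-transcript links are computed and get
recomputed, one way to keep them manageable is to tag each relationship
with a term naming the method that produced it, so that rerunning one
method replaces only its own links. The sketch below uses an in-memory
SQLite database with tables loosely modeled on BioSQL's bioentry, term
and bioentry_relationship tables; the columns are trimmed and are not
the actual BioSQL DDL, and the method names, transcript names and the
choice of probeset as object and transcript as subject are hypothetical.

```python
import sqlite3

# Simplified, hypothetical subset of a BioSQL-like schema.
SCHEMA = """
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry_relationship (
    bioentry_relationship_id INTEGER PRIMARY KEY,
    object_bioentry_id INTEGER NOT NULL REFERENCES bioentry(bioentry_id),
    subject_bioentry_id INTEGER NOT NULL REFERENCES bioentry(bioentry_id),
    term_id INTEGER NOT NULL REFERENCES term(term_id),
    UNIQUE (object_bioentry_id, subject_bioentry_id, term_id)
);
"""


def get_id(conn, table, name):
    """Return the id of `name` in `table` ('bioentry' or 'term'), inserting it if needed."""
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    row = conn.execute(f"SELECT {table}_id FROM {table} WHERE name = ?", (name,))
    return row.fetchone()[0]


def replace_mappings(conn, method, pairs):
    """Replace every probeset->transcript link computed by `method`.

    Links produced by other methods are left untouched.
    """
    term_id = get_id(conn, "term", method)
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM bioentry_relationship WHERE term_id = ?", (term_id,))
        for probeset, transcript in pairs:
            conn.execute(
                "INSERT INTO bioentry_relationship "
                "(object_bioentry_id, subject_bioentry_id, term_id) VALUES (?, ?, ?)",
                (get_id(conn, "bioentry", probeset),
                 get_id(conn, "bioentry", transcript),
                 term_id),
            )


def targets(conn, probeset, method):
    """Transcripts linked to `probeset` by the given method."""
    rows = conn.execute(
        """SELECT t.name FROM bioentry_relationship r
           JOIN bioentry p ON p.bioentry_id = r.object_bioentry_id
           JOIN bioentry t ON t.bioentry_id = r.subject_bioentry_id
           JOIN term m ON m.term_id = r.term_id
           WHERE p.name = ? AND m.name = ?
           ORDER BY t.name""",
        (probeset, method),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
replace_mappings(conn, "affy_annotation", [("171720_x_at", "transcript_A")])
replace_mappings(conn, "genome_colocation", [("171720_x_at", "transcript_B")])
# Recomputing with one method swaps its links without touching the others.
replace_mappings(conn, "genome_colocation", [("171720_x_at", "transcript_C")])
print(targets(conn, "171720_x_at", "genome_colocation"))  # ['transcript_C']
```

Keeping the method as the relationship type also lets you compare
methods side by side (e.g. Affy's annotation versus your own
recomputation) before deciding which one to trust.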

We answer the question of which protein a transcript encodes through
another (computed) bioentry-to-bioentry relationship. You get the idea.
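Chaining relationships this way lets a single query walk from a
probeset, through the transcript it targets, to the protein that
transcript encodes. Here is a minimal, self-contained SQLite sketch of
that traversal; the tables are loosely modeled on BioSQL's bioentry,
term and bioentry_relationship tables but heavily trimmed, and the
relationship names "targets" and "encodes" as well as all entry names
are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed, hypothetical BioSQL-like tables (not the real DDL).
conn.executescript("""
CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE bioentry_relationship (
    object_bioentry_id INTEGER NOT NULL,
    subject_bioentry_id INTEGER NOT NULL,
    term_id INTEGER NOT NULL
);
""")


def link(obj, subj, relation):
    """Record obj -[relation]-> subj, creating bioentries and the term on demand."""
    ids = []
    for table, name in (("bioentry", obj), ("bioentry", subj), ("term", relation)):
        conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
        row = conn.execute(f"SELECT {table}_id FROM {table} WHERE name = ?", (name,))
        ids.append(row.fetchone()[0])
    conn.execute("INSERT INTO bioentry_relationship VALUES (?, ?, ?)", ids)


def proteins_for_probeset(probeset):
    """Follow probeset -[targets]-> transcript -[encodes]-> protein."""
    rows = conn.execute("""
        SELECT DISTINCT prot.name
        FROM bioentry ps
        JOIN bioentry_relationship r1 ON r1.object_bioentry_id = ps.bioentry_id
        JOIN term t1 ON t1.term_id = r1.term_id AND t1.name = 'targets'
        JOIN bioentry_relationship r2 ON r2.object_bioentry_id = r1.subject_bioentry_id
        JOIN term t2 ON t2.term_id = r2.term_id AND t2.name = 'encodes'
        JOIN bioentry prot ON prot.bioentry_id = r2.subject_bioentry_id
        WHERE ps.name = ?
        ORDER BY prot.name""", (probeset,))
    return [name for (name,) in rows]


link("171720_x_at", "transcript_A", "targets")
link("transcript_A", "protein_P", "encodes")
print(proteins_for_probeset("171720_x_at"))  # ['protein_P']
```

Since both hops are ordinary rows, recomputing either mapping changes
the query's answer without touching the probeset or protein entries
themselves.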

This works pretty nicely for us. In fact, it is one of the reasons I
like BioSQL.

	-hilmar

> For the descriptions, I have references to various databases and in
> many cases the chromosome the gene is on and what protein it codes for.
> From what I can tell, there is no simple way to incorporate this data
> about a gene. Features really do not cover this. I could make several
> externalDB refs for the other stuff, but I would like it all to be in
> one place.
>
> Maybe I completely missed the boat. Do people know of any other
> open-source db for this that also has MIAME tables? I really do not
> want to reinvent the wheel here.
>
> Thanks,
> Marc
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------
