[BioSQL-l] Microarrays and BioSQL
Hilmar Lapp
hlapp at gnf.org
Wed Apr 2 11:17:14 EST 2003
On Wednesday, April 2, 2003, at 10:47 AM, Marc Colosimo wrote:
> Like William, I am interested in building a database to sort our Affy
> Microarray data. I came to the conclusion that I cannot use much of
> BioSQL to store our data. It came as a surprise to me that I cannot
> easily incorporate the simplest sequence data. Here is my problem:
>
> Each gene has at least 11 oligos for it. They have the same name, like
> 171720_x_at. I have files for the target sequence (500bp), the sequence
> of each oligo and their positions, and a file that has descriptions of
> the probes. At a minimum I have 13 items, each with the same name.
>
The way I did this here is to treat Affy probesets as bioentries (you
read them in from the FASTA-format target sequences), with the
individual probes (oligos) being features on the probeset (you read
those in from the tab file and associate them with their probeset by
in-memory look-up while you're loading). Note that the names for the
oligos are artificial, since oligos really have no identifier (and
neither do seqfeatures). I leave those probeset bioentries pretty bare
otherwise, since they are nothing more than that: expression reporters.
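
A minimal sketch of that loading step, using only the Python standard
library. It is not the actual loader and does not touch a BioSQL
database. The class names (Probeset, ProbeFeature), the tab-file layout
(probeset name, 1-based start, oligo sequence), the ".probeN" naming
scheme, and the sample sequences are all assumptions made for
illustration.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class ProbeFeature:
    name: str      # artificial name; oligos have no identifier of their own
    start: int     # 1-based start on the target sequence
    end: int       # 1-based inclusive end on the target sequence
    sequence: str


@dataclass
class Probeset:
    name: str
    target_seq: str
    probes: list = field(default_factory=list)


def read_fasta(handle):
    """Read target sequences into a dict keyed by probeset name."""
    entries, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                entries[name] = Probeset(name, "".join(chunks))
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        entries[name] = Probeset(name, "".join(chunks))
    return entries


def attach_probes(probesets, handle):
    """Attach each oligo to its probeset by in-memory look-up."""
    for row in csv.reader(handle, delimiter="\t"):
        probeset_name, start, oligo = row[0], int(row[1]), row[2]
        probeset = probesets.get(probeset_name)
        if probeset is None:
            continue  # oligo for a probeset that was not loaded
        rank = len(probeset.probes) + 1
        probeset.probes.append(
            ProbeFeature(
                name=f"{probeset_name}.probe{rank}",
                start=start,
                end=start + len(oligo) - 1,
                sequence=oligo,
            )
        )


# Made-up sample data, standing in for the FASTA and tab files.
fasta_text = ">171720_x_at\nACGTACGTACGTACGTACGT\nACGTACGTACGTACGTACGT\n"
tab_text = (
    "171720_x_at\t1\tACGTACGTAC\n"
    "171720_x_at\t11\tGTACGTACGT\n"
    "unknown_at\t1\tAAAA\n"
)

probesets = read_fasta(io.StringIO(fasta_text))
attach_probes(probesets, io.StringIO(tab_text))
```

Each Probeset here would become one bioentry and each ProbeFeature one
feature on it; the oligo for unknown_at is skipped because no probeset
of that name was loaded.
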

I then associate the target sequence bioentries with the (fully
annotated) transcript (also a bioentry) they supposedly target via a
bioentry_relationship. Note that this is computed content and is
subject to change according to your current state of knowledge (about
transcripts), and there are different algorithms for actually
establishing that relationship (e.g., just take Affy's annotation, or
blast against UniGene, or map both UniGene and target sequences to the
genome and then go for co-location; the first one is the easiest but
also the worst because it is dated, so we chose to recompute ourselves).

The question of which protein a transcript encodes we solve through
another (computed) bioentry-to-bioentry relationship. You get the idea.
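
A rough sketch of how two such computed relationships chain together,
using an in-memory SQLite database. The tables are simplified stand-ins
and not the BioSQL DDL: the "kind" and "relation" columns, the relation
names "targets" and "encodes", the subject/object direction, and the
transcript and protein names are all assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bioentry (
        bioentry_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        kind        TEXT NOT NULL        -- illustrative column only
    );
    CREATE TABLE bioentry_relationship (
        subject_bioentry_id INTEGER NOT NULL REFERENCES bioentry(bioentry_id),
        object_bioentry_id  INTEGER NOT NULL REFERENCES bioentry(bioentry_id),
        relation            TEXT NOT NULL,   -- illustrative column only
        UNIQUE (subject_bioentry_id, object_bioentry_id, relation)
    );
    """
)


def add_entry(conn, name, kind):
    """Insert one bioentry and return its id."""
    cur = conn.execute(
        "INSERT INTO bioentry (name, kind) VALUES (?, ?)", (name, kind)
    )
    return cur.lastrowid


def replace_relationships(conn, relation, pairs):
    """Drop and reload one kind of computed relationship in a transaction."""
    with conn:
        conn.execute(
            "DELETE FROM bioentry_relationship WHERE relation = ?", (relation,)
        )
        conn.executemany(
            "INSERT INTO bioentry_relationship "
            "(subject_bioentry_id, object_bioentry_id, relation) "
            "VALUES (?, ?, ?)",
            [(subject, obj, relation) for subject, obj in pairs],
        )


def proteins_for_probeset(conn, probeset_name):
    """Follow probeset -> transcript -> protein through both relationships."""
    rows = conn.execute(
        """
        SELECT prot.name
        FROM bioentry ps
        JOIN bioentry_relationship t
          ON t.subject_bioentry_id = ps.bioentry_id AND t.relation = 'targets'
        JOIN bioentry_relationship e
          ON e.subject_bioentry_id = t.object_bioentry_id
         AND e.relation = 'encodes'
        JOIN bioentry prot ON prot.bioentry_id = e.object_bioentry_id
        WHERE ps.name = ?
        ORDER BY prot.name
        """,
        (probeset_name,),
    ).fetchall()
    return [row[0] for row in rows]


probeset = add_entry(conn, "171720_x_at", "probeset")
tx_a = add_entry(conn, "transcript_A", "transcript")
tx_b = add_entry(conn, "transcript_B", "transcript")
prot_a = add_entry(conn, "protein_A", "protein")
prot_b = add_entry(conn, "protein_B", "protein")

replace_relationships(conn, "encodes", [(tx_a, prot_a), (tx_b, prot_b)])

# First mapping run assigns the probeset to transcript_A.
replace_relationships(conn, "targets", [(probeset, tx_a)])
first = proteins_for_probeset(conn, "171720_x_at")

# A later recomputation assigns it to transcript_B instead.
replace_relationships(conn, "targets", [(probeset, tx_b)])
second = proteins_for_probeset(conn, "171720_x_at")
```

Because the relationships live in their own table, recomputing the
mapping replaces rows there without touching the probeset, transcript,
or protein entries themselves.
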

This works pretty nicely for us. It is in fact one of the reasons I
like BioSQL.
-hilmar
> For the descriptions, I have references to various databases and in
> many cases the chromosome the gene is on and what protein it codes
> for. From what I can tell, there is no simple way to incorporate this
> data about a gene. Features really do not cover this. I could make
> several externalDB refs for the other stuff. But, I would like it all
> to be in one place.
>
> Maybe I completely missed the boat. Do people know of any other
> open-source db for this that also has MIAME tables? I really do not
> want to reinvent the wheel here.
>
> Thanks,
> Marc
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------