[BioSQL-l] storing clustering information in seqfeatures tables
Hilmar Lapp
hlapp at gnf.org
Tue May 25 15:26:43 EDT 2004
I store UniGene in BioSQL; UniGene is itself the result of clustering
sequences (i.e., bioentries). What did you want to cluster?
As long as you cluster either features or bioentries, there are
association tables for relating them to one another:
seqfeature_relationship for features and bioentry_relationship for
bioentries. To avoid storing all pairwise relationships in a cluster,
you can store a bioentry cluster the same way UniGene is stored: one
bioentry for the cluster itself and one bioentry for each member, with
each member linked to its cluster by a row in bioentry_relationship.
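A rough sketch of that layout, using Python's sqlite3 module and heavily
simplified stand-in tables rather than the actual BioSQL DDL. The reduced
column sets, the 'cluster member' term name, the accessions, and the
choice of the cluster as the object and the member as the subject of the
relationship are all assumptions made for illustration.

```python
import sqlite3

# Simplified stand-ins for BioSQL tables; the real schema has more
# columns and constraints. All names and accessions are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    accession   TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry_relationship (
    bioentry_relationship_id INTEGER PRIMARY KEY,
    object_bioentry_id  INTEGER NOT NULL REFERENCES bioentry(bioentry_id),
    subject_bioentry_id INTEGER NOT NULL REFERENCES bioentry(bioentry_id),
    term_id             INTEGER NOT NULL REFERENCES term(term_id),
    UNIQUE (object_bioentry_id, subject_bioentry_id, term_id)
);
""")


def add_bioentry(accession):
    """Insert a bioentry and return its primary key."""
    cur = conn.execute(
        "INSERT INTO bioentry (accession) VALUES (?)", (accession,)
    )
    return cur.lastrowid


# Term that types the relationship (hypothetical name).
member_term = conn.execute(
    "INSERT INTO term (name) VALUES ('cluster member')"
).lastrowid

# One bioentry for the cluster itself, one per member sequence.
cluster_id = add_bioentry("CLUSTER_1")
for accession in ["SEQ_A", "SEQ_B", "SEQ_C"]:
    conn.execute(
        "INSERT INTO bioentry_relationship "
        "(object_bioentry_id, subject_bioentry_id, term_id) "
        "VALUES (?, ?, ?)",
        (cluster_id, add_bioentry(accession), member_term),
    )

# List the members of the cluster.
rows = conn.execute(
    "SELECT m.accession "
    "FROM bioentry_relationship r "
    "JOIN bioentry m ON m.bioentry_id = r.subject_bioentry_id "
    "WHERE r.object_bioentry_id = ? "
    "ORDER BY m.accession",
    (cluster_id,),
).fetchall()
cluster_members = [row[0] for row in rows]
print(cluster_members)
```

With n members this stores n relationship rows (member to cluster)
instead of n*(n-1)/2 pairwise rows.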
The only thing you can't do in this scenario right out of the box is to
store the distance to the cluster. I introduced an Evidence table
locally for this purpose. The Evidence table basically has a score, a
foreign key to Bioentry_Relationship, and is typed by a foreign key to
Term. I can add this table to the MySQL/Pg versions immediately if
desired or considered helpful, since it doesn't break any backward
compatibility.
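To give an idea of the shape of such a table, here is a minimal sketch
in Python with sqlite3. Only the three ingredients named above (a score,
a foreign key to Bioentry_Relationship, and a typing foreign key to Term)
come from the description; the column names, the term names, and the
reduced stand-in tables are assumptions, not the actual local DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Reduced stand-ins for the existing BioSQL tables.
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry_relationship (
    bioentry_relationship_id INTEGER PRIMARY KEY,
    object_bioentry_id  INTEGER NOT NULL,
    subject_bioentry_id INTEGER NOT NULL,
    term_id             INTEGER NOT NULL REFERENCES term(term_id)
);
-- Hypothetical evidence table: a score attached to one relationship
-- row, typed by a term (column names are assumptions).
CREATE TABLE evidence (
    evidence_id INTEGER PRIMARY KEY,
    bioentry_relationship_id INTEGER NOT NULL
        REFERENCES bioentry_relationship(bioentry_relationship_id),
    term_id INTEGER NOT NULL REFERENCES term(term_id),
    score   REAL NOT NULL
);
""")

# Terms typing the relationship and the evidence (hypothetical names).
member_term = conn.execute(
    "INSERT INTO term (name) VALUES ('cluster member')"
).lastrowid
distance_term = conn.execute(
    "INSERT INTO term (name) VALUES ('distance to cluster')"
).lastrowid

# Member bioentry 2 belongs to cluster bioentry 1 (IDs are made up).
rel_id = conn.execute(
    "INSERT INTO bioentry_relationship "
    "(object_bioentry_id, subject_bioentry_id, term_id) VALUES (?, ?, ?)",
    (1, 2, member_term),
).lastrowid

# Attach the member's distance to its cluster as evidence.
conn.execute(
    "INSERT INTO evidence (bioentry_relationship_id, term_id, score) "
    "VALUES (?, ?, ?)",
    (rel_id, distance_term, 0.12),
)

# Read back each member with the type and value of its score.
scores = conn.execute(
    "SELECT r.subject_bioentry_id, t.name, e.score "
    "FROM evidence e "
    "JOIN bioentry_relationship r "
    "  ON r.bioentry_relationship_id = e.bioentry_relationship_id "
    "JOIN term t ON t.term_id = e.term_id"
).fetchall()
print(scores)
```

Because the table only references existing rows, the existing tables do
not need to change to accommodate it.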
-hilmar
On Tuesday, May 25, 2004, at 08:56 AM, Kannan Vijayan wrote:
>
> Hi,
>
> I was wondering if anybody has successfully managed to store results of
> clustering programs in the biosql schema, in some clean way. We're currently
> attempting to figure out how to migrate from a home-rolled schema to the
> biosql schema, and this is one feature that, while not currently handled by
> our schema, we would like to be able to handle in the future.
>
> I've only recently started looking at the biosql schema, so I'm not fully up
> to speed on what the best way to do this would be, but I think the
> 'seqfeatures' structures would be particularly appropriate for storing this
> information.
>
> Has anybody done this? I would appreciate any tips that people have
> to offer.
>
> thanks.
> --
> Kannan Vijayan <kvijayan at gene.pbi.nrc.ca>
> Bioinformatics Support Specialist
> National Research Council
> Plant Biotechnology Institute
> 110 Gymnasium Place
> Saskatoon, SK S7N 0W9
> Canada
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------