[BioSQL-l] storing clustering information in seqfeatures tables

Hilmar Lapp hlapp at gnf.org
Tue May 25 15:26:43 EDT 2004


I store UniGene, which is the result of clustering sequences (i.e.,
bioentries), in biosql. What did you want to cluster?

Whether you cluster features or bioentries, there is an association
table that establishes relationships between them:
seqfeature_relationship for features and bioentry_relationship for
bioentries. To avoid storing all pairwise relationships in a cluster,
you can store a bioentry cluster the same way UniGene is stored, namely
as one bioentry for the cluster itself plus one bioentry for each
member of the cluster, with each member linked to its cluster by a row
in bioentry_relationship.
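
As a rough illustration of this layout, here is a minimal Python/sqlite3
sketch. The tables are cut-down stand-ins for the real BioSQL ones
(which have more columns and constraints), the accessions and the
'cluster member' term are made up, and treating the cluster as the
object and each member as the subject of the relationship is an
assumption of the sketch rather than something the schema mandates.

```python
import sqlite3

# Simplified stand-ins for the BioSQL tables (assumption: the real
# tables carry additional columns and constraints omitted here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    accession   TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry_relationship (
    bioentry_relationship_id INTEGER PRIMARY KEY,
    object_bioentry_id  INTEGER NOT NULL REFERENCES bioentry(bioentry_id),
    subject_bioentry_id INTEGER NOT NULL REFERENCES bioentry(bioentry_id),
    term_id             INTEGER NOT NULL REFERENCES term(term_id),
    UNIQUE (object_bioentry_id, subject_bioentry_id, term_id)
);
""")

def add_bioentry(accession):
    """Insert a bioentry and return its primary key."""
    return conn.execute(
        "INSERT INTO bioentry (accession) VALUES (?)", (accession,)
    ).lastrowid

def store_cluster(cluster_accession, member_accessions, term_id):
    """Store one bioentry for the cluster, one per member, and one
    relationship row per member pointing at the cluster."""
    cluster_id = add_bioentry(cluster_accession)
    for accession in member_accessions:
        conn.execute(
            "INSERT INTO bioentry_relationship "
            "(object_bioentry_id, subject_bioentry_id, term_id) "
            "VALUES (?, ?, ?)",
            (cluster_id, add_bioentry(accession), term_id),
        )
    return cluster_id

def cluster_members(cluster_id):
    """Return the accessions of all members linked to the cluster."""
    rows = conn.execute(
        "SELECT b.accession FROM bioentry_relationship r "
        "JOIN bioentry b ON b.bioentry_id = r.subject_bioentry_id "
        "WHERE r.object_bioentry_id = ? ORDER BY b.accession",
        (cluster_id,),
    )
    return [row[0] for row in rows]

# Hypothetical term and accessions, for illustration only.
member_term = conn.execute(
    "INSERT INTO term (name) VALUES ('cluster member')"
).lastrowid
cluster_id = store_cluster("CLUSTER_1", ["SEQ_A", "SEQ_B", "SEQ_C"], member_term)
print(cluster_members(cluster_id))
```

With n members this stores n rows in bioentry_relationship rather than
the n*(n-1)/2 rows that all pairwise relationships would need.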

The only thing you can't do in this scenario right out of the box is
store the distance to the cluster. I introduced an Evidence table
locally for this purpose. The Evidence table basically has a score and
a foreign key to Bioentry_Relationship, and is typed by a foreign key
to Term. I can add this table to the MySQL/Pg versions immediately if
desired or considered helpful, since it doesn't break backward
compatibility.
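
To make the shape of such a table concrete, here is a minimal
Python/sqlite3 sketch. Only the score and the two foreign keys come
from the description above; the column names, the evidence_id key, the
term names, the bare integer bioentry ids and the helper functions are
all hypothetical, and the other tables are again cut-down stand-ins for
the real ones.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE bioentry_relationship (
    bioentry_relationship_id INTEGER PRIMARY KEY,
    object_bioentry_id  INTEGER NOT NULL,
    subject_bioentry_id INTEGER NOT NULL,
    term_id             INTEGER NOT NULL REFERENCES term(term_id)
);
-- Hypothetical layout: a score, a foreign key to the relationship row,
-- and a foreign key to term that types the score.
CREATE TABLE evidence (
    evidence_id INTEGER PRIMARY KEY,
    bioentry_relationship_id INTEGER NOT NULL
        REFERENCES bioentry_relationship(bioentry_relationship_id),
    term_id INTEGER NOT NULL REFERENCES term(term_id),
    score   REAL NOT NULL
);
""")

def add_term(name):
    """Insert a term and return its primary key."""
    return db.execute("INSERT INTO term (name) VALUES (?)", (name,)).lastrowid

def add_scored_member(cluster_id, member_id, rel_term, ev_term, score):
    """Link a member to its cluster and attach a typed score to the link."""
    rel_id = db.execute(
        "INSERT INTO bioentry_relationship "
        "(object_bioentry_id, subject_bioentry_id, term_id) VALUES (?, ?, ?)",
        (cluster_id, member_id, rel_term),
    ).lastrowid
    db.execute(
        "INSERT INTO evidence (bioentry_relationship_id, term_id, score) "
        "VALUES (?, ?, ?)",
        (rel_id, ev_term, score),
    )
    return rel_id

def member_scores(cluster_id, ev_term):
    """Map each member id of a cluster to its score of the given type."""
    rows = db.execute(
        "SELECT r.subject_bioentry_id, e.score "
        "FROM bioentry_relationship r "
        "JOIN evidence e "
        "  ON e.bioentry_relationship_id = r.bioentry_relationship_id "
        "WHERE r.object_bioentry_id = ? AND e.term_id = ?",
        (cluster_id, ev_term),
    )
    return dict(rows.fetchall())

# Made-up terms and ids: bioentry 1 is the cluster, 2 and 3 are members.
member_term = add_term("cluster member")
distance_term = add_term("distance to cluster")
add_scored_member(1, 2, member_term, distance_term, 0.12)
add_scored_member(1, 3, member_term, distance_term, 0.4)
print(member_scores(1, distance_term))
```

Typing the score through term means the same table could hold other
kinds of scores later without a schema change.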

	-hilmar

On Tuesday, May 25, 2004, at 08:56  AM, Kannan Vijayan wrote:

>
> Hi,
>
> I was wondering if anybody has successfully managed to store results
> of clustering programs in the biosql schema, in some clean way.  We're
> currently attempting to figure out how to migrate from a home-rolled
> schema to the biosql schema, and this is one feature that, while not
> currently handled by our schema, we would like to be able to handle in
> the future.
>
> I've only recently started looking at the biosql schema, so I'm not
> fully up to speed on what the best way to do this would be, but I
> think the 'seqfeatures' structures would be particularly appropriate
> for storing this information.
>
> Has anybody done this?  I would appreciate any tips that people have
> to offer.
>
> thanks.
> -- 
> Kannan Vijayan <kvijayan at gene.pbi.nrc.ca>
> Bioinformatics Support Specialist
> National Research Council
> Plant Biotechnology Institute
> 110 Gymnasium Place
> Saskatoon, SK S7N 0W9
> Canada
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



