[BioSQL-l] database extensions

Hilmar Lapp hlapp at gmx.net
Sun Aug 6 18:32:43 UTC 2006


Hi Angel, sorry for the belated response, I was at BOSC. See my  
comments below.

On Aug 3, 2006, at 2:28 PM, Angel Pizarro wrote:

> Hello,
>
> Relatively new to biosql, but I was wondering about a few aspects  
> of the
> schema/project.
>
> First, about the ontology tables, what is the preferred way to map
> ontology annotations to bioentries? via a seqfeature? Currently I just
> added a new table to map GO associations with the evidence code from
> GOA.  Not optimal as there may be multiple lines of evidence for an
> association, as in the godatabase schema.

You link ontology terms to bioentries through the  
bioentry_qualifier_value table, i.e., as a value-less term association.

If you want to capture the evidence code for GO term associations,
you can use the value field in bioentry_qualifier_value to hold
the code. This indeed won't work very well if there are multiple
evidence codes.
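
A minimal sketch of both variants, run against an in-memory SQLite database so it is self-contained. The table definitions are cut-down stand-ins for the real BioSQL tables (an assumption, the actual schema has more columns and constraints), and the accession and IDs are made-up sample data.

```python
import sqlite3

# Simplified stand-ins for the BioSQL tables (assumption: the real
# schema defines more columns and foreign keys than shown here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    accession   TEXT NOT NULL
);
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE bioentry_qualifier_value (
    bioentry_id INTEGER NOT NULL REFERENCES bioentry (bioentry_id),
    term_id     INTEGER NOT NULL REFERENCES term (term_id),
    value       TEXT,
    rank        INTEGER NOT NULL DEFAULT 0,
    UNIQUE (bioentry_id, term_id, rank)
);
""")

# Made-up sample data: one bioentry and two ontology terms.
conn.execute("INSERT INTO bioentry VALUES (1, 'P12345')")
conn.executemany("INSERT INTO term VALUES (?, ?)",
                 [(10, "kinase activity"), (11, "nucleus")])

# Value-less term association: value is simply left NULL.
conn.execute(
    "INSERT INTO bioentry_qualifier_value (bioentry_id, term_id) "
    "VALUES (1, 10)")

# Term association that carries a single evidence code in value.
conn.execute(
    "INSERT INTO bioentry_qualifier_value (bioentry_id, term_id, value) "
    "VALUES (1, 11, 'IDA')")

# List the terms attached to the bioentry, with the evidence code if any.
rows = conn.execute("""
    SELECT t.name, q.value
    FROM bioentry_qualifier_value q
    JOIN term t ON t.term_id = q.term_id
    WHERE q.bioentry_id = 1
    ORDER BY t.term_id
""").fetchall()
print(rows)
```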

You could collapse them into one delimited string, but that will
impair your ability to constrain searches by evidence code. However,
using a LIKE constraint instead of string equality may not make a big
difference, because the value column typically isn't indexed anyway,
given that it may hold large values. At any rate, if you do have
multiple evidence codes and you do want to constrain searches by
evidence code, then there needs to be a better solution.
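
To illustrate the trade-off, here is a small sketch using SQLite from Python. The '|' delimiter, the cut-down table, and the sample rows are all assumptions made for the example, not something BioSQL prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for bioentry_qualifier_value (assumption).
conn.execute("""
    CREATE TABLE bioentry_qualifier_value (
        bioentry_id INTEGER NOT NULL,
        term_id     INTEGER NOT NULL,
        value       TEXT
    )
""")
# Made-up rows: bioentry 1 has two evidence codes collapsed into one
# '|'-delimited string (the delimiter is a hypothetical choice).
conn.executemany(
    "INSERT INTO bioentry_qualifier_value VALUES (?, ?, ?)",
    [(1, 10, "IDA|ISS"), (2, 10, "IEA"), (3, 10, "ISS")])


def exact_match(code):
    """Bioentries whose value column equals the evidence code exactly."""
    rows = conn.execute(
        "SELECT bioentry_id FROM bioentry_qualifier_value "
        "WHERE value = ? ORDER BY bioentry_id", (code,)).fetchall()
    return [r[0] for r in rows]


def like_match(code):
    """Bioentries whose value column contains the evidence code."""
    rows = conn.execute(
        "SELECT bioentry_id FROM bioentry_qualifier_value "
        "WHERE value LIKE '%' || ? || '%' ORDER BY bioentry_id",
        (code,)).fetchall()
    return [r[0] for r in rows]


print(exact_match("ISS"))  # misses the collapsed row for bioentry 1
print(like_match("ISS"))   # finds it via substring matching
```

Note that substring matching would also pick up unintended rows whenever one code happens to be contained in another string, which is one more reason a delimited value is only a stopgap.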

>
> Second, are primary keys up for discussion any time soon? I realize  
> that
> a lot of external projects rely on this schema, so it has to remain
> stable, but the inconsistent use of UID, compound keys or even lack  
> of a
> key really put a hindrance on the use of off-the-shelf ORMs.

Can you elaborate? By now most tables do have a surrogate key.
The only ones that do not are those that serve as association tables
and aren't referenced themselves (only very few association tables
are referenced by foreign key). They still have a unique key
constraint, though.

Just to make sure - you're looking at the CVS check-out version, not  
at 0.1 or something?

>
> Third, how does one go about submitting proposals for schema  
> extensions?
> I am wanting to extend the schema with a few modules, mainly ripped  
> out
> of either  GUS and/or chado, as well as adding a module for  
> proteomics data.

You would send those to the list, ideally accompanied by some  
comments on motivation and why the existing tables can't deal with  
the data the new entities are supposed to capture. That would give  
people a chance to comment.

I enthusiastically welcome proposals for additions especially if  
those help to promote the utility of BioSQL.

>
> Fourth, is the current practice for representation of biological
> pathways and interactions to use the bioentryrelationship table?

Yes, that was my plan when I worked on the Symgene project. I never
got around to implementing it, though, so I don't know how well it
would really work.

I did implement bioentry graphs with the bioentry_relationship table,  
and I had to add an evidence table to accomplish my goals. With that  
it worked very well though.

This is the evidence table; I'll add it in the 1.1 version.

CREATE TABLE Evidence (
        Evidence_Id              INTEGER NOT NULL,
        Score                    NUMBER NULL,
        Last_Modified            DATE DEFAULT SYSDATE NOT NULL,
        Bioentry_Relationship_Id INTEGER NOT NULL,
        Term_Id                  INTEGER NOT NULL,
        DBXref_Id                INTEGER NULL,
        PRIMARY KEY (Evidence_Id),
        UNIQUE (Bioentry_Relationship_Id, Term_Id, DBXref_Id)
);
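
As a rough sketch of how such a table could be used, the following ports the definition to SQLite (run from Python) and attaches two pieces of evidence to one relationship. The SQLite column types, the cut-down bioentry_relationship table, and all IDs and scores are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Cut-down stand-in for bioentry_relationship (assumption).
CREATE TABLE bioentry_relationship (
    bioentry_relationship_id INTEGER PRIMARY KEY,
    object_bioentry_id       INTEGER NOT NULL,
    subject_bioentry_id      INTEGER NOT NULL,
    term_id                  INTEGER NOT NULL
);
-- SQLite rendering of the evidence table; the types and the
-- CURRENT_TIMESTAMP default are SQLite substitutes (assumption).
CREATE TABLE evidence (
    evidence_id              INTEGER PRIMARY KEY,
    score                    REAL,
    last_modified            TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    bioentry_relationship_id INTEGER NOT NULL,
    term_id                  INTEGER NOT NULL,
    dbxref_id                INTEGER,
    UNIQUE (bioentry_relationship_id, term_id, dbxref_id)
);
""")

# One made-up relationship between bioentries 1 and 2.
conn.execute("INSERT INTO bioentry_relationship VALUES (1, 1, 2, 5)")

insert_evidence = (
    "INSERT INTO evidence "
    "(score, bioentry_relationship_id, term_id, dbxref_id) "
    "VALUES (?, ?, ?, ?)")

# Two pieces of evidence of different types (term_id 20 and 21)
# for the same relationship.
conn.execute(insert_evidence, (0.9, 1, 20, 100))
conn.execute(insert_evidence, (0.4, 1, 21, 100))

# A second row with the same relationship, term, and dbxref
# violates the unique key and is rejected.
try:
    conn.execute(insert_evidence, (0.5, 1, 20, 100))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

evidence_count = conn.execute(
    "SELECT COUNT(*) FROM evidence "
    "WHERE bioentry_relationship_id = 1").fetchone()[0]
print(evidence_count, duplicate_rejected)
```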


>
> Many thanks.

You're most welcome.

	-hilmar


>
> -- 
> Angel Pizarro
> Director, Bioinformatics Facility
> Institute for Translational Medicine and Therapeutics
> University of Pennsylvania
> 806 BRB II/III
> 421 Curie Blvd.
> Philadelphia, PA 19104-6160
>
> P: 215-573-3736
> F: 215-573-9004
> E: angel at mail.med.upenn.edu
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================







