[Open-bio-l] BioSQL schema: some questions
Ewan Birney
birney@ebi.ac.uk
Sat, 27 Apr 2002 12:07:44 +0100 (BST)
On Fri, 26 Apr 2002, Chris Mungall wrote:
> > 8) Biosequence has a seq_version. How is that different from
> > Bioentry.entry_version?
>
> pass
Sequence versions and entry versions have different semantics. The sequence
version is the important one and changes when the sequence changes. The entry
version changes on sequence changes and on any other (e.g. annotation)
changes.
Standard EMBL/GenBank stuff.
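A minimal sketch of those two rules, purely for illustration. The Record
class, its field names and the update() helper are hypothetical and are not
part of the BioSQL schema or of any loader.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    # Hypothetical stand-in for a Bioentry plus its Biosequence.
    sequence: str
    annotation: str
    seq_version: int = 1
    entry_version: int = 1


def update(old: Record, sequence: str, annotation: str) -> Record:
    """Return the next revision of a record with both versions adjusted."""
    seq_changed = sequence != old.sequence
    ann_changed = annotation != old.annotation
    return replace(
        old,
        sequence=sequence,
        annotation=annotation,
        # The sequence version moves only when the sequence itself changes.
        seq_version=old.seq_version + (1 if seq_changed else 0),
        # The entry version moves on a sequence change or any other change.
        entry_version=old.entry_version
        + (1 if (seq_changed or ann_changed) else 0),
    )
```

An annotation-only update therefore leaves seq_version alone and bumps
entry_version, while a sequence update bumps both.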
>
> > 9) Why is there molecule in Biosequence (and not in Bioentry)? I.e.,
> > would there be Biosequence entries of different molecule (mRNA, DNA,
> > ...) for a particular Bioentry? If so, this is contradicted by the
> > identifying relationship to Bioentry (there is a UK on the FK).
>
> pass; there was a thread on bioperl a while ago about molecule type vs
> alphabet
>
molecule = where it came from (eg mRNA)
alphabet = how it is encoded (DNA/RNA etc)
> > 10) In Seqfeature all attributes except primary key and FK to Bioentry
> > are nullable. This makes it hard to guarantee a way to uniquely identify
> > a record (other than by PK, which may change from db-load to db-load).
>
> Looking at it from a genbank loading point of view, this makes sense.
>
> If you want features to persist then they should have their own bioentry.
>
> Individual projects may wish to have their own decisions about ways to
> uniquely identify seqfeatures (dbxrefs, 'name' qualifiers) but we can't
> enforce this at the relational level without breaking genbank mode
>
> > 11) Similar for Seqfeature_location: since start and end are nullable,
> > what would be the UK other than the PK? Maybe seqfeature_id and
> > location_rank?
>
> I'll add
> UNIQUE (seqfeature_id, location_rank),
>
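A rough sketch of what that constraint buys, using an in-memory SQLite
database from Python. The column list is cut down, and the names seq_start,
seq_end and add_location() are assumptions made for this example; only
seqfeature_id, location_rank and the UNIQUE clause come from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE seqfeature_location (
        seqfeature_location_id INTEGER PRIMARY KEY,
        seqfeature_id INTEGER NOT NULL,
        seq_start INTEGER,           -- nullable, as in the thread
        seq_end INTEGER,             -- nullable, as in the thread
        location_rank INTEGER NOT NULL,
        UNIQUE (seqfeature_id, location_rank)
    )
    """
)


def add_location(seqfeature_id, location_rank, seq_start=None, seq_end=None):
    """Insert a location; return False if (feature, rank) already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO seqfeature_location "
                "(seqfeature_id, seq_start, seq_end, location_rank) "
                "VALUES (?, ?, ?, ?)",
                (seqfeature_id, seq_start, seq_end, location_rank),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place a location can be identified by (seqfeature_id,
location_rank) even when start and end are both NULL, and a reload that
repeats a rank for the same feature is rejected instead of being duplicated.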
> > 12) Seqfeature_relationship has a PK attribute, but is never referenced.
> > Will someone want to reference it by PK?
>
> Quite possibly.
>
> I had envisioned the semantics of seqfeature_relationship being left open.
> Mostly it will be used to specify compositional relationships.
>
> You could use it in combination with an ontology to specify other kinds of
> relationships (e.g. P-insertion X disrupts gene Y); in some of these
> cases, you may want to record extra information about the relationship
> (e.g. who made the association and when)
>
> > 13) Same for the association table between Dbxref and Ontology_Term
> > (Dbxref_Qualifier_Value).
>
> dbxref_qualifier_value_id isn't really useful as far as I can see
>
> > 14) Same for the association table between Dbxref and Bioentry
> > (Bioentry_Direct_Links; the table name should actually be singular for
> > consistency).
>
> yep
>
> > 15) There is no hierarchy or relationship between ontology terms.
> > Intentional?
>
> this is here - as a separate component, under sql/ontology/
>
> right now the db is built from the components via a makefile, which also
> takes care of mysql/pg conversion.
>
> I think it may be a good idea to further break down the schema into
> components; I don't know if makefiles are the best long term solution for
> specifying how to combine the components.
>
> is there a standard way of specifying this, or shall we make up our own?
>
> Sounds like a good excuse for a tab vs xml war....
>
> > 16) Why is seqfeature_source_id nullable in Seqfeature?
>
> pass
>
Probably an oversight.
> > 17) Aren't Dbxref.dbname and Biodatabase.name redundant? Shouldn't there
> > be a FK?
>
> pass
>
Not all dbxref.dbname values will be a biodatabase.name, in particular in
things like Swiss-Prot, which is going link-tastic very quickly.
> > I'm wondering how I would 'correctly' represent a mapping of, e.g.,
> > Celera transcripts (Bioentries?) onto the Ensembl assembly.
>
> We need a similarity-pair table for this - shall I make one?
>
> How should we deal with scores, e-vals etc? Using a qualifier-value system
> is generic and can be extended for a variety of programs and metrics. But
> then we lose the ability to use floating point arithmetic at the DBMS
> level.
>
> We could have tables:
>
> featurepair_qvalue_float
>
> featurepair_qvalue_int
>
> featurepair_qvalue_text
>
> but this seems a bit ugly.
>
> In gadfly the featurepair table has the common qualifiers (score, e-val,
> qframe, sframe), and a qualifier-value system is used for the less common
> ones. I think this is a good solution; it breaks the generic biosql model
> but querying by e-val is so useful and common I think it's OK
>
> Or do we allow different implementations here?
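A sketch of the gadfly-style compromise described above, again with SQLite
from Python. The common metrics are real numeric columns and everything else
goes into a qualifier-value table. All table names, column names and data
rows here are made up for illustration; they are not taken from gadfly or
from BioSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE featurepair (
        featurepair_id INTEGER PRIMARY KEY,
        query_name TEXT NOT NULL,
        subject_name TEXT NOT NULL,
        score REAL,      -- common qualifiers kept as typed columns
        evalue REAL,
        qframe INTEGER,
        sframe INTEGER
    );
    CREATE TABLE featurepair_qualifier_value (
        featurepair_id INTEGER NOT NULL
            REFERENCES featurepair (featurepair_id),
        qualifier TEXT NOT NULL,   -- less common, program-specific metrics
        value TEXT
    );
    INSERT INTO featurepair VALUES
        (1, 'celera_tx_1', 'ens_contig_A', 250.0, 1e-50, 1, 1),
        (2, 'celera_tx_2', 'ens_contig_B', 40.0, 0.5, 1, -1);
    INSERT INTO featurepair_qualifier_value VALUES
        (1, 'program', 'blastn');
    """
)


def hits_below(max_evalue):
    """Return (query, subject) pairs with an e-value below the cutoff."""
    return conn.execute(
        "SELECT query_name, subject_name FROM featurepair "
        "WHERE evalue < ? ORDER BY evalue",
        (max_evalue,),
    ).fetchall()
```

The e-value cutoff is then a plain numeric comparison in the DBMS, which is
exactly what a text-only qualifier-value table would give up.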
>
> > -hilmar
> >
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------