[BioSQL-l] Consistency between bio* projects
mark.schreiber at group.novartis.com
Sun Jan 16 21:41:15 EST 2005
It would seem that what is needed is a mapping of each field from a file
format to a field in a BioSQL table. I think initially this would only
need to be done for EMBL, SwissProt and GenBank.
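As a rough sketch of what such a mapping could look like in code, the
table below pairs a (file format, source field) key with a (table,
column) target. The three GenBank entries follow the placements
discussed in this thread (locus name in bioentry.name, GI number in
bioentry.identifier) plus the accession column. The source field labels
and the helper name target_column are assumptions made for illustration,
not an agreed contract.

```python
# Illustrative mapping only. The source-field labels are hypothetical;
# the targets follow the placements discussed in this thread.
FIELD_MAP = {
    ("genbank", "LOCUS name"): ("bioentry", "name"),
    ("genbank", "ACCESSION"): ("bioentry", "accession"),
    ("genbank", "GI"): ("bioentry", "identifier"),
}


def target_column(file_format, source_field):
    """Return (table, column) for a source field, or None if unmapped."""
    key = (file_format.lower(), source_field)
    return FIELD_MAP.get(key)


if __name__ == "__main__":
    print(target_column("GenBank", "GI"))
    print(target_column("GenBank", "KEYWORDS"))  # no agreed mapping yet
```

Returning None for unmapped fields makes the gaps explicit, so each
project can see which fields still need a decision.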
In many ways I prefer the idea of developing a SQL API, which would be more
robust and would serve to define what is expected of each procedure call.
However, I think the mapping should be achievable at the schema level too.
In fact there is no reason why both cannot co-exist. For any API there
should be a possible implementation, so naturally the schema could be used
to generate an API. People could then happily make other schemata that fit
the API and are optimised for their needs.
Does anyone have a recent UML or similar diagram for the schema? I can
then use this to suggest mappings from GenBank fields to the API. I think
it may be easier in many cases to follow bioperl's lead. BioJava seems to
follow the 'store everything that isn't a feature as a bioentry_qualifier'
approach so I just need to add some special cases.
Hilmar, would you be prepared to do any work on the BioPerl side for
synchronization of the two?
- Mark
Hilmar Lapp <hlapp at gnf.org>
01/15/2005 01:58 AM
To: Mark Schreiber/GP/Novartis at PH
cc: biosql-l at open-bio.org
Subject: Re: [BioSQL-l] Consistency between bio* projects
On Friday, January 14, 2005, at 01:10 AM,
mark.schreiber at group.novartis.com wrote:
> Unfortunately, Bioperl stores identifiers as
> follows:
>
> Bioentry.bioentry_id is the unique internal reference number
> Bioentry.name is the GI number
The GI number goes to Bioentry.Identifier, which was designated for
storing the identifier within an external database.
Bioentry.name should hold the locus name, which for contigs and many
other entries etc will be identical to the accession (but not the GI
number!).
If you find it in Bioentry.name then I suspect you weren't loading from
genbank or embl formatted input?
> From memory the basic idea of BioSQL was to define a schema that bio*
> projects could both read and write from in a language independent
> manner.
> For reasons best left to the designers (mostly I think because MySQL
> couldn't handle stored procedures) the level of interaction is right
> down at the schema level.
Right. Also, not all database drivers in all languages support stored
procedure calls equally well. In e.g. PostgreSQL and Oracle you can
always get around this by writing a view and putting an INSTEAD OF
INSERT (or UPDATE) trigger on it that will then call the procedure, but
this is clearly not even close to an option in MySQL.
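A minimal sketch of the view-plus-trigger workaround, using Python's
sqlite3 module so that it runs anywhere. SQLite only stands in for
PostgreSQL or Oracle here: it has no stored procedures, so the trigger
body performs the insert directly, where the real thing would call the
procedure. The bioentry table is cut down to four columns, and the view
name bioentry_api and the sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-in for the bioentry table (assumption: 4 columns only).
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    accession   TEXT NOT NULL,
    identifier  TEXT
);

-- Hypothetical API view: clients insert here, never into the table.
CREATE VIEW bioentry_api AS
    SELECT name, accession, identifier FROM bioentry;

-- The trigger intercepts the insert on the view. In PostgreSQL or
-- Oracle this body would call the stored procedure instead.
CREATE TRIGGER bioentry_api_insert
INSTEAD OF INSERT ON bioentry_api
BEGIN
    INSERT INTO bioentry (name, accession, identifier)
    VALUES (NEW.name, NEW.accession, NEW.identifier);
END;
""")

# A client that cannot call procedures still goes through the API.
conn.execute(
    "INSERT INTO bioentry_api (name, accession, identifier) VALUES (?, ?, ?)",
    ("EXAMPLE_LOCUS", "EX000001", "12345"),
)
rows = conn.execute(
    "SELECT bioentry_id, name, accession, identifier FROM bioentry"
).fetchall()
print(rows)
```

The point of the pattern is that the client only issues a plain INSERT,
which every database driver supports, while the rules about where each
value ends up live in one place on the database side.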
It's maybe worth considering opening a dichotomy here between MySQL and
the rest, in order to provide people who need it with a SQL-level API
that both perl and java will use. People who are interested in this by
definition will not be interested in MySQL anyway.
> Unfortunately this means that the way data is stored
> needs to be very consistent between projects if any APIs that use
> BioSQL are to be portable. My biojava API cannot be applied to a DB
> previously set up with bioperl, which was the original idea behind
> BioSQL in the first place.
>
> Help!!
I think you're raising a great point. Indeed, such a contract hasn't
really been written. We're probably one of few who use both perl and
java to access a biosql database (and I'm not using biojava as the
object model on the java side, which is why I'm not running into this
problem). (Note as an aside that you could also write adaptors that
transform between the SymGene and the Biojava model when storing or
retrieving objects from/to the database.)
It'd be great if you were willing to take the lead for getting this all
spelled out and laid down in a document?
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------