Identifiers (was: [Biocorba-l] BSANE and bioCORBA)
Andrew Dalke
dalke@acm.org
Fri, 1 Jun 2001 05:01:34 -0600
> Please look at yesterday's mails on this list from me and Scott. I
> think they answered your questions. Generally, I think we are not
> talking here about using the CORBA NamingService but only about re-using
> its syntax for specifying names.
Ahh, my apologies for not tracking the thread. I'll read over
it tomorrow afternoon and update the BoF Wiki.
> The problem now stands as "what we want to put into identifiers".
[...]
> - set of name components
> - each of the components having a name and a version
> - a "content type" that says:
>     - what object is identified by this identifier (e.g. sequence_aa)
>     - in which format the object's contents are stored (e.g. sequence in
>       FASTA format)
The wiki page at
http://www.biopython.org/wiki/html/BioPython/NamingBoF.html
also lists a few other possible names, such as names for parts of
formats if you want to do content negotiation. (E.g., "give me the PDB
file with only the header information" or "give me this in HTML and
use tables if you can generate them.")
Again, I apologize if this was in yesterday's thread - I'm writing
this just before going to sleep and that's too much research for
now :) - but do you have any use cases involving versions of
different components that I could put on the Wiki page? The
only ones I could think of (other than database version and
record version) were contrived.
The closest I could think of was "I want to use the 1993 definition
of LOCUS over the 2001 definition", as in
db/locus;version=1993/QWERTY
but I have no idea if that example makes any sense - though I
believe it does not.
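To make the shape of such an identifier concrete, here is a minimal
sketch of how it might be split into name components, each with an
optional version. The ";version=" parameter syntax is taken from the
example above; the NameComponent type and the parse_identifier function
are hypothetical names used only for illustration, not part of any
agreed BSANE or bioCORBA API.

```python
from typing import List, NamedTuple, Optional


class NameComponent(NamedTuple):
    """One slash-separated piece of an identifier (hypothetical type)."""
    name: str
    version: Optional[str] = None


def parse_identifier(identifier: str) -> List[NameComponent]:
    """Split 'a/b;version=X/c' into components with optional versions.

    Assumes '/' separates components and ';key=value' attaches
    parameters to a component; only 'version' is recognized here.
    """
    components = []
    for part in identifier.split("/"):
        name, _, params = part.partition(";")
        version = None
        for param in params.split(";") if params else []:
            key, _, value = param.partition("=")
            if key == "version":
                version = value
        components.append(NameComponent(name, version))
    return components


parts = parse_identifier("db/locus;version=1993/QWERTY")
for component in parts:
    print(component.name, component.version)
```

Only the "locus" component carries a version here; whether versioning
individual components like this is actually useful is exactly the open
question.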
> I am not saying that we have agreed on this summary.
Well, that's why discussion is needed - given how many times
this topic has come up over the last few years in various forums,
if it were obvious it would have been done already. :)
> What else, or what less, do you want to have in the identifier?
One thing I forgot to add on the Wiki is the relation of the
name to XML. When viewed in that respect, you could consider
the data file as an XML database and do a record lookup for
"ID QWERTY" using a query for records containing the element
"ID" whose content is "QWERTY".
This view would then be generalizable to queries like "all
records from 'gallus gallus' which have more than 400 bases."
In that case, each piece of data in a record (like the organism
field) can be considered a branch in the hierarchy, and each of
these branches would need to be named.
But I consider that entirely too complicated for the first
pass through (or second or even the third).
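For reference, the XML view described above can be sketched with
Python's standard xml.etree.ElementTree module. The element names
("records", "record", "ID", "organism", "sequence"), the sample data,
and the two helper functions are all assumptions made up for this
illustration; no such schema has been proposed.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of two flat-file records.
DATA = """
<records>
  <record>
    <ID>QWERTY</ID>
    <organism>Gallus gallus</organism>
    <sequence>ACGT</sequence>
  </record>
  <record>
    <ID>ASDFGH</ID>
    <organism>Homo sapiens</organism>
    <sequence>ACGTACGT</sequence>
  </record>
</records>
"""

root = ET.fromstring(DATA)


def find_by_id(root, record_id):
    """Records containing an <ID> element whose content is record_id."""
    return root.findall("record[ID='%s']" % record_id)


def find_by_organism_and_length(root, organism, min_bases):
    """Records from one organism with more than min_bases bases."""
    return [
        record
        for record in root.findall("record")
        if record.findtext("organism") == organism
        and len(record.findtext("sequence", default="")) > min_bases
    ]


for record in find_by_id(root, "QWERTY"):
    print(record.findtext("ID"), record.findtext("organism"))
```

The simple ID lookup maps directly onto a path expression, while the
"more than N bases" query already needs code beyond what the path
syntax offers, which hints at why the generalized form gets
complicated quickly.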
I'll update the Wiki to include this comparison.
Andrew
dalke@acm.org