Identifiers (was: [Biocorba-l] BSANE and bioCORBA)
Andrew Dalke
dalke@acm.org
Thu, 31 May 2001 03:53:47 -0600
Ewan:
>Andrew - more grist to your mill - what do you think of URIs vs
>NamingService stuff? I am sure Martin or Juha can point you to a Naming
>Service doc thing...
I would like to see more information about NamingService. The
only thing I've seen so far is part of the BSA specification
which discussed purely the syntax but didn't go into a description
of how to apply that syntax. (I would like to see some examples
with a database name and record identifier in it.)
I think the way to do this is to come up with a simple, abstract
scheme and define ways to serialize that abstraction to URIs
or a CORBA Name. (I think for now simple is better than
complete because I want to get code up and running and learn
from experience what's needed.)
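
To make "abstract scheme plus serializers" concrete, here is a minimal Python sketch. The RecordName class, its three fields, the "bio" prefix and the slash-separated output are all assumptions made for illustration; in particular the slash form is not real CORBA Name syntax, only a stand-in for a second serialization.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class RecordName:
    """Hypothetical abstract name: database, namespace and record id."""
    database: str    # e.g. "embl"
    namespace: str   # e.g. "ID"
    record: str      # e.g. "QWERTY"

    def parts(self):
        return (self.database, self.namespace, self.record)

    def _escaped(self):
        # Percent-escape every part so a "/" inside a part cannot be
        # mistaken for the separator between parts.
        return [quote(p, safe="") for p in self.parts()]

    def to_uri(self, scheme="bio"):
        # URI-style serialization, e.g. bio:embl/ID/QWERTY
        return scheme + ":" + "/".join(self._escaped())

    def to_slash_name(self, root="bio"):
        # Assumed slash-separated serialization, e.g. bio/embl/ID/QWERTY
        return "/".join([root] + self._escaped())


name = RecordName("embl", "ID", "QWERTY")
print(name.to_uri())         # bio:embl/ID/QWERTY
print(name.to_slash_name())  # bio/embl/ID/QWERTY
```

Both strings come from the same three fields, so equality can be tested on the abstract name rather than on either string form.
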
Martin:
>> What URI has and NamingService syntax does not have:
>> - a protocol (together with a port number)
>> - it allows non-escaped dots (ie. as used in the hostnames)
On the first, yes and no. It's a matter of how you do the
serialization to a string. Why couldn't you have
/port.80/ (or /port./ for the default)?
Also, hostnames and ports are too location specific and not
generic - names should be resolved to locations. Imagine the
machine goes down but there's a functionally identical server
running on a different machine, or even off a different port.
You'd get the same results pointing to either location.
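
A rough Python sketch of what "names resolve to locations" could look like. The registry contents, the example.org hosts and the is_up callback are all made up for illustration; a real resolver would need an actual liveness check.

```python
def resolve(name, registry, is_up):
    """Return the first reachable location registered under `name`."""
    for location in registry.get(name, []):
        if is_up(location):
            return location
    raise LookupError("no reachable server for " + name)


# Hypothetical registry: one name, two functionally identical servers.
REGISTRY = {
    "bio:embl": ["primary.example.org:80", "mirror.example.org:8080"],
}

# Pretend the primary machine has gone down.
down = {"primary.example.org:80"}
location = resolve("bio:embl", REGISTRY, lambda loc: loc not in down)
print(location)  # mirror.example.org:8080
```

The name "bio:embl" never changes; only the location it resolves to does.
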
>> What URI has not and NS has:
>> - versioning of each name part
Again, version information *could* be kept by an appropriately
encoded URI, such as
embl/_version/123/id/QWERTY/_version/2
for QWERTY.2 from embl release 123
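
A small Python sketch of that encoding, assuming "_version" is reserved and never appears as a real name part. The function names and the (name, version) pair representation are my own choices for illustration.

```python
def encode(components):
    """Join (name, version-or-None) pairs into a path string."""
    segments = []
    for name, version in components:
        segments.append(name)
        if version is not None:
            segments += ["_version", str(version)]
    return "/".join(segments)


def decode(path):
    """Split a path back into (name, version-or-None) pairs.

    Versions come back as strings, since the path carries no type info.
    """
    segments = path.split("/")
    components = []
    i = 0
    while i < len(segments):
        name = segments[i]
        # A name is versioned if it is followed by "_version" and a value.
        if i + 2 < len(segments) and segments[i + 1] == "_version":
            components.append((name, segments[i + 2]))
            i += 3
        else:
            components.append((name, None))
            i += 1
    return components


path = encode([("embl", 123), ("id", None), ("QWERTY", 2)])
print(path)          # embl/_version/123/id/QWERTY/_version/2
print(decode(path))  # [('embl', '123'), ('id', None), ('QWERTY', '2')]
```
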
Also, a close reading of RFC 2396 (Uniform Resource Identifiers
(URI): Generic Syntax) brings me to section 3.3 titled 'Path Component'.
    path          = [ abs_path | opaque_part ]
    path_segments = segment *( "/" segment )
    segment       = *pchar *( ";" param )
    param         = *pchar
so there is a way to store parameter information in a URI,
which could be version information, as in
embl;version=123/id/QWERTY;version=2
This depends on my reading of the RFC being correct, and
I have no external source to correct me on that. (I've not
seen anyone use this feature.)
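
For what it's worth, pulling the parameters back out is only a few lines of Python. This sketch assumes each param is written as key=value, which is my convention and not something the RFC grammar requires (it only says a param is a run of pchars).

```python
def parse_segment(segment):
    """Split one path segment into its name and a dict of ;key=value params."""
    name, *params = segment.split(";")
    parsed = {}
    for param in params:
        # Assumed key=value convention; a bare param gets an empty value.
        key, _, value = param.partition("=")
        parsed[key] = value
    return name, parsed


def parse_path(path):
    """Parse every "/"-separated segment of a relative path."""
    return [parse_segment(segment) for segment in path.split("/")]


result = parse_path("embl;version=123/id/QWERTY;version=2")
print(result)
# [('embl', {'version': '123'}), ('id', {}), ('QWERTY', {'version': '2'})]
```
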
>> Is there anything else what make them different?
>> [ When we know differences we can start to compare them.]
They are not different at all in the abstract sense that
each can, with proper escaping, be stored in the other. I
could store a NS as a url-encoded opaque string, as in
"fakens:" + urlescape("some.23/name.12/here.")
and vice versa.
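
In Python that round trip might look like the sketch below. The "fakens:" scheme is the made-up one from above, and the function names are mine; quote() and unquote() are the standard urllib.parse functions.

```python
from urllib.parse import quote, unquote

PREFIX = "fakens:"  # made-up scheme, as in the text above


def ns_to_uri(ns_name):
    # safe="" makes quote() escape "/" as well, so the whole name
    # becomes a single opaque part after the scheme.
    return PREFIX + quote(ns_name, safe="")


def uri_to_ns(uri):
    if not uri.startswith(PREFIX):
        raise ValueError("not a fakens URI: " + uri)
    return unquote(uri[len(PREFIX):])


uri = ns_to_uri("some.23/name.12/here.")
print(uri)             # fakens:some.23%2Fname.12%2Fhere.
print(uri_to_ns(uri))  # some.23/name.12/here.
```
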
The only advantages left to compare, then, are:
 - ease of programming
 - ease of human understanding of the raw string
(I lie, because ease of testing for equality is
also important.)
As a test case, take the task of finding the name
for a record where the name is embedded in some other text.
For starters
http://spam/
is obviously a URL (and my mail program automatically
underlined it in blue) and 'dalke@acm.org' is obviously
an email. So a URI like bio:embl/ID/QWERTY is easy to
identify as a reference to a database. I could easily
write a scanner to find and highlight or extract those
names.
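
Such a scanner could be sketched in Python as follows. The "bio:" scheme is the hypothetical one from above, and the set of characters allowed inside a name is my guess, not anything agreed on.

```python
import re

# Assumed character set for name parts; adjust once a real syntax exists.
BIO_URI = re.compile(r"\bbio:[A-Za-z0-9_.;=%-]+(?:/[A-Za-z0-9_.;=%-]+)*")


def find_names(text):
    """Return every bio: name found in free text."""
    # Strip trailing "." or ";" so sentence punctuation is not swallowed.
    return [m.group(0).rstrip(".;") for m in BIO_URI.finditer(text)]


text = "See bio:embl/ID/QWERTY, also http://spam/ and dalke@acm.org."
print(find_names(text))  # ['bio:embl/ID/QWERTY']
```
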
I don't know enough about Corba's Naming Service to
know what the names look like. If they are like
bio/embl/ID/QWERTY
then the search can be done by looking for words prefixed
with "bio/" as compared to "bio:". Stick in some dots
as needed :)
An advantage of URIs is integration with other software.
Code that understands URLs already handles:
http:
ftp:
gopher:
https:
and others, so such code is usually configurable to allow
different schemes to be added easily.
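
As one example, Python's urllib.parse.urlsplit splits off the scheme of any URI, not only the ones it knows. The sketch below hangs a small dispatch table on that; the "bio" scheme, the handler functions and the register/dispatch names are hypothetical.

```python
from urllib.parse import urlsplit

HANDLERS = {}  # scheme -> function taking the split URI


def register(scheme, handler):
    HANDLERS[scheme] = handler


def dispatch(uri):
    parts = urlsplit(uri)
    return HANDLERS[parts.scheme](parts)


# An existing scheme, plus a new one added with a single call.
register("http", lambda p: ("web page", p.netloc, p.path))
register("bio", lambda p: ("database record", p.path.split("/")))

print(dispatch("http://spam/"))        # ('web page', 'spam', '/')
print(dispatch("bio:embl/ID/QWERTY"))  # ('database record', ['embl', 'ID', 'QWERTY'])
```
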
Because of the prominence of the web and URLs, I think
people are already used to seeing URL-like names, which
is another reason I like using URIs as the naming framework.
But again, in either case it's a string representation of
an abstract name, so both naming schemes could be used
and even intertranslated as need be.
Andrew
dalke@acm.org