Identifiers (was: [Biocorba-l] BSANE and bioCORBA)

Andrew Dalke dalke@acm.org
Thu, 31 May 2001 03:53:47 -0600


Ewan:
>Andrew - more grist to your mill - what do you think of URIs vs
>NamingService stuff? I am sure Martin or Juha can point you to a Naming
>Service doc thing...

I would like to see more information about NamingService.  The
only thing I've seen so far is part of the BSA specification
which discussed purely the syntax but didn't go into a description
of how to apply that syntax.  (I would like to see some examples
with a database name and record identifier in it.)

I think the way to do this is come up with a simple, abstract
scheme and define ways to serialize that abstraction to URIs
or a CORBA Name.  (I think for now simple is better than
complete because I want to get code up and running and learn
from experience what's needed.)
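
As a rough illustration of "one abstraction, two serializations", here is
a minimal Python sketch.  The Part class, the "bio" scheme, the
";version=" parameter convention and the dotted "text.version" form are
all assumptions made for the example; in particular the dotted form is
only modelled on strings like "some.23/name.12/here." and is not taken
from the CORBA specification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
from urllib.parse import quote


@dataclass(frozen=True)
class Part:
    """One component of an abstract name, with an optional version."""
    text: str
    version: Optional[str] = None


def to_uri(parts: Tuple[Part, ...], scheme: str = "bio") -> str:
    """Serialize to a URI, carrying versions as ';version=' parameters."""
    segments = []
    for part in parts:
        segment = quote(part.text, safe="")
        if part.version is not None:
            segment += ";version=" + quote(part.version, safe="")
        segments.append(segment)
    return scheme + ":" + "/".join(segments)


def to_dotted(parts: Tuple[Part, ...]) -> str:
    """Serialize to 'text.version/...' with backslash escapes (assumed syntax)."""
    def esc(s: str) -> str:
        # Escape the characters that have structural meaning in this form.
        return "".join("\\" + c if c in "./\\" else c for c in s)
    return "/".join(esc(p.text) + "." + esc(p.version or "") for p in parts)


# QWERTY version 2 from embl release 123, as one abstract name.
name = (Part("embl", "123"), Part("id"), Part("QWERTY", "2"))
print(to_uri(name))     # bio:embl;version=123/id/QWERTY;version=2
print(to_dotted(name))  # embl.123/id./QWERTY.2
```

Both strings come from the same tuple of parts, so comparing names for
equality can be done on the abstraction rather than on either string.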

Martin:
>>    What URI has and NamingService syntax does not have:
>>       - a protocol (together with a port number)
>>       - it allows non-escaped dots (ie. as used in the hostnames)

On the first, yes and no.  It's a matter of how you do the
serialization to a string.  Why couldn't you have

/port.80/ (or /port./ for the default) ?

Also, hostnames and ports are too location specific and not
generic - names should be resolved to locations.  Imagine the
machine goes down but there's a functionally identical server
running on a different machine, or even off a different port.
You'd get the same results pointing to either location.
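
A small Python sketch of resolving a name to a location instead of
putting the host and port in the name.  The registry contents, the
example.org hosts and the is_alive callback are hypothetical; a real
resolver would get them from somewhere else.

```python
def resolve(name, registry, is_alive):
    """Return the first live location registered for name, or None."""
    for location in registry.get(name, []):
        if is_alive(location):
            return location
    return None


# Hypothetical registry: one name, two functionally identical servers.
registry = {
    "bio:embl": [
        "http://primary.example.org:80/",
        "http://mirror.example.org:8080/",
    ],
}

# Pretend the primary machine is down.
down = {"http://primary.example.org:80/"}
print(resolve("bio:embl", registry, lambda loc: loc not in down))
```

The name stays the same whichever machine answers; only the lookup
result changes.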

>>    What URI has not and NS has:
>>       - versioning of each name part

Again, version information *could* be kept by an appropriately
encoded URI:

embl/_version/123/id/QWERTY/_version/2

for QWERTY.2 from embl release 123

Also, a close reading of RFC 2396 (Uniform Resource Identifiers
(URI): Generic Syntax) brings me to section 3.3 titled 'Path Component'. 

      path          = [ abs_path | opaque_part ]

      path_segments = segment *( "/" segment )
      segment       = *pchar *( ";" param )
      param         = *pchar

so there is a way to store parameter information in a URI,
which could be version information, as in

  embl;version=123/id/QWERTY;version=2

This depends on my reading of the RFC being correct, and
I have no external source to correct me on that.  (I've not
seen anyone use this feature.)
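
To make the parameter form concrete, here is a minimal Python sketch that
splits such a path into segments and their parameters.  The RFC grammar
only says a param is a run of characters, so treating each one as
"key=value" is an assumption of this example, as is the parse_path name.

```python
from urllib.parse import unquote


def parse_path(path):
    """Split 'seg;key=value/seg' into a list of (segment, params) pairs."""
    result = []
    for raw in path.split("/"):
        # Everything before the first ';' is the segment, the rest are params.
        name, *raw_params = raw.split(";")
        params = {}
        for item in raw_params:
            key, _, value = item.partition("=")
            params[unquote(key)] = unquote(value)
        result.append((unquote(name), params))
    return result


for segment, params in parse_path("embl;version=123/id/QWERTY;version=2"):
    print(segment, params)
```

Splitting on "/" and ";" before unescaping means an escaped "%2F" or
"%3B" inside a segment is not mistaken for a separator.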


>>    Is there anything else what make them different?
>>    [ When we know differences we can start to compare them.]

They are not different at all in the abstract sense that
each can, with proper escaping, be stored in the other.  I
could store a NS as a url-encoded opaque string, as in

  "fakens:" + urlescape("some.23/name.12/here.")

and vice versa.
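
A minimal Python sketch of that wrapping, using the standard library's
urllib.parse.quote as the "urlescape" step.  The "fakens:" prefix is the
made-up scheme from the example above, and the function names are
invented for illustration.

```python
from urllib.parse import quote, unquote

PREFIX = "fakens:"  # made-up scheme name, not a registered one


def ns_to_uri(ns_name):
    """Wrap a naming-service string as an opaque, fully escaped URI."""
    return PREFIX + quote(ns_name, safe="")


def uri_to_ns(uri):
    """Recover the original naming-service string from the wrapped URI."""
    if not uri.startswith(PREFIX):
        raise ValueError("not a %s URI: %r" % (PREFIX, uri))
    return unquote(uri[len(PREFIX):])


wrapped = ns_to_uri("some.23/name.12/here.")
print(wrapped)  # fakens:some.23%2Fname.12%2Fhere.
assert uri_to_ns(wrapped) == "some.23/name.12/here."
```

Passing safe="" makes quote() escape "/" as well, so the whole name
travels as one opaque part and round-trips unchanged.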

The only advantages left to compare, then, are:

  ease of programming
  ease of human understanding of the raw string

(I lie because ease of testing for equality is
also important.)

Take as a case scenario the task of finding the name
for a record where the name is embedded in some other text.

For starters
  http://spam/
is obviously a URL (and my mail program automatically
underlined it in blue) and 'dalke@acm.org' is obviously
an email.  So a URI like bio:embl/ID/QWERTY is easy to
identify as a reference to a database.  I could easily
write a scanner to find and highlight or extract those
names.

I don't know enough about Corba's Naming Service to
know what the names look like.  If they are like
  bio/embl/ID/QWERTY
then the search can be done by looking for words prefixed
with "bio/" as compared to "bio:".  Stick in some dots
as needed :)
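
A scanner of that kind might look like the following Python sketch.  The
"bio:" scheme and the set of characters allowed in a segment are
assumptions; the pattern is deliberately simple and will pick up a
trailing "." if the name ends a sentence.

```python
import re

# "bio:" followed by one or more "/"-separated segments.  The allowed
# characters are a guess at what database and record names contain.
BIO_URI = re.compile(r"\bbio:[A-Za-z0-9_.;=%-]+(?:/[A-Za-z0-9_.;=%-]+)*")


def find_names(text):
    """Return every bio: name embedded in free text, in order."""
    return BIO_URI.findall(text)


text = "See bio:embl/ID/QWERTY and mail dalke@acm.org, or http://spam/ for details."
print(find_names(text))  # ['bio:embl/ID/QWERTY']
```

Swapping the "bio:" prefix in the pattern for "bio/" gives the
equivalent scanner for the slash-prefixed form.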


An advantage of URIs is integration with other software.
Code that understands URLs already handles:
  http:
  ftp:
  gopher:
  https:
and others, so it is usually configurable to allow
different schemes to be added easily.

Because of the prominence of the web and URLs, I think
people are already used to seeing URL-like names, which
is another reason I like using URIs as the naming framework.

But again, in either case it's a string representation of
an abstract name, so both naming schemes could be used
and even intertranslated as need be.

                    Andrew
                    dalke@acm.org