[DAS2] Why use URIs for feature IDs?
Andrew Dalke
dalke at dalkescientific.com
Wed Feb 8 19:24:06 UTC 2006
Yes. I like URLs. I've been so in favor of URLs that until
this morning I had in the spec that the "id" *is* the URL.
There was no short form for the URL. (There still /is/ no short
form, since the spec hasn't changed ;)
That meant several things:
- everyone needs to disambiguate through the xml:base to
figure out if two features are the same. (Neither Gregg nor
Thomas liked that)
- queries of the style we are doing become more complex
(type=http://www.server/path/to/das/type/000A956826C8 vs.
type=000A956826C8 )
- passing URLs about makes for bigger XML, hence slower.
The first is technical. The second is emotional - that sort of
query looks ugly. The last is .. I can't speak for the last.
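As a rough illustration of the second point, here is a small Python
sketch of the two query styles once the value is URL-escaped. The type
URL and the short id are the ones from the example above; building the
query string with urllib is only an assumption made for illustration,
not something the spec prescribes.

```python
from urllib.parse import urlencode

# Both values are taken from the example in the list above.
full_url = "http://www.server/path/to/das/type/000A956826C8"
short_id = "000A956826C8"

# urlencode percent-escapes ':' and '/' in the value, which is part
# of why the URL form of the query looks so much heavier.
query_with_url = urlencode({"type": full_url})
query_with_id = urlencode({"type": short_id})

print(query_with_url)
print(query_with_id)
print(len(query_with_url), "vs", len(query_with_id), "characters")
```

Both queries name the same type; the difference is length and
readability, not expressive power.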
In an earlier email I showed how a different site layout can
be as efficient as any id scheme. Quickly, use
http://www.../volvox/1/S <- versioned source URL
http://www.../volvox/1/T?.. <- types query url
http://www.../volvox/1/T001 <- type urls
http://www.../volvox/1/F?.. <- feature query urls
http://www.../volvox/1/F001 <- feature urls
and don't worry about any sort of hierarchy in the system.
Everything has the xml:base of "http://www.../volvox/1/"
so relative URLs are trivial strings.
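A minimal Python sketch of how a client might resolve such relative
URLs against the xml:base, and decide whether two references name the
same thing. The host "example.org" is a placeholder I made up (the
real host is elided above), and resolve() and same_identifier() are
hypothetical helper names, not part of any DAS library.

```python
from urllib.parse import urljoin

# Hypothetical base; stands in for the elided "http://www.../volvox/1/".
XML_BASE = "http://example.org/das/volvox/1/"

def resolve(ref, base=XML_BASE):
    """Turn a relative reference such as 'F001' into a full URL."""
    return urljoin(base, ref)

def same_identifier(ref_a, base_a, ref_b, base_b):
    """Two references are the same if they resolve to the same URL."""
    return urljoin(base_a, ref_a) == urljoin(base_b, ref_b)

print(resolve("T001"))
print(resolve("F001"))

# A short relative form and a fully-qualified form of the same URL.
print(same_identifier(
    "F001", XML_BASE,
    "http://example.org/das/volvox/1/F001", "http://other.example/",
))
```

With a flat layout like this the relative form is just the last path
segment, so it is as short as any separate id would be.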
Several said "just chop off the last bit of the URL to get
the id" or "combine some base feature URL with the feature
id to get the full URL."
Why is that useful? Lincoln said on today's phone call that
he wants both a URL and an id, and expected that both would
be there.
I'm now going to be either stubborn or irritating or both.
Why have an id at all? That is, why have a short string
(say of the form /[A-Za-z0-9_]/) when the URL is there and
meets all the functional requirements of an identifier?
(I'll use 'id' to refer to a short string, 'url' to refer to
a URL. Both are identifiers. I should be using 'uri' for
the latter, I know. See comment below.)
Today I thought I came up with one reason to have ids and
to have a nonexistent URL for a <FEATURE> element. I
think now that I was wrong.
My use case was for uploading data to the Ensembl viewer
to display a new DAS track. Put all of the types into one
file, in the types XML format. Put all of the features into
another file in a features XML format. Use arbitrary ids for
cross referencing, because there is no URL for them - they
don't exist in any form outside the document.
Upload them to the server. The server reassembles the
annotations by cross referencing the ids.
I now see that that's a mistake. As Gregg corrected me,
they use URIs not just URLs. They could use
"das_private:ABC123" or a fully-qualified URL or an
xml:base and the partial URL or whatever scheme. All
the server needs to know is how to compare the two URI
strings. It's free to rename the strings if need be.
(Could it keep the original URLs? Perhaps, but the
original data might not be accessible. Consider an
exon predictor whose output you want to upload to the
Ensembl viewer. There is no URL for that.)
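Here is a sketch of what the server side of that upload could look
like, assuming the types and features have already been parsed out of
the two files. The function name reassemble(), the data shapes, the
"exon" type name and the public base URL are all made up for
illustration; only the "das_private:" style of URI comes from the
discussion above. Comparison is plain string equality, with no URI
normalization.

```python
from itertools import count

def reassemble(types, features, public_base):
    """Cross-reference features to types by URI string, then rename.

    types:    dict mapping a type URI to a type name
    features: list of (feature URI, type URI) pairs
    """
    counter = count(1)
    renamed = {}

    def rename(uri):
        # The server is free to replace uploaded URIs with its own,
        # as long as equal inputs map to equal outputs.
        if uri not in renamed:
            renamed[uri] = f"{public_base}{next(counter):04d}"
        return renamed[uri]

    annotations = []
    for feature_uri, type_uri in features:
        if type_uri not in types:
            raise KeyError(f"feature {feature_uri} has unknown type {type_uri}")
        annotations.append({
            "uri": rename(feature_uri),
            "type": rename(type_uri),
            "type_name": types[type_uri],
        })
    return annotations

# Hypothetical upload: private URIs that exist only in the documents.
types = {"das_private:T1": "exon"}
features = [
    ("das_private:ABC123", "das_private:T1"),
    ("das_private:ABC124", "das_private:T1"),
]
annotations = reassemble(types, features, "http://example.org/das/upload/")
for annotation in annotations:
    print(annotation)
```

Nothing here needs a short id distinct from the URI; the opaque URI
string does the cross-referencing on its own.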
Given that this isn't a valid use case for having an 'id'
and not having a 'url', now I ask again: what's the point of
having *both* a unique URL and a unique 'id' for the elements?
Tradition? Elegance?
With Dave Howorth's comment about the specialness of 'id'
I can see changing the attribute name to 'url'.... or 'uri'.
I've got to write a couple paragraphs for Nomi now.
I'll leave with the following comment from
http://tbray.org/ongoing/When/200x/2006/01/08/No-New-XML-Languages
> Designing XML Languages is hard. It’s boring, political,
> time-consuming, unglamorous, irritating work. It always takes longer
> than you think it will, and when you’re finished, there’s always this
> feeling that you could have done more or should have done less or got
> some detail essentially wrong.
Andrew
dalke at dalkescientific.com