[DAS] Re: das/2 proposal status
Andrew Dalke
dalke at dalkescientific.com
Fri Oct 1 12:30:08 EDT 2004
Dave Howorth wrote:
> XML dates are defined in <http://www.w3.org/TR/xmlschema-2/>, broadly
> as 1999-10-26 for a date or 2000-03-04T20:00:00Z for a dateTime. I
> would suggest mandating the canonical representations.
I know the ISO date formats less well than I do the RFC date formats,
in part because all ISO specs cost money and so aren't freely available
on the web. Thus when reading the XML schema documentation, where it says
    [Definition:] date represents a calendar date. The *value space*
    of date is the set of Gregorian calendar dates as defined in § 5.2.1
    of [ISO 8601]. Specifically, it is a set of one-day long,
    non-periodic instances e.g. lexical 1999-10-26 to represent the
    calendar date 1999-10-26, independent of how many hours this day has.
it's hard for me to know what that means. Is 1999-1-3 allowed for
the 3rd of January? Elsewhere in the documentation it suggests the
answer is no, that all months and days are two digits in length. But
there are comments like
    See ISO 8601 Date and Time Formats (§D) for details about legal
    values in the various fields.
which defer final say to that spec.
I'm not saying that we shouldn't use ISO 8601. I'm complaining
because I don't have enough knowledge of it to make a judgment
and the information needed for clarification isn't available.
> This format has several advantages over the earlier complex textual
> ones:
> * Dates can be compared directly as strings with no need for parsing,
> * Dates are easier to parse when it is necessary,
> * They don't require non-English speakers to learn abbreviations,
> * XML defines rules for interpretation and comparison.
Date comparisons do require parsing. The XML schema spec modifies
the ISO spec to allow years past 9999 (see "3.2.9.1 Lexical
representation"). This is done by allowing extra digits to the left
of the ISO year spec. Thus to support the XML Schema datetime
type, an implementation must allow the date 16293-03-02, which
is lexicographically before 2000-01-01.
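
A minimal Python sketch of that ordering problem. The helper name
year_of is hypothetical, and it assumes a well-formed, non-negative
date with no validation at all:

```python
# Plain string comparison puts a five-digit year before a four-digit one.
far_future = "16293-03-02"
millennium = "2000-01-01"

# Compares character by character, so "1" < "2" decides the result.
string_order_says_earlier = far_future < millennium


def year_of(lexical_date):
    """Return the year of a CCYY-MM-DD style date as an int.

    Hypothetical helper for illustration only; it assumes a
    well-formed, non-negative date and does no validation.
    """
    return int(lexical_date.split("-")[0])


# Comparing the parsed years gives the chronologically correct answer.
numeric_order_says_earlier = year_of(far_future) < year_of(millennium)

print(string_order_says_earlier)   # True, which is chronologically wrong
print(numeric_order_says_earlier)  # False
```

String comparison is only safe once every year is known to have
exactly four digits.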
Python's standard datetime only handles proleptic Gregorian dates
from 0001-01-01 to 9999-12-31. I would prefer staying within
that range, as I suspect more libraries can handle ISO dates
than can handle the XML Schema extensions to the ISO dates.
Were we to go this route I would insist on restricting the
allowed dates to the intersection of ISO 8601 and XML Schema.
That is, no dates outside 0001-9999 (8601 allows 0000-9999
while XML Schema allows -Inf to +Inf, *except* 0000).
Let the future figure out how to extend DAS to make it Y10K
compliant. :)
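
One way such a restriction might be enforced, as a sketch in
present-day Python. The function name parse_restricted_date and the
choice to raise ValueError are assumptions for illustration, not
anything DAS specifies:

```python
import re
from datetime import date

# Exactly four year digits, two month digits, two day digits.
_STRICT_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_restricted_date(text):
    """Parse CCYY-MM-DD, accepting only years 0001 through 9999.

    Raises ValueError for anything else, including year 0000,
    signed or five-digit years, and one-digit months or days.
    """
    match = _STRICT_DATE.fullmatch(text)
    if match is None:
        raise ValueError("not in CCYY-MM-DD form: %r" % (text,))
    year, month, day = (int(group) for group in match.groups())
    # date() itself rejects year 0 and impossible days such as Feb 30.
    return date(year, month, day)


print(parse_restricted_date("1999-10-26"))
```

Under this sketch 1999-1-3, 0000-01-01 and 16293-03-02 are all
rejected, so accepted values can again be compared as strings.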
It's also true that 8601 is easier to parse and generate
than RFC dates. The latter requires, e.g., getting the
day-of-week correct. But since nearly every language has
code for doing that, how is either one harder than the other,
practically speaking?
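
To illustrate with today's Python standard library (an assumption
about tooling, since these particular calls are newer than this
thread), both formats are a single call in each direction:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# An arbitrary example instant, expressed in UTC.
moment = datetime(2004, 10, 1, 16, 30, 8, tzinfo=timezone.utc)

rfc_text = format_datetime(moment)  # RFC 2822 style, includes day-of-week
iso_text = moment.isoformat()       # ISO 8601 extended format

print(rfc_text)
print(iso_text)

# Both texts parse back to the same instant.
assert parsedate_to_datetime(rfc_text) == moment
assert datetime.fromisoformat(iso_text) == moment
```

The library computes the day-of-week itself, so neither format costs
the caller more effort than the other.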
Actually,
    http://www.mcs.vuw.ac.nz/technical/software/SGML/doc/iso8601/ISO8601.html
claims
    The following complete, abbreviated or truncated formats are
    permissible:
      "19930214" or "1993-02-14" (complete representation)
      "1993-02" (reduced precision)
      "1993"
      "19"
      "930214" or "93-02-14" (truncated, current century assumed)
      "-9302" or "-93-02"
      "-93"
      "--0214" or "--02-14"
      "--02"
      "---14"
so the ISO datetime can be quite complicated to parse. The XML Schema
restricts it to "the extended format CCYY-MM-DDThh:mm:ss" where only
the last field, the seconds, may have decimals.
That New Zealand page quotes the ISO spec as saying
    Decimal fractions may be included with an hour, minute or second.
    The decimal sign should be either a comma (preferred) or a full
    stop. If the value is less than one then the decimal sign should
    be preceded by a zero. The number of decimal places is set
    depending on the application.
The XML Schema spec says it's "s.sss" for that case and uses the
phrase "decimal point" instead of "decimal sign". That suggests
some libraries might generate ISO extended format and end up with
a "," for the seconds decimal instead of ".". I have no experience
to judge. Python uses a "." but that's because the implementer
is from the US.
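
If a reader wanted to tolerate both separators, a sketch might look
like the following. The name parse_lenient_datetime is hypothetical.
It ignores time zones, and it truncates rather than rounds fractions
longer than six digits:

```python
import re
from datetime import datetime

# CCYY-MM-DDThh:mm:ss with optional fractional seconds. The separator
# class accepts "." (XML Schema) and "," (reportedly preferred by ISO).
_DATETIME = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"
    r"T([0-9]{2}):([0-9]{2}):([0-9]{2})"
    r"(?:[.,]([0-9]+))?"
)


def parse_lenient_datetime(text):
    """Parse a timezone-less dateTime, tolerating a decimal comma."""
    match = _DATETIME.fullmatch(text)
    if match is None:
        raise ValueError("unrecognised dateTime: %r" % (text,))
    fields = [int(group) for group in match.groups()[:6]]
    fraction = match.group(7) or ""
    # datetime stores microseconds, so keep at most six fractional digits.
    microseconds = int(fraction[:6].ljust(6, "0")) if fraction else 0
    return datetime(*fields, microsecond=microseconds)


print(parse_lenient_datetime("2000-03-04T20:00:00,25"))
print(parse_lenient_datetime("2000-03-04T20:00:00.25"))
```

A stricter reader would accept only the "." form and reject the
comma outright, which is what the XML Schema wording implies.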
There are some abbreviations. There's "Z" for the time zone ;)
Less glibly, the datetime is not going to be read by humans who
aren't comfortable with English. It will be read by people who
like looking at the low-level format, and by software. The
rest of the format (element tag, spec) requires a good knowledge
of English, so that first group isn't going to be left out. As
for software for reading the date, well, is the ISO spec for the
dates, or the XML schema spec for the dates, available in
different languages? The ISO one, perhaps, might be in French too.
In other words, I don't think that's a major objection.
I do like that if we use some subset of ISO and XML Schema
as our datetime then the XML schema language will be able
to validate it for us, and automatically convert it into
the native date type for a language using the dates. That's
good enough reason for me. But I would want to test it out
first.
>> That should probably be 'name' instead of 'id'. For consistency's
>> sake since 'id' seems otherwise always used for resolvable URIs.
>
> In the context of an XML document, I think the use of 'id' attributes
> for values that are not of ID type is very misleading. In the case
> of resolvable URIs, why not use the tag 'url' instead? And use 'name'
> as Andrew suggests in other cases.
I don't think 'url' is right for this case, but I don't know the
precise expected semantics for an ID type.
In general we use id for the case exemplified by the following:
    <!-- This is in the document http://..../abc/ -->
    <CHILDREN>
      <CHILD id="xyz" />
      <CHILD id="zzz" />
    </CHILDREN>
To get to the first child, use http://..../abc/xyz
Is that correct? If that's not the correct meaning for id
(I suspect it's supposed to be usable for targets, as with
the ....#xyz syntax used in HTML) then we can easily change
it to 'url'.
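
The two readings of id can be put side by side in a short sketch.
The base URL http://example.org/abc/ is a made-up stand-in for the
elided document URL, and the variable names are illustrative only:

```python
from urllib.parse import urljoin

# Hypothetical document URL; the real host is elided in the example.
document_url = "http://example.org/abc/"

child_ids = ["xyz", "zzz"]

# Reading 1: the id is a path segment relative to the document.
as_child_resources = [urljoin(document_url, child_id) for child_id in child_ids]

# Reading 2: the id is a fragment target inside the document.
as_fragments = [urljoin(document_url, "#" + child_id) for child_id in child_ids]

print(as_child_resources)
print(as_fragments)
```

Reading 1 is how id is used above. Reading 2 is the HTML-style
target that an XML ID type would suggest.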
Andrew
dalke at dalkescientific.com