[DAS2] DAS/2 weekly meeting notes for 28 Nov 05

Tue Nov 29 03:05:40 UTC 2005

Notes from the weekly DAS/2 teleconference, 28 Nov 2005.

$Id: das2-teleconf-2005-11-28.txt,v 1.1 2005/11/29 03:06:04 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed E., Gregg Helt
  CSHL: Lincoln Stein
  UC Berkeley: Suzi Lewis
  Sanger: Thomas Down, Andreas Prlic
  Sweden: Andrew Dalke

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2005. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 

Today's topic: Spec issues (for DAS/2 retrievals)
-------------------------------------------------

We are following the agenda summary in Andrew's email:
http://portal.open-bio.org/pipermail/das2/2005-November/000352.html

1) DAS Status Code in headers
-----------------------------
Use http error codes and not das-specific ones.
das-error to provide more detail.

GH: Do we really need a detailed response document?

TD: How do you distinguish different parts of the error-causing
request?
AD: how detailed do we need to be?

LS: If you wish to do error recovery, you could have problems with one
part and not another. You give up granularity.

GH: Willing to give up the granularity in favor of simplicity.

AD: Possibilities of error

LS: How about everything that can be turned into an http error should
be. And have a special section to provide das details. E.g.:
    <x-das-error id="code#" description="...">
Client is still going to have to understand das error codes.

GH, AD: client does need to be there.
AD: Using only http error codes reduces complexity - you only need to
check one place. Another benefit - you can provide a file-based das
server (this was not a use case from the RFCs, just AD's pet idea he
envisions as potentially useful).

GH: Can't think of DAS/1 clients that did anything meaningful with
those das error codes.
AD: NCBI entrez server - does lots of extra error support. Don't want
to go there with das.

TD, LS: DAS error codes can be used to tell client which part of the
URL is at fault. Now it will be just '404 not found'.

AD: REST API says use the http protocol directly.
LS: There are some things in the DAS API that don't translate into
http error codes.

AD: We can support this with error document.

[A] Use HTTP error codes and x-das-error document with code and optional
description.
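
A minimal sketch of how a client might combine the two, assuming the
error document is a single <x-das-error> element carrying 'id' and
'description' attributes as in Lincoln's example above. The function
name, the sample code "12" and the sample description are made up for
illustration; nothing here is settled by the spec.

```python
import xml.etree.ElementTree as ET


def interpret_das_response(status, body):
    """Return (ok, http_status, das_code, description).

    The HTTP status is the only thing a client must check; the
    x-das-error document, if present, just adds detail.
    """
    if 200 <= status < 300:
        return (True, status, None, None)

    das_code = None
    description = None
    if body:
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            root = None  # body was not XML; fall back to the HTTP status alone
        # Assumption: the root element is named x-das-error.
        if root is not None and root.tag == "x-das-error":
            das_code = root.get("id")
            description = root.get("description")  # optional per the action item
    return (False, status, das_code, description)


# Hypothetical error body; the id and description values are invented.
result = interpret_das_response(
    404, '<x-das-error id="12" description="unknown segment"/>')
print(result)
```

A client that ignores the error document entirely still behaves
correctly here, which is the simplicity GH and AD were arguing for.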

2) Content-type
---------------

[A] No objections to using: application/x-das+blah+xml
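
As an illustration only, a client could pull the document kind (the
'blah' part) out of such a content type as sketched here. The function
name and the 'features' value are hypothetical; the notes do not list
the actual kinds.

```python
def das_document_kind(content_type):
    """Return the <kind> in application/x-das+<kind>+xml, else None."""
    # Drop any parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    prefix, suffix = "application/x-das+", "+xml"
    if media_type.startswith(prefix) and media_type.endswith(suffix):
        kind = media_type[len(prefix):-len(suffix)]
        return kind or None
    return None


# 'features' is an assumed example of a document kind.
kind = das_document_kind("application/x-das+features+xml; charset=utf-8")
print(kind)
```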

3) Key/value data
-----------------

Three possibilities summarized in Andrew's email.

1) (current spec) using namespace in attrib value.
2) (steve, lincoln) all attribute values are URI's
3) (andrew) Relax-NG based, drop in well-structured XML

SC: (clarified proposal #2). For more, see today's post at:
http://portal.open-bio.org/pipermail/das2/2005-November/000363.html

AD: What's wrong with the Relax-NG based approach?
LS: I don't understand it yet.
SC: Community lacks experience with Relax-NG in general.

TD: Does it let you point to schema fragments for data types?
AD: There are ways to define it in the schema, haven't looked at it.

LS: This looks great. Would propose having a convention that if it's a
simple, single-valued key, value should be encoded in an attribute
(value="blah"), not as content of a section (CDATA). Reason: It's more
consistent with rest of spec, and it's easier to parse. So in the
example, genefinder-score is not correctly encoded.

AD: That's not in the das: namespace, hence is not under our
control. We can use this convention for things in the das namespace.

AD: User can put in any xml as long as it's reasonably well-formed. We
can define what well-formed is. This is what atom uses. Allows some
simple key val data on client as if it were native data. It permits
searches without needing to know about complex data.

GH: Likes idea of allowing arbitrary xml.

SC: Not completely arbitrary since we limit use of das: namespace,
and possibly other aspects.

LS: So we're going to say we have properties represented as key/val
pairs using this syntax. You'll find 'das:' as well as possibly other
namespaces. I think that works.
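
A rough sketch of what this could mean for a client, assuming
properties are child elements of a feature, simple values sit in a
'value' attribute (Lincoln's convention) or as text content, and
anything with nested structure is skipped by clients that don't
understand it. The namespace URIs, the FEATURE wrapper and the element
names other than genefinder-score are placeholders, not spec.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs and element layout, for illustration only.
SAMPLE = """
<FEATURE xmlns:das="http://example.org/das2"
         xmlns:lab="http://example.org/lab">
  <das:score value="0.93"/>
  <lab:genefinder-score>12.5</lab:genefinder-score>
  <lab:evidence><lab:run id="7"/></lab:evidence>
</FEATURE>
"""


def simple_properties(feature_xml):
    """Collect simple key/value properties keyed by (namespace, name)."""
    props = {}
    for child in ET.fromstring(feature_xml):
        # ElementTree spells qualified names as "{namespace}local".
        if child.tag.startswith("{"):
            namespace, name = child.tag[1:].split("}", 1)
        else:
            namespace, name = "", child.tag
        if len(child) > 0:
            continue  # nested XML: not a simple key/value, so skip it
        value = child.get("value")
        if value is None:
            value = (child.text or "").strip()
        props[(namespace, name)] = value
    return props


props = simple_properties(SAMPLE)
print(props)
```

The client can search on the simple pairs without knowing anything
about the more complex lab:evidence block.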

What becomes of /property url (ptype)? Does that go away and get
replaced by a namespace?

AD: Possibly use it for data type (e.g., float). Or we could make it
discoverable? 

LS: Easier to make it part of the spec.

TD: If this can work like XML schema, we could have a pointer to an
xsi. Is there a way to put a pointer to a schema url?

AD: Found this to be useless. Hard coding what is expected is better
than having discoverability.

TD: With the xsi schema location, you can put multiple schema
locations for the das schema, and your extension, separate pointers to
both in a single document.

AD: Never found dynamically resolved schemas useful for anything.
LS: In theory they are. Why not?
AD: Knowing that something's an int doesn't say what that int is supposed
to mean.
LS: Right. Let's make sure that the common types of annotation a
server would want to return are in the spec from the get go. Anyone
that doesn't care about extensions can ignore additional properties.

No doubt people will make extensions to DAS/2 that are implemented on
client and server that are in-house, private extensions that only work
in client-server pairs.

Should we allow schema fragments to be brought in via xsi?

TD: this would be in the top-level element. Or can put it on an
enclosing element.
AD: Is there a good reason to do it?

LS: Let's not seek discoverability.

[A] Andrew will flesh out his Relax-NG based property encoding approach.

SC: You could put your schema at the url pointed to by 'das:'

AD: Don't see a need. I found that many of the DAS/1 schema
fragments/documents were invalid. This didn't seem to bother DAS/1
clients and users.
LS: In the real world, people don't validate.

5) xlink and <link>
-------------------

AD: The official xlink spec is long. Have not fully grokked it.
GH: Does anyone else have experience with it?  (silence...) Seems like
a reason to not go there.

AD: Atom uses link to say, "Here's some generic linked out stuff". We
could use it to say, "I'm looking for the stylesheet for this thing or
the schema for the xml document."

GH: We need to draw line between generic links and specific
things. eg. feature ids, all ids are resolvable links, and so could in
principle be specified with link tags.

AD: Link from feature to versioned source it's a part of. Client can
figure out context from url.
Use case: DAS user sends email to colleague, 'look at this url for
feature X'. The other user enters URL in his das browser, client can
identify the das2-versioned source given the feature URL.
LS: They would rely on xml:base.
Nothing in the current DAS/2 spec says that the xml base is for the
versioned source.
LS: But it does give you the versioned source. This is absolutely part
of the spec.
AD: Nothing in the spec that says that features have to be on the same
machine as the rest of the data.
LS: Why does user want versioned source on the same machine that the
feature came from?

AD: Nothing in the spec says that a feature has to be under
'feature' in the URL.

GH: Generalizing the info href element to be more generic, to specify
what that link means is fine as long as we don't do this for everything
that can be a link. Doc hrefs are fine, not ids.

LS: We're not going to demand that people specify links. (Something
about giving people enough rope to hang themselves with...)

GH: Ids are opaque uris to id the feature.

LS: The HTML link tag has been around a long time, and used a total of
two times: style sheets, copyright statements. This could have easily
been done with a stylesheet tag and copyright tag (without needing a
general link tag).

[A] Consider the xlink/link tags issue tabled.

6) Source filters
-----------------

GH: Use case: DAS/2 client is trying to discover what registry has,
query can be the same as for any das server, you can just apply
additional filters when dealing with a registry.

AP: Client would use tags that a registry server must implement.

GH: A non-registry server can implement as well.

TD: say filtering is optional in general.
AD: I tend to not like optional things. Filtering is required for features.

GH: The spec can state the filters that a registry is required to
implement on sources query. General DAS/2 servers are not required,
but can if they want. What if you send a sources query with filters that it
doesn't understand?

LS: Return everything
GH: Return error
AP: Client can filter out what they want

GH: It's already important to have search capability in client.
Use case: On given genome, show me all gene predictions for this
region. You need to go to all servers, which could be many.

AD: Can you filter by type of features that can be returned?
AP: Can be added.

GH: Want to be able to search on ontology term, not just id of the
type. 
AD: Need meta-data server to ask of DAS/2 servers what features do you
implement? 

LS: Does metadata protocol need to be part of das spec, or an
additional protocol on top? There should be an optional section of
DAS/2 that is implemented by metadata servers or registries that allows
you to do servers. Shouldn't overload the core server spec.

GH: Concerned with the response. It's so close to the same xml, it
might as well be the same. Makes it easy for clients to know about
both servers and metadata servers. Could call it 'sources' or
something else.

LS: Filtering by feature type, do we need that info that's returned by
sources document?
GH: No, it's part of the query.
LS: Metadata server would have to do a types request.

AD: What if there's a mismatch in SOFA version?
LS: We're in trouble.
AD: Concerned about change in meaning.
SL: Not important.

LS: Use case: There's a 'restriction site' node in SOFA 1.4 with five
terms underneath it. In version 1.5, now there's six terms. A metadata
server running off of the old version is using an incomplete
node. Metadata engine should always run off the latest version.

AP: Registry at Sanger checks every 2 hrs with server.

AD: How is this better than having client do it itself? What features
do you know with this type and this range?
GH: If lots of DAS servers, this will be time intensive
AD: Can we wait until there are lots of servers?
AP: We have 17.
LS: Current paradigm - EBI has many servers that just do one type of
feature, e.g., there's a server that just does repeat elements.
So there are servers that will serve up one or a few feat types.
AD: Had not considered that.
LS: Happy to have optional filter syntax added to sources request
supported by metadata servers. Gregg is right about returning error
(unimplemented). Will not change protocol in fundamental way. Just an
annex, just optional section supported by metadata servers.

GH: Based on Andreas' queries in soap, can we squeeze everything in to
params on url? filterable?

AP: yes

AD: optional fields will include species, build#, type, etc.

[A] Add optional filter syntax to sources request. Allow unimpl error
return.
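
A sketch of how this might look from the client side, assuming the
filters are plain URL query parameters (as AP confirmed is possible)
and that "unimplemented" maps to HTTP 501 Not Implemented. The
parameter names, the example.org URL and the choice of 501 are
assumptions; the notes only list species, build#, type, etc. as
candidate fields.

```python
from urllib.parse import urlencode


def sources_query_url(base_url, **filters):
    """Build a sources request URL with optional filter parameters."""
    # Sort so the URL is stable; skip filters the caller left unset.
    params = {k: v for k, v in sorted(filters.items()) if v is not None}
    if not params:
        return base_url
    return base_url + "?" + urlencode(params)


def next_step_for_status(status):
    """Decide what a client does with the response to a filtered query."""
    if status == 200:
        return "use-results"
    if status == 501:
        # Assumed code for 'filter not implemented' on a plain DAS/2 server:
        # retry without filters and do the filtering in the client.
        return "fetch-all-and-filter-locally"
    return "error"


# Hypothetical registry URL and filter names.
url = sources_query_url("http://example.org/das2/sources",
                        type="gene", species="Homo sapiens")
print(url)
```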

7) /regions
-----------

LS: In SOFA, a feature of type region is root of all other features -
everything is a region. Has props - ref sequence it's on, start,
strandedness. The reason for region is for retrieving assemblies.

SC: Region is also currently the only way to get back a list of
available sequence ids without getting all sequence data. The
top-level sequence request returns data along with sequence.

LS/GH: region could be called 'landmarks'

[A] Andrew will work directly with Lincoln on revising region request.

8) Tiled queries
----------------

LS: This doesn't need to be in spec. If client filters features by a
range, is there a contract such that server must return exact range the
client asked for, contained in, or is it ok for server to return more?
GH: We need to be more strict.
LS: Agree. Client should trim it.

[A] Tiled queries should not be part of the spec.
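
If a server is allowed to return more than was asked for, client-side
trimming could be as simple as the sketch below. It assumes features
are (id, start, end) tuples with half-open coordinates and that "trim"
means dropping features that do not overlap the requested range; the
call did not settle either point.

```python
def trim_to_range(features, start, end):
    """Keep only features overlapping the half-open range [start, end)."""
    return [f for f in features if f[1] < end and f[2] > start]


# Made-up features: (id, start, end).
returned = [
    ("f1", 100, 200),   # entirely before the requested range
    ("f2", 450, 550),   # overlaps the left edge
    ("f3", 600, 700),   # fully inside
    ("f4", 1000, 1100), # starts exactly at the end, so outside
]
kept = trim_to_range(returned, 500, 1000)
print([f[0] for f in kept])
```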

Other issues 
------------

AP: There are still some other issues not addressed in this
call. E.g., it is not possible to handle the situation where the protein
sequence in a structure varies from genome. Can defer to the next
spec discussion conf call.