From dalke at dalkescientific.com Tue Mar 15 17:45:00 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Mar 2005 15:45:00 -0700 Subject: [DAS2] starting the validation work Message-ID: Hi all, I've started to work on the validation. I'm going through the HTML spec in CVS. Fixed a couple of small typos already. Here's things I haven't been able to resolve so far. The sources request document starts like this (I changed it to a local DTD for testing) According to http://www.w3.org/TR/REC-xml/ Validity constraint: Root Element Type The Name in the document type declaration MUST match the element type of the root element. I think that means the doctype should be ^^^^^^^--- this changed It looks like we could also do which would let an XML reader check if it knows the public name and if not fetch it from the given URL. If I follow it correctly we could just have a single DTD URL for everything we return because the first term in the doctype specifies the start element in the DTD. Something like Second, in reading through things it looks like the general decision in the XML community is to use the phrase "URI" instead of "URN". Eg, see the XML schema document at http://www.w3.org/TR/xmlschema-2/#anyURI where it uses URI and not URN. Indeed RFC 2396 says http://www.ietf.org/rfc/rfc2396.txt A URI can be further classified as a locator, a name, or both. The term "Uniform Resource Locator" (URL) refers to the subset of URI that identify resources via a representation of their primary access mechanism (e.g., their network "location"), rather than identifying the resource by name or by some other attribute(s) of that resource. The term "Uniform Resource Name" (URN) refers to the subset of URI that are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable. 
This is different than what we use in the DAS2 spec, which says things like The id attribute is a URN (typically in the form of a relative URL) Shall I go through and change the "URN"s to "URI"s in the docs? Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Tue Mar 15 17:53:27 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Tue, 15 Mar 2005 14:53:27 -0800 Subject: [DAS2] starting the validation work Message-ID: I think we should definitely be consistent in the spec in using "URI", not "URN". gregg -----Original Message----- From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke Sent: Tuesday, March 15, 2005 2:45 PM To: das2 at portal.open-bio.org Subject: [DAS2] starting the validation work Second, in reading through things it looks like the general decision in the XML community is to use the phrase "URI" instead of "URN". Eg, see the XML schema document at http://www.w3.org/TR/xmlschema-2/#anyURI where it uses URI and not URN. Indeed RFC 2396 says http://www.ietf.org/rfc/rfc2396.txt A URI can be further classified as a locator, a name, or both. The term "Uniform Resource Locator" (URL) refers to the subset of URI that identify resources via a representation of their primary access mechanism (e.g., their network "location"), rather than identifying the resource by name or by some other attribute(s) of that resource. The term "Uniform Resource Name" (URN) refers to the subset of URI that are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable. This is different than what we use in the DAS2 spec, which says things like The id attribute is a URN (typically in the form of a relative URL) Shall I go through and change the "URN"s to "URI"s in the docs? 
Andrew dalke at dalkescientific.com _______________________________________________ DAS2 mailing list DAS2 at portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/das2 From ed_erwin at affymetrix.com Tue Mar 15 18:00:45 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Tue, 15 Mar 2005 15:00:45 -0800 Subject: [DAS2] starting the validation work In-Reply-To: References: Message-ID: <4237691D.2050600@affymetrix.com> Well, test it and see if it works. If so, that looks like a good idea. I wouldn't have thought this was allowed by DTD's, but if it is, then great! Andrew Dalke wrote: > > > If I follow it correctly we could just have a single > DTD URL for everything we return because the first > term in the doctype specifies the start element in > the DTD. Something like > > "http://www.biodas.org/das2.dtd"> > > "http://www.biodas.org/das2.dtd"> > > "http://www.biodas.org/das2.dtd"> > From dalke at dalkescientific.com Tue Mar 15 18:07:50 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Mar 2005 16:07:50 -0700 Subject: [DAS2] starting the validation work In-Reply-To: References: Message-ID: <0caf7cadddc417a608f06f37e7459095@dalkescientific.com> Gregg: > I think we should definitely be consistent in the spec in using "URI", > not "URN". The ayes have it - changed. Ed > Well, test it and see if it works. If so, that looks like a good idea. Okay, I will. If anyone has experience with using DOCTYPE let me know... Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Wed Mar 16 00:30:10 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Mar 2005 22:30:10 -0700 Subject: [DAS2] das2 comments Message-ID: <04dc1f43a62e51ed024ccd82932d6127@dalkescientific.com> In going through spec I noticed some things that seem questionable. We have "SOURCES" which contains a list of "SOURCE" elements. "NAMESPACES" .... "NAMESPACE" elements. "TYPES" .... 
"TYPE" elements

But we have "FEATURELIST" which contains "FEATURE" elements.
And "REGION-LIST" which contains "REGION" elements.
"PROPERTY-LIST" ... "PROPERTY"

There's also "CAPABILITIES", which contains "METHOD" elements.

I suggest we normalize these to use the same style. My preference is for the English plurals

FEATURELIST --> FEATURES
REGION-LIST --> REGIONS
PROPERTY-LIST --> PROPERTIES

I'm not sure if CAPABILITIES/METHOD should be changed and if so, to what?

The FEATURELIST/FEATURE/XID documentation says

A typical feature will either have a single tag or a single tag, although it is possible (and sensible) to have one or more of both.

If I understand it correctly then it's equivalent to the following, which I think is clearer

A typical feature will have at least one tag or one tag. It is possible (and sensible) to have one or more of both.

There's a FEATURE/PROP example that includes a bit of base64-encoded data that purports to be a jpeg. It isn't. When I decode it, 'file' says it's an MS Office document. When I look at the byte stream I see something that looks like the big-endian Unicode BOM UTF-16/UTF-32 and the letters "abcdefghijkl" at 4-byte intervals.

Any reason we couldn't have a real gif/png/jpeg/whatever here? Besides the need to make one.

Speaking of which, do the prop fields each need a "name" or "description" attribute? How is a user supposed to distinguish between these two images?

BASE64-ENCODED-DATA-HERE

Some months ago we had the discussion on date representations. I thought we decided on ISO 8601 dates instead of RFC dates. Looking through my emails I see there wasn't a conclusion. In private email to Lincoln I said

Were we to go this route I would say we define that
all dates be given as YYYY-MM-DD
all datetimes be given as YYYY-MM-DDTHH:MM:SS(.ss*)?(Z|[+-]hh:mm)
(timezone required, fractions of a second optional), 0001 <= YYYY <= 9999, 00<=HH<=23 and leap second support is implementation dependent.
This is compatible with ISO 8601, compatible with XML Schema, supportable by the likely DAS/2 clients and servers, and not dependent on any external specification. ISO dates or RFC dates? I vote ISO dates. If we go the ISO route we could more easily fit in with the Dublin core metadata elements. For example we could have dc:created = "1987-06-05" dc:description = "Volvox Example Database" However, I do not think this is needed, in part because I think the dates and the description fields are the only things that would be affected. Someone would need to present a good enough use case for it. The downside is that it adds another namespace to the system and another layer to understand. I've been trying to make sense of the XLink spec since that seems relevant to what we're doing. One possibility is to replace things that point to URIs with "xlink:href". But is that a good idea? Looking around I found Tim Berners-Lee's commentary "When should I use XLink?" http://www.w3.org/DesignIssues/XLink.html His answer is 2. You should use xlink whenever your application is one of hypertext linking, as xlink functionality such as power to control user interface behavior on link traversal is useful and should be implemented in a standard way to allow interoperability (BTW, there's the old joke that in a multiple choice answer you should pick the longest one. That's true in this case.) No one I think will browse this data directly. There will be some intermediate translation going on in between, from a web-based middle layer, a dedicated client, or an XSLT transformation. Those will be able to add xlink fields if needed. So again I mention it here as a possibility and for the record, but I don't think it's something to use. 
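The date and datetime formats proposed earlier in this message can be sketched as a small validator. This is only an illustration under stated assumptions: the function names are hypothetical and not part of the DAS/2 spec, "(.ss*)?" is read as "a dot followed by one or more digits", and the month, day, minute and second range checks go beyond what the message spells out (it only states the year and hour ranges).

```python
import datetime
import re

# Patterns transcribed from the proposal: YYYY-MM-DD and
# YYYY-MM-DDTHH:MM:SS(.ss*)?(Z|[+-]hh:mm), with the timezone required.
DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")
DATETIME_RE = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"
    r"T([0-9]{2}):([0-9]{2}):([0-9]{2})"
    r"(\.[0-9]+)?"                  # optional fractions of a second
    r"(Z|[+-][0-9]{2}:[0-9]{2})"    # required timezone
)


def _valid_calendar_date(year, month, day):
    # datetime.date rejects year 0 and impossible dates such as Feb 30,
    # which also enforces 0001 <= YYYY <= 9999.
    try:
        datetime.date(year, month, day)
    except ValueError:
        return False
    return True


def is_das2_date(text):
    """Hypothetical helper: True if text is a YYYY-MM-DD date."""
    m = DATE_RE.fullmatch(text)
    return bool(m) and _valid_calendar_date(*map(int, m.groups()))


def is_das2_datetime(text):
    """Hypothetical helper: True if text matches the proposed datetime form."""
    m = DATETIME_RE.fullmatch(text)
    if not m:
        return False
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    # Allowing second == 60 is an assumption; the proposal leaves leap
    # second support implementation dependent.
    return (_valid_calendar_date(year, month, day)
            and 0 <= hour <= 23
            and 0 <= minute <= 59
            and 0 <= second <= 60)
```

A server could run such a check on the "created" and "modified" attributes before emitting them, and a client could use it to reject values that lack a timezone.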
Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Wed Mar 16 01:03:38 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 15 Mar 2005 23:03:38 -0700
Subject: [DAS2] initial schemas and using a template
Message-ID: <4742fe48454e11018d4f9155205221d8@dalkescientific.com>

I have initial RELAX-NG schemas (in compact syntax) for the XML in the das2_get spec. As I recall we're only working on GET for Y1 of the grant so I haven't touched the das2_put spec.

To validate I downloaded jing-20030619 from
http://www.thaiopensource.com/relaxng/jing.html
and used the included file thusly

java -jar jing.jar -c das-regionlist.rnc das-regionlist2.xml

The "-c" option says that the schema is in compact format ("rnc" = "Relax Ng Compact").

For the XML I copied and pasted from the spec, changing the DOCTYPE to point to /dev/null

For whatever reason jing requires that that file be present even though it isn't used.

The program trang, available from the same site
http://www.thaiopensource.com/relaxng/trang.html
can be used to turn the compact notation into the RELAX NG XML syntax, or (as much as is possible) into an XML Schema or DTD. Here's an example of use

java -jar ~/ftps/trang/trang-20030619/trang.jar \
    das-details.rnc das-details.dtd

The input file type is determined automatically based on the filename's extension.

Here is an example of the rnc format. This is the schema for the response for details.

default namespace = "http://www.biodas.org/ns/das/genome/2.00"

element SOURCE {
  attribute xml:base { text },

  ## The id attribute is a URN
  attribute id { text },

  ## The description attribute provides a human readable string
  ## describing the data source.
  attribute description { text },

  # not using taxon
  # missing doc_href?

  element VERSION {
    attribute id { text },
    attribute description { text },
    # missing doc_href?

    # better date string format?
    attribute created { text },
    attribute modified { text },

    element CAPABILITIES {
      element METHOD {
        # restrict to GET/PUT/POST/DELETE?
        attribute id { text }
      }+
    },
    element NAMESPACES {
      element NAMESPACE {
        attribute id { text }&
        text&
        element FORMAT {
          attribute id { text },
          attribute type { text }
        }*
      }*
    }
  }
}

(As you can see, I have a question about whether the SOURCE element should contain a doc_href attribute.)

In the XML syntax (filename extension of ".rng") this looks like

The id attribute is a URN
The description attribute provides a human readable string describing the data source.

One thing to note is that the "##" comments get converted into elements while the "#" comments get converted into .

Potentially the XML with the documentation annotations could be converted into HTML. I didn't come across a program that does this already. It seems that most people roll their own converters.

I would like it if the HTML documentation and the XML schema were more closely tied together. What I propose is to move the description of the different fields into the XML schema, as noted above. I would then write a program to convert the XML form into HTML that could be inserted into the documentation.

Most likely this means using some sort of template/string substitution to put everything together. And a makefile to merge them.

This would also help me develop a validator for the data files in the spec itself. Eg in my testing today I fixed two typos in the examples. What I can do is pull the XML and tab-delimited files out of the HTML and into separate files. These can be tested standalone and the template can say "insert file ABC here".

Sound good to you all?
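The "insert file ABC here" idea above can be sketched in a few lines of Python. This is a hedged illustration, not the program described in the message: the `@@insert:filename@@` placeholder syntax, the `render_template` name, and the choice to wrap each example in a `pre` element are all assumptions. The XML check here only tests well-formedness with the standard library; validating against the .rnc schemas would still go through jing as described above.

```python
import html
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical placeholder syntax; the message only says the template
# would state "insert file ABC here".
PLACEHOLDER = re.compile(r"@@insert:([\w.\-]+)@@")


def render_template(template_text, examples_dir):
    """Replace each placeholder with the escaped contents of the named file.

    Files ending in ".xml" are parsed first, so an example that is not
    well-formed raises xml.etree.ElementTree.ParseError instead of
    silently reaching the generated HTML.
    """
    def substitute(match):
        path = Path(examples_dir) / match.group(1)
        if path.suffix == ".xml":
            ET.fromstring(path.read_bytes())  # well-formedness check only
        text = path.read_text(encoding="utf-8")
        return "<pre>" + html.escape(text) + "</pre>"

    return PLACEHOLDER.sub(substitute, template_text)
```

Keeping each example in its own file means the same files can be fed to the validator and to the documentation build, so a typo fixed in one place is fixed in both.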
Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Wed Mar 16 13:40:09 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 16 Mar 2005 11:40:09 -0700 Subject: [DAS2] namespace description Message-ID: The spec has an example which looks like Feature types The description in the NAMESPACE is somewhat tricky to handle. I was able to get a schema description for it as mixed text, but it allows the following Feature types I would rather see something like Andrew dalke at dalkescientific.com From ed_erwin at affymetrix.com Wed Mar 16 13:58:27 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Wed, 16 Mar 2005 10:58:27 -0800 Subject: [DAS2] namespace description In-Reply-To: References: Message-ID: <423881D3.7000405@affymetrix.com> I agree. Allowing the mixing of free-text with XML-elements makes parsing difficult. Your suggestion is good. But you can also consider this, which might be better if the description ever needs to be long: Feature types Andrew Dalke wrote: > The spec has an example which looks like > > > Feature types > > > > > The description in the NAMESPACE is somewhat tricky to handle. > I was able to get a schema description for it as mixed text, but it > allows the following > > > > Feature > > types > > > I would rather see something like > > > > > > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From dalke at dalkescientific.com Wed Mar 16 14:40:04 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 16 Mar 2005 12:40:04 -0700 Subject: [DAS2] namespace description In-Reply-To: <423881D3.7000405@affymetrix.com> References: <423881D3.7000405@affymetrix.com> Message-ID: <0a855e1350fde8b1a248341133775a4a@dalkescientific.com> Ed: > Your suggestion is good. 
But you can also consider this, which might > be better if the description ever needs to be long: > > > Feature types The documentation doesn't say what text could go there so I don't know how long it might be. The existing description is very short. The SOURCE and VERSION elements already use a "description" attribute. In the versioned source request the namespace result looks like Feature types The documentation says A data format recognized by this server. The id attribute is the short name of the format for use in the GET URL, and the type attribute is the returned document's MIME type. I assume the format id is the one used in fetching information about the features, and that an example of the GET is http://server/das/genome/sourceid/version/feature?format=format; filter1=value;filter2=value... In most places the term "id" is used for URIs. Eg """The id attribute is a URI (typically in the form of a relative URL) that identifies this data source. """ """Each feature, subfeature and location has a URL-based ID.""" """The id attribute is a URI that identifies this version""" I would like the use of the 'id' attribute to be consistent across DAS/2. In that way there can be a section at the top of the document which says "the 'id' attribute is always a URI. Relative URIs are resolved ...." instead of documenting it for each element. Could this use of "id" be turned into "name"? Or "format"? The other two places that use "id" for something other than a URI are CAPABILITIES> I propose using "name" here as well. I checked -- the HTTP/1.1 spec just uses the word "method" to describe these http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1 and seem too redundant. But I'm fine with that too. The other place that uses 'id' is in the types request. The list of properties for a given types looks like Again here I would prefer "name" or "key". I think we talked about this last fall and I was overruled, but I'll try again. 
;) Andrew dalke at dalkescientific.com From ed_erwin at affymetrix.com Wed Mar 16 18:49:49 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Wed, 16 Mar 2005 15:49:49 -0800 Subject: [DAS2] standardizing 'id' use In-Reply-To: <2995ab5161e97f0ece442ce659ca567a@dalkescientific.com> References: <2995ab5161e97f0ece442ce659ca567a@dalkescientific.com> Message-ID: <4238C61D.30304@affymetrix.com> I like consistency, but .... If you are looking for an attribute name that always means 'URI', why not call it 'uri' ? Saying that 'id' is always a 'uri' is reaching too far, IMO. As for switching 'id' to 'key' in elements, I agree with you. Andrew Dalke wrote: > I would like the use of the 'id' attribute to be consistent > across DAS/2. In that way there can be a section at the top > of the document which says "the 'id' attribute is always a URI. > Relative URIs are resolved ...." instead of documenting it > for each element. > From allenday at ucla.edu Fri Mar 18 17:14:04 2005 From: allenday at ucla.edu (Allen Day) Date: Fri, 18 Mar 2005 14:14:04 -0800 (PST) Subject: [DAS2] das2 comments In-Reply-To: <04dc1f43a62e51ed024ccd82932d6127@dalkescientific.com> References: <04dc1f43a62e51ed024ccd82932d6127@dalkescientific.com> Message-ID: > Speaking of which, do the prop fields each need a "name" > or "description" attribute? How is a user supposed to > distinguish between these two images? > > href = "http://www.wormbase.org/db/seq/gbrowse_img?name=cTel54X.1" > /> > > mime_type = "image/jpeg" > content_encoding = "base64"> > BASE64-ENCODED-DATA-HERE > I'd like to see something like MAGE's NameValueType used, can we borrow from this? 
http://www.affymetrix.com/support/technical/manual/netaffx_MAGE_ML_manual.affx There are formal DTDs and OMG documentation here, but they seem to be broken: http://www.mged.org/Workgroups/MAGE/mage-ml.html -Allen From allenday at ucla.edu Fri Mar 18 17:15:18 2005 From: allenday at ucla.edu (Allen Day) Date: Fri, 18 Mar 2005 14:15:18 -0800 (PST) Subject: [DAS2] initial schemas and using a template In-Reply-To: <4742fe48454e11018d4f9155205221d8@dalkescientific.com> References: <4742fe48454e11018d4f9155205221d8@dalkescientific.com> Message-ID: > I would like it if the HTML documentation and the XML > schema were more closely tied together. What I > propose is to move the description of the different > fields into the XML schema, as noted above. I would > then write a program to convert the XML form into > HTML that could be inserted into the documentation. > > Most likely this means using some sort of template/ > string substitution to put everything together. > And a makefile to merge them. > > This would also help me develop a validator for the > data files in the spec itself. Eg in my testing > today I fixed two typos in the examples. What I > can do is pull the XML and tab-delimited files > out of the HTML and into separate files. These > can be tested standalone and the template can say > "insert file ABC here". > > Sound good to you all? This sounds wonderful! -Allen From allenday at ucla.edu Fri Mar 18 17:18:30 2005 From: allenday at ucla.edu (Allen Day) Date: Fri, 18 Mar 2005 14:18:30 -0800 (PST) Subject: [DAS2] standardizing 'id' use In-Reply-To: <2995ab5161e97f0ece442ce659ca567a@dalkescientific.com> References: <2995ab5161e97f0ece442ce659ca567a@dalkescientific.com> Message-ID: I agree with all of this. For though you may want to allow both your proposed name/key attribute as well as id. 
The ids given in the example, as I recall, are built-ins, but it is possible to define your own properties, in which case you'd want to reference as a URI using an id attribute. On Wed, 16 Mar 2005, Andrew Dalke wrote: > In the versioned source request the namespace result looks like > > > Feature types > > > > > The documentation says > > > A data format recognized by this server. The id attribute is the > short name of the format for use in the GET URL, and the type attribute > is the returned document's MIME type. > > I assume the format id is the one used in fetching information > about the features, and that an example of the GET is > > http://server/das/genome/sourceid/version/feature?format=format; > filter1=value;filter2=value... > > > In most places the term "id" is used for URIs. Eg > > """The id attribute is a URI (typically in the form of a > relative URL) that identifies this data source. """ > > """Each feature, subfeature and location has a URL-based ID.""" > > """The id attribute is a URI that identifies this version""" > > I would like the use of the 'id' attribute to be consistent > across DAS/2. In that way there can be a section at the top > of the document which says "the 'id' attribute is always a URI. > Relative URIs are resolved ...." instead of documenting it > for each element. > > Could this use of "id" be turned into "name"? Or "format"? > > > The other two places that use "id" for something other than > a URI are > > CAPABILITIES> > > > > > > > > I propose using "name" here as well. I checked -- the > HTTP/1.1 spec just uses the word "method" to describe these > http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1 > and > > seem too redundant. But I'm fine with that too. > > > The other place that uses 'id' is in the types request. > The list of properties for a given types looks like > > > > > > > Again here I would prefer "name" or "key". I > think we talked about this last fall and I was > overruled, but I'll try again. 
;) > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From lstein at cshl.edu Mon Mar 21 11:45:02 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 21 Mar 2005 11:45:02 -0500 Subject: [DAS2] DAS/2 progress report - CSHL In-Reply-To: References: Message-ID: <200503211145.03105.lstein@cshl.edu> Hi Gregg, Here is some material for the progress report: DAS/2 Specification (CSHL) The work on the DAS/2 specification has centered on two major categories: enhancement of the retrieval protocol to support more flexible queries, and development of a new writeback protocol to support creation of new biological data objects and editing of existing ones. The changes to the retrieval protocol are finished. It has been enhanced to provide for retrieval of biological objects using multiple combinations of attributes. In DAS/1, objects could only be retrieved using combinations of genomic position, object name, and object type. In DAS/2, objects can be retrieved using any combination of arbitrary attribute: for example, genes can be retrieved using GO terms or the presence of an orthologue in another species. The writeback protocol is roughly 75% complete. We have developed a locking protocol, a scheme for requesting new identifiers, and a scheme for addressing objects that are not yet created. We have also specified the protocol for writing new objects into the database. However there is still ambiguity in the protocol for updating existing objects, primarily decisions relating to the granularity of updates. On the one hand, it might be desirable to be able to address and update an individual field in an object. On the other hand, this type of granularity significantly complicates the implementation of the protocol and introduces issues relating to locking and atomicity of updates. 
Work on the protocol is continuing, however, and we anticipate that it will be ready for public comment by the end of spring 2005. Lincoln -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From lstein at cshl.edu Mon Mar 21 11:46:40 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 21 Mar 2005 11:46:40 -0500 Subject: [DAS2] Re: DAS/2 progress report Message-ID: <200503211146.40580.lstein@cshl.edu> Hi Folks, I apologize for sending that letter to the mailing list. It was actually intended to be sent directly to Gregg. Feel free to comment on it, if you wish. Lincoln -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From dalke at dalkescientific.com Mon Mar 28 23:47:22 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 28 Mar 2005 21:47:22 -0700 Subject: [DAS2] more small changes Message-ID: <93d44da16b2f88fd4d0b4e5d04c02051@dalkescientific.com> Here's some more small changes I would like to make upon reviewing the spec some more. These are all pushing for consistency of the Content-Type and DTD names. As I expect it isn't a problem I'll go ahead and make the changes, subject to reversal in the future if so decided. > Fetching Information about Data Sources: The Sources Request > > Performing a GET on the DAS base URL is known as a "sources" > request, ... 
of type text/x-das-source+xml, and a compact > tab-delimited format of type text/x-das-source+compact. ... > > It's called "sources" so I think the content types should be "text/x-das-sources+xml" and "text/x-das-sources+compact" instead of the singular "source". I also think the DOCTYPE line should be in part because I don't know what "dsn" means. DAS Source something? > Fetching Information About a Versioned Source: The Versioned > Source Request > > By adding the version to the end of the path, the URL becomes an identifier > for the versioned data source. Retrieving it returns a document that > provides metadata about the data source and the capabilities that the > DAS/2 server provides for manipulating the data source. > > REQUEST: > http://www.wormbase.org/das/genome/volvox/2 > > RESPONSE: > Content-Type: text/x-das-source-details+xml The word "details" does not exist in the description. The word "version" does, in several forms. I propose using the Content-Type: text/x-das-source-version+xml Similarly I propose changing to use Sadly that's a very long name. I could go with "das2version". > Fetching Information About Feature Types: The "Types" Request ... > Content-Type: text/x-das-featuretype+xml ... > Content-Type: text/x-das-featuretype+compact In here I suggest using Content-Type: text/x-das-types+xml Content-Type: text/x-das-types+compact The DOCTYPE DTD URL is fine. > Fetching Information About Features: The Feature Request I haven't figured out the rule for when something is a singular request vs. when it's a plural request. Earlier it was a "Types" request when "type/" is appended to a versioned data source URL. It seems that this should be a "Features" request. > The Das2XML-Formatted Feature Response ... > Content-Type: text/x-das-feature+xml ... 
>

This returns a list of features so should be "features"

Content-Type: text/x-das-features+xml

> Retrieving Regions & Assemblies

by homology this should be

Retrieving Regions & Assemblies: the "Regions" request

> Content-Type: text/x-das-region+xml
>

and those should be

Content-Type: text/x-das-regions+xml

> Retrieving Controlled Vocabularies of Property Types
...
> Content-Type: text/x-das-property+xml
...
>

By now you all have probably gotten the hang of things ;)

Content-Type: text/x-das-properties+xml

One last point. The sources request is one request that can return a list of 0 or more elements. It is described with a schema that allows zeroOrMore SOURCE elements.

A source request (for a single source and not a list of sources) returns an XML document that is described with the same schema. It looks like

In use there will only be one SOURCE in the SOURCES so a more precise schema could enforce that.

Personally I'm okay with it as it is. That makes for about 3 fewer schemas at the expense of one extra check in the client. I'm just pointing it out. 'Cause that's the way I am. :)

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Thu Mar 31 02:46:30 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 31 Mar 2005 00:46:30 -0700
Subject: [DAS2] template-based spec generation
Message-ID: <4a1dd29807b864a3ef3dd31fedbcdbbd@dalkescientific.com>

I've checked in the code for generating the specification from a template. You can see the result (for now) at
http://www.dalkescientific.com/das2_get.new.html

I haven't changed the original spec. To compare see
http://www.dalkescientific.com/das2_get.html

The changes I made were:

1) pulled out the XML, tab-delimited, and text examples into individual files. These will be used as part of the validation, to help ensure that the examples in the spec are valid.

2) wrote Relax NG schema definitions for all of the XML files. I used the "compact" notation. The schemas are stored in files ending with ".rnc".

3) wrote a Makefile to turn the .rnc files into .rng and .dtd files.

4) figured out how to get documentation from the .rng file, so I can insert the documentation directly into the spec

5) switched to a template-based system for the spec. I chose the Zope Page Template language ("ZPT").

6) wrote 'convert_template.py' to convert the template into the final HTML. For it to work you will need to install ElementTree and ZPT

ElementTree http://effbot.org/zone/element-index.htm
ZPT at http://zpt.sourceforge.net/

It includes a few special commands to insert a file and to generate HTML documentation given the comments in the schema definitions. The latter uses a "macro" so that all of the HTML documentation is consistent.

I also have links to the RNC/RNG/DTD files but they won't work in the above link.

I do know the result doesn't flow as smoothly as the original spec. I wanted people to see what it could do. If we go this route I'll clean up the text and check in the ElementTree and ZPT modules so people like Lincoln don't need to deal with augmenting their local Python install.

In the meanwhile I've been working on the validation system proper.

Andrew
dalke at dalkescientific.com

From Gregg_Helt at affymetrix.com Thu Mar 31 13:51:01 2005
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Thu, 31 Mar 2005 10:51:01 -0800
Subject: [DAS2] UML class diagrams for DAS2 client
Message-ID:

Just thought people might want to see what the data models for the DAS/2 client are looking like. This UML diagram pretty much covers the APIs for the retrieval part of the current DAS/2 spec, except for feature retrieval and filtering. I've tried to do these models as a "clean" interface, with no references to the genometry data models used by IGB. Actual implementation will be tightly integrated with genometry models.
I'll be checking this into the genoviz source forge repository soon, want to figure out at the DAS/2 implementation meeting today where in the repository to actually put it gregg -------------- next part -------------- A non-text attachment was scrubbed... Name: Das2InterfacesB.JPG Type: image/jpeg Size: 270164 bytes Desc: Das2InterfacesB.JPG URL: From Gregg_Helt at affymetrix.com Thu Mar 31 18:19:01 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 31 Mar 2005 15:19:01 -0800 Subject: [DAS2] DAS/2 grant year 1 progress report, regular DAS2 meetings Message-ID: As most of you know I'm submitting a progress report this week for year 1 of the DAS/2 grant. I'm attaching the text part of the report. Thanks to everyone who contributed summaries of the various ongoing work, pretty much all of it was incorporated in the report. Way back in September I proposed having semi-regular conference calls to coordinate work on the grant, especially regarding evolution of the DAS/2 spec. But I never followed through. Until now. How do people feel about a monthly conference call, starting as early as next week? What days/times work well for people? I think the first item on the agenda would be review and feedback on current state of the retrieval part of the spec. Then discussion on the writeback protocol. Also, here at Affymetrix we're currently holding a weekly combined DAS2 / IGB meeting. This started out being just the Affymetrix DAS2 / IGB team (me, Ed, and Steve), but has grown since then to include Alan, Andrew, and Ann joining in by teleconference. I'd like to think of it as more of an implementation-oriented, status report kind of meeting. Anyone else involved in the DAS2 grant who is interested in implementation details is welcome to join in. Thursdays 11:30-12:30 Pacific time, phone number 800-531-3250 (international: 303-928-2693), conference id 2879055. thanks, gregg -------------- next part -------------- A non-text attachment was scrubbed... 
Name: DAS2_progress_reportB.doc Type: application/msword Size: 38400 bytes Desc: DAS2_progress_reportB.doc URL: From dalke at dalkescientific.com Tue Mar 15 22:45:00 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Mar 2005 15:45:00 -0700 Subject: [DAS2] starting the validation work Message-ID: Hi all, I've started to work on the validation. I'm going through the HTML spec in CVS. Fixed a couple of small typos already. Here's things I haven't been able to resolve so far. The sources request document starts like this (I changed it to a local DTD for testing) According to http://www.w3.org/TR/REC-xml/ Validity constraint: Root Element Type The Name in the document type declaration MUST match the element type of the root element. I think that means the doctype should be ^^^^^^^--- this changed It looks like we could also do which would let an XML reader check if it knows the public name and if not fetch it from the given URL. If I follow it correctly we could just have a single DTD URL for everything we return because the first term in the doctype specifies the start element in the DTD. Something like Second, in reading through things it looks like the general decision in the XML community is to use the phrase "URI" instead of "URN". Eg, see the XML schema document at http://www.w3.org/TR/xmlschema-2/#anyURI where it uses URI and not URN. Indeed RFC 2396 says http://www.ietf.org/rfc/rfc2396.txt A URI can be further classified as a locator, a name, or both. The term "Uniform Resource Locator" (URL) refers to the subset of URI that identify resources via a representation of their primary access mechanism (e.g., their network "location"), rather than identifying the resource by name or by some other attribute(s) of that resource. The term "Uniform Resource Name" (URN) refers to the subset of URI that are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable. 
This is different than what we use in the DAS2 spec, which says things like The id attribute is a URN (typically in the form of a relative URL) Shall I go through and change the "URN"s to "URI"s in the docs? Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Tue Mar 15 22:53:27 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Tue, 15 Mar 2005 14:53:27 -0800 Subject: [DAS2] starting the validation work Message-ID: I think we should definitely be consistent in the spec in using "URI", not "URN". gregg -----Original Message----- From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke Sent: Tuesday, March 15, 2005 2:45 PM To: das2 at portal.open-bio.org Subject: [DAS2] starting the validation work Second, in reading through things it looks like the general decision in the XML community is to use the phrase "URI" instead of "URN". Eg, see the XML schema document at http://www.w3.org/TR/xmlschema-2/#anyURI where it uses URI and not URN. Indeed RFC 2396 says http://www.ietf.org/rfc/rfc2396.txt A URI can be further classified as a locator, a name, or both. The term "Uniform Resource Locator" (URL) refers to the subset of URI that identify resources via a representation of their primary access mechanism (e.g., their network "location"), rather than identifying the resource by name or by some other attribute(s) of that resource. The term "Uniform Resource Name" (URN) refers to the subset of URI that are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable. This is different than what we use in the DAS2 spec, which says things like The id attribute is a URN (typically in the form of a relative URL) Shall I go through and change the "URN"s to "URI"s in the docs? 
Andrew dalke at dalkescientific.com _______________________________________________ DAS2 mailing list DAS2 at portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/das2 From ed_erwin at affymetrix.com Tue Mar 15 23:00:45 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Tue, 15 Mar 2005 15:00:45 -0800 Subject: [DAS2] starting the validation work In-Reply-To: References: Message-ID: <4237691D.2050600@affymetrix.com> Well, test it and see if it works. If so, that looks like a good idea. I wouldn't have thought this was allowed by DTD's, but if it is, then great! Andrew Dalke wrote: > > > If I follow it correctly we could just have a single > DTD URL for everything we return because the first > term in the doctype specifies the start element in > the DTD. Something like > > "http://www.biodas.org/das2.dtd"> > > "http://www.biodas.org/das2.dtd"> > > "http://www.biodas.org/das2.dtd"> > From dalke at dalkescientific.com Tue Mar 15 23:07:50 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Mar 2005 16:07:50 -0700 Subject: [DAS2] starting the validation work In-Reply-To: References: Message-ID: <0caf7cadddc417a608f06f37e7459095@dalkescientific.com> Gregg: > I think we should definitely be consistent in the spec in using "URI", > not "URN". The ayes have it - changed. Ed > Well, test it and see if it works. If so, that looks like a good idea. Okay, I will. If anyone has experience with using DOCTYPE let me know... Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Wed Mar 16 05:30:10 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Mar 2005 22:30:10 -0700 Subject: [DAS2] das2 comments Message-ID: <04dc1f43a62e51ed024ccd82932d6127@dalkescientific.com> In going through spec I noticed some things that seem questionable. We have "SOURCES" which contains a list of "SOURCE" elements. "NAMESPACES" .... "NAMESPACE" elements. "TYPES" .... 
"TYPE" elements But we have "FEATURELIST" which contains "FEATURE" elements. And "REGION-LIST" which contains "REGION" element. "PROPERTY-LIST" ... "PROPERTY" There's also "CAPABILITIES", which contains "METHOD" elements. I suggest we normalize these to use the same style. My preference is for the English plurals FEATURELIST --> FEATURES REGION-LIST --> REGIONS PROPERTY-LIST --> PROPERTIES I'm not sure if CAPABILITIES/METHOD should be changed and if so, to what? The FEATURELIST/FEATURE/XID documentation says A typical feature will either have a single tag or a single tag, although it is possible (and sensible) to have one or more of both. If I understand it correctly then it's equivalent to the following, which I think is clearer A typical feature will have at least one tag or one tag. It is possible (and sensible) to have one or more of both. There's a FEATURE/PROP example that includes a bit of base64 encoded data that purports to be a jpeg. It isn't. When I decode it, 'file' says it's a MS Office document. When I look at the byte stream I see something that looks like the big endian Unicode BOM UTF-16/UTF-32 and the letters "abcdefghijkl" a 4 byte intervals. Any reason we couldn't have a real gif/png/jpeg/whatever here? Besides the need to make one. Speaking of which, do the prop fields each need a "name" or "description" attribute? How is a user supposed to distinguish between these two images? BASE64-ENCODED-DATA-HERE Some months ago we had the discussion on date representations. I thought we decided on ISO 8601 dates instead of RFC dates. Looking through my emails I see there wasn't a conclusion. In private email to Lincoln I said Were we to go this route I would say we define that all dates be given as YYYY-MM-DD all datetimes be given as YYYY-MM-DDTHH:MM:SS(.ss*)?(Z|[+-]hh:mm) (timezone required, fractions of a second optional), 0001 <= YYYY <= 9999, 00<=HH<=23 and leap second support is implementation dependent. 
This is compatible with ISO 8601, compatible with XML Schema, supportable by the likely DAS/2 clients and servers, and not dependent on any external specification. ISO dates or RFC dates? I vote ISO dates. If we go the ISO route we could more easily fit in with the Dublin core metadata elements. For example we could have dc:created = "1987-06-05" dc:description = "Volvox Example Database" However, I do not think this is needed, in part because I think the dates and the description fields are the only things that would be affected. Someone would need to present a good enough use case for it. The downside is that it adds another namespace to the system and another layer to understand. I've been trying to make sense of the XLink spec since that seems relevant to what we're doing. One possibility is to replace things that point to URIs with "xlink:href". But is that a good idea? Looking around I found Tim Berners-Lee's commentary "When should I use XLink?" http://www.w3.org/DesignIssues/XLink.html His answer is 2. You should use xlink whenever your application is one of hypertext linking, as xlink functionality such as power to control user interface behavior on link traversal is useful and should be implemented in a standard way to allow interoperability (BTW, there's the old joke that in a multiple choice answer you should pick the longest one. That's true in this case.) No one I think will browse this data directly. There will be some intermediate translation going on in between, from a web-based middle layer, a dedicated client, or an XSLT transformation. Those will be able to add xlink fields if needed. So again I mention it here as a possibility and for the record, but I don't think it's something to use. 
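For the record, the date and datetime forms proposed earlier in this message can be checked mechanically. What follows is only a minimal Python sketch, not part of the spec: the names is_das_date and is_das_datetime and the regular expressions are assumptions made for illustration. It treats leap seconds as unsupported (one possible implementation choice) and checks only the lexical form, so a string like 2005-02-31 still passes.

```python
import re

# YYYY-MM-DD with 0001 <= YYYY <= 9999 (year 0000 is excluded).
_DATE = r"(?!0000)\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"
# HH:MM:SS with 00 <= HH <= 23 and optional fractional seconds.
# Assumption: SS=60 (leap second) is rejected in this sketch.
_TIME = r"([01]\d|2[0-3]):[0-5]\d:[0-5]\d(\.\d+)?"
# The timezone is required: either "Z" or a +hh:mm / -hh:mm offset.
_ZONE = r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d)"

DATE_RE = re.compile(_DATE)
DATETIME_RE = re.compile(_DATE + "T" + _TIME + _ZONE)


def is_das_date(s):
    """Return True if s has the lexical form YYYY-MM-DD."""
    return DATE_RE.fullmatch(s) is not None


def is_das_datetime(s):
    """Return True if s has the form YYYY-MM-DDTHH:MM:SS(.ss*)?(Z|[+-]hh:mm)."""
    return DATETIME_RE.fullmatch(s) is not None
```

A validator could call is_das_datetime on attributes such as "created" and "modified"; a stricter version would also need a calendar check for the day of the month.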
Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Wed Mar 16 06:03:38 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Mar 2005 23:03:38 -0700 Subject: [DAS2] initial schemas and using a template Message-ID: <4742fe48454e11018d4f9155205221d8@dalkescientific.com> I have initial RELAX-NG schemas (in compact syntax) for the XML in the das2_get spec. As I recall we're only working on GET for Y1 of the grant so I haven't touched the das2_put spec. To validate I downloaded jing-20030619 from http://www.thaiopensource.com/relaxng/jing.html and used the included file thusly java -jar jing.jar -c das-regionlist.rnc das-regionlist2.xml The "-c" option says that the schema is in compact format ("rnc" = "Relax Ng Compact"). For the XML I copied and pasted from the spec, changing the DOCTYPE to point to /dev/null For whatever reason jing requires that that file be present even though it isn't used. The program trang, available from the same site http://www.thaiopensource.com/relaxng/trang.html can be used to turn the compact notation into the RELAX NG XML syntax, or (as much as is possible) into an XML Schema or DTD. Here's an example of use java -jar ~/ftps/trang/trang-20030619/trang.jar \ das-details.rnc das-details.dtd The input file type is determined automatically based on the filename's extension. Here is an example of the rnc format. This is the schema for the response for details. default namespace = "http://www.biodas.org/ns/das/genome/2.00" element SOURCE { attribute xml:base { text }, ## The id attribute is a URN attribute id { text }, ## The description attribute provides a human readable string ## describing the data source. attribute description { text }, # not using taxon # missing doc_href? element VERSION { attribute id { text }, attribute description { text }, # missing doc_href? # better date string format?
attribute created { text }, attribute modified { text }, element CAPABILITIES { element METHOD { # restrict to GET/PUT/POST/DELETE? attribute id { text } }+ }, element NAMESPACES { element NAMESPACE { attribute id { text }& text& element FORMAT { attribute id { text }, attribute type { text } }* }* } } } (As you can see, I have a question about if the SOURCE element should contain a doc_href attribute.) In the XML syntax (filename extension of ".rng") this looks like The id attribute is a URN The description attribute provides a human readable string describing the data source. One thing to note is that the "##" comments get converted into elements while the "#" comments get converted into . Potentially the XML with the documentation annotations could be converted into HTML. I didn't come across a program that does this already. It seems that most people roll their own converters. I would like it if the HTML documentation and the XML schema were more closely tied together. What I propose is to move the description of the different fields into the XML schema, as noted above. I would then write a program to convert the XML form into HTML that could be inserted into the documentation. Most likely this means using some sort of template/ string substitution to put everything together. And a makefile to merge them. This would also help me develop a validator for the data files in the spec itself. Eg in my testing today I fixed two typos in the examples. What I can do is pull the XML and tab-delimited files out of the HTML and into separate files. These can be tested standalone and the template can say "insert file ABC here". Sound good to you all? 
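The "insert file ABC here" idea could be sketched roughly as follows. This is only an illustration in Python using the standard library: the marker syntax (@@insert path@@) and the function name expand_template are hypothetical, chosen for this sketch, and are not what the spec or any existing template language uses.

```python
import html
import re
from pathlib import Path

# Hypothetical marker syntax, for example:  @@insert examples/sources.xml@@
INSERT_RE = re.compile(r"@@insert\s+(\S+?)@@")


def expand_template(template_text, base_dir):
    """Replace each insert marker with the escaped contents of the named file."""
    base = Path(base_dir)

    def replace(match):
        # The example lives in its own file, so it can also be validated standalone.
        example = (base / match.group(1)).read_text(encoding="utf-8")
        # Escape the text so XML examples display literally inside the HTML page.
        return "<pre>" + html.escape(example) + "</pre>"

    return INSERT_RE.sub(replace, template_text)
```

Because each example file exists on its own, the same file can be passed to the schema validator and included in the generated HTML, so the two cannot drift apart.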
Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Wed Mar 16 18:40:09 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 16 Mar 2005 11:40:09 -0700 Subject: [DAS2] namespace description Message-ID: The spec has an example which looks like Feature types The description in the NAMESPACE is somewhat tricky to handle. I was able to get a schema description for it as mixed text, but it allows the following Feature types I would rather see something like Andrew dalke at dalkescientific.com From ed_erwin at affymetrix.com Wed Mar 16 18:58:27 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Wed, 16 Mar 2005 10:58:27 -0800 Subject: [DAS2] namespace description In-Reply-To: References: Message-ID: <423881D3.7000405@affymetrix.com> I agree. Allowing the mixing of free-text with XML-elements makes parsing difficult. Your suggestion is good. But you can also consider this, which might be better if the description ever needs to be long: Feature types Andrew Dalke wrote: > The spec has an example which looks like > > > Feature types > > > > > The description in the NAMESPACE is somewhat tricky to handle. > I was able to get a schema description for it as mixed text, but it > allows the following > > > > Feature > > types > > > I would rather see something like > > > > > > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From dalke at dalkescientific.com Wed Mar 16 19:40:04 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 16 Mar 2005 12:40:04 -0700 Subject: [DAS2] namespace description In-Reply-To: <423881D3.7000405@affymetrix.com> References: <423881D3.7000405@affymetrix.com> Message-ID: <0a855e1350fde8b1a248341133775a4a@dalkescientific.com> Ed: > Your suggestion is good. 
But you can also consider this, which might > be better if the description ever needs to be long: > > > Feature types The documentation doesn't say what text could go there so I don't know how long it might be. The existing description is very short. The SOURCE and VERSION elements already use a "description" attribute. In the versioned source request the namespace result looks like Feature types The documentation says A data format recognized by this server. The id attribute is the short name of the format for use in the GET URL, and the type attribute is the returned document's MIME type. I assume the format id is the one used in fetching information about the features, and that an example of the GET is http://server/das/genome/sourceid/version/feature?format=format; filter1=value;filter2=value... In most places the term "id" is used for URIs. Eg """The id attribute is a URI (typically in the form of a relative URL) that identifies this data source. """ """Each feature, subfeature and location has a URL-based ID.""" """The id attribute is a URI that identifies this version""" I would like the use of the 'id' attribute to be consistent across DAS/2. In that way there can be a section at the top of the document which says "the 'id' attribute is always a URI. Relative URIs are resolved ...." instead of documenting it for each element. Could this use of "id" be turned into "name"? Or "format"? The other two places that use "id" for something other than a URI are CAPABILITIES> I propose using "name" here as well. I checked -- the HTTP/1.1 spec just uses the word "method" to describe these http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1 and seem too redundant. But I'm fine with that too. The other place that uses 'id' is in the types request. The list of properties for a given types looks like Again here I would prefer "name" or "key". I think we talked about this last fall and I was overruled, but I'll try again. 
;) Andrew dalke at dalkescientific.com From ed_erwin at affymetrix.com Wed Mar 16 23:49:49 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Wed, 16 Mar 2005 15:49:49 -0800 Subject: [DAS2] standardizing 'id' use In-Reply-To: <2995ab5161e97f0ece442ce659ca567a@dalkescientific.com> References: <2995ab5161e97f0ece442ce659ca567a@dalkescientific.com> Message-ID: <4238C61D.30304@affymetrix.com> I like consistency, but .... If you are looking for an attribute name that always means 'URI', why not call it 'uri' ? Saying that 'id' is always a 'uri' is reaching too far, IMO. As for switching 'id' to 'key' in elements, I agree with you. Andrew Dalke wrote: > I would like the use of the 'id' attribute to be consistent > across DAS/2. In that way there can be a section at the top > of the document which says "the 'id' attribute is always a URI. > Relative URIs are resolved ...." instead of documenting it > for each element. > From allenday at ucla.edu Fri Mar 18 22:14:04 2005 From: allenday at ucla.edu (Allen Day) Date: Fri, 18 Mar 2005 14:14:04 -0800 (PST) Subject: [DAS2] das2 comments In-Reply-To: <04dc1f43a62e51ed024ccd82932d6127@dalkescientific.com> References: <04dc1f43a62e51ed024ccd82932d6127@dalkescientific.com> Message-ID: > Speaking of which, do the prop fields each need a "name" > or "description" attribute? How is a user supposed to > distinguish between these two images? > > href = "http://www.wormbase.org/db/seq/gbrowse_img?name=cTel54X.1" > /> > > mime_type = "image/jpeg" > content_encoding = "base64"> > BASE64-ENCODED-DATA-HERE > I'd like to see something like MAGE's NameValueType used, can we borrow from this? 
http://www.affymetrix.com/support/technical/manual/netaffx_MAGE_ML_manual.affx There are formal DTDs and OMG documentation here, but they seem to be broken: http://www.mged.org/Workgroups/MAGE/mage-ml.html -Allen From allenday at ucla.edu Fri Mar 18 22:15:18 2005 From: allenday at ucla.edu (Allen Day) Date: Fri, 18 Mar 2005 14:15:18 -0800 (PST) Subject: [DAS2] initial schemas and using a template In-Reply-To: <4742fe48454e11018d4f9155205221d8@dalkescientific.com> References: <4742fe48454e11018d4f9155205221d8@dalkescientific.com> Message-ID: > I would like it if the HTML documentation and the XML > schema were more closely tied together. What I > propose is to move the description of the different > fields into the XML schema, as noted above. I would > then write a program to convert the XML form into > HTML that could be inserted into the documentation. > > Most likely this means using some sort of template/ > string substitution to put everything together. > And a makefile to merge them. > > This would also help me develop a validator for the > data files in the spec itself. Eg in my testing > today I fixed two typos in the examples. What I > can do is pull the XML and tab-delimited files > out of the HTML and into separate files. These > can be tested standalone and the template can say > "insert file ABC here". > > Sound good to you all? This sounds wonderful! -Allen From allenday at ucla.edu Fri Mar 18 22:18:30 2005 From: allenday at ucla.edu (Allen Day) Date: Fri, 18 Mar 2005 14:18:30 -0800 (PST) Subject: [DAS2] standardizing 'id' use In-Reply-To: <2995ab5161e97f0ece442ce659ca567a@dalkescientific.com> References: <2995ab5161e97f0ece442ce659ca567a@dalkescientific.com> Message-ID: I agree with all of this. For though you may want to allow both your proposed name/key attribute as well as id. 
The ids given in the example, as I recall, are built-ins, but it is possible to define your own properties, in which case you'd want to reference as a URI using an id attribute. On Wed, 16 Mar 2005, Andrew Dalke wrote: > In the versioned source request the namespace result looks like > > > Feature types > > > > > The documentation says > > > A data format recognized by this server. The id attribute is the > short name of the format for use in the GET URL, and the type attribute > is the returned document's MIME type. > > I assume the format id is the one used in fetching information > about the features, and that an example of the GET is > > http://server/das/genome/sourceid/version/feature?format=format; > filter1=value;filter2=value... > > > In most places the term "id" is used for URIs. Eg > > """The id attribute is a URI (typically in the form of a > relative URL) that identifies this data source. """ > > """Each feature, subfeature and location has a URL-based ID.""" > > """The id attribute is a URI that identifies this version""" > > I would like the use of the 'id' attribute to be consistent > across DAS/2. In that way there can be a section at the top > of the document which says "the 'id' attribute is always a URI. > Relative URIs are resolved ...." instead of documenting it > for each element. > > Could this use of "id" be turned into "name"? Or "format"? > > > The other two places that use "id" for something other than > a URI are > > CAPABILITIES> > > > > > > > > I propose using "name" here as well. I checked -- the > HTTP/1.1 spec just uses the word "method" to describe these > http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1 > and > > seem too redundant. But I'm fine with that too. > > > The other place that uses 'id' is in the types request. > The list of properties for a given types looks like > > > > > > > Again here I would prefer "name" or "key". I > think we talked about this last fall and I was > overruled, but I'll try again. 
;) > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From lstein at cshl.edu Mon Mar 21 16:45:02 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 21 Mar 2005 11:45:02 -0500 Subject: [DAS2] DAS/2 progress report - CSHL In-Reply-To: References: Message-ID: <200503211145.03105.lstein@cshl.edu> Hi Gregg, Here is some material for the progress report: DAS/2 Specification (CSHL) The work on the DAS/2 specification has centered on two major categories: enhancement of the retrieval protocol to support more flexible queries, and development of a new writeback protocol to support creation of new biological data objects and editing of existing ones. The changes to the retrieval protocol are finished. It has been enhanced to provide for retrieval of biological objects using multiple combinations of attributes. In DAS/1, objects could only be retrieved using combinations of genomic position, object name, and object type. In DAS/2, objects can be retrieved using any combination of arbitrary attributes: for example, genes can be retrieved using GO terms or the presence of an orthologue in another species. The writeback protocol is roughly 75% complete. We have developed a locking protocol, a scheme for requesting new identifiers, and a scheme for addressing objects that are not yet created. We have also specified the protocol for writing new objects into the database. However, there is still ambiguity in the protocol for updating existing objects, primarily decisions relating to the granularity of updates. On the one hand, it might be desirable to be able to address and update an individual field in an object. On the other hand, this type of granularity significantly complicates the implementation of the protocol and introduces issues relating to locking and atomicity of updates.
Work on the protocol is continuing, however, and we anticipate that it will be ready for public comment by the end of spring 2005. Lincoln -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From lstein at cshl.edu Mon Mar 21 16:46:40 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 21 Mar 2005 11:46:40 -0500 Subject: [DAS2] Re: DAS/2 progress report Message-ID: <200503211146.40580.lstein@cshl.edu> Hi Folks, I apologize for sending that letter to the mailing list. It was actually intended to be sent directly to Gregg. Feel free to comment on it, if you wish. Lincoln -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 NOTE: Please copy Sandra Michelsen on all emails regarding scheduling and other time-critical topics. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From dalke at dalkescientific.com Tue Mar 29 04:47:22 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 28 Mar 2005 21:47:22 -0700 Subject: [DAS2] more small changes Message-ID: <93d44da16b2f88fd4d0b4e5d04c02051@dalkescientific.com> Here's some more small changes I would like to make upon reviewing the spec some more. These are all pushing for consistency of the Content-Type and DTD names. As I expect it isn't a problem I'll go ahead and make the changes, subject to reversal in the future if so decided. > Fetching Information about Data Sources: The Sources Request > > Performing a GET on the DAS base URL is known as a "sources" > request, ... 
of type text/x-das-source+xml, and a compact > tab-delimited format of type text/x-das-source+compact. ... > > It's called "sources" so I think the content types should be "text/x-das-sources+xml" and "text/x-das-sources+compact" instead of the singular "source". I also think the DOCTYPE line should be in part because I don't know what "dsn" means. DAS Source something? > Fetching Information About a Versioned Source: The Versioned > Source Request > > By adding the version to the end of the path, the URL becomes an identifier > for the versioned data source. Retrieving it returns a document that > provides metadata about the data source and the capabilities that the > DAS/2 server provides for manipulating the data source. > > REQUEST: > http://www.wormbase.org/das/genome/volvox/2 > > RESPONSE: > Content-Type: text/x-das-source-details+xml The word "details" does not exist in the description. The word "version" does, in several forms. I propose using the Content-Type: text/x-das-source-version+xml Similarly I propose changing to use Sadly that's a very long name. I could go with "das2version". > Fetching Information About Feature Types: The "Types" Request ... > Content-Type: text/x-das-featuretype+xml ... > Content-Type: text/x-das-featuretype+compact In here I suggest using Content-Type: text/x-das-types+xml Content-Type: text/x-das-types+compact The DOCTYPE DTD URL is fine. > Fetching Information About Features: The Feature Request I haven't figured out the rule for when something is a singular request vs. when it's a plural request. Earlier it was a "Types" request when "type/" is appended to a versioned data source URL. It seems that this should be a "Features" request. > The Das2XML-Formatted Feature Response ... > Content-Type: text/x-das-feature+xml ... 
> This returns a list of features so should be "features" Content-Type: text/x-das-features+xml > Retrieving Regions & Assemblies by homology this should be Retrieving Regions & Assemblies: the "Regions" request > Content-Type: text/x-das-region+xml > and those should be Content-Type: text/x-das-regions+xml > Retrieving Controlled Vocabularies of Property Types ... > Content-Type: text/x-das-property+xml ... > By now you all have probably gotten the hang of things ;) Content-Type: text/x-das-properties+xml One last point. The sources request is one request that can return a list of 0 or more elements. It is described with a schema that allows zeroOrMore SOURCE elements. A source request (for a single source and not a list of sources) returns an XML document that is described with the same schema. It looks like In use there will only be one SOURCE in the SOURCES so a more precise schema could enforce that. Personally I'm okay with it is as it is. That makes for about 3 fewer schemas and the expense of one extra check in the client. I'm just pointing it out. 'Cause that's the way I am. :) Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Mar 31 07:46:30 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 31 Mar 2005 00:46:30 -0700 Subject: [DAS2] template-based spec generation Message-ID: <4a1dd29807b864a3ef3dd31fedbcdbbd@dalkescientific.com> I've checked in the code for generating the specification from a template. You can see the result (for now) at http://www.dalkescientific.com/das2_get.new.html I haven't changed the original spec. To compare see http://www.dalkescientific.com/das2_get.html The changes I made were: 1) pull out the XML, tab-delimited, and text examples into individual files. These will be used as part of the validation, to help ensure that the examples in the spec are valid. 2) write Relax NG schema definitions for all of the XML files. I used the "compact" notation. 
The schema are stored in files ending with ".rnc". 3) wrote a Makefile to turn the .rnc files into .rng and .dtd files. 4) figure out how to get documentation from the .rng file, so I can insert the documentation directly into the spec 5) Switch to a template-based system for the spec. I chose the Zope Page Template language ("ZPT"). 6) Write 'convert_template.py' to convert the template into the final HTML. For it to work you will need to install ElementTree and ZPT ElementTree http://effbot.org/zone/element-index.htm ZPT at http://zpt.sourceforge.net/ It includes a few special commands to insert a file and to generate HTML documentation given the comments in the schema definitions. The latter uses a "macro" so that all of the HTML documentation is consistent. I also have links to the RNC/RNG/DTD files but they won't work in the above link. I do know the result doesn't flow as smoothly as the original spec. I wanted people to see what it could do. If we go this route I'll clean up the text and check in the ElementTree and ZPT modules so people like Lincoln don't need to deal with augmenting their local Python install. In the meanwhile I've been working on the validation system proper. Andrew dalke at dalkescientific.com