From ch-das at bobobeach.com Mon Feb 25 10:51:39 2008
From: ch-das at bobobeach.com (Cyrus Harmon)
Date: Mon, 25 Feb 2008 07:51:39 -0800
Subject: [DAS2] DAS/2 questions
Message-ID: <81FD5A82-2D8D-47AD-8FD8-1757D92621D4@bobobeach.com>

Hello DAS folks,

I've been looking at the DAS/2 spec and have a few questions/comments:

[0. I'm assuming that DAS2 is the thing to use and that the version of the spec found here: http://biodas.org/documents/das2/das2_get.html is as close to normative/current/etc... as can be found.]

1. The table at the beginning of http://biodas.org/documents/das2/das2_get.html lists 4 types: sources, segments, types and features. Section 1.2 says "Each of the five new formats has its own MIME type." and then goes on to list three: "application/x-das-sources+xml, application/x-das-features+xml, application/x-das-types+xml". Are there three, four or five types?

2. It seems to me that it would be worth splitting up the transport issues from the filespec. Why not have a spec for the XML and a spec for DAS-over-http(s)? This seems trivial (although it of course requires a bit more work to maintain two resources rather than one), but I could be wrong. Clearly, some of the sensible values that a DAS server would return are based on things that are established at the time of the request, but the spec should still allow for construction of DAS/2 files without regard to the particular transport layer. Perhaps there's a need for establishing some set of criteria like well-formed-ness and validity that describe increasing levels of "correctness" and one could enforce the transport related issues at one of the higher levels.

3. DTDs? Searching for the string DTD in the document turns up empty. Is this by design?

4. Without a DTD it's a bit hard to read (well, the DTD might not help too much, but it does give some constraints) the specs for things like SOURCES.
I'd suggest that the detailed document sections lead off with some sort of formal-ish representation of what is in that document (or document element) and then follow up with the examples. It seems that there is a fairly small number of document elements for each document type. Can we list those in their own sections in section 3.X as 3.X.Y and in these sections be explicit about what is in each document element here?

Thanks,

Cyrus

From gregghelt at gmail.com Mon Feb 25 13:21:17 2008
From: gregghelt at gmail.com (Gregg Helt)
Date: Mon, 25 Feb 2008 10:21:17 -0800
Subject: [DAS2] DAS/2 questions
In-Reply-To: <81FD5A82-2D8D-47AD-8FD8-1757D92621D4@bobobeach.com>
References: <81FD5A82-2D8D-47AD-8FD8-1757D92621D4@bobobeach.com>
Message-ID: <50158cb00802251021q5b79d2bby541bcccbe18d5cbc@mail.gmail.com>

On Mon, Feb 25, 2008 at 7:51 AM, Cyrus Harmon wrote:

>
> Hello DAS folks,
>
> I've been looking at the DAS/2 spec and have a few questions/comments:
>
> [0. I'm assuming that DAS2 is the thing to use and that the version of
> the spec found here: http://biodas.org/documents/das2/das2_get.html is
> as close to normative/current/etc... as can be found.]

Yep, that's the current version of the DAS/2 genome annotation spec.

> 1. The table at the beginning of
> http://biodas.org/documents/das2/das2_get.html
> lists 4 types: sources, segments, types and features. Section 1.2
> says "Each of the five new formats has its own MIME type." and then
> goes on to list three: "application/x-das-sources+xml,
> application/x-das-features+xml, application/x-das-types+xml". Are
> there three, four or five types?

Four different formats: sources, segments, types, features. I'll fix the spec doc.

> 2. It seems to me that it would be worth splitting up the transport
> issues from the filespec. Why not have a spec for the XML and a spec
> for DAS-over-http(s)? This seems trivial (although it of course
> requires a bit more work to maintain two resources rather than one),
> but I could be wrong.
> Clearly, some of the sensible values that a DAS
> server would return are based on things that are established at the
> time of the request, but the spec should still allow for construction
> of DAS/2 files without regard to the particular transport layer.
> Perhaps there's a need for establishing some set of criteria like
> well-formed-ness and validity that describe increasing levels of
> "correctness" and one could enforce the transport related issues at
> one of the higher levels.

At one point the spec was split up more, and the consensus among the contributors was that it needed to be consolidated. Hence the current organization. This may be worth revisiting. The readability and flow of the current doc could definitely be improved on.

> 3. DTDs? Searching for the string DTD in the document turns up empty.
> Is this by design?
>
> 4. Without a DTD it's a bit hard to read (well, the DTD might not help
> too much, but it does give some constraints) the specs for things like
> SOURCES. I'd suggest that the detailed document sections lead off with
> some sort of formal-ish representation of what is in that document (or
> document element) and then follow up with the examples. It seems that
> there is a fairly small number of document elements for each document
> type. Can we list those in their own sections in section 3.X as 3.X.Y
> and in these sections be explicit about what is in each document
> element here?

I apologize for the missing links from the HTML spec doc to the formal schemas! Somewhere along the way in our efforts to improve the HTML doc the links got dropped. I'll add them back in. In the meantime here's the link I use to the CVS head for the schema:
http://cvs.biodas.org/cgi-bin/viewcvs/viewcvs.cgi/das/das2/das2_schemas.rnc?rev=HEAD&cvsroot=biodas&content-type=text/vnd.viewcvs-markup

There are no DTDs for DAS/2. Instead we use the RELAX-NG schema language to formally describe DAS/2 XML.
Why RELAX-NG instead of DTD, XML-Schema or other alternatives? Quick answer: James Clark. Longer answer: search the web, there's plenty of debate about what is the best XML schema language. You can sort-of convert a RELAX-NG schema to a DTD, but there are many useful constraints you can specify in RELAX-NG that you can't in a DTD, which therefore get dropped in the conversion process. We felt when designing DAS/2 that we shouldn't include a down-converted DTD spec because that might encourage developers to use DTD validators etc. to determine whether an XML doc is valid DAS/2 when in fact there are plenty of ways a doc could pass a DTD validation but not a RELAX-NG validation. And there are plenty of RELAX-NG validators and other RELAX-NG tools out there.

Andrew Dalke has written a web-based DAS/2 validator that utilizes RELAX-NG validation, though it's currently offline. (Andrew, are you out there? Let me know if we need to move the validation service to a more permanent host.)

There is also an XML-Schema version of the DAS/2 schema, but it is derived from the RELAX-NG schema -- first autoconverted then edited by hand to correct some conversion problems. The RELAX-NG schema is the official DAS/2 schema.

Gregg

From garret at globalmentor.com Fri Feb 29 13:31:12 2008
From: garret at globalmentor.com (Garret Wilson)
Date: Fri, 29 Feb 2008 10:31:12 -0800
Subject: [DAS2] approval and maintenance of DAS/2 URI namespaces
Message-ID: <47C84F70.4000401@globalmentor.com>

I read the "list of global identifiers based on community consensus" at:
http://www.biodas.org/wiki/GlobalSeqIDs

It's great that DAS/2 is using URIs for global identification. For example, chromosome 1 for the B36.1 human genome assembly is identified by the URI:
http://www.ncbi.nlm.nih.gov/genome/H_sapiens/B36.1/dna/chr1

I'm unclear, however, regarding NCBI knowledge, participation, and maintenance of these URIs.
The system of uniform resource identifiers prevents name clashes by delegating to the IANA-governed domain-name system, so that whoever owns a particular domain-based namespace can manage the URIs within that namespace. If DAS/2 were to use, for example, the http://www.biodas.org/genome/ namespace to manage all genome assembly URIs, I would have no cause for concern. DAS/2 is instead using several namespaces managed by other parties.

I hope the response is that the NCBI is fully aware that its URI namespace is being used for authoritative genome assembly URIs and has either committed to maintaining that namespace or has delegated maintenance of the http://www.ncbi.nlm.nih.gov/genome/ namespace to DAS. (If the latter is true, it is unclear which organization would receive such a delegation of maintenance responsibility---the Open Bioinformatics Foundation? The BioTeam?)

Thanks for any clarifications---this question is raised solely out of a desire for long-term standards that can be used consistently by the community.

Garret
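
[Editorial illustration, not part of the original thread.] The namespace-ownership point in the last message can be made concrete with a small sketch. It uses only the Python standard library to extract the authority (host) of the chromosome 1 URI quoted in the thread and to check whether the URI falls under a given namespace prefix. The function names namespace_authority and is_within_namespace are hypothetical helpers invented for this illustration; they are not part of DAS/2 or of any DAS library.

```python
from urllib.parse import urlsplit

# URI quoted in the thread for chromosome 1 of the B36.1 human assembly.
CHR1_URI = "http://www.ncbi.nlm.nih.gov/genome/H_sapiens/B36.1/dna/chr1"


def namespace_authority(uri: str) -> str:
    """Return the lower-cased host of an http(s) URI.

    Whoever controls this host's domain controls the URI namespace.
    (Hypothetical helper, for illustration only.)
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an http(s) URI with a host: {uri!r}")
    return parts.hostname


def is_within_namespace(uri: str, namespace: str) -> bool:
    """Return True if uri has the same host as namespace and a path under it.

    (Hypothetical helper, for illustration only.)
    """
    u, n = urlsplit(uri), urlsplit(namespace)
    # Treat the namespace path as a directory so "/genome" does not
    # accidentally match "/genome2/...".
    prefix = n.path if n.path.endswith("/") else n.path + "/"
    return u.hostname == n.hostname and u.path.startswith(prefix)


if __name__ == "__main__":
    print(namespace_authority(CHR1_URI))
    print(is_within_namespace(CHR1_URI, "http://www.ncbi.nlm.nih.gov/genome/"))
    print(is_within_namespace(CHR1_URI, "http://www.biodas.org/genome/"))
```

The check is purely syntactic: it shows which domain a URI delegates to, but it says nothing about whether the owner of that domain has agreed to maintain the identifier, which is the question raised in the message.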