[DAS2] initial schemas and using a template

Wed Mar 16 06:03:38 UTC 2005

I have initial RELAX-NG schemas (in compact syntax)
for the XML in the das2_get spec.  As I recall we're
only working on GET for Y1 of the grant so I haven't
touched the das2_put spec.

To validate I downloaded jing-20030619 from
     http://www.thaiopensource.com/relaxng/jing.html
and ran the included jing.jar like this:

java -jar jing.jar -c das-regionlist.rnc das-regionlist2.xml

The "-c" option says that the schema is in compact format
("rnc" = "RELAX NG Compact").  For the XML, I copied and
pasted from the spec, changing the DOCTYPE to point to /dev/null:

<!DOCTYPE DAS2DSN SYSTEM "/dev/null">

For whatever reason jing requires that the file named in the
DOCTYPE be present even though it isn't used.
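
To run the same check over several files from a script, here is a
minimal Python sketch.  The function names are hypothetical, the jar
path defaults to the jing.jar used above, and it assumes jing exits
with a non-zero status when validation fails (worth confirming
against the release in use).

```python
import subprocess

def jing_command(schema, xml_files, jar="jing.jar"):
    """Build the jing command line; "-c" marks a compact-syntax schema."""
    cmd = ["java", "-jar", jar]
    if schema.endswith(".rnc"):
        cmd.append("-c")
    cmd.append(schema)
    cmd.extend(xml_files)
    return cmd

def validate(schema, xml_files, jar="jing.jar"):
    """Run jing and return (ok, output).

    Assumption: a zero exit status means every file validated.
    """
    result = subprocess.run(jing_command(schema, xml_files, jar),
                            capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```

Calling jing_command("das-regionlist.rnc", ["das-regionlist2.xml"])
reproduces the command line shown above.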

The program trang, available from the same site
   http://www.thaiopensource.com/relaxng/trang.html
can be used to turn the compact notation into the RELAX NG
XML syntax, or (as much as is possible) into an XML Schema
or DTD.  Here's an example of its use:

   java -jar ~/ftps/trang/trang-20030619/trang.jar \
         das-details.rnc das-details.dtd

The input file type is determined automatically based
on the filename's extension.

Here is an example of the rnc format.  This is the
schema for the response for details.

default namespace = "http://www.biodas.org/ns/das/genome/2.00"

element SOURCE {
     attribute xml:base { text },
     ## The id attribute is a URN
     attribute id { text },

     ## The description attribute provides a human readable string
     ## describing the data source.
     attribute description { text },

     # not using taxon
     # missing doc_href?

     element VERSION {
       attribute id { text },
       attribute description { text },
       # missing doc_href?

       # better date string format?
       attribute created { text },
       attribute modified { text },

       element CAPABILITIES {
         element METHOD {
           # restrict to GET/PUT/POST/DELETE?
           attribute id { text }
         }+
       },

       element NAMESPACES {
         element NAMESPACE {
           attribute id { text }&
           text&
           element FORMAT {
             attribute id { text },
             attribute type { text }
           }*
         }*
       }

     }
}

(As you can see, I have a question about whether the SOURCE
element should contain a doc_href attribute.)

In the XML syntax (filename extension of ".rng") this
looks like

<?xml version="1.0" encoding="UTF-8"?>
<element name="SOURCE" 
xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0" 
ns="http://www.biodas.org/ns/das/genome/2.00" 
xmlns="http://relaxng.org/ns/structure/1.0">
   <attribute name="xml:base"/>
   <attribute name="id">
     <a:documentation>The id attribute is a URN</a:documentation>
   </attribute>
   <attribute name="description">
     <a:documentation>The description attribute provides a human readable string
describing the data source.</a:documentation>
   </attribute>
   <!--
     not using taxon
     missing doc_href?
   -->
   <element name="VERSION">
     <attribute name="id"/>
     <attribute name="description"/>
     <!-- missing doc_href? -->
     <!-- better date string format? -->
     <attribute name="created"/>
     <attribute name="modified"/>
     <element name="CAPABILITIES">
       <oneOrMore>
         <element name="METHOD">
           <!-- restrict to GET/PUT/POST/DELETE? -->
           <attribute name="id"/>
         </element>
       </oneOrMore>
     </element>
     <element name="NAMESPACES">
       <zeroOrMore>
         <element name="NAMESPACE">
           <interleave>
             <attribute name="id"/>
             <text/>
             <zeroOrMore>
               <element name="FORMAT">
                 <attribute name="id"/>
                 <attribute name="type"/>
               </element>
             </zeroOrMore>
           </interleave>
         </element>
       </zeroOrMore>
     </element>
   </element>
</element>

One thing to note is that the "##" comments get
converted into <a:documentation> elements while the
"#" comments get converted into XML comments (<!-- ... -->).

Potentially the XML with the documentation annotations
could be converted into HTML.  I didn't come across
a program that does this already.  It seems that most
people roll their own converters.

I would like it if the HTML documentation and the XML
schema were more closely tied together.  What I
propose is to move the description of the different
fields into the XML schema, as noted above.  I would
then write a program to convert the XML form into
HTML that could be inserted into the documentation.
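
As a rough idea of what that converter might look like, here is a
minimal Python sketch.  It walks the .rng form, picks up each
<a:documentation> annotation, and emits an HTML definition list.
The function names, the path notation ("/SOURCE/@id") and the <dl>
layout are my assumptions for illustration, not anything in the spec.

```python
import xml.etree.ElementTree as ET
from html import escape

RNG = "{http://relaxng.org/ns/structure/1.0}"
DOC = "{http://relaxng.org/ns/compatibility/annotations/1.0}documentation"

def documented_items(node, path=""):
    """Yield (path, text) for every element/attribute pattern that
    carries an a:documentation annotation."""
    kind = node.tag[len(RNG):] if node.tag.startswith(RNG) else None
    if kind in ("element", "attribute") and node.get("name"):
        path += ("/@" if kind == "attribute" else "/") + node.get("name")
        doc = node.find(DOC)
        if doc is not None and doc.text:
            # Collapse the line breaks carried over from the ## comments.
            yield path, " ".join(doc.text.split())
    for child in node:
        yield from documented_items(child, path)

def to_html(rng_text):
    """Render the documented items as an HTML definition list."""
    root = ET.fromstring(rng_text)
    rows = ["<dt>%s</dt><dd>%s</dd>" % (escape(p), escape(d))
            for p, d in documented_items(root)]
    return "<dl>\n" + "\n".join(rows) + "\n</dl>"

# A cut-down version of the SOURCE schema shown earlier.
SAMPLE = (
    '<element name="SOURCE"'
    ' xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"'
    ' xmlns="http://relaxng.org/ns/structure/1.0">'
    '<attribute name="id">'
    '<a:documentation>The id attribute is a URN</a:documentation>'
    '</attribute>'
    '<attribute name="description"/>'
    '</element>'
)

print(to_html(SAMPLE))
```

Undocumented attributes (like "description" in the sample) are
skipped, so the output also shows where documentation is missing.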

Most likely this means using some sort of template/
string substitution to put everything together,
and a makefile to merge them.
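
The substitution step could be sketched in Python as follows.  The
"@@insert NAME@@" directive is a made-up syntax for illustration;
nothing in the spec defines it.  Inserted files are HTML-escaped so
that XML examples display as text.

```python
import re
from html import escape
from pathlib import Path

# Hypothetical directive: @@insert some-file.xml@@
INSERT = re.compile(r"@@insert\s+(\S+?)@@")

def read_file(name):
    """Default loader: read the named file from the current directory."""
    return Path(name).read_text()

def expand(template, read=read_file):
    """Replace each insert directive with the escaped file contents."""
    def substitute(match):
        return escape(read(match.group(1)), quote=False)
    return INSERT.sub(substitute, template)
```

Passing a different "read" function (for example a dictionary's
__getitem__) makes it possible to test the expansion without
touching the filesystem.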

This would also help me develop a validator for the
data files in the spec itself.  For example, in my testing
today I fixed two typos in the examples.  What I
can do is pull the XML and tab-delimited files
out of the HTML and into separate files.  These
can be tested standalone and the template can say
"insert file ABC here".
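
Pulling the examples out of the HTML could start from something
like this Python sketch.  It assumes the examples sit inside <pre>
elements, which I have not checked against the current spec files;
the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class PreCollector(HTMLParser):
    """Collect the text content of each <pre> block."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []
        self._depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._parts))
                self._parts = []

    def handle_data(self, data):
        # Entities such as &lt; arrive here already decoded.
        if self._depth:
            self._parts.append(data)

def extract_examples(html_text):
    """Return the text of every <pre> block, in document order."""
    parser = PreCollector()
    parser.feed(html_text)
    parser.close()
    return parser.blocks
```

Each returned block can then be written to its own file and run
through the validator on its own.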

Sound good to you all?

					Andrew
					dalke at dalkescientific.com