[DAS2] DAS2 validation service

Andrew Dalke dalke at dalkescientific.com
Wed Oct 25 17:42:32 UTC 2006


I've updated the DAS2 validation service in a couple of ways.
One was to improve the error handling.  For example, point it to
slashdot.org (not XML), slashdot.org/blahblah (404 - not found) or
blahblah.blah (host does not exist) and it now reports an error
instead of raising an exception.

There was a problem of sorts with the XML-RPC server.  I chose
XML-RPC yesterday because I thought it would be dead simple to use
in any environment.  It's old, stable technology.  Andreas tried
a few Java XML-RPC clients and found there were various hard-to-resolve
dependencies.  Eg, the most modern one requires Java 1.5 but his
system runs 1.4, and the older one requires some XML DOM parser
which isn't included with the system and proved hard to track down.

Rather than struggle to make that work, I've added a new HTTP
interface for automated validation.

The URL is
   http://cgi.biodas.org:8080/validate_url

It has a required parameter, "url", which is the URL to validate.

%curl 'http://cgi.biodas.org:8080/validate_url?url=http://slashdot.org/'
<?xml version="1.0" encoding="utf-8"?>
<DAS_VALIDATION url="http://slashdot.org/">
  <MESSAGE text="Unknown Content-Type 'text/html'." severity="error" />
  <MESSAGE text="expat: mismatched tag at byte 1794, line 29, column 3"
           severity="fatal" />
</DAS_VALIDATION>

It has an optional parameter, "doctype", which is the document type to
expect.


%curl 'http://cgi.biodas.org:8080/validate_url?url=http://das.biopackages.net/das/genome/human/;doctype=sources'
<?xml version="1.0" encoding="utf-8"?>
<DAS_VALIDATION url="http://das.biopackages.net/das/genome/human/"  
doctype="sources" />

In that last case there were no messages.
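
For anyone scripting this from Python rather than curl, here is a minimal
sketch of building the request URL.  The helper names are hypothetical and
not part of the service.  Percent-encoding the parameter values is an
assumption about how the server decodes its query string; the ";" separator
simply follows the curl examples above.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Endpoint taken from the message above.
VALIDATE_URL = "http://cgi.biodas.org:8080/validate_url"


def build_validation_url(target, doctype=None):
    """Return the validate_url request for `target` (hypothetical helper)."""
    params = [("url", target)]
    if doctype is not None:
        params.append(("doctype", doctype))
    # Assumption: the server percent-decodes values. The ';' separator
    # mirrors the curl examples rather than the more common '&'.
    query = ";".join(f"{name}={quote(value, safe='')}" for name, value in params)
    return f"{VALIDATE_URL}?{query}"


def fetch_validation(target, doctype=None, timeout=30):
    """Fetch the raw DAS_VALIDATION document as bytes (needs network access)."""
    with urlopen(build_validation_url(target, doctype), timeout=timeout) as resp:
        return resp.read()
```

Keeping URL construction separate from the network call makes it possible
to check the request without contacting the server.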


The XML document has this structure ("?" marks an optional attribute,
"*" means zero or more of the element):

<DAS_VALIDATION url="URL-used-for-the-validation"  
doctype="the-document-type"? >
   <MESSAGE severity="one of info, warning, error, fatal"
            text="the error message" />  *
</DAS_VALIDATION>
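
As a rough illustration of reading that structure, the sketch below uses
Python's standard xml.etree.ElementTree.  The function name and the shape of
the returned dictionary are my own choices, not anything the service defines.

```python
import xml.etree.ElementTree as ET


def parse_validation(xml_data):
    """Parse a DAS_VALIDATION document into a plain dict (hypothetical helper)."""
    root = ET.fromstring(xml_data)
    if root.tag != "DAS_VALIDATION":
        raise ValueError(f"unexpected root element {root.tag!r}")
    # Each MESSAGE carries a severity and a text attribute; there may be none.
    messages = [(m.get("severity"), m.get("text")) for m in root.findall("MESSAGE")]
    return {
        "url": root.get("url"),
        # doctype is optional, so this is None when the attribute is absent.
        "doctype": root.get("doctype"),
        "messages": messages,
    }
```

A result with an empty "messages" list corresponds to the case where the
validator had nothing to report.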

A note about the doctype.  If the server could not get the document then
the validation will not have a doctype even if you gave it one.

%curl 'http://cgi.biodas.org:8080/validate_url?url=http://slashdot.org;doctype=types'
<?xml version="1.0" encoding="utf-8"?>
<DAS_VALIDATION url="http://slashdot.org">
  <MESSAGE text="Received Content-Type 'text/html', expected 'application/x-das-types+xml'."
           severity="fatal" />
  <MESSAGE text="expat: mismatched tag at byte 1794, line 29, column 3"
           severity="fatal" />
</DAS_VALIDATION>

If you tell it the wrong doctype and it gets something in XML then it
assumes the response is in the given doctype.

%curl 'http://cgi.biodas.org:8080/validate_url?url=http://das.biopackages.net/das/genome/human/;doctype=types'
<?xml version="1.0" encoding="utf-8"?>
<DAS_VALIDATION url="http://das.biopackages.net/das/genome/human/"  
doctype="types">
  <MESSAGE text="Received Content-Type 'application/x-das-sources+xml', expected 'application/x-das-types+xml'."
           severity="fatal" />
  <MESSAGE text="Expected element '{http://biodas.org/documents/das2}TYPES' but got '{http://biodas.org/documents/das2}SOURCES' at byte 41, line 3, column 2"
           severity="fatal" />
  <MESSAGE text="element &quot;SOURCES&quot; from namespace &quot;http://biodas.org/documents/das2&quot; not allowed in this context at byte 41, line 3, column 2"
           severity="error" />

If no input doctype is given then it will guess at the doctype based on
analysis of what it got from the remote server.

%curl 'http://cgi.biodas.org:8080/validate_url?url=http://das.biopackages.net/das/genome/human/'
<?xml version="1.0" encoding="utf-8"?>
<DAS_VALIDATION url="http://das.biopackages.net/das/genome/human/"  
doctype="sources">
  <MESSAGE text="Assuming doctype of 'sources' based on Content-Type"  
severity="info" />
</DAS_VALIDATION>

This XML should be easy for anyone to parse.
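
One way an automated client might act on the result is to look at the most
severe message.  The sketch below is only an illustration: the ordering
info < warning < error < fatal and the rule that "error" or "fatal" means
failure are my assumptions, not something the service specifies, and the
function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed ordering, from least to most severe.
SEVERITY_ORDER = ["info", "warning", "error", "fatal"]


def worst_severity(xml_data):
    """Return the most severe known severity in the document, or None."""
    root = ET.fromstring(xml_data)
    levels = [m.get("severity") for m in root.iter("MESSAGE")]
    # Ignore severities outside the four documented values.
    known = [level for level in levels if level in SEVERITY_ORDER]
    if not known:
        return None
    return max(known, key=SEVERITY_ORDER.index)


def is_acceptable(xml_data):
    """Treat a document as passing unless it reports an error or fatal message."""
    return worst_severity(xml_data) not in ("error", "fatal")
```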

					Andrew
					dalke at dalkescientific.com



