[DAS2] xml and compact formats

Helt,Gregg Gregg_Helt at affymetrix.com
Thu May 19 14:47:56 UTC 2005


For the DAS/2 query for features, I think the ability to support
alternative response formats is critical.  Yes, this is mainly because
of performance issues -- time to generate the XML on the server, time to
deliver it over the network, time to parse it and memory footprint on
the client side. 

Standard compression of the XML format is certainly possible, and unless
we explicitly forbid it in the spec I see no reason why this can't be
declared via the standard Accept-Encoding/Content-Encoding HTTP headers.
But consider the case where network bandwidth is not the bottleneck (for
example within an intranet or between sites connected to the Internet2
backbone).  There, compressing with gzip etc. will actually slow things
down: it requires compression on the server and decompression on the
client, and does not speed up the actual client parsing at all.  From
experience I can tell you that the decompression cost on the client can
be significant.
Furthermore, more specific binary formats can be much smaller than
compressed das2feature XML.  For an extreme example, with an optimized
format my client can parse ~1 million SNP annotations per second.
And yes, sometimes I need to do that.
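
To make the size comparison concrete, here is a minimal Python sketch
that encodes the same synthetic SNP positions three ways: as XML, as
gzip-compressed XML, and as a packed binary array.  Everything in it is
an assumption for illustration only: the <FEATURES>/<FEATURE> element
and attribute names are not taken from the DAS/2 spec, and the binary
layout (a count followed by 4-byte start positions) is a made-up
stand-in, not the optimized format mentioned above.

```python
import gzip
import random
import struct


def make_snps(n, seed=0):
    """Return n increasing, synthetic SNP start positions."""
    rng = random.Random(seed)
    pos = 0
    positions = []
    for _ in range(n):
        pos += rng.randint(1, 2000)
        positions.append(pos)
    return positions


def to_xml(positions):
    # Hypothetical feature XML; names are illustrative, not from the spec.
    rows = [
        '<FEATURE id="snp%d" type="SNP" start="%d" end="%d"/>' % (i, p, p + 1)
        for i, p in enumerate(positions)
    ]
    return ("<FEATURES>\n" + "\n".join(rows) + "\n</FEATURES>\n").encode("utf-8")


def to_binary(positions):
    # Hypothetical compact format: 4-byte count, then 4-byte big-endian starts.
    n = len(positions)
    return struct.pack(">I", n) + struct.pack(">%dI" % n, *positions)


def from_binary(data):
    # Decoding is a single unpack call; there is no text to tokenize.
    (n,) = struct.unpack_from(">I", data, 0)
    return list(struct.unpack_from(">%dI" % n, data, 4))


positions = make_snps(10000)
xml_bytes = to_xml(positions)
gz_bytes = gzip.compress(xml_bytes)
bin_bytes = to_binary(positions)

print("xml:", len(xml_bytes), "gzip(xml):", len(gz_bytes),
      "binary:", len(bin_bytes))
```

Running it prints the three byte counts side by side.  Wrapping the
gzip and decode calls with time.perf_counter() gives a rough feel for
the compression and decompression cost on a given machine.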

However, although I think the ability to _support_ alternative feature
formats is critical, I don't think we should be declaring a set of
alternative feature formats in the specification.  We definitely need to
have the ability for a server to generate and the client to accept
annotation formats that the spec itself knows nothing about.  All that
is needed is a way for the server to indicate which formats it provides,
and the client to choose which one it wants.  In the current spec the
server indicates which feature formats it supports for a particular
versioned source as <FORMAT> subelements under a "Feature types"
<NAMESPACE> element in the versioned source response.  I would like to
change this so that the formats supported are specific to the type of
feature.  This can be done by adding <FORMAT> subelements to the <TYPE>
elements in the types response.  This would allow different alternative
formats for different feature types, which I think is necessary -- a
format optimized for SNPs is not going to be appropriate for serving up
BLAT results, and vice versa.
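
A rough sketch of how a client might use per-type <FORMAT> elements,
assuming a types response shaped like the one embedded below.  The
document layout, the "id" attributes, and the format names
"snp-binary" and "psl" are hypothetical and only stand in for whatever
the spec and servers end up using; "das2feature" is used as the name
of the default XML format.

```python
import xml.etree.ElementTree as ET

# Hypothetical types response: each <TYPE> lists the formats it offers.
TYPES_XML = """
<TYPES>
  <TYPE id="SNP">
    <FORMAT id="das2feature"/>
    <FORMAT id="snp-binary"/>
  </TYPE>
  <TYPE id="BLAT">
    <FORMAT id="das2feature"/>
    <FORMAT id="psl"/>
  </TYPE>
</TYPES>
"""


def formats_by_type(xml_text):
    """Map each feature type id to the list of format ids it offers."""
    root = ET.fromstring(xml_text)
    return {
        t.get("id"): [f.get("id") for f in t.findall("FORMAT")]
        for t in root.findall("TYPE")
    }


def choose_format(offered, preferred, fallback="das2feature"):
    """Pick the first client-preferred format the server offers."""
    for name in preferred:
        if name in offered:
            return name
    return fallback


offered = formats_by_type(TYPES_XML)
print(choose_format(offered["SNP"], ["snp-binary"]))
print(choose_format(offered["BLAT"], ["snp-binary"]))
```

The client never needs the spec to define "snp-binary"; it only needs
to recognize the names of formats it knows how to parse, and it falls
back to the standard XML otherwise.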

Now as for the "compact" formats for other (non-feature) responses
from the server, I don't think they're necessary.  As far as performance
goes, these responses are unlikely to be anywhere near as large as the
potential responses from a features request.

So to summarize, I think the spec definitely needs to support a
mechanism for alternative feature formats, but that it should be
agnostic as to what those formats are.  And I don't think we really need
alternative formats for non-feature responses.

	gregg

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke
> Sent: Tuesday, May 17, 2005 11:14 AM
> To: DAS/2
> Subject: [DAS2] xml and compact formats
> 
> Can someone remind me why we support both XML and "compact"
> formats?  For what I'm doing it complicates things.  Was it
> because of space/verbosity concerns?  If so, should we
> encourage people to use compression on the connections?
> 
> 					Andrew
> 					dalke at dalkescientific.com
> 
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2



