[DAS2] tiled queries for performance

Allen Day allenday at ucla.edu
Thu Nov 24 07:10:36 UTC 2005


Hi Andrew.

I'd like to be able to consistently get network-bottlenecked response from
the server.  The largest (250 megabase) SQL range queries typically take ~30
seconds to complete, returning ~500K features.  I'm currently working on
getting the templating system (Template Toolkit aka TT2) we use to flush to
the client periodically, rather than building the entire response first.
This is the current bottleneck; TT2 generation of a 500K record XML document
takes many minutes.  Regardless of how much more optimization work we put
into the server, it's never going to be as fast as serving up pre-queried,
pre-rendered content.
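
To illustrate the flush-as-you-go idea outside of TT2, here is a minimal
Python sketch of emitting the response in chunks as rows arrive. It is an
assumption-laden illustration, not our server code: the function name
stream_features, the flush_every parameter, and the FEATURES/FEATURE
element names are all hypothetical and are not the DAS/2 schema.

```python
from xml.sax.saxutils import quoteattr


def stream_features(rows, flush_every=1000):
    """Yield an XML document chunk by chunk instead of building it whole.

    rows is any iterable of (feature_id, start, end) tuples, for example
    a database cursor.  Element names here are hypothetical placeholders.
    """
    yield "<FEATURES>\n"
    buf = []
    for feature_id, start, end in rows:
        buf.append(
            '<FEATURE id=%s start="%d" end="%d"/>\n'
            % (quoteattr(str(feature_id)), start, end)
        )
        # Hand a chunk to the caller every flush_every records so it can
        # be written to the client while the query is still being read.
        if len(buf) >= flush_every:
            yield "".join(buf)
            buf = []
    if buf:
        yield "".join(buf)
    yield "</FEATURES>\n"
```

A web framework that accepts an iterable response body could write each
yielded chunk to the socket as it is produced, so memory use stays bounded
by flush_every rather than by the size of the result set.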

I borrowed the idea of tiling from the Google maps application (
maps.google.com).  In their implementation the server is dumb, and just
serves up a static HTML/Javascript document (the application), and static
PNG images based on latitude/longitude coordinates (the data).  All of the
application logic for what to display occurs client side.  Classic AJAX.

In the DAS protocol, the application logic is distributed between the
client and server, sometimes to ill effect.
Requiring both (a) the server to respond to arbitrary range queries, and (b)
the client to display arbitrary ranges unnecessarily creates a bifurcation
of the View component of the application.  Brian was hinting at this when he
mentioned the idea of bittorrent blocks earlier in the thread.

We also require code redundancy between client and server to be able to
fully use the type and exacttype filters.  In this case the Model component
has been bifurcated -- the client needs to build a model of the ontology (from
who knows where... presumably processing OBO-Edit files) so the user can
issue queries, and the server needs to also have some representation of the
ontology to generate a response.

Hopefully the ontology DAS extension will help the latter situation outlined
above by getting both client and server to be synchronized on the same data
model.  As far as the tiling optimization goes, it's likely that I'll
implement a preprocessor for the HTTP query so I can break it into tiles --
conceptually very similar to the log10 binning that Lincoln does in the GFF
database.
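
As a rough sketch of what such a preprocessor might do, the Python below
maps an arbitrary range query onto the fixed, aligned tiles that cover it.
The function name tile_ranges, the 1 megabase default tile size, and the
1-based inclusive coordinates are assumptions made for illustration; none
of them is fixed by the DAS/2 spec or by our server.

```python
def tile_ranges(start, end, tile_size=1_000_000):
    """Return the aligned tiles covering the inclusive range [start, end].

    Coordinates are assumed to be 1-based and inclusive.  Tiles always
    begin at k * tile_size + 1, so any two queries that touch the same
    region map to the same tiles, which is what makes them cacheable.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    first = (start - 1) // tile_size  # index of the tile containing start
    last = (end - 1) // tile_size     # index of the tile containing end
    return [
        (i * tile_size + 1, (i + 1) * tile_size)
        for i in range(first, last + 1)
    ]


# A query for 1,500,000-3,200,000 is served from three 1 Mb tiles.
print(tile_ranges(1_500_000, 3_200_000))
# [(1000001, 2000000), (2000001, 3000000), (3000001, 4000000)]
```

Each tile can then be fetched from a cache or pre-rendered file, and the
features outside the originally requested range trimmed before responding.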

-Allen


On 11/23/05, Andrew Dalke <dalke at dalkescientific.com> wrote:
>
> Allen:
> > No other clients have previously requested this range, so the
> > server-side cache faults to the DAS/2 service (slow).
>
> Admittedly I'm curious about this.  Why is this slow?  What does
> slow mean?  I assume "cannot be returned faster than the network
> will take it."
>
> How many annotations are in the database?  Figuring one annotation
> for every ... 100 bases? gives me 30 million.  Shouldn't a range
> search over only 30 million be fast?  Is this being done in the
> database?  Which database and what's the SQL?
>
> If the DB is the bottleneck then pulling it out as a specialized
> search might be worthwhile.
>
> What I'm driving at is this.  The proposal feels like
> a workaround for a given implementation.  To use it requires
> more smarts in the client.  Why not put that logic on the server?
>
>
>                                         Andrew
>                                         dalke at dalkescientific.com
>
>



