[DAS2] tiled queries for performance

Asim Siddiqui asims at bcgsc.ca
Fri Nov 25 19:15:17 UTC 2005


Hi,

I'm a newbie to this list, so apologies if I've missed something
critical.

I think this is a great idea.

I don't see this as a big change to the DAS/2 spec or requiring much in
the way of additional smarts on the client side.
The change is simply that instead of the client getting exactly what it
asks for, it may get more.
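A minimal sketch of that extra client-side step. It assumes (for illustration only, not from the DAS/2 spec) half-open [start, end) coordinates and a hypothetical `Feature` record. The client simply discards tile features that fall entirely outside the range it actually asked for:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    # Hypothetical minimal feature record: a name plus half-open [start, end) coordinates.
    name: str
    start: int
    end: int


def clip_to_request(features, req_start, req_end):
    """Keep only features that overlap the requested range [req_start, req_end).

    A tiled server may return every feature in the tiles covering the request,
    so the client drops the ones lying wholly outside what it asked for.
    """
    return [f for f in features if f.start < req_end and f.end > req_start]
```

Since any overlap test works, the client needs no knowledge of how the server chose its tiles.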

My 2 cents,

Asim


-----Original Message-----
From: das2-bounces at portal.open-bio.org
[mailto:das2-bounces at portal.open-bio.org] On Behalf Of Allen Day
Sent: Wednesday, November 23, 2005 11:11 PM
To: Andrew Dalke; DAS/2
Subject: Re: [DAS2] tiled queries for performance

Hi Andrew.

I'd like to be able to consistently get network-bottlenecked responses
from the server.  The largest (250 megabase) SQL range queries typically
take ~30 seconds to complete, returning ~500K features.  I'm currently
working on getting the templating system (Template Toolkit aka TT2) we
use to flush to the client periodically, rather than building the entire
response first.
This is the current bottleneck; TT2 generation of a 500K-record XML
document takes many minutes.  Regardless of how much more optimization
work we put into the server, it's never going to be as fast as serving
up pre-queried, pre-rendered content.
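A minimal sketch of the periodic-flush idea, written in Python rather than TT2/Perl. The helpers `render_features` and `stream_to` are hypothetical, and the element and attribute names are placeholders, not the DAS/2 XML schema. The response is emitted every `chunk_size` features, so the client starts receiving data before the whole document exists:

```python
from xml.sax.saxutils import quoteattr


def render_features(features, chunk_size=1000):
    """Yield the XML response in chunks instead of building it all in memory.

    `features` is any iterable of (feature_id, start, end) tuples.
    """
    buf = ['<FEATURES>\n']
    for i, (fid, start, end) in enumerate(features, 1):
        buf.append(f'  <FEATURE id={quoteattr(fid)} start="{start}" end="{end}"/>\n')
        if i % chunk_size == 0:
            # Hand off what we have so far and start a fresh buffer.
            yield ''.join(buf)
            buf = []
    buf.append('</FEATURES>\n')
    yield ''.join(buf)


def stream_to(out, features):
    """Write each chunk to a file-like `out` and flush it immediately."""
    for chunk in render_features(features):
        out.write(chunk)
        out.flush()  # push this chunk to the client now, not after the whole document
```

Memory use is bounded by one chunk rather than the full 500K-feature document, although total rendering time is unchanged. That is why pre-rendered content still wins.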

I borrowed the idea of tiling from the Google Maps application
(maps.google.com).  In their implementation the server is dumb, and just
serves up a static HTML/JavaScript document (the application) and
static PNG images keyed by latitude/longitude coordinates (the data).
All of the application logic for what to display occurs client side.
Classic AJAX.

In the DAS protocol, the application logic is distributed between the
client and server, sometimes to ill effect.  Requiring (a) the server to
respond to arbitrary range queries and (b) the client to display
arbitrary ranges creates an unnecessary bifurcation of the View
component of the application.  Brian was hinting at this when he
mentioned the idea of bittorrent blocks earlier in the thread.

We also need redundant code on the client and the server to make full
use of the type and exacttype filters.  In this case the Model component
has been bifurcated: the client needs to build a model of the ontology
(from who knows where... presumably by processing OBO-Edit files) so the
user can issue queries, and the server also needs some representation of
the ontology to generate a response.
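A minimal sketch of why both sides need the ontology. It assumes (not confirmed here) that a `type` filter matches a term and all of its subtypes while an `exacttype` filter matches only the term itself. The ontology below is a toy `children` mapping, not real Sequence Ontology content:

```python
def descendants(term, children):
    """Return `term` plus every term reachable below it in the ontology.

    `children` maps a term to the list of its direct subtypes (is_a children).
    """
    seen = {term}
    stack = [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def filter_features(features, term, children, exact=False):
    """Select (name, type) pairs by ontology type.

    exact=True mimics an exacttype-style filter (only `term` itself);
    exact=False mimics a type-style filter (term plus all its subtypes).
    """
    allowed = {term} if exact else descendants(term, children)
    return [f for f in features if f[1] in allowed]
```

Whoever evaluates the non-exact filter needs the `children` mapping. If client and server hold different versions of it, they disagree about what a query means.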

Hopefully the ontology DAS extension will help with the latter situation
by synchronizing client and server on the same data model.  As far as
the tiling optimization goes, I'll likely implement a preprocessor for
the HTTP query so I can break it into tiles, conceptually very similar
to the log10 binning that Lincoln does in the GFF database.
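A minimal sketch of such a preprocessor. The helper names, the half-open coordinates, the 1 kb minimum and the power-of-ten tile sizes are all assumptions. The power-of-ten sizing only loosely echoes log10 binning and is not the actual GFF database binning scheme. An arbitrary range is mapped onto aligned tiles, so overlapping requests from different clients resolve to the same cacheable tiles:

```python
def tile_size_for(length, min_exp=3):
    """Pick a power-of-ten tile size with tile_size <= length < 10 * tile_size.

    A query then spans roughly 1 to 11 tiles; tiles never go below 10**min_exp bases.
    """
    exp = max(min_exp, len(str(max(length, 1))) - 1)
    return 10 ** exp


def tiles_for_range(start, end, tile_size):
    """Split the half-open range [start, end) into aligned tiles.

    Tile k covers [k * tile_size, (k + 1) * tile_size); alignment is what lets
    different requests share the same pre-rendered tiles.
    """
    if start >= end:
        return []
    first = start // tile_size
    last = (end - 1) // tile_size
    return [(k * tile_size, (k + 1) * tile_size) for k in range(first, last + 1)]
```

For example, a request for [1500, 3200) gets `tile_size_for(1700) == 1000` and maps onto [1000, 2000), [2000, 3000) and [3000, 4000). The server can pre-render those once, and the client trims off the surplus.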

-Allen


On 11/23/05, Andrew Dalke <dalke at dalkescientific.com> wrote:
>
> Allen:
> > No other clients have previously requested this range, so the 
> > server-side cache faults to the DAS/2 service (slow).
>
> Admittedly I'm curious about this.  Why is this slow?  What does slow 
> mean?  I assume "cannot be returned faster than the network will take 
> it."
>
> How many annotations are in the database?  Figuring one annotation for
> every ... 100 bases? gives me 30 million.  Shouldn't a range search
> over only 30 million be fast?  Is this being done in the database?
> Which database and what's the SQL?
>
> If the DB is the bottleneck then pulling it out as a specialized 
> search might be worthwhile.
>
> What I'm driving at is this.  The proposal feels like a
> workaround for a given implementation.  Using it requires more smarts
> in the client.  Why not put that logic on the server?
>
>
>                                         Andrew
>                                         dalke at dalkescientific.com
>
>

_______________________________________________
DAS2 mailing list
DAS2 at portal.open-bio.org
http://portal.open-bio.org/mailman/listinfo/das2



