[DAS2] tiled queries for performance

Helt,Gregg Gregg_Helt at affymetrix.com
Mon Nov 28 10:58:00 UTC 2005


The attachment is a PowerPoint slide showing one of the feature query
optimizations that the IGB client currently uses, which combines
"overlaps" and "inside" filters.  When used consistently this guarantees
that the same feature is not returned in multiple feature queries.
However, in general I agree that it is the client's responsibility to
reasonably handle cases where the same feature is returned multiple
times.

	gregg
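The slide is not reproduced in this archive. The Python sketch below is only a hedged illustration, not necessarily the scheme on the slide. It shows one way an "overlaps" filter and an "inside" filter can be combined so that tiled queries never return the same feature twice. It assumes 1-based inclusive coordinates, assumes "overlaps" means sharing at least one base with a range and "inside" means lying entirely within it, and uses a hypothetical Feature record and tile_query helper in place of real DAS/2 requests.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """Hypothetical feature record; coordinates are 1-based and inclusive."""
    id: str
    start: int
    end: int


def overlaps(f, lo, hi):
    """Assumed "overlaps" semantics: the feature shares at least one base with lo:hi."""
    return f.start <= hi and f.end >= lo


def inside(f, lo, hi):
    """Assumed "inside" semantics: the feature lies entirely within lo:hi."""
    return f.start >= lo and f.end <= hi


def tile_query(features, tile_lo, tile_hi, seq_end):
    """Simulate one tile request that applies both filters at once.

    overlaps(tile) AND inside(tile_lo:seq_end) keeps exactly the features
    whose start falls in the tile, so each feature belongs to one tile only.
    """
    return [f for f in features
            if overlaps(f, tile_lo, tile_hi) and inside(f, tile_lo, seq_end)]


features = [
    Feature("a", 1100, 1300),  # inside a single tile
    Feature("b", 1900, 2400),  # crosses the 2000/2001 boundary
    Feature("c", 1500, 3500),  # crosses two boundaries
    Feature("d", 2500, 2600),
]
tiles = [(1, 1000), (1001, 2000), (2001, 3000), (3001, 4000)]
results = {t: [f.id for f in tile_query(features, t[0], t[1], 4000)] for t in tiles}
print(results)
# {(1, 1000): [], (1001, 2000): ['a', 'b', 'c'], (2001, 3000): ['d'], (3001, 4000): []}
```

Under this assumed scheme a client viewing a region still has to account for features that begin before the first tile it fetches, since no tile it requested will return them.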

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Allen Day
> Sent: Wednesday, November 23, 2005 3:50 PM
> To: das2 at portal.open-bio.org
> Subject: Re: [DAS2] tiled queries for performance
> 
> More thoughts on this.  The client can eliminate the redundancy in the
> records returned by issuing the tiling queries as previously described
> (query1), then issuing queries for records that are not contained within
> tiles but overlap the boundaries of one or more tiles (query2).
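A minimal Python sketch of the query1/query2 split, assuming fixed 1-based tiles of tile_size bases (tile 0 covers 1..tile_size) and a hypothetical Feature record. A feature counts as contained when its first and last bases fall in the same tile; otherwise it overlaps at least one tile boundary and belongs to query2. The coordinates are illustrative, loosely modeled on Allen's diagram, and do not come from any real data source.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """Hypothetical feature record; coordinates are 1-based and inclusive."""
    id: str
    start: int
    end: int


def tile_index(pos, tile_size):
    """0-based index of the tile holding pos (tile k covers k*size+1 .. (k+1)*size)."""
    return (pos - 1) // tile_size


def split_by_tiles(features, tile_size):
    """Partition features into per-tile contained lists (query1) and boundary crossers (query2)."""
    contained, crossing = {}, []
    for f in features:
        first = tile_index(f.start, tile_size)
        last = tile_index(f.end, tile_size)
        if first == last:
            contained.setdefault(first, []).append(f)
        else:
            crossing.append(f)
    return contained, crossing


features = [
    Feature("short", 1050, 1150),   # within 1001:2000
    Feature("gapped", 2050, 2900),  # within 2001:3000
    Feature("mid", 1700, 2300),     # crosses 2000/2001
    Feature("long", 1010, 3400),    # crosses 2000/2001 and 3000/3001
]
contained, crossing = split_by_tiles(features, 1000)
print({k: [f.id for f in v] for k, v in contained.items()})  # {1: ['short'], 2: ['gapped']}
print([f.id for f in crossing])                               # ['mid', 'long']
```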
> 
> However, by issuing all the overlaps queries at once (query2), we've just
> deferred the performance hit by one step, because we can't reasonably
> expect the server to have cached every combination of tile-overlaps
> queries.  I think that, to get this tiling optimization to work, the
> burden needs to be on the client to identify and remove duplicate
> responses from multiple edge-overlaps queries (query3).
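The query3 idea can be sketched in Python as one small overlaps query per tile boundary, followed by client-side removal of duplicates by feature ID. This is a simulation under stated assumptions: the hypothetical edge_query treats boundary b as the junction between bases b and b+1 and returns only the features spanning it. A real overlaps query on a window around the boundary might also return features that merely touch it, and the client would filter those out as well.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """Hypothetical feature record; coordinates are 1-based and inclusive."""
    id: str
    start: int
    end: int


def edge_query(features, b):
    """Simulated edge-overlaps query: features spanning the junction between b and b+1."""
    return [f for f in features if f.start <= b < f.end]


def merge_unique(responses):
    """Concatenate responses in order, keeping only the first copy of each feature ID."""
    seen, merged = set(), []
    for response in responses:
        for f in response:
            if f.id not in seen:
                seen.add(f.id)
                merged.append(f)
    return merged


features = [
    Feature("short", 1050, 1150),
    Feature("mid", 1700, 2300),    # crosses one boundary
    Feature("long", 1010, 3400),   # crosses two boundaries
]
query3a = edge_query(features, 2000)
query3b = edge_query(features, 3000)
print([f.id for f in query3a], [f.id for f in query3b])  # ['mid', 'long'] ['long']
print([f.id for f in merge_unique([query3a, query3b])])  # ['mid', 'long']
```

Each edge query depends only on one boundary, not on which combination of tiles a client is viewing, so the server can cache each one individually; the price is that "long" is transferred twice and must be discarded once on the client.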
> 
> 1000bp        2000bp        3000bp
> |             |             |
> | ===         | =====^====  |
> |         ====#=====        |
> | ============#=============#=====
> |             |             |
> 
>  <----------->                     query1a
>                <----------->       query1b
>              <o>           <o>     query2
>              <o>                   query3a
>                            <o>     query3b
> 
> Key:
> 
>   |  : tile boundary
>   =  : feature
>   ^  : gap between child features
>   #  : portion of feature overlapping tile boundary.
>  <o> : client overlaps query
>  <.> : client contains query
> 
> -Allen
> 
> 
> 
> On Mon, 21 Nov 2005, Allen Day wrote:
> 
> > Hi,
> >
> > I had an idea of how clients may be able to get better response from
> > servers by using a tiled query technique.  Here's the basic idea:
> >
> > ClientA wants features in chr1/1010:2020 and issues a request for that
> > range.  No other clients have previously requested this range, so the
> > server-side cache misses and the request falls through to the DAS/2
> > service (slow).
> >
> > ClientB wants features in chr1/1020:2030 and issues a request for that
> > range.  Although the records it gets back overlap heavily with those
> > returned for ClientA's query, the URIs are different, so the server-side
> > cache misses again.
> >
> > If ClientA and ClientB were to each issue two separate "tiled" requests:
> >
> >  1. chr1/1001:2000
> >  2. chr1/2001:3000
> >
> > ClientB could take advantage of the fact that ClientA had already
> > requested the same tiles, which the server would now have cached.
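A minimal Python sketch of the tile arithmetic and of why it helps the cache. It assumes a tile size of 1000 (matching the example ranges), 1-based inclusive ranges written as seq/start:end, and a toy dictionary standing in for the server-side cache; TileCache is hypothetical and not part of any DAS/2 server.

```python
def covering_tiles(seq, start, end, tile_size):
    """Tile-aligned 1-based URIs (e.g. chr1/1001:2000) that together cover start:end."""
    first = (start - 1) // tile_size
    last = (end - 1) // tile_size
    return [f"{seq}/{k * tile_size + 1}:{(k + 1) * tile_size}"
            for k in range(first, last + 1)]


class TileCache:
    """Toy cache keyed by request URI; a miss stands for the slow DAS/2 call."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, uri):
        if uri in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[uri] = f"features for {uri}"  # placeholder payload
        return self.store[uri]


requests = [("chr1", 1010, 2020), ("chr1", 1020, 2030)]  # ClientA, then ClientB

exact = TileCache()
for seq, start, end in requests:
    exact.fetch(f"{seq}/{start}:{end}")
print(exact.hits, exact.misses)  # 0 2

tiled = TileCache()
for seq, start, end in requests:
    for uri in covering_tiles(seq, start, end, 1000):
        tiled.fetch(uri)
print(tiled.hits, tiled.misses)  # 2 2
print(covering_tiles("chr1", 1020, 2030, 1000))  # ['chr1/1001:2000', 'chr1/2001:3000']
```

Both clients now send identical URIs, so ClientB's two requests are served from the cache; the cost is that each client downloads bases 1001:3000 rather than only the range it wanted.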
> >
> > For this to work, the clients would need to use the same tile size.
> > The optimal tile size is likely to vary from datasource to datasource,
> > depending on the length and density distributions of the features
> > contained in the datasource.  The "sources" or "versioned sources"
> > payload could suggest a tile size to prospective clients.  Servers
> > could also pre-cache all tiles by requesting each tile after an update
> > of the datasource (or of the DAS/2 service code).
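The pre-caching step might look like the Python sketch below. SUGGESTED_TILE_SIZE stands in for a tile size that a "sources" payload could advertise; the name and value are hypothetical, since no such field is defined in this thread. fetch is any callable that issues one request. The server has to generate exactly the URIs clients will send, so this sketch assumes the same unclipped 1-based tiles a client would compute; if clients clipped the last tile to the sequence end, the server would have to clip as well.

```python
SUGGESTED_TILE_SIZE = 1000  # hypothetical value a "sources" payload might advertise


def all_tile_uris(seq, seq_length, tile_size):
    """Every tile URI a client could request on seq (1-based, last tile unclipped)."""
    n_tiles = -(-seq_length // tile_size)  # ceiling division
    return [f"{seq}/{k * tile_size + 1}:{(k + 1) * tile_size}" for k in range(n_tiles)]


def precache(fetch, seq_lengths, tile_size):
    """Warm the cache after a datasource update by requesting each tile once."""
    count = 0
    for seq, length in seq_lengths.items():
        for uri in all_tile_uris(seq, length, tile_size):
            fetch(uri)
            count += 1
    return count


requested = []
n = precache(requested.append, {"chr1": 3500, "chr2": 2000}, SUGGESTED_TILE_SIZE)
print(n)              # 6
print(requested[:4])  # ['chr1/1:1000', 'chr1/1001:2000', 'chr1/2001:3000', 'chr1/3001:4000']
```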
> >
> > The tradeoff for these performance gains is that clients may now need
> > to filter the returned records so that only the records requested by
> > the client's client are passed along.
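On the client side, that filtering could be as simple as the Python sketch below: collapse duplicate records returned by more than one tile (matched by feature ID), then keep only the features that overlap the range the client's caller originally asked for. The Feature record, the filter_to_request name, and the sample data are hypothetical.

```python
from itertools import chain
from typing import NamedTuple


class Feature(NamedTuple):
    """Hypothetical feature record; coordinates are 1-based and inclusive."""
    id: str
    start: int
    end: int


def filter_to_request(tile_responses, start, end):
    """Merge tile responses and keep only features overlapping start:end.

    Duplicates (same ID from two tiles) collapse to one entry; the result
    is sorted by start position.
    """
    unique = {f.id: f for f in chain.from_iterable(tile_responses)}
    wanted = (f for f in unique.values() if f.start <= end and f.end >= start)
    return sorted(wanted, key=lambda f: (f.start, f.id))


# Responses for tiles chr1/1001:2000 and chr1/2001:3000; the caller wanted 1010:2020.
tile_1 = [Feature("x", 1001, 1005), Feature("y", 1500, 2100)]
tile_2 = [Feature("y", 1500, 2100), Feature("w", 2015, 2040), Feature("z", 2500, 2600)]
print([f.id for f in filter_to_request([tile_1, tile_2], 1010, 2020)])  # ['y', 'w']
```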
> >
> > -Allen
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/das2
> >
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DAS2_Query_Optimization.ppt
Type: application/vnd.ms-powerpoint
Size: 287744 bytes
Desc: DAS2_Query_Optimization.ppt
URL: <http://lists.open-bio.org/pipermail/das2/attachments/20051128/03f7f254/attachment-0001.ppt>

