[DAS2] tiled queries for performance

Wed Nov 23 23:50:24 UTC 2005

More thoughts on this.  The client can eliminate the redundancy in the
records returned by issuing the tiling queries as previously described,
but as "contains" queries that return only records lying entirely within
each tile (query1).  It then issues queries for records that are not
contained within any tile but overlap the boundaries of one or more tiles
(query2).

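However, if all of the boundary-overlaps queries are issued together as a
single request (query2), we've just deferred the performance hit one step,
because we can't reasonably expect the server to have cached every
combination of tile-boundary overlaps queries.  I think that to get this
tiling optimization to work, the client should instead issue one overlaps
query per tile boundary (query3a, query3b), so that each request is
cacheable on its own.  The burden is then on the client to identify and
remove records returned by more than one of these edge-overlaps queries,
such as a feature that crosses several tile boundaries.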
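A minimal sketch of how a client might plan these requests, assuming 1-based inclusive coordinates and a hypothetical tile size of 1000bp (as in the diagram below). The function names and the (kind, start, end) tuples are illustrative only, not DAS/2 request syntax. The sketch also queries the left edge of the first tile (1000bp here), which the diagram does not show, so that features hanging off that end are not missed.

```python
from typing import List, Tuple

# Hypothetical tile size; the earlier message suggests a server could
# advertise one in its "sources" or "versioned sources" payload.
TILE_SIZE = 1000

Query = Tuple[str, int, int]  # (kind, start, end), 1-based inclusive


def tiles_for_range(start: int, end: int, tile_size: int = TILE_SIZE) -> List[Tuple[int, int]]:
    """Tiles such as 1001:2000 and 2001:3000 that together cover start:end."""
    first = (start - 1) // tile_size
    last = (end - 1) // tile_size
    return [(i * tile_size + 1, (i + 1) * tile_size) for i in range(first, last + 1)]


def plan_queries(start: int, end: int, tile_size: int = TILE_SIZE) -> List[Query]:
    """Per-tile 'contains' queries (query1) plus one 'overlaps' query per tile edge.

    Each edge query is a separate, independently cacheable request covering
    positions b and b + 1, rather than one combined request (query2).
    """
    tiles = tiles_for_range(start, end, tile_size)
    queries: List[Query] = [("contains", s, e) for s, e in tiles]
    # Tile edges: just left of the first tile, then the right end of each tile.
    edges = [tiles[0][0] - 1] + [e for _, e in tiles]
    queries += [("overlaps", b, b + 1) for b in edges]
    return queries


if __name__ == "__main__":
    # ClientA's range from the earlier message.
    for query in plan_queries(1010, 2020):
        print(query)
```

ClientA's chr1/1010:2020 and ClientB's chr1/1020:2030 map to the same two tiles, so both clients would issue identical (and therefore cacheable) requests.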

1000bp        2000bp        3000bp
|             |             |
| ===         | =====^====  |
|         ====#=====        |
| ============#=============#=====
|             |             |

 <----------->                     query1a
               <----------->       query1b
             <o>           <o>     query2
             <o>                   query3a
                           <o>     query3b

Key:

  |  : tile boundary
  =  : feature
  ^  : gap between child features
  #  : portion of feature overlapping tile boundary.
 <o> : client overlaps query
 <.> : client contains query
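A minimal sketch of the client-side de-duplication, assuming every feature record carries a unique identifier (presumably its URI in DAS/2) and 1-based inclusive coordinates. Feature and merge_tiled_results are hypothetical names, and the coordinates are made up to roughly follow the diagram: the long feature crossing both 2000bp and 3000bp comes back from query3a and from query3b but must be kept only once. The final range filter is the client-side filtering step mentioned in the quoted message below.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Feature:
    uri: str     # unique identifier (assumed to be the feature URI)
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def merge_tiled_results(responses: Iterable[List[Feature]],
                        want_start: int, want_end: int) -> List[Feature]:
    """Merge per-tile and per-edge responses into one de-duplicated list.

    Duplicates arise when a feature crosses more than one tile boundary
    (several edge-overlaps queries return it) or when an edge window also
    catches a feature that a per-tile query already returned.
    """
    seen: Dict[str, Feature] = {}
    for response in responses:
        for f in response:
            seen.setdefault(f.uri, f)  # keep the first copy of each feature
    # Tiles are wider than the requested range, so filter back down to the
    # features that overlap what the caller actually asked for.
    hits = [f for f in seen.values() if f.start <= want_end and f.end >= want_start]
    return sorted(hits, key=lambda f: (f.start, f.uri))


if __name__ == "__main__":
    short = Feature("short", 1150, 1290)
    gapped = Feature("gapped", 2150, 2780)
    crossing = Feature("crossing", 1710, 2360)
    long_feature = Feature("long", 1140, 3290)
    responses = [
        [short],                   # query1a: contained in 1001:2000
        [gapped],                  # query1b: contained in 2001:3000
        [crossing, long_feature],  # query3a: overlaps 2000:2001
        [long_feature],            # query3b: overlaps 3000:3001
    ]
    for f in merge_tiled_results(responses, 1010, 2020):
        print(f.uri, f.start, f.end)
```

For the request chr1/1010:2020 this keeps long, short and crossing once each and drops gapped, which lies entirely beyond 2020.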

-Allen

On Mon, 21 Nov 2005, Allen Day wrote:

> Hi,
> 
> I had an idea of how clients may be able to get faster responses from
> servers by using a tiled query technique.  Here's the basic idea:
> 
> ClientA wants features in chr1/1010:2020 and issues a request for that
> range.  No other client has previously requested this range, so the
> request misses the server-side cache and falls through to the DAS/2
> service (slow).
> 
> ClientB wants features in chr1/1020:2030 and issues a request for that
> range.  Although the records it returns overlap heavily with those from
> ClientA's query, the URIs are different, so the server-side cache misses
> again.
> 
> If ClientA and ClientB were to each issue two separate "tiled" requests:
> 
>  1. chr1/1001:2000
>  2. chr1/2001:3000
> 
> ClientB could then take advantage of the fact that ClientA had already
> requested the same tiles, so both of ClientB's requests would be served
> from the cache.
> 
> For this to work, the clients would need to use the same tile size.
> The optimal tile size is likely to vary from datasource to datasource,
> depending on the length and density distributions of the features it
> contains.  The "sources" or "versioned sources" payload could suggest a
> tile size to prospective clients.  Servers could also pre-cache all
> tiles by requesting each one after an update to the datasource (or to
> the DAS/2 service code).
> 
> The tradeoff for the performance gains is that clients may now need to
> filter the returned records so that they pass on only those within the
> range their own caller (the client's client) actually requested.
> 
> -Allen
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2
>
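The caching argument in the quoted message can be illustrated with a toy model. Assumptions: the server-side cache is keyed by the exact request (standing in for the URI), the tile size is a hypothetical 1000bp, and CountingCache, tile_requests and run are illustrative names, not part of any DAS/2 library.

```python
from typing import Callable, Dict, List, Tuple

Range = Tuple[str, int, int]  # (sequence, start, end), 1-based inclusive


class CountingCache:
    """Toy server-side cache keyed by the exact request, like a URI-keyed cache."""

    def __init__(self) -> None:
        self.store: Dict[Range, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Range) -> str:
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            # Stand-in for the slow fall-through to the DAS/2 service.
            self.store[key] = f"features for {key[0]}/{key[1]}:{key[2]}"
        return self.store[key]


def tile_requests(seq: str, start: int, end: int, tile_size: int = 1000) -> List[Range]:
    """Replace one arbitrary range with the fixed tiles that cover it."""
    first = (start - 1) // tile_size
    last = (end - 1) // tile_size
    return [(seq, i * tile_size + 1, (i + 1) * tile_size) for i in range(first, last + 1)]


def run(requests_for: Callable[[str, int, int], List[Range]]) -> CountingCache:
    """Replay ClientA's and then ClientB's range against a fresh cache."""
    cache = CountingCache()
    for seq, start, end in [("chr1", 1010, 2020), ("chr1", 1020, 2030)]:
        for key in requests_for(seq, start, end):
            cache.get(key)
    return cache


if __name__ == "__main__":
    raw = run(lambda seq, s, e: [(seq, s, e)])
    tiled = run(tile_requests)
    print("raw:", raw.hits, "hits,", raw.misses, "misses")
    print("tiled:", tiled.hits, "hits,", tiled.misses, "misses")
```

With raw ranges both requests miss; with tiling, ClientA's two tile requests miss but ClientB's identical two are both hits.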