[DAS2] Refinements to range attribute and query filters in spec

Thomas Down td2 at sanger.ac.uk
Fri Feb 10 08:54:16 UTC 2006


On 9 Feb 2006, at 23:18, Helt,Gregg wrote:

>
> In the latest spec, the format for range queries is
>       seqid/min:max:strand
> and the format for range attributes in feature elements is
>       min:max:strand
>
> In the earlier spec
> (http://biodas.org/documents/das2/das2_get.html#ranges) everything but
> the seqid component of the range query was optional.  Are min and max
> still optional, as in these examples from the previous version of the
> spec?
>     Chr1/1000     Chr1 beginning at position 1000 and going to the end.
>     Chr1/:2000    Chr1 from the start to position 2000.
> I personally find these kinds of ranges confusing and not particularly
> useful, and would rather make min and max required for both the range
> attribute and range-based query filters.

I think it's reasonable for a client to want to fetch all features  
attached to a given sequence ID.  This would certainly be sensible  
behaviour for clients which always work on reasonably short sequences  
(e.g. protein-specialized clients), but even genome-centric clients  
might want to do this when they've had a hint that a particular  
feature type is "low density" (e.g. chromosome banding patterns?).

I'm not sure if anyone would want to query a range where only one of  
min and max is specified.
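
To make the optional-component cases concrete, here is a minimal Python
sketch of a parser for the seqid/min:max:strand form.  It is only an
illustration, not anything from the spec: the names RangeQuery and
parse_range are hypothetical, and a missing component is simply recorded
as None so that the caller can decide what "unspecified" should mean.

```python
from typing import NamedTuple, Optional


class RangeQuery(NamedTuple):
    """Parsed form of a seqid/min:max:strand range (hypothetical type)."""
    seqid: str
    lo: Optional[int]      # None = min not given (from the start)
    hi: Optional[int]      # None = max not given (to the end)
    strand: Optional[int]  # None = strand not given in the query


def parse_range(text: str) -> RangeQuery:
    # Split off the sequence ID; everything after the first "/" is the range.
    seqid, _, rest = text.partition("/")
    if not seqid:
        raise ValueError("seqid is required")

    parts = rest.split(":") if rest else []
    if len(parts) > 3:
        raise ValueError("expected at most min:max:strand")
    parts += [""] * (3 - len(parts))  # pad the omitted trailing components
    lo, hi, strand = parts

    if strand and strand not in ("1", "-1", "0"):
        raise ValueError("strand must be 1, -1 or 0")

    return RangeQuery(
        seqid,
        int(lo) if lo else None,
        int(hi) if hi else None,
        int(strand) if strand else None,
    )
```

With this sketch, "Chr1" alone parses to a query with no min, max or
strand, which covers the fetch-everything-on-a-sequence case, and
"Chr1/:2000" parses with only max set.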

> Also, the latest spec states:
>
> A region may be on the forward or reverse strand or on both strands.
> These are respectively denoted 1, -1 and 0.  The reverse strand is the
> reverse complement of the forward strand.  Unspecified strand means
> forward strand.
>
> So for a features query, are the four overlap filters below equivalent?
> Chr1/1000:2000
> Chr1/1000:2000:1
> Chr1/1000:2000:-1
> Chr1/1000:2000:0
> Or does the addition of strand information further filter the returned
> features by strand?  But if that's the case, then according to the spec
> having no strand specified means forward.  So that would mean
> overlaps="Chr1/1000:2000" would only return forward strand annotations,
> and not any on the reverse strand?  To me that's counterintuitive, from
> a filtering perspective I'd rather no strand info mean "both strands".
> My main point though is we need to be explicit about how strand info or
> lack thereof affects features queries with range-based filters.

Hmmm, I'd been interpreting Chr1/1000:2000 as "return features on  
both strands", but from the paragraph you quote I guess this is  
wrong.  I'd be happy to see this changed to "Unspecified strand means  
both strands".
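
As a rough illustration of the "unspecified means both strands" reading,
an overlap filter could look something like the Python sketch below.
The function names are hypothetical, and two things are assumptions of
the sketch rather than anything the spec says: min and max are treated
as inclusive bounds, and a feature annotated on both strands (0) is
taken to match any strand filter.

```python
from typing import Optional


def strand_matches(feature_strand: int, query_strand: Optional[int]) -> bool:
    # No strand in the query (None) or an explicit 0 means "both strands",
    # so no strand filtering is applied.
    if query_strand is None or query_strand == 0:
        return True
    # Assumption: a feature on both strands (0) matches any strand filter.
    return feature_strand == 0 or feature_strand == query_strand


def overlaps(fmin: int, fmax: int, fstrand: int,
             qmin: int, qmax: int, qstrand: Optional[int] = None) -> bool:
    # Assumption: both ranges are inclusive of min and max.
    return fmin <= qmax and fmax >= qmin and strand_matches(fstrand, qstrand)
```

Under this reading, Chr1/1000:2000 and Chr1/1000:2000:0 return the same
features, while :1 and :-1 each narrow the result to one strand.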

              Thomas.