[DAS2] Refinements to range attribute and query filters in spec
Thomas Down
td2 at sanger.ac.uk
Fri Feb 10 08:54:16 UTC 2006
On 9 Feb 2006, at 23:18, Helt,Gregg wrote:
>
> In the latest spec, the format for range queries is
> seqid/min:max:strand
> and the format for range attributes in feature elements is
> min:max:strand
>
> In the earlier spec
> (http://biodas.org/documents/das2/das2_get.html#ranges) everything but
> the seqid component of the range query was optional. Are min and max
> still optional, as in these examples from the previous version of the
> spec?
>   Chr1/1000    Chr1 beginning at position 1000 and going to the end.
>   Chr1/:2000   Chr1 from the start to position 2000.
> I personally find these kinds of ranges confusing and not particularly
> useful, and would rather make min and max required for both the range
> attribute and range-based query filters.
I think it's reasonable for a client to want to fetch all features
attached to a given sequence ID. This would certainly be sensible
behaviour for clients which always work on reasonably short sequences
(e.g. protein-specialized clients), but even genome-centric clients
might want to do this when they've had a hint that a particular
feature type is "low density" (e.g. chromosome banding patterns?).
I'm not sure if anyone would want to query a range where only one of
min and max is specified.
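
To make the optional forms concrete, here is a minimal Python sketch of a
parser for the seqid/min:max:strand syntax in which everything after the
seqid may be omitted, as in the earlier spec's examples. The names
parse_range and Range are hypothetical and not part of any DAS/2 library,
and the parser deliberately leaves an unspecified strand as None rather
than choosing a default.

```python
from typing import NamedTuple, Optional


class Range(NamedTuple):
    """Parsed range; None marks a component that was left out."""
    seqid: str
    min_pos: Optional[int]
    max_pos: Optional[int]
    strand: Optional[int]


def parse_range(text: str) -> Range:
    """Parse 'seqid[/[min][:[max][:strand]]]' (hypothetical helper)."""
    seqid, _, rest = text.partition("/")
    if not seqid:
        raise ValueError("seqid is required")
    parts = rest.split(":") if rest else []
    if len(parts) > 3:
        raise ValueError("too many ':' separated fields in %r" % text)
    # Pad so that min, max and strand can be read positionally.
    parts += [""] * (3 - len(parts))
    min_pos = int(parts[0]) if parts[0] else None
    max_pos = int(parts[1]) if parts[1] else None
    strand = int(parts[2]) if parts[2] else None
    if strand not in (None, 1, -1, 0):
        raise ValueError("strand must be 1, -1 or 0")
    return Range(seqid, min_pos, max_pos, strand)
```

With this sketch, "Chr1" alone yields a range with no bounds (all features
on the sequence), while "Chr1/1000" and "Chr1/:2000" yield the half-open
forms whose usefulness is in question.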
> Also, the latest spec states:
>
> A region may be on the forward or reverse strand or on both strands.
> These are respectively denoted 1, -1 and 0. The reverse strand is the
> reverse complement of the forward strand. Unspecified strand means
> forward strand.
>
> So for a features query, are the four overlap filters below
> equivalent?
> Chr1/1000:2000
> Chr1/1000:2000:1
> Chr1/1000:2000:-1
> Chr1/1000:2000:0
> Or does the addition of strand information further filter the returned
> features by strand? But if that's the case, then according to the spec
> having no strand specified means forward. So that would mean
> overlaps="Chr1/1000:2000" would only return forward strand annotations,
> and not any on the reverse strand? To me that's counterintuitive; from
> a filtering perspective I'd rather no strand info mean "both strands".
> My main point though is we need to be explicit about how strand info or
> lack thereof affects features queries with range-based filters.
Hmmm, I'd been interpreting Chr1/1000:2000 as "return features on
both strands", but from the paragraph you quote I guess this is
wrong. I'd be happy to see this changed to "Unspecified strand means
both strands".
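
The difference between the two readings can be shown with a small Python
sketch. It assumes that strand does act as a filter, which is exactly the
point the spec leaves open, and it also assumes that a query strand of 0,
or a feature annotated with strand 0, matches anything. The function name
strand_matches and the unspecified_means parameter are hypothetical.

```python
from typing import Optional


def strand_matches(feature_strand: int,
                   query_strand: Optional[int],
                   unspecified_means: int) -> bool:
    """Return True if a feature passes the strand part of a range filter.

    query_strand is None when the query gave no strand, e.g. Chr1/1000:2000.
    unspecified_means is the default applied in that case: 1 for the
    current spec wording (forward), 0 for the proposed wording (both).
    """
    effective = unspecified_means if query_strand is None else query_strand
    # Assumption: 0 ("both strands") on either side never excludes a feature.
    if effective == 0 or feature_strand == 0:
        return True
    return feature_strand == effective


# A reverse-strand feature queried with Chr1/1000:2000 (no strand given):
current_wording = strand_matches(-1, None, unspecified_means=1)   # excluded
proposed_wording = strand_matches(-1, None, unspecified_means=0)  # included
```

Under the current wording the reverse-strand feature is dropped; under the
proposed wording it is returned, which matches the "both strands" reading.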
Thomas.