[DAS] Adjacent feature extension

Mon Mar 7 12:43:32 UTC 2011

On 7 Mar 2011, at 11:51, Jonathan Warren wrote:

> On 7 Mar 2011, at 11:19, Andy Jenkinson wrote:
> 
>> On 7 Mar 2011, at 10:57, Jonathan Warren wrote:
>> 
>>> 
>>> My vote would ideally be to change feature_by_id to return one feature and to have adjacent_feature return one feature. In my opinion this would mean these capabilities on servers do "exactly what they say on the tin", would be easier for data providers to implement, and would thus be more likely to be implemented?
>>> If the feature_id capability as it stands is needed, it could be renamed to something closer to what it actually means, like feature_id_region, but I would bet no one would bother to change it or use it?
>>> 
>>> However, the reality is that we are too late to change the old feature_by_id, but I don't think we need to make the same mistake twice by repeating it for adjacent_features?
>> 
>> I disagree. I think the problems with feature-by-id are that a) the name of the capability implies a single feature, and b) the concept itself (i.e. getting a feature by its ID) is such a common operation, yet one otherwise missing from DAS. I don't think either of those applies to an "adjacent" capability unless you specifically choose to call it "adjacent-feature" as opposed to "adjacent-features". I honestly don't think a capability called "adjacent-features" with a query structure like "/das/features?adjacent=foo:1" implies singular; rather the opposite, in fact. To me that query suggests "get me the features adjacent to foo:1". True, two features is still plural, which leaves a "one feature either side" interpretation possible, but IMO that reading is certainly not so implicit that anyone implementing it would skip reading the specification/documentation. Add to that the fact that this is an entirely new behaviour, so we have the chance to document it properly and make clear exactly what the server must do.
>> 
>> So IMO we have a clear choice.
> I still think it's simpler to implement it for one feature either side and keep the complexity in the client. Generally, how many people stay awake after line 10 when reading the spec? :) Let's see if there are more votes...

It probably is simpler to implement (well, to implement with maximum efficiency), and I am not advocating one over the other, but IMO the implementation considerations are a separate part of our choice. They are orthogonal to whether the behaviour is confusing for those implementing it, and consequently to whether we see divergence from the spec like we do with feature-by-id. As Gustavo says, he'd implement feature-by-id as returning one feature because that's what he thinks it means, not because the alternative is difficult. I'd posit that it'd be a one-line change for any server maintainer to fix theirs to implement it correctly (i.e. use the feature's start/end to resubmit the query); it's just that it'd be more complicated to do it in a single step from the beginning.
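To make the two readings concrete, here is a minimal Python sketch of the two-step feature-by-id described above: look up the named feature, then resubmit its start/end as an ordinary region query so that every overlapping feature comes back. The `Feature` record and the `features_in_region` / `feature_by_id` helpers are hypothetical in-memory stand-ins for illustration, not part of the DAS specification or any server's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Feature:
    # Hypothetical minimal feature record; real DAS features carry much more.
    id: str
    segment: str
    start: int
    end: int
    type: str


def features_in_region(features: List[Feature], segment: str,
                       start: int, end: int) -> List[Feature]:
    """Ordinary region query: every feature overlapping segment:start,end."""
    return [f for f in features
            if f.segment == segment and f.start <= end and f.end >= start]


def feature_by_id(features: List[Feature], feature_id: str) -> List[Feature]:
    """Two-step reading: find the named feature, then reuse its start/end
    as a region query (the named feature is included in the result)."""
    match: Optional[Feature] = next(
        (f for f in features if f.id == feature_id), None)
    if match is None:
        return []
    return features_in_region(features, match.segment, match.start, match.end)


fs = [Feature("g1", "1", 1, 100, "gene"),
      Feature("e1", "1", 10, 20, "exon"),
      Feature("g2", "1", 200, 300, "gene")]
print([f.id for f in feature_by_id(fs, "e1")])  # ['g1', 'e1']
```

Under the single-feature reading Gustavo describes, the same request would return only `e1`; here it also returns the overlapping gene `g1`.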

We should be under no illusions, though, that people are going to be able to implement this easily without reading the documentation carefully, no matter which option is chosen. In particular, I can foresee servers misinterpreting the "type" filter: they are likely to process the adjacent query and then apply the type filter, which would be wrong, because the nearest feature of any type may be discarded by the filter even though an adjacent feature of the requested type exists further along. I have a feeling most sources implement the type filter as a passive "post filter" rather than an active one. I can tell you right now that it is going to be really quite difficult for me to implement "adjacent" correctly for the ASTD gene/transcript/exon sources, and I suspect the same will be true when retrofitting lots of other sources.
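A minimal Python sketch of the type-filter pitfall, assuming (hypothetically) that an adjacent query returns the nearest feature starting downstream of a position on a segment; the real query semantics are whatever the specification ends up defining. The helper names are illustrative only. The active filter restricts candidates by type before choosing the nearest one, whereas the passive post-filter picks the nearest feature of any type and then throws it away if the type does not match.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Feature:
    # Hypothetical minimal feature record for illustration only.
    id: str
    segment: str
    start: int
    end: int
    type: str


def next_adjacent(features: List[Feature], segment: str, pos: int,
                  types: Optional[Set[str]] = None) -> Optional[Feature]:
    """Active type filter (correct): restrict candidates to the requested
    types first, then return the nearest feature starting after pos."""
    candidates = [f for f in features
                  if f.segment == segment and f.start > pos
                  and (types is None or f.type in types)]
    return min(candidates, key=lambda f: f.start, default=None)


def next_adjacent_post_filtered(features: List[Feature], segment: str, pos: int,
                                types: Optional[Set[str]] = None) -> Optional[Feature]:
    """Passive "post filter" (wrong): pick the nearest feature of any type,
    then discard it if its type was not requested."""
    nearest = next_adjacent(features, segment, pos)  # type ignored here
    if nearest is not None and types is not None and nearest.type not in types:
        return None
    return nearest


fs = [Feature("e1", "1", 150, 160, "exon"),
      Feature("g2", "1", 200, 300, "gene")]
print(next_adjacent(fs, "1", 100, {"gene"}))                # the gene g2
print(next_adjacent_post_filtered(fs, "1", 100, {"gene"}))  # None
```

Asking for the next gene after position 100 finds `g2` with the active filter, but nothing with the post-filter, because the nearer exon is chosen first and then discarded.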

>> 
>> As to feature-by-id, I know changing its behaviour is potentially very disruptive, but I think we can do this purely because servers don't tend to implement it correctly anyway. Clients can happily filter out any additional features returned by old servers, and if any clients rely on the server including all overlapping features, then as far as I am concerned they are either a) targeting specific servers rather than DAS-wide and thus unaffected, or b) already broken :)
> So you agree feature-by-id should be changed if we have the stomach for it? Good, and Gustavo too. Well done Andy, you have just agreed to write Spec 1.7 or 3! ;) Then again, your argument above could equally be used for leaving the spec as it is, but ideally I agree, and I guess we can call it spec 1.61, assuming other people agree.

I already have a small list of changes for DAS 1.7 or whatever and think it's fine for that context. In any case, let's keep these two issues separate as Thomas says.

>> 
>> I have to admit that the feature-by-id capability is one of the (many) things I loathe having to explain and would love to change it. Doing so would be consistent with what we were trying to do with 1.6 (i.e. rationalise existing use of the spec) but I chickened out really.
>> 
>> Cheers,
>> Andy
> 
> Jonathan Warren
> Senior Developer and DAS coordinator
> blog: http://biodasman.wordpress.com/
> jw12 at sanger.ac.uk
> Ext: 2314
> Telephone: 01223 492314
> 
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.