[DAS] Adjacent feature extension

Mon Mar 7 14:11:20 UTC 2011

On 7 Mar 2011, at 12:43, Andy Jenkinson wrote:

> On 7 Mar 2011, at 11:51, Jonathan Warren wrote:
>
>> On 7 Mar 2011, at 11:19, Andy Jenkinson wrote:
>>
>>> On 7 Mar 2011, at 10:57, Jonathan Warren wrote:
>>>
>>>>
>>>> My vote would ideally be to change feature_by_id to return one
>>>> feature and to have adjacent_feature return one feature as well.
>>>> In my opinion this would mean these capabilities do "exactly what
>>>> they say on the tin", would be easier for data providers to
>>>> implement, and are thus more likely to be implemented.
>>>> If the feature_id capability as it stands is needed, it could be
>>>> renamed to something closer to what it means, like
>>>> feature_id_region, but I would bet no one would bother to change
>>>> it/use it?
>>>>
>>>> However the reality is that we are too late to change the old  
>>>> feature_by_id, but I don't think we need to make the same mistake  
>>>> twice by repeating it for adjacent_features?
>>>
>>> I disagree. I think the problems with feature-by-id are that a)  
>>> the name of the capability implies singular, and b) the concept  
>>> itself (i.e. getting a feature by its ID) is such a common  
>>> operation that is otherwise missing in DAS. I don't think either  
>>> of those apply to an "adjacent" capability unless you specifically
>>> choose to call it "adjacent-feature" as opposed to
>>> "adjacent-features". I honestly don't think a capability called
>>> "adjacent-features" with a query structure like
>>> "/das/features?adjacent=foo:1" implies singular, rather the
>>> opposite in fact. To me that query suggests "get me the features
>>> adjacent to foo:1". True, 2 features is plural, which still leaves
>>> a "one feature either side" interpretation possible, but IMO it is
>>> certainly not implicit enough to stop anyone implementing it from
>>> actually reading the specification/documentation. Add to that the
>>> fact that this is an entirely new behaviour that we have the
>>> chance to document properly, making it clear exactly what the
>>> server must do.
>>>
>>> So IMO we have a clear choice.
>> I still think it's simpler to implement it for one feature either
>> side and keep the complexity in the client. Generally, how many
>> people stay awake after line 10 when reading the spec? :) Let's see
>> if there are more votes...
>
> It probably is simpler to implement (well, to implement with maximum  
> efficiency) and I am not advocating one over the other, but IMO the  
> implementation considerations are a separate part of our choice and  
> are orthogonal to whether it's confusing for those implementing it  
> and consequently whether we see divergence from the spec like we do  
> with feature-by-id. As Gustavo says, he'd implement feature-by-id as  
> one feature because that's what he thinks it means, not because it's  
> difficult. I'd posit that it'd be a one line change for any server  
> maintainer to fix theirs to implement it correctly (i.e. use the  
> feature's start/end to resubmit the query), it's just that it'd be  
> more complicated to do it in a single step from the beginning.
>
> We should be under no illusions though that people are going to be  
> able to implement this easily without reading the documentation  
> carefully, no matter which option is chosen.
Good template methods and/or examples in tutorials will encourage use
of this command.
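
As one such example, the "one line change" described in the quoted text for feature-by-id (use the feature's start/end to resubmit the query) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Feature` record, the in-memory `STORE`, and the function names are hypothetical and do not come from any DAS server implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    # Hypothetical minimal feature record, not a DAS library type.
    id: str
    segment: str
    start: int
    end: int
    type: str


def features_in_region(features, segment, start, end):
    """Return every feature overlapping segment:start,end (inclusive)."""
    return [f for f in features
            if f.segment == segment and f.start <= end and f.end >= start]


def features_by_id(features, feature_id):
    """Look the feature up, then resubmit its extent as a region query."""
    result = []
    for hit in (f for f in features if f.id == feature_id):
        for f in features_in_region(features, hit.segment, hit.start, hit.end):
            if f not in result:  # avoid duplicates if several hits overlap
                result.append(f)
    return result


# Hypothetical data: an exon inside one gene, plus an unrelated gene.
STORE = [
    Feature("gene1", "foo", 100, 500, "gene"),
    Feature("exon1", "foo", 100, 200, "exon"),
    Feature("gene2", "foo", 900, 1200, "gene"),
]
```

Asking this sketch for "exon1" returns both gene1 and exon1, because the lookup is re-run as a region query over the exon's extent. That is the behaviour the thread describes as being implemented differently from what the capability's name suggests.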


> In particular, I can foresee servers not interpreting the "type"  
> filter appropriately, being likely to process the adjacent query  
> then apply the type filter, which would be wrong. I have a feeling  
> most sources implement the type filter as a passive "post filter"  
> rather than an active one. I can tell you right now that it is going  
> to be really quite difficult for me to implement "adjacent"  
> correctly for the ASTD gene/transcript/exon sources, and I suspect  
> the same will be true for retrofitting lots of other sources.
This is an optional capability though, right?
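
The type-filter concern quoted above can be illustrated with a small sketch. It assumes, purely for illustration, that "adjacent" means the nearest feature starting after the reference feature ends, and it only looks to the right-hand side. The `Feature` record and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    # Hypothetical minimal feature record, not a DAS library type.
    id: str
    start: int
    end: int
    type: str


def next_feature(features, ref_end):
    """Nearest feature that starts after ref_end, or None if there is none."""
    after = [f for f in features if f.start > ref_end]
    return min(after, key=lambda f: f.start, default=None)


def adjacent_post_filter(features, ref, wanted_type):
    """Passive approach: find the neighbour first, then apply the type filter."""
    hit = next_feature(features, ref.end)
    return hit if hit is not None and hit.type == wanted_type else None


def adjacent_active_filter(features, ref, wanted_type):
    """Active approach: restrict to the wanted type, then find the neighbour."""
    typed = [f for f in features if f.type == wanted_type]
    return next_feature(typed, ref.end)


# Hypothetical data: an exon sits between two genes.
GENE_A = Feature("geneA", 100, 500, "gene")
FEATURES = [GENE_A,
            Feature("exonX", 600, 650, "exon"),
            Feature("geneB", 900, 1200, "gene")]
```

With this data, the post-filter version returns nothing for "the gene adjacent to geneA", because the nearest neighbour is an exon and is then filtered away. The active version returns geneB.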
>
>>>
>>> As to feature-by-id, I know changing behaviour is potentially a  
>>> very disruptive change, but I think we can potentially do this  
>>> purely because servers don't tend to implement it correctly  
>>> anyway. Clients can happily filter out any additional features  
>>> returned by old servers, and if any clients are reliant on the  
>>> server including all overlapping features then as far as I am  
>>> concerned they are either a) targeting specific servers rather  
>>> than DAS-wide and thus unaffected, or b) already broken :)
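
The client-side filtering mentioned above could be as small as the following sketch. It assumes the client has already parsed the response into dictionaries with an "id" key; that representation and the function name are hypothetical.

```python
def keep_requested(features, requested_ids):
    """Drop any returned feature whose id the client did not ask for."""
    wanted = set(requested_ids)
    return [f for f in features if f["id"] in wanted]


# Hypothetical responses: an old server also returns the overlapping gene,
# while a server following the proposed behaviour returns only the match.
old_server_response = [
    {"id": "gene1", "start": 100, "end": 500},
    {"id": "exon1", "start": 100, "end": 200},
]
new_server_response = [{"id": "exon1", "start": 100, "end": 200}]
```

Under these assumptions, both responses reduce to the same single feature once filtered, so a client written this way would not need to know which behaviour the server implements.
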
>> So you agree feature-by-id should be changed if we have the stomach
>> for it? Good, and Gustavo too. Well done Andy, you have just
>> agreed to write Spec 1.7 or 3??? ;) Your argument above can also be
>> used for leaving the spec as it is, but ideally I agree,
>> and I guess we can call it spec 1.61, assuming other people agree.
>
> I already have a small list of changes for DAS 1.7 or whatever and  
> think it's fine for that context. In any case, let's keep these two  
> issues separate as Thomas says.

I was really hoping not to do another major spec revision for at least
3 years and to focus on extensions giving new capabilities; otherwise,
for the core capabilities, everyone is always playing catch-up! This
may be something to discuss at some point soon.
>
>>>
>>> I have to admit that the feature-by-id capability is one of the  
>>> (many) things I loathe having to explain and would love to change  
>>> it. Doing so would be consistent with what we were trying to do  
>>> with 1.6 (i.e. rationalise existing use of the spec) but I  
>>> chickened out really.
>>>
>>> Cheers,
>>> Andy
>>
>> Jonathan Warren

Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE.