[DAS] Adjacent feature extension

Mon Mar 7 11:16:01 UTC 2011

On 7 Mar 2011, at 10:57, Jonathan Warren wrote:

> 
> On 7 Mar 2011, at 10:35, Thomas Down wrote:
> 
>> On Mon, Mar 7, 2011 at 10:04 AM, Andy Jenkinson <andy.jenkinson at ebi.ac.uk> wrote:
>> 
>>> Hi Thomas,
>>> 
>>> Thanks for this. Regarding the option of whether to return just one feature
>>> per side or all overlapping features, the only other advantage that
>>> immediately springs to mind for the latter (in addition to some measure of
>>> consistency, as you mention) is that it allows the client to immediately
>>> render the exact region of that feature without triggering another request.
>>> It would generally mean changing zoom level. I can't say if clients are
>>> likely to follow this mechanism as opposed to, say, pan and centre on the
>>> feature, but if they wanted to it would be more efficient (and possibly a
>>> little bit more efficient anyway depending on how your client does its
>>> requests).
>>> 
>> 
>> Yep, I agree.  I'd be interested to learn whether there are any clients that
>> would seriously consider taking advantage of this.  My own thinking is that
>> even if we do adjust zoom level (as Dalliance sometimes does, e.g. in the
>> "jump to gene..." navigation op), clients are much more likely to zoom to a
>> view that contains the target feature plus a "sensible" amount of flanking
>> sequence, rather than a view where the target feature is perfectly framed.
>> 
>> Furthermore, this rather seems like optimizing for the case where only one
>> annotation source is active.  Surely we're talking about the *distributed*
>> annotation system, and clients will still have to go off and query all the
>> other annotation sources, even if they are able to skip the one which
>> responded to the "adjacent" query.  So long as there's some kind of query
>> parallelization in place, this probably isn't a performance issue.
> 
> My vote would ideally be to change feature_by_id to return one feature and to have adjacent_feature return one feature as well. In my opinion this would mean these capabilities on servers do "exactly what it says on the tin", and they would be easier for data providers to implement and thus more likely to be implemented?
> If the feature_by_id capability as it stands is needed, it could be renamed to something closer to what it actually does, such as feature_id_region, but I would bet no one would bother to change it or use it?
> 
> However, the reality is that we are too late to change the old feature_by_id, but I don't think we need to make the same mistake twice by repeating it for adjacent_features?

I agree with Jonathan: feature_by_id sounds like it gets the feature with the requested ID, and to be honest that is how I have implemented it before, so if you ask me, the adjacent capability should just return one feature. I don't think we are too late to change the old feature_by_id behaviour, and we can take this as the opportunity to make such a change.
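
To make the "one feature" reading concrete, here is a minimal sketch in Python of how a server might pick a single adjacent feature. It is only an illustration under stated assumptions, not the proposed spec: the names Feature, adjacent_feature and segment_order are hypothetical, the query is taken to be a point plus a direction, features overlapping the point count as adjacent (the "include overlapping" option discussed in this thread), and the server falls back to other reference sequences when the current one has nothing in the requested direction.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class Feature:
    # Hypothetical minimal feature record, not a DAS data model.
    id: str
    segment: str
    start: int
    end: int


def adjacent_feature(
    features: Sequence[Feature],
    segment: str,
    pos: int,
    forward: bool = True,
    segment_order: Sequence[str] = (),
) -> Optional[Feature]:
    """Return at most one feature adjacent to the point `pos` on `segment`.

    Assumption: a feature overlapping the point is itself a valid answer.
    """

    def pick(candidates: Iterable[Feature]) -> Optional[Feature]:
        candidates = list(candidates)
        if not candidates:
            return None
        if forward:
            # Nearest going forward: the smallest start coordinate.
            return min(candidates, key=lambda f: f.start)
        # Nearest going backward: the largest end coordinate.
        return max(candidates, key=lambda f: f.end)

    on_segment = [f for f in features if f.segment == segment]
    if forward:
        hit = pick(f for f in on_segment if f.end >= pos)
    else:
        hit = pick(f for f in on_segment if f.start <= pos)
    if hit is not None:
        return hit

    # Nothing on this reference sequence: try the others, in the
    # (assumed) order supplied by the caller.
    order = list(segment_order)
    if segment not in order:
        return None
    i = order.index(segment)
    others = order[i + 1:] if forward else order[:i][::-1]
    for seg in others:
        hit = pick(f for f in features if f.segment == seg)
        if hit is not None:
            return hit
    return None
```

Returning a real stored feature or nothing at all also keeps to the "servers can't return a fake feature" point, and a client that wants the whole overlapping set can still issue an ordinary region query around whatever comes back.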
> 
> 
>> 
>> Do any other client developers feel differently?
>> 
>> 
>>> Disadvantages I can think of:
>>> - "adjacent" request takes marginally longer
>>> - not quite as obvious what clients should put in their UI controls - need
>>> to pick a feature to be able to do "jump to BRCA1"
>>> - risk of servers not implementing it correctly and only returning one
>>> feature anyway (although I don't think this is likely as the concept is
>>> different to "feature-by-id")
>>> 
>>> Some things to further define:
>>> - servers can't return a fake feature
>>> 
>> 
>> Yep, will clarify this.
>> 
>> 
>>> - should servers return features on different reference sequences if there
>>> are none on the current one?
>>> 
>> 
>> In my opinion, absolutely yes.  Otherwise the "10 features in the genome"
>> case remains a massive pain (and potentially a disaster, for
>> inhomogeneously distributed data; won't someone think of the MHC tiling arrays?
>> :-).  And even worse for the "10 features in UniProt" case (where I can also
>> see this feature being quite interesting).
>> 
>> I've tried to be explicit about this in my proposal (see the penultimate
>> paragraph + example 3), but any suggestions for further clarifications are
>> welcome.
>> 
>> 
>>> - how should servers treat features that overlap the adjacent range? Treat
>>> them as the adjacent feature to return, or only include features completely
>>> outside the query range? What if the next feature completely outside the
>>> query range is part of the same feature hierarchy (e.g. an exon outside the
>>> current window)?
>>> 
>> 
>> It's a point rather than a range, but yes I agree this is still an open
>> question.  I'd actually written the spec such that overlapping features do
>> get returned (on the assumption that clients will do "trivial" cases of
>> next/previous feature in-memory without a network round trip), but again if
>> other client developers do things differently, I'd like to know.
>> 
>> I think "include overlapping" will have fewer special cases to worry about,
>> though.  e.g. the PART/PARENT issue you allude to.  Let clients deal with
>> that ("dumb servers, smart clients").
>> 
>>                Thomas.
>> _______________________________________________
>> DAS mailing list
>> DAS at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/das
> 
> Jonathan Warren
> Senior Developer and DAS coordinator
> blog: http://biodasman.wordpress.com/
> jw12 at sanger.ac.uk
> Ext: 2314
> Telephone: 01223 492314
> 
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.