[DAS] Adjacent feature extension

Jonathan Warren jw12 at sanger.ac.uk
Mon Mar 7 15:49:12 UTC 2011


On 7 Mar 2011, at 15:04, Andy Jenkinson wrote:

> On 7 Mar 2011, at 12:16, Jonathan Warren wrote:
>
>> I'd say if we don't have any more objections in the next couple of  
>> days then go with your proposal as is? I'll then put support into  
>> the registry this week if that is the case. If you could also then
>> copy the proposal from here:
>> https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures
>> to the extensions page here:
>> http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter
>> noting in large letters that it was agreed by the community on such
>> and such a date?
>
> I think there is a lot left to be clarified so adopting it "as is"  
> is a no go for me. In particular, take a look at this diagram and  
> see if you can work out what will be returned with "adjacent"  
> queries for either side of the viewing area, and do they make sense  
> for what the client is trying to achieve?
> <DAS-Adjacent.png>
>
> The client has "seen" gene 2 and all its parts.
>
> If the client asks for features adjacent to the left/right sides of  
> the viewing area, what should the server return?
I don't think it makes sense to ask for a next right in this case, as
there are features here already. This is for sparse data sources, so
it's OK just to return what's there if someone specifically wants to
hit the next feature button, or a client can blank the next right
button out. It's up to the client.
Next left should return SNP1 for an unfiltered adjacent request, or
the gene and its constituents if filtered on gene.
If you take the intention of this as finding features where data is
sparse, then I don't think there are big issues.
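
As a rough sketch of that reading (not part of the proposal itself), the "next left" behaviour could look like the Python below. The Feature class, the next_left and with_constituents helpers and all coordinates are made up for illustration; the coordinates in particular are not taken from the diagram, they are just chosen so that SNP1 is the nearest feature left of the viewing area.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Feature:
    # Hypothetical minimal feature record, not the DAS XML model.
    id: str
    type: str
    start: int
    end: int
    parent: Optional[str] = None


def with_constituents(root: Feature, features: List[Feature]) -> List[Feature]:
    """Return `root` followed by every feature descended from it."""
    family = [root]
    ids = {root.id}
    changed = True
    while changed:  # repeat until no new descendants are found
        changed = False
        for f in features:
            if f.parent in ids and f.id not in ids:
                ids.add(f.id)
                family.append(f)
                changed = True
    return family


def next_left(features: List[Feature], view_start: int,
              type_filter: Optional[str] = None) -> List[Feature]:
    """Nearest feature ending before `view_start`.

    Without a filter, only the nearest feature is returned. With a type
    filter, the nearest feature of that type is returned together with
    its constituents.
    """
    candidates = [
        f for f in features
        if f.end < view_start and (type_filter is None or f.type == type_filter)
    ]
    if not candidates:
        return []
    nearest = max(candidates, key=lambda f: f.end)
    if type_filter is None:
        return [nearest]
    return with_constituents(nearest, features)


# Invented coordinates: everything here lies left of a view starting at 500.
left_features = [
    Feature("gene1", "gene", 100, 400),
    Feature("transcript1", "transcript", 100, 400, parent="gene1"),
    Feature("exon2", "exon", 300, 400, parent="transcript1"),
    Feature("snp1", "SNP", 450, 450),
]

print([f.id for f in next_left(left_features, 500)])
# ['snp1']
print([f.id for f in next_left(left_features, 500, "gene")])
# ['gene1', 'transcript1', 'exon2']
```

A server could equally decide to return only the gene when filtering; the sketch just follows the "genes and constituents" wording.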

Part of the point of the extensions phase is to try these things out
with examples and refine the specs. Delaying acceptance of this would
be a big mistake in my view.

> To the left: gene 1, transcript 1, exon 2, or SNP 1? Does it matter?  
> Should gene 1 and all its parts be returned because that is what  
> happens in a segment query? If not, is this confusing for the spec?
> To the right: gene 2, transcript 2, exon 4, transcript 3, exon 5, or  
> SNP 2?
>
> If no special arrangements are made for excluding overlapping  
> features (thus either gene 2, transcript 2 or exon 4 are returned  
> above), what position should the client submit instead in its  
> overlap query to get SNP 2 and SNP 3? (Hint: it's impossible to get  
> SNP 2).
>
> What if the genes were nonpositional features?
>
>
> None of the above are unresolvable problems. The simplest way is to
> say that overlapping features should be returned, and that clients
> should not try to jump beyond them. It means you don't really get a
> "next feature" capability in the same way a user probably intends
> (i.e. "next gene"), but it does make it impossible to miss transcript 3
> and SNP 2 in the above diagram. It still needs to be defined which
> feature will be returned if multiple have the same position in that
> case (ideally we want the one that extends the furthest in the
> adjacent direction), but that can be complicated by things like
> nonpositional parent features etc.
>
> The alternative is to exclude overlapping features, but that raises  
> the question of whether parts are considered overlapping if their  
> parents are. Implementing "next gene" means excluding all of gene  
> 2's transcripts and exons (thus returning SNP 2), but is this even  
> what the user meant? Maybe they meant "next transcript".
>
> So not simple. But the main questions are:
> 1. should overlapping features be excluded in overlap requests?
> 2. if so, should non-overlapping features with overlapping
> parents/parts be excluded?
> 3. separate from 1 and 2, should the nearest feature's parents and
> parts also be returned?
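
To make questions 1 and 2 concrete, here is a minimal Python sketch of the right-hand case under each policy. Nothing in it comes from the proposal text: the Feature class, the adjacent_right function, its two flags and every coordinate are assumptions invented for illustration (the diagram is not reproduced here), with the query position placed inside gene 2. Question 3 is not covered.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Feature:
    # Hypothetical minimal feature record, not the DAS XML model.
    id: str
    start: int
    end: int
    parent: Optional[str] = None


def root_of(f: Feature, by_id: Dict[str, Feature]) -> Feature:
    """Walk up the parent links to the top-level feature."""
    while f.parent is not None:
        f = by_id[f.parent]
    return f


def adjacent_right(features: List[Feature], position: int,
                   exclude_overlapping: bool = False,
                   exclude_family: bool = False) -> Optional[Feature]:
    """Nearest feature extending to the right of `position`.

    exclude_overlapping: skip features that span `position` (question 1).
    exclude_family: also skip features whose top-level parent spans
    `position` (question 2).
    """
    by_id = {f.id: f for f in features}
    candidates = []
    for f in features:
        if f.end <= position:
            continue  # nothing of this feature lies right of the position
        if exclude_overlapping and f.start <= position:
            continue
        if exclude_family and root_of(f, by_id).start <= position:
            continue
        candidates.append(f)
    if not candidates:
        return None
    # Nearest first; on a tie prefer the feature extending furthest right.
    return min(candidates, key=lambda f: (max(f.start - position, 0), -f.end))


# Invented coordinates for the right-hand side of the view.
right_features = [
    Feature("gene2", 600, 900),
    Feature("transcript2", 600, 750, parent="gene2"),
    Feature("exon4", 690, 750, parent="transcript2"),
    Feature("transcript3", 780, 900, parent="gene2"),
    Feature("exon5", 820, 900, parent="transcript3"),
    Feature("snp2", 800, 800),
    Feature("snp3", 950, 950),
]

# Overlapping features returned: gene 2 comes back, and a client that then
# jumps to its end (900) lands on SNP 3, skipping transcript 3 and SNP 2.
print(adjacent_right(right_features, 700).id)  # gene2
print(adjacent_right(right_features, 900).id)  # snp3

# Overlapping features excluded: transcript 3 is next.
print(adjacent_right(right_features, 700, exclude_overlapping=True).id)
# transcript3

# Parts of an overlapping gene excluded as well ("next gene" style): SNP 2.
print(adjacent_right(right_features, 700, exclude_overlapping=True,
                     exclude_family=True).id)
# snp2
```

With these made-up positions the three policies return three different features for the same query, which is really the point of the questions above.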

Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314











