[DAS] Adjacent feature extension
Andy Jenkinson
andy.jenkinson at ebi.ac.uk
Mon Mar 7 15:04:32 UTC 2011
On 7 Mar 2011, at 12:16, Jonathan Warren wrote:
> I'd say if we don't have any more objections in the next couple of days then go with your proposal as is? I'll then put support into the registry this week if that is the case. If you could also then copy the proposal from here https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures to the extensions page here:
> http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter noting in large letters that it was agreed by the community on such and such a date?
I think there is a lot left to be clarified, so adopting it "as is" is a no go for me. In particular, take a look at this diagram and see if you can work out what will be returned by "adjacent" queries for either side of the viewing area, and whether the results make sense for what the client is trying to achieve.
[Attachment: DAS-Adjacent.png (image/png, 40385 bytes)]
URL: <http://lists.open-bio.org/pipermail/das/attachments/20110307/4d2d83cd/attachment-0002.png>
The client has "seen" gene 2 and all its parts.
If the client asks for features adjacent to the left/right sides of the viewing area, what should the server return?
To the left: gene 1, transcript 1, exon 2, or SNP 1? Does it matter? Should gene 1 and all its parts be returned because that is what happens in a segment query? If not, is this confusing for the spec?
To the right: gene 2, transcript 2, exon 4, transcript 3, exon 5, or SNP 2?
If no special arrangements are made for excluding overlapping features (thus either gene 2, transcript 2 or exon 4 are returned above), what position should the client submit instead in its overlap query to get SNP 2 and SNP 3? (Hint: it's impossible to get SNP 2).
What if the genes were nonpositional features?
None of the above are unresolvable problems. The simplest way is to say that overlapping features should be returned, and that clients should not try to jump beyond them. It means you don't really get a "next feature" capability in the way a user probably intends (i.e. "next gene"), but it does make it impossible to miss transcript 3 and SNP 2 in the above diagram. It still needs to be defined which feature will be returned in that case if multiple have the same position (ideally we want the one that extends the furthest in the adjacent direction), but that can be complicated by things like nonpositional parent features etc.
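To make that first option a bit more concrete, here is a rough Python sketch of "return overlapping features, and break ties by whichever extends furthest in the adjacent direction". Everything in it is an assumption for illustration only: the Feature tuple, the nearest_right function and all of the coordinates are made up, and are not taken from the diagram or from the proposal. It also only handles positional features and only the rightward direction.

```python
from typing import List, NamedTuple, Optional


class Feature(NamedTuple):
    """Hypothetical positional feature; coordinates are inclusive."""
    id: str
    start: int
    end: int


def nearest_right(features: List[Feature], pos: int) -> Optional[Feature]:
    """Return one feature adjacent to the right of pos, or None.

    Assumed policy: a feature that overlaps pos is a valid answer and
    counts as distance 0; ties are broken in favour of the feature
    that extends furthest to the right.
    """
    # Anything that still has sequence to the right of pos is a candidate.
    candidates = [f for f in features if f.end > pos]
    if not candidates:
        return None
    # Sort key: distance from pos first, then the largest end coordinate.
    return min(candidates, key=lambda f: (max(f.start - pos, 0), -f.end))


# Made-up data: a gene with two transcripts, plus two SNPs.
FEATURES = [
    Feature("gene A", 100, 500),
    Feature("transcript A1", 100, 300),
    Feature("transcript A2", 350, 500),
    Feature("SNP x", 320, 320),
    Feature("SNP y", 600, 600),
]

if __name__ == "__main__":
    # The right edge of the viewing area falls inside gene A.
    print(nearest_right(FEATURES, 250))
    # After the client has moved to the end of gene A.
    print(nearest_right(FEATURES, 500))
```

With these made-up coordinates, a query from inside the gene returns the gene itself rather than the nearer SNP. The client then extends its view to the end of that gene and picks up the second transcript and the SNP through an ordinary segment query, which is the "impossible to miss" property described above.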
The alternative is to exclude overlapping features, but that raises the question of whether parts are considered overlapping if their parents are. Implementing "next gene" means excluding all of gene 2's transcripts and exons (thus returning SNP 2), but is this even what the user meant? Maybe they meant "next transcript".
So not simple. But the main questions are:
1. should overlapping features be excluded in overlap requests?
2. if so, should non-overlapping features with overlapping parents/parts be excluded?
3. separate from 1 and 2, should the nearest feature's parents and parts also be returned?
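One way to see how the three questions interact is to treat each as a switch on the same lookup. The Python sketch below is only that, a sketch: TreeFeature, adjacent_right, the three flag names and the coordinates are all hypothetical, and it assumes positional features, a single "parent" link per feature, and that a whole gene/transcript family is excluded as soon as any member overlaps the query position.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TreeFeature:
    """Hypothetical feature with an optional parent id."""
    id: str
    start: int
    end: int
    parent: Optional[str] = None


def adjacent_right(features: List[TreeFeature], pos: int,
                   exclude_overlapping: bool = False,           # question 1
                   exclude_overlapping_families: bool = False,  # question 2
                   include_family: bool = False                 # question 3
                   ) -> List[TreeFeature]:
    by_id = {f.id: f for f in features}

    def root(f: TreeFeature) -> str:
        # Walk up the parent links to the top-level feature.
        while f.parent is not None:
            f = by_id[f.parent]
        return f.id

    candidates = [f for f in features if f.end > pos]
    if exclude_overlapping:
        overlapping = [f for f in candidates if f.start <= pos]
        candidates = [f for f in candidates if f.start > pos]
        if exclude_overlapping_families:
            # Also drop relatives of anything that overlaps pos.
            blocked = {root(f) for f in overlapping}
            candidates = [f for f in candidates if root(f) not in blocked]
    if not candidates:
        return []
    # Nearest first; ties go to the feature extending furthest right.
    nearest = min(candidates, key=lambda f: (max(f.start - pos, 0), -f.end))
    if include_family:
        top = root(nearest)
        return [f for f in features if root(f) == top]
    return [nearest]


# Made-up data: one gene with two transcripts, plus two SNPs.
FEATURES = [
    TreeFeature("gene A", 100, 500),
    TreeFeature("transcript A1", 100, 300, parent="gene A"),
    TreeFeature("transcript A2", 350, 500, parent="gene A"),
    TreeFeature("SNP x", 400, 400),
    TreeFeature("SNP y", 600, 600),
]

if __name__ == "__main__":
    for flags in ({}, {"exclude_overlapping": True},
                  {"exclude_overlapping": True,
                   "exclude_overlapping_families": True},
                  {"include_family": True}):
        print(flags, [f.id for f in adjacent_right(FEATURES, 250, **flags)])
```

With this data and a query position inside the gene, the default returns the gene, excluding overlaps returns the second transcript ("next transcript"), also excluding the overlapping family returns the SNP that sits inside the gene, and the last switch returns the gene together with both transcripts. Which of those is "right" is exactly what needs to be agreed.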