[DAS] Adjacent feature extension
Thomas Down
thomas.a.down at gmail.com
Mon Mar 7 16:51:10 UTC 2011
There are several more-or-less separate issues tied up here that I see:
1. Handling of PART/PARENT. I neglected this completely in my
original proposal. I can think of a couple of solutions:
a) Treat a whole PART/PARENT graph as a single
"complex" feature. This means that you'll always get full graphs back from
any kind of feature query (modulo type filtering). If any part of the
complex feature counts as "adjacent", then you'll get the whole thing back.
b) Specify the adjacent= filter as ignoring any
features with a PARENT attribute set.
Jonathan/Andy, do either of you have an opinion on these two
options? I guess a) is probably the least likely to cause surprise?
2. My idea that an overlapping feature can count as adjacent.
This initially seemed nice and simple but Andy's example of "SNP 2" has
shown why it's broken when you have overlapping features. I'm quite happy
to have the adjacent filter only select features that don't overlap the
query position.
3. Features with matching start/end positions (therefore "equally
adjacent"). I'm going to say "server picks" in this case. The client at
least knows there's something there! I actually think this situation will
be pretty rare in practice (for truly separate features, rather than having
a transcript and exon starting in the same position).
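Points 2 and 3 above can be sketched roughly as follows. This is only a minimal Python illustration, not part of the proposal: the Feature class, the adjacent() function, the "forward"/"backward" direction names and the coordinates are all made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Feature:
    # Hypothetical minimal feature record; coordinates are inclusive.
    id: str
    start: int
    end: int


def adjacent(features: List[Feature], pos: int, direction: str) -> Optional[Feature]:
    """Return the nearest feature lying wholly beyond pos, or None.

    Features that overlap pos (start <= pos <= end) are never selected
    (point 2). If several candidates are equally near, whichever comes
    first in the list wins, which stands in for "server picks" (point 3).
    """
    if direction == "forward":
        candidates = [f for f in features if f.start > pos]
        return min(candidates, key=lambda f: f.start, default=None)
    if direction == "backward":
        candidates = [f for f in features if f.end < pos]
        return max(candidates, key=lambda f: f.end, default=None)
    raise ValueError("direction must be 'forward' or 'backward'")


# Made-up data: one long feature with smaller features inside and around it.
features = [
    Feature("gene_x", 100, 500),
    Feature("snp_a", 50, 50),
    Feature("snp_b", 300, 300),
    Feature("snp_c", 600, 600),
]

# gene_x overlaps position 200, so it is skipped in both directions.
print(adjacent(features, 200, "forward"))   # the feature with id snp_b
print(adjacent(features, 200, "backward"))  # the feature with id snp_a
```

With the overlapping feature excluded, a client sitting inside gene_x can still step to the features nested within it, which is the case the "overlap counts as adjacent" rule could not handle.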
Does that tighten things up?
Andy, thanks for thrashing this out. As you can probably work out, the use
cases I've been working to involve rather sparsely-distributed features, but
it's good to sort out the corner cases that arise as the density increases.
Thomas.
On Mon, Mar 7, 2011 at 3:04 PM, Andy Jenkinson <andy.jenkinson at ebi.ac.uk> wrote:
> On 7 Mar 2011, at 12:16, Jonathan Warren wrote:
>
> > I'd say if we don't have any more objections in the next couple of days
> > then go with your proposal as is? I'll then put support into the registry
> > this week if that is the case. If you could also then copy the proposal from
> > here https://github.com/dasmoth/dalliance/wiki/AdjacentFeatures to the
> > extensions page here:
> > http://www.biodas.org/wiki/DAS1.6E#Adjacent_Feature_filter noting in
> > large letters that it was agreed by the community on such and such a date?
>
> I think there is a lot left to be clarified so adopting it "as is" is a no
> go for me. In particular, take a look at this diagram and see if you can
> work out what will be returned with "adjacent" queries for either side of
> the viewing area, and do they make sense for what the client is trying to
> achieve?
>
>
>
> The client has "seen" gene 2 and all its parts.
>
> If the client asks for features adjacent to the left/right sides of the
> viewing area, what should the server return?
> To the left: gene 1, transcript 1, exon 2, or SNP 1? Does it matter? Should
> gene 1 and all its parts be returned because that is what happens in a
> segment query? If not, is this confusing for the spec?
> To the right: gene 2, transcript 2, exon 4, transcript 3, exon 5, or SNP 2?
>
> If no special arrangements are made for excluding overlapping features
> (thus either gene 2, transcript 2 or exon 4 are returned above), what
> position should the client submit instead in its overlap query to get SNP 2
> and SNP 3? (Hint: it's impossible to get SNP 2).
>
> What if the genes were nonpositional features?
>
>
> None of the above are unresolvable problems. The simplest way is to say
> that overlapping features should be returned, and that clients should not
> try to jump beyond them. It means you don't really get a "next feature"
> capability in the same way a user probably intends (i.e. "next gene"), but
> does make it impossible to miss transcript 3 and SNP 2 in the above diagram.
> It still needs to be defined which feature will be returned if multiple have
> the same position in that case (ideally we want the one that extends the
> furthest in the adjacent direction), but that can be complicated by things
> like nonpositional parent features etc.
>
> The alternative is to exclude overlapping features, but that raises the
> question of whether parts are considered overlapping if their parents are.
> Implementing "next gene" means excluding all of gene 2's transcripts and
> exons (thus returning SNP 2), but is this even what the user meant? Maybe
> they meant "next transcript".
>
> So not simple. But the main questions are:
> 1. should overlapping features be excluded in overlap requests?
> 2. if so, should non-overlapping features with overlapping parents/parts be
> excluded?
> 3. separate from 1 and 2, should the nearest feature's parents and parts
> also be returned?
>
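Option 1a) and Andy's question 3 both come down to expanding one selected feature into its whole PART/PARENT graph. A rough Python sketch of that expansion is below. The function names are hypothetical, and the parent links are only guessed from the feature names in Andy's mail, since the diagram itself is not reproduced here.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


def build_links(parents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Turn child -> parents into an undirected graph, so that a PARENT
    link can be followed in the PART direction as well."""
    links: Dict[str, Set[str]] = defaultdict(set)
    for child, parent_ids in parents.items():
        for parent_id in parent_ids:
            links[child].add(parent_id)
            links[parent_id].add(child)
    return links


def complex_feature(seed: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the ids of every feature connected to seed by PART/PARENT links."""
    links = build_links(parents)
    seen = {seed}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for neighbour in links.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Assumed hierarchy (child -> parents); not taken from any real server.
parents = {
    "transcript1": ["gene1"],
    "exon2": ["transcript1"],
    "transcript2": ["gene2"],
    "exon4": ["transcript2"],
    "transcript3": ["gene2"],
    "exon5": ["transcript3"],
}

# If exon4 is the feature picked as adjacent, the whole of gene2 comes back.
print(sorted(complex_feature("exon4", parents)))
# ['exon4', 'exon5', 'gene2', 'transcript2', 'transcript3']
```

A feature with no links, such as a standalone SNP, simply comes back on its own.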