[DAS2] how to test features?

Fri Jun 10 19:27:06 UTC 2005

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke
> Sent: Friday, June 10, 2005 10:14 AM
> To: DAS/2
> Subject: Re: [DAS2] how to test features?
> 
> Gregg:
> > In DAS/1 the types response can optionally include a count of the
> > number of features available for that particular type and source.
> > I do think this would be a useful feature to preserve in DAS/2.
> 
> I don't think it would work that well.  Given the way we've
> structured things it's possible the type/ hierarchy is shared
> by several feature/ hierarchies.
> 
> My two mental models are:
>   - a single server where different source/versions
>       share the same types/
>   - a server which uses someone else's types/ and regions/
>       as the reference, adding only features/

I don't really see the problem with adding optional feature counts to
the types response.

I don't think that any established repository of genome annotations will
be pointing to an external source for types, as you suggest above.  That
approach would affect not just feature counts but lots of other stuff in
the types response, such as PROP elements and the soon-to-be added
FORMAT elements.  And without a _lot_ more information about what
parameters were used when running the analysis, are you sure that your
"BLAST" type is the same as some other site's "BLAST" type?  Unless of
course you copied the data from the other site in the first place, in
which case we're talking about mirroring, which to me is a different
subject.

To me the only potential problem is that without shared type ids we
don't have a useful way of determining that two types from different
servers (and possibly different genomes on the same server) are
equivalent.  But I'm far more comfortable with that situation than with
the alternative, which is to assume that shared type ids really indicate
the exact same analysis/annotation process.  And we do have ways of
determining that two types have some similarity -- that's what the
ontology attribute is for.

Furthermore, even if a DAS/2 server did refer to an external URI for a
type id, it's not clear to me why that precludes it from adding a
feature count attribute to the TYPE element that it returns from the
types query.  And the same applies to a server sharing the same types
between sources and versions.  Maybe this is semantics, but it seems to
me that the types response is not meant to say "these are the types",
but rather "these are the types and how they are used in the context of
this versioned source".

> 
> > From that
> > information a client/validator could potentially tune its requested
> > range to return on average a desired number of features.
> 
> My shot-in-the-dark guess is that the non-uniform distribution
> of features lessens its usefulness.

Good point.

> > I don't think a server should return only some of the features that
> > meet the search criteria -- should return either all or none.  If a
> > server decides that a client is requesting too much data, I think
> > servers should be able to return some sort of "returned feature
> > count too large" error code response.
> 
> I can go along with that, though I would like to know what
> a client is supposed to do if it gets that response and
> if we should specify that a server should be able to return at
> least (say) 10,000 features at a time.
> 
> >   And maybe in the source or versioned source
> > response xml have an optional indication of how many features a
> > server is willing to serve up in a single request, so clients can
> > know this limit in advance.
> 
> In thinking some more about that - what would a client do
> with that information?

Well, since we've got alternative feature request formats, I've thought
about a format that just returns a count of the features passing the
filters.  Assuming the server can respond to such a request quickly,
that would allow the client to "hunt around" until it found an
appropriate range, bounded either by the number of features the server
maxes out at or by the number the client wants to restrict itself to.
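
A minimal sketch of that "hunt around" idea, assuming a count-only
request exists and is cheap.  Nothing here is DAS/2 API: count_features
is a hypothetical stand-in for the count request, backed by made-up
in-memory data, and widest_window is an illustrative name.  Because the
count can only grow as the window widens, a binary search on the
half-width finds the largest window that stays under a limit.

```python
from bisect import bisect_left

# Made-up demo data: feature start positions, one every 10 bases.
FEATURE_STARTS = list(range(0, 1000, 10))


def count_features(lo, hi):
    """Hypothetical count-only request: features starting in [lo, hi)."""
    return bisect_left(FEATURE_STARTS, hi) - bisect_left(FEATURE_STARTS, lo)


def widest_window(center, limit, max_half_width):
    """Largest half-width w with count(center - w, center + w) <= limit.

    Relies on the count being non-decreasing in w, and on the empty
    window (w = 0) always being within the limit.
    """
    lo, hi = 0, max_half_width
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always advances
        if count_features(center - mid, center + mid) <= limit:
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Each probe costs one count request, so the number of round trips grows
with the log of max_half_width rather than with the number of features.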

Another possibility if the client knows the upper limit of the server is
a variant on what you suggested:

> I was thinking that a "limit" option might be useful.
> For example, "up to 1000 features in the range 1M to
> 2M."  'Course then the server may need to report
> "returned 1000 features but 234234 matched the query."

But instead of limiting the number of features returned within a range,
have this be based on a single point, so it would be "get the 1000
features closest to this point".  Not sure if this could be done as a
"closest" filter (that just has a feature count) in combination with a
region filter, or if it would need to be a combined filter that includes
both a region and the count.  If the latter, then "get 1000 features
nearest 500K on chr2" could just be something like:

closest=chr2/500000:500000,1000 
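
To make the proposed syntax concrete, here is a rough sketch of how a
server might pull that combined filter apart.  The grammar
(closest=SEQ/START:END,COUNT) is only read off the one example above,
and parse_closest and ClosestFilter are hypothetical names, not part of
any DAS/2 draft.

```python
import re
from typing import NamedTuple


class ClosestFilter(NamedTuple):
    seq: str
    start: int
    end: int
    count: int


# Assumed grammar, inferred from the single example in this message.
_CLOSEST_RE = re.compile(r"^closest=([^/]+)/(\d+):(\d+),(\d+)$")


def parse_closest(text):
    """Parse a hypothetical 'closest=SEQ/START:END,COUNT' filter string."""
    match = _CLOSEST_RE.match(text.strip())
    if match is None:
        raise ValueError("not a closest filter: %r" % text)
    seq, start, end, count = match.groups()
    return ClosestFilter(seq, int(start), int(end), int(count))
```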

I know there was talk at one point about having a "get nearest neighbor"
request in DAS/2.  A "closest" filter would also address that
functionality -- given that a feature location is seqA/pointB:pointC,
then the filter for "get nearest neighbor of typeD" is:

closest=seqA/pointB:pointC,1; type=typeD

Would need to work out what happens when there are multiple features
with the same location and the "closest" filter only allows some of
them through, or when there are multiple features that actually overlap
the given point.
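
One possible answer to both questions, sketched under assumptions that
are mine rather than anything agreed on the list: treat a feature that
overlaps the point as being at distance 0, and keep every feature tied
at the cutoff distance, so the result may exceed the requested count
but never splits a group of equally close features.  Features are
plain (id, start, end) tuples and closest is a hypothetical helper.

```python
def distance(start, end, point):
    """0 if [start, end] contains the point, else the gap to the nearer edge."""
    if start <= point <= end:
        return 0
    return start - point if point < start else point - end


def closest(features, point, n):
    """Return the n features nearest to point, keeping ties at the cutoff.

    features is a list of (id, start, end) tuples.  The result is sorted
    by distance and can hold more than n entries when several features
    are equally far away at the cutoff.
    """
    if n <= 0:
        return []
    ranked = sorted(features, key=lambda f: distance(f[1], f[2], point))
    if len(ranked) <= n:
        return ranked
    cutoff = distance(ranked[n - 1][1], ranked[n - 1][2], point)
    return [f for f in ranked if distance(f[1], f[2], point) <= cutoff]
```

The alternative policy, truncating to exactly n, is simpler for clients
to budget for but makes the response depend on an arbitrary ordering
among tied features.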

	gregg 

> 
> 
> 					Andrew
> 					dalke at dalkescientific.com
> 