[DAS2] properties and queries
Lincoln Stein
lstein at cshl.edu
Tue Feb 7 16:00:52 UTC 2006
Hi,
I use the phase information quite a lot and I know that others do as well. The
phase is {0,1,2} and the meaning is described here:
For features of type "CDS", the phase indicates where the feature
begins with reference to the reading frame. The phase is one of the
integers 0, 1, or 2, indicating the number of bases that should be
removed from the beginning of this feature to reach the first base of
the next codon. In other words, a phase of "0" indicates that the next
codon begins at the first base of the region described by the current
line, a phase of "1" indicates that the next codon begins at the
second base of this region, and a phase of "2" indicates that the
codon begins at the third base of this region. This is NOT to be
confused with the frame, which is simply start modulo 3.
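
A minimal sketch of the distinction, assuming 1-based coordinates and a forward-strand feature (on the reverse strand the bases would be removed from the other end). The function names are hypothetical and only illustrate the definition quoted above:

```python
def bases_to_skip(phase: int) -> int:
    """Bases to remove from the start of a CDS feature to reach the next codon."""
    if phase not in (0, 1, 2):
        raise ValueError(f"phase must be 0, 1, or 2, got {phase!r}")
    return phase


def first_codon_start(start: int, phase: int) -> int:
    """Coordinate of the first base of the next codon (forward strand assumed)."""
    return start + bases_to_skip(phase)


def frame(start: int) -> int:
    """Frame as described above: start modulo 3, independent of the phase."""
    return start % 3
```

For a CDS feature starting at 100 with phase 2, `first_codon_start(100, 2)` gives 102, while `frame(100)` is 1. The two values answer different questions.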
Lincoln
On Tuesday 07 February 2006 07:19, Andrew Dalke wrote:
> We've had a long discussion here about properties and how to
> search them. As it stands now the spec has a few holes in it.
>
> Here are the properties we've talked about.
>
> program_name: the program used to make the annotation, like
> "BLASTX 1.2.3"
>
> notes:
> There can be 0 or more notes. Notes might refer to other
> notes (eg, "the previous note said XYZ but I think ABC")
>
> phase: (is it 0, 1, 2 or 1, 2, 3?)
> (And does anyone use this? People here don't use it; Thomas
> "reinfers it by counting along the transcript" "but maybe
> that's just me". Others say they don't use the DAS1 phase.)
>
> icon: a hypothetical image used for the feature, perhaps as
> a binary PNG;
>
> curation history:
> a list of elements, each with
> - person
> - timestamp
> - reason for change
>
> score: a floating point number, which may be in exponential
> notation like "1E-3"
>
> Each one needs different search mechanisms. For example,
> "annotations done by that buggy version of BLAST 1.2.3"
> "scores better than 1E-2"
> "changes by Andrew done in August 2004"
> "notes with the substring 'helicase'" (case sensitive or not?)
> "notes with the phrase 'E. Coli'" (substring might not work
> if the note has 'E.\nColi')
>
> The property storage scheme doesn't handle this quite correctly.
> Here are problems:
>
> - how do you store multiple notes?
>
> Answer 1: use structured names, like "note_1", "note_2", "note_3", ...
> HACK! Then what if a note is deleted? Bigger problem; how do you
> search the "note" field using the existing query language?
>
> Answer 2: allow duplicate note elements, like
> <prop key="note" value="This is a note" />
> <prop key="note" value="The previous note is a lie!" />
> <prop key="note" value="Ignore the 2nd note - silly Cretan!" />
>
> Question: so the order must be preserved if two fields have the
> same name? Can't implement with a dictionary/hash data type.
>
> Question: what if there are duplicate "score" or "phase" elements?
> Which one wins?
>
> Answer 3: Notes are important and we know we need them now.
> Let's have a <NOTE> element and not make it be a property.
>
> <NOTE>This is a note</NOTE>
> <NOTE>The previous note is a lie!</NOTE>
> <NOTE>Is this an E or a NOT-E?</NOTE>
>
> (perhaps also with timestamp and author name, but that's a different
> question.) Then we also define that the "note=" parameter in a
> DAS query is a substring search of the <NOTE> elements of a feature.
>
> I like this one.
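
A rough sketch of what a "note=" substring search over <NOTE> elements could look like. The enclosing <FEATURE> element, the function names, and the choice to collapse whitespace (so that 'E.\nColi' still matches 'E. Coli') are assumptions for illustration, not part of the spec:

```python
import re
import xml.etree.ElementTree as ET


def normalize(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def feature_matches_note(feature_xml, query, case_sensitive=False):
    """Return True if any <NOTE> in the feature contains the query substring."""
    feature = ET.fromstring(feature_xml)
    needle = normalize(query)
    if not case_sensitive:
        needle = needle.lower()
    for note in feature.iter("NOTE"):
        haystack = normalize(note.text or "")
        if not case_sensitive:
            haystack = haystack.lower()
        if needle in haystack:
            return True
    return False


# Hypothetical feature document; only the <NOTE> elements come from the proposal.
FEATURE = """<FEATURE id="f1">
  <NOTE>This is a note</NOTE>
  <NOTE>Found in E.
Coli helicase</NOTE>
</FEATURE>"""
```

With this, `feature_matches_note(FEATURE, "E. Coli")` matches even though the note text has a line break after "E.". Whether the search is case sensitive is left as a parameter because the question is still open.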
>
>
> - How do you do numeric searches?
>
> This is hypothetical. There hasn't been a requirement for this.
> 'Course it may be because people haven't had the ability. In
> any case, how to search numeric fields like "score" with comparisons?
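
One possible shape for a numeric comparison on "score", sketched under the assumption that scores are parsed as floats before comparing. The operator table and function name are hypothetical:

```python
import operator

# Hypothetical set of comparison operators a query might allow.
COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "=": operator.eq,
}


def score_matches(score_text, op, threshold_text):
    """Compare two scores numerically; float() accepts forms like '1E-3'."""
    return COMPARATORS[op](float(score_text), float(threshold_text))
```

The values have to be parsed first: as plain strings, "1E-3" sorts after "1E-2", which is the opposite of the numeric order.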
>
>
> - querying non-queryable fields
>
> If there's embedded binary data, like an image, is it searchable?
> Does a server complain and die? Ignore the request?
>
> - more complex text searches
>
> "proteinase but not inhibitor"
>
> - complex data
>
> We have support for non-DAS extensions, which might be
>
> <sanger:curation-history xmlns:sanger="http://www.sanger.ac.uk/das/ext">
>
> <sanger:curation name="Andrew" date="2005-06-07">
> Change this into that because of some reason or other
> </sanger:curation>
>
> </sanger:curation-history>
>
>
> Thomas proposed that we support some sort of complex query
> language, probably in XML, and get rid of the simple query scheme
> we have now.
>
> I argued against the complexity of that given that nearly all
> of the queries will be "give me these feature types on this range
> of that chromosome". I also pointed out that developing a
> generic query language is hard, and implementing it is harder.
> Why require all that effort?
>
> Roy commented the other way - in a server with only a few hundred
> features, why require a query language at all? Just return all
> of the features in the request.
>
> Here's what I proposed.
>
> We have the "CATEGORY" (but after discussion I now want to take
> it back to "CAPABILITY" since that's now much closer to what
> it does - it describes where to go to do something)
>
> So I'll use "CAPABILITY"
>
> The current scheme has
>
> <CAPABILITY type="features" query_url="http://...../features">
> <FORMAT ... />
> </CAPABILITY>
>
> This is an extensibility point. Suppose Thomas has an XML
> query search interface support on his server, with Sanger
> clients that handle it. Then there can be
>
> <CAPABILITY type="thomas-xml-search"
> query_url="http.../search-features">
> <FORMAT ... />
> </CAPABILITY>
>
> A client can see the list of CAPABILITIES and decide to
> use the feature search mechanism it likes best.
>
> In addition, we could say "this supports the normal DAS query
> scheme but also supports an extension vocabulary". For example,
>
> <CAPABILITY type="features" query_url="http://...../features">
> <SUPPORTS name="sanger-curation" />
> <FORMAT ... />
> </CAPABILITY>
>
> With this a client knows that the query_url supports the normal
> DAS queries and also supports the "annotator", "annotation_before"
> and "annotation_after" queries, like this
>
> .../features?annotator=Andrew;annotation_before=2005
>
> Possible idea: if there is no SUPPORTS tag then the server
> implements no search syntax and instead returns everything,
> for the example Roy mentioned.
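
A sketch of how a client might use this, assuming a hypothetical <SOURCE> wrapper element, example.org URLs, and a client-side table mapping the "sanger-curation" name to its three query parameters. The <FORMAT> element is omitted because its attributes are not given above, and only the extension parameters are checked:

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical capabilities document; the URLs are placeholders.
SOURCES = """<SOURCE>
  <CAPABILITY type="features" query_url="http://example.org/das2/features">
    <SUPPORTS name="sanger-curation" />
  </CAPABILITY>
  <CAPABILITY type="thomas-xml-search"
              query_url="http://example.org/das2/search-features" />
</SOURCE>"""

# Assumed client-side knowledge of which parameters each extension adds.
EXTENSION_PARAMS = {
    "sanger-curation": {"annotator", "annotation_before", "annotation_after"},
}


def build_feature_query(doc_xml, **params):
    """Build a 'features' query URL, rejecting parameters the server does not advertise."""
    root = ET.fromstring(doc_xml)
    for cap in root.iter("CAPABILITY"):
        if cap.get("type") != "features":
            continue
        allowed = set()
        for supports in cap.findall("SUPPORTS"):
            allowed |= EXTENSION_PARAMS.get(supports.get("name"), set())
        unknown = set(params) - allowed
        if unknown:
            raise ValueError(f"server does not advertise: {sorted(unknown)}")
        # Parameters are joined with ';' as in the example query above.
        query = ";".join(f"{key}={quote(str(value))}" for key, value in params.items())
        url = cap.get("query_url")
        return f"{url}?{query}" if query else url
    raise LookupError("no 'features' capability advertised")
```

Calling `build_feature_query(SOURCES, annotator="Andrew", annotation_before="2005")` yields a URL of the same form as the example query. With no parameters it returns the bare query_url, which fits the "return everything" case.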
>
> Okay, we're off to lunch.
>
> Andrew
> dalke at dalkescientific.com
--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu