From Steve_Chervitz at affymetrix.com Thu Nov 3 19:24:53 2005
From: Steve_Chervitz at affymetrix.com (Chervitz, Steve)
Date: Thu, 3 Nov 2005 16:24:53 -0800
Subject: [DAS2] DAS/2 weekly meeting notes
Message-ID:

Notes from the weekly DAS/2 teleconference, 3 Nov 2005.

$Id: das2-teleconf-2005-11-03.txt,v 1.2 2005/11/04 00:23:27 sac Exp $

Attendees:
  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
  UCLA: Brian O'Connor, Mark Carlson

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2005. Instructions on how to access this repository
are at http://biodas.org

Status Reports
--------------
Gregg:
* A lot happened last week:
  - Major IGB public release (4.02) last Friday (10/28)
  - Attended and presented IGB demo at CSHL Genome Informatics meeting
    on Sunday (10/30)
  - Finished and submitted DAS/2 continuation grant on Tue (11/1).
* Held a DAS/2 BOF (birds of a feather meeting) at CSHL. Good discussion
  and turnout (15). Collected feedback from EBI/Sanger folks. Asked people
  to download the client (IGB) and hit the servers (Affy, UCLA), so be
  looking for more traffic soon.
* TODO: Monitor DAS/2 traffic, collect usage stats for both servers:
    http://netaffxdas.affymetrix.com
    http://biopackages.net
  Especially check for performance degradation under load. Need to parse
  apache and server logs for things like: # users, typical query times, etc.
* IGB demo went well. People were impressed with speed. Requests for
  Gregg's in-memory java DAS/2 server, but code is not yet ready for
  public consumption.

Ed:
* Reviewing various technologies of possible interest:
  - HTTP communication protocol, necessary commands.
  - Using a bean-based property editor for IGB
* Spent time answering user questions on IGB forum (only 1 person posted
  trouble with installing data for use with new IGB release -- not bad).
  Gregg adds: Also no negative feedback from internal release.

Steve:
* Spec work: Posted message about types and features issues in the
  retrieval spec last Thurs (10/26).
  Mentioned Lincoln's response (doing away with xml:base and going with
  his namespace scheme). Gregg talked with Lincoln about this at CSHL and
  clarified that xml:base is for resolving relative URLs in attributes or
  CDATA elements, whereas xmlns is for resolving names of attributes and
  elements. Steve will post a response to the DAS/2 discussion list about
  this.
* Tested the IGB release on OS X last week prior to release. Noted the
  display bug that Gregg knows about (disappearing view when you select a
  new DAS/2 annotation source). Found trouble with a quickload synonym on
  the Affy internal server. Ed fixed.
* Installed new assembly (Human Nov 2002) available via quickload and
  DAS/2. Gregg says: Use DAS/1 for new genomes at this stage.
* DAS/2 discussion list troubleshooting. Problem with open-bio sendmail,
  DNS.

Brian, Mark:
* Using the DAS/2 layer from the IGB code base and extending it for their
  assay and ontology namespaces. Want to put this new code in separate
  packages to avoid stepping on other IGB functionality. DAS/2 layer is
  currently in com.affymetrix.igb.das2. Options:
  1. Add subpackages to com.affymetrix.igb.das2.
  2. Move das2 out from under igb to com.affymetrix.das2.
  3. Move das2 out of com.affymetrix to be totally separate. Then
     com.affymetrix.igb.das2 and the assay/ontology code would depend on it.
  Brian is fine with #2. Gregg will check and remove any dependencies of
  the das2 package on IGB code.
* Plan to release their code internally in December. Code is in their own
  CVS repository now. Genoviz/IGB code has not been committed to SF yet.

---------------------------
TODO
* Summarize CSHL genome informatics meeting happenings relevant to DAS/2
  when others who were there are dialed in.
* Move teleconf meeting to a more UK-friendly time. US is now on standard
  time. 9am PST = 12pm EST = 17:00 GMT. How does this work for folks?
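The xml:base vs. xmlns distinction discussed in the notes above can be sketched in a few lines. This is an illustrative aside, not part of the original notes: it uses Python's standard urllib.parse.urljoin for RFC 3986 relative-reference resolution, and the volvox base URL is a hypothetical example in the style of the spec discussion on this list.

```python
from urllib.parse import urljoin

# xml:base supplies the base URL against which relative URLs appearing
# in attribute values or character data are resolved (standard RFC 3986
# resolution, which urljoin implements).
# Hypothetical base URL, in the spirit of the volvox examples:
base = "http://www.wormbase.org/das/genome/volvox/1/"

# A relative URL such as might appear in a ptype or xlink:href attribute:
print(urljoin(base, "type/curated_exon"))
# -> http://www.wormbase.org/das/genome/volvox/1/type/curated_exon

# xmlns, by contrast, only maps element/attribute *name* prefixes
# (das:, xlink:, ...) to namespace URIs; it plays no role in resolving
# URL-valued content like the above.
```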
From Steve_Chervitz at affymetrix.com Fri Nov 4 15:32:22 2005 From: Steve_Chervitz at affymetrix.com (Chervitz, Steve) Date: Fri, 04 Nov 2005 12:32:22 -0800 Subject: [DAS2] Spec issues In-Reply-To: <200510270941.30528.lstein@cshl.edu> Message-ID: As Gregg noted in this week's DAS/2 meeting, xml:base and XML namespace (xmlns) are complementary technologies: * xml:base is for resolving relative URLs occurring within attribute values or CDATA elements * xmlns is for resolving names of attributes and elements. So bearing this in mind, here's my take: On Thursday 27 October 2005, Lincoln Stein wrote: > > On Wednesday 26 October 2005 07:29 pm, Chervitz, Steve wrote: > > > > > > > > Next issue: Feature properties example (only showing relevant attributes): > > > > Description: Properties are typed using the ptype attribute. The value of > > the property may be indicated by a URL given by the href attribute, or may > > be given inline as the CDATA content of the section. > > > > > > > type="type/curated_exon"> > > 29 > > 2 > > > href="/das/protein/volvox/2/feature/CTEL54X.1" /> > > > > > > > > So in contrast to the TYPE properties which are restricted to being simple > > string-based key:value pairs, FEATURE properties can be more complex, which > > seems reasonable, given the wild world of features. We might consider using > > 'key' rather than 'ptype' for FEATURE properties, for consistency with TYPE > > prop elements (however, read on). > > I'm not so happy with "key" since it is nondescript. Originally this was > "type" but the word collided with feature type. > > I am getting uncomfortable with the dichotomy we've (I've?) created between > XML base keys/properties and namespace-based keys/properties. It seems nasty > to have the ptype attribute be either a relative URI > (property/genefinder-score), or a controlled vocabulary member (das:phase). > Is there any reason we shouldn't choose one or the other? > > For example, does this work? 
> > xmlns:dasprop="http://www.biodas.org/ns/das/genome/2.00/properties" > xmlns:type="http://www.wormbase.org/das/genome/volvox/1/type" > xmlns:id="http://www.wormbase.org/das/genome/volvox/1/feature"> > xmlns:prop="http://www.wormbase.org/das/genome/volvox/1/property"> > das:type="type:curated_exon"> > 29 > 2 > das:href="http://www.wormbase.org/das/protein/volvox/2/feature/CTEL54X.1" /> > > > This looks so much cleaner to me. Here's a new version of this example using xml:base, a default xmlns, and a special attribute to define the URL for the controlled vocabulary of DAS property keys. I'm also using xlink for the href: 29 2 > Cc: Steve Chervitz > Subject: Re: New problem with content-type header in DAS/2 server responses! > > Looks like the cache server. FYI, I have updated the server to use all > "text/xml" Content-Type for all xml response types. This was approved by > Lincoln so that web browsers could be pointed at the das server and "just > work". I thought these changes had already made their way into the spec, > but apparently not. > > The table below summarizes what the server should be giving back. The > left column shows the command and format request, and the right side shows > the response Content-Type. > > 'das/das2xml' => 'text/xml', > 'domain/das2xml' => 'text/xml', > 'domain/compact' => 'text/plain', > 'feature/das2xml' => 'text/xml', > 'feature/chain' => 'text/plain', #LOOK > 'property/das2xml' => 'text/xml', > 'region/das2xml' => 'text/xml', > 'region/compact' => 'text/plain', > 'sequence/das2xml' => 'text/plain', #LOOK > 'sequence/fasta' => 'text/plain', > 'source/das2xml' => 'text/xml', > 'source/compact' => 'text/plain', > 'type/das2xml' => 'text/xml', > 'type/compact' => 'text/plain', > 'type/obo' => 'text/plain', > 'type/rdf' => 'text/xml', > 'versionedsource/das2xml' => 'text/xml', > > As you can see, the text/plain response to the /feature command is NOT > being given by the server, but somehow being mangled by the cache. 
Is
> this going to severely impact your demo? If so I can disable the cache
> module. It will be slow though. An alternative to the cache would be to
> use our squid proxy. Brian can probably set you up to use it very
> quickly.
>
> Let me know what needs to be done ASAP.
>
> -Allen
>
> On Fri, 28 Oct 2005, Helt,Gregg wrote:
>
>> I just tried accessing the biopackages DAS/2 server from IGB, with this
>> query:
>>
>> http://das.biopackages.net/das/genome/human/17/feature?overlaps=chr21/26027736:26068042;type=SO:mRNA
>>
>> and I'm getting back a message where the XML looks fine but here are the
>> headers:
>>
>> HTTP/1.1 200 OK
>> Date: Sat, 29 Oct 2005 05:49:46 GMT
>> Server: Apache/2.0.51 (Fedora)
>> X-DAS-Status: 200
>> Warning: 113 Heuristic expiration
>> Content-Type: text/plain; charset=UTF-8
>> Age: 259582
>> Content-Length: 6004
>> Keep-Alive: timeout=15, max=100
>> Connection: Keep-Alive
>>
>> But according to the spec the content type header needs to be:
>> Content-Type: text/x-das-features+xml
>> I'm using this in the IGB DAS/2 client to parse responses based on the
>> content type. With "text/plain; charset=UTF-8" IGB doesn't know what
>> parser to use and gives up. So right now I can't visualize annotations
>> from the biopackages server. I'm pretty sure the server was setting the
>> content-type header correctly on Wednesday -- did anything change since
>> then that could be causing this? Could the server-side cache be doing
>> this for some reason?
>>
>> Thanks,
>> Gregg

From dalke at dalkescientific.com Tue Nov 8 19:27:42 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Wed, 9 Nov 2005 01:27:42 +0100
Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses!
In-Reply-To:
References:
Message-ID:

My apologies for not tracking what's been going on in the last few
months. I'm back now and have time for the next few months to work
on things. So I'll start with this exchange.
I can't find the discussion in the mailing list history.

Why the decision to use "text/xml" for all xml responses? I read that it
is so "web browsers can 'just work'".

What are they supposed to do? Display the XML as some sort of tree
structure? Is that the only thing?

One thing Allen and I talked about, and he tested, was the ability to
insert a stylesheet declaration in the XML. Is this part of the reason
to switch to using "text/xml"?

Is there anything I'm missing?

Since it looks like I'm going to be more in charge of the spec
development, I would like to start collecting use cases and recording
these sorts of decisions.

I think having different content-types is an important feature. For
example, it lets a DAS browser figure out what it's looking at before
doing any parsing. Here's my use case.

I want someone to send an email to someone else along the lines of
"What do you think about http://blah.blah/das/genome/blah/blah"
with the URL of the object included in the email.

Paste that into a DAS browser and it should be able to figure out that
this is a sequence, a feature, a whatever. With the old content-types
there was enough information to do that right away. With this new one a
DAS browser needs to parse the XML to figure out what's in it.
Autodetection of XML formats? I don't want to go there.

That's also the reason for Gregg's opposition.

You (Allen) and Lincoln, on the other hand, want that user to be able to
go to a web browser and paste the URL in, to get a basic idea of what's
there. I think that's also important.

I think there are other solutions. One is "if the server sees a web
browser then return the XML data streams as 'text/xml'".

For example:

  if "Mozilla" in headers["User-Agent"]:
      ... this is IE, Mozilla, Firefox, and a few others ...

That catches most of the browsers anyone here cares about. As another
solution, look at the "Accept" header sent by the browser.
Here's what Firefox sends:

  Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5

Here's Safari and "links" (a text browser):

  Accept: */*

Another rule of thumb might be:

  if asking_for_xml_format and "*/*" in headers["Accept"]:
      ... return it as "text/xml" ...

Though a better version is to make sure the client doesn't know about
the expected content type:

  if asking_for_xml_format:
      return_content_type = ... whatever is appropriate ...

      if (return_content_type not in headers["Accept"]
              and "*/*" in headers["Accept"]):
          return_content_type = "text/xml"
          .... optionally insert style sheet ....

Another solution is to send a "what kind of DAS object are you?" request
to the URL (e.g., tack on a ? query or tell the server that the client
will "Accept: application/x-das-autodiscovery").

I think that's clumsy, but I mention it as another way to support both
DAS client app and human browser requests of the same URL.

>> From: Allen Day
>> Looks like the cache server. FYI, I have updated the server to use all
>> "text/xml" Content-Type for all xml response types. This was approved by
>> Lincoln so that web browsers could be pointed at the das server and
>> "just work". I thought these changes had already made their way into the
>> spec, but apparently not.

>> On Fri, 28 Oct 2005, Helt,Gregg wrote:
>>> But according to the spec the content type header needs to be:
>>> Content-Type: text/x-das-features+xml
>>> I'm using this in the IGB DAS/2 client to parse responses based on the
>>> content type. With "text/plain; charset=UTF-8" IGB doesn't know what
>>> parser to use and gives up. So right now I can't visualize annotations
>>> from the biopackages server. I'm pretty sure the server was setting the
>>> content-type header correctly on Wednesday -- did anything change since
>>> then that could be causing this? Could the server-side cache be doing
>>> this for some reason?
Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Tue Nov 8 19:49:27 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Wed, 9 Nov 2005 01:49:27 +0100
Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses!
In-Reply-To:
References:
Message-ID: <7e9e19f6885240c668ac677b6ea98ff0@dalkescientific.com>

P.S. Gregg mentioned one need for wanting more selective content-types.
Here's another.

I expect most of the XML data we return will change. We may add an
element field or change the meaning of an element. When that happens,
how does a client know that a "text/xml" is for one version or another
of a given document type? I expect that will be done by returning
something like

  Content-Type: text/das2xml; version=2

This, btw, suggests a third solution to the problem of letting DAS/2 and
web browser clients both point to the same object - use

  Content-Type: text/xml; das-type=das2xml

But that's ugly. A 4th is to go back to the "add a das-content-type
header" solution from DAS/1. I don't want that.

Note, btw, that if a given URL can return different MIME types for the
same request then it needs a "Vary: Accept" in the response headers so
caching works correctly.

Andrew
dalke at dalkescientific.com

From Steve_Chervitz at affymetrix.com Tue Nov 8 20:58:07 2005
From: Steve_Chervitz at affymetrix.com (Chervitz, Steve)
Date: Tue, 08 Nov 2005 17:58:07 -0800
Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses!
In-Reply-To:
Message-ID:

Andrew,

Andrew Dalke wrote on 8 Nov 2005:

> My apologies for not tracking what's been going on in the last few
> months. I'm back now and have time for the next few months to work
> on things.

Great to have you back. I have been focusing on the spec for the past
several weeks but would be glad to have you take the lead on it.
We've been making the retrieval spec a priority and should really focus on getting it nailed down as soon as possible to allow others to start implementing clients and servers against it and providing feedback. We haven't talked about a freeze or release date for it, but maybe we should. I started going through the open bugs in bugzilla, but only resolved one (#1796). While going through and cleaning up the retrieval spec, I ran into other issues that were not in bugzilla that seemed important. One was this content-type issue that you address here. I raised some other issues regarding types and feature properties etc. a couple of weeks ago that I'd like you to chime in on: http://portal.open-bio.org/pipermail/das2/2005-October/000271.html The latest message on this thread is: http://portal.open-bio.org/pipermail/das2/2005-November/000278.html > So I'll start with this exchange. I can't find the discussion in the > mailing list history. > > Why the decision to use "text/xml" for all xml responses? I read it > it is so "web browsers can 'just work'". > > What are they supposed to do? Display the XML as some sort of tree > structure? Is that the only thing? > > One thing Allen and I talked about, and he tested, was the ability to > insert a stylesheet declaration in the XML. Is this part of the > reason to switch to using "text/xml"? Here's the relevant thread for reference: http://portal.open-bio.org/pipermail/das2/2005-July/000227.html In your other email on this thread, you said: > This, btw, suggests a third solution to the problem of letting DAS/2 > and web browser clients both point to the same object - se > > Content-Type: text/xml; das-type=das2xml > > But that's ugly. This seems like a good solution (and not too ugly IMHO). The das-type value could be more detailed (e.g., x-das-features+xml). However, I recall that there were possible problems with this syntax, but can't remember the details at the moment. 
Whatever the solution we decide, we should strive for simplicity. If we
ask too much of servers and clients, that will be an impediment to
implementation and maintenance.

Steve

From allenday at ucla.edu Tue Nov 8 21:21:51 2005
From: allenday at ucla.edu (Allen Day)
Date: Tue, 8 Nov 2005 18:21:51 -0800 (PST)
Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses!
In-Reply-To:
References:
Message-ID:

To be even more concise, there are two use cases being presented here:

1) DAS/2 content should be viewable in a web browser, and doing so
requires an HTTP Content-Type header to have value 'text/xml'.

2) DAS/2 content should be viewable in a specialized DAS/2 browser, and
be able to rely on HTTP headers to determine visualization mode, as
XML/DTD/Schema sniffing is undesirable.

The solution proposed in the referenced thread, or perhaps only on a
conference call, is to use the Content-Type header to address (1),
providing information to web browsers, as they are less flexible than a
specialized DAS/2 client. (2) is addressed using a DAS/2-specific
X-Das-Content-Type header, e.g.

==================
% GET -e 'http://das.biopackages.net/das/genome/human/17/feature?overlaps=chr22/1000000:2000000;type=SO:mRNA' | head -100
Connection: close
Date: Wed, 09 Nov 2005 02:15:24 GMT
Server: Apache/2.0.51 (Fedora)
Content-Type: text/xml
Expires: Thu, 09 Nov 2006 02:15:24 GMT
Client-Date: Wed, 09 Nov 2005 02:19:16 GMT
Client-Peer: 164.67.183.101:80
Client-Response-Num: 1
Client-Transfer-Encoding: chunked
X-DAS-Content-Type: text/x-das-feature+xml
X-DAS-Server: GMOD/0.0
X-DAS-Status: 200
X-DAS-Version: DAS/2.0
==================

This also has the added benefit of already being implemented for a few
months. Are there objections to this solution?

-Allen

On Wed, 9 Nov 2005, Andrew Dalke wrote:

> My apologies for not tracking what's been going on in the last few
> months. I'm back now and have time for the next few months to work
> on things.
> > So I'll start with this exchange. I can't find the discussion in the > mailing list history. > > Why the decision to use "text/xml" for all xml responses? I read it > it is so "web browsers can 'just work'". > > What are they supposed to do? Display the XML as some sort of tree > structure? Is that the only thing? > > One thing Allen and I talked about, and he tested, was the ability to > insert a stylesheet declaration in the XML. Is this part of the > reason to switch to using "text/xml"? > > Is there anything I'm missing? > > Since it looks like I'm going to be more in charge of the spec > development, > I would like to start collecting use cases and recording these sorts of > decisions. > > I think having different content-types is an important feature. For > example, it lets a DAS browser figure out what it's looking at before > doing any parsing. Here's my use case. > > I want someone to send an email to someone else along the lines of > "What do you think about http://blah.blah/das/genome/blah/blah" > with the URL of the object included in the email. > > Paste that into a DAS browser and it should be able to figure out that > this is a sequence, a feature, a whatever. With the old content-types > there was enough information to do that right away. With this new > one a DAS browser needs to parse the XML to figure out what's in it. > Autodetection of XML formats? I don't want to go there. > > That's also the reason for Gregg's opposition. > > > You (Allen) and Lincoln, on the other hand, want that user to be able to > go to a web browser and paste the URL in, to get a basic idea of what's > there. > > I think that's also important. > > I think there are other solutions. One is "if the server sees a web > browser then return the XML data streams as a 'text/xml'". > > For example: > if "Mozilla" in headers["User-Agent"]: > ... this is IE, Mozilla, Firefox, and a few others .. > > That catches most of the browsers anyone here cares about. 
As > another solution, look at the "Accept" header sent by the browser. > Here's what Firefox sends: > > Accept: text/xml,application/xml,application/xhtml+xml,text/html; > q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5' > > Here's Safari and "links" (a text browser): > > Accept: */* > > Another rule them might be > > if asking_for_xml_format and "*/*" in headers["Accept"]: > ... return it as "text/xml" ... > > Though a better version is to make sure the client doesn't know about > the expected content type: > > > if asking_for_xml_format: > return_content_type = ... whatever is appropriate ... > > if (return_content_type not in headers["Accept"] > and "*/*" in headers["Accept"]): > > return_content_type = "text/xml" > .... optionally insert style sheet .... > > > > Another solution is to send a "what kind of DAS object are you?" request > to the URL (eg, tack on a ? query or tell the server that the client > will > "Accept: application/x-das-autodiscovery"). > > > I think that's clumsy, but I mention it as another way to support > both DAS client app and human browser requests of the same URL. > > > >> From: Allen Day > > >> Looks like the cache server. FYI, I have updated the server to use > >> all > >> "text/xml" Content-Type for all xml response types. This was > >> approved by > >> Lincoln so that web browsers could be pointed at the das server and > >> "just > >> work". I thought these changes had already made their way into the > >> spec, > >> but apparently not. > > >> On Fri, 28 Oct 2005, Helt,Gregg wrote: > >>> But according to the spec the content type header needs to be: > >>> Content-Type: text/x-das-features+xml > >>> I'm using this in the IGB DAS/2 client to parse responses based on > >>> the > >>> content type. With "text/plain; charset=UTF-8" IGB doesn't know what > >>> parser to use and gives up. So right now I can't visualize > >>> annotations > >>> from the biopackages server. 
I'm pretty sure the server was setting > >>> the > >>> content-type header correctly on Wednesday -- did anything change > >>> since > >>> then that could be causing this? Could the server-side cache be > >>> doing > >>> this for some reason? > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From dalke at dalkescientific.com Wed Nov 9 12:37:21 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 9 Nov 2005 18:37:21 +0100 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: References: Message-ID: Steve: > Here's the relevant thread for reference: > http://portal.open-bio.org/pipermail/das2/2005-July/000227.html Ahh, it's the one I half remembered, from July. Allen said: > Not sure how much value there is in > this, but here is a very simple graphical display of regions on the > server, and their relative sizes. I think it's useful to have web browserability, as it were, but I think it's a secondary goal. To me the ability to transform the XML via the stylesheet is something that's technology driven and not user driven. That is, nothing in the previous work, including the DAS/2 proposals from others, mentioned that as a need. On the other hand, being able to get the content type of what's coming back from the server is a design goal, and we have an existing need -- Gregg's example -- for it. I would rather therefore put the onus on the data provider to be clever in sniffing out the client than in the DAS/2 client in sniffing out the data. Steve: > In your other email on this thread, you said: > >> This, btw, suggests a third solution to the problem of letting DAS/2 >> and web browser clients both point to the same object - se >> >> Content-Type: text/xml; das-type=das2xml >> >> But that's ugly. > > This seems like a good solution (and not too ugly IMHO). 
The das-type
> value could be more detailed (e.g., x-das-features+xml). However, I
> recall that there were possible problems with this syntax, but can't
> remember the details at the moment.

We have discussed this on-and-off for a while now, eh? Here's the
previous thread on it:
http://portal.open-bio.org/pipermail/das2/2004-December/000019.html

I need to do a bit more research. I don't like the idea of making new
headers and I don't like the idea of using a modified content-type like
that. The first because we aren't doing anything unusual compared to
other projects and the second because I don't have any experience with
that.

I suspect the answer will be:
- by default, if no "?format=" is specified then return "text/xml"
- if the client sends an "Accept: text/x-das-features+xml" then return
  the document with the proper content-type information

In that way, if someone pastes a "http://.../blah?format=xyz" and they
get a bunch of garbage, they can manually chop off the obvious "format="
part of the query.

But that doesn't agree with my use case, where the DAS/2 client gets a
random URL. It would need to send "Accept: ..." where the "..." is a
list of all the possible DAS content-types.

I'll think about this some more while I'm out salsa dancing this
evening. :)

Andrew
dalke at dalkescientific.com

From Steve_Chervitz at affymetrix.com Wed Nov 9 20:25:48 2005
From: Steve_Chervitz at affymetrix.com (Chervitz, Steve)
Date: Wed, 09 Nov 2005 17:25:48 -0800
Subject: [DAS2] Agenda for weekly teleconference
Message-ID:

Time & Day: 12:00 Noon PST, Thursday 11 Nov 2005
Tel (US): 800-531-3250
Tel (Int'l): 303-928-2693
ID: 2879055

Agenda
------
* Decide on Europe-friendly time for this teleconference. Proposals:
  - Thu 9am PST = 12pm EST = 17:00 GMT
  - Wed 9am PST
  - Mon 9am PST
* DAS/2 get spec issues:
  - Content-type: text/xml vs.
text/x-das-blah+xml
    http://portal.open-bio.org/pipermail/das2/2005-November/000287.html
  - XML encoding of type and feature properties:
    http://portal.open-bio.org/pipermail/das2/2005-November/000278.html

Time and people permitting:
* Summarize CSHL genome informatics meeting happenings relevant to DAS/2
  (Allen, Gregg, Suzi, Lincoln).
* Introduction to Apollo (Suzi)
* DAS/2 validation (Andrew)

From dalke at dalkescientific.com Wed Nov 9 20:34:28 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 10 Nov 2005 02:34:28 +0100
Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses!
In-Reply-To:
References:
Message-ID: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com>

Allen:
> To be even more concise, there are two use cases being presented here:
>
> 1) DAS/2 content should be viewable in a web browser, and doing so
> requires an HTTP Content-Type header to have value 'text/xml'.
>
> 2) DAS/2 content should be viewable in a specialized DAS/2 browser,
> and be able to rely on HTTP headers to determine visualization mode,
> as XML/DTD/Schema sniffing is undesirable.

A use case describes what the user wants to do, from the user's
perspective and not the implementation perspective. Sometimes they are
the same, as when the user mandates certain technical decisions, but
that's not the case here. The wikipedia has a good definition, at
http://en.wikipedia.org/wiki/Use_case .

To make use cases read nicely I've found it useful to have a name better
than "the user". There will be many users of different aspects of a DAS
system. Some are:
- a person making the database/DAS adapter
- an annotator
- a molecular biologist

The use case we're talking about here is to let person X (either an
annotator or a molecular biologist) communicate with person Y. Rather
than saying "X" and "Y" I'll say "Bill" and "Jim".

Bill sends Jim an email saying "I think there's a problem with this
annotation; it looks like it's off-by-one.
Could you take a look at it for me?" (Make up your own explanation :)

Jim gets the email, sees the URL, and pastes it into his browser. If Jim
is an annotator this will probably be a specialized DAS/2 client. If
he's not, then more likely it will be a web browser. Both should "do the
right thing", that is, provide meaningful information about the given
entity and options for more exploration and analysis.

This use case suggests several functional details:

- There needs to be a way to exchange DAS details via normal text, for
  inclusion in email. DAS uses URLs so we should build on those. This
  means they'll also likely be used in generic web pages. Because the
  specific consumer of a URL isn't known, it's not possible to put a
  "?format=" field on the end of the URL. Thus these URLs must not
  specify the format.

- DAS/2 clients (web browsers and specialized apps) should have some way
  to get (and easily get) the URL for a given annotation, region,
  feature type, etc.

- specialized DAS clients (IGB) need a way for users to enter an
  arbitrary DAS URL.

If one or more of these won't happen then there's no problem. For
example, if IGB etc. all don't support entering an arbitrary DAS URL
then there's no need to handle both classes of clients. If there's no
demand for direct visualization in a web browser then there's also no
problem.

I'm going to ask about the last. The whole point of this change is to
support the ability for a generic web browser to go to a given URL and
show something of interest.

1) Who needs that? Can any of us point to a group of people who would
use a direct web interface to a given DAS/2 URL? If so, why didn't it
come up in earlier discussions?

2) Why can't they go to a DAS/2 web app elsewhere and from there tell it
"now link in the data from this URL"? That is, view the URL through an
intermediary.
3) Why can't we tell people "stick a 'format=html' at the end to see it
in HTML", if you want to make a web link to it, and if the server
supports HTML displays?

4) Who wants to make a DAS/2 web app based directly on the DAS/2 data
structure? Yes, that makes it trivial to have a first pass web app, but
that app will suck. It'll only support browsing the server's data
structure via a tree. It won't support, say, the ability to incorporate
more or alternate records in a view, fancy AJAX GUIs, etc. There will be
no way to merge records from different servers because the annotation
server only understands annotations on that server.

My view now is that having the default MIME type for a DAS/2 entity be
"text/xml", for the purpose of supporting direct web browser
visualization of that entity, is not driven by a realistic use case and
is interesting mostly for technical reasons. As such, we shouldn't do
that. We should leave the return documents as distinct MIME types.

That leads me to the result of more research. The relevant spec for the
MIME type for XML documents is RFC 3023, at
http://www.ietf.org/rfc/rfc3023.txt

For commentary also see:
  http://www.xml.com/lpt/a/2004/07/21/dive.html
  http://diveintomark.org/archives/2004/02/13/xml-media-types

These say we have lots of things to worry about. For example, "text/xml"
requires that the content-type include the charset declaration, else the
spec says to assume the document is in US-ASCII. There is no way for the
XML itself to override that. If we go the "text/xml" route we mandate
that either:
- all servers include a charset in the content-type
- those that don't must only serve ASCII data.

The proper MIME type is under "application", as "application/x-das-*+xml":

> then the character encoding is determined in this order:
>
>  * the encoding given in the charset parameter of the Content-Type
>    HTTP header, or
>  * the encoding given in the encoding attribute of the XML declaration
>    within the document, or
>  * utf-8.
(quoting from http://www.xml.com/lpt/a/2004/07/21/dive.html )

Apparently some ISPs, e.g. in Russia and Japan, will transcode text/xml documents at the HTTP level, ignoring the encoding information in the XML itself. This can lead to problems. As the author of those commentaries says, "XML is tough."
  http://diveintomark.org/archives/2004/07/06/tough

> The solution proposed in the referenced thread, or perhaps only on a
> conference call, is to use the Content-Type header to address (1),
> providing information to web browsers, as they are less flexible than a
> specialized DAS/2 client. (2) is addressed using a DAS/2 specific
> X-Das-Content-Type header, e.g.

It must have been a conference call. I don't see mention of that in my back emails. I'm thankful to Steve for doing the writeups.

To emphasize what I said earlier, what will happen in the case of (1)? Who will implement it? What will users expect from it? Why can't those users go through some intermediate DAS web app to better view that data? Why can't we say "add a 'format=html' for interactive viewing"?

As for (2), I don't want a new header. I know I talk about conneg and other neat features in HTTP but in re-reading appendix A of RFC 3023
  http://www.ietf.org/rfc/rfc3023.txt
it talks about over a dozen other solutions to the problem and why they were excluded. These include:

> A.10 How about using a conneg tag instead (e.g., accept-features:
> (syntax=xml))?
>
> When the conneg protocol is fully defined, this may potentially be a
> reasonable thing to do. But given the limited current state of
> conneg[RFC2703] development, it is not a credible replacement for a
> MIME-based solution.

In this case I'm willing to let people experiment with the idea before baking it into the spec.

> A.9 How about a new Alternative-Content-Type header?
> > This is better than Appendix A.8, in that no extra functionality > needs to be added to a MIME registry to support dispatching of > information other than standard content types. However, it still > requires both sender and receiver to be upgraded, and it will also > fail in many cases (e.g., web hosting to an outsourced server), > where > the user can set MIME types (often through implicit mapping to file > extensions), but has no way of adding arbitrary HTTP headers. How much control will DAS/2 data providers have over their server? I know I want to support people who provide data as a set of files through Apache, though that's not driven by any use case. (This use case would involve a user who has different requirement than either Jim or Bob.) mod_mime is designed for that. I don't know how to add other headers for this case. The data providers we have now have control over all the headers. If that will essentially always be the case then adding a new header isn't a problem. Then again, if this is always the case then we can go ahead with conneg since an argument against conneg is it puts more work on the server implementations. In this too I'll be conservative - DAS/2 pushes no new ground for a web app development project; there should be no reason to invent a new header. > A.6 How about labeling with parameters in the other direction (e.g., > application/xml; Content-Feature=iotp)? > > This proposal fails under the simplest case, of a user with neither > knowledge of XML nor an XML-capable MIME dispatcher. In that case, > the user's MIME dispatcher is likely to dispatch the content to an > XML processing application when the correct default behavior should > be to dispatch the content to the application responsible for the > content type (e.g., an ecommerce engine for > application/iotp+xml[RFC2801], once this media type is registered). 
>
> Note that even if the user had already installed the appropriate
> application (e.g., the ecommerce engine), and that installation had
> updated the MIME registry, many operating system level MIME
> registries such as .mailcap in Unix and HKEY_CLASSES_ROOT in Windows
> do not currently support dispatching off a parameter, and cannot
> easily be upgraded to do so. And, even if the operating system were
> upgraded to support this, each MIME dispatcher would also separately
> need to be upgraded.

> X-DAS-Content-Type: text/x-das-feature+xml
> X-DAS-Server: GMOD/0.0
> X-DAS-Status: 200
> X-DAS-Version: DAS/2.0
> ==================
>
> This also has the added benefit of already being implemented for a few
> months. Are there objections to this solution?

Yes. Several.

When did "X-DAS-Status" come back into the picture? I thought we talked about this in spring and nixed it because it doesn't provide anything more useful than the existing HTTP-level error code. Or perhaps it was fall of last year? I think I remember raking leaves at the time.

More useful, for example, would be a document (html, xml, or otherwise) which accompanies the error response and gives more information about what occurred.

What does the "X-DAS-Server" get you that the normal "Server:" doesn't get you? What's the use case?

Why is the "X-DAS-Version" at all important? What's important is the data content. It's the document return type/version that's important and not the server version.
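For what it's worth, the RFC 3023 precedence quoted earlier reduces to a few lines of client code. This is a sketch only -- it ignores BOMs and the text/* US-ASCII default, and the function name is made up:

```python
import re

def xml_body_encoding(content_type, body):
    """Resolve the encoding of an application/*+xml response using the
    RFC 3023 precedence: Content-Type charset parameter first, then the
    encoding attribute of the XML declaration, then UTF-8."""
    # 1. The charset parameter of the Content-Type HTTP header
    m = re.search(r'charset="?([\w.-]+)"?', content_type, re.IGNORECASE)
    if m:
        return m.group(1).lower()
    # 2. The encoding attribute of the XML declaration in the document
    m = re.match(rb"""<\?xml[^>]*encoding=["']([\w.-]+)["']""", body)
    if m:
        return m.group(1).decode("ascii").lower()
    # 3. Default for application/*+xml
    return "utf-8"
```

Note that with "text/xml" the second step would be skipped entirely, which is exactly the trap described above.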
But I mentioned most of these over a year ago
  http://portal.open-bio.org/pipermail/das/2004-September/000814.html

In summary:
- no support for direct web browser access to a URL, except with a likely use case;
- keep the default response in an XML format
- change that XML content-type to "application/x-das-*+xml" instead of "text/*"
- have no requirement for new, DAS-specific headers

Andrew
dalke at dalkescientific.com

From allenday at ucla.edu Wed Nov 9 21:18:23 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed, 9 Nov 2005 18:18:23 -0800 (PST)
Subject: [DAS2] Agenda for weekly teleconference
In-Reply-To: References: Message-ID:

Missing this week, I'm in Rio de Janeiro. I'm giving a talk on DAS tomorrow though, so I'm still contributing! :)

-Allen

On Wed, 9 Nov 2005, Chervitz, Steve wrote:
> Time & Day: 12:00 Noon PST, Thursday 11 Nov 2005
> Tel (US): 800-531-3250
> Tel (Int'l): 303-928-2693
> ID: 2879055
>
> Agenda
> ------
>
> * Decide on Europe-friendly time for this teleconference.
>   Proposals:
>   - Thu 9am PST = 12pm EST = 17:00 GMT
>   - Wed 9am PST
>   - Mon 9am PST
>
> * DAS/2 get spec issues:
>   - Content-type: text/xml vs. text/x-das-blah+xml
>     http://portal.open-bio.org/pipermail/das2/2005-November/000287.html
>
>   - XML encoding of type and feature properties:
>     http://portal.open-bio.org/pipermail/das2/2005-November/000278.html
>
> Time and people permitting:
>
> * Summarize CSHL genome informatics meeting happenings relevant to
>   DAS/2 (Allen, Gregg, Suzi, Lincoln).
>
> * Introduction to Apollo (Suzi)
>
> * DAS/2 validation (Andrew)
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2
>

From ed_erwin at affymetrix.com Thu Nov 10 13:33:58 2005
From: ed_erwin at affymetrix.com (Ed Erwin)
Date: Thu, 10 Nov 2005 10:33:58 -0800
Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses!
In-Reply-To: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com>
References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com>
Message-ID: <43739296.4030307@affymetrix.com>

Andrew Dalke wrote:
>
>> X-DAS-Content-Type: text/x-das-feature+xml
>> X-DAS-Server: GMOD/0.0
>> X-DAS-Status: 200
>> X-DAS-Version: DAS/2.0
>> ==================
>>
>> This also has the added benefit of already being implemented for a few
>> months. Are there objections to this solution?
>
> Yes. Several.
>
> When did "X-DAS-Status" come back into the picture? I thought
> we talked about this in spring and nixed it because it doesn't provide
> anything more useful than the existing HTTP-level error code. Or perhaps
> it was fall of last year? I think I remember raking leaves at the time.
>
> More useful, for example, would be a document (html, xml, or otherwise)
> which accompanies the error response and gives more information about
> what occurred.
>

Using the HTTP-level error codes can cause problems.

For a user (let's call her Varla) using IE, the browser will intercept some error codes and present her with some IE-specific garbage, throwing away any content that was sent back in addition to the error code.

Even for a user (Marla this time) using IGB, firewalls and/or caching and/or apache port-forwarding mechanisms can throw out anything with a status code in the error range.

(I did test having the NetAffx DAS server send HTTP status codes, and I did have problems with that in IGB, though I've forgotten the specifics. It was about a year ago....)

I don't care if status code is indicated with a header like "X-DAS-Status: 200" or with some XML content, or with both. But I think the HTTP status code has to be a separate thing, and will usually be "200" indicating that the user (sorry, I meant to say LeRoy) successfully communicated with the DAS server.
Ed From dalke at dalkescientific.com Thu Nov 10 14:49:18 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 10 Nov 2005 20:49:18 +0100 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: <43739296.4030307@affymetrix.com> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com> Message-ID: <83d48ca8f7128fb04efecd673ef61459@dalkescientific.com> Ed: > Using the HTTP-level error codes can cause problems. > I don't care if status code is indicated with a header like > "X-DAS-Status: 200" or with some XML content, or with both. But I > think the HTTP status code has to be a separate thing, and will > usually be "400" indicating that the user (sorry, I meant to say > LeRoy) successfully communicated with the DAS server. Okay, sounds like using HTTP codes for this causes problems in practice. What about returning a different content-type for that case? 200 Ok Content-Type: application/x-das-error Something bad happened. Pros: - doesn't add a new header - just as easy to detect in the client - easier to support on the server for some use cases Andrew dalke at dalkescientific.com From lstein at cshl.edu Thu Nov 10 14:34:51 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 10 Nov 2005 14:34:51 -0500 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: <43739296.4030307@affymetrix.com> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com> Message-ID: <200511101434.51966.lstein@cshl.edu> I didn't know that X-DAS-Status had ever been deprecated. I strongly feel that the DAS status codes are separate from the HTTP codes and should not try to piggyback on the HTTP status line. 
Lincoln On Thursday 10 November 2005 01:33 pm, Ed Erwin wrote: > Andrew Dalke wrote: > >> X-DAS-Content-Type: text/x-das-feature+xml > >> X-DAS-Server: GMOD/0.0 > >> X-DAS-Status: 200 > >> X-DAS-Version: DAS/2.0 > >> ================== > >> > >> This also has the added benefit of already being implemented for a few > >> months. Are there objections to this solution? > > > > Yes. Several. > > > > When did "X-DAS-Status" come back into the picture? I thought > > we talked about this in spring and nixed it because it doesn't provide > > anything useful than the existing HTTP-level error code. Or perhaps > > it was fall of last year? I think I remember raking leaves at the time. > > > > More useful, for example, would be a document (html, xml, or otherwise) > > which accompanies the error response and gives more information about > > what occurred. > > Using the HTTP-level error codes can cause problems. > > For a user (let's call her Varla) using IE, the browser will intercept > some error codes and present her with some IE-specific garbage, throwing > away any content that was sent back in addition to the error code. > > Even for a user (Marla this time) using IGB, firewalls and/or caching > and/or apache port-forwarding mechanisms can throw out anything with a > status code in the error range. > > (I did test having the NetAffx DAS server send HTTP status codes, and I > did have problems with that in IGB, though I've forgotten the specifics. > It was about a year ago....) > > I don't care if status code is indicated with a header like > "X-DAS-Status: 200" or with some XML content, or with both. But I think > the HTTP status code has to be a separate thing, and will usually be > "400" indicating that the user (sorry, I meant to say LeRoy) > successfully communicated with the DAS server. > > Ed > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. 
Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From ed_erwin at affymetrix.com Thu Nov 10 14:56:12 2005
From: ed_erwin at affymetrix.com (Ed Erwin)
Date: Thu, 10 Nov 2005 11:56:12 -0800
Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses!
In-Reply-To: <83d48ca8f7128fb04efecd673ef61459@dalkescientific.com>
References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com> <83d48ca8f7128fb04efecd673ef61459@dalkescientific.com>
Message-ID: <4373A5DC.3070102@affymetrix.com>

Andrew Dalke wrote:
> Okay, sounds like using HTTP codes for this causes problems in
> practice.
>
> What about returning a different content-type for that case?
>
> 200 Ok
> Content-Type: application/x-das-error
>
> Something bad happened.
>

That seems fine to me. There is still the separate issue of whether the content is "application/x-das-error" or simply "text/xml". But that is another discussion that is already ongoing and to which I have nothing to add.

From dalke at dalkescientific.com Thu Nov 10 15:01:45 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 10 Nov 2005 21:01:45 +0100
Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses!
In-Reply-To: <200511101434.51966.lstein@cshl.edu>
References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com> <200511101434.51966.lstein@cshl.edu>
Message-ID: <7fd7a40582a6d8ccdc694c2a91b6f8b7@dalkescientific.com>

Lincoln:
> I didn't know that X-DAS-Status had ever been deprecated. I strongly
> feel that the DAS status codes are separate from the HTTP codes and
> should not try to piggyback on the HTTP status line.

I'm okay with the assertion "something happened at the DAS level" not being in the HTTP status code. Not ecstatic, but real world trumps purity.
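To make the proposed convention concrete, here's a sketch of the client-side check, assuming the hypothetical "application/x-das-error" type from above (the function and exception names are illustrative, not from the spec):

```python
class DasError(Exception):
    """Raised for both transport-level and DAS-level failures."""

def check_das_response(status, content_type, body):
    """Apply the two checks a client needs under the proposal above:
    the HTTP status code, then the returned content type."""
    if status != 200:
        # Transport-level failure (e.g. Apache's own 404 for a bad path)
        raise DasError("HTTP error %d" % status)
    if content_type.split(";")[0].strip() == "application/x-das-error":
        # DAS-level failure: the payload explains what went wrong
        raise DasError(body)
    return body  # a normal DAS document; hand it to the XML parser
```

The point being that this stays at two checks: nothing new is read out of the headers beyond what any HTTP client already reads.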
I don't like the idea of adding new HTTP headers for this information. In my client code I need to do the following:
- was there an HTTP error code?
- is the return content-type correct?

Having another header means I write:
- was there an HTTP error code?
- was there a DAS error code?
- is the return content-type correct?

I would rather have one less bit of code to do wrong.

As I also mentioned, I would like to support DAS annotations made available through a basic Apache install and a set of files, likely used by someone who just wants to provide annotations. This is not one of the current design goals; should it be, or should we require that everyone have more control over the server?

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Thu Nov 10 15:10:14 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 10 Nov 2005 21:10:14 +0100
Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses!
In-Reply-To: <43739296.4030307@affymetrix.com>
References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com>
Message-ID: <81b4c8e3062e94b2032e37995f26b588@dalkescientific.com>

Ed:
> For a user (let's call her Varla) using IE, the browser will intercept
> some error codes and present her with some IE-specific garbage,
> throwing away any content that was sent back in addition to the error
> code.

Here's the question I had earlier. Will people be using a DAS/2 annotation server directly through a web browser? As far as I'm aware there's no demand for this. None of the proposals mentioned it and the current discussion started from a technical discussion at ISMB; that is, because it could, and not because it is needed.

I thought most people using IE/Moz/etc. would go to a DAS application server, which integrates views from different DAS annotation servers. All this discussion is about returning pages back from an annotation server in a form directly viewable by a web browser.
I don't see that as being useful. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Nov 10 16:45:09 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 10 Nov 2005 22:45:09 +0100 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: <43739296.4030307@affymetrix.com> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com> Message-ID: <725e762a203211651d1850097ae3fcc0@dalkescientific.com> Further refining this from today's phone meeting Ed: > For a user (let's call her Varla) using IE, the browser will intercept > some error codes and present her with some IE-specific garbage, > throwing away any content that was sent back in addition to the error > code. The case Ed came across was from an in-house group using a Windows call out to IE as a background process to fetch a web page. In that case (as I understand it) it would convert HTTP error responses into its own error messages. Ed couldn't during the conversation recall if it was possible to get ahold of the error code at all. Did they have to parse the output? > Even for a user (Marla this time) using IGB, firewalls and/or caching > and/or apache port-forwarding mechanisms can throw out anything with a > status code in the error range. 404 gets through, yes? All of those are supposed to be transparent to error codes, or at the very least translate them from (say) 404 to 400. Can anyone point me to some reports of one of these mishaps? We definitely need to have some tie-ins with the HTTP error codes. Consider these two implementations for getting http://example.com/das2/genome/dazypus/1.43/ (Note the typo "dazypus" -> "dasypus") A) One system might have all "/das2" URLs forwarded to a DAS server. B) Another might have a handler only for "/das2/genome/dasypus" and let Apache do the rest. In case A) the DAS server sees that the given resource doesn't exist. It needs to return an error. 
It can return either "200 Ok" followed by a DAS error payload, or return a "404 Not Found" at the HTTP level. In case B) the request never gets to the DAS handler because of the typo. Apache sees there's nothing for the resource so returns a "404 Not Found". The client code is easier if it can check the HTTP error code and stop on failure. This means it's best for case A) for the DAS/2 server to return an HTTP error code of 404, and perhaps an optional ignorable payload. > (I did test having the NetAffx DAS server send HTTP status codes, and > I did have problems with that in IGB, though I've forgotten the > specifics. It was about a year ago....) Do you have the specifics perhaps in an old email somewhere? Andrew dalke at dalkescientific.com From ed_erwin at affymetrix.com Thu Nov 10 17:43:02 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Thu, 10 Nov 2005 14:43:02 -0800 Subject: [DAS2] Re: how do I load probe sets into IGB now? In-Reply-To: <83722dde0511101429m398c38ebg8e4df3d9b2a8d0da@mail.gmail.com> References: <83722dde0511101429m398c38ebg8e4df3d9b2a8d0da@mail.gmail.com> Message-ID: <4373CCF6.9060508@affymetrix.com> Hi, The old DAS loading mechanism is still there, in exactly the same place it used to be: File->Load DAS Features. The new "DAS/2" tab at the bottom is for "DAS/2" servers, of which there are only a few at the moment, and which are still experimental. Ed Ann Loraine wrote: > Hi, > > Congratulations everybody on the new release of IGB! > > I have a question about the new Quickload/DAS tab. > > I'm trying to load some probe sets via DAS but can't figure out how to do it. > > I used to be able to get them by using the "DAS" menu item, which > opened a widget containing a menu of DAS servers. I would select the > one labeled AffyDas (or something like that) and then I would get to > pick the chip (more often, chips) I wanted to see. 
Then IGB would > query the server and get me the probe set design sequence alignments > for the currently-shown region. > > I can't find this in the new interface. > > Can you help? > > -Ann > > -- > Ann Loraine > Assistant Professor > Section on Statistical Genetics > University of Alabama at Birmingham > http://www.ssg.uab.edu > http://www.transvar.org From ed_erwin at affymetrix.com Thu Nov 10 17:49:47 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Thu, 10 Nov 2005 14:49:47 -0800 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: <725e762a203211651d1850097ae3fcc0@dalkescientific.com> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com> <725e762a203211651d1850097ae3fcc0@dalkescientific.com> Message-ID: <4373CE8B.3000302@affymetrix.com> Andrew Dalke wrote: > Further refining this from today's phone meeting > > Ed: > >> For a user (let's call her Varla) using IE, the browser will intercept >> some error codes and present her with some IE-specific garbage, >> throwing away any content that was sent back in addition to the error >> code. > > > The case Ed came across was from an in-house group using a Windows call > out to IE as a background process to fetch a web page. In that case > (as I understand it) it would convert HTTP error responses into its own > error messages. > > Ed couldn't during the conversation recall if it was possible to > get ahold of the error code at all. Did they have to parse the output? Here is some info from microsoft about these "friendly HTTP error messages": http://support.microsoft.com/kb/q218155/ Note that whether the real error message gets through seems to depend on both the error code, and the length of the content. How is that friendly? >> (I did test having the NetAffx DAS server send HTTP status codes, and >> I did have problems with that in IGB, though I've forgotten the >> specifics. It was about a year ago....) 
> > > Do you have the specifics perhaps in an old email somewhere? > I can look around when I get back from vacation, which I'm on all next week. Ed From Gregg_Helt at affymetrix.com Thu Nov 10 17:46:23 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 10 Nov 2005 14:46:23 -0800 Subject: [DAS2] RE: how do I load probe sets into IGB now? Message-ID: That data is on a DAS/1 server. The new "Data Access" tab is just for QuickLoad and DAS/2 servers. DAS/1 servers are still accessible via the "File --> Load DAS Features" menu item. In the near term the plan is to soon move the DAS/1 access into the "Data Access" tab as a DAS/1 subtab alongside the QuickLoad and DAS/2 subtabs, but this wasn't ready in time for the current release. In the longer term the probe data will be hosted on both DAS/1 and DAS/2 servers. gregg > -----Original Message----- > From: Ann Loraine [mailto:aloraine at gmail.com] > Sent: Thursday, November 10, 2005 2:30 PM > To: das2 at portal.open-bio.org > Cc: Helt,Gregg; Erwin, Ed > Subject: how do I load probe sets into IGB now? > > Hi, > > Congratulations everybody on the new release of IGB! > > I have a question about the new Quickload/DAS tab. > > I'm trying to load some probe sets via DAS but can't figure out how to do > it. > > I used to be able to get them by using the "DAS" menu item, which > opened a widget containing a menu of DAS servers. I would select the > one labeled AffyDas (or something like that) and then I would get to > pick the chip (more often, chips) I wanted to see. Then IGB would > query the server and get me the probe set design sequence alignments > for the currently-shown region. > > I can't find this in the new interface. > > Can you help? 
> > -Ann > > -- > Ann Loraine > Assistant Professor > Section on Statistical Genetics > University of Alabama at Birmingham > http://www.ssg.uab.edu > http://www.transvar.org From dalke at dalkescientific.com Thu Nov 10 18:19:51 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 11 Nov 2005 00:19:51 +0100 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: <4373CE8B.3000302@affymetrix.com> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com> <725e762a203211651d1850097ae3fcc0@dalkescientific.com> <4373CE8B.3000302@affymetrix.com> Message-ID: <0cc693a86af103c99b668e5f6db2c9e6@dalkescientific.com> > Here is some info from microsoft about these "friendly HTTP error > messages": > > http://support.microsoft.com/kb/q218155/ > > Note that whether the real error message gets through seems to depend > on both the error code, and the length of the content. How is that > friendly? Indeed. >> Internet Explorer 5 and later provides a replacement for the HTML >> template for the following friendly error messages: >> >> 400, 403, 404, 405, 406, 408, 409, 410, 500, 501, 505 I've marked them with ***. The only ones I think we might use, were we to piggyback, are 409 (for locking?), 415 (for servers that don't support a requested format) and 416 (for unsupported range requests?). 
*** 400: ('Bad request', 'Bad request syntax or unsupported method'),
    401: ('Unauthorized', 'No permission -- see authorization schemes'),
    402: ('Payment required', 'No payment -- see charging schemes'),
*** 403: ('Forbidden', 'Request forbidden -- authorization will not help'),
*** 404: ('Not Found', 'Nothing matches the given URI'),
*** 405: ('Method Not Allowed', 'Specified method is invalid for this server.'),
*** 406: ('Not Acceptable', 'URI not available in preferred format.'),
    407: ('Proxy Authentication Required', 'You must authenticate with this proxy before proceeding.'),
*** 408: ('Request Time-out', 'Request timed out; try again later.'),
*** 409: ('Conflict', 'Request conflict.'),
*** 410: ('Gone', 'URI no longer exists and has been permanently removed.'),
    411: ('Length Required', 'Client must specify Content-Length.'),
    412: ('Precondition Failed', 'Precondition in headers is false.'),
    413: ('Request Entity Too Large', 'Entity is too large.'),
    414: ('Request-URI Too Long', 'URI is too long.'),
    415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
    416: ('Requested Range Not Satisfiable', 'Cannot satisfy request range.'),
    417: ('Expectation Failed', 'Expect condition could not be satisfied.'),
*** 500: ('Internal error', 'Server got itself in trouble'),
*** 501: ('Not Implemented', 'Server does not support this operation'),
    502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
    503: ('Service temporarily overloaded', 'The server cannot process the request due to a high load'),
    504: ('Gateway timeout', 'The gateway server did not receive a timely response'),
*** 505: ('HTTP Version not supported', 'Cannot fulfill request.'),

> I can look around when I get back from vacation, which I'm on all next
> week.

Enjoy!
Andrew dalke at dalkescientific.com From aloraine at gmail.com Thu Nov 10 17:29:48 2005 From: aloraine at gmail.com (Ann Loraine) Date: Thu, 10 Nov 2005 16:29:48 -0600 Subject: [DAS2] how do I load probe sets into IGB now? Message-ID: <83722dde0511101429m398c38ebg8e4df3d9b2a8d0da@mail.gmail.com> Hi, Congratulations everybody on the new release of IGB! I have a question about the new Quickload/DAS tab. I'm trying to load some probe sets via DAS but can't figure out how to do it. I used to be able to get them by using the "DAS" menu item, which opened a widget containing a menu of DAS servers. I would select the one labeled AffyDas (or something like that) and then I would get to pick the chip (more often, chips) I wanted to see. Then IGB would query the server and get me the probe set design sequence alignments for the currently-shown region. I can't find this in the new interface. Can you help? -Ann -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From allenday at ucla.edu Thu Nov 10 20:39:36 2005 From: allenday at ucla.edu (Allen Day) Date: Thu, 10 Nov 2005 17:39:36 -0800 (PST) Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> Message-ID: > What does the "X-DAS-Server" get you that the normal "Server:" doesn't > get you? What's the use case? I don't know. The absence of this header was actually reported by Dasypus output sent to me by you on May 26, 2005. Here's a snippet of the Dasypus diagnostics, followed by a comment from you: "Date: Thu, 26 May 2005 12:29:32 -0600 From: Andrew Dalke To: DAS/2 Subject: [DAS2] dasypus status [...] WARNING: Adding X-DAS-Server header 'gmod/0.0' The prototype doesn't mention the DAS server used. I stick one in based on the host name. 
[...]"

> Why is the "X-DAS-Version" at all important? What's important is the
> data content. It's the document return type/version that's important
> and not the server version.

It was actually originally (as far as I can tell from my email archive) discussed, along with X-DAS-Status, in an email from Lincoln on May 21, 2004, and forwarded to me on August 12, 2004:

"-----Original Message-----
From: Lincoln Stein [mailto:lstein at cshl.edu]
Sent: Friday, May 21, 2004 1:22 PM
To: edgrif at sanger.ac.uk; Gregg_Helt at affymetrix.com; avc at sanger.ac.uk; gilmanb at mac.com; dalke at dalkescientific.com
Cc: lstein at cshl.edu; allen.day at ucla.edu
Subject: DAS/2 notes

[...]
In addition to the standard HTTP response headers, DAS servers return the following HTTP headers:
X-DAS-Version: DAS/2.0
X-DAS-Status: XXX status code
[...]"

> But I mentioned most of these over a year ago
> http://portal.open-bio.org/pipermail/das/2004-September/000814.html
>
> In summary:
> - no support for direct web browser access to a URL, except with a
>   likely use case;
> - keep the default response in an XML format
> - change that XML content-type to "application/x-das-*+xml" instead
>   of "text/*"
> - have no requirement for new, DAS-specific headers

This discussion suggests we need a more formal process of modifying the client and server implementations, e.g. modify spec first and commit, then update code.

-Allen

From td2 at sanger.ac.uk Fri Nov 11 04:24:52 2005
From: td2 at sanger.ac.uk (Thomas Down)
Date: Fri, 11 Nov 2005 09:24:52 +0000
Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses!
In-Reply-To: <83d48ca8f7128fb04efecd673ef61459@dalkescientific.com> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com> <83d48ca8f7128fb04efecd673ef61459@dalkescientific.com> Message-ID: <8C869723-601C-4236-B9FA-88F6D6401016@sanger.ac.uk> On 10 Nov 2005, at 19:49, Andrew Dalke wrote: > Ed: > >> Using the HTTP-level error codes can cause problems. >> > > >> I don't care if status code is indicated with a header like >> "X-DAS-Status: 200" or with some XML content, or with both. But I >> think the HTTP status code has to be a separate thing, and will >> usually be "400" indicating that the user (sorry, I meant to say >> LeRoy) successfully communicated with the DAS server. >> > > Okay, sounds like using HTTP codes for this causes problems in > practice. > > What about returning a different content-type for that case? > > 200 Ok > Content-Type: application/x-das-error > > > Something bad happened. > That looks reasonable, but could we add a bit of structure: 407 The sky is falling (There's also a possible argument for using textual, rather than numeric, error codes -- but it would be good to keep at least one part of the error response using a well-defined vocabulary for the benefit of clients that want to respond to different error conditions in different ways). Thomas. From Steve_Chervitz at affymetrix.com Fri Nov 11 16:24:50 2005 From: Steve_Chervitz at affymetrix.com (Chervitz, Steve) Date: Fri, 11 Nov 2005 13:24:50 -0800 Subject: [DAS2] how do I load probe sets into IGB now? In-Reply-To: <83722dde0511101429m398c38ebg8e4df3d9b2a8d0da@mail.gmail.com> Message-ID: Ann, Go to File -> Load DAS Features. There should be a DAS server named 'NetAffx-Align' that will give you what you want. Steve > From: Ann Loraine > Date: Thu, 10 Nov 2005 16:29:48 -0600 > To: > Cc: , "Helt,Gregg" > Subject: [DAS2] how do I load probe sets into IGB now? > > Hi, > > Congratulations everybody on the new release of IGB! 
> > I have a question about the new Quickload/DAS tab. > > I'm trying to load some probe sets via DAS but can't figure out how to do it. > > I used to be able to get them by using the "DAS" menu item, which > opened a widget containing a menu of DAS servers. I would select the > one labeled AffyDas (or something like that) and then I would get to > pick the chip (more often, chips) I wanted to see. Then IGB would > query the server and get me the probe set design sequence alignments > for the currently-shown region. > > I can't find this in the new interface. > > Can you help? > > -Ann > > -- > Ann Loraine > Assistant Professor > Section on Statistical Genetics > University of Alabama at Birmingham > http://www.ssg.uab.edu > http://www.transvar.org > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Steve_Chervitz at affymetrix.com Fri Nov 11 19:51:41 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Fri, 11 Nov 2005 16:51:41 -0800 Subject: [DAS2] DAS/2 weekly meeting notes for 10 Nov 05 Message-ID: Notes from the weekly DAS/2 teleconference, 10 Nov 2005. $Id: das2-teleconf-2005-11-10.txt,v 1.1 2005/11/12 00:48:39 sac Exp $ Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt UCLA: Brian O'connor CSHL: Lincoln Stein UCBerkeley: Suzi Lewis Sweden: Andrew Dalke Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2005. Instructions on how to access this repository are at http://biodas.org Agenda Items ------------ * New Euro-friendly meeting time It was decided to change the time for this weekly teleconference to Monday 9:30 AM PST (12:30 PM EST, 17:30 UK). [A] New teleconf time starts next week (Monday 14 Nov) * Spec Issues Gregg expressed a need to dedicate some of these weekly meetings to be focused on resolving spec issues. We will do this for next week's meeting. 
[A] Everyone come prepared to talk about retrieval spec issues on 11/14. Content-type issue: - Should we use text/xml or application/x-das-blah+xml? - Consensus: use application/x-das-blah+xml - [A] Steve will roll back changes made to the retrieval spec. - Andrew acknowledges that text/xml may be handy for visual debugging and other presentation tricks, but is not a user-driven need; it's a technical issue. - Lincoln: XML handling is very browser-dependent: o Firefox - nice DOM tree structure o Safari, Konqueror - no special rendering o MSIE - "Cannot be displayed" - Gregg: Now we just need to ensure that we're actually implementing the correct content-type for given responses, which brings up the next topic... * Validation - Gregg: we'd like to start using dasypus locally to verify client/server compliance with the spec. What state is it in? - Andrew: Just getting back to it now. [A] Andrew will talk with Chris D. to set up a web interface at biodas.org * Apollo Suzi: Can't talk about Apollo now. Will wait until Nomi is available. [A] Nomi will present Apollo at the 28 Nov DAS/2 weekly meeting. Status Reports -------------- Gregg: * CSHL Genome Informatics meeting summary of DAS/2-relevant things. - Gave talk about DAS/2 and demoed IGB. Went well. - Held a DAS BOF that was well-attended (n=15). Questions people had about DAS/2 have already been addressed. [A] Gregg will write up his CSHL DAS BOF notes and post. Discussion centered around what Sanger & EBI are doing with DAS. o There are lots of DAS-related projects there. o We'd like to have tighter linkage between DAS folks in the States and those in the UK. [A] Andrew will visit the UK DAS folks more often.
Ideas: + Help them transition to DAS/2 + Hold "DASathon" or jamboree there o People: Tim Hubbard, Thomas Down, Andreas Prlic o Projects: + Serving up 3D structures using modified DAS/1 server (SPICE) + Serving up protein annotations using modified DAS/1 server + Registry & discovery system for DAS/1 server. This is SOAP-based. We'd like to have a non-SOAP-based system for DAS/2, which follows REST principles. - Andreas could likely create an HTTP-based alternative to his SOAP system, which uses the same core. - [A] Andrew will talk with Andreas P about non-SOAP reg/discovery - [A] DAS/2 grant needs progress on reg/discovery w/in next 6 mos * Grant (DAS/2 continuation) Lots of modifications were made just prior to submitting on 1 Nov. Some of the changes include: - Work closely with Sanger and EBI where they've done lots of work (3D structure and protein DAS). - More of a mechanism will be in place to drive the spec forward: o Andrew = designated 'spec czar' - makes ultimate decisions o Lincoln = designated 'spec godfather' - retains veto power Andrew: * Brought up the header issue from the spec discussion on the list this week. - Doesn't like the idea of 4 additional DAS-specific fields (error code, das version, server name, and something else) - Alternative: server returns content-type: application/x-das-error - Advantages: o no new header o simplified handling -- just check the HTTP error code and the content-type. o easier to implement o enables a flatfile-based server o Fits with REST philosophy of using HTTP as an application protocol, not a transport protocol. - Ed E: Can't we just return an error section in the document? Andrew: We could, but it requires parsing the document and only works for XML formats that we're in control of. - Gregg: The advantages of having metadata in the header outweigh the advantages of enabling a flatfile-based server.
Andrew: We can utilize the existing header Ed E: Piggybacking error codes causes problems with proxy servers (see email on the DAS/2 discussion list). - Decision: [A] Use standard HTTP error codes; use XML to specify error details. E.g., the server returns an HTTP error status with an XML error document as the content. Steve: When reviewing the spec, encountered potential issues surrounding the relationship between HTTP and DAS-specific error codes. Using standard HTTP codes will obviate this issue. Also noted that there's a bugzilla entry regarding error codes (which is now moot): http://bugzilla.open-bio.org/show_bug.cgi?id=1784 - Ed E: MSIE hides or modifies content based on certain HTTP error codes it gets. This has important implications on Windows platforms where IE's behavior can get in the way of other network-aware applications that don't even (knowingly) use IE. From Steve_Chervitz at affymetrix.com Fri Nov 11 20:52:15 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Fri, 11 Nov 2005 17:52:15 -0800 Subject: [DAS2] DAS/2 weekly meeting notes for 10 Nov 05 In-Reply-To: Message-ID: > Content-type issue: > - Should we use text/xml or application/x-das-blah+xml? > - Consensus: use application/x-das-blah+xml > - [A] Steve will roll back changes made to the retrieval spec. Done, but I noticed that we had been using text/x-das-blah+xml rather than application/x-das-blah+xml. I left it as text for now, although 'application' seems more correct according to the RFC on MIME media types, http://www.rfc-editor.org/rfc/rfc2046.txt which states: text -- textual information. ... Other subtypes [i.e., anything besides 'plain'] are to be used for enriched text in forms where application software may enhance the appearance of the text... application -- some other kind of data, typically either uninterpreted binary data or information to be processed by an application. ...
Steve From dalke at dalkescientific.com Mon Nov 14 06:47:09 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 14 Nov 2005 12:47:09 +0100 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: References: Message-ID: Steve: > I raised some other issues regarding types and feature properties etc. > a > couple of weeks ago that I'd like you to chime in on: > http://portal.open-bio.org/pipermail/das2/2005-October/000271.html > > The latest message on this thread is: > http://portal.open-bio.org/pipermail/das2/2005-November/000278.html I'll take them part by part. That last message suggested 29 2 * the values of the 'das:id', 'das:type', and 'das:ptype' attributes > are URLs relative to xml:base unless they begin with 'das:prop#', in > which case they are relative to the das:prop namespace. And from what I can tell about XML, there's no standard way to implement this using one of the standard XML parsers. How do you get the das:prop namespace for a given element? The parser often does the expansion for you. Eg, in one of the Python XML parsers it does the translations into Clark notation, like {http://www.biodas.org/ns/das/genome/2.00}ptype For more info on XML namespaces, see http://www.jclark.com/xml/xmlns.htm Andrew dalke at dalkescientific.com From ap3 at sanger.ac.uk Mon Nov 14 08:29:26 2005 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Mon, 14 Nov 2005 13:29:26 +0000 Subject: [DAS2] Re: what info is needed for DAS/2 registration? In-Reply-To: <955da4ae7783e60944687d86ec691e51@dalkescientific.com> References: <955da4ae7783e60944687d86ec691e51@dalkescientific.com> Message-ID: <81fdf1e73ee85ae55550f12ddcee13cf@sanger.ac.uk> Hi Andrew! > Looks like I will be more involved with the DAS/2 spec development, > and I'll be visiting the UK more often. good! > I want to make sure that the spec includes more of what's > needed for registration. o.k. 
very good, let's go through your mail: > My thought is to let the registration > system be able to query the DAS/2 server to get most of the fields > it needs, if not all. o.k. > There may still be some need to override the > definitions, The experience from doing the das1 registry tells that some corrections are needed every now and then. It seems to be inevitable that sometimes users make mistakes / inaccuracies, etc. > so at the manual registration level this will be used > more to pre-populate an entry with a default. sounds good. - so this means the configuration for setting up a DAS source will get a little bigger. > In looking at the manual registration page I see the following, > along with comparisons to the existing DAS/2 spec > > ** Title/Nickname used by DAS clients for the display of the das tracks > ** Description for the user to get a quick grasp what the data is about. - we have 60 sources in the registry by now and we expect to be up around 100 soon, so one needs a way to learn which of the sources are serving the data which is of particular interest ... > ** URL for more detailed description a link back to the homepage of the project that provides the data > > DAS/2 does not have this information for the service as a whole. > It does have it for each of the databases, somewhat. Here is > an example from the spec. > > taxon="http://www.ncbi.nlm.nih.gov/taxon-browser?id=29118" > > doc_href="http://www.wormbase.org/documentation/users_guide/ > volvox.html" > > > > Should we add a "title" field to each data source? yes that would be good > Should we > add title/description/url fields to the DAS/2 service as a whole? not sure what you mean by that > ** coordinate system > > Each data source may have 1 or more versions. The version information > looks like > > > > > > In theory that assembly id could be a URL with more detailed > information about the assembly. Right now it's used as a unique > identifier. 
There is nothing there to convert these URLs into something human-readable. Hm, not sure if I am completely convinced by representing a coordinate system as a URL. What if two reference servers provide the same assembly or are mirrors of each other? I would see it in a way where a DAS client would ask the registry "where are all the reference servers for NCBI 35- homo sapiens?" and then gets a list providing e.g. an American and a European mirror server; the client could choose the one which is geographically closer. > > Possible solutions for this are: > - define an "assembly" document, to be put at that URL and > include the authority/version/type/organism data mentioned at > http://das.sanger.ac.uk/registry/help_coordsys.jsp something like that. > ** DAS url > > Yep, DAS/2 has that one. :) :-) > > ** Admin email > > Hmm. Yeah, there should be more information about the service as > a whole. Admin email and perhaps a documentation href, eg, with > information about planned downtime. would be good. > > ** DAS capabilities > > That's handled differently in DAS/2. Did people really use this > information? actually this information is important (for das1) - it is used to distinguish reference servers and annotation servers (on the client side) and needed for validation (on the registry side). "Capabilities" are also related to data-types. E.g. a genome DAS client does not need to query a protein structure, because it can not do 3D... > ** Test access/ segment code labels I think there is a misunderstanding here: the test code is not a "label". The test code is e.g. a chromosomal segment or an accession code for a protein database for which annotations are returned if a feature request is being made. The "label" is used mainly to describe by which project a source is being funded. >> We are currently discussing if the labels should be used to describe >> a DAS source in more detail. e.g. "experimentally verified", >> "computational prediction", etc.
> > These are two different things in one field. Yes, you are very right. Together with the BioSapiens DAS people we recently decided that there should be the possibility to assign gene-ontology evidence codes to each das source, so in the next update of the registry, this will be changed. > > What I'm going to propose is a generic key/value data structure > for just about all records. Some of the key names will be well > defined. Others can add new fields to experiment with / extend > the spec in a semi-constrained fashion. This would let people > try out a new property easily. sounds good. > In summary it sounds like DAS/2 needs: > - a few more pieces of meta data (eg, information about the > service as a whole) > - a bit better defined way to get information about the > reference assembly > I would agree to both of those. Greetings, Andreas ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 From Gregg_Helt at affymetrix.com Mon Nov 14 12:09:11 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 14 Nov 2005 09:09:11 -0800 Subject: [DAS2] DAS/2 teleconference at 9:30 AM today PST Message-ID: Just a reminder that we've rescheduled the weekly DAS/2 teleconference for Mondays @ 9:30 AM Pacific time, starting today. I'm hoping the new time will give more people a chance to participate. Teleconference numbers: US dialin: 800-531-3250 International dialin: 303-928-2693 Conference ID: 2879055 We're also revising the format to focus, on alternating weeks, on either the DAS/2 specification itself or implementations of the specification. This should allow people who are mainly concerned about one or the other to avoid extra overhead. Today we will focus on spec issues.
thanks, Gregg Helt From lstein at cshl.edu Mon Nov 14 12:23:18 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 14 Nov 2005 12:23:18 -0500 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: <725e762a203211651d1850097ae3fcc0@dalkescientific.com> References: <43739296.4030307@affymetrix.com> <725e762a203211651d1850097ae3fcc0@dalkescientific.com> Message-ID: <200511141223.19367.lstein@cshl.edu> Well, I give up arguing this one and will go with the way Andrew wants to do it. Therefore I propose the following rules: 1) Return the HTTP 404 error for the case that any component of the DAS2 path is invalid. This would apply to the following situations: Bad namespace Bad data source Unknown object ID 2) Return HTTP 301 and 302 redirects when the requested object has moved. 3) Return HTTP 403 (forbidden) for no-lock errors. 4) Return HTTP 500 when the server crashes. For all errors there should be a text/x-das-error entity returned that describes the error in more detail. Lincoln On Thursday 10 November 2005 04:45 pm, Andrew Dalke wrote: > Further refining this from today's phone meeting > > Ed: > > For a user (let's call her Varla) using IE, the browser will intercept > > some error codes and present her with some IE-specific garbage, > > throwing away any content that was sent back in addition to the error > > code. > > The case Ed came across was from an in-house group using a Windows call > out to IE as a background process to fetch a web page. In that case > (as I understand it) it would convert HTTP error responses into its own > error messages. > > Ed couldn't during the conversation recall if it was possible to > get ahold of the error code at all. Did they have to parse the output? > > > Even for a user (Marla this time) using IGB, firewalls and/or caching > > and/or apache port-forwarding mechanisms can throw out anything with a > > status code in the error range. > > 404 gets through, yes? 
> > All of those are supposed to be transparent to error codes, or at the > very least translate them from (say) 404 to 400. > > Can anyone point me to some reports of one of these mishaps? > > We definitely need to have some tie-ins with the HTTP error codes. > Consider these two implementations for getting > > http://example.com/das2/genome/dazypus/1.43/ > > (Note the typo "dazypus" -> "dasypus") > > A) One system might have all "/das2" URLs forwarded to a DAS server. > > B) Another might have a handler only for "/das2/genome/dasypus" and > let Apache do the rest. > > In case A) the DAS server sees that the given resource doesn't exist. > It needs to return an error. It can return either "200 Ok" followed > by a DAS error payload, or return a "404 Not Found" at the HTTP level. > > In case B) the request never gets to the DAS handler because > of the typo. Apache sees there's nothing for the resource so returns > a "404 Not Found". > > The client code is easier if it can check the HTTP error code and > stop on failure. This means it's best for case A) for the DAS/2 > server to return an HTTP error code of 404, and perhaps an optional > ignorable payload. > > > (I did test having the NetAffx DAS server send HTTP status codes, and > > I did have problems with that in IGB, though I've forgotten the > > specifics. It was about a year ago....) > > Do you have the specifics perhaps in an old email somewhere? > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Mon Nov 14 12:28:10 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 14 Nov 2005 12:28:10 -0500 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: References: Message-ID: <200511141228.11358.lstein@cshl.edu> On Monday 14 November 2005 06:47 am, Andrew Dalke wrote: > Steve: > > I raised some other issues regarding types and feature properties etc. > > a > > couple of weeks ago that I'd like you to chime in on: > > http://portal.open-bio.org/pipermail/das2/2005-October/000271.html > > > > The latest message on this thread is: > > http://portal.open-bio.org/pipermail/das2/2005-November/000278.html > > I'll take them part by part. > > That last message suggested > > xmlns:das="http://www.biodas.org/ns/das/genome/2.00" > xml:base="http://www.wormbase.org/das/genome/volvox/1/" > xmlns:xlink="http://www.w3.org/1999/xlink" > > das:prop="http://www.biodas.org/ns/das/genome/2.00/properties"> > das:type="type/curated_exon"> > 29 > 2 > xlink:type="simple" > > xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/ > CTEL54X.1 > /> > > > > I couldn't figure out why the "das:" namespace was needed for the > attributes. Why can't they be in the default namespace? The extra das: prefix is not needed since it is the same namespace as the default namespace. My feeling is that we should be using namespaces in attribute names but not in attribute values (e.g. das:ptype is ok, but "das:prop#phase" is not OK). For attribute values we should be using URIs consistently. Lincoln > The "das:" in the value of an attribute doesn't know anything about > the currently defined namespaces. So this "das:" must be something > completely different from the xmlns:das=... definition.
> > > * the values of the 'das:id', 'das:type', and 'das:ptype' attributes > > are URLs relative to xml:base unless they begin with 'das:prop#', in > > which case they are relative to the das:prop namespace. > > And from what I can tell about XML, there's no standard way to implement > this using one of the standard XML parsers. How do you get the das:prop > namespace for a given element? The parser often does the expansion > for you. Eg, in one of the Python XML parsers it does the translations > into Clark notation, like > > {http://www.biodas.org/ns/das/genome/2.00}ptype > > For more info on XML namespaces, see http://www.jclark.com/xml/xmlns.htm > > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From dalke at dalkescientific.com Mon Nov 14 12:30:07 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 14 Nov 2005 18:30:07 +0100 Subject: [DAS2] Spec issues In-Reply-To: References: Message-ID: <05b94e3a6db3e4894af051f22f25dc4c@dalkescientific.com> On Nov 4 Steve wrote: > das:type="type/curated_exon"> > 29 > 2 > xlink:type="simple" > > xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/ > CTEL54X.1 > /> > I think we're missing something. This is XML. We can do 29 2 This message brought to you by AT&T The whole point of having namespaces in XML is to keep from needing to define new namespaces like . In doing that, there's no problem in supporting things like "bg:glyph", etc. because the values are expanded as expected by the XML processor. 
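[The namespace expansion discussed above can be seen directly with a standard parser. A minimal sketch, assuming a simplified stand-in for the thread's FEATURE examples (the fragment below is hypothetical, not taken from the spec): Python's xml.etree.ElementTree expands element and attribute *names* into Clark notation, but leaves attribute *values* as literal strings.]

```python
import xml.etree.ElementTree as ET

# Hypothetical DAS/2-like fragment, for illustration only.
doc = """<das:FEATURE
             xmlns:das="http://www.biodas.org/ns/das/genome/2.00"
             das:type="type/curated_exon"/>"""

root = ET.fromstring(doc)

# Element and attribute names come back expanded into Clark notation:
print(root.tag)
# -> {http://www.biodas.org/ns/das/genome/2.00}FEATURE
print(list(root.attrib))
# -> ['{http://www.biodas.org/ns/das/genome/2.00}type']

# ...but attribute values pass through untouched: a value like
# "das:prop#phase" would arrive as that literal string, with no standard
# way to resolve the "das:" prefix against the in-scope declarations.
print(root.attrib["{http://www.biodas.org/ns/das/genome/2.00}type"])
# -> type/curated_exon
```

[This is the asymmetry behind Andrew's objection: a convention like "das:prop#phase" inside an attribute value would have to be resolved by DAS-specific code, not by the XML parser.]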
> Also, we might want to allow some controlled vocabulary terms to be > used for > the value of type.source (e.g., "das:curated"), to ensure that > different > users use the same term to specify that a feature type is produced by > curation. I talked with Andreas Prlic about what other metadata is needed for the registry system. He mentioned Together with the BioSapiens DAS people we recently decided that there should be the possibility to assign gene-ontology evidence codes to each das source, so in the next update of the registry, this will be changed. That's at the source level, but perhaps it's also needed at the annotation level. > The spec also seems alarmed by the existence of a xml:base attribute > in the > TYPE element. The idea is that any relative URL within this element > would be > resolved using that element's xml:base attribute. How would folks be > with > having the DAS/2 spec fully support the XML Base spec ( > http://www.w3.org/TR/xmlbase/ )? The result of this would be to add an > optional xml:base attribute to all elements that contain URLs or > subelements > with URLs. In my reading it seems that xml:base should be included wherever. See http://norman.walsh.name/2005/04/01/xinclude > Ugh. In the short term, I think there's only one answer: update your > schemas to allow xml:base either (a) everywhere or (b) everywhere you > want XInclude to be allowed. I urge you to put it everywhere as your > users are likely to want to do things you never imagined. ? > > Description: Properties are typed using the ptype attribute. The value > of > the property may be indicated by a URL given by the href attribute, or > may > be given inline as the CDATA content of the section. 
> > type="type/curated_exon"> > 29 > 2 > href="/das/protein/volvox/2/feature/CTEL54X.1" /> > > > > So in contrast to the TYPE properties which are restricted to being > simple > string-based key:value pairs, FEATURE properties can be more complex, > which > seems reasonable, given the wild world of features. We might consider > using > 'key' rather than 'ptype' for FEATURE properties, for consistency with > TYPE > prop elements (however, read on). My thoughts on these are: - come up with a more consistent way to store key/value data - the Atom spec has a nice way to say "the data is in this CDATA as text/html/xml" vs. "this text is over there". I want to copy its way of doing things. - I'm still not clear about xlink. Another is the HTML-style link element; Atom uses "rel=" to encode information about the link. For example, the URL to edit a given document is given by such a link. See http://atomenabled.org/developers/api/atom-api-spec.php Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Mon Nov 14 14:29:22 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 14 Nov 2005 11:29:22 -0800 Subject: [DAS2] DAS/2 weekly meeting notes for 14 Nov 05 Message-ID: Notes from the weekly DAS/2 teleconference, 14 Nov 2005. $Id: das2-teleconf-2005-11-14.txt,v 1.2 2005/11/14 19:20:37 sac Exp $ Attendees: Affy: Steve Chervitz, Gregg Helt CSHL: Lincoln Stein UCBerkeley: Suzi Lewis Sweden: Andrew Dalke Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2005. Instructions on how to access this repository are at http://biodas.org ---------------------------------- AD talked with A. Prlic about registry service; we want to incorporate what he needs within DAS/2.
What they have: - name (a few words) - for display of das track - title, description (paragraph) - synopsis - url for more info we have desc, id, doc_href, taxon Therefore, we need name attribute Need: - name (mandatory) (done - LS: adding it to spec now) - desc (optional) Coord system reg server: * in das/2 - it's not optional (0 interbase) * they find this important We have confusion between assembly and reference server LS: Need URI that points to assembly, independent of the reference server. GH: Would like to have annot servers that don't know anything about the ref server. LS: Could use the region URI to ID the assembly das/genome/sourceid/region = assembly id/uri GH: The trouble is that NCBI is a ref source for many assemblies, yet they lack a das server. They have no URI. LS: we can just make one up, or use the most appropriate web page LS: When you request versioned source from a server, it should say what assembly coords it's working on and give a uri for that. In this case there's no guarantee you can do a 'get' on that URI. We want to say: 1- what is unique uri for assembly (everyone agrees to share this) 2- das URL for how to fetch it (some server's region url - trusted, faithful copy with what is at ncbi). Diff servers could assert that you can fetch it from various places. GH: assembly could be an attribute since there'd be only one. A list of ref servers that serve up that dna. LS: in versioned source response. new section between capabilities and namespaces called 'reference_sources'. Add 'assembly' attribute to version element: Message-ID: Andrew Dalke wrote on 14 Nov 05: > Steve: >> I raised some other issues regarding types and feature properties etc. >> a >> couple of weeks ago that I'd like you to chime in on: >> http://portal.open-bio.org/pipermail/das2/2005-October/000271.html >> >> The latest message on this thread is: >> http://portal.open-bio.org/pipermail/das2/2005-November/000278.html > > I'll take them part by part.
> > That last message suggested > > xmlns:das="http://www.biodas.org/ns/das/genome/2.00" > xml:base="http://www.wormbase.org/das/genome/volvox/1/" > xmlns:xlink="http://www.w3.org/1999/xlink" > > das:prop="http://www.biodas.org/ns/das/genome/2.00/properties"> > das:type="type/curated_exon"> > 29 > 2 > xlink:type="simple" > > xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/ > CTEL54X.1 > /> > > > > I couldn't figure out why the "das:" namespace was needed for the > attributes. Why can't they be in the default namespace? Attributes don't have a default namespace (though one might think such a thing would be useful). See http://www.w3.org/TR/REC-xml-names/#defaulting This is a point which has been subject to much consternation: http://www.rpbourret.com/xml/NamespacesFAQ.htm#q5_3 http://lists.xml.org/archives/xml-dev/200002/msg00094.html > The "das:" in the value of an attribute doesn't know anything about > the currently defined namespaces. So this "das:" must be something > completely different from the xmlns:das=... definition. No, it refers to the xmlns:das definition in the parent FEATURES element. >> * the values of the 'das:id', 'das:type', and 'das:ptype' attributes >> are URLs relative to xml:base unless they begin with 'das:prop#', in >> which case they are relative to the das:prop namespace. > > And from what I can tell about XML, there's no standard way to implement > this using one of the standard XML parsers. How do you get the das:prop > namespace for a given element? You've identified the key weakness of my proposal: Knowing how to expand 'das:prop' occurring within attribute values would be a DAS-specific convention ('hack') for mapping to a controlled vocabulary for property values. So I'm not quite satisfied with this either. In another message of yours today, you propose an alternative to this: http://portal.open-bio.org/pipermail/das2/2005-November/000313.html See my reply to that for more ideas on this topic. 
Steve From td2 at sanger.ac.uk Tue Nov 15 04:14:01 2005 From: td2 at sanger.ac.uk (Thomas Down) Date: Tue, 15 Nov 2005 09:14:01 +0000 Subject: [DAS2] DAS/2 weekly meeting notes for 14 Nov 05 In-Reply-To: References: Message-ID: <21CB947F-FAE3-4D56-A110-CAB9606C9C84@sanger.ac.uk> On 14 Nov 2005, at 19:29, Steve Chervitz wrote: > > Coord system reg server: > * in das/2 - it's not optional (0 interbase) > * they find this important By "coordinate system" we're not really talking about the 0-based vs 1-based issue, we're talking about globally unique names for sets of reference sequences (genome assemblies, protein databases, whatever). It might be possible to come up with a better name (I used to call these "namespaces"). > We have confusion between assembly and reference server > LS: Need URI that points to assembly, independent of the > reference server. > GH: Would like to have annot servers that don't know anything about > the ref server Definitely agree with this. This kind of "opaque assembly identifier" is what we've been calling a coord-system name. > LS: Could use the region URI to ID the assembly > das/genome/sourceid/region = assembly id/uri > > GH: The trouble is that NCBI is a ref source for many assemblies, yet > they lack a das server. They have no URI. > LS: we can just make one up, or use most appropriate web page This is possibly an argument for avoiding the use of URLs for assembly identifiers, if we can't be sure that the organisation that's the authority for a given assembly will be running an authoritative DAS server. URNs would be fine, as would the kind of structured but location-independent identifier that Andreas has been using. > Question: What do they mean by 'coord system'? some confusion here > e.g., Do they mean things like: 'this assembly start at 5000 relative > to this other assembly'? I think the way to provide this kind of information is in the form of a DAS alignment service between two coord-systems.
We love the idea of putting up alignments between NCBI34 and NCBI35 and then having a liftover-like tool which can go off and query the registry to discover this. Thomas. From ap3 at sanger.ac.uk Tue Nov 15 05:24:45 2005 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Tue, 15 Nov 2005 10:24:45 +0000 Subject: [DAS2] DAS/2 weekly meeting notes for 14 Nov 05 In-Reply-To: References: Message-ID: Hi! I realized there were a couple of questions regarding the way "coordinate systems" are defined in the DAS-registry, so it would have been good if I had joined yesterday.... I am glad that the conference is now at a time which is better for us Europeans and want to join in the future for some of the topics like registry, coordinate systems, proteins, etc. > > AD: ebi/sanger tracks three fields related to assembly (what they need > per server): > -authority = equiv to our assembly uri > -organism = we have as taxon > -type = ? "type" refers to a "physical dimension" of an object. E.g. a chromosome, a 3D protein structure, a protein sequence. > > Permits people to query things like: find out all servers that offer > ncbi > build 35 for human. > > Question: What do they mean by 'coord system'? some confusion here > e.g., Do they mean things like: 'this assembly start at 5000 relative > to this other assembly'? no, as Thomas already mentioned these "coordinate systems" could also be called "namespace". They should be globally unique descriptors for reference objects / databases. > > For protein DAS, authority typically defines two diff coord systems: > 'pdb resnum, interprot' > It does not permit automated translation between two coord systems. unfortunately this is not that easy in protein space. The mapping from the 3D protein structure to the protein sequence is not straightforward. Think of negative, non-consecutive, and "non-numeric" residue numbers that can appear in the 3D structures.
Therefore we came up with the "alignment" DAS document, which allows mapping one object in one coordinate system to another one. It can also be used to map one assembly to another. > [A] - Andrew will find out what they use it for > > AD: Believes the purpose is intended for human consumption. Not only - the DAS clients usually can display a certain "coordinate system", e.g. Ensembl can do chromosomal ones, but if DAS sources are available that speak the "UniProt, Protein Sequence" coordinate system, it knows how to project these onto the genome - an "intelligent DAS client" :-) Andreas ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 From dalke at dalkescientific.com Wed Nov 16 21:35:32 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 17 Nov 2005 03:35:32 +0100 Subject: [DAS2] (x)link Message-ID: I mentioned having a generic tag, again based on Atom. Steve replied: > Not sure about this one yet. In the Atom API, the value of the rel > attribute is restricted to a controlled vocabulary of link > relationships and available services pertaining to editing and > publishing syndicated content on the web: > http://atomenabled.org/developers/api/atom-api- > spec.php#rfc.section.5.4.1 > > What would a controlled vocab for DAS resources be? I don't think I understand the Atom one. Turns out I was actually looking at the Atom publishing protocol at http://code.blogger.com/archives/atom-docs.html which defines links including: The service.post is the URI where you would send an Entry to post to your blog. The service.feed is the URI where you would make an Atom API request to see the Blog's latest entries.
We could define similar links like: - where to edit and/or lock the given resource - how to get a list of locks - how to get from the given DAS resource to its parent (i.e., how to go "up" in the tree, in the case of a cross-link from another server) These could be done as distinct elements or done as qualifications of an existing element. The advantage of the latter (using a ) is that others may add their own link types. > Skimming through the DAS/2 retrieval spec, our use of hrefs is > simply for pointing at the location of resources on the web > containing some specified content (e.g., documentation, database > entry, image data, etc.). But they are used in different contexts (for human browsing, for machine fetching, for "service" requests). > The next/prev/start idea for Atom might have good applicability in the > DAS world for iterating through versions of annotations or assemblies > (e.g., rel='link-to-gene-on-next-version-of-genome'). One relationship > that would be useful for DAS would be 'latest', to get the latest > version of an annotation. Hmm. So every annotation would have an optional section? In the current scheme do we always get the most recent version of an annotation? I didn't realize there was any way to get another version, except if it's been edited while you weren't looking. > DAS get URLs themselves seem fairly self-documenting (it's clear a > given link is for feature, type, or sequence for example), so having a > separate rel attribute may not provide much additional value for these > links. But it might be handy for versioning and for DAS/2 writebacks. I hadn't thought of versioning; I was thinking more of writebacks and how to find the parent. I was also thinking of structure data where I might want the experimental x-ray density data for a given structure. That might be done like ... That's part of the newly submitted DAS proposal so should not really drive this work. Steve also mentioned xlink.
I've been looking at the spec but still don't understand its implications. There are several^H^Hmany parts to the spec I don't understand, especially in the context of DAS. locator? "arcrole"? "actuate"? Are all our links "simple"? Do we use anything else besides the href? Also, I see no mention in that spec of content-type. One of the things in the Atom spec is support (though not in the spec proper) for alternate or multiple ways to resolve a link or multiple formats (That is, a may contain subelements and these subelements, if in something other than the "das" namespace, are free to add variant meanings.) Andrew dalke at dalkescientific.com From ilari.scheinin at helsinki.fi Fri Nov 18 10:22:47 2005 From: ilari.scheinin at helsinki.fi (Ilari Scheinin) Date: Fri, 18 Nov 2005 17:22:47 +0200 Subject: [DAS2] Getting individual features in DAS/1 Message-ID: This mail is not really about DAS/2, but the web site says the original DAS mailing list is now closed. I am setting up a DAS server that serves CGH data from my database to visualization software, which in my case is gbrowse. I've already set up Dazzle that serves the reference data from a local copy of Ensembl. I need to be able to select individual CGH experiments to be visualized, and as the measurements from a single CGH experiment cover the entire genome, this cannot of course be done by specifying a segment along with the features command. I noticed that there is a feature_id option for getting the features in DAS/1.5, but on a closer look, it seems to work by getting the segment that the specified feature corresponds to, and then getting all features from that segment. My next approach was to use the feature type to distinguish between different CGH experiments. As all my data is of the type CGH, I thought that I could spare this piece of information for identifying purposes. First I tried the generic seqfeature plugin. I created a database for it with some test data.
However, getting features by type does not seem to work. I always get all the features from the segment in question. Next I tried the LDAS plugin. Again I created a compatible database with some test data. I must have done something wrong with the data file I imported into the database, because getting the features does not work. I can get the feature types, but trying to get the features gives me an ERRORSEGMENT error. I thought that before I go further, it might be useful to ask whether my approach seems reasonable, or is there a better way to achieve what I am trying to do? What should I do to be able to visualize individual CGH profiles? I'm grateful for any advice, Ilari From ap3 at sanger.ac.uk Fri Nov 18 11:54:27 2005 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Fri, 18 Nov 2005 16:54:27 +0000 Subject: [DAS2] das registry and das2 Message-ID: <240aa4ff660b7427b6c463ffc10b1307@sanger.ac.uk> Hi! I would like to start a discussion of how to provide a proper DAS interface for our das-registration server at http://das.sanger.ac.uk/registry/ Currently it is possible to interact with it using SOAP, or manually via the HTML interface. We should also make it accessible using URL requests. To get this started I would propose the following query syntax. This might also provide another opportunity to have a discussion about the coordinate system descriptions. If some of the used terms are unclear, there is some documentation at http://das.sanger.ac.uk/registry/help_index.jsp Regards, Andreas Request: http://server/registry/list http://server/registry/find?
[keyword,organism,authority,type,capability,label]=searchterm Response: DS_109 myDasSource some free text NCBI 35 chromosome Homo sapiens 9606 4:55349999,55749999 UniProt Protein Sequence P00280 sequence features 2005-Nov-16 about uniprot ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 From dalke at dalkescientific.com Fri Nov 18 13:00:12 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 18 Nov 2005 19:00:12 +0100 Subject: [DAS2] das registry and das2 In-Reply-To: <240aa4ff660b7427b6c463ffc10b1307@sanger.ac.uk> References: <240aa4ff660b7427b6c463ffc10b1307@sanger.ac.uk> Message-ID: <4569f5d3ff6658e5ead6b979e8b1fba9@dalkescientific.com> Andreas Prlic: > I would like to start a discussion of how to provide a proper DAS > interface for > our das-registration server at http://das.sanger.ac.uk/registry/ > > Currently it is possible to interact with it using SOAP, or manually > via the HTML > interface. We should also make it accessible using URL requests. One of the things Gregg and I talked about at ISMB was that the top-level "das-sources" format is, or can be, identical to what's needed for the registry server. As it's structured now the top-level interface to a das2/genome URL returns a list of sources. Based on what you need for the registry, we're going to add support for data about the source itself. The resulting das-sources XML document is effectively identical to what you're looking for. Hence I think the top-level XML format for a DAS/2 service is identical to the XML format for a registry server. A difference is the support for searches across sources. We don't have that in DAS. This is an example, btw, of how a generic element could be useful. Suppose we don't add this in DAS/2.0. The EBI could do something like ... to say that the given URL (which would be the current URL) also supports a registry search interface.
Or we could have all DAS/2 servers implement a search. I don't think that should be a requirement. > http://server/registry/list > http://server/registry/find? > [keyword,organism,authority,type,capability,label]=searchterm My proposal doesn't affect this. Why do "find" and "list" take different URLs? Another possibility is that the same URL returns everything if there are no filters in place. Are multiple search terms allowed? Boolean AND or OR? Andrew dalke at dalkescientific.com From ap3 at sanger.ac.uk Mon Nov 21 05:55:06 2005 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Mon, 21 Nov 2005 10:55:06 +0000 Subject: [DAS2] das registry and das2 In-Reply-To: <4569f5d3ff6658e5ead6b979e8b1fba9@dalkescientific.com> References: <240aa4ff660b7427b6c463ffc10b1307@sanger.ac.uk> <4569f5d3ff6658e5ead6b979e8b1fba9@dalkescientific.com> Message-ID: Hi Andrew, > As it's structured now the top-level interface to a das2/genome URL > returns a list of sources. Based on what you need for the registry, > we're going to add support for data about the source itself. > > The resulting das-sources XML document is effectively identical to > what you're looking for. That sounds good. I agree the description should look identical for both the sources and the registry. If the sources are already properly described this also makes it easier to "publish" them. For most of the fields in the registry I think it is rather clear why they are there. The issue that might need the most discussion is how to describe a coordinate system. This information is important because a DAS client usually understands one or multiple coordinate systems. E.g. Ensembl knows about Chromosomes and Clones, but it can also display UniProt annotations in some cases. Similarly, the SPICE DAS client can display annotations served in PDB-residue numbering and UniProt coordinates, but does not know how to deal with genomic coordinates.
Therefore the "coordinate system" or "namespace" is an important part of the description of a DAS source. What I found in the current spec-draft that comes closest to this issue is the different "domains", e.g. http://server/das/genome/source/version/features so I might want to say http://server/das/genome/homosapiens/ncbi35/features http://server/das/genome/musmusculus/ncbim34/features or should it be http://server/das/genome/ncbi/homosapiens35/features http://server/das/genome/ncbi/musmusculus34/features ? Hm. I am not sure, but it seems that one level is missing? - either organism or authority ? The description of the data should ultimately allow the same DAS source to be used in multiple DAS clients. Some validation will be required on the descriptions, to warn people that "homo sapiens" should not be written as "human" or "homo". Or, more complicated: Ensembl does not do assemblies itself. The assembly used is currently NCBI_35. Therefore "Ensembl" cannot be used as an authority for a chromosomal coordinate system. Currently the registry provides a restricted list of allowed coordinate systems, to keep this under control. >> http://server/registry/list >> http://server/registry/find? >> [keyword,organism,authority,type,capability,label]=searchterm > > My proposal doesn't affect this. > > Why do "find" and "list" take different URLs? Another possibility > is that the same URL returns everything if there are no filters > in place. Yes - better to use only one URL; no filters would return all sources. > > Are multiple search terms allowed? Yes. > Boolean AND or OR? We can add a parameter where this can be chosen.
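The single-endpoint registry query discussed above (one URL, all sources returned when no filters are given, plus an operator parameter for AND/OR) could be sketched as follows. The base URL, the field names, and the "operator" parameter name are assumptions drawn from this discussion, not a settled spec.

```python
from urllib.parse import urlencode

# Hypothetical registry search endpoint (assumption, not a published API).
BASE = "http://das.sanger.ac.uk/registry/find"

def registry_query(operator="AND", **filters):
    """Build a registry search URL; with no filters, return all sources."""
    params = dict(filters)
    if params:
        # Combine multiple search terms with a chooseable boolean operator.
        params["operator"] = operator
    return BASE + ("?" + urlencode(params) if params else "")

# No filters: one URL that lists everything.
all_sources_url = registry_query()
# Filtered search across two fields.
url = registry_query(organism="Homo sapiens", authority="NCBI")
```

Calling `registry_query()` with no arguments yields the bare endpoint, matching the "no filters would return all sources" behaviour agreed above.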
Greetings, Andreas ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 From dalke at dalkescientific.com Mon Nov 21 12:06:25 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 21 Nov 2005 18:06:25 +0100 Subject: [DAS2] DAS/2 weekly meeting notes for 14 Nov 05 In-Reply-To: References: Message-ID: <90dff63fdc1e5b32ba97f8c18948758e@dalkescientific.com> Going through the back emails to prepare for the conference call in 30 minutes. Andreas, replying to Steve's comment: >> For protein DAS, authority typically defines two diff coord systems: >> 'pdb resnum, interprot' > >> It does not permit automated translation between two coord systems. > > unfortunately this is not that easy in protein space. The mapping from > the 3D > protein structure to the protein sequence is not straightforward. > Think of > negative, non-consecutive, and "non-numeric" residue numbers that can > appear > in the 3D structures. Therefore we came up with the "alignment" DAS - > document > that allows to map one object in one coordinate system to another one. > it can > also be used to map one assembly to another. Regarding the structure mapping, when we visited the PDB in August they said it's not a problem. The mmCIF records have the information needed for the mapping. I've not looked into this though. > not only - the DAS clients usually can display a certain "coordinate > system" e.g. Ensembl can do > Chromosomal ones, but if DAS sources are available that speak the > "UniProt, Protein Sequence" coordinate > system, it knows how to project these onto the genome. - an > "intelligent DAS client" :-) I like the use case of "user wants to merge annotations from different servers. As DAS currently doesn't have liftover support, the DAS client needs to get annotations only from servers using the same reference coordinate system." 
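The merge use case above implies a simple client-side rule: without liftover support, only combine annotation sources that declare the same opaque coordinate-system identifier. A minimal sketch, assuming a source record with a "coords" field (the record layout is illustrative, not from any spec):

```python
# Sketch: keep only annotation sources whose declared coordinate system
# matches the client's reference. Identifiers are treated as opaque
# strings (e.g. "NCBI35"), exactly as described in the discussion.
def mergeable_sources(sources, reference_coords):
    """Return the sources safe to merge with the given reference."""
    return [s for s in sources if s.get("coords") == reference_coords]

sources = [
    {"name": "ensembl-genes", "coords": "NCBI35"},
    {"name": "old-snps", "coords": "NCBI34"},
    {"name": "washu-segdup", "coords": "NCBI35"},
]
names = [s["name"] for s in mergeable_sources(sources, "NCBI35")]
# names == ["ensembl-genes", "washu-segdup"]
```

Note the comparison is pure string equality: no name resolution is performed, which is the point of treating coordinate-system names as opaque.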
Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Nov 21 12:08:30 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 21 Nov 2005 18:08:30 +0100 Subject: [DAS2] Getting individual features in DAS/1 In-Reply-To: References: Message-ID: <7f239b885d3eca821639654862770c65@dalkescientific.com> Has anyone answered Ilari's question? I never used DAS/1 enough to answer it myself. If the normal DAS list is closed, is this the right place for DAS/1 questions? On Nov 18, 2005, at 4:22 PM, Ilari Scheinin wrote: > This mail is not really about DAS/2, but the web site says the > original DAS mailing list is now closed. > > I am setting up a DAS server that serves CGH data from my database to > a visualization software, which in my case is gbrowse. I've already > set up Dazzle that serves the reference data from a local copy of > Ensembl. I need to be able to select individual CGH experiments to be > visualized, and as the measurements from a single CGH experiment cover > the entire genome, this cannot of course be done by specifying a > segment along with the features command. > > I noticed that there is a feature_id option for getting the features > in DAS/1.5, but on a closer look, it seems to work by getting the > segment that the specified feature corresponds to, and then getting > all features from that segment. My next approach was to use the > feature type to distinguish between different CGH experiments. As all > my data is of the type CGH, I thought that I could use spare this > piece of information for identifying purposes. > > First I tried the generic seqfeature plugin. I created a database for > it with some test data. However, getting features by type does not > seem to work. I always get all the features from the segment in > question. > > Next I tried the LDAS plugin. Again I created a compatible database > with some test data. 
I must have done something wrong the the data > file I imported to the database, because getting the features does not > work. I can get the feature types, but trying to get the features > gives me an ERRORSEGMENT error. > > I thought that before I go further, it might be useful to ask whether > my approach seems reasonable, or is there a better way to achieve what > I am trying to do? What should I do to be able to visualize individual > CGH profiles? > > I'm grateful for any advice, > Ilari Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Nov 21 12:25:06 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 21 Nov 2005 18:25:06 +0100 Subject: [DAS2] das registry and das2 In-Reply-To: References: <240aa4ff660b7427b6c463ffc10b1307@sanger.ac.uk> <4569f5d3ff6658e5ead6b979e8b1fba9@dalkescientific.com> Message-ID: <21a521b096330a81bfa05b0789d3c92d@dalkescientific.com> Andreas Prlic wrote: > Therefore the "coordinate system" or "namespace" is an important part > of the description of a DAS source. > > What I found in the current spec-draft that comes closest to this > issue is the different "domains" > e.g > > http://server/das/genome/source/version/features > > so I might want to say > http://server/das/genome/homosapiens/ncbi35/features > http://server/das/genome/musmusculus/ncbim34/features > > or should it be > http://server/das/genome/ncbi/homosapiens35/features > http://server/das/genome/ncbi/musmusculus34/features > ? > > Hm. I am not sure, but it seems that one level is missing? - either > organism or authority ? The species information is available from the data source from the 'taxon' attribute, as in It's not available through a URL naming. That's arbitrary in that the data provider can use any term. I think there's nothing to preclude a provider from putting the actual source data one level deeper in the tree. Personally I find that that's over-classification. Who would use it? 
> Currently the registry provides a restricted list of allowed > coordinate systems, to keep this under control. Thomas: > This is possibly an argument for avoiding the use of URLs for assembly > identifiers, if we can't be sure that the organisation that's the > authority for a given assembly will be running an authoritative DAS > server. URNs would be fine, as would the kind of structured but > location-independent identifier that Andreas has been using. I think there's no reason we can't use our own names for these. E.g., http://www.biodas.org/coordinates/NCBI35 or a simple unique id like "NCBI35". Right now those are treated as opaque identifiers. There's no name resolution going on, and the coordinates are (I assume) implicit in that client software doesn't resolve the name, only check that the servers are returning data from the same coordinate system. Perhaps in the future that URL might resolve to something, but there's no current reason to do so. In the renewal grant there is reason to compare different coordinates. When that happens a client needs to pick one reference frame and get the translation information to the other. So the liftover service needs to know about the two coordinate systems. But it can be done through hard-coded information (perhaps with some information that coordinate system X is an alias for Y). I still don't think there's any need to resolve these URLs. Andreas: >> Are multiple search terms allowed? > > yes Then they should likely be along the same lines used for the DAS/2 searching. >> Boolean AND or OR? > > We can add a parameter where this can be chosen. The existing DAS/2 uses an AND search only. Rather, "OR" for multiple fields of the same data type and "AND" across different fields.
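The DAS/2 search semantics Andrew describes above (OR between multiple values of the same field, AND across different fields) can be sketched as a small predicate. The record layout and field names are illustrative assumptions:

```python
# Sketch of "OR within a field, AND across fields" matching.
# `filters` maps a field name to the list of acceptable values.
def matches(record, filters):
    return all(
        record.get(field) in values      # OR within a field via membership,
        for field, values in filters.items()  # AND across fields via all().
    )

records = [
    {"organism": "Homo sapiens", "authority": "NCBI"},
    {"organism": "Mus musculus", "authority": "NCBI"},
]
hits = [r for r in records
        if matches(r, {"organism": ["Homo sapiens", "Mus musculus"],
                       "authority": ["NCBI"]})]
# Both records match: either organism value is accepted (OR), and the
# authority filter must also hold (AND).
```

A record fails as soon as any one field has no acceptable value, which is what makes the cross-field combination an AND.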
Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Mon Nov 21 12:24:37 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 21 Nov 2005 09:24:37 -0800 Subject: [DAS2] Getting individual features in DAS/1 Message-ID: We need to discuss at today's meeting. I don't think the original DAS list should be closed, but rather continue to serve as a list to discuss the DAS/1 protocol and implementations, and the DAS2 mailing list should focus on DAS/2. If we mix DAS/1 and DAS/2 discussions in the same mailing list I think it's going to lead to a lot of confusion. gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Andrew Dalke > Sent: Monday, November 21, 2005 9:09 AM > To: DAS/2 > Subject: Re: [DAS2] Getting individual features in DAS/1 > > Has anyone answered Ilari's question? > > I never used DAS/1 enough to answer it myself. > > If the normal DAS list is closed, is this the right place for DAS/1 > questions? > > > On Nov 18, 2005, at 4:22 PM, Ilari Scheinin wrote: > > > This mail is not really about DAS/2, but the web site says the > > original DAS mailing list is now closed. > > > > I am setting up a DAS server that serves CGH data from my database to > > a visualization software, which in my case is gbrowse. I've already > > set up Dazzle that serves the reference data from a local copy of > > Ensembl. I need to be able to select individual CGH experiments to be > > visualized, and as the measurements from a single CGH experiment cover > > the entire genome, this cannot of course be done by specifying a > > segment along with the features command. > > > > I noticed that there is a feature_id option for getting the features > > in DAS/1.5, but on a closer look, it seems to work by getting the > > segment that the specified feature corresponds to, and then getting > > all features from that segment. 
My next approach was to use the > > feature type to distinguish between different CGH experiments. As all > > my data is of the type CGH, I thought that I could use spare this > > piece of information for identifying purposes. > > > > First I tried the generic seqfeature plugin. I created a database for > > it with some test data. However, getting features by type does not > > seem to work. I always get all the features from the segment in > > question. > > > > Next I tried the LDAS plugin. Again I created a compatible database > > with some test data. I must have done something wrong the the data > > file I imported to the database, because getting the features does not > > work. I can get the feature types, but trying to get the features > > gives me an ERRORSEGMENT error. > > > > I thought that before I go further, it might be useful to ask whether > > my approach seems reasonable, or is there a better way to achieve what > > I am trying to do? What should I do to be able to visualize individual > > CGH profiles? > > > > I'm grateful for any advice, > > Ilari > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Steve_Chervitz at affymetrix.com Mon Nov 21 15:15:41 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 21 Nov 2005 12:15:41 -0800 Subject: [DAS2] DAS/2 weekly meeting notes for 21 Nov 05 Message-ID: Notes from the weekly DAS/2 teleconference, 21 Nov 2005. $Id: das2-teleconf-2005-11-21.txt,v 1.3 2005/11/21 20:15:28 sac Exp $ Attendees: Affy: Steve Chervitz, Gregg Helt UCLA: Allen Day, Brian O'connor UCBerkeley: Suzi Lewis, Nomi Harris Sweden: Andrew Dalke Sanger: Andreas Prlic Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2005. 
Instructions on how to access this repository are at http://biodas.org Today's topic: Client-Server implementation issues ---------------------------------------------------- Suzi/Nomi --------- Questions for Gregg: How to communicate styles in DAS/2? GH: Client gets stylesheets from the server that suggest how to render things. AD: EBI uses this a lot. Most of the DAS systems there use stylesheets. [A] Andreas will contact folks at Sanger/EBI for stylesheet example code. GH: The IGB client uses a preference configuration, using java preferences rather than a special XML file. Windows: sets values in the registry. Has been successful. If client can understand DAS/2 stylesheets and client-side prefs, the client-side prefs should override the server styles (others agree). Steve ----- * Reported on some analysis of Affymetrix DAS server weblogs. Lots of google-bot data download. Lots of Spotfire hits, too. BO: Google bots should respect robots.txt [A] Steve will install robots.txt in the relevant locations * Reported on getting Gregg's DAS/2 server to run on top of apache rather than as a stand-alone server. Should be a matter of hooking apache up to tomcat using a tomcat connector. Directive for apache to defer to tomcat for servlet requests. [A] Steve will hook up affy das server to apache/tomcat. Gregg ----- * Regarding Spotfire - they are working on an IGB plugin for Spotfire using http localhost API. This explains our Spotfire hits. Gregg was previously integrating IGB with Spotfire using a java to COM bridge. It works, but the COM bridges aren't free etc. etc. They are interested in driving IGB from Spotfire since they're interested in using IGB to provide genome visualization. Are currently evaluating whether to release it to the public or not. Gregg considered putting this in the grant, but would have required permission, etc. and time was a factor. They may eventually commit to IGB code base directly, but still need to work out legalese.
They will be interested in tracking the interclient API work we are doing (IGB-Apollo). * No major work on DAS this week, just some niggling IGB issues. * Planning another IGB release by end of year that will have improvements to DAS/2 clients. Fixed: access via quickload then access to DAS/2 causes blankout of screen. Fixed: DAS/2 interaction Brian ----- * Marc C has committed stuff to IGB code base (genoviz). Is there a test suite we can use to verify we're not breaking anything? GH: No, but hopefully early next year. Definitely needed. * Also checked in the re-factor - separate namespaces for assay and ontology. [A] Gregg will relocate das2 package to com.affy.das2 & uncouple from IGB GH: There are a few igb dependencies to be unraveled (das2feature...). Don't want to do this in the next release since that's pretty significant given upcoming holidays. GH: Other features to get in: * Persistence of preferences. * Get rid of hardwiring of DAS2 servers. Already do this for DAS/1, just need to replicate for DAS/2. Allen ----- * API for handling ontologies, structures. Communication with Chris Mungall. * Have impl at Stanford for autocompletion of ontology terms related to samples (Gavin Sherlock's group, SMD). What is bioontology group doing for distributing their ontologies, what APIs are going to be made public? SL: Am at Stanford right now to talk about that. Will offer bulk things like at the OBO site, but in terms of interactive API, will respond to community as best we can. Allen: Interested in more integration with bioontology group and with his work with SMD. Suzi: Not content, but tools right? Allen: Yes. Suzi: Work with Chris. Timing couldn't be better. [A] Allen will work with Chris M re: ontology API tools for OBO & SMD * GH: Progress on writeback? Part of grant proposal to get it done by June. Will help funding continuation. Allen: We could start implementing some of that given the refactoring that's now done.
GH: Ed Griffith at Sanger is interested in this. Hoping for his participation. In the short timeframe, your server wouldn't have to implement it as long as there is at least one server available that can do it. Allen: Need to look at work load. There's no lack of work to be done for get requests (faster impls). GH: Would prefer to have just one writeback server and a faster get server rather than having two writeback capable servers. * Allen: Optimizations involving serving files, kind of a report-version of the chado adapters. GH: Regarding your rounding ranges optimization for tiling, can you post it to the list? [A] Allen will post his rounding ranges optimization to DAS/2 list GH: The idea is to help server-side caching by rounding the range requests so you're more likely to hit the same URI (e.g., stop=5010 becomes 6000). Different clients are more likely to hit the cache. Not in the spec, just a convention. Requires more smarts in client: giving more to the user than they asked for, or throwing out what's not asked for. Throwing out what they didn't ask for would be nicer. In theory, this won't be an issue with client caching. SC: Could make client's configuration re: rounding an option. GH: Users want fewer options. * IGB display troubles. Allen had trouble getting it to display anything besides mRNA. GH: IGB expects 2-level or deeper annotations. For single-level annots, should connect all with a line. Allen: May be doing this for SNPs. But also saw some strange responses. GH: Needs a fix. Allen: will it be in the next release? GH: harder to do it generally -- easier to hardwire it for particular data types. Rendering has to guess how deep you want to go. Currently goes to the leaves and then goes 1-level up, rather than top-down. IGB uses one more level than you actually see to keep track of other things (e.g., region in query). Preferences UI: 'nested' can select two-level or one-level deep. Would like to hear what other ones you have problems with.
[A] Gregg will fix IGB display problems for single-level annots. Andrew ------ * Emailed open-bio root list to set up cgi for online verifier. But no response yet. * DAS/1 vs DAS/2 mailing list. GH: Confusion may occur if we combine DAS/1 and DAS/2 discussion. Let's keep DAS/1 for all DAS/1 spec related discussion. [A] Steve will verify whether the DAS/1 list is still alive. [A] Steve will put a link to it on biodas.org for the DAS/1 list * Locking: Plan to talk to EBI about this in January. They are doing work on stylesheets. [A] Andrew will ask Ed G. to join these meetings * Needs test data, mock data set. [A] Allen will point Andrew at some data for testing. Andreas ------- * The current registry implementation: Written in Java; two ways to interact: 1) HTML: can browse available DAS sources, see details, go back to DAS client and activate the DAS source in the DAS client. 2) SOAP: client contacts registry, gets list of available sources. It is open source. [A] Andreas will post link to source code for DAS registry impl. GH: A central registry is good, but companies will want their own. E.g., at Affy there may be 5-7. Andreas: It's possible to have a set of registries, local vs. public. GH: Are you OK with the idea of having an http-based interface? It can run on top of existing core. Andreas: Sure. [A] Andreas will provide http-based interface to Sanger DAS registry Agenda for next week's teleconf ----------------------------- * Talk more about registry spec issues * Retrieval spec issues: - Content-type - DAS/2 headers - Feature and type properties - other things? Andrew: Prefer to have most of the discussion online (DAS/2 list); then the teleconf can be more productive.
[A] Continue discussing spec issues on the list before next teleconf From allenday at ucla.edu Mon Nov 21 15:47:51 2005 From: allenday at ucla.edu (Allen Day) Date: Mon, 21 Nov 2005 12:47:51 -0800 (PST) Subject: [DAS2] tiled queries for performance Message-ID: Hi, I had an idea of how clients may be able to get better response from servers by using a tiled query technique. Here's the basic idea: ClientA wants features in chr1/1010:2020, and issues a request for that range. No other clients have previously requested this range, so the server-side cache faults to the DAS/2 service (slow). ClientB wants features in chr1/1020:2030, and issues a request for that range. Although the intersection of the resulting records with ClientA's query is large, the URIs are different and the server-side cache faults again. If ClientA and ClientB were to each issue two separate "tiled" requests: 1. chr1/1001:2000 2. chr1/2001:3000 ClientB could take advantage of the fact that ClientA had been looking at the same tiles. For this to work, the clients would need to be using the same tile size. The optimal tile size is likely to vary from datasource to datasource, depending on the length and density distributions of the features contained in the datasource. The "sources" or "versioned sources" payload could suggest a tiling size to prospective clients. Servers could also pre-cache all tiles by hitting each tile after an update of the datasource (or the DAS/2 service code). The tradeoff for the performance gains is that clients may now need to do filtering on the returned records to only return those requested by the client's client. -Allen From ap3 at sanger.ac.uk Tue Nov 22 08:54:27 2005 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Tue, 22 Nov 2005 13:54:27 +0000 Subject: [DAS2] das registry links Message-ID: Hi! There was a question yesterday where to get the source code from the das-registration server and if it is possible to have a local installation. 
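Allen's tiled-query idea above amounts to snapping a requested range outward to fixed-size tiles, so that different clients issue identical (and therefore cacheable) request URIs. A minimal sketch, where the tile size of 1000 and the 1-based tile arithmetic are assumptions for illustration (the spec proposal is that the server would suggest the tile size in its sources document):

```python
# Sketch of Allen's tiling scheme: map a requested range onto the
# fixed-size tiles that cover it, so overlapping requests from
# different clients hit the same cached tile URIs.
TILE = 1000  # assumed tile size; in practice suggested per-datasource

def tiles_for(start, stop, tile=TILE):
    """Return the 1-based tile ranges covering [start, stop]."""
    first = (start - 1) // tile   # index of the first tile touched
    last = (stop - 1) // tile     # index of the last tile touched
    return [(i * tile + 1, (i + 1) * tile) for i in range(first, last + 1)]

# Both overlapping requests resolve to the same two tiles:
a = tiles_for(1010, 2020)   # ClientA's range
b = tiles_for(1020, 2030)   # ClientB's range
# a == b == [(1001, 2000), (2001, 3000)]
```

As Allen notes, the tradeoff is client-side filtering: after fetching whole tiles, the client must discard the returned records that fall outside the range its own caller actually asked for.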
The source code for the registry is available under the LGPL at

http://www.derkholm.net/svn/repos/dasregistry/trunk/

using subversion. To obtain a local installation, which caches/synchronizes the publicly available data and allows you to add local DAS sources, see the instructions at:

http://www.derkholm.net/svn/repos/dasregistry/trunk/release/install.txt

There is also a das-registry announce mailing list at

http://lists.sanger.ac.uk/mailman/listinfo/das_registry_announce

Regards,
Andreas

-----------------------------------------------------------------------
Andreas Prlic, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK, +44 (0) 1223 49 6891

From ap3 at sanger.ac.uk Tue Nov 22 12:58:08 2005
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Tue, 22 Nov 2005 17:58:08 +0000
Subject: [DAS2] ensembl & stylesheet
Message-ID: 

Hi!

Another question yesterday was about ensembl & stylesheet support. An example das source that provides a stylesheet is the following:

http://das.ensembl.org/das/ens_35_segdup_washu/stylesheet

A description of it is at:

http://das.ensembl.org/das/ens_35_segdup_washu/

To show how it is rendered in ensembl, follow this "auto-activation" link:

http://www.ensembl.org/Homo_sapiens/contigview?conf_script=contigview;c=17:14149999.5:1;w=200000;h=;add_das_source=(name=SEGDUP_WASHU+url=http://das.ensembl.org/das+dsn=ens_35_segdup_washu+type=ensembl_location+color=black+strand=r+labelflag=U+stylesheet=Y+group=Y+depth=9999+score=N+active=1)

In terms of source code, ensembl uses the Bio::DASLite perl module for fetching features and stylesheets:

http://search.cpan.org/~rpettett/Bio-DasLite-0.10/

Hope this helps,

Cheers,
Andreas

-----------------------------------------------------------------------
Andreas Prlic, Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK, +44 (0) 1223 49 6891

From gilmanb at pantherinformatics.com Mon Nov 21 16:46:25 2005
From: gilmanb at pantherinformatics.com (Brian Gilman)
Date: Mon, 21 Nov 2005 16:46:25 -0500
Subject:
[DAS2] tiled queries for performance
In-Reply-To: 
References: 
Message-ID: <2042BBCD-8490-461D-80C1-1BB4A1FAACB1@pantherinformatics.com>

Hello Everyone,

I've been lurking on the list and wanted to say hi. We're looking into this kind of implementation issue ourselves and thought that a bittorrent-like cache makes the most sense, i.e. all servers in the "fabric" are issued the query in a certain "hop adjacency". These servers then send their data to the client, whose job it is to assemble the data.

HTH,

-B

--
Brian Gilman
President
Panther Informatics Inc.
E-Mail: gilmanb at pantherinformatics.com
        gilmanb at jforge.net
AIM: gilmanb1

01000010 01101001 01101111 01001001 01101110 01100110 01101111 01110010 01101101 01100001 01110100 01101001 01100011 01101001 01100001 01101110

On Nov 21, 2005, at 3:47 PM, Allen Day wrote:

> Hi,
>
> I had an idea of how clients may be able to get better response from
> servers by using a tiled query technique. Here's the basic idea:
>
> ClientA wants features in chr1/1010:2020, and issues a request for that
> range. No other clients have previously requested this range, so the
> server-side cache faults to the DAS/2 service (slow).
>
> ClientB wants features in chr1/1020:2030, and issues a request for that
> range. Although the intersection of the resulting records with ClientA's
> query is large, the URIs are different and the server-side cache faults
> again.
>
> If ClientA and ClientB were to each issue two separate "tiled" requests:
>
> 1. chr1/1001:2000
> 2. chr1/2001:3000
>
> ClientB could take advantage of the fact that ClientA had been looking at
> the same tiles.
>
> For this to work, the clients would need to be using the same tile size.
> The optimal tile size is likely to vary from datasource to datasource,
> depending on the length and density distributions of the features
> contained in the datasource. The "sources" or "versioned sources"
> payload could suggest a tiling size to prospective clients.
> Servers could
> also pre-cache all tiles by hitting each tile after an update of the
> datasource (or the DAS/2 service code).
>
> The tradeoff for the performance gains is that clients may now need to do
> filtering on the returned records to only return those requested by the
> client's client.
>
> -Allen
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2

From Steve_Chervitz at affymetrix.com Wed Nov 23 11:03:55 2005
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Wed, 23 Nov 2005 08:03:55 -0800
Subject: [DAS2] Simple Sharing Extensions for RSS and OPML
Message-ID: 

This may have some concepts relevant to DAS/2 writeback:

http://msdn.microsoft.com/xml/rss/sse/

Steve

From allenday at ucla.edu Wed Nov 23 18:50:24 2005
From: allenday at ucla.edu (Allen Day)
Date: Wed, 23 Nov 2005 15:50:24 -0800 (PST)
Subject: [DAS2] tiled queries for performance
In-Reply-To: 
References: 
Message-ID: 

More thoughts on this.

The client can eliminate the redundancy in the records returned by issuing the tiling queries as previously described (query1), then issuing queries for records that are not contained within tiles, but overlap the boundaries of 1 or more tiles (query2).

However, by issuing all the overlaps queries at once, we've just deferred the performance hit one step, because we can't reasonably expect the server to have cached all combinations of tile overlaps queries. I think, to get this tiling optimization to work, the burden needs to be on the client to identify and remove duplicate responses for multiple edge-overlaps queries (query3).

  1000bp          2000bp          3000bp
    |               |               |
    |    ===        |  =====^====   |
    |          ====#=====           |
    |  ============#=============#=====
    |               |               |
    <-----------> query1a
                  <-----------> query1b
           query2
                query3a         query3b

Key:
  | : tile boundary
  = : feature
  ^ : gap between child features
  # : portion of feature overlapping tile boundary.
<-> : client overlaps query
  <.> : client contains query

-Allen

On Mon, 21 Nov 2005, Allen Day wrote:

> Hi,
>
> I had an idea of how clients may be able to get better response from
> servers by using a tiled query technique. Here's the basic idea:
>
> ClientA wants features in chr1/1010:2020, and issues a request for that
> range. No other clients have previously requested this range, so the
> server-side cache faults to the DAS/2 service (slow).
>
> ClientB wants features in chr1/1020:2030, and issues a request for that
> range. Although the intersection of the resulting records with ClientA's
> query is large, the URIs are different and the server-side cache faults
> again.
>
> If ClientA and ClientB were to each issue two separate "tiled" requests:
>
> 1. chr1/1001:2000
> 2. chr1/2001:3000
>
> ClientB could take advantage of the fact that ClientA had been looking at
> the same tiles.
>
> For this to work, the clients would need to be using the same tile size.
> The optimal tile size is likely to vary from datasource to datasource,
> depending on the length and density distributions of the features
> contained in the datasource. The "sources" or "versioned sources"
> payload could suggest a tiling size to prospective clients. Servers could
> also pre-cache all tiles by hitting each tile after an update of the
> datasource (or the DAS/2 service code).
>
> The tradeoff for the performance gains is that clients may now need to do
> filtering on the returned records to only return those requested by the
> client's client.
> > -Allen > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From Steve_Chervitz at affymetrix.com Wed Nov 23 20:40:13 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Wed, 23 Nov 2005 17:40:13 -0800 Subject: [DAS2] Ontology Lookup Service Message-ID: Allen, This looks similar to what you have been working on for SMD: http://www.ebi.ac.uk/ontology-lookup/ Would be interesting to compare it with your ontology DAS-based implementation (e.g., performance, ease of installation, extending, etc.). Steve From dalke at dalkescientific.com Wed Nov 23 21:52:35 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 24 Nov 2005 03:52:35 +0100 Subject: [DAS2] tiled queries for performance In-Reply-To: References: Message-ID: Allen: > No other clients have previously requested this range, so the > server-side cache faults to the DAS/2 service (slow). Admittedly I'm curious about this. Why is this slow? What does slow mean? I assume "cannot be returned faster than the network will take it." How many annotations are in the database? Figuring one annotation for every ... 100 bases? gives me 30 million. Shouldn't a range search over < only 30 million be fast? Is this being done in the database? Which database and what's the SQL? If the DB is the bottleneck then pulling it out as a specialized search might be worthwhile. What I'm driving at for this is this. The proposal feels like a workaround for a given implementation. To use it requires more smarts in the client. Why not put that logic on the server? Andrew dalke at dalkescientific.com From allenday at ucla.edu Thu Nov 24 02:10:36 2005 From: allenday at ucla.edu (Allen Day) Date: Wed, 23 Nov 2005 23:10:36 -0800 Subject: [DAS2] tiled queries for performance In-Reply-To: References: Message-ID: <5c24dcc30511232310p1623ff4dk9088579cdf58e082@mail.gmail.com> Hi Andrew. 
I'd like to be able to consistently get network-bottlenecked response from the server. The largest (250 megabase) SQL range queries typically take ~30 seconds to complete, returning ~500K features. I'm currently working on getting the templating system (Template Toolkit aka TT2) we use to flush to the client periodically, rather than building the entire response first. This is the current bottleneck; TT2 generation of a 500K record XML document takes many minutes. Regardless of how much more optimization work we put into the server, it's never going to be as fast as serving up pre-queried, pre-rendered content.

I borrowed the idea of tiling from the Google maps application (maps.google.com). In their implementation the server is dumb, and just serves up a static HTML/Javascript document (the application), and static PNG images based on latitude/longitude coordinates (the data). All of the application logic for what to display occurs client side. Classic AJAX.

In the DAS protocol, the application logic is distributed between the client and server, sometimes to ill effect. Requiring both (a) the server to respond to arbitrary range queries, and (b) the client to display arbitrary ranges unnecessarily creates a bifurcation of the View component of the application. Brian was hinting at this when he mentioned the idea of bittorrent blocks earlier in the thread.

We also require code redundancy between client and server to be able to fully use the type and exacttype filters. In this case the Model component has been bifurcated -- the client needs to build a model of the ontology (from who knows where... presumably processing OBO-Edit files) so the user can issue queries, and the server needs to also have some representation of the ontology to generate a response.

Hopefully the ontology DAS extension will help the latter situation outlined above by getting both client and server to be synchronized on the same data model.
As far as the tiling optimization goes, it's likely that I'll implement a preprocessor for the HTTP query so I can break it into tiles -- conceptually very similar to the log10 binning that Lincoln does in the GFF database. -Allen On 11/23/05, Andrew Dalke wrote: > > Allen: > > No other clients have previously requested this range, so the > > server-side cache faults to the DAS/2 service (slow). > > Admittedly I'm curious about this. Why is this slow? What does > slow mean? I assume "cannot be returned faster than the network > will take it." > > How many annotations are in the database? Figuring one annotation > for every ... 100 bases? gives me 30 million. Shouldn't a range > search over < only 30 million be fast? Is this being done in the > database? Which database and what's the SQL? > > If the DB is the bottleneck then pulling it out as a specialized > search might be worthwhile. > > What I'm driving at for this is this. The proposal feels like > a workaround for a given implementation. To use it requires > more smarts in the client. Why not put that logic on the server? > > > Andrew > dalke at dalkescientific.com > > From allenday at ucla.edu Thu Nov 24 02:21:48 2005 From: allenday at ucla.edu (Allen Day) Date: Wed, 23 Nov 2005 23:21:48 -0800 Subject: [DAS2] Re: Ontology Lookup Service In-Reply-To: References: Message-ID: <5c24dcc30511232321v70f77dc9y7a1ceef22bcf6edc@mail.gmail.com> Hi Steve. Yes, this is pretty similar to what we're doing. The major differences I see are (a) the query flexibility -- It only lets you retrieve terms from one ontology at a time, and does not support wildcards (b) the display -- it doesn't actually show you the dag structure of the ontology, and (c) using different tech -- Java/SOAP as opposed to Perl/ReST. 
-Allen On 11/23/05, Steve Chervitz wrote: > > Allen, > > This looks similar to what you have been working on for SMD: > > http://www.ebi.ac.uk/ontology-lookup/ > > Would be interesting to compare it with your ontology DAS-based > implementation (e.g., performance, ease of installation, extending, etc.). > > Steve > > From dalke at dalkescientific.com Thu Nov 24 08:28:00 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 24 Nov 2005 14:28:00 +0100 Subject: [DAS2] tiled queries for performance In-Reply-To: <5c24dcc30511232310p1623ff4dk9088579cdf58e082@mail.gmail.com> References: <5c24dcc30511232310p1623ff4dk9088579cdf58e082@mail.gmail.com> Message-ID: <9eb929192db24ad93fb2a7cf423aa9c3@dalkescientific.com> Allen: > I'd like to be able to consistently get network-bottlenecked response > from the server.? The largest (250 megabase) SQL range queries > typically take ~30 seconds to complete, returning ~500K features.? I'm > currently working on getting the templating system (Template Toolkit > aka TT2) we use to flush to the client periodically, rather than > building the entire response first.? This is the current bottleneck; > TT2 generation of a 500K record XML document takes many minutes.? > Regardless of how much more optimization work we put into the server, > it's never going to be as fast as serving up pre-queried, pre-rendered > content. Interesting. So I was right, in that the range search is fast, but wrong in not considering the template generation problem. Could that cause a DoS attack by asking for several large ranges at once? You're building up multi-megabyte strings in memory. (If 1 feature is 1K then that's 500MB.) Ideologically the clean solution might be to have the search return only a list of identifiers and have the client fetch each feature one-by-one. This is a tile size of 1. Implementation-wise this will cause problems unless using HTTP 1.1 pipelining since the act of opening 500K connections takes non-trivial time. 
Adding a "return XML for these ids" service doesn't help either - it brings us back to the same problem. But another solution is to cache all the features as XML, leaving out only the header and footer. Skip the templating system (rather, it's upstream of the caching). Do the search, get the ids, and stream the contents directly from the cache. This would be used in feature lookup and for search results. > In the DAS protocol, the distribution of the application logic is > distributed between the client and server, sometimes to ill effect.? > Requiring both (a) the server to respond to arbitrary range queries, > and (b) the client to display arbitrary ranges unnecessarily creates a > bifurcation of the View component of the application.? Brian was > hinting at this when he mentioned the idea of bittorrent blocks > earlier in the thread. What application logic? There should be many ways to build different applications on top of DAS. DAS is a data model. The client provides the view (or many views). There are two reasons for query support on the server. 1. slow bandwidth and limited client resources - otherwise clients could download and search the data locally 2. easier support for (certain classes of) application developers To make the Google comparison, there's no reason Google searches couldn't take place on your personal machine except that you can't download the Internet and search it in usable time. With Google providing the service others can do things like provide domain-specific web searches via Google, include Google links in a web browser, or make something like Googlefight. > We also require code redundancy between client and server to be able > to fully use the type and exacttype filters.? In this case the Model > component has been bifurcated -- the client needs to build a model the > ontology (from who knows where... 
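The fragment cache sketched a couple of paragraphs up (pre-render each feature's XML once, then answer searches by streaming cached fragments between a fixed header and footer) might look roughly like this. All names below are invented for illustration; this is not server code from the thread.

```python
# Sketch of a per-feature XML fragment cache: the expensive rendering
# step runs once per feature, and responses are streamed by
# concatenating cached fragments. All names are hypothetical.
from typing import Dict, Iterable, Iterator

xml_cache: Dict[str, str] = {}  # feature id -> pre-rendered XML fragment

def render_feature(fid: str) -> str:
    # Stand-in for the slow templating step (e.g. TT2 in Allen's server).
    return f'  <FEATURE id="{fid}" />\n'

def get_fragment(fid: str) -> str:
    if fid not in xml_cache:
        xml_cache[fid] = render_feature(fid)
    return xml_cache[fid]

def stream_response(ids: Iterable[str]) -> Iterator[str]:
    # Yield incrementally so the client starts receiving bytes early.
    yield "<FEATURES>\n"
    for fid in ids:
        yield get_fragment(fid)
    yield "</FEATURES>\n"

# Ids would come from the (fast) range search; f1 is rendered only once.
response = "".join(stream_response(["f1", "f2", "f1"]))
```

The same cache serves both single-feature lookups and large search results, which is the point of skipping the templating system on the hot path.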
presumably processing OBO-Edit > files) so the user can issue queries, and the server needs to also > have some representation of the ontology to generate a response. > > Hopefully the ontology DAS extension will help the latter situation > outlined above by getting both client and server to be synchronized on > the same data model.? As far as the tiling optimization goes, it's > likely that I'll implement a preprocessor for the HTTP query so I can > break it into tiles -- conceptually very similar to the log10 binning > that Lincoln does in the GFF database. I didn't follow this. Code redundancy means what? There's an exchange of data models - in this case the model for a query. But any client/server needs to do this. Take Entrez, for example. It supports many types of search fields, including MeSH (which I think counts as an ontology). A sophisticated client may have a GUI to help people identify MeSH terms. This obviously does some duplicate work as with the server. Is that what you mean? If so, why does it matter? Note also that while Google Maps serves static images only, there's shared logic between the application (in the browser) and the tools that generated those maps. Eg, both have the same code for understanding geography/latitude&longitude. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Nov 24 08:47:26 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 24 Nov 2005 14:47:26 +0100 Subject: [DAS2] tiled queries for performance In-Reply-To: <2042BBCD-8490-461D-80C1-1BB4A1FAACB1@pantherinformatics.com> References: <2042BBCD-8490-461D-80C1-1BB4A1FAACB1@pantherinformatics.com> Message-ID: <22110007fe53238adbda91041ee1baf2@dalkescientific.com> Hi Brian, > We're looking into this kind of implementation issue ourselves and > thought that a bitorrent like cache makes the most sense. ie. all > servers in the "fabric" are issued the query in a certain "hop > adjacency". 
These servers then send their data to the client who's job > it is to assemble the data. I go back and forth between the "large data set" model and the "large number of entities" model. In the first: - client requests a large data file - server returns it This can be sped up by distributing the file among many sites and using something like BitTorrent to put it together, or something like Coral ( http://www.coralcdn.org/ ) to redirect to nearby caches. But making the code for this is complicated. It's possible to build on BitTorrent and similar systems, but I have no feel for the actual implementation cost, which makes me wary. I've looked into a couple of the P2P toolkits and not gotten the feel that it's any easier than writing HTTP requests directly. Plus, who will set up the alternate servers? In the second: - make query to server - server returns list of N identifiers - make N-n requests (where 'n' is the number of identifiers already resolved) The id resolution can be done in a distributed fashion and is easily supported via web caches, either with well-configured proxies or (again) through Coral. I like the latter model in part because it's more fine grained. Eg, a progress bar can say "downloading feature 4 of 10000", and if a given feature is already present there's no need to refetch it. The downside of the 2nd is the need for HTTP 1.1 pipelining to make it be efficient. I don't know if we want to have that requirement. Gregg came up with the range restrictions because most of the massive results will be from range searches. By being a bit more clever about tracking what's known and not known, a client can get a much smaller results page. These are complementary. Using Gregg's restricted range queries can reduce the number of identifiers returned in a search, making the network overhead even smaller. 
Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Fri Nov 25 10:21:21 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 25 Nov 2005 16:21:21 +0100
Subject: [DAS2] DAS intro
Message-ID: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com>

The front of the DAS doc starts

    DAS 2.0 is designed to address the shortcomings of DAS 1.0, including:

That kinda assumes people know what DAS 1.0 is to understand DAS 2.0. How about this instead, as an overview/introduction.

======

DAS/2 describes a data model for genome annotations. An annotation server provides information about one or more genome sources. Each source may have one or more versions. Different versions are usually based on different assemblies. As an implementation detail an assembly and corresponding sequence data may be distributed via a different machine, which is called the reference server. Portions of the assembly may have higher relative accuracy than the assembly as a whole. A reference server may supply these portions as an alternate reference frame.

Annotations are located on the genome with a start and end position. The range may be specified multiple times if there are alternate reference frames. An annotation may contain multiple non-contiguous parts, making it the parent of those parts. Some parts may have more than one parent. Annotations have a type based on terms in SOFA (Sequence Ontology for Feature Annotation). Stylesheets contain a set of properties used to depict a given type.

Annotations can be searched by range, type, and a properties table associated with each annotation. These are called feature filters.

DAS/2 is implemented using a ReST architecture. Each entity (also called a document or object) has a name, which is a URL. Fetching the URL gets information about the entity. The DAS-specific entities are all XML documents. Other entities contain data types with an existing and frequently used file format.
Where possible, a DAS server returns data using existing formats. In some cases a server may describe how to fetch a given entity in several different formats. ====== Andrew dalke at dalkescientific.com From asims at bcgsc.ca Fri Nov 25 14:15:17 2005 From: asims at bcgsc.ca (Asim Siddiqui) Date: Fri, 25 Nov 2005 11:15:17 -0800 Subject: [DAS2] tiled queries for performance Message-ID: <86C6E520C12E52429ACBCB01546DF4D3BE3E5E@xchange1.phage.bcgsc.ca> Hi, I'm a newbie to this list, so apologies if I've missed something critical. I think this is a great idea. I don't see this as a big change to the DAS/2 spec or requiring much in the way of additional smarts on the client side. The change is simply that instead of the client getting exactly what it asks for, it may get more. My 2 cents, Asim -----Original Message----- From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Allen Day Sent: Wednesday, November 23, 2005 11:11 PM To: Andrew Dalke; DAS/2 Subject: Re: [DAS2] tiled queries for performance Hi Andrew. I'd like to be able to consistently get network-bottlenecked response from the server. The largest (250 megabase) SQL range queries typically take ~30 seconds to complete, returning ~500K features. I'm currently working on getting the templating system (Template Toolkit aka TT2) we use to flush to the client periodically, rather than building the entire response first. This is the current bottleneck; TT2 generation of a 500K record XML document takes many minutes. Regardless of how much more optimization work we put into the server, it's never going to be as fast as serving up pre-queried, pre-rendered content. I borrowed the idea of tiling from the Google maps application ( maps.google.com). In their implementation the server is dumb, and just serves up a static HTML/Javascript document (the application), and static PNG images based on latitute/longitude coordinates (the data). 
All of the application logic for what to display occurs client side. Classic AJAX. In the DAS protocol, the distribution of the application logic is distributed between the client and server, sometimes to ill effect. Requiring both (a) the server to respond to arbitrary range queries, and (b) the client to display arbitrary ranges unnecessarily creates a bifurcation of the View component of the application. Brian was hinting at this when he mentioned the idea of bittorrent blocks earlier in the thread. We also require code redundancy between client and server to be able to fully use the type and exacttype filters. In this case the Model component has been bifurcated -- the client needs to build a model the ontology (from who knows where... presumably processing OBO-Edit files) so the user can issue queries, and the server needs to also have some representation of the ontology to generate a response. Hopefully the ontology DAS extension will help the latter situation outlined above by getting both client and server to be synchronized on the same data model. As far as the tiling optimization goes, it's likely that I'll implement a preprocessor for the HTTP query so I can break it into tiles -- conceptually very similar to the log10 binning that Lincoln does in the GFF database. -Allen On 11/23/05, Andrew Dalke wrote: > > Allen: > > No other clients have previously requested this range, so the > > server-side cache faults to the DAS/2 service (slow). > > Admittedly I'm curious about this. Why is this slow? What does slow > mean? I assume "cannot be returned faster than the network will take > it." > > How many annotations are in the database? Figuring one annotation for > every ... 100 bases? gives me 30 million. Shouldn't a range search > over < only 30 million be fast? Is this being done in the database? > Which database and what's the SQL? > > If the DB is the bottleneck then pulling it out as a specialized > search might be worthwhile. 
> > What I'm driving at for this is this. The proposal feels like a > workaround for a given implementation. To use it requires more smarts > in the client. Why not put that logic on the server? > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ DAS2 mailing list DAS2 at portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/das2 From suzi at fruitfly.org Fri Nov 25 17:20:29 2005 From: suzi at fruitfly.org (Suzanna Lewis) Date: Fri, 25 Nov 2005 14:20:29 -0800 Subject: [DAS2] DAS intro In-Reply-To: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> Message-ID: <59fa39752e4d792d2142fe2682813937@fruitfly.org> a few minor in-line edits below. trying to simplify and not confuse, as this is just an intro. On Nov 25, 2005, at 7:21 AM, Andrew Dalke wrote: > The front of the DAS doc starts > > DAS 2.0 is designed to address the shortcomings of DAS 1.0, > including: > > That kinda assumes people know what DAS 1.0 is to understand DAS 2.0. > > How about this instead, as an overview/introduction. > > ====== > > DAS/2 describes a data model for genome annotations , THAT IS, DESCRIPTIONS OF FEATURES LOCATED ON THE GENOMIC SEQUENCE > . An annotation > server provides SUCH > information FOR > one or more genome SEQUENCES. > Each GENOMIC SEQUENCE > may have one or more versions. Different versions are usually > based on different assemblies. As an implementation detail an > assembly and corresponding sequence data may be distributed via a > different machine, which is called the reference server. (DELETED LAST 2 SENTENCES). > > Annotations are located on the genome with a start and end position. > The range may be specified mutiple times if there are alternate > SEQUENCES THEY MAY BE PLACED UPON (REFERENCE FRAMES). 
> An annotation may contain multiple non-continguous
> parts
> (DELETED PHRASE AND SENTENCE)
> Annotations have a type based on terms in SOFA
> (Sequence Ontology for Feature Annotation). Stylesheets contain a set
> of properties used to depict a given type.
>
> Annotations can be searched by range, type, and a properties table
> associated with each annotation. These are called feature filters.
>
> DAS/2 is implemented using a ReST architecture. Each entity (also
> called a document or object) has a name, which is a URL. Fetching the
> URL gets information about the entity. The DAS-specific entities are
> all XML documents. Other entities contain data types with an existing
> and frequently used file format. Where possible, a DAS server returns
> data using existing formats. In some cases a server may describe how
> to fetch a given entity in several different formats.
> ======
>
> Andrew
> dalke at dalkescientific.com
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2

From dalke at dalkescientific.com Fri Nov 25 18:43:10 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat, 26 Nov 2005 00:43:10 +0100
Subject: [DAS2] tiled queries for performance
In-Reply-To: <86C6E520C12E52429ACBCB01546DF4D3BE3E5E@xchange1.phage.bcgsc.ca>
References: <86C6E520C12E52429ACBCB01546DF4D3BE3E5E@xchange1.phage.bcgsc.ca>
Message-ID: <9ec33e6fb3efbbe8b39adc52d2b78db7@dalkescientific.com>

Asim Siddiqui:
> I think this is a great idea.
>
> I don't see this as a big change to the DAS/2 spec or requiring much in
> the way of additional smarts on the client side.

I agree with Allen on this - in some sense there's no effect on the spec. It ends up being an agreement among the clients to request aligned data, by rounding up/down to the nearest, say, kilobase, and for the server implementers to cache those requests.
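That client-side agreement could be sketched as a small helper. This is a hypothetical illustration, not code from any DAS client; the 1000 bp tile size and 1-based coordinates follow Allen's example earlier in the thread.

```python
# Sketch: snap an arbitrary range request onto fixed-size tiles so that
# overlapping requests from different clients produce identical,
# cacheable tile requests. Hypothetical helper, not part of any client.

TILE_SIZE = 1000  # could instead be suggested by the "sources" payload

def tiles_for_range(start: int, end: int, tile: int = TILE_SIZE):
    """Return the 1-based tile ranges covering [start, end]."""
    first = (start - 1) // tile  # index of the tile containing `start`
    last = (end - 1) // tile     # index of the tile containing `end`
    return [(i * tile + 1, (i + 1) * tile) for i in range(first, last + 1)]

# ClientA's chr1/1010:2020 and ClientB's chr1/1020:2030 both map to the
# same two tiles, so the second set of requests can be served from cache.
print(tiles_for_range(1010, 2020))  # [(1001, 2000), (2001, 3000)]
print(tiles_for_range(1020, 2030))  # [(1001, 2000), (2001, 3000)]
```

Each client then filters the merged tile results back down to the range it actually asked for.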
> The change is simply that instead of the client getting exactly what it
> asks for, it may get more.

While that's another matter - the client makes a request and the server is free to expand the range to something it can handle a bit better. Allen? Were you suggesting this instead? In this case there is a change to the spec, and all clients must be able to filter or otherwise ignore extra results.

I personally think it's an implementation issue related to performance and there are ways to make the results be generated fast enough.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Fri Nov 25 19:35:45 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat, 26 Nov 2005 01:35:45 +0100
Subject: [DAS2] DAS intro
In-Reply-To: <59fa39752e4d792d2142fe2682813937@fruitfly.org>
References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org>
Message-ID: 

Hi Suzi,

You're supposed to be on holiday - it's Thanksgiving after all. Though I'm not celebrating it until next week. I wonder where I can find pumpkin pie mix here ...

>> DAS/2 describes a data model for genome annotations
> , THAT IS, DESCRIPTIONS OF FEATURES LOCATED ON THE GENOMIC SEQUENCE

Changed, along with the other fixes.

> (DELETED LAST 2 SENTENCES).

That was the two lines about

>> Portions of
>> the assembly may have higher relative accuracy than the assembly as a
>> whole. A reference server may supply these portions as an alternate
>> reference frame.

In the intro I want to mention all of the parts of DAS. The problem is that I still don't understand the /region request. These two lines were my best attempt at explaining them. Was the deletion because my understanding is wrong or because it's not needed for the intro?

I think my confusion is related to the concept you mention in:

>> Annotations are located on the genome with a start and end position.
>> The range may be specified mutiple times if there are alternate >> > SEQUENCES THEY MAY BE PLACED UPON (REFERENCE FRAMES). because I don't understand what I should change. I made up the term 'reference frame' because of my physics training. Is it the correct term here? Does 'reference frame' as it's normally used only refer to the full assembly or does it refer to each "/region" as well? If I give the coordinates on a contig can I say it's in the reference frame of that contig? (Hmm, David Block agrees with me, according to http://open-bio.org/bosc2001/abstracts/lightning/block The presence of a Tiling_Path table allows the loading of any arbitrary length of sequence, in the reference frame of any of the contigs that make up the tiling path. ) I thought it was important to mention that a given annotation may have "several tags if the feature's location can be represented in multiple coordinate systems (e.g. multiple builds of a genome or multiple contigs)" Then again, I don't understand how a given feature can be annotated on multiple builds because I thought that a feature was only associated with a single versioned source, and a versioned source has only one build. I would like to have something in the intro which mentions "/region". I just don't know how to do it. Why does anyone care about regions and not just point directly to the sequence? >> An annotation may contain multiple non-continguous >> parts > > (DELECTED PHRASE AND SENTENCE) The deleted text there was ", making it the parent of those parts. Some parts may have more than one parent." I put it there because I remember we talked a lot about this at CSHL a couple years back and wanted to make sure the data model handled cases where, say, there were two parents to three parts. It seems to me that that structure is important enough that someone who is trying to get a quick understanding of DAS annotations would be interested in it. 
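The two-parents/three-parts case mentioned above is just a small directed acyclic graph rather than a tree. A sketch with invented names (DAS/2 specifies only the XML exchange, not any particular object model):

```python
# Illustrative only: model features whose parts may have more than one
# parent, e.g. two transcripts sharing the same three exons.
class Feature:
    def __init__(self, fid):
        self.id = fid
        self.parents = []  # a part may belong to several parents
        self.parts = []

    def add_part(self, part):
        self.parts.append(part)
        part.parents.append(self)

t1, t2 = Feature("transcript-1"), Feature("transcript-2")
exons = [Feature("exon-%d" % i) for i in (1, 2, 3)]
for exon in exons:
    t1.add_part(exon)
    t2.add_part(exon)

# Each exon now has two parents, which a pure tree could not express:
assert all(len(exon.parents) == 2 for exon in exons)
```

A client data model that assumes one parent per part would silently lose one of the transcripts here, which is why the parts/parents structure matters even in an introduction.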
My internal model for the expected reader is someone like Allen or Gregg - people who have some experience in data models for annotations and would like to know that DAS can handle those sorts of more complicated tree structures. I'm willing to move it further into the text, but I'm not convinced that it makes things less confusing or simpler. Features having parts and parents is an essential part of the DAS data model. Andrew dalke at dalkescientific.com From suzi at fruitfly.org Fri Nov 25 20:44:54 2005 From: suzi at fruitfly.org (Suzanna Lewis) Date: Fri, 25 Nov 2005 17:44:54 -0800 Subject: [DAS2] DAS intro In-Reply-To: References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> Message-ID: <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> Hi Andrew, so there seem to be 2 questions. it would be good to have both in the intro, but only as long as the description can be clearly stated in just a sentence or two. If it takes more then it is clearly something that requires a fuller description outside of the intro. I'll try to give my understanding (but goodness knows I am peering through different lenses). I don't think in terms of the spec at all, just the information that needs to be conveyed. #1 "reference frame" ========================================= "reference frame", is (to my mind) "reference sequence". at least, that is what i've always called it. First, accuracy has nothing at all to do with it, so we don't want the sentence in there. Second, the region of sequence that is returned is nothing more than that. Think of it as a special type of feature. This is what makes a transformation possible from one coordinate-system to another (by adding the correct offsets) Third, just think of "reference sequence" as a coordinate system. One can have the exact same feature and indicate that: on coordinate-system-A this feature starts and ends here, and on coordinate-system-B it starts and ends there. 
Thus a feature's coordinates may be given both on a chromosome, and on a contig, and on any other coordinate-system that can be derived through a transform from these. So you could change the sentence below to read "A reference server may supply features where the locations (start and end) are relative to either contigs, some other arbitrary region, or to the entire chromosome." #2 "multiple parents" ========================================= It still is easier for me to think of this in terms of sequences. We may know that somewhere out in the world a sequence must exist, but the data/sequence we have collected is fragmentary. For example, thinly sequenced genomes (resulting in many separate contigs) or a pair of ESTs from a cDNA. In either of these cases we need to be able to have the many to many relationships you talk about. This one is perhaps too subtle for the introduction, but if we decide to include it then I think it should first be phrased in terms of the problem (biological sampling) and then in terms of the solution (multiple parents). -S On Nov 25, 2005, at 4:35 PM, Andrew Dalke wrote: > Hi Suzi, > > You're supposed to be on holiday - it's Thanksgiving after all. > > Though I'm not celebrating it until next week. I wonder where > I can find pumpkin pie mix here ... > >>> DAS/2 describes a data model for genome annotations >> , THAT IS, DESCRIPTIONS OF FEATURES LOCATED ON THE GENOMIC SEQUENCE > > Changed, along with the other fixes. > >> (DELETED LAST 2 SENTENCES). > > That was the two lines about > >>> Portions of >>> the assembly may have higher relative accuracy than the assembly as a >>> whole. A reference server may supply these portions as an alternate >>> reference frame. > > In the intro I want to mention all of the parts of DAS. The > problem is that I still don't understand the /region request. > These two lines were my best attempt at explaining them. 
> > Was the deletion because my understanding is wrong or because it's > not needed for the intro? > > I think my confusion is related the concept you mention in: >>> Annotations are located on the genome with a start and end position. >>> The range may be specified mutiple times if there are alternate >>> >> SEQUENCES THEY MAY BE PLACED UPON (REFERENCE FRAMES). > > because I don't understand what I should change. I made up the > term 'reference frame' because of my physics training. Is it > the correct term here? Does 'reference frame' as it's normally > used only refer to the full assembly or does it refer to each > "/region" as well? If I give the coordinates on a contig can > I say it's in the reference frame of that contig? > > (Hmm, David Block agrees with me, according to > http://open-bio.org/bosc2001/abstracts/lightning/block > The presence of a Tiling_Path table allows the loading of > any arbitrary length of sequence, in the reference frame > of any of the contigs that make up the tiling path. ) > > > > I thought it was important to mention that a given annotation > may have "several tags if the feature's location can be > represented in multiple coordinate systems (e.g. multiple builds > of a genome or multiple contigs)" > > Then again, I don't understand how a given feature can be > annotated on multiple builds because I thought that a feature > was only associated with a single versioned source, and a > versioned source has only one build. > > > I would like to have something in the intro which mentions > "/region". I just don't know how to do it. Why does anyone > care about regions and not just point directly to the sequence? > >>> An annotation may contain multiple non-continguous >>> parts >> >> (DELECTED PHRASE AND SENTENCE) > > The deleted text there was ", making it the parent of those parts. > Some parts may have more than one parent." 
> > I put it there because I remember we talked a lot about this > at CSHL a couple years back and wanted to make sure the data > model handled cases where, say, there were two parents to three > parts. I seems to me that that structure is important enough > that someone who is trying to get a quick understanding of > DAS annotations would be interested in it. > > My internal model for the expected reader is someone like > Allen or Gregg - people who have some experience in data > models for annotations and would like to know that DAS > can handle those sorts of more complicated tree structures. > > I'm willing to move it further into the text, but I'm not > convinced that it makes things less confusing or simpler. > Features having parts and parents is an essential part of > the DAS data model. > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From dalke at dalkescientific.com Sat Nov 26 20:20:24 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sun, 27 Nov 2005 02:20:24 +0100 Subject: [DAS2] DAS intro In-Reply-To: <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> Message-ID: <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> Suzi: > so there seem to be 2 questions. it would be good to have both in the > intro, but only as long as the description can be clearly stated in > just a sentence or two. If it takes more then it is clearly something > that requires a fuller description outside of the intro. Agreed. > I'll try to give my understanding (but goodness knows I am peering > through different lenses). I don't think in terms of the spec at all, > just the information that needs to be conveyed. 
> > #1 "reference frame" ========================================= > > "reference frame", is (to my mind) "reference sequence". at least, > that is what i've always called it. > First, accuracy has nothing at all to do with it, so we don't want the > sentence in there. I'm fine with that. I've found it best to declare my ignorance early rather than to keep it hidden. > Second, the region of sequence that is returned is nothing more than > that. Think of it as a special type of feature. This is what makes a > transformation possible from one coordinate-system to another (by > adding the correct offsets) I can think of it as a feature just fine. But then shouldn't each region also be a feature? Why wouldn't all contigs be visible as an annotation? Contigs are in SOFA as @is_a at contig ; SO:0000149 @is_a@ assembly_component ; SO:0000143 @part_of@ supercontig ; SO:0000148 What advantage is there to break this feature out at a "/region"? One that I can see is that the reference server provides the regions while the annotation server provides the other features. But if that's the case we could have the reference server also provide the regions as features, and the annotation server makes references to those features rather than to regions. That is, in the current scheme we have: has 0 or more element, where the 'pos' attribute links to region + start/stop range and the optional 'seq' attribute links to the sequence range, as in: is only a link to the sequence and a length, as in: One alternate possibility is to change that so "pos" points to a /feature (instead of a /region) and have features for each contig or other assembly component. The result would look like: ... Doing this, however, means that all features must support subranges. As an alternate solution without ranges, use and then look up the sequence coordinates of feature/AB1234 to figure out where it starts/stops. The other advantage to a region is you can ask for the assembly via the 'agp' format. 
But because of the existing support for formats which are only valid for some feature you can do that by asking for, say, all assembly_component features (via the feature filter) and return the results in 'agp' format. > Third, just think of "reference sequence" as a coordinate system. One > can have the exact same feature and indicate that: on > coordinate-system-A this feature starts and ends here, and on > coordinate-system-B it starts and ends there. Thus a feature's > coordinates may be given both on a chromosome, and on a contig, and on > any other coordinate-system that can be derived through a transform > from these. I believe I understand this. There really is only one reference frame for the entire genome sequence, for a given assembly, and all other coordinate systems are a fixed and definite offset of that single reference frame. I believe this is called the golden path? My reference to accuracy is because I figured that given two features A and B on an assembly component X then the fuzziness in the relative distance between A and B is small if X is also small. That is, smaller terms are less likely to have changes as the golden path changes. > So you could change the sentence below to read "A reference server > may supply features where the locations (start and end) are relative > to either contigs, some other arbitrary region, or to the entire > chromosome." Why not always supply it relative to the chromosome coordinates? The spec now allows that as an optional field. I can't figure out why you would want to do otherwise. Is it because sometimes it's easier to work with, say, a large number of contig reference frames than with one large reference frame? Does that mean we shift the complexity of coordinate translation from the data provider to the data consumer? (Making it easier to generate data than to consume data.) 
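The fixed-offset translation under discussion is the whole transform. A sketch in a few lines; the contig name and its offset on the chromosome are invented for illustration (the coordinate values echo the Chr3 1271:1507 range quoted elsewhere in the thread):

```python
# Assumed lookup table: where each contig starts on its chromosome.
# Real values would come from the assembly (golden path), not from here.
CONTIG_OFFSETS = {"contig-AB1234": 150000}

def contig_to_chromosome(contig, start, stop):
    """Translate a (start, stop) range given in contig coordinates into
    chromosome coordinates by adding the contig's fixed offset."""
    offset = CONTIG_OFFSETS[contig]
    return offset + start, offset + stop

print(contig_to_chromosome("contig-AB1234", 1271, 1507))  # -> (151271, 151507)
```

The simplicity cuts both ways: either the data provider applies this once when serving chromosome coordinates, or every data consumer must carry the offset table and apply it themselves, which is exactly the trade-off raised above.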
> This one is perhaps too subtle for the introduction, but if we decide > to include it then I think it should first be phrased in terms of the > problem (biological sampling) and then in terms of the solution > (multiple parents). Oh, definitely. It's some place where I just don't have the domain knowledge to explain it or even come up with examples. Andrew dalke at dalkescientific.com From suzi at fruitfly.org Sat Nov 26 20:24:07 2005 From: suzi at fruitfly.org (Suzanna Lewis) Date: Sat, 26 Nov 2005 17:24:07 -0800 Subject: [DAS2] DAS intro In-Reply-To: <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> Message-ID: Lets add this to the agenda for Monday morning. Hopefully that will be faster than via e-mail. On Nov 26, 2005, at 5:20 PM, Andrew Dalke wrote: > Suzi: >> so there seem to be 2 questions. it would be good to have both in the >> intro, but only as long as the description can be clearly stated in >> just a sentence or two. If it takes more then it is clearly something >> that requires a fuller description outside of the intro. > > Agreed. > >> I'll try to give my understanding (but goodness knows I am peering >> through different lenses). I don't think in terms of the spec at all, >> just the information that needs to be conveyed. >> >> #1 "reference frame" ========================================= >> >> "reference frame", is (to my mind) "reference sequence". at least, >> that is what i've always called it. > > >> First, accuracy has nothing at all to do with it, so we don't want >> the sentence in there. > > I'm fine with that. I've found it best to declare my ignorance early > than to keep it hidden. > >> Second, the region of sequence that is returned is nothing more than >> that. Think of it as a special type of feature. 
This is what makes a >> transformation possible from one coordinate-system to another (by >> adding the correct offsets) > > I can think of it as a feature just fine. But then shouldn't each > region > also be a feature? Why wouldn't all contigs be visible as an > annotation? > > Contigs are in SOFA as > > @is_a at contig ; SO:0000149 @is_a@ assembly_component ; > SO:0000143 @part_of@ supercontig ; SO:0000148 > > What advantage is there to break this feature out at a "/region"? > > One that I can see is that the reference server provides the regions > while the annotation server provides the other features. But if > that's the case we could have the reference server also provide the > regions as features, and the annotation server makes references to > those features rather than to regions. > > That is, in the current scheme we have: > > has 0 or more element, where the 'pos' attribute > links to region + start/stop range and the optional 'seq' attribute > links to the sequence range, as in: > > seq="sequence/Chr3/1271:1507:1"/> > > > is only a link to the sequence and a length, as in: > > > > > One alternate possibility is to change that so "pos" points to a > /feature (instead of a /region) and have features for each contig or > other assembly component. The result would look like: > > seq="sequence/Chr3/1271:1507:1"/> > > ... > > Doing this, however, means that all features must support subranges. > > > As an alternate solution without ranges, use > > > > and then look up the sequence coordinates of feature/AB1234 to > figure out where it starts/stops. > > > The other advantage to a region is you can ask for the assembly > via the 'agp' format. But because of the the existing support for > formats which are only valid for some feature you can do that by asking > for, say, all assembly_component features (via the feature filter) and > return > the results in 'agp' format. > >> Third, just think of "reference sequence" as a coordinate system. 
One >> can have the exact same feature and indicate that: on >> coordinate-system-A this feature starts and ends here, and on >> coordinate-system-B it starts and ends there. Thus a feature's >> coordinates may be given both on a chromosome, and on a contig, and >> on any other coordinate-system that can be derived through a >> transform from these. > > I believe I understand this. There really is only one reference frame > for > the entire genome sequence, for a given assembly, and all other > coordinate > systems are a fixed and definite offset of that single reference frame. > I believe this is called the golden path? > > My reference to accuracy is because I figured that given two features > A and B on an assembly component X then the fuzziness in the relative > distance between A and B is small if X is also small. That is, smaller > terms are less likely to have changes as the golden path changes. > > >> So you could change the sentence below to read "A reference server >> may supply features where the locations (start and end) are relative >> to either contigs, some other arbitrary region, or to the entire >> chromosome." > > Why not always supply it relative to the chromosome coordinates? The > spec > now allows that as an optional field. I can't figure out why you would > want to do otherwise. > > Is it because sometimes it's easier to work with, say, a large number > of > contig reference frames than with one large reference frame? Does that > mean we shift the complexity of coordinate translation from the data > provider to the data consumer? (Making it easier to generate data than > to consume data.) > > >> This one is perhaps too subtle for the introduction, but if we decide >> to include it then I think it should first be phrased in terms of the >> problem (biological sampling) and then in terms of the solution >> (multiple parents). > > Oh, definitely. 
It's some place where I just don't have the domain > knowledge to explain it or even come up with examples. > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Gregg_Helt at affymetrix.com Mon Nov 28 04:44:18 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 28 Nov 2005 01:44:18 -0800 Subject: [DAS2] tiled queries for performance Message-ID: > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Andrew Dalke > Sent: Thursday, November 24, 2005 5:47 AM > To: Brian Gilman > Cc: DAS/2 > Subject: Re: [DAS2] tiled queries for performance > > Hi Brian, > > > We're looking into this kind of implementation issue ourselves and > > thought that a bitorrent like cache makes the most sense. ie. all > > servers in the "fabric" are issued the query in a certain "hop > > adjacency". These servers then send their data to the client who's job > > it is to assemble the data. > > I go back and forth between the "large data set" model and the "large > number > of entities" model. > > In the first: > - client requests a large data file > - server returns it > > This can be sped up by distributing the file among many sites and > using something like BitTorrent to put it together, or something like > Coral ( http://www.coralcdn.org/ ) to redirect to nearby caches. > > But making the code for this is complicated. It's possible to build > on BitTorrent and similar systems, but I have no feel for the actual > implementation cost, which makes me wary. I've looked into a couple > of the P2P toolkits and not gotten the feel that it's any easier than > writing HTTP requests directly. Plus, who will set up the alternate > servers? 
My hope would be that any system like this could be hidden behind a single HTTP GET request and hence require no changes to the DAS/2 protocol. Standard web caches already work this way. I'm less familiar with the BitTorrent approach, but I'm guessing that the client-side code that stitches together the pieces from multiple servers could be encapsulated in a client-side daemon that responds to localhost HTTP calls. > In the second: > - make query to server > - server returns list of N identifiers > - make N-n requests (where 'n' is the number of identifiers already > resolved) > > The id resolution can be done in a distributed fashion and is easily > supported via web caches, either with well-configured proxies or (again) > through Coral. > > I like the latter model in part because it's more fine grained. Eg, > a progress bar can say "downloading feature 4 of 10000", and if a given > feature is already present there's no need to refetch it. > > The downside of the 2nd is the need for HTTP 1.1 pipelining to make it > be efficient. I don't know if we want to have that requirement. I'm wary of this "large number of entities" approach, for several reasons. Due to the overhead for TCP/IP, HTTP headers, and extra XML stuff like doctype and namespace declarations, making an HTTP GET request per feature will increase the total number of bytes that need to be transmitted. It will also increase the parsing overhead on the client side. And if the features contain little information (for example just type, parts/parents, and location) that overhead could easily exceed the time taken to process the "useful" data. As you indicated, some performance problems could be alleviated by HTTP 1.1 pipelining, but that adds additional requirements to both client and server. 
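A rough back-of-the-envelope version of the overhead argument above; every byte count here is an assumption for illustration, not a measurement:

```python
# Assumed fixed costs per HTTP round trip and per XML document, plus an
# assumed payload for a small feature (type, parts/parents, location).
PER_REQUEST_OVERHEAD = 500   # TCP/IP + HTTP request/response headers, bytes
PER_DOC_XML_OVERHEAD = 200   # doctype and namespace declarations, bytes
FEATURE_PAYLOAD = 150        # "useful" bytes per small feature

def total_bytes(n_features, requests):
    """Total bytes moved to fetch n_features spread over that many requests."""
    return (requests * (PER_REQUEST_OVERHEAD + PER_DOC_XML_OVERHEAD)
            + n_features * FEATURE_PAYLOAD)

n = 10000
bulk = total_bytes(n, requests=1)         # one query returning all features
per_feature = total_bytes(n, requests=n)  # one request per feature id
print(per_feature / float(bulk))          # several-fold more traffic
```

Under these (assumed) numbers the per-feature scheme moves more than five times the bytes of the bulk query, before counting extra parsing and disk-seek costs on the client.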
Also, for persistent caching on the local machine when you start splitting up the data into hundreds of thousands of files, I suspect the additional disk seek time will far exceed disk read time and become a serious performance impediment. Having said that, in theory this approach is (almost) testable using the current DAS/2 spec. Create one DAS/2 server that in response to feature queries returns only the minimum required information for "N" features: id and type. And have feature ids returned be URLs on another DAS/2 server that _does_ return full feature information (location, alignment, etc.). Then make "N-n" single-feature queries with those URLs to get full information. Due to the current DAS/2 requirement that any parts / parents referenced also be included in the same XML doc, this would only be a reasonable test for features with no hierarchical structure, such as SNPs. > Gregg > came up with the range restrictions because most of the massive results > will be from range searches. By being a bit more clever about tracking > what's known and not known, a client can get a much smaller results > page. > > > These are complementary. Using Gregg's restricted range queries can > reduce the number of identifiers returned in a search, making the > network overhead even smaller. 
> > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Gregg_Helt at affymetrix.com Mon Nov 28 05:05:33 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 28 Nov 2005 02:05:33 -0800 Subject: [DAS2] das registry and das2 Message-ID: > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Andrew Dalke > Sent: Friday, November 18, 2005 10:00 AM > To: DAS/2 > Subject: Re: [DAS2] das registry and das2 > > Andreas Prlic: > > I would like to start a discussion of how to provide a proper DAS > > interface for > > our das- registration server at http://das.sanger.ac.uk/registry/ > > > > Currently it is possible to interact with it using SOAP, or manually > > via the HTML > > interface. We should also make it accessible using URL requests. > > One of the things Gregg and I talked about at ISMB was that the > top-level > "das-sources" format is, or can be, identical to what's needed for the > registry server. > Some of what we discussed I wrote up in a post earlier this year: http://portal.open-bio.org/pipermail/das2/2005-June/000198.html Another post that might be useful in current discussions is a summary of what was discussed in the DAS/2 registry meeting we had in Hinxton back in September 2004: http://portal.open-bio.org/pipermail/das2/2005-June/000197.html gregg From Gregg_Helt at affymetrix.com Mon Nov 28 05:58:00 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 28 Nov 2005 02:58:00 -0800 Subject: [DAS2] tiled queries for performance Message-ID: The attachment is a PowerPoint slide showing one of the feature query optimizations that the IGB client currently uses, which combines "overlaps" and "inside" filters. When used consistently this guarantees that the same feature is not returned in multiple feature queries. 
However in general I agree that it is the client's responsibility to reasonably handle cases where the same feature is returned multiple times. gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Allen Day > Sent: Wednesday, November 23, 2005 3:50 PM > To: das2 at portal.open-bio.org > Subject: Re: [DAS2] tiled queries for performance > > More thoughts on this. The client can eliminate the redundancy in the > records returned by issuing the tiling queries as previously described > (query1), then issuing queries for records that are not contained within > tiles, but overlap the boundaries of 1 or more tiles (query2). > > However, by issuing all the overlaps queries at once, we've just deferred > the performance hit one step, because we can't reasonably expect the > server to have cached all combinations of tile overlaps queries. I think, > to get this tiling optimization to work, the burden needs to be on the > client to identify and remove duplicate responses for multiple > edge-overlaps queries (query3). > > 1000bp 2000bp 3000bp > | | | > | === | =====^==== | > | ====#===== | > | ============#=============#===== > | | | > > <-----------> query1a > <-----------> query1b > query2 > query3a > query3b > > Key: > > | : tile boundary > = : feature > ^ : gap between child features > # : portion of feature overlapping tile boundary. > : client overlaps query > <.> : client contains query > > -Allen > > > > On Mon, 21 Nov 2005, Allen Day wrote: > > > Hi, > > > > I had an idea of how clients may be able to get better response from > > servers by using a tiled query technique. Here's the basic idea: > > > > ClientA wants features in chr1/1010:2020, and issues a request for that > > range. No other clients have previously requested this range, so the > > server-side cache faults to the DAS/2 service (slow). 
> > > > ClientB wants features in chr1/1020:2030, and issues a request for that > > range. Although the intersection of the resulting records with > ClientA's > > query is large, the URIs are different and the server-side cache faults > > again. > > > > If ClientA and ClientB were to each issue two separate "tiled" requests: > > > > 1. chr1/1001:2000 > > 2. chr1/2001:3000 > > > > ClientB could take advantage of the fact that ClientA had been looking > at > > the same tiles. > > > > For this to work, the clients would need to be using the same tile size. > > The optimal tile size is likely to vary from datasource to datasource, > > depending on the length and density distributions of the features > > contained in the datasource. The "sources" or "versioned sources" > > payload could suggest a tiling size to prospective clients. Servers > could > > also pre-cache all tiles by hitting each tile after an update of the > > datasource (or the DAS/2 service code). > > > > The tradeoff for the performance gains is that clients may now need to > do > > filtering on the returned records to only return those requested by the > > client's client. > > > > -Allen > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/das2 > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -------------- next part -------------- A non-text attachment was scrubbed... Name: DAS2_Query_Optimization.ppt Type: application/vnd.ms-powerpoint Size: 287744 bytes Desc: DAS2_Query_Optimization.ppt URL: From ap3 at sanger.ac.uk Mon Nov 28 06:48:03 2005 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Mon, 28 Nov 2005 11:48:03 +0000 Subject: [DAS2] DAS intro In-Reply-To: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> Message-ID: Hi! 
> How about this instead, as an overview/introduction. > > ====== > > DAS/2 describes a data model for genome annotations. Can we formulate the start a little more general? something like: DAS/2 is a protocol to share biological data. It provides specifications for how to share annotations of genomes and proteins, assays, ontologies (space for more here...). then I would continue with your text. Cheers, Andreas ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 From dalke at dalkescientific.com Mon Nov 28 12:10:30 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 28 Nov 2005 18:10:30 +0100 Subject: [DAS2] mtg topics for Nov 28 Message-ID: Here are the spec issues I would like to talk about for today's meeting, culled from the last few weeks of emails and phone calls 1) DAS Status Code in headers The current spec says > X-DAS-Status: XXX status code > > The list of status codes is similar, but not identical, to those used > by DAS/1: > > 200 OK, data follows > 400 Bad namespace > 401 Bad data source > 402 Bad data format > 403 Unknown object ID > 404 Invalid object ID > 405 Region coordinate error > 406 No lock > 407 Access denied > 500 Server error > 501 Unimplemented feature I argued that these are not needed. Some of them are duplicates with HTTP error codes and those which are not can be covered by an error code "300" along with an (optional) XML payload. The major problem with doing this seems to be in how MS IE handles certain error codes. While IE is not a target browser, MS software may use IE as a component for fetching data. From the link Ed dug up, it looks like this won't be a problem. Lincoln's last email on this was a tepid > I give up arguing this one and will go with the way Andrew wants to do > it. 
Therefore I propose the following rules: > > 1) Return the HTTP 404 error for the case that any component of the > DAS2 path > is invalid. This would apply to the following situations: > > Bad namespace > Bad data source > Unknown object ID > > 2) Return HTTP 301 and 302 redirects when the requested object has > moved. > > 3) Return HTTP 403 (forbidden) for no-lock errors. > > 4) Return HTTP 500 when the server crashes. > > For all errors there should be a text/x-das-error entity returned that > describes the error in more detail. The "x-das-error" format must have an invariant string, either an error code or fixed text, and a possible optional explanatory text section. Note the "should" in that last paragraph - this is optional. 2) Content-type There was some discussion about changing the content type to "text/xml" to support viewing DAS results in a browser. We decided that that wasn't a valid use case. In doing the research for this I found that the general recommendation for these sorts of XML documents is to put the document under "application/*" instead of "text/*". One reason is from http://www.ietf.org/rfc/rfc3023.txt If an XML document -- that is, the unprocessed, source XML document -- is readable by casual users, text/xml is preferable to application/xml. MIME user agents (and web user agents) that do not have explicit support for text/xml will treat it as text/plain, for example, by displaying the XML MIME entity as plain text. Application/xml is preferable when the XML MIME entity is unreadable by casual users. Similarly, text/xml-external-parsed-entity is preferable when an external parsed entity is readable by casual users, but application/xml-external-parsed-entity is preferable when a plain text display is inappropriate. NOTE: Users are in general not used to text containing tags such as , and often find such tags quite disorienting or annoying. 
If one is not sure, the conservative principle would suggest using application/* instead of text/* so as not to put information in front of users that they will quite likely not understand. Another is the difference in how application/* and text/* handle character set encodings. We use "text/x-...+xml" - I propose changing this to "application/x-...+xml" I don't think there are any objections to this. The main objection is to the difficulty of ploughing through all the specs related to charsets and unicode. 3) Key/value data As Steve pointed out, the spec is incomplete on how to handle key/value data associated with a record. The main problem is in how it handles namespaces. It mixes an internal attribute value namespace with the xml namespace, which isn't how XML namespaces work. For example, This is a telomeric repeat birx28 This is a telomeric repeat 29 This is a telomeric repeat 29 - "simple extension elements" not in the "atom:" namespace > - "structured extension elements" not in the "atom:" namespace. > > Most of the "atom:" elements share a common structure. For example: > - the type= attribute indicates if the contents are text, escaped > HTML or XHTML; or an explicit content-type like "chemical/x-pdb". > > - the src= attribute indicates that the content of the element is > empty and to go to the given URL instead (apparently the hip > term for URL these days is IRI - Internationalized Resource > Identifiers. > I think we only need to use URLs) > > > These are not always used for all elements; if it's appropriate for a > given field then it's used. > > > Simple extension elements are always of the form > Content goes here > where 'element' is not part of the 'atom:' namespace. Consumers of > this data may treat it as simple key/value data. > > Structured extension elements always have at least an attribute > or a sub-element, so must look like > .. > -or- > .. .. > > If the element isn't known this field may be ignored.
> > These three things provide for: > - a set of well-defined elements, understandable by everyone > - a simple extension for things which can be key/value data > - a way to store or refer to more complex data types 5) xlink and Several places in the spec include or may include links to documents elsewhere. The XLink specification describes a general extensibility mechanism for such links. xlinks have about four properties; the most important are: - where does the link go to - what kind of link is it - what should the browser do with such a link I personally don't understand the xlink spec well enough to want to use it, and I haven't come across examples of it in use. I am wary about specs like that. Another is to use something like the element from HTML 4.0 and in Atom. This looks something like that is, it has: - a category for how the link is related to the given object ('rel') - an optional MIME type (use, eg, if the server has multiple ways to provide data for the same 'rel' category) - an href to the data As implemented in Atom the contents of a are extensible, which allows people to experiment with things like mirroring. In any case we need a way to provide typed links to other documents. Such links may include: - link from a given feature to the versioned source - link from a versioned source to the lock document 6) Source filters This comes from Andreas Prlic. We can support metadata servers via the same document returned from the entry point to a DAS server. However, a metadata server may also support searches, eg, to show only H. sapiens annotations using the build 1234 assembly. Should we make this property searching part of the DAS/2 spec, which means everyone must support it, or should we say it's optional but if implemented it must be done in a standard way? Or leave it for version 2.1, once we have more experience with DAS in real-life? (Though we already have that experience.)
7) /regions Could someone please explain to me the point of the /region subtree? As far as I can tell, a region is just a type of feature. A generic feature is located somewhere on the genome (with respect to a given assembly), and may also say it's on various 'region' features. I don't see the need for a separate namespace for this. 8) Tiled queries Do they need spec changes, or spec recommendations? I think I've mentioned everything to be covered. Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Mon Nov 28 12:14:28 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 28 Nov 2005 09:14:28 -0800 Subject: [DAS2] tiled queries for performance Message-ID: I don't think we should allow servers to return features that do not meet the criteria specified in the query feature filters; it's an invitation for ambiguity. This may seem harmless with just an "overlaps" region filter, but what about "inside", "contains", "identical"? What about "type", etc? If different DAS/2 server implementations contain the same data, they should return the same set of features for a given feature query. gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Andrew Dalke > Sent: Friday, November 25, 2005 3:43 PM > To: Asim Siddiqui > Cc: DAS/2 > Subject: Re: [DAS2] tiled queries for performance > > > The change is simply that instead of the client getting exactly what it > > asks for, it may get more. > > While that's another matter - the client makes a request > and the server is free to expand the range to something it can handle > a bit better. Allen? Were you suggesting this instead? > > In this case there is a change to the spec, and all clients must > be able to filter or otherwise ignore extra results. > > I personally think it's an implementation issue related to performance > and there are ways to make the results be generated fast enough.
> > Andrew > dalke at dalkescientific.com > From dalke at dalkescientific.com Mon Nov 28 12:14:52 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 28 Nov 2005 18:14:52 +0100 Subject: [DAS2] DAS intro In-Reply-To: References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> Message-ID: Andreas Prlic: > Can we formulate the start a little more general? > > something like: > > DAS/2 is a protocol to share biological data. It provides > specifications for how > to share annotations of genomes and proteins, assays, ontologies > (space fore more here...). I thought about that, but the DAS/2.0 spec doesn't include any of those. Perhaps be more definite instead and say this is DAS/2.0? Or say "Other projects (link, link, link) extend DAS/2 to protein, assay and ontology data sets." Andrew dalke at dalkescientific.com From lstein at cshl.edu Mon Nov 28 12:24:32 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 28 Nov 2005 12:24:32 -0500 Subject: [DAS2] DAS intro In-Reply-To: <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> Message-ID: <200511281224.32885.lstein@cshl.edu> > > > > is only a link to the sequence and a length, as in: > > You know, this is still kind of ugly. I hate to revisit this so late in the game, but can't we make sequence retrieval a three-step process? 1) Feature request returns: 2) Region request returns: (where seq= could be an absolute URL if someone else owns the bases) 3) Sequence request then returns the bases Lincoln > > > One alternate possibility is to change that so "pos" points to a > /feature (instead of a /region) and have features for each contig or > other assembly component. The result would look like: > > seq="sequence/Chr3/1271:1507:1"/> > > ... > > Doing this, however, means that all features must support subranges. 
> > > As an alternate solution without ranges, use > > > > and then look up the sequence coordinates of feature/AB1234 to > figure out where it starts/stops. > > > The other advantage to a region is you can ask for the assembly > via the 'agp' format. But because of the existing support for > formats which are only valid for some features you can do that by asking > for, say, all assembly_component features (via the feature filter) and > return > the results in 'agp' format. > > > Third, just think of "reference sequence" as a coordinate system. One > > can have the exact same feature and indicate that: on > > coordinate-system-A this feature starts and ends here, and on > > coordinate-system-B it starts and ends there. Thus a feature's > > coordinates may be given both on a chromosome, and on a contig, and on > > any other coordinate-system that can be derived through a transform > > from these. > > I believe I understand this. There really is only one reference frame > for > the entire genome sequence, for a given assembly, and all other > coordinate > systems are a fixed and definite offset of that single reference frame. > I believe this is called the golden path? > > My reference to accuracy is because I figured that given two features > A and B on an assembly component X then the fuzziness in the relative > distance between A and B is small if X is also small. That is, smaller > terms are less likely to have changes as the golden path changes. > > > So you could change the sentence below to read "A reference server > > may supply features where the locations (start and end) are relative > > to either contigs, some other arbitrary region, or to the entire > > chromosome." > > Why not always supply it relative to the chromosome coordinates? The > spec > now allows that as an optional field. I can't figure out why you would > want to do otherwise.
> > Is it because sometimes it's easier to work with, say, a large number of > contig reference frames than with one large reference frame? Does that > mean we shift the complexity of coordinate translation from the data > provider to the data consumer? (Making it easier to generate data than > to consume data.) > > > This one is perhaps too subtle for the introduction, but if we decide > > to include it then I think it should first be phrased in terms of the > > problem (biological sampling) and then in terms of the solution > > (multiple parents). > > Oh, definitely. It's some place where I just don't have the domain > knowledge to explain it or even come up with examples. > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Mon Nov 28 12:08:35 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 28 Nov 2005 12:08:35 -0500 Subject: [DAS2] DAS intro In-Reply-To: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> Message-ID: <200511281208.36204.lstein@cshl.edu> Yes, this is a better intro. Lincoln On Friday 25 November 2005 10:21 am, Andrew Dalke wrote: > The front of the DAS doc starts > > DAS 2.0 is designed to address the shortcomings of DAS 1.0, including: > > That kinda assumes people know what DAS 1.0 is to understand DAS 2.0. > > How about this instead, as an overview/introduction. > > ====== > > DAS/2 describes a data model for genome annotations. An annotation > server provides information about one or more genome sources. Each > source may have one or more versions. 
Different versions are usually > based on different assemblies. As an implementation detail an > assembly and corresponding sequence data may be distributed via a > different machine, which is called the reference server. Portions of > the assembly may have higher relative accuracy than the assembly as a > whole. A reference server may supply these portions as an alternate > reference frame. > > Annotations are located on the genome with a start and end position. > The range may be specified multiple times if there are alternate > reference frames. An annotation may contain multiple non-contiguous > parts, making it the parent of those parts. Some parts may have more > than one parent. Annotations have a type based on terms in SOFA > (Sequence Ontology for Feature Annotation). Stylesheets contain a set > of properties used to depict a given type. > > Annotations can be searched by range, type, and a properties table > associated with each annotation. These are called feature filters. > > DAS/2 is implemented using a ReST architecture. Each entity (also > called a document or object) has a name, which is a URL. Fetching the > URL gets information about the entity. The DAS-specific entities are > all XML documents. Other entities contain data types with an existing > and frequently used file format. Where possible, a DAS server returns > data using existing formats. In some cases a server may describe how > to fetch a given entity in several different formats. > ====== > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Mon Nov 28 12:11:24 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 28 Nov 2005 12:11:24 -0500 Subject: [DAS2] tiled queries for performance In-Reply-To: <9ec33e6fb3efbbe8b39adc52d2b78db7@dalkescientific.com> References: <86C6E520C12E52429ACBCB01546DF4D3BE3E5E@xchange1.phage.bcgsc.ca> <9ec33e6fb3efbbe8b39adc52d2b78db7@dalkescientific.com> Message-ID: <200511281211.25239.lstein@cshl.edu> One thing to do is to add to the spec a note that the server is free to return features from a range larger than requested. This way the server is free to expand the range to the 1k boundaries. My preference, however, would be for the server to implement a filter that removes from the precalculated tiled XML output all features that are outside the range. This would be completely transparent to the client. Lincoln On Friday 25 November 2005 06:43 pm, Andrew Dalke wrote: > Asim Siddiqui > > > I think this is a great idea. > > > > I don't see this as a big change to the DAS/2 spec or requiring much in > > the way of additional smarts on the client side. > > I agree with Allen on this - in some sense there's no effect on the > spec. It ends up being an agreement among the clients to request > aligned data, by rounding up/down to the nearest, say, kilobase and > for the server implementers to cache those requests. > > > The change is simply that instead of the client getting exactly what it > > asks for, it may get more. > > While that's another matter - the client makes a request > and the server is free to expand the range to something it can handle > a bit better. Allen? Were you suggesting this instead? > > In this case there is a change to the spec, and all clients must > be able to filter or otherwise ignore extra results. 
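The client-side cleanup described above (dropping features the server returned beyond the requested range) is a few lines of code. A minimal sketch in Python; the Feature record, the field names, and the half-open coordinate convention are illustrative assumptions, not part of the DAS/2 spec:

```python
from collections import namedtuple

# Hypothetical minimal feature record; real DAS/2 features carry more fields.
Feature = namedtuple("Feature", ["id", "start", "end"])

def overlaps(feature, start, end):
    """True if the feature overlaps the half-open range [start, end)."""
    return feature.start < end and feature.end > start

def filter_to_request(features, start, end):
    """Keep only features that overlap the range the client originally asked for."""
    return [f for f in features if overlaps(f, start, end)]

# Server expanded a 1500..1600 request to a larger tile and returned extras:
returned = [Feature("a", 900, 1000), Feature("b", 1550, 1580), Feature("c", 1900, 1950)]
kept = filter_to_request(returned, 1500, 1600)  # only "b" overlaps the request
```

The same predicate would need variants for the stricter filters Gregg mentions ("inside", "contains", "identical"), which is part of why pushing this work onto clients is contentious.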
> > I personally think it's an implementation issue related to performance > and there are ways to make the results be generated fast enough. > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From Gregg_Helt at affymetrix.com Mon Nov 28 12:30:27 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 28 Nov 2005 09:30:27 -0800 Subject: [DAS2] Agenda for today's DAS/2 meeting Message-ID: Today we're going over spec issues. Here's my short list of topics to cover: DAS-specific headers Error codes Feature properties Registry & Discovery Please feel free to add! gregg From td2 at sanger.ac.uk Mon Nov 28 12:27:31 2005 From: td2 at sanger.ac.uk (Thomas Down) Date: Mon, 28 Nov 2005 17:27:31 +0000 Subject: [DAS2] DAS intro In-Reply-To: References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> Message-ID: <83634851-73AD-454A-B027-644539CF1869@sanger.ac.uk> On 28 Nov 2005, at 17:14, Andrew Dalke wrote: > Andreas Prlic: >> Can we formulate the start a little more general? >> >> something like: >> >> DAS/2 is a protocol to share biological data. It provides >> specifications for how >> to share annotations of genomes and proteins, assays, ontologies >> (space fore more here...). > > I thought about that, but the DAS/2.0 spec doesn't include any of > those. There are pages about assay and ontology retrieval on the website. Are these not part of the spec? Or are they being counted as something else (DAS/2.1?) Thomas. 
From dalke at dalkescientific.com Mon Nov 28 13:09:17 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 28 Nov 2005 19:09:17 +0100 Subject: properties and key/value data (was Re: [DAS2] Spec issues) In-Reply-To: References: Message-ID: Here's the email I sent to Steve that I meant to send to everyone. On Nov 17, 2005, at 2:09 AM, Andrew Dalke wrote: > I think I understand the Atom spec better now. In brief, the > Atom document contains sections which are extensible and sections > which are not. > > In an extensible section there are two/three categories of elements: > - those in the "atom:" namespace > - "simple extension elements" not in the "atom:" namespace > - "structured extension elements" not in the "atom:" namespace. > > Most of the "atom:" elements share a common structure. For example: > - the type= attribute indicates if the contents are text, escaped > HTML or XHTML; or an explicit content-type like "chemical/x-pdb". > > - the src= attribute indicates that the content of the element is > empty and to go to the given URL instead (apparently the hip > term for URL these days is IRI - Internationalized Resource > Identifiers. > I think we only need to use URLs) > > > These are not always used for all elements; if it's appropriate for a > given field then it's used. > > > Simple extension elements are always of the form > Content goes here > where 'element' is not part of the 'atom:' namespace. Consumers of > this data may treat it as simple key/value data. > > Structured extension elements always have at least an attribute > or a sub-element, so must look like > .. > -or- > .. .. > > If the element isn't known this field may be ignored.
> > These three things provide for: > - a set of well-defined elements, understandable by everyone > - a simple extension for things which can be key/value data > - a way to store or refer to more complex data types > > > Steve, responding to an earlier posting of mine: >> Interesting, but a problem with this is that it effectively creates a >> new version of the TYPES schema every time a new property is added to >> the DAS properties controlled vocabulary. I would hope for a solution >> that decouples the content of the controlled vocab from the data >> exchange format. > > I looked into that. Relax-NG lets you define a "can be anything > except ...". The Atom spec is defined with the following > > # Simple Extension > > simpleExtensionElement = > element * - atom:* { > text > } > > # Structured Extension > > structuredExtensionElement = > element * - atom:* { > (attribute * { text }+, > (text|anyElement)*) > | (attribute * { text }*, > (text?, anyElement+, (text|anyElement)*)) > } > > The "element * - atom:*" means "Any element except those in > the atom namespace." > > Thus we can validate anything with DAS/2 tags, and ignore > validation of anything not part of DAS/2. And we can say that > extensions are only allowed in certain parts of the spec and > not in others. > > We would need to update the schema when we add new "das:" elements, > but we already need to do that. > > We wouldn't need to change the schema to allow others to develop > their own extensions. Indeed, the schema would still let us > verify that extensions are still well-formed.
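The simple-versus-structured distinction in the quoted Atom rules is easy to check mechanically: a simple extension element has text content only, while a structured one has at least one attribute or sub-element. A sketch that classifies a feature's children roughly that way; the das namespace URI and the x: extension elements are made-up examples, not from the spec:

```python
import xml.etree.ElementTree as ET

# Assumed DAS/2 namespace URI, in ElementTree's "{uri}tag" form.
DAS_NS = "{http://www.biodas.org/ns/das/genome/2.00/}"

def classify(elem):
    """Label an element 'das', 'simple' extension, or 'structured' extension."""
    if elem.tag.startswith(DAS_NS):
        return "das"
    if not elem.attrib and len(elem) == 0:  # text-only: simple extension
        return "simple"
    return "structured"  # has attributes and/or sub-elements

doc = ET.fromstring(
    '<FEATURE xmlns="http://www.biodas.org/ns/das/genome/2.00/" '
    'xmlns:x="http://example.org/ext">'
    '<x:note>telomeric repeat</x:note>'
    '<x:score units="phred">29</x:score>'
    '</FEATURE>'
)
labels = [classify(child) for child in doc]  # ["simple", "structured"]
```

A consumer could fold all "simple" extensions into a key/value table and ignore unknown "structured" ones, which is exactly the behavior the Atom-style rules license.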
> >> Here's my next attempt, which more fully exploits xml:base to achieve >> this decoupling: >> >> > xmlns:das="http://www.biodas.org/ns/das/genome/2.00/" >> xml:base="http://www.wormbase.org/das/genome/volvox/1/" >> xmlns:xlink="http://www.w3.org/1999/xlink" >>> >> > das:type="type/curated_exon"> >> >> 29 >> >> > xml:base="http://www.biodas.org/ns/das/genome/2.00/properties"> >> 2 >> > xlink:type="simple" >> >> xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/ >> CTEL54X.1" >> /> >> >> > > Vs. > > xmlns:das="http://www.biodas.org/ns/das/genome/2.00/" > > xmlns:prop="http://www.biodas.org/ns/das/genome/2.00/properties" > xml:base="http://www.wormbase.org/das/genome/volvox/1/" > xmlns:xlink="http://www.w3.org/1999/xlink"> > das:type="type/curated_exon"> > 29 > 2 > src="http://www.wormbase.org/das/protein/volvox/2/feature/CTEL54X.1" > /> > > > > The main differences are: > - the properties are defined elements in the prop: namespace (though > I think they can just as easily be in the das: namespace) > > - I'm using lower-case since that seems to be the trend these days. > > > >> So now we have the following arrangement: >> >> * the attribute keys 'das:id', 'das:type', and 'das:ptype' are >> defined >> within the xmlns:das namespace (i.e., the full id of 'das:type' is >> derived by appending 'type' to the xmlns:das URL). > > I don't follow why the attributes have full namespaces. Is that > to allow extensibility of element attribute on a per-element basis? > > I kept "das:type" above because "type" already has too many meanings. > >> * the attributes values of 'das:id', 'das:type', and 'das:ptype' are >> URLs relative to xml:base. > > Are all attribute values relative to xml:base or only those three? > > Are xlink:href fields relative to xml:base as well? I assume "yes". 
> >> * The FEATURE element may contain zero or more PROPERTIES >> sub-elements, each with its own xml:base attribute, effectively >> changing what xml:base is used within the contained PROP >> sub-elements. >> >> So in this example, the property >> 'das:ptype="property/genefinder-score"' >> inherits its xml:base from its grandparent FEATURES element and so >> expands to: >> >> http://www.wormbase.org/das/genome/volvox/1/property/genefinder-score >> >> while the 'das:ptype="phase"' and 'das:ptype="protein_translation"' >> properties inherit xml:base from their PROPERTIES parent element and >> so expand to: >> >> http://www.biodas.org/ns/das/genome/2.00/properties/phase >> http://www.biodas.org/ns/das/genome/2.00/properties/ >> protein_translation > > This is also what happens with the "prop:" namespaced elements, just > at the element level instead of the attribute level. > > To keep this on key/value data I've shifted the rest of the reply > to the next email. Andrew dalke at dalkescientific.com From asims at bcgsc.ca Mon Nov 28 14:21:47 2005 From: asims at bcgsc.ca (Asim Siddiqui) Date: Mon, 28 Nov 2005 11:21:47 -0800 Subject: [DAS2] tiled queries for performance Message-ID: <86C6E520C12E52429ACBCB01546DF4D3BE3EF8@xchange1.phage.bcgsc.ca> Agreed - in light of this, my suggestion doesn't make sense, though Allen's idea may be workable through some other means. Asim -----Original Message----- From: Helt,Gregg [mailto:Gregg_Helt at affymetrix.com] Sent: Monday, November 28, 2005 9:14 AM To: Andrew Dalke; Asim Siddiqui Cc: DAS/2 Subject: RE: [DAS2] tiled queries for performance I don't think we should allow servers to return features that do not meet the criteria specified in the query feature filters, it's an invitation for ambiguity. This may seem harmless with just an "overlaps" region filter, but what about "inside", "contains", "identical"? What about "type", etc?
If different DAS/2 server implementations contain the same data, they should return the same set of features for a given feature query. gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Andrew Dalke > Sent: Friday, November 25, 2005 3:43 PM > To: Asim Siddiqui > Cc: DAS/2 > Subject: Re: [DAS2] tiled queries for performance > > > The change is simply that instead of the client getting exactly what it > > asks for, it may get more. > > While that's another matter - the client makes a request and the > server is free to expand the range to something it can handle a bit > better. Allen? Were you suggesting this instead? > > In this case there is a change to the spec, and all clients must be > able to filter or otherwise ignore extra results. > > I personally think it's an implementation issue related to performance > and there are ways to make the results be generated fast enough. > > Andrew > dalke at dalkescientific.com > From allenday at ucla.edu Mon Nov 28 15:11:59 2005 From: allenday at ucla.edu (Allen Day) Date: Mon, 28 Nov 2005 12:11:59 -0800 (PST) Subject: [DAS2] tiled queries for performance In-Reply-To: <200511281211.25239.lstein@cshl.edu> References: <86C6E520C12E52429ACBCB01546DF4D3BE3E5E@xchange1.phage.bcgsc.ca> <9ec33e6fb3efbbe8b39adc52d2b78db7@dalkescientific.com> <200511281211.25239.lstein@cshl.edu> Message-ID: On Mon, 28 Nov 2005, Lincoln Stein wrote: > One thing to do is to add to the spec a note that the server is free to return > features from a range larger than requested. This way the server is free to > expand the range to the 1k boundaries. This would require the returned payload to contain the bounds of the features actually returned. E.g. if client asks for 1500..1600, and server responds with 1001..2000, it needs a way to tell the client what the actual bounds of the response are. 
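The tile arithmetic behind Allen's point is simple; the substantive requirement is that the expanded bounds travel back with the payload so the client can interpret the extras. A hedged sketch, assuming half-open coordinates and an example tile size of 1000 (the spec fixes neither):

```python
TILE = 1000  # example tile size; a real server might advertise this per data source

def expand_to_tiles(start, end, tile=TILE):
    """Round a requested half-open range [start, end) out to tile boundaries.

    The server answers for the expanded range and must report these bounds
    in its response so the client knows why extra features are present.
    """
    tile_start = (start // tile) * tile          # round start down
    tile_end = ((end + tile - 1) // tile) * tile # round end up
    return tile_start, tile_end

# Client asks for 1500..1600; the server actually answers for 1000..2000.
bounds = expand_to_tiles(1500, 1600)  # (1000, 2000)
```

Because every client that rounds this way asks for identical URLs, tile-aligned requests are what make server-side (and proxy) caching effective in the scheme discussed earlier in the thread.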
> > My preference, however, would be for the server to implement a filter that > removes from the precalculated tiled XML output all features that are outside > the range. This would be completely transparent to the client. Yes, this is what I plan to do if we agree to use one of the tiling variants. -Allen > > Lincoln > > On Friday 25 November 2005 06:43 pm, Andrew Dalke wrote: > > Asim Siddiqui > > > > > I think this is a great idea. > > > > > > I don't see this as a big change to the DAS/2 spec or requiring much in > > > the way of additional smarts on the client side. > > > > I agree with Allen on this - in some sense there's no effect on the > > spec. It ends up being an agreement among the clients to request > > aligned data, by rounding up/down to the nearest, say, kilobase and > > for the server implementers to cache those requests. > > > > > The change is simply that instead of the client getting exactly what it > > > asks for, it may get more. > > > > While that's another matter - the client makes a request > > and the server is free to expand the range to something it can handle > > a bit better. Allen? Were you suggesting this instead? > > > > In this case there is a change to the spec, and all clients must > > be able to filter or otherwise ignore extra results. > > > > I personally think it's an implementation issue related to performance > > and there are ways to make the results be generated fast enough. 
> > > > Andrew > > dalke at dalkescientific.com > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/das2 > > From Steve_Chervitz at affymetrix.com Mon Nov 28 17:07:29 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 28 Nov 2005 14:07:29 -0800 Subject: properties and key/value data (was Re: [DAS2] Spec issues) In-Reply-To: Message-ID: To give some context to the message that Andrew recently forwarded to the list, below is the message I sent to Andrew that prompted his reply (I also meant to send it to the list instead of to just Andrew). It contains my fix to the 'namespace in attribute values' problem regarding properties which I mentioned in today's conf call, and is, I believe, the only viable alternative to Andrew's Relax-NG based solution. Basically, the trick is to enclose PROP elements that are relative to the same xml:base within a parent PROPERTIES element and then permit multiple PROPERTIES elements within a feature. This way you can allow property attribute URIs that are relative to different xml:bases. To clarify a point of possible confusion, there are really two sets of key-value pairs to keep in mind: 1. The key-value pair for the property type. 2. The key-value pair for the property itself. So in this example: 29 The key for the type is 'das:ptype' and its value is 'property/genefinder-score', and this value is a relative URL based on xml:base in the enclosing PROPERTIES element (or in its grandparent or great-grandparent element, etc.). The value of the property itself is 29 and its key is the whole key-value pair for the type ( das:ptype="property/genefinder-score"). In Andrew's Relax-NG equivalent: 29 the element name contains both the key ('prop:') and the value of the property type ('genefinder-score'), while the element name as a whole serves as the key for the property itself (value=29).
The 'prop:genefinder-score' string is not a relative URL, but is just a namespace-scoped element name, with 'prop:' serving merely to make 'genefinder-score' globally unique, relative to the URI defined by: xmlns:prop="http://www.biodas.org/ns/das/genome/2.00/properties" A potential drawback of the Relax-NG approach, as discussed in today's conf call, is that the value of the property type is not resolvable as in the other approach using the PROPERTIES parent element. Andrew doesn't see a need for resolvability, e.g., for a dynamically discoverable schema fragment. But I thought of another use case besides the one mentioned in today's call (determining data type such as int or float, which isn't of much use in practice). The URL for the type could point to a human readable definition of the term. A user may not need clarification of 'genefinder-score' but might for something like 'softberry-ztuple'. One could still satisfy such a use case under the Relax-NG approach by providing a resolvable URL based on the element name + namespace such as: http://www.biodas.org/ns/das/genome/2.00/properties#genefinder-score True, there's no XML spec that says this is legal, but we could declare that such a convention will hold for all biodas.org-based properties. One problem with the above convention is that it's not obvious what the URL resolves to. So we could have something like: http://www.biodas.org/ns/das/genome/2.00/properties?prop=genefinder-score&define=true http://www.biodas.org/ns/das/genome/2.00/properties?prop=genefinder-score&schema=true Just a thought.
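The xml:base expansions walked through in this thread are ordinary RFC 3986 relative-reference resolution, which urllib.parse.urljoin implements. A sketch using the URLs from the examples above; one caveat worth noting is that strict RFC 3986 resolution only yields the '.../properties/phase' expansion shown if the base ends with a slash, so the trailing slash below is an assumption:

```python
from urllib.parse import urljoin

# xml:base declared on the outer FEATURES element (from the example above).
features_base = "http://www.wormbase.org/das/genome/volvox/1/"
# xml:base declared on an inner PROPERTIES element, overriding the outer one.
# Trailing slash added: without it, urljoin would replace the last path segment.
properties_base = "http://www.biodas.org/ns/das/genome/2.00/properties/"

# das:ptype inheriting xml:base from the FEATURES grandparent:
score_url = urljoin(features_base, "property/genefinder-score")
# das:ptype values inheriting xml:base from their PROPERTIES parent:
phase_url = urljoin(properties_base, "phase")
translation_url = urljoin(properties_base, "protein_translation")
```

This also illustrates Gregg's earlier point from the status reports: xml:base governs resolution of relative URLs in attribute values and content, while xmlns only governs the names of elements and attributes.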
Steve > From: Steve Chervitz > Date: Mon, 14 Nov 2005 17:40:28 -0800 > To: Andrew Dalke > Conversation: [DAS2] Spec issues > Subject: Re: [DAS2] Spec issues > > > Andrew Dalke wrote on 14 Nov 2005: >> >> To: DAS/2 >> Subject: Re: [DAS2] Spec issues >> >> On Nov 4 Steve wrote: >>> >> das:type="type/curated_exon"> >>> 29 >>> 2 >>> >> xlink:type="simple" >>> >>> xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/ >>> CTEL54X.1 >>> /> >>> >> >> I think we're missing something. This is XML. We can do >> >> >> > ontology="http://song.sf.net/ontologies/sofa#gene" >> source="curated" >> xml:base="gene/"> >> 29 >> 2 >> > xlink:href="http://www.wormbase.org/..." /> >> This message brought to you by >> AT&T >> > >> >> The whole point of having namespaces in XML is to keep from needing >> to define new namespaces like . >> >> In doing that, there's no problem in supporting things like "bg:glyph", >> etc. because the values are expanded as expected by the XML processor. > > Interesting, but a problem with this is that it effectively creates a > new version of the TYPES schema every time a new property is added to > the DAS properties controlled vocabulary. I would hope for a solution > that decouples the content of the controlled vocab from the data > exchange format. 
> > Here's my next attempt, which more fully exploits xml:base to achieve > this decoupling: > > xmlns:das="http://www.biodas.org/ns/das/genome/2.00/" > xml:base="http://www.wormbase.org/das/genome/volvox/1/" > xmlns:xlink="http://www.w3.org/1999/xlink" >> > das:type="type/curated_exon"> > > 29 > > xml:base="http://www.biodas.org/ns/das/genome/2.00/properties"> > 2 > xlink:type="simple" > > xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/CTEL54X.1" /> > > > > So now we have the following arrangement: > > * the attribute keys 'das:id', 'das:type', and 'das:ptype' are defined > within the xmlns:das namespace (i.e., the full id of 'das:type' is > derived by appending 'type' to the xmlns:das URL). > > * the attribute values of 'das:id', 'das:type', and 'das:ptype' are > URLs relative to xml:base. > > * The FEATURE element may contain zero or more PROPERTIES > sub-elements, each with its own xml:base attribute, effectively > changing what xml:base is used within the contained PROP > sub-elements. > > So in this example, the property 'das:ptype="property/genefinder-score"' > inherits its xml:base from its grandparent FEATURES element and so > expands to: > > http://www.wormbase.org/das/genome/volvox/1/property/genefinder-score > > while the 'das:ptype="phase"' and 'das:ptype="protein_translation"' > properties inherit xml:base from their PROPERTIES parent element and > so expand to: > > http://www.biodas.org/ns/das/genome/2.00/properties/phase > http://www.biodas.org/ns/das/genome/2.00/properties/protein_translation > > >>> Also, we might want to allow some controlled vocabulary terms to be >>> used for >>> the value of type.source (e.g., "das:curated"), to ensure that >>> different >>> users use the same term to specify that a feature type is produced by >>> curation. >> >> I talked with Andreas Prlic about what other metadata is needed for the >> registry system. 
He mentioned >> >> Together with the BioSapiens DAS people we recently decided that >> there should be the possibility to assign gene-ontology evidence >> codes to each das source, so in the next update of the registry, >> this will be changed. >> >> That's at the source level, but perhaps it's also needed at the >> annotation level. > > I like this idea. Good re-use of GO technology. > >> >> >> My thoughts on these are: >> - come up with a more consistent way to store key/value data >> - the Atom spec has a nice way to say "the data is in this CDATA >> as text/html/xml" vs. "this text is over there". I want to copy its >> way of doing things. >> >> - I'm still not clear about xlink. Another is the HTML-style >> >> >> Atom uses the "rel=" to encoding information about the link. For >> example, the URL to edit a given document is >> >> >> >> See http://atomenabled.org/developers/api/atom-api-spec.php > > Not sure about this one yet. In the Atom API, the value of the rel > attribute is restricted to a controlled vocabulary of link > relationships and available services pertaining to editing and > publishing syndicated content on the web: > http://atomenabled.org/developers/api/atom-api-spec.php#rfc.section.5.4.1 > > What would a controlled vocab for DAS resources be? > > Skimming through the DAS/2 retrieval spec, our use of hrefs is > simply for pointing at the location of resources on the web > containing some specified content (e.g., documentation, database > entry, image data, etc.). > > The next/prev/start idea for Atom might have good applicability in the > DAS world for iterating through versions of annotations or assemblies > (e.g., rel='link-to-gene-on-next-version-of-genome'). One relationship > that would be useful for DAS would be 'latest', to get the latest > version of an annotation. 
> > DAS get URLs themselves seem fairly self-documenting (it's clear a > given link is for feature, type, or sequence for example), so having a > separate rel attribute may not provide much additional value for these > links. But it might be handy for versioning and for DAS/2 writebacks. > > Here's another link about Atom: > http://en.wikipedia.org/wiki/Atom_%28standard%29 > > Steve From ed_erwin at affymetrix.com Mon Nov 28 17:09:23 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Mon, 28 Nov 2005 14:09:23 -0800 Subject: [DAS2] DAS intro In-Reply-To: <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> Message-ID: <438B8013.3060107@affymetrix.com> Andrew Dalke wrote: > > I believe I understand this. There really is only one reference frame for > the entire genome sequence, for a given assembly, and all other coordinate > systems are a fixed and definite offset of that single reference frame. No. The coordinate transformations are often more complicated than simple offsets. The coordinate space for features on one contig can be 'backwards' with respect to a different contig, and the coordinate space for a gene may skip over one or more gaps with respect to the genomic sequence. Also, the term 'reference frame' bugs me a bit because 'frame' always makes me think of 'reading frame', which is not what you intend. From Steve_Chervitz at affymetrix.com Mon Nov 28 17:55:28 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 28 Nov 2005 14:55:28 -0800 Subject: [DAS2] DAS/1 vs DAS/2 discussion list In-Reply-To: Message-ID: The DAS/1 list is still open and working. 
I updated biodas.org to reflect this and set up a special page to inform people about which list to use: http://biodas.org/documents/biodas-lists.html Subscribers on the DAS/1 list have not been automatically added to the DAS/2 list. They must actively subscribe themselves here: http://biodas.org/mailman/listinfo/das2 Steve > From: "Helt,Gregg" > Date: Mon, 21 Nov 2005 09:24:37 -0800 > To: Andrew Dalke , DAS/2 > Conversation: [DAS2] Getting individual features in DAS/1 > Subject: RE: [DAS2] Getting individual features in DAS/1 > > We need to discuss at today's meeting. I don't think the original DAS > list should be closed, but rather continue to serve as a list to discuss > the DAS/1 protocol and implementations, and the DAS2 mailing list should > focus on DAS/2. If we mix DAS/1 and DAS/2 discussions in the same > mailing list I think it's going to lead to a lot of confusion. > > gregg > >> -----Original Message----- >> From: das2-bounces at portal.open-bio.org > [mailto:das2-bounces at portal.open- >> bio.org] On Behalf Of Andrew Dalke >> Sent: Monday, November 21, 2005 9:09 AM >> To: DAS/2 >> Subject: Re: [DAS2] Getting individual features in DAS/1 >> >> Has anyone answered Ilari's question? >> >> I never used DAS/1 enough to answer it myself. >> >> If the normal DAS list is closed, is this the right place for DAS/1 >> questions? >> >> >> On Nov 18, 2005, at 4:22 PM, Ilari Scheinin wrote: >> >>> This mail is not really about DAS/2, but the web site says the >>> original DAS mailing list is now closed. >>> >>> I am setting up a DAS server that serves CGH data from my database > to >>> a visualization software, which in my case is gbrowse. I've already >>> set up Dazzle that serves the reference data from a local copy of >>> Ensembl. 
I need to be able to select individual CGH experiments to be >>> visualized, and as the measurements from a single CGH experiment cover >>> the entire genome, this cannot of course be done by specifying a >>> segment along with the features command. >>> >>> I noticed that there is a feature_id option for getting the features >>> in DAS/1.5, but on a closer look, it seems to work by getting the >>> segment that the specified feature corresponds to, and then getting >>> all features from that segment. My next approach was to use the >>> feature type to distinguish between different CGH experiments. As all >>> my data is of the type CGH, I thought that I could spare this >>> piece of information for identifying purposes. >>> >>> First I tried the generic seqfeature plugin. I created a database for >>> it with some test data. However, getting features by type does not >>> seem to work. I always get all the features from the segment in >>> question. >>> >>> Next I tried the LDAS plugin. Again I created a compatible database >>> with some test data. I must have done something wrong with the data >>> file I imported to the database, because getting the features does not >>> work. I can get the feature types, but trying to get the features >>> gives me an ERRORSEGMENT error. >>> >>> I thought that before I go further, it might be useful to ask whether >>> my approach seems reasonable, or is there a better way to achieve what >>> I am trying to do? What should I do to be able to visualize individual >>> CGH profiles? 
>>> >>> I'm grateful for any advice, >>> Ilari >> >> Andrew >> dalke at dalkescientific.com >> >> _______________________________________________ >> DAS2 mailing list >> DAS2 at portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/das2 > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From dalke at dalkescientific.com Mon Nov 28 19:01:08 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 29 Nov 2005 01:01:08 +0100 Subject: properties and key/value data (was Re: [DAS2] Spec issues) In-Reply-To: References: Message-ID: Steve: > To clarify a point of possible confusion, there are really two sets of > key-value pairs to keep in mind: > > 1. The key-value pair for the property type. > 2. The key-value pair for the property itself. I don't see that #1 is a useful distinction. > So in this example: > > 29 > > The key for the type is 'das:ptype' and it's value is > 'property/genefinder-score' and this value is a relative URL based on > xml:base in the enclosing PROPERTIES element (or in it's grandparent or > great-grandparent element, etc.). The value of the property itself is > 29 and > it's key is the whole key-value pair for the type ( > das:ptype="property/genefinder-score"). How do I make an extension type? For example, I want to add a new property for 3D structure depiction, which can be one of "cartoon", "ribbons", or "wires". Let's say it's under my company web site in http://www.dalkescientific.com/das-types/rep3d How do I write it? I tried but couldn't figure it out. What does that URL resolve, if anything? > In Andrew's Relax-NG equivalent: > > 29 > > the element name contains both the key ('prop:') and the value of the > property type ('genefinder-score'), while the element name as a whole > serves > as the key for the property itself (value=29). 
The > 'prop:genefinder-score' > string is not a relative URL, but is just a namespace-scoped element > name, > with 'prop:' serving merely to make 'genefinder-score' globally unique, > relative to the URI defined by: > > xmlns:prop="http://www.biodas.org/ns/das/genome/2.00/properties" It took me a while to understand XML namespaces. This helped http://www.jclark.com/xml/xmlns.htm He uses (for purposes of explanation) the so-called "Clark notation". An example from that document is maps to <{http://www.cars.com/xml}part/> """The role of the URI in a universal name is purely to allow applications to recognize the name. There are no guarantees about the resource identified by the URI.""" Using Clark notation helps with remembering that, since { and } here are not valid for URLs. The element name "prop:genefinder-score" is a convenient way to write the full element name, and that's all. There is no meaning to the parts of the name. "prop:" is not a key, since given these two namespace definitions <... xmlns:prop="http://www.dalkescientific.com/" xmlns:wash="http://www.dalkescientific.com/"> then these two elements are identical 29 29 I think Steve is saying the same thing as I am - I wanted to rephrase it to make sure. > A potential drawback of the Relax-NG approach, as discussed in today's > conf > call, is that the value of the property type is not resolvable as in > the > other approach using the PROPERTIES parent element. > > Andrew doesn't see a need for resolvability, e.g., for a dynamically > discoverable schema fragment. But I thought of another use case > besides the > one mentioned in today's call (determining data type such as int or > float, > which isn't of much use in practice). The URL for the type could point > to a > human readable definition of the term. A user may not need > clarification of > 'genefinder-score' but might for something like 'softberry-ztuple'. Who is the user that would want the clarification? 
That is, what human will be doing the reading? Once clarified, what does that user do with the information? In my opinion, the only people who care about this are developers, and more specifically, developers who will extend a client to support new data types. Users of, say, the web front end or of IGB don't care. That's a relatively small number of people. And the use case is solved by having the doc_href for the versioned source include a link to any extensions served. Here's another solution. Somewhere early in the results include where the schema includes links for each of the fields, including any extensions. It doesn't need to be a , just something meant as a shout out to developer people. > One could still satisfy such a use case under the Relax-NG approach by > providing a resolvable URL based on the element name + namespace such > as: > > http://www.biodas.org/ns/das/genome/2.00/properties#genefinder-score > > True, there's no XML spec that says this is legal, but we could > declare that > such a convention will hold for all biodas.org-based properties. One > problem > with the above convention is that it's not obvious what the URL > resolves to. > So we could have something like: > > http://www.biodas.org/ns/das/genome/2.00/properties?prop=genefinder- > score&de > fine=true > > http://www.biodas.org/ns/das/genome/2.00/properties?prop=genefinder- > score&sc > hema=true We could do this, though it's a bit complicated with some tools which represent element via Clark notation - it needs a bit of string munging. I suggest that the reason why "it's not obvious what the URL resolves to" is because there's nothing which will actually use this. It is easier to just have a human-readable link either on the doc_href page or via some special "if you're a developer, look here" reference, and don't worry about automating it further. 
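Andrew's point that the prefix is only shorthand for the namespace URI can be checked mechanically: Python's ElementTree reports element names in exactly the Clark notation he cites. This is an editor's illustration using the namespace URI from his two-prefix example.

```python
import xml.etree.ElementTree as ET

# Two different prefixes bound to the same namespace URI, as in
# Andrew's example; the elements they name are identical.
doc = """<root xmlns:prop="http://www.dalkescientific.com/"
              xmlns:wash="http://www.dalkescientific.com/">
  <prop:genefinder-score>29</prop:genefinder-score>
  <wash:genefinder-score>29</wash:genefinder-score>
</root>"""

a, b = ET.fromstring(doc)
print(a.tag)           # {http://www.dalkescientific.com/}genefinder-score
assert a.tag == b.tag  # the prefix itself carries no meaning
```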
Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Nov 28 19:16:17 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 29 Nov 2005 01:16:17 +0100 Subject: [DAS2] DAS intro In-Reply-To: <438B8013.3060107@affymetrix.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> <438B8013.3060107@affymetrix.com> Message-ID: Ed Erwin: > No. The coordinate transformations are often more complicated than > simple offsets. The coordinate space for features on one contig can > be 'backwards' with respect to a different contig, and the coordinate > space for a gene may skip over one or more gaps with respect to the > genomic sequence. The /region entities in the DAS/2 spec are defined as (zero or more) A top-level region on the genome (similar to the "entry points" of the DAS/1 protocol). id - the URI of the sequence ID length - length of the sequence name (optional) - a human-readable label for use when referring to the region doc_href (optional) - a URL that gives additional information about this region Here is an example This is a very simple definition. As far as I can tell it does not capture the information for, say, skipping. How would you represent "the coordinate space for a gene [that skips] over one or more gaps with respect to the genomic sequence" using the current DAS/2 object model? Or goes backwards? I don't see anything like that. > Also, the term 'reference frame' bugs me a bit because 'frame' always > makes me think of 'reading frame', which is not what you intend. Oh, I agree. It's a bad term. Very very few genomics people use it, according to Google. There's a theory, popular on usenet and in some wikis, that experts rarely write the details because after all they know the topic. 
The best way to get a detailed explanation is to post something in error and wait for the corrections. :) Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Mon Nov 28 22:05:40 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 28 Nov 2005 19:05:40 -0800 Subject: [DAS2] DAS/2 weekly meeting notes for 28 Nov 05 Message-ID: Notes from the weekly DAS/2 teleconference, 28 Nov 2005. $Id: das2-teleconf-2005-11-28.txt,v 1.1 2005/11/29 03:06:04 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt CSHL: Lincoln Stein UC Berkeley: Suzi Lewis Sanger: Thomas Down, Andreas Prlic Sweden: Andrew Dalke Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2005. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Today's topic: Spec issues (for DAS/2 retrievals) ------------------------------------------------- We are following the agenda summary in Andrew's email: http://portal.open-bio.org/pipermail/das2/2005-November/000352.html 1) DAS Status Code in headers ----------------------------- Use http error codes and not das-specific ones. das-error to provide more detail. GH: Do we really need a detailed response document? TD: How do you distinguish different parts of the error-causing request? AD: how detailed do we need to be? LS: If you wish to do error recovery, you could have problems with one part and not another. You give up granularity. 
GH: Willing to give up the granularity in favor of simplicity. AD: Possibilities of error LS: How about everything that can be turned into an http error should be. And have a special section to provide das details. E.g.: client is still going to have to understand das error codes GH, AD: client does need to be there. AD: Using only http error codes reduces complexity - you only need to check one place. Another benefit - you can provide a file-based das server (this was not a use case from the RFCs, just AD's pet idea he envisions as potentially useful). GH: Can't think of DAS/1 clients that did anything meaningful with those das error codes. AD: NCBI entrez server - does lots of extra error support. Don't want to go there with das. TD, LS: DAS error codes can be used to tell client which part of the URL is at fault. Now it will be just '404 not found'. AD: REST API says use the http protocol directly. LS: There are some things in the DAS API that don't translate into http error codes. AD: We can support this with error document. [A] Use HTTP error codes and x-das-error document with code and optional description. 2) Content-type --------------- [A] No objections to using: application/x-das+blah+xml 3) Key/value data ----------------- Three possibilities summarized in Andrew's email. 1) (current spec) using namespace in attrib value. 2) (steve, lincoln) all attribute values are URIs 3) (andrew) Relax-NG based, drop in well-structured XML SC: (clarified proposal #2). For more, see today's post at: http://portal.open-bio.org/pipermail/das2/2005-November/000363.html AD: What's wrong with the Relax-NG based approach? LS: I don't understand it yet. SC: Community lacks experience with Relax-NG in general. TD: Does it let you point to schema fragments for data types? AD: There are ways to define it in the schema, haven't looked at it. LS: This looks great. 
Would propose having a convention that if it's a simple, single-valued key, the value should be encoded in an attribute (value="blah"), not as content of a section (CDATA). Reason: It's more consistent with the rest of the spec, and it's easier to parse. So in the example, genefinder-score is not correctly encoded. AD: That's not in the das: namespace, hence is not under our control. We can use this convention for things in the das namespace. AD: User can put in any xml as long as it's reasonably well-formed. We can define what well-formed is. This is what atom uses. Allows some simple key val data on client as if it were native data. It permits searches without needing to know about complex data. GH: Likes idea of allowing arbitrary xml. SC: Not completely arbitrary since we limit use of das: namespace, and possibly other aspects. LS: So we're going to say we have properties represented as key/val pairs using this syntax. You'll find 'das:' as well as possibly other namespaces. I think that works. What becomes of /property url (ptype)? Does that go away and get replaced by the namespace? AD: Possibly use it for data type (e.g., float). Or we could make it discoverable? LS: Easier to make it part of the spec. TD: If this can work like XML schema, we could have a pointer to an xsi. Is there a way to put a pointer to a schema url? AD: Found this to be useless. Hard coding what is expected is better than having discoverability. TD: With the xsi schema location, you can put multiple schema locations for the das schema, and your extension, separate pointers to both in a single document. AD: Never found dynamically resolved schemas useful for anything LS: In theory they are. Why not? AD: Knowing that something's an int doesn't say what that int is supposed to mean. LS: Right. Let's make sure that the common types of annotation a server would want to return are in the spec from the get go. Anyone that doesn't care about extensions can ignore additional properties. 
No doubt people will make extensions to DAS/2 that are implemented on client and server that are in-house, private extensions that only work in client-server pairs. Should we allow schema fragments to be brought in via xsi? TD: this would be in the top-level element. Or can put it on an enclosing element. AD: Is there a good reason to do it? LS: Let's not seek discoverability. [A] Andrew will flesh out his Relax-NG based property encoding approach. SC: You could put your schema at the url pointed to by 'das:' AD: Don't see a need. I found that many of the DAS/1 schema fragments/documents were invalid. This didn't seem to bother DAS/1 clients and users. LS: In the real world, people don't validate. 5) xlink and ------------------- AD: The official xlink spec is long. Have not fully grokked it. GH: Does anyone else have experience with it? (silence...) Seems like a reason to not go there. AD: Atom uses link to say, "Here's some generic linked out stuff". We could use it to say, "I'm looking for the stylesheet for this thing or the schema for the xml document." GH: We need to draw a line between generic links and specific things. eg. feature ids, all ids are resolvable links, and so could in principle be specified with link tags. AD: Link from feature to versioned source it's a part of. Client can figure out context from url. Use case: DAS user sends email to colleague, 'look at this url for feature X'. The other user enters URL in his das browser, client can identify the das2-versioned source given the feature URL. LS: They would rely on xml:base. Nothing in the current DAS/2 spec says that the xml base is for the versioned source. LS: But it does give you the versioned source. This is absolutely part of the spec. AD: Nothing in the spec that says that features have to be on the same machine as the rest of the data. LS: Why does user want versioned source on the same machine that the feature came from? 
AD: Nothing in the spec says that a feature has to be under 'feature' in the URL. GH: Generalizing the info href element to be more generic, to specify what that link means is fine as long as we don't do this for everything that can be a link. Doc hrefs are fine, not ids. LS: We're not going to demand that people specify links. (Something about giving people enough rope to hang themselves with...) GH: Ids are opaque uris to id the feature. LS: The HTML link tag has been around a long time, and used a total of two times: style sheets, copyright statements. This could have easily been done with a stylesheet tag and copyright tag (without needing a general link tag). [A] Consider the xlink/link tags issue tabled. 6) Source filters ----------------- GH: Use case: DAS/2 client is trying to discover what registry has, query can be the same as for any das server, you can just apply additional filters when dealing with a registry. AP: Client would use tags that a registry server must implement. GH: A non-registry server can implement as well. TD: say filtering is optional in general. AD: I tend to not like optional things. Filtering is required for features. GH: The spec can state the filters that a registry is required to implement on sources query. General DAS/2 servers are not required, but can if they want. What if you send a sources query with filters that it doesn't understand? LS: Return everything GH: Return error AP: Client can filter out what they want GH: It's already important to have search capability in client. Use case: On given genome, show me all gene predictions for this region. You need to go to all servers, which could be many. AD: Can you filter by type of features that can be returned? AP: Can be added. GH: Want to be able to search on ontology term, not just id of the type. AD: Need meta-data server to ask of DAS/2 servers what features do you implement? LS: Does metadata protocol need to be part of das spec, or an additional protocol on top? 
There should be an optional section of DAS/2 that is implemented by metadata servers or registries that allows you to do searches. Shouldn't overload the core server spec. GH: Concerned with the response. It's so close to the same xml, it might as well be the same. Makes it easy for clients to know about both servers and metadata servers. could call it 'sources' or something else. LS: Filtering by feature type, do we need that info that's returned by sources document? GH: No, it's part of the query. LS: Metadata server would have to do a types request. AD: What if there's a mismatch in SOFA version? LS: We're in trouble. AD: Concerned about change in meaning. SL: Not important. LS: Use case: There's a 'restriction site' node in SOFA 1.4 with five terms underneath it. In version 1.5, now there's six terms. A metadata server running off of the old version is using an incomplete node. Metadata engine should always run off the latest version. AP: Registry at Sanger checks every 2 hrs with server. AD: How is this better than having client do it itself? What features do you know with this type and this range? GH: If lots of DAS servers, this will be time intensive AD: Can we wait until there are lots of servers? AP: We have 17. LS: Current paradigm - EBI has many servers that just do one type of feature e.g., there's a server that just does repeat elements. So there are servers that will serve up one or a few feat types. AD: Had not considered that. LS: Happy to have optional filter syntax added to sources request supported by metadata servers. Gregg is right about returning error (unimplemented). Will not change protocol in fundamental way. Just an annex, just optional section supported by metadata servers. GH: Based on Andreas' queries in soap, can we squeeze everything into params on url? filterable? AP: yes AD: optional fields will include species, build#, type, etc. [A] Add optional filter syntax to sources request. Allow unimpl error return. 
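Squeezing the proposed source filters into URL parameters, as Gregg asks about above, could look like the sketch below. This is an editor's illustration only: the endpoint and the parameter names (species, build, type were merely floated on the call) are hypothetical, not spec text.

```python
from urllib.parse import urlencode

# Candidate filter fields mentioned on the call; every name here is a
# placeholder, as is the example server URL.
filters = {"species": "Homo sapiens", "build": "hg16", "type": "repeat"}
url = "http://das.example.org/das2/sources?" + urlencode(filters)
print(url)
```

A server that does not implement filtering would, per the action item, return an "unimplemented" error rather than silently ignoring the parameters.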
7) /regions ----------- LS: In sofa, a feature of type region is root of all other features - everything is a region. Has props - ref sequence it's on, start, strandedness. The reason for region is for retrieving assemblies. SC: Region is also currently the only way to get back a list of available sequence ids without getting all sequence data. The top-level sequence request returns data along with sequence. LS/GH: region could be called 'landmarks' [A] Andrew will work directly with Lincoln on revising region request. 8) Tiled queries ---------------- LS: This doesn't need to be in spec. If client filters features by a range, is there a contract such that server must return exact range he asked for, contained in, or is ok for server to return more? GH: We need to be more strict. LS: Agree. Client should trim it. [A] Tiled queries should not be part of the spec. Other issues ------------ AP: There are still some other issues not addressed in this call. E.g., Not possible to handle situation where protein sequence in a structure varies from genome. Can defer to the next spec discussion conf call. From ed_erwin at affymetrix.com Tue Nov 29 14:30:41 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Tue, 29 Nov 2005 11:30:41 -0800 Subject: [DAS2] DAS intro In-Reply-To: References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> <438B8013.3060107@affymetrix.com> Message-ID: <438CAC61.1090104@affymetrix.com> Andrew Dalke wrote: > Ed Erwin: > >> No. The coordinate transformations are often more complicated than >> simple offsets. The coordinate space for features on one contig can >> be 'backwards' with respect to a different contig, and the coordinate >> space for a gene may skip over one or more gaps with respect to the >> genomic sequence. 
> > > The /region entities in the DAS/2 spec are defined as > > (zero or more) > A top-level region on the genome (similar to the "entry points" of > the DAS/1 protocol). > id ? the URI of the sequence ID > length ? length of the sequence > name (optional) ? a human-readable label for use when referring > to the region > doc_href (optional) ? a URL that gives additional information > about this region > > Here is an example > > > I had to go back and look-up the context for this discussion. Here it is: >> [Suzi wrote] >> Third, just think of "reference sequence" as a coordinate system. One >> can have the exact same feature and indicate that: on >> coordinate-system-A this feature starts and ends here, and on >> coordinate-system-B it starts and ends there. Thus a feature's >> coordinates may be given both on a chromosome, and on a contig, and on >> any other coordinate-system that can be derived through a transform >> from these. > > [Andrew wrote] > I believe I understand this. There really is only one reference frame > for the entire genome sequence, for a given assembly, and all other > coordinate systems are a fixed and definite offset of that single > reference frame. I understand this as talking about coordinates in general, not the elements or "pos" attributes in the spec. Suzi specifically mentions chromosomes and contigs; one can definitely be backwards with respect to the other. But top-level regions in an assembly would probably all be chromosomes or all be contigs, rather than a mixture. There is not one single "reference frame" for an assembly: rather there is one coordinate axis for *each* top-level region. If those top-level regions are chromosomes, then there is no relationship between the coordinates on different ones. If those top-level regions are contigs or ESTs (which I believe is allowed by the spec), then positions on one of them can be related to positions on others through various transforms. > This is a very simple definition. 
As far as I can tell it does not > capture the information for, say, skipping. > > How would you represent "the coordinate space for a gene [that skips] > over one or more gaps with respect to the genomic sequence" using the > current DAS/2 object model? > > Or goes backwards? I don't see anything like that. You represent gaps with tag parent-child relationships, and going backwards by specifying "+1" strand on one contig and "-1" strand on the other. The spec does not require a DAS/2 server to know how to perform transformations from one coordinate system to another, but your statement "there really is only one reference frame for the entire genome sequence" is wrong as I understand it. There is one coordinate axis for *each* top-level region. From ed_erwin at affymetrix.com Tue Nov 29 14:36:13 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Tue, 29 Nov 2005 11:36:13 -0800 Subject: [DAS2] DAS intro In-Reply-To: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> Message-ID: <438CADAD.8060403@affymetrix.com> Andrew Dalke wrote: > The front of the DAS doc starts > > DAS 2.0 is designed to address the shortcomings of DAS 1.0, including: > > That kinda assumes people know what DAS 1.0 is to understand DAS 2.0. > > How about this instead, as an overview/introduction. > > ====== > > DAS/2 describes a data model for genome annotations. In general I like this better than the original introduction. Thanks for writing it. But I agree with Andreas that the first line is better as: > DAS/2 is a protocol to share biological data. I definitely think of DAS as a protocol first, rather than a data model first.
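Ed's point above -- one coordinate axis per top-level region, with a "-1" strand placement running backwards -- can be made concrete with a small transform. This is an illustrative sketch only, not anything from the spec; the function, its 0-based convention, and the offsets are invented here.

```python
def contig_to_chrom(pos, contig_offset, contig_length, strand):
    """Map a 0-based position on a contig to the chromosome axis.

    On the "+1" strand the contig runs with the chromosome; on "-1"
    it runs backwards, so the contig's base 0 is the *last* chromosome
    base covered by the contig.
    """
    if strand == 1:
        return contig_offset + pos
    elif strand == -1:
        return contig_offset + (contig_length - 1) - pos
    raise ValueError("strand must be +1 or -1")
```

For a 500-base contig placed at chromosome offset 1000 on the "-1" strand, contig base 0 maps to chromosome base 1499 and contig base 499 maps back to 1000 -- the direction flip Ed describes.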
From ed_erwin at affymetrix.com Tue Nov 29 15:16:11 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Tue, 29 Nov 2005 12:16:11 -0800 Subject: [DAS2] mtg topics for Nov 28 In-Reply-To: References: Message-ID: <438CB70B.4030005@affymetrix.com> Andrew Dalke wrote: > Here are the spec issues I would like to talk about for today's meeting, > culled from the last few weeks of emails and phone calls > > 1) DAS Status Code in headers > > The current spec says > >> X-DAS-Status: XXX status code >> >> The list of status codes is similar, but not identical, to those used >> by DAS/1: >> >> 200 OK, data follows >> 400 Bad namespace >> 401 Bad data source >> 402 Bad data format >> 403 Unknown object ID >> 404 Invalid object ID >> 405 Region coordinate error >> 406 No lock >> 407 Access denied >> 500 Server error >> 501 Unimplemented feature > > > I argued that these are not needed. Some of them are duplicates with > HTTP error codes and those which are not can be covered by an error > code "300" along with an (optional) XML payload. > > The major problem with doing this seems to be in how MS IE handles > certain error codes. While IE is not a target browser, MS software > may use IE as a component for fetching data. From the link Ed dug > up, it looks like this won't be a problem. > I'm not going to argue anymore against moving the X-DAS-Status code up into the HTTP status code. I'm willing to try it and see if it works. But I want to re-iterate why I'm suspicious of this. I have experience trying this in two separate projects and it failed both times. (Still, I think those problems won't occur this time.) 1. I tried this on a project internally at Affymetrix. It didn't work in this case because the client code was (indirectly) using MS IE code, and IE was throwing away the HTTP content when the header had certain error codes. 
This doesn't bother me much now, though, because I doubt many DAS clients will be written that interface with IE, and because I now know that you can force IE to keep the HTTP content as long as you make sure the content is always at least 512 characters long. So if we ever run into this problem, there is an easy work-around. 2. I tried putting the X-DAS-Status codes into the HTTP status code in our internal DAS/1 server about a year ago. (In DAS/1 they are not supposed to be in the HTTP status codes, but I misunderstood the spec.) I ran into problems when I tried that, and that is the main reason I objected to trying that in DAS/2. Unfortunately, I can't remember what those problems were.... The problem might have been: a) the IGB client didn't understand the status codes because they weren't in the expected place. If this is the case, then the problem was benign, because we are now writing new code to support the new spec, so we can make IGB understand whatever we want. b) I use Apache's ".htaccess" files to do some URL re-direction on our DAS/1 client machine. see http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html#RewriteRule It is possible that this was causing the original HTTP status code to be replaced with a different one. I'm currently using the "proxy" form of redirect, which seems to keep the status code intact. Earlier I was using the "redirect" form of redirect, which may change the status code to 302. ----- Based on my experience with apache re-direction, I have a vague fear that we may run into cases where firewalls, or html cachers and optimizers may mangle the HTTP status codes for some users at some point. But since I have no confirmed evidence that that will happen, I have no objection to going ahead and trying to use HTTP status codes. 
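The 512-character work-around Ed mentions -- forcing IE-based clients to keep error-response content -- could look like this on the server side. A hedged sketch: the helper name and the XML-comment padding are assumptions here, not anything specified by DAS.

```python
def pad_error_body(body, min_len=512):
    """Pad an error payload so IE-based clients keep the content.

    Older IE components discard the body of responses carrying certain
    HTTP error codes when the body is shorter than ~512 characters.
    Appending the padding as an XML comment crosses that threshold
    while keeping the document well-formed.
    """
    if len(body) >= min_len:
        return body
    return body + "<!--" + " " * (min_len - len(body)) + "-->"
```

A server would apply this only to error responses; well-formed XML parsers simply ignore the trailing comment.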
From Steve_Chervitz at affymetrix.com Tue Nov 29 15:33:29 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Tue, 29 Nov 2005 12:33:29 -0800 Subject: [DAS2] DAS intro In-Reply-To: <438CADAD.8060403@affymetrix.com> Message-ID: Ed Erwin wrote: > Andrew Dalke wrote: >> The front of the DAS doc starts >> >> DAS 2.0 is designed to address the shortcomings of DAS 1.0, including: >> >> That kinda assumes people know what DAS 1.0 is to understand DAS 2.0. >> >> How about this instead, as an overview/introduction. >> >> ====== >> >> DAS/2 describes a data model for genome annotations. > > In general I like this better than the original introduction. Thanks > for writing it. > > But I agree with Andreas that the first line is better as: > >> DAS/2 is a protocol to share biological data. > > I definitely think of DAS as a protocol first, rather than a data model > first. I concur. The main aim of DAS is to define an API to allow clients to query servers in order to retrieve bioinformatics data objects in defined response formats. Of course, the writeback facility of DAS/2 will make DAS more of a two-way street so we could say 'sharing and editing', but I think retrieval is more fundamental and probably accounts for the majority of uses. How about this for the first line: DAS is a protocol for sharing biological data. No need to limit it to version 2. This applies to all versions. Use 'DAS/2' when talking about new features in this version, such as writeback. Steve From dalke at dalkescientific.com Tue Nov 29 17:17:02 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 29 Nov 2005 23:17:02 +0100 Subject: [DAS2] DAS intro In-Reply-To: References: Message-ID: Steve: > How about this for the first line: > > DAS is a protocol for sharing biological data. > > No need to limit it to version 2. This applies to all versions. Use > 'DAS/2' > when talking about new features in this version, such as writeback. Done. 
Made a few changes to the CVS intro text to reduce the use of "DAS/2". So that email I just sent is out of date. :) Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Tue Nov 29 19:02:07 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 30 Nov 2005 01:02:07 +0100 Subject: What are regions for? (was Re: [DAS2] DAS intro) In-Reply-To: <438CAC61.1090104@affymetrix.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> <438B8013.3060107@affymetrix.com> <438CAC61.1090104@affymetrix.com> Message-ID: <921477a6bd799b5e19b965b3cd39d239@dalkescientific.com> Ed: > I understand this as talking about coordinates in general, not the > elements or "pos" attributes in the spec. Suzi specifically > mentions chromosomes and contigs; one can definitely be backwards with > respect to the other. But top-level regions in an assembly would > probably all be chromosomes or all be contigs, rather than a mixture. I'm trying to figure out when people use the /region. In my way of understanding things there is the genomic sequence. That consists of a set of chromosomes, each with a list of bases. A chromosome is assembled from parts. One of these parts is called a 'contig'. I thought I knew what it was, but according to http://staden.sourceforge.net/contig.html there are several meanings. What I understand is that a 'contig' is a sequenced chunk of DNA which has overlaps with other contigs and when combined can be used to deduce the entire sequence (excepting regions of repeats and other ambiguities). The best such deduction is the golden path. For DAS/2 we assume sequenced genomes. When will people use top-level regions which are not chromosomes? Chromosome top-level regions are identical to the /sequence, except for the ability to get the assembly and the sequence data directly. 
Is that correct? The spec allows links from a feature into several different regions. This suggests to me that sometimes there will be regions which are a mixture of contigs and chromosomes. Else why support that ability? There is nothing in the spec (that I know of) which allows any hierarchy to the regions - all regions are top-level. Is this correct? > If those top-level regions are chromosomes, then there is no > relationship between the coordinates on different ones. While I understand that, I did get it wrong when I wrote it down. In my head I was thinking "each base has a 1-to-1 mapping to a number, and if two bases are next to each other then the corresponding two numbers are next to each other." This is invalid because the converse is not true - if one number is the end of a chromosome and the other is the start of the next then the two bases are not next to each other. > If those top-level regions are contigs or ESTs (which I believe is > allowed by the spec), then positions on one of them can be related to > positions on others through various transforms. Those are allowed. Will people use them? What advantage is there to having these be a special category instead of a feature? > You represent gaps with tag parent-child relationships, and > going backwards by specifying "+1" strand on one contig and "-1" > strand on the other. Something like this? (Yes, this is hand-wavy) Here's a (and note, this is NOT a ) with two subfeatures, one on the forward strand and one on the reverse. This I understand just fine. I don't understand why the positions are given in /region spec instead of either: - directly to /sequence space, eg ... -or- - point to a feature of type 'region' which provides the region coordinates ... (Again, hand-wavy. I think best looking at data and code.) 
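Since the discussion keeps asking for data and code, here is one hand-wavy way to spell out the parent/child, mixed-strand case in plain data structures. All field names, IDs, and coordinates are invented for illustration; they are not the spec's XML model.

```python
def make_feature(fid, region=None, start=None, end=None, strand=None, parts=()):
    """Build a minimal feature record; parent features carry child parts."""
    return {"id": fid, "region": region, "start": start,
            "end": end, "strand": strand, "parts": list(parts)}

# A gene whose two parts sit on different contigs, one on the forward
# strand and one on the reverse -- the parent/child recipe Ed describes
# for representing gaps and direction flips.
gene = make_feature(
    "feature/gene-1",
    parts=[
        make_feature("feature/gene-1/exon-a", "region/contig-5", 100, 400, strand=+1),
        make_feature("feature/gene-1/exon-b", "region/contig-6", 0, 250, strand=-1),
    ],
)
```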
> The spec does not require a DAS/2 server to know how to perform > transformations from one coordinate system to another, but your > statement "there really is only one reference frame for the entire > genome sequence" is wrong as I understand it. There is one coordinate > axis for *each* top-level region. Understood. My questions, to summarize, are: - why do we need a /region space when we can 1. point directly to a sequence (for chromosome regions) and/or 2. point to a "contig" or "assembly" or "region" feature type (for other regions) - When would someone have regions which have more than one of contigs, ESTs and chromosomes? Especially given that this is the genome spec, so chromosome-level info is known, at least enough for a rough assembly. In other words, what are regions for? Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Tue Nov 29 19:26:41 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 30 Nov 2005 01:26:41 +0100 Subject: [DAS2] mtg topics for Nov 28 In-Reply-To: <438CB70B.4030005@affymetrix.com> References: <438CB70B.4030005@affymetrix.com> Message-ID: <45f7dbc8e14fa2a68af6c1d03153d715@dalkescientific.com> Ed: > I'm not going to argue anymore against moving the X-DAS-Status code up > into the HTTP status code. I'm willing to try it and see if it works. > > But I want to re-iterate why I'm suspicious of this. I have > experience trying this in two separate projects and it failed both > times. (Still, I think those problems won't occur this time.) > > 1. I tried this on a project internally at Affymetrix. It didn't > work in this case because the client code was (indirectly) using MS IE > code, and IE was throwing away the HTTP content when the header had > certain error codes. This was a two-part problem: - identifying in client code that a given error occurred - extracting the payload when the error occurred As far as I can tell, the problem you are concerned about is the second part.
Personally I don't want an application/x-das-error+xml return document. Several others do. Thing is, when Gregg asked if anyone used the DAS/1 error codes for anything other than "there was an error", no one said anything. I could hear the proverbial crickets chirping (or in my case, snow falling). I am convinced that the actual error content will be server implementation specific and as such non-portable across clients. I will flesh out a document type for this then ask Thomas, Lincoln etc. to provide a list of defined error code extensions that their servers will return. It's likely they'll not be able to agree on it, because their code will do different styles of error checking. I'll also dodge the whole mess by saying that the error document payload is optional, so clients are highly unlikely to read it for anything meaningful. (Except perhaps some text shunted to the user.) That makes more work in the spec implementation for something I can almost guarantee will be ignored by DAS clients. > b) I use Apache's ".htaccess" files to do some URL re-direction on our > DAS/1 client machine. > > see http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html#RewriteRule > > It is possible that this was causing the original HTTP status code to > be replaced with a different one. > > I'm currently using the "proxy" form of redirect, which seems to keep > the status code intact. Earlier I was using the "redirect" form of > redirect, which may change the status code to 302. I don't understand how the old one would be a problem in the web clients I'm familiar with. It should be: send request to server get 302 "moved temporarily" response along with new URL repeat until no redirect or reached max redirect limit request new URL get headers/payload back The redirects shouldn't affect the real response code, which would be the last in the chain. If it did, it would also affect 404 and 200 responses. 
> Based on my experience with apache re-direction, I have a vague fear > that we may run into cases where firewalls, or html cachers and > optimizers may mangle the HTTP status codes for some users at some > point. But since I have no confirmed evidence that that will happen, > I have no objection to going ahead and trying to use HTTP status > codes. I know that fear. I've had intermediate web caches misconfigured which cached any HTML page for an hour, making me unable to edit my web site and see the changes. That was with a normal 200 response code, so likely misconfigured caches will affect other response codes. But what's there to do about that? What's the error rate? We're using normal HTTP and if a web cache breaks for us - we aren't doing anything fancy; no content-negotiation, no 'If-Modified-Since', etc - then it will break for anyone doing HTTP. That's anyone exchanging HTML, sending RSS, etc. Andrew dalke at dalkescientific.com From ed_erwin at affymetrix.com Tue Nov 29 19:34:11 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Tue, 29 Nov 2005 16:34:11 -0800 Subject: [DAS2] mtg topics for Nov 28 In-Reply-To: <45f7dbc8e14fa2a68af6c1d03153d715@dalkescientific.com> References: <438CB70B.4030005@affymetrix.com> <45f7dbc8e14fa2a68af6c1d03153d715@dalkescientific.com> Message-ID: <438CF383.5050604@affymetrix.com> >> I'm currently using the "proxy" form of redirect, which seems to keep >> the status code intact. Earlier I was using the "redirect" form of >> redirect, which may change the status code to 302. > > > I don't understand how the old one would be a problem in the > web clients I'm familiar with. It should be: > > send request to server > get 302 "moved temporarily" response along with new URL > repeat until no redirect or reached max redirect limit > request new URL > get headers/payload back Unlike modern web browsers, IGB isn't smart enough to do that. Maybe someday it will need to be, but it isn't there yet.
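The redirect loop Andrew sketches is small enough to write out. Here it is with the fetch step injected as a callable so no network is needed -- a sketch, not the DAS client code; real clients would normally lean on their HTTP library's built-in redirect handling.

```python
def follow_redirects(url, fetch, max_redirects=5):
    """Follow a redirect chain and return the final (status, payload).

    `fetch(url)` is any callable returning (status, value); for the
    redirect statuses the value is the new URL (the Location header),
    otherwise it is the response body. The final status in the chain is
    the one that matters -- intermediate 302s do not replace it.
    """
    for _ in range(max_redirects + 1):
        status, value = fetch(url)
        if status in (301, 302, 307):
            url = value  # go around again with the new URL
            continue
        return status, value
    raise RuntimeError("too many redirects")
```

This mirrors Andrew's point: a 302 along the way never becomes the response code the client finally sees.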
From dalke at dalkescientific.com Tue Nov 29 17:13:49 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 29 Nov 2005 23:13:49 +0100 Subject: [DAS2] DAS intro In-Reply-To: <438CADAD.8060403@affymetrix.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <438CADAD.8060403@affymetrix.com> Message-ID: <24b1a9183d9f344398f80839f4c71b6e@dalkescientific.com> Ed: > I definitely think of DAS as a protocol first, rather than a data > model first. Mmm. I see you all's point. All protocols express a data model, though neither side necessarily must implement it that way. Here's the updated text. This is what I just committed to CVS. Note that it's missing mention of the '/region' section. ===== DAS/2 is a protocol for sharing biological data. This version of the specification describes features located on the genomic sequence. Future extensions will add support for sharing annotations of expression data, protein sequences, 3D structures, and ontologies. A DAS/2 annotation server provides feature information about one or more genome sources. Each source may have one or more versions. Different versions are usually based on different assemblies. As an implementation detail an assembly and corresponding sequence data may be distributed via a different machine, which is called the reference server. Annotations are located on the genomic sequence with a start and end position. The range may be specified multiple times if there are alternate reference frames. An annotation may contain multiple non-contiguous parts, making it the parent of those parts. Some parts may have more than one parent. Annotations have a type based on terms in SOFA (Sequence Ontology for Feature Annotation). Stylesheets contain a set of properties used to depict a given type. Annotations can be searched by range, type, and a properties table associated with each annotation. These are called feature filters. DAS/2 is implemented using a ReST architecture.
Each entity (also called a document or object) has a name, which is a URL. Fetching the URL gets information about the entity. The DAS-specific entities are all XML documents. Other entities contain data types with an existing and frequently used file format. Where possible, a DAS server returns data using existing formats. In some cases a server may describe how to fetch a given entity in several different formats. ===== Andrew dalke at dalkescientific.com From ed_erwin at affymetrix.com Tue Nov 29 19:37:07 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Tue, 29 Nov 2005 16:37:07 -0800 Subject: What are regions for? (was Re: [DAS2] DAS intro) In-Reply-To: <921477a6bd799b5e19b965b3cd39d239@dalkescientific.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> <438B8013.3060107@affymetrix.com> <438CAC61.1090104@affymetrix.com> <921477a6bd799b5e19b965b3cd39d239@dalkescientific.com> Message-ID: <438CF433.1020707@affymetrix.com> Andrew Dalke wrote: > My questions, to summarize, are: > - why do we need a /region space when we can > 1. point directly to a sequence (for chromosome regions) and/or > 2. point to a "contig" or "assembly" or "region" feature type > (for other regions) The way I understand it, that is what region is for: to point directly to a location on a sequence and/or contig. > - When would someone have regions which have more than one of > contigs, ESTs and chromosomes? Especially given that this > is the genome spec, so chromosome-level info is known, at > least enough for a rough assembly. I think they do it mainly 1) when the assembly is incomplete or 2) to preserve annotations from the past when the assembly was incomplete. There could be more reasons. 
Here is an example of a DAS/1 server that contains both chromosomes and "other" short sequences as entry points: http://servlet.sanger.ac.uk:8080/das/ensembl_Homo_sapiens_core_28_35a/entry_points See here for some more genomes that are treated similarly: http://servlet.sanger.ac.uk:8080/das > In other words, what are regions for? > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From dalke at dalkescientific.com Tue Nov 29 20:26:29 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 30 Nov 2005 02:26:29 +0100 Subject: What is /region for? (was Re: [DAS2] DAS intro) In-Reply-To: <438CF433.1020707@affymetrix.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> <438B8013.3060107@affymetrix.com> <438CAC61.1090104@affymetrix.com> <921477a6bd799b5e19b965b3cd39d239@dalkescientific.com> <438CF433.1020707@affymetrix.com> Message-ID: <6fd85d539c25833e9b6f7f41b3429231@dalkescientific.com> (Changed the Subject line slightly to be a bit clearer. I hope.) On Nov 30, 2005, at 1:37 AM, Ed Erwin wrote: > Andrew Dalke wrote: >> My questions, to summarize, are: >> - why do we need a /region space when we can >> 1. point directly to a sequence (for chromosome regions) and/or >> 2. point to a "contig" or "assembly" or "region" feature type >> (for other regions) > > The way I understand it, that is what region is for: to point directly > to a location on a sequence and/or contig. Am I not asking the question correctly? Am I missing the obvious? Been known to happen before! I know what regions are. I don't know why they are in a distinct /region subtree. I'm happy - enthusiastic - ecstatic - that there are different ways to identify certain regions. 
I fully accept that they are in use every day and widely understood. Why are they special enough to get their own /region subtree? Why can't they be features? Here's my proposal. Leaf node parts of a always point to a /sequence and optionally point to one or more /feature elements which are of type "region". (Or some other part of SOFA - perhaps assembly-component?) Want to know where the feature is on a given "region" feature? Then look up the region to find its /sequence location. Use these two /sequence locations to get the location in the region. Both /sequence locations are in the same "coordinate space" of "identifier + start/end offset" BTW, if regions are a type of features then you can search for them. Eg, search for all top-level regions in the range 100000 to 2000000. Can't do that with the /region container. Can if the region data is in the /feature container. >> - When would someone have regions which have more than one of >> contigs, ESTs and chromosomes? Especially given that this >> is the genome spec, so chromosome-level info is known, at >> least enough for a rough assembly. > > I think they do it mainly 1) when the assembly is incomplete or 2) to > preserve annotations from the past when the assembly was incomplete. > There could be more reasons. > > Here is an example of a DAS/1 server that contains both chromosomes > and "other" short sequences as entry points: Okay, I'm fine with that. Thanks. Is a goal of DAS to support incomplete genomes? Note, btw, that the /sequence subtree does not need to contain only chromosomes. From the spec seqid is the sequence ID, and can correspond to an assembled chromosome, a contig, a clone, or any other accessionable chunk of sequence. Hence for incomplete genomes, put the sequence data as best you can under /sequence and have the /feature subtree point to it. >> In other words, what are regions for? Still don't understand the need for a /region namespace.
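Andrew's "use these two /sequence locations to get the location in the region" reduces to interval subtraction. A sketch, assuming both placements are 0-based, end-exclusive, forward-strand intervals on the same sequence axis -- those conventions are assumptions here, not taken from the spec:

```python
def locate_in_region(feat_start, feat_end, region_start, region_end):
    """Re-express a feature's /sequence coordinates as region-local ones.

    Both intervals live on the same sequence coordinate axis; the
    feature must fall inside the region for the subtraction to make sense.
    """
    if not (region_start <= feat_start and feat_end <= region_end):
        raise ValueError("feature is not contained in the region")
    return feat_start - region_start, feat_end - region_start
```

So a feature at sequence positions 1100-1200, inside a region placed at 1000-2000, sits at 100-200 in region-local coordinates.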
Repeat: I understand regions, I just don't see why they go in their own subtree and aren't part of some other data chunk. Please, someone sketch out some example with hand-waving XML that shows how having a /region is the appropriate solution. That's what I'm worried about now - the representation in XML. Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Tue Nov 29 21:08:47 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Tue, 29 Nov 2005 18:08:47 -0800 Subject: [DAS2] mtg topics for Nov 28 Message-ID: Actually I think by default the java networking library that IGB uses follows most redirections automatically without IGB having to worry about it. I'm not familiar with what different forms of redirection might do to the status codes, but I expect that as long as the redirection is successful the code IGB would actually see would be 200 OK. IGB does have a ways to go to properly respond to all possible HTTP status codes though... gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Ed Erwin > Sent: Tuesday, November 29, 2005 4:34 PM > To: Andrew Dalke > Cc: DAS/2 > Subject: Re: [DAS2] mtg topics for Nov 28 > > > >> I'm currently using the "proxy" form of redirect, which seems to keep > >> the status code intact. Earlier I was using the "redirect" form of > >> redirect, which may change the status code to 302. > > > > > > I don't understand how the old one would be a problem in the > > web clients I'm familiar with. It should be: > > > > send request to server > > get 302 "moved temporarily" response along with new URL > > repeat until no redirect or reached max redirect limit > > request new URL > > get headers/payload back > > Unlike modern web browsers, IGB isn't smart enough to do that. Maybe > someday it will need to be, but it isn't there yet. 
> > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Gregg_Helt at affymetrix.com Tue Nov 29 21:17:24 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Tue, 29 Nov 2005 18:17:24 -0800 Subject: [DAS2] mtg topics for Nov 28 Message-ID: > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Ed Erwin > Sent: Tuesday, November 29, 2005 12:16 PM > To: Andrew Dalke > Cc: DAS/2 > Subject: Re: [DAS2] mtg topics for Nov 28 ... > 2. I tried putting the X-DAS-Status codes into the HTTP status code in > our internal DAS/1 server about a year ago. (In DAS/1 they are not > supposed to be in the HTTP status codes, but I misunderstood the spec.) > I ran into problems when I tried that, and that is the main reason I > objected to trying that in DAS/2. > > Unfortunately, I can't remember what those problems were.... > > The problem might have been: > a) the IGB client didn't understand the status codes because they > weren't in the expected place. > > If this is the case, then the problem was benign, because we are now > writing new code to support the new spec, so we can make IGB understand > whatever we want. I'm pretty sure this was the problem (IGB didn't know where to find the status codes). gregg
From Steve_Chervitz at affymetrix.com Fri Nov 4 20:32:22 2005 From: Steve_Chervitz at affymetrix.com (Chervitz, Steve) Date: Fri, 04 Nov 2005 12:32:22 -0800 Subject: [DAS2] Spec issues In-Reply-To: <200510270941.30528.lstein@cshl.edu> Message-ID: As Gregg noted in this week's DAS/2 meeting, xml:base and XML namespace (xmlns) are complementary technologies: * xml:base is for resolving relative URLs occurring within attribute values or CDATA elements * xmlns is for resolving names of attributes and elements.
So bearing this in mind, here's my take: On Thursday 27 October 2005, Lincoln Stein wrote: > > On Wednesday 26 October 2005 07:29 pm, Chervitz, Steve wrote: > > > > > > > > Next issue: Feature properties example (only showing relevant attributes): > > > > Description: Properties are typed using the ptype attribute. The value of > > the property may be indicated by a URL given by the href attribute, or may > > be given inline as the CDATA content of the section. > > > > > > > type="type/curated_exon"> > > 29 > > 2 > > > href="/das/protein/volvox/2/feature/CTEL54X.1" /> > > > > > > > > So in contrast to the TYPE properties which are restricted to being simple > > string-based key:value pairs, FEATURE properties can be more complex, which > > seems reasonable, given the wild world of features. We might consider using > > 'key' rather than 'ptype' for FEATURE properties, for consistency with TYPE > > prop elements (however, read on). > > I'm not so happy with "key" since it is nondescript. Originally this was > "type" but the word collided with feature type. > > I am getting uncomfortable with the dichotomy we've (I've?) created between > XML base keys/properties and namespace-based keys/properties. It seems nasty > to have the ptype attribute be either a relative URI > (property/genefinder-score), or a controlled vocabulary member (das:phase). > Is there any reason we shouldn't choose one or the other? > > For example, does this work? > > xmlns:dasprop="http://www.biodas.org/ns/das/genome/2.00/properties" > xmlns:type="http://www.wormbase.org/das/genome/volvox/1/type" > xmlns:id="http://www.wormbase.org/das/genome/volvox/1/feature"> > xmlns:prop="http://www.wormbase.org/das/genome/volvox/1/property"> > das:type="type:curated_exon"> > 29 > 2 > das:href="http://www.wormbase.org/das/protein/volvox/2/feature/CTEL54X.1" /> > > > This looks so much cleaner to me. 
Here's a new version of this example using xml:base, a default xmlns, and a special attribute to define the URL for the controlled vocabulary of DAS property keys. I'm also using xlink for the href: 29 2 > Cc: Steve Chervitz > Subject: Re: New problem with content-type header in DAS/2 server responses! > > Looks like the cache server. FYI, I have updated the server to use all > "text/xml" Content-Type for all xml response types. This was approved by > Lincoln so that web browsers could be pointed at the das server and "just > work". I thought these changes had already made their way into the spec, > but apparently not. > > The table below summarizes what the server should be giving back. The > left column shows the command and format request, and the right side shows > the response Content-Type. > > 'das/das2xml' => 'text/xml', > 'domain/das2xml' => 'text/xml', > 'domain/compact' => 'text/plain', > 'feature/das2xml' => 'text/xml', > 'feature/chain' => 'text/plain', #LOOK > 'property/das2xml' => 'text/xml', > 'region/das2xml' => 'text/xml', > 'region/compact' => 'text/plain', > 'sequence/das2xml' => 'text/plain', #LOOK > 'sequence/fasta' => 'text/plain', > 'source/das2xml' => 'text/xml', > 'source/compact' => 'text/plain', > 'type/das2xml' => 'text/xml', > 'type/compact' => 'text/plain', > 'type/obo' => 'text/plain', > 'type/rdf' => 'text/xml', > 'versionedsource/das2xml' => 'text/xml', > > As you can see, the text/plain response to the /feature command is NOT > being given by the server, but somehow being mangled by the cache. Is > this going to severely impact your demo? If so I can disable the cache > module. It will be slow though. An alternative to the cache would be to > use our squid proxy. Brian can probably set you up to use it very > quickly. > > Let me know what needs to be done ASAP.
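Allen's table maps (command, format) request pairs to response Content-Types. A minimal sketch of that lookup in Python (values copied from the table above; the function name and the text/plain fallback are illustrative assumptions, not the actual server code):

```python
# Command/format -> Content-Type mapping from Allen's table.
# Entries he flagged with "#LOOK" are marked below.
RESPONSE_CONTENT_TYPES = {
    ("das", "das2xml"): "text/xml",
    ("domain", "das2xml"): "text/xml",
    ("domain", "compact"): "text/plain",
    ("feature", "das2xml"): "text/xml",
    ("feature", "chain"): "text/plain",      # LOOK
    ("property", "das2xml"): "text/xml",
    ("region", "das2xml"): "text/xml",
    ("region", "compact"): "text/plain",
    ("sequence", "das2xml"): "text/plain",   # LOOK
    ("sequence", "fasta"): "text/plain",
    ("source", "das2xml"): "text/xml",
    ("source", "compact"): "text/plain",
    ("type", "das2xml"): "text/xml",
    ("type", "compact"): "text/plain",
    ("type", "obo"): "text/plain",
    ("type", "rdf"): "text/xml",
    ("versionedsource", "das2xml"): "text/xml",
}

def content_type_for(command, fmt):
    """Return the Content-Type the server should emit for a request.

    The text/plain fallback for unknown pairs is an assumption for
    illustration; the table does not specify one.
    """
    return RESPONSE_CONTENT_TYPES.get((command, fmt), "text/plain")

print(content_type_for("feature", "das2xml"))  # -> text/xml
```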
> > -Allen > > > On Fri, 28 Oct 2005, Helt,Gregg wrote: > >> I just tried accessing the biopackages DAS/2 server from IGB, with this >> query: >> >> http://das.biopackages.net/das/genome/human/17/feature?overlaps=chr21/26027736:26068042;type=SO:mRNA >> >> and I'm getting back a message where the XML looks fine but here are the >> headers: >> >> HTTP/1.1 200 OK >> Date: Sat, 29 Oct 2005 05:49:46 GMT >> Server: Apache/2.0.51 (Fedora) >> X-DAS-Status: 200 >> Warning: 113 Heuristic expiration >> Content-Type: text/plain; charset=UTF-8 >> Age: 259582 >> Content-Length: 6004 >> Keep-Alive: timeout=15, max=100 >> Connection: Keep-Alive >> >> But according to the spec the content type header needs to be: >> Content-Type: text/x-das-features+xml >> I'm using this in the IGB DAS/2 client to parse responses based on the >> content type. With "text/plain; charset=UTF-8" IGB doesn't know what >> parser to use and gives up. So right now I can't visualize annotations >> from the biopackages server. I'm pretty sure the server was setting the >> content-type header correctly on Wednesday -- did anything change since >> then that could be causing this? Could the server-side cache be doing >> this for some reason? >> >> Thanks, >> Gregg >> >> From dalke at dalkescientific.com Wed Nov 9 00:27:42 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 9 Nov 2005 01:27:42 +0100 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: References: Message-ID: My apologies for not tracking what's been going on in the last few months. I'm back now and have time for the next few months to work on things. So I'll start with this exchange. I can't find the discussion in the mailing list history. Why the decision to use "text/xml" for all xml responses? I read that it is so "web browsers can 'just work'". What are they supposed to do? Display the XML as some sort of tree structure? Is that the only thing?
One thing Allen and I talked about, and he tested, was the ability to insert a stylesheet declaration in the XML. Is this part of the reason to switch to using "text/xml"? Is there anything I'm missing? Since it looks like I'm going to be more in charge of the spec development, I would like to start collecting use cases and recording these sorts of decisions. I think having different content-types is an important feature. For example, it lets a DAS browser figure out what it's looking at before doing any parsing. Here's my use case. I want someone to send an email to someone else along the lines of "What do you think about http://blah.blah/das/genome/blah/blah" with the URL of the object included in the email. Paste that into a DAS browser and it should be able to figure out that this is a sequence, a feature, a whatever. With the old content-types there was enough information to do that right away. With this new one a DAS browser needs to parse the XML to figure out what's in it. Autodetection of XML formats? I don't want to go there. That's also the reason for Gregg's opposition. You (Allen) and Lincoln, on the other hand, want that user to be able to go to a web browser and paste the URL in, to get a basic idea of what's there. I think that's also important. I think there are other solutions. One is "if the server sees a web browser then return the XML data streams as 'text/xml'". For example: if "Mozilla" in headers["User-Agent"]: ... this is IE, Mozilla, Firefox, and a few others ... That catches most of the browsers anyone here cares about. As another solution, look at the "Accept" header sent by the browser. Here's what Firefox sends: Accept: text/xml,application/xml,application/xhtml+xml,text/html; q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Here's Safari and "links" (a text browser): Accept: */* Another rule then might be if asking_for_xml_format and "*/*" in headers["Accept"]: ... return it as "text/xml" ...
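The two sniffing rules sketched above (User-Agent matching and Accept-header inspection) can be combined into one runnable function. This is an illustrative sketch only, not part of any spec or server; the header names are standard HTTP, but the function and its fallback order are assumptions:

```python
def negotiate_content_type(headers, das_content_type):
    """Pick a Content-Type for an XML response, given request headers.

    headers: dict of HTTP request header names to values.
    das_content_type: the DAS-specific type, e.g. 'text/x-das-features+xml'.
    """
    user_agent = headers.get("User-Agent", "")
    accept = headers.get("Accept", "")
    # Rule 1: "Mozilla" in User-Agent catches IE, Mozilla, Firefox,
    # and a few others -- serve plain text/xml so the browser renders it.
    if "Mozilla" in user_agent:
        return "text/xml"
    # Rule 2: the client didn't ask for the DAS type but accepts
    # anything ("*/*", as Safari and links send) -- fall back to text/xml.
    if das_content_type not in accept and "*/*" in accept:
        return "text/xml"
    # Otherwise the client knows the DAS type; return it unchanged.
    return das_content_type

# A browser-like request falls back to text/xml:
browser = {"User-Agent": "Mozilla/5.0", "Accept": "text/html,*/*;q=0.5"}
print(negotiate_content_type(browser, "text/x-das-features+xml"))
# -> text/xml

# A DAS client asking for the DAS type keeps it:
das_client = {"User-Agent": "IGB", "Accept": "text/x-das-features+xml"}
print(negotiate_content_type(das_client, "text/x-das-features+xml"))
# -> text/x-das-features+xml
```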
Though a better version is to make sure the client doesn't know about the expected content type: if asking_for_xml_format: return_content_type = ... whatever is appropriate ... if (return_content_type not in headers["Accept"] and "*/*" in headers["Accept"]): return_content_type = "text/xml" .... optionally insert style sheet .... Another solution is to send a "what kind of DAS object are you?" request to the URL (eg, tack on a ? query or tell the server that the client will "Accept: application/x-das-autodiscovery"). I think that's clumsy, but I mention it as another way to support both DAS client app and human browser requests of the same URL. >> From: Allen Day >> Looks like the cache server. FYI, I have updated the server to use >> all >> "text/xml" Content-Type for all xml response types. This was >> approved by >> Lincoln so that web browsers could be pointed at the das server and >> "just >> work". I thought these changes had already made their way into the >> spec, >> but apparently not. >> On Fri, 28 Oct 2005, Helt,Gregg wrote: >>> But according to the spec the content type header needs to be: >>> Content-Type: text/x-das-features+xml >>> I'm using this in the IGB DAS/2 client to parse responses based on >>> the >>> content type. With "text/plain; charset=UTF-8" IGB doesn't know what >>> parser to use and gives up. So right now I can't visualize >>> annotations >>> from the biopackages server. I'm pretty sure the server was setting >>> the >>> content-type header correctly on Wednesday -- did anything change >>> since >>> then that could be causing this? Could the server-side cache be >>> doing >>> this for some reason? Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Wed Nov 9 00:49:27 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 9 Nov 2005 01:49:27 +0100 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! 
In-Reply-To: References: Message-ID: <7e9e19f6885240c668ac677b6ea98ff0@dalkescientific.com> P.S. Gregg mentioned one need for wanting more selective content-types. Here's another. I expect most of the XML data we return will change. We may add an element field or change the meaning of an element. When that happens, how does a client know that a "text/xml" is for one version or another of a given document type? I expect that will be done by returning something like Content-Type: text/das2xml; version=2 This, btw, suggests a third solution to the problem of letting DAS/2 and web browser clients both point to the same object - use Content-Type: text/xml; das-type=das2xml But that's ugly. A 4th is to go back to the "add a das-content-type header" solution from DAS/1. I don't want that. Note, btw, that if a given URL can return different MIME types for the same request then it needs a "Vary: Accept" in the response headers so caching works correctly. Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Wed Nov 9 01:58:07 2005 From: Steve_Chervitz at affymetrix.com (Chervitz, Steve) Date: Tue, 08 Nov 2005 17:58:07 -0800 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: Message-ID: Andrew, Andrew Dalke wrote on 8 Nov 2005: > My apologies for not tracking what's been going on in the last few > months. I'm back now and have time for the next few months to work > on things. Great to have you back. I have been focusing on the spec for the past several weeks but would be glad to have you take the lead on it. We've been making the retrieval spec a priority and should really focus on getting it nailed down as soon as possible to allow others to start implementing clients and servers against it and providing feedback. We haven't talked about a freeze or release date for it, but maybe we should. I started going through the open bugs in bugzilla, but only resolved one (#1796).
While going through and cleaning up the retrieval spec, I ran into other issues that were not in bugzilla that seemed important. One was this content-type issue that you address here. I raised some other issues regarding types and feature properties etc. a couple of weeks ago that I'd like you to chime in on: http://portal.open-bio.org/pipermail/das2/2005-October/000271.html The latest message on this thread is: http://portal.open-bio.org/pipermail/das2/2005-November/000278.html > So I'll start with this exchange. I can't find the discussion in the > mailing list history. > > Why the decision to use "text/xml" for all xml responses? I read that > it is so "web browsers can 'just work'". > > What are they supposed to do? Display the XML as some sort of tree > structure? Is that the only thing? > > One thing Allen and I talked about, and he tested, was the ability to > insert a stylesheet declaration in the XML. Is this part of the > reason to switch to using "text/xml"? Here's the relevant thread for reference: http://portal.open-bio.org/pipermail/das2/2005-July/000227.html In your other email on this thread, you said: > This, btw, suggests a third solution to the problem of letting DAS/2 > and web browser clients both point to the same object - use > > Content-Type: text/xml; das-type=das2xml > > But that's ugly. This seems like a good solution (and not too ugly IMHO). The das-type value could be more detailed (e.g., x-das-features+xml). However, I recall that there were possible problems with this syntax, but can't remember the details at the moment. Whatever solution we decide on, we should strive for simplicity. If we ask too much of servers and clients, that will be an impediment to implementation and maintenance. Steve From allenday at ucla.edu Wed Nov 9 02:21:51 2005 From: allenday at ucla.edu (Allen Day) Date: Tue, 8 Nov 2005 18:21:51 -0800 (PST) Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses!
In-Reply-To: References: Message-ID: To be even more concise, there are two use cases being presented here: 1) DAS/2 content should be viewable in a web browser, and doing so requires an HTTP Content-Type header to have value 'text/xml'. 2) DAS/2 content should be viewable in a specialized DAS/2 browser, and be able to rely on HTTP headers to determine visualization mode, as XML/DTD/Schema sniffing is undesirable. The solution proposed in the referenced thread, or perhaps only on a conference call, is to use the Content-Type header to address (1), providing information to web browsers, as they are less flexible than a specialized DAS/2 client. (2) is addressed using a DAS/2-specific X-Das-Content-Type header, e.g. ================== % GET -e 'http://das.biopackages.net/das/genome/human/17/feature?overlaps=chr22/1000000:2000000;type=SO:mRNA' | head -100 Connection: close Date: Wed, 09 Nov 2005 02:15:24 GMT Server: Apache/2.0.51 (Fedora) Content-Type: text/xml Expires: Thu, 09 Nov 2006 02:15:24 GMT Client-Date: Wed, 09 Nov 2005 02:19:16 GMT Client-Peer: 164.67.183.101:80 Client-Response-Num: 1 Client-Transfer-Encoding: chunked X-DAS-Content-Type: text/x-das-feature+xml X-DAS-Server: GMOD/0.0 X-DAS-Status: 200 X-DAS-Version: DAS/2.0 ================== This also has the added benefit of already being implemented for a few months. Are there objections to this solution? -Allen On Wed, 9 Nov 2005, Andrew Dalke wrote: > My apologies for not tracking what's been going on in the last few > months. I'm back now and have time for the next few months to work > on things. > > So I'll start with this exchange. I can't find the discussion in the > mailing list history. > > Why the decision to use "text/xml" for all xml responses? I read that > it is so "web browsers can 'just work'". > > What are they supposed to do? Display the XML as some sort of tree > structure? Is that the only thing?
> > One thing Allen and I talked about, and he tested, was the ability to > insert a stylesheet declaration in the XML. Is this part of the > reason to switch to using "text/xml"? > > Is there anything I'm missing? > > Since it looks like I'm going to be more in charge of the spec > development, > I would like to start collecting use cases and recording these sorts of > decisions. > > I think having different content-types is an important feature. For > example, it lets a DAS browser figure out what it's looking at before > doing any parsing. Here's my use case. > > I want someone to send an email to someone else along the lines of > "What do you think about http://blah.blah/das/genome/blah/blah" > with the URL of the object included in the email. > > Paste that into a DAS browser and it should be able to figure out that > this is a sequence, a feature, a whatever. With the old content-types > there was enough information to do that right away. With this new > one a DAS browser needs to parse the XML to figure out what's in it. > Autodetection of XML formats? I don't want to go there. > > That's also the reason for Gregg's opposition. > > > You (Allen) and Lincoln, on the other hand, want that user to be able to > go to a web browser and paste the URL in, to get a basic idea of what's > there. > > I think that's also important. > > I think there are other solutions. One is "if the server sees a web > browser then return the XML data streams as a 'text/xml'". > > For example: > if "Mozilla" in headers["User-Agent"]: > ... this is IE, Mozilla, Firefox, and a few others .. > > That catches most of the browsers anyone here cares about. As > another solution, look at the "Accept" header sent by the browser. 
> Here's what Firefox sends: > > Accept: text/xml,application/xml,application/xhtml+xml,text/html; > q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5' > > Here's Safari and "links" (a text browser): > > Accept: */* > > Another rule them might be > > if asking_for_xml_format and "*/*" in headers["Accept"]: > ... return it as "text/xml" ... > > Though a better version is to make sure the client doesn't know about > the expected content type: > > > if asking_for_xml_format: > return_content_type = ... whatever is appropriate ... > > if (return_content_type not in headers["Accept"] > and "*/*" in headers["Accept"]): > > return_content_type = "text/xml" > .... optionally insert style sheet .... > > > > Another solution is to send a "what kind of DAS object are you?" request > to the URL (eg, tack on a ? query or tell the server that the client > will > "Accept: application/x-das-autodiscovery"). > > > I think that's clumsy, but I mention it as another way to support > both DAS client app and human browser requests of the same URL. > > > >> From: Allen Day > > >> Looks like the cache server. FYI, I have updated the server to use > >> all > >> "text/xml" Content-Type for all xml response types. This was > >> approved by > >> Lincoln so that web browsers could be pointed at the das server and > >> "just > >> work". I thought these changes had already made their way into the > >> spec, > >> but apparently not. > > >> On Fri, 28 Oct 2005, Helt,Gregg wrote: > >>> But according to the spec the content type header needs to be: > >>> Content-Type: text/x-das-features+xml > >>> I'm using this in the IGB DAS/2 client to parse responses based on > >>> the > >>> content type. With "text/plain; charset=UTF-8" IGB doesn't know what > >>> parser to use and gives up. So right now I can't visualize > >>> annotations > >>> from the biopackages server. 
I'm pretty sure the server was setting > >>> the > >>> content-type header correctly on Wednesday -- did anything change > >>> since > >>> then that could be causing this? Could the server-side cache be > >>> doing > >>> this for some reason? > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From dalke at dalkescientific.com Wed Nov 9 17:37:21 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 9 Nov 2005 18:37:21 +0100 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: References: Message-ID: Steve: > Here's the relevant thread for reference: > http://portal.open-bio.org/pipermail/das2/2005-July/000227.html Ahh, it's the one I half remembered, from July. Allen said: > Not sure how much value there is in > this, but here is a very simple graphical display of regions on the > server, and their relative sizes. I think it's useful to have web browserability, as it were, but I think it's a secondary goal. To me the ability to transform the XML via the stylesheet is something that's technology driven and not user driven. That is, nothing in the previous work, including the DAS/2 proposals from others, mentioned that as a need. On the other hand, being able to get the content type of what's coming back from the server is a design goal, and we have an existing need -- Gregg's example -- for it. I would rather therefore put the onus on the data provider to be clever in sniffing out the client than in the DAS/2 client in sniffing out the data. Steve: > In your other email on this thread, you said: > >> This, btw, suggests a third solution to the problem of letting DAS/2 >> and web browser clients both point to the same object - se >> >> Content-Type: text/xml; das-type=das2xml >> >> But that's ugly. > > This seems like a good solution (and not too ugly IMHO). 
The das-type > value > could be more detailed (e.g., x-das-features+xml). However, I recall > that > there were possible problems with this syntax, but can't remember the > details at the moment. We have discussed this on-and-off for a while now, eh? Here's the previous thread on it: http://portal.open-bio.org/pipermail/das2/2004-December/000019.html I need to do a bit more research. I don't like the idea of making new headers and I don't like the idea of using a modified content-type like that. The first because we aren't doing anything unusual compared to other projects and the second because I don't have any experience with that. I suspect the answer will be: - by default if no "?format=" is specified then return "text/xml" - if the client sends an "Accept: text/x-das-features+xml" then return the document with the proper content-type information In that way if someone pastes a "http://.../blah?format=xyz" and gets a bunch of garbage, they can manually chop off the obvious "format=" part of the query. But that doesn't agree with my use case, where the DAS/2 client gets a random URL. It would need to send "Accept: ..." where the "..." is a list of all the possible DAS content-types. I'll think about this some more while I'm out salsa dancing this evening. :) Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Thu Nov 10 01:25:48 2005 From: Steve_Chervitz at affymetrix.com (Chervitz, Steve) Date: Wed, 09 Nov 2005 17:25:48 -0800 Subject: [DAS2] Agenda for weekly teleconference Message-ID: Time & Day: 12:00 Noon PST, Thursday 11 Nov 2005 Tel (US): 800-531-3250 Tel (Int'l): 303-928-2693 ID: 2879055 Agenda ------ * Decide on Europe-friendly time for this teleconference. Proposals: - Thu 9am PST = 12pm EST = 17:00 GMT - Wed 9am PST - Mon 9am PST * DAS/2 get spec issues: - Content-type: text/xml vs.
text/x-das-blah+xml http://portal.open-bio.org/pipermail/das2/2005-November/000287.html - XML encoding of type and feature properties: http://portal.open-bio.org/pipermail/das2/2005-November/000278.html Time and people permitting: * Summarize CSHL genome informatics meeting happenings relevant to DAS/2 (Allen, Gregg, Suzi, Lincoln). * Introduction to Apollo (Suzi) * DAS/2 validation (Andrew) From dalke at dalkescientific.com Thu Nov 10 01:34:28 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 10 Nov 2005 02:34:28 +0100 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: References: Message-ID: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> Allen > To be even more concise, there are two use cases being presented here: > > 1) DAS/2 content should be viewable in a web browser, and doing so > requires an HTTP Content-Type header to have value 'text/xml'. > > 2) DAS/2 content should be viewable in a specialized DAS/2 browser, > and be > able to rely on HTTP headers to determine visualization mode, as > XML/DTD/Schema sniffing is undesirable. A use case describes what the user wants to do, from the user's perspective and not the implementation perspective. Sometimes they are the same, as when the user mandates certain technical decisions, but that's not the case here. Wikipedia has a good definition, at http://en.wikipedia.org/wiki/Use_case . To make use cases read nicely I've found it useful to have a name better than "the user". There will be many users of different aspects of a DAS system. Some are: - a person making the database/DAS adapter - an annotator - a molecular biologist The use case we're talking about here is to let person X (either an annotator or a molecular biologist) communicate with person Y. Rather than saying "X" and "Y" I'll say "Bill" and "Jim". Bill sends Jim an email saying "I think there's a problem with this annotation; it looks like it's off-by-one.
Could you take a look at it for me?" (Make up your own explanation :) Jim gets the email, sees the URL, and pastes it into his browser. If Jim is an annotator this will probably be a specialized DAS/2 client. If he's not, then more likely it will be a web browser. Both should "do the right thing", that is, provide meaningful information about the given entity and options for more exploration and analysis. This use case suggests several functional details: - There needs to be a way to exchange DAS details via normal text, for inclusion in email. DAS uses URLs so we should build on those. This means they'll also likely be used in generic web pages. Because the specific consumer of a URL isn't known it's not possible to put a "?format=" field on the end of the URL. Thus these URLs must not specify the format. - DAS/2 clients (web browsers and specialized apps) should have some way to get (and easily get) the URL for a given annotation, region, feature type, etc. - specialized DAS clients (IGB) need a way for users to enter an arbitrary DAS URL. If one or more of these won't happen then there's no problem. For example, if IGB etc. all don't support entering an arbitrary DAS URL then there's no need to handle both classes of clients. If there's no demand for direct visualization in a web browser then there's also no problem. I'm going to ask about the last. The whole point of this change is to support the ability for a generic web browser to go to a given URL and show something of interest. 1) who needs that? Can any of us point to a group of people who would use a direct web interface to a given DAS/2 URL? If so, why didn't it come up in earlier discussions? 2) why can't they go to a DAS/2 web app elsewhere and from there tell it "now link in the data from this URL." That is, view the URL through an intermediary.
3) why can't we tell people "stick a 'format=html' at the end to see it in HTML", if you want to make a web link to it, and if the server supports HTML displays. 4) Who wants to make a DAS/2 web app based directly on the DAS/2 data structure? Yes, that makes it trivial to have a first-pass web app, but that app will suck. It'll only support browsing the server's data structure via a tree. It won't support, say, the ability to incorporate more or alternate records in a view, fancy AJAX GUIs, etc. There will be no way to merge records from different servers because the annotation server only understands annotations on that server. My view now is that having the default MIME type for a DAS/2 entity be "text/xml", for the purpose of supporting direct web browser visualization of that entity, is not driven by a realistic use case and is interesting mostly for technical reasons. As such, we shouldn't do that. We should leave the return documents as distinct MIME types. That leads me to the result of more research. The relevant spec for the MIME type for XML documents is RFC 3023, at http://www.ietf.org/rfc/rfc3023.txt For commentary also see: http://www.xml.com/lpt/a/2004/07/21/dive.html http://diveintomark.org/archives/2004/02/13/xml-media-types These say we have lots of things to worry about. For example, "text/xml" requires that the content-type include the charset declaration, else the spec says to assume the document is in US-ASCII. There is no way for the XML itself to override that. If we go the "text/xml" route we mandate that either: - all servers include a charset in the content-type - those that don't must only serve ASCII data. The proper MIME type is under "application", as "application/x-das-*+xml" > then the character encoding is determined in this order: > > * the encoding given in the charset parameter of the Content-Type > HTTP header, or > * the encoding given in the encoding attribute of the XML declaration > within the document, or > * utf-8.
(quoting from http://www.xml.com/lpt/a/2004/07/21/dive.html ) Apparently some ISPs, e.g. in Russia and Japan, will transcode text/xml documents at the HTTP level, ignoring the encoding information in the XML itself. This can lead to problems. As the author of those commentaries says, "XML is tough." http://diveintomark.org/archives/2004/07/06/tough > The solution proposed in the referenced thread, or perhaps only on a > conference call, is to use the Content-Type header to address (1), > providing information to web browsers, as they are less flexible than a > specialized DAS/2 client. (2) is addressed using a DAS/2 specific > X-Das-Content-Type header, e.g. It must have been a conference call. I don't see mention of that in my back emails. I'm thankful to Steve for doing the writeups. To emphasize what I said earlier, what will happen in the case of (1)? Who will implement it? What will users expect from it? Why can't those users go through some intermediate DAS web app to better view that data? Why can't we say "add a 'format=html' for interactive viewing"? As for (2), I don't want a new header. I know I talk about conneg and other neat features in HTTP but in re-reading appendix A of RFC 3023 http://www.ietf.org/rfc/rfc3023.txt it talks about over a dozen other solutions to the problem and why they were excluded. These include: > A.10 How about using a conneg tag instead (e.g., accept-features: > (syntax=xml))? > > When the conneg protocol is fully defined, this may potentially be a > reasonable thing to do. But given the limited current state of > conneg[RFC2703] development, it is not a credible replacement for a > MIME-based solution. In this case I'm willing to let people experiment with the idea before baking it into the spec. > A.9 How about a new Alternative-Content-Type header?
> > This is better than Appendix A.8, in that no extra functionality > needs to be added to a MIME registry to support dispatching of > information other than standard content types. However, it still > requires both sender and receiver to be upgraded, and it will also > fail in many cases (e.g., web hosting to an outsourced server), > where > the user can set MIME types (often through implicit mapping to file > extensions), but has no way of adding arbitrary HTTP headers. How much control will DAS/2 data providers have over their server? I know I want to support people who provide data as a set of files through Apache, though that's not driven by any use case. (This use case would involve a user who has different requirements than either Jim or Bill.) mod_mime is designed for that. I don't know how to add other headers for this case. The data providers we have now have control over all the headers. If that will essentially always be the case then adding a new header isn't a problem. Then again, if this is always the case then we can go ahead with conneg since an argument against conneg is it puts more work on the server implementations. In this too I'll be conservative - DAS/2 pushes no new ground for a web app development project; there should be no reason to invent a new header. > A.6 How about labeling with parameters in the other direction (e.g., > application/xml; Content-Feature=iotp)? > > This proposal fails under the simplest case, of a user with neither > knowledge of XML nor an XML-capable MIME dispatcher. In that case, > the user's MIME dispatcher is likely to dispatch the content to an > XML processing application when the correct default behavior should > be to dispatch the content to the application responsible for the > content type (e.g., an ecommerce engine for > application/iotp+xml[RFC2801], once this media type is registered).
> > Note that even if the user had already installed the appropriate > application (e.g., the ecommerce engine), and that installation had > updated the MIME registry, many operating system level MIME > registries such as .mailcap in Unix and HKEY_CLASSES_ROOT in Windows > do not currently support dispatching off a parameter, and cannot > easily be upgraded to do so. And, even if the operating system were > upgraded to support this, each MIME dispatcher would also separately > need to be upgraded. > X-DAS-Content-Type: text/x-das-feature+xml > X-DAS-Server: GMOD/0.0 > X-DAS-Status: 200 > X-DAS-Version: DAS/2.0 > ================== > > This also has the added benefit of already being implemented for a few > months. Are there objections to this solution? Yes. Several. When did "X-DAS-Status" come back into the picture? I thought we talked about this in spring and nixed it because it doesn't provide anything more useful than the existing HTTP-level error code. Or perhaps it was fall of last year? I think I remember raking leaves at the time. More useful, for example, would be a document (html, xml, or otherwise) which accompanies the error response and gives more information about what occurred. What does the "X-DAS-Server" get you that the normal "Server:" doesn't get you? What's the use case? Why is the "X-DAS-Version" at all important? What's important is the data content. It's the document return type/version that's important and not the server version.
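To make that concrete, a client that relies only on the standard status line and Content-Type needs just one small dispatch step. A minimal sketch (hypothetical names and media type, not taken from any existing DAS client):

```python
# Hypothetical sketch: dispatch on standard HTTP fields only, no X-DAS-* headers.
DAS_FEATURE_TYPE = "application/x-das-feature+xml"  # illustrative media type

def classify_response(status, content_type):
    """Classify a DAS/2 response using only the status code and Content-Type."""
    if status != 200:
        return "http-error"  # resource- or transport-level failure
    # Strip any parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == DAS_FEATURE_TYPE:
        return "das-features"  # hand off to the feature-XML parser
    return "unexpected-content"  # not a DAS document at all
```

A client that also had to honor an X-DAS-Status header would need a second, redundant branch in this function.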
But I mentioned most of these over a year ago http://portal.open-bio.org/pipermail/das/2004-September/000814.html In summary: - no support for direct web browser access to a URL, except with a likely use case; - keep the default response in an XML format - change that XML content-type to "application/x-das-*+xml" instead of "text/*" - have no requirement for new, DAS-specific headers Andrew dalke at dalkescientific.com From allenday at ucla.edu Thu Nov 10 02:18:23 2005 From: allenday at ucla.edu (Allen Day) Date: Wed, 9 Nov 2005 18:18:23 -0800 (PST) Subject: [DAS2] Agenda for weekly teleconference In-Reply-To: References: Message-ID: Missing this week, I'm in Rio de Janeiro. I'm giving a talk on DAS tomorrow though, so I'm still contributing! :) -Allen On Wed, 9 Nov 2005, Chervitz, Steve wrote: > Time & Day: 12:00 Noon PST, Thursday 10 Nov 2005 > Tel (US): 800-531-3250 > Tel (Int'l): 303-928-2693 > ID: 2879055 > > Agenda > ------ > > * Decide on Europe-friendly time for this teleconference. > Proposals: > - Thu 9am PST = 12pm EST = 17:00 GMT > - Wed 9am PST > - Mon 9am PST > > * DAS/2 get spec issues: > - Content-type: text/xml vs. text/x-das-blah+xml > http://portal.open-bio.org/pipermail/das2/2005-November/000287.html > > - XML encoding of type and feature properties: > http://portal.open-bio.org/pipermail/das2/2005-November/000278.html > > Time and people permitting: > > * Summarize CSHL genome informatics meeting happenings relevant to > DAS/2 (Allen, Gregg, Suzi, Lincoln). > > * Introduction to Apollo (Suzi) > > * DAS/2 validation (Andrew) > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From ed_erwin at affymetrix.com Thu Nov 10 18:33:58 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Thu, 10 Nov 2005 10:33:58 -0800 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses!
In-Reply-To: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> Message-ID: <43739296.4030307@affymetrix.com> Andrew Dalke wrote: > > >> X-DAS-Content-Type: text/x-das-feature+xml >> X-DAS-Server: GMOD/0.0 >> X-DAS-Status: 200 >> X-DAS-Version: DAS/2.0 >> ================== >> >> This also has the added benefit of already being implemented for a few >> months. Are there objections to this solution? > > > Yes. Several. > > When did "X-DAS-Status" come back into the picture? I thought > we talked about this in spring and nixed it because it doesn't provide > anything more useful than the existing HTTP-level error code. Or perhaps > it was fall of last year? I think I remember raking leaves at the time. > > More useful, for example, would be a document (html, xml, or otherwise) > which accompanies the error response and gives more information about > what occurred. > Using the HTTP-level error codes can cause problems. For a user (let's call her Varla) using IE, the browser will intercept some error codes and present her with some IE-specific garbage, throwing away any content that was sent back in addition to the error code. Even for a user (Marla this time) using IGB, firewalls and/or caching and/or apache port-forwarding mechanisms can throw out anything with a status code in the error range. (I did test having the NetAffx DAS server send HTTP status codes, and I did have problems with that in IGB, though I've forgotten the specifics. It was about a year ago....) I don't care if the status code is indicated with a header like "X-DAS-Status: 200" or with some XML content, or with both. But I think the HTTP status code has to be a separate thing, and will usually be "200" indicating that the user (sorry, I meant to say LeRoy) successfully communicated with the DAS server.
Ed From dalke at dalkescientific.com Thu Nov 10 19:49:18 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 10 Nov 2005 20:49:18 +0100 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: <43739296.4030307@affymetrix.com> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com> Message-ID: <83d48ca8f7128fb04efecd673ef61459@dalkescientific.com> Ed: > Using the HTTP-level error codes can cause problems. > I don't care if status code is indicated with a header like > "X-DAS-Status: 200" or with some XML content, or with both. But I > think the HTTP status code has to be a separate thing, and will > usually be "400" indicating that the user (sorry, I meant to say > LeRoy) successfully communicated with the DAS server. Okay, sounds like using HTTP codes for this causes problems in practice. What about returning a different content-type for that case? 200 Ok Content-Type: application/x-das-error Something bad happened. Pros: - doesn't add a new header - just as easy to detect in the client - easier to support on the server for some use cases Andrew dalke at dalkescientific.com From lstein at cshl.edu Thu Nov 10 19:34:51 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 10 Nov 2005 14:34:51 -0500 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: <43739296.4030307@affymetrix.com> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com> Message-ID: <200511101434.51966.lstein@cshl.edu> I didn't know that X-DAS-Status had ever been deprecated. I strongly feel that the DAS status codes are separate from the HTTP codes and should not try to piggyback on the HTTP status line. 
Lincoln On Thursday 10 November 2005 01:33 pm, Ed Erwin wrote: > Andrew Dalke wrote: > >> X-DAS-Content-Type: text/x-das-feature+xml > >> X-DAS-Server: GMOD/0.0 > >> X-DAS-Status: 200 > >> X-DAS-Version: DAS/2.0 > >> ================== > >> > >> This also has the added benefit of already being implemented for a few > >> months. Are there objections to this solution? > > > > Yes. Several. > > > > When did "X-DAS-Status" come back into the picture? I thought > > we talked about this in spring and nixed it because it doesn't provide > > anything useful than the existing HTTP-level error code. Or perhaps > > it was fall of last year? I think I remember raking leaves at the time. > > > > More useful, for example, would be a document (html, xml, or otherwise) > > which accompanies the error response and gives more information about > > what occurred. > > Using the HTTP-level error codes can cause problems. > > For a user (let's call her Varla) using IE, the browser will intercept > some error codes and present her with some IE-specific garbage, throwing > away any content that was sent back in addition to the error code. > > Even for a user (Marla this time) using IGB, firewalls and/or caching > and/or apache port-forwarding mechanisms can throw out anything with a > status code in the error range. > > (I did test having the NetAffx DAS server send HTTP status codes, and I > did have problems with that in IGB, though I've forgotten the specifics. > It was about a year ago....) > > I don't care if status code is indicated with a header like > "X-DAS-Status: 200" or with some XML content, or with both. But I think > the HTTP status code has to be a separate thing, and will usually be > "400" indicating that the user (sorry, I meant to say LeRoy) > successfully communicated with the DAS server. > > Ed > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From ed_erwin at affymetrix.com Thu Nov 10 19:56:12 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Thu, 10 Nov 2005 11:56:12 -0800 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: <83d48ca8f7128fb04efecd673ef61459@dalkescientific.com> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com> <83d48ca8f7128fb04efecd673ef61459@dalkescientific.com> Message-ID: <4373A5DC.3070102@affymetrix.com> Andrew Dalke wrote: > Okay, sounds like using HTTP codes for this causes problems in > practice. > > What about returning a different content-type for that case? > > 200 Ok > Content-Type: application/x-das-error > > > Something bad happened. > > That seems fine to me. There is still the separate issue of whether the content is "application/x-das-error" or simply "text/xml". But that is another discussion that is already ongoing and to which I have nothing to add. From dalke at dalkescientific.com Thu Nov 10 20:01:45 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 10 Nov 2005 21:01:45 +0100 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: <200511101434.51966.lstein@cshl.edu> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com> <200511101434.51966.lstein@cshl.edu> Message-ID: <7fd7a40582a6d8ccdc694c2a91b6f8b7@dalkescientific.com> Lincoln: > I didn't know that X-DAS-Status had ever been deprecated. I strongly > feel that > the DAS status codes are separate from the HTTP codes and should not > try to > piggyback on the HTTP status line. I'm okay with the assertion "something happened at the DAS level" not being in the HTTP status code. Not ecstatic, but real world trumps purity.
I don't like the idea of adding new HTTP headers for this information. In my client code I need to do the following: - was there an HTTP error code? - is the return content-type correct? Having another header means I write: - was there an HTTP error code? - was there a DAS error code? - is the return content-type correct? I would rather have one less bit of code to do wrong. As I also mentioned, I would like to support DAS annotations made available through a basic Apache install and a set of files, likely used by someone who just wants to provide annotations. This is not one of the current design goals; should it be, or should we require that everyone have more control over the server? Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Nov 10 20:10:14 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 10 Nov 2005 21:10:14 +0100 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: <43739296.4030307@affymetrix.com> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com> Message-ID: <81b4c8e3062e94b2032e37995f26b588@dalkescientific.com> Ed: > For a user (let's call her Varla) using IE, the browser will intercept > some error codes and present her with some IE-specific garbage, > throwing away any content that was sent back in addition to the error > code. Here's the question I had earlier. Will people be using a DAS/2 annotation server directly through a web browser? As far as I'm aware there's no demand for this. None of the proposals mentioned it and the current discussion started from a technical discussion at ISMB; that is, because it could, and not because it is needed. I thought most people using IE/Moz/etc. would go through a DAS application server, which integrates views from different DAS annotation servers. All this discussion is about returning pages back from an annotation server in a form directly viewable by a web browser.
I don't see that as being useful. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Nov 10 21:45:09 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 10 Nov 2005 22:45:09 +0100 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: <43739296.4030307@affymetrix.com> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com> Message-ID: <725e762a203211651d1850097ae3fcc0@dalkescientific.com> Further refining this from today's phone meeting Ed: > For a user (let's call her Varla) using IE, the browser will intercept > some error codes and present her with some IE-specific garbage, > throwing away any content that was sent back in addition to the error > code. The case Ed came across was from an in-house group using a Windows call out to IE as a background process to fetch a web page. In that case (as I understand it) it would convert HTTP error responses into its own error messages. Ed couldn't recall during the conversation whether it was possible to get ahold of the error code at all. Did they have to parse the output? > Even for a user (Marla this time) using IGB, firewalls and/or caching > and/or apache port-forwarding mechanisms can throw out anything with a > status code in the error range. 404 gets through, yes? All of those are supposed to be transparent to error codes, or at the very least translate them from (say) 404 to 400. Can anyone point me to some reports of one of these mishaps? We definitely need to have some tie-ins with the HTTP error codes. Consider these two implementations for getting http://example.com/das2/genome/dazypus/1.43/ (Note the typo "dazypus" -> "dasypus") A) One system might have all "/das2" URLs forwarded to a DAS server. B) Another might have a handler only for "/das2/genome/dasypus" and let Apache do the rest. In case A) the DAS server sees that the given resource doesn't exist. It needs to return an error.
It can return either "200 Ok" followed by a DAS error payload, or return a "404 Not Found" at the HTTP level. In case B) the request never gets to the DAS handler because of the typo. Apache sees there's nothing for the resource so returns a "404 Not Found". The client code is easier if it can check the HTTP error code and stop on failure. This means it's best for case A) for the DAS/2 server to return an HTTP error code of 404, and perhaps an optional ignorable payload. > (I did test having the NetAffx DAS server send HTTP status codes, and > I did have problems with that in IGB, though I've forgotten the > specifics. It was about a year ago....) Do you have the specifics perhaps in an old email somewhere? Andrew dalke at dalkescientific.com From ed_erwin at affymetrix.com Thu Nov 10 22:43:02 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Thu, 10 Nov 2005 14:43:02 -0800 Subject: [DAS2] Re: how do I load probe sets into IGB now? In-Reply-To: <83722dde0511101429m398c38ebg8e4df3d9b2a8d0da@mail.gmail.com> References: <83722dde0511101429m398c38ebg8e4df3d9b2a8d0da@mail.gmail.com> Message-ID: <4373CCF6.9060508@affymetrix.com> Hi, The old DAS loading mechanism is still there, in exactly the same place it used to be: File->Load DAS Features. The new "DAS/2" tab at the bottom is for "DAS/2" servers, of which there are only a few at the moment, and which are still experimental. Ed Ann Loraine wrote: > Hi, > > Congratulations everybody on the new release of IGB! > > I have a question about the new Quickload/DAS tab. > > I'm trying to load some probe sets via DAS but can't figure out how to do it. > > I used to be able to get them by using the "DAS" menu item, which > opened a widget containing a menu of DAS servers. I would select the > one labeled AffyDas (or something like that) and then I would get to > pick the chip (more often, chips) I wanted to see. 
Then IGB would > query the server and get me the probe set design sequence alignments > for the currently-shown region. > > I can't find this in the new interface. > > Can you help? > > -Ann > > -- > Ann Loraine > Assistant Professor > Section on Statistical Genetics > University of Alabama at Birmingham > http://www.ssg.uab.edu > http://www.transvar.org From ed_erwin at affymetrix.com Thu Nov 10 22:49:47 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Thu, 10 Nov 2005 14:49:47 -0800 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: <725e762a203211651d1850097ae3fcc0@dalkescientific.com> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com> <725e762a203211651d1850097ae3fcc0@dalkescientific.com> Message-ID: <4373CE8B.3000302@affymetrix.com> Andrew Dalke wrote: > Further refining this from today's phone meeting > > Ed: > >> For a user (let's call her Varla) using IE, the browser will intercept >> some error codes and present her with some IE-specific garbage, >> throwing away any content that was sent back in addition to the error >> code. > > > The case Ed came across was from an in-house group using a Windows call > out to IE as a background process to fetch a web page. In that case > (as I understand it) it would convert HTTP error responses into its own > error messages. > > Ed couldn't during the conversation recall if it was possible to > get ahold of the error code at all. Did they have to parse the output? Here is some info from microsoft about these "friendly HTTP error messages": http://support.microsoft.com/kb/q218155/ Note that whether the real error message gets through seems to depend on both the error code, and the length of the content. How is that friendly? >> (I did test having the NetAffx DAS server send HTTP status codes, and >> I did have problems with that in IGB, though I've forgotten the >> specifics. It was about a year ago....) 
> > > Do you have the specifics perhaps in an old email somewhere? > I can look around when I get back from vacation, which I'm on all next week. Ed From Gregg_Helt at affymetrix.com Thu Nov 10 22:46:23 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 10 Nov 2005 14:46:23 -0800 Subject: [DAS2] RE: how do I load probe sets into IGB now? Message-ID: That data is on a DAS/1 server. The new "Data Access" tab is just for QuickLoad and DAS/2 servers. DAS/1 servers are still accessible via the "File --> Load DAS Features" menu item. In the near term the plan is to soon move the DAS/1 access into the "Data Access" tab as a DAS/1 subtab alongside the QuickLoad and DAS/2 subtabs, but this wasn't ready in time for the current release. In the longer term the probe data will be hosted on both DAS/1 and DAS/2 servers. gregg > -----Original Message----- > From: Ann Loraine [mailto:aloraine at gmail.com] > Sent: Thursday, November 10, 2005 2:30 PM > To: das2 at portal.open-bio.org > Cc: Helt,Gregg; Erwin, Ed > Subject: how do I load probe sets into IGB now? > > Hi, > > Congratulations everybody on the new release of IGB! > > I have a question about the new Quickload/DAS tab. > > I'm trying to load some probe sets via DAS but can't figure out how to do > it. > > I used to be able to get them by using the "DAS" menu item, which > opened a widget containing a menu of DAS servers. I would select the > one labeled AffyDas (or something like that) and then I would get to > pick the chip (more often, chips) I wanted to see. Then IGB would > query the server and get me the probe set design sequence alignments > for the currently-shown region. > > I can't find this in the new interface. > > Can you help? 
> > -Ann > > -- > Ann Loraine > Assistant Professor > Section on Statistical Genetics > University of Alabama at Birmingham > http://www.ssg.uab.edu > http://www.transvar.org From dalke at dalkescientific.com Thu Nov 10 23:19:51 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 11 Nov 2005 00:19:51 +0100 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: <4373CE8B.3000302@affymetrix.com> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com> <725e762a203211651d1850097ae3fcc0@dalkescientific.com> <4373CE8B.3000302@affymetrix.com> Message-ID: <0cc693a86af103c99b668e5f6db2c9e6@dalkescientific.com> > Here is some info from microsoft about these "friendly HTTP error > messages": > > http://support.microsoft.com/kb/q218155/ > > Note that whether the real error message gets through seems to depend > on both the error code, and the length of the content. How is that > friendly? Indeed. >> Internet Explorer 5 and later provides a replacement for the HTML >> template for the following friendly error messages: >> >> 400, 403, 404, 405, 406, 408, 409, 410, 500, 501, 505 I've marked them with ***. The only ones I think we might use, were we to piggyback, are 409 (for locking?), 415 (for servers that don't support a requested format) and 416 (for unsupported range requests?). 
*** 400: ('Bad request', 'Bad request syntax or unsupported method'), 401: ('Unauthorized', 'No permission -- see authorization schemes'), 402: ('Payment required', 'No payment -- see charging schemes'), *** 403: ('Forbidden', 'Request forbidden -- authorization will not help'), *** 404: ('Not Found', 'Nothing matches the given URI'), *** 405: ('Method Not Allowed', 'Specified method is invalid for this server.'), *** 406: ('Not Acceptable', 'URI not available in preferred format.'), 407: ('Proxy Authentication Required', 'You must authenticate with ' 'this proxy before proceeding.'), *** 408: ('Request Time-out', 'Request timed out; try again later.'), *** 409: ('Conflict', 'Request conflict.'), *** 410: ('Gone', 'URI no longer exists and has been permanently removed.'), 411: ('Length Required', 'Client must specify Content-Length.'), 412: ('Precondition Failed', 'Precondition in headers is false.'), 413: ('Request Entity Too Large', 'Entity is too large.'), 414: ('Request-URI Too Long', 'URI is too long.'), 415: ('Unsupported Media Type', 'Entity body in unsupported format.'), 416: ('Requested Range Not Satisfiable', 'Cannot satisfy request range.'), 417: ('Expectation Failed', 'Expect condition could not be satisfied.'), *** 500: ('Internal error', 'Server got itself in trouble'), *** 501: ('Not Implemented', 'Server does not support this operation'), 502: ('Bad Gateway', 'Invalid responses from another server/proxy.'), 503: ('Service temporarily overloaded', 'The server cannot process the request due to a high load'), 504: ('Gateway timeout', 'The gateway server did not receive a timely response'), *** 505: ('HTTP Version not supported', 'Cannot fulfill request.'), > I can look around when I get back from vacation, which I'm on all next > week. Enjoy! 
Andrew dalke at dalkescientific.com From aloraine at gmail.com Thu Nov 10 22:29:48 2005 From: aloraine at gmail.com (Ann Loraine) Date: Thu, 10 Nov 2005 16:29:48 -0600 Subject: [DAS2] how do I load probe sets into IGB now? Message-ID: <83722dde0511101429m398c38ebg8e4df3d9b2a8d0da@mail.gmail.com> Hi, Congratulations everybody on the new release of IGB! I have a question about the new Quickload/DAS tab. I'm trying to load some probe sets via DAS but can't figure out how to do it. I used to be able to get them by using the "DAS" menu item, which opened a widget containing a menu of DAS servers. I would select the one labeled AffyDas (or something like that) and then I would get to pick the chip (more often, chips) I wanted to see. Then IGB would query the server and get me the probe set design sequence alignments for the currently-shown region. I can't find this in the new interface. Can you help? -Ann -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From allenday at ucla.edu Fri Nov 11 01:39:36 2005 From: allenday at ucla.edu (Allen Day) Date: Thu, 10 Nov 2005 17:39:36 -0800 (PST) Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> Message-ID: > What does the "X-DAS-Server" get you that the normal "Server:" doesn't > get you? What's the use case? I don't know. The absence of this header was actually reported by Dasypus output sent to me by you on May 26, 2005. Here's a snippet of the Dasypus diagnostics, followed by a comment from you: "Date: Thu, 26 May 2005 12:29:32 -0600 From: Andrew Dalke To: DAS/2 Subject: [DAS2] dasypus status [...] WARNING: Adding X-DAS-Server header 'gmod/0.0' The prototype doesn't mention the DAS server used. I stick one in based on the host name. 
[...]" > Why is the "X-DAS-Version" at all important? What's important is the > data content. It's the document return type/version that's important > and not the server version. It was actually originally (as far as I can tell from my email archive) discussed, along with X-DAS-Status, in an email from Lincoln on May 21, 2004, and forwarded to me on August 12, 2004: "-----Original Message----- From: Lincoln Stein [mailto:lstein at cshl.edu] Sent: Friday, May 21, 2004 1:22 PM To: edgrif at sanger.ac.uk; Gregg_Helt at affymetrix.com; avc at sanger.ac.uk; gilmanb at mac.com; dalke at dalkescientific.com Cc: lstein at cshl.edu; allen.day at ucla.edu Subject: DAS/2 notes [...] In addition to the standard HTTP response headers, DAS servers return the following HTTP headers: X-DAS-Version: DAS/2.0 X-DAS-Status: XXX status code [...]" > But I mentioned most of these over a year ago > http://portal.open-bio.org/pipermail/das/2004-September/000814.html > > In summary: > - no support for direct web browser access to a URL, except with a > likely use case; > - keep the default response in an XML format > - change that XML content-type to "application/x-das-*+xml" instead > of "text/*" > - have no requirement for new, DAS-specific headers This discussion suggests we need a more formal process of modifying the client and server implementations, e.g. modify spec first and commit, then update code. -Allen From td2 at sanger.ac.uk Fri Nov 11 09:24:52 2005 From: td2 at sanger.ac.uk (Thomas Down) Date: Fri, 11 Nov 2005 09:24:52 +0000 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses!
In-Reply-To: <83d48ca8f7128fb04efecd673ef61459@dalkescientific.com> References: <2dd2f4224520b2f35add5af1de821729@dalkescientific.com> <43739296.4030307@affymetrix.com> <83d48ca8f7128fb04efecd673ef61459@dalkescientific.com> Message-ID: <8C869723-601C-4236-B9FA-88F6D6401016@sanger.ac.uk> On 10 Nov 2005, at 19:49, Andrew Dalke wrote: > Ed: > >> Using the HTTP-level error codes can cause problems. >> > > >> I don't care if status code is indicated with a header like >> "X-DAS-Status: 200" or with some XML content, or with both. But I >> think the HTTP status code has to be a separate thing, and will >> usually be "400" indicating that the user (sorry, I meant to say >> LeRoy) successfully communicated with the DAS server. >> > > Okay, sounds like using HTTP codes for this causes problems in > practice. > > What about returning a different content-type for that case? > > 200 Ok > Content-Type: application/x-das-error > > > Something bad happened. > That looks reasonable, but could we add a bit of structure: 407 The sky is falling (There's also a possible argument for using textual, rather than numeric, error codes -- but it would be good to keep at least one part of the error response using a well-defined vocabulary for the benefit of clients that want to respond to different error conditions in different ways). Thomas. From Steve_Chervitz at affymetrix.com Fri Nov 11 21:24:50 2005 From: Steve_Chervitz at affymetrix.com (Chervitz, Steve) Date: Fri, 11 Nov 2005 13:24:50 -0800 Subject: [DAS2] how do I load probe sets into IGB now? In-Reply-To: <83722dde0511101429m398c38ebg8e4df3d9b2a8d0da@mail.gmail.com> Message-ID: Ann, Go to File -> Load DAS Features. There should be a DAS server named 'NetAffx-Align' that will give you what you want. Steve > From: Ann Loraine > Date: Thu, 10 Nov 2005 16:29:48 -0600 > To: > Cc: , "Helt,Gregg" > Subject: [DAS2] how do I load probe sets into IGB now? > > Hi, > > Congratulations everybody on the new release of IGB! 
> > I have a question about the new Quickload/DAS tab. > > I'm trying to load some probe sets via DAS but can't figure out how to do it. > > I used to be able to get them by using the "DAS" menu item, which > opened a widget containing a menu of DAS servers. I would select the > one labeled AffyDas (or something like that) and then I would get to > pick the chip (more often, chips) I wanted to see. Then IGB would > query the server and get me the probe set design sequence alignments > for the currently-shown region. > > I can't find this in the new interface. > > Can you help? > > -Ann > > -- > Ann Loraine > Assistant Professor > Section on Statistical Genetics > University of Alabama at Birmingham > http://www.ssg.uab.edu > http://www.transvar.org > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Steve_Chervitz at affymetrix.com Sat Nov 12 00:51:41 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Fri, 11 Nov 2005 16:51:41 -0800 Subject: [DAS2] DAS/2 weekly meeting notes for 10 Nov 05 Message-ID: Notes from the weekly DAS/2 teleconference, 10 Nov 2005. $Id: das2-teleconf-2005-11-10.txt,v 1.1 2005/11/12 00:48:39 sac Exp $ Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt UCLA: Brian O'connor CSHL: Lincoln Stein UCBerkeley: Suzi Lewis Sweden: Andrew Dalke Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2005. Instructions on how to access this repository are at http://biodas.org Agenda Items ------------ * New Euro-friendly meeting time It was decided to change the time for this weekly teleconference to Monday 9:30 AM PST (12:30 PM EST, 17:30 UK). [A] New teleconf time starts next week (Monday 14 Nov) * Spec Issues Gregg expressed a need to dedicate some of these weekly meetings to be focused on resolving spec issues. We will do this for next week's meeting. 
[A] Everyone come prepared to talk about retrieval spec issues on 11/14. Content-type issue: - Should we use text/xml or application/x-das-blah+xml? - Consensus: use application/x-das-blah+xml - [A] Steve will roll back changes made to the retrieval spec. - Andrew acknowledges that text/xml may be handy for visual debugging and other presentation tricks, but is not a user-driven need; it's a technical issue. - Lincoln: XML handling is very browser-dependent: o Firefox - nice DOM tree structure o Safari, Konqueror - no special rendering o MSIE - "Cannot be displayed" - Gregg: Now we just need to ensure that we're actually implementing the correct content-type for given responses, which brings up the next topic... * Validation - Gregg: we'd like to start using dasypus locally to verify client/server compliance with the spec. What state is it in? - Andrew: Just getting back to it now. [A] Andrew will talk with Chris D. to set up a web interface at biodas.org * Apollo Suzi: Can't talk about Apollo now. Will wait until Nomi is available. [A] Nomi will present Apollo at the 28 Nov DAS/2 weekly meeting. Status Reports -------------- Gregg: * CSHL Genome Informatics meeting summary of DAS/2-relevant things. - Gave talk about DAS/2 and demoed IGB. Went well. - Held a DAS BOF that was well-attended (n=15). Questions people had about DAS/2 have already been addressed. [A] Gregg will write up his CSHL DAS BOF notes and post. Discussion centered around what Sanger & EBI are doing with DAS. o There are lots of DAS-related projects there. o We'd like to have tighter linkage between DAS folks in the States and those in the UK. [A] Andrew will visit the UK DAS folks more often.
Ideas: + Help them transition to DAS/2 + Hold "DASathon" or jamboree there o People: Tim Hubbard, Thomas Down, Andreas Prlic o Projects: + Serving up 3D structures using modified DAS/1 server (SPICE) + Serving up protein annotations using modified DAS/1 server + Registry & discovery system for DAS/1 servers. This is SOAP-based. We'd like to have a non-SOAP-based system for DAS/2, which follows REST principles. - Andreas could likely create an HTTP-based alternative to his SOAP system, which uses the same core. - [A] Andrew will talk with Andreas P about non-SOAP reg/discovery - [A] DAS/2 grant needs progress on reg/discovery w/in next 6 mos * Grant (DAS/2 continuation) Lots of modifications were made just prior to submitting on 1 Nov. Some of the changes include: - Work closely with Sanger and EBI where they've done lots of work (3D structure and protein DAS). - More of a mechanism will be in place to drive the spec forward: o Andrew = designated 'spec czar' - makes ultimate decisions o Lincoln = designated 'spec godfather' - retains veto power Andrew: * Brought up the header issue from the spec discussion on the list this week. - Doesn't like the idea of 4 additional DAS-specific fields (error code, das version, server name, and something else) - Alternative: server returns content-type: application/x-das-error - Advantages: o no new header o simplified header -- just check the http error code in the content-type. o easier to implement o enables a flatfile-based server o Fits with REST philosophy of using HTTP as an application protocol, not a transport protocol. - Ed E: Can't we just return an error section in the document? Andrew: We could, but it requires parsing the document and only works for XML formats that we're in control of. - Gregg: The advantages of having metadata in the header outweigh the advantages of enabling a flatfile-based server. 
Andrew: We can utilize the existing header. Ed E: Piggybacking error codes causes problems with proxy servers (see email on the DAS/2 discussion list). - Decision: [A] Use standard HTTP error codes; use XML to specify error details. E.g., server status=200 content= error document Steve: When reviewing spec, encountered potential issues surrounding relationship between HTTP and DAS-specific error codes. Using standard HTTP codes will obviate this issue. Also noted that there's a bugzilla entry regarding error codes (which is now moot): http://bugzilla.open-bio.org/show_bug.cgi?id=1784 - Ed E: MSIE hides or modifies content based on certain HTTP error codes it gets. This has important implications for Windows platforms where IE's behavior can get in the way of other network-aware applications that don't even (knowingly) use IE. From Steve_Chervitz at affymetrix.com Sat Nov 12 01:52:15 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Fri, 11 Nov 2005 17:52:15 -0800 Subject: [DAS2] DAS/2 weekly meeting notes for 10 Nov 05 In-Reply-To: Message-ID: > Content-type issue: > - Should we use text/xml or application/x-das-blah+xml? > - Consensus: use application/x-das-blah+xml > - [A] Steve will roll back changes made to the retrieval spec. Done, but I noticed that we had been using text/x-das-blah+xml rather than application/x-das-blah+xml. I left it as text for now, although 'application' seems more correct according to the RFC on MIME media types, http://www.rfc-editor.org/rfc/rfc2046.txt which states: text -- textual information. ... Other subtypes [i.e., anything besides 'plain'] are to be used for enriched text in forms where application software may enhance the appearance of the text... application -- some other kind of data, typically either uninterpreted binary data or information to be processed by an application. ... 
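While the text/ vs. application/ question is unsettled, a client can simply tolerate both trees when checking a response's Content-Type. A minimal sketch, assuming the "x-das-blah+xml" naming pattern discussed above; the specific subtype name 'x-das-features+xml' below is an illustrative stand-in, not an official DAS/2 media type:

```python
# Sketch: lenient client-side check of a DAS/2-style Content-Type header.
# Accepts both the 'text/' and 'application/' top-level types while the
# spec decision is pending. The subtype 'x-das-features+xml' is a
# hypothetical instance of the x-das-blah+xml pattern.

def parse_media_type(header_value):
    """Split 'text/x-das-features+xml; charset=utf-8' into
    ('text', 'x-das-features+xml'), dropping parameters."""
    media = header_value.split(";", 1)[0].strip().lower()
    tree, _, subtype = media.partition("/")
    return tree, subtype

def is_das_xml(header_value, expected_kind="features"):
    """True if the header names a DAS XML document of the expected kind,
    in either the text/ or application/ tree."""
    tree, subtype = parse_media_type(header_value)
    if tree not in ("text", "application"):
        return False
    return subtype == "x-das-%s+xml" % expected_kind

print(is_das_xml("text/x-das-features+xml; charset=utf-8"))   # True
print(is_das_xml("application/x-das-features+xml"))           # True
print(is_das_xml("text/xml"))                                 # False
```

Once the spec settles on one tree, the accepted set shrinks to a single value and the check becomes an exact match.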
Steve From dalke at dalkescientific.com Mon Nov 14 11:47:09 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 14 Nov 2005 12:47:09 +0100 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: References: Message-ID: Steve: > I raised some other issues regarding types and feature properties etc. > a > couple of weeks ago that I'd like you to chime in on: > http://portal.open-bio.org/pipermail/das2/2005-October/000271.html > > The latest message on this thread is: > http://portal.open-bio.org/pipermail/das2/2005-November/000278.html I'll take them part by part. That last message suggested 29 2 * the values of the 'das:id', 'das:type', and 'das:ptype' attributes > are URLs relative to xml:base unless they begin with 'das:prop#', in > which case they are relative to the das:prop namespace. And from what I can tell about XML, there's no standard way to implement this using one of the standard XML parsers. How do you get the das:prop namespace for a given element? The parser often does the expansion for you. Eg, in one of the Python XML parsers it does the translations into Clark notation, like {http://www.biodas.org/ns/das/genome/2.00}ptype For more info on XML namespaces, see http://www.jclark.com/xml/xmlns.htm Andrew dalke at dalkescientific.com From ap3 at sanger.ac.uk Mon Nov 14 13:29:26 2005 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Mon, 14 Nov 2005 13:29:26 +0000 Subject: [DAS2] Re: what info is needed for DAS/2 registration? In-Reply-To: <955da4ae7783e60944687d86ec691e51@dalkescientific.com> References: <955da4ae7783e60944687d86ec691e51@dalkescientific.com> Message-ID: <81fdf1e73ee85ae55550f12ddcee13cf@sanger.ac.uk> Hi Andrew! > Looks like I will be more involved with the DAS/2 spec development, > and I'll be visiting the UK more often. good! > I want to make sure that the spec includes more of what's > needed for registration. o.k. 
very good, let's go through your mail: > My thought is to let the registration > system be able to query the DAS/2 server to get most of the fields > it needs, if not all. o.k. > There may still be some need to override the > definitions, The experience from doing the das1 registry tells us that some corrections are needed every now and then. It seems to be inevitable that sometimes users make mistakes / inaccuracies, etc. > so at the manual registration level this will be used > more to pre-populate an entry with a default. sounds good. - so this means the configuration for setting up a DAS source will get a little bigger. > In looking at the manual registration page I see the following, > along with comparisons to the existing DAS/2 spec > > ** Title/Nickname used by DAS clients for the display of the das tracks > ** Description for the user to get a quick grasp of what the data is about. - we have 60 sources in the registry by now and we expect to be up around 100 soon, so one needs a way to learn which of the sources are serving the data which is of particular interest ... > ** URL for more detailed description a link back to the homepage of the project that provides the data > > DAS/2 does not have this information for the service as a whole. > It does have it for each of the databases, somewhat. Here is > an example from the spec. > > taxon="http://www.ncbi.nlm.nih.gov/taxon-browser?id=29118" > > doc_href="http://www.wormbase.org/documentation/users_guide/ > volvox.html" > > > > Should we add a "title" field to each data source? yes that would be good > Should we > add title/description/url fields to the DAS/2 service as a whole? not sure what you mean by that > ** coordinate system > > Each data source may have 1 or more versions. The version information > looks like > > > > > > In theory that assembly id could be a URL with more detailed > information about the assembly. Right now it's used as a unique > identifier. 
There is nothing there to convert these URLs into something human-readable. Hm, not sure I am completely convinced by representing a coordinate system as a URL. What if two reference servers provide the same assembly or are mirrors of each other? I would see it in a way where a DAS client would ask the registry "where are all the reference servers for NCBI 35- homo sapiens?" and then gets a list providing e.g. an American and a European mirror server; the client could choose the one which is geographically closer. > > Possible solutions for this are: > - define an "assembly" document, to be put at that URL and > include the authority/version/type/organism data mentioned at > http://das.sanger.ac.uk/registry/help_coordsys.jsp something like that. > ** DAS url > > Yep, DAS/2 has that one. :) :-) > > ** Admin email > > Hmm. Yeah, there should be more information about the service as > a whole. Admin email and perhaps a documentation href, eg, with > information about planned downtime. would be good. > > ** DAS capabilities > > That's handled differently in DAS/2. Did people really use this > information? actually this information is important (for das1) - it is used to distinguish reference servers and annotation servers ( on the client side) and needed for validation (on the registry side) "capabilities" are also related to data-types. E.g. a genome DAS client does not need to query a protein structure, because it can not do 3D... > ** Test access/ segment code labels I think there is a misunderstanding here: the test code is not a "label". The test code is e.g. a chromosomal segment or an accession code for a protein database for which annotations are returned if a feature request is being made. The "label" is used mainly to describe by which project a source is being funded. >> We are currently discussing if the labels should be used to describe >> a DAS source in more detail. e.g. "experimentally verified", >> "computational prediction", etc. 
> > These are two different things in one field. yes you are very right. Together with the BioSapiens DAS people we recently decided that there should be the possibility to assign gene-ontology evidence codes to each das source, so in the next update of the registry, this will be changed. > > What I'm going to propose is a generic key/value data structure > for just about all records. Some of the key names will be well > defined. Others can add new fields to experiment with / extend > the spec in a semi-constrained fashion. This would let people > try out a new property easily. sounds good. > In summary it sounds like DAS/2 needs: > - a few more pieces of meta data (eg, information about the > service as a whole) > - a bit better defined way to get information about the > reference assembly > I would agree with both of those Greetings, Andreas ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 From Gregg_Helt at affymetrix.com Mon Nov 14 17:09:11 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 14 Nov 2005 09:09:11 -0800 Subject: [DAS2] DAS/2 teleconference at 9:30 AM today PST Message-ID: Just a reminder that we've rescheduled the weekly DAS/2 teleconference for Mondays @ 9:30 AM Pacific time, starting today. I'm hoping the new time will give more people a chance to participate. Teleconference numbers: US dialin: 800-531-3250 International dialin: 303-928-2693 Conference ID: 2879055 We're also revising the format to alternate weeks between the DAS/2 specification itself and implementations of the specification. This should allow people who are mainly concerned about one or the other to avoid extra overhead. Today we will focus on spec issues. 
thanks, Gregg Helt From lstein at cshl.edu Mon Nov 14 17:23:18 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 14 Nov 2005 12:23:18 -0500 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: <725e762a203211651d1850097ae3fcc0@dalkescientific.com> References: <43739296.4030307@affymetrix.com> <725e762a203211651d1850097ae3fcc0@dalkescientific.com> Message-ID: <200511141223.19367.lstein@cshl.edu> Well, I give up arguing this one and will go with the way Andrew wants to do it. Therefore I propose the following rules: 1) Return the HTTP 404 error for the case that any component of the DAS2 path is invalid. This would apply to the following situations: Bad namespace Bad data source Unknown object ID 2) Return HTTP 301 and 302 redirects when the requested object has moved. 3) Return HTTP 403 (forbidden) for no-lock errors. 4) Return HTTP 500 when the server crashes. For all errors there should be a text/x-das-error entity returned that describes the error in more detail. Lincoln On Thursday 10 November 2005 04:45 pm, Andrew Dalke wrote: > Further refining this from today's phone meeting > > Ed: > > For a user (let's call her Varla) using IE, the browser will intercept > > some error codes and present her with some IE-specific garbage, > > throwing away any content that was sent back in addition to the error > > code. > > The case Ed came across was from an in-house group using a Windows call > out to IE as a background process to fetch a web page. In that case > (as I understand it) it would convert HTTP error responses into its own > error messages. > > Ed couldn't during the conversation recall if it was possible to > get ahold of the error code at all. Did they have to parse the output? > > > Even for a user (Marla this time) using IGB, firewalls and/or caching > > and/or apache port-forwarding mechanisms can throw out anything with a > > status code in the error range. > > 404 gets through, yes? 
> > All of those are supposed to be transparent to error codes, or at the > very least translate them from (say) 404 to 400. > > Can anyone point me to some reports of one of these mishaps? > > We definitely need to have some tie-ins with the HTTP error codes. > Consider these two implementations for getting > > http://example.com/das2/genome/dazypus/1.43/ > > (Note the typo "dazypus" -> "dasypus") > > A) One system might have all "/das2" URLs forwarded to a DAS server. > > B) Another might have a handler only for "/das2/genome/dasypus" and > let Apache do the rest. > > In case A) the DAS server sees that the given resource doesn't exist. > It needs to return an error. It can return either "200 Ok" followed > by a DAS error payload, or return a "404 Not Found" at the HTTP level. > > In case B) the request never gets to the DAS handler because > of the typo. Apache sees there's nothing for the resource so returns > a "404 Not Found". > > The client code is easier if it can check the HTTP error code and > stop on failure. This means it's best for case A) for the DAS/2 > server to return an HTTP error code of 404, and perhaps an optional > ignorable payload. > > > (I did test having the NetAffx DAS server send HTTP status codes, and > > I did have problems with that in IGB, though I've forgotten the > > specifics. It was about a year ago....) > > Do you have the specifics perhaps in an old email somewhere? > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Mon Nov 14 17:28:10 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 14 Nov 2005 12:28:10 -0500 Subject: [DAS2] Re: New problem with content-type header in DAS/2 server responses! In-Reply-To: References: Message-ID: <200511141228.11358.lstein@cshl.edu> On Monday 14 November 2005 06:47 am, Andrew Dalke wrote: > Steve: > > I raised some other issues regarding types and feature properties etc. > > a > > couple of weeks ago that I'd like you to chime in on: > > http://portal.open-bio.org/pipermail/das2/2005-October/000271.html > > > > The latest message on this thread is: > > http://portal.open-bio.org/pipermail/das2/2005-November/000278.html > > I'll take them part by part. > > That last message suggested > > xmlns:das="http://www.biodas.org/ns/das/genome/2.00" > xml:base="http://www.wormbase.org/das/genome/volvox/1/" > xmlns:xlink="http://www.w3.org/1999/xlink" > > das:prop="http://www.biodas.org/ns/das/genome/2.00/properties"> > das:type="type/curated_exon"> > 29 > 2 > xlink:type="simple" > > xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/ > CTEL54X.1 > /> > > > > I couldn't figure out why the "das:" namespace was needed for the > attributes. Why can't they be in the default namespace? The extra das: prefix is not needed since it is the same namespace as the default namespace. My feeling is that we should be using namespaces in attribute names but not in attribute values (e.g. das:ptype is ok, but "das:prop#phase" is not OK). For attribute values we should be using URIs consistently. Lincoln > The "das:" in the value of an attribute doesn't know anything about > the currently defined namespaces. So this "das:" must be something > completely different from the xmlns:das=... definition. 
> > > * the values of the 'das:id', 'das:type', and 'das:ptype' attributes > > are URLs relative to xml:base unless they begin with 'das:prop#', in > > which case they are relative to the das:prop namespace. > > And from what I can tell about XML, there's no standard way to implement > this using one of the standard XML parsers. How do you get the das:prop > namespace for a given element? The parser often does the expansion > for you. Eg, in one of the Python XML parsers it does the translations > into Clark notation, like > > {http://www.biodas.org/ns/das/genome/2.00}ptype > > For more info on XML namespaces, see http://www.jclark.com/xml/xmlns.htm > > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From dalke at dalkescientific.com Mon Nov 14 17:30:07 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 14 Nov 2005 18:30:07 +0100 Subject: [DAS2] Spec issues In-Reply-To: References: Message-ID: <05b94e3a6db3e4894af051f22f25dc4c@dalkescientific.com> On Nov 4 Steve wrote: > das:type="type/curated_exon"> > 29 > 2 > xlink:type="simple" > > xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/ > CTEL54X.1 > /> > I think we're missing something. This is XML. We can do 29 2 This message brought to you by AT&T The whole point of having namespaces in XML is to keep from needing to define new namespaces like . In doing that, there's no problem in supporting things like "bg:glyph", etc. because the values are expanded as expected by the XML processor. 
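A quick way to see the expansion described above is to run a small namespaced document through a standard parser. A sketch only: the document fragment and the bg: namespace URI below are illustrative (the "bg:glyph" attribute echoes the example in this thread and is not part of the spec):

```python
# Demonstrates that a standard XML parser expands prefixed names into
# Clark notation ({namespace-uri}localname), so no DAS-specific
# 'das:prop#...' convention is required. The bg: URI is hypothetical.
import xml.etree.ElementTree as ET

DAS_NS = "http://www.biodas.org/ns/das/genome/2.00"

doc = (
    '<FEATURES xmlns="%s" xmlns:bg="http://example.org/bio-glyphs">'
    '<FEATURE><PROP bg:glyph="box">29</PROP></FEATURE>'
    '</FEATURES>' % DAS_NS
)

root = ET.fromstring(doc)
prop = root.find("{%s}FEATURE/{%s}PROP" % (DAS_NS, DAS_NS))

# Element names arrive already expanded into Clark notation:
print(prop.tag)     # {http://www.biodas.org/ns/das/genome/2.00}PROP
# Prefixed attribute names are expanded the same way:
print(prop.attrib)  # {'{http://example.org/bio-glyphs}glyph': 'box'}
print(prop.text)    # 29
```

Note that an unprefixed attribute would stay unqualified, since attributes have no default namespace; that is exactly the subtlety debated later in this thread.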
> Also, we might want to allow some controlled vocabulary terms to be > used for > the value of type.source (e.g., "das:curated"), to ensure that > different > users use the same term to specify that a feature type is produced by > curation. I talked with Andreas Prlic about what other metadata is needed for the registry system. He mentioned Together with the BioSapiens DAS people we recently decided that there should be the possibility to assign gene-ontology evidence codes to each das source, so in the next update of the registry, this will be changed. That's at the source level, but perhaps it's also needed at the annotation level. > The spec also seems alarmed by the existence of a xml:base attribute > in the > TYPE element. The idea is that any relative URL within this element > would be > resolved using that element's xml:base attribute. How would folks be > with > having the DAS/2 spec fully support the XML Base spec ( > http://www.w3.org/TR/xmlbase/ )? The result of this would be to add an > optional xml:base attribute to all elements that contain URLs or > subelements > with URLs. In my reading it seems that xml:base should be included wherever. See http://norman.walsh.name/2005/04/01/xinclude > Ugh. In the short term, I think there's only one answer: update your > schemas to allow xml:base either (a) everywhere or (b) everywhere you > want XInclude to be allowed. I urge you to put it everywhere as your > users are likely to want to do things you never imagined. ? > > Description: Properties are typed using the ptype attribute. The value > of > the property may be indicated by a URL given by the href attribute, or > may > be given inline as the CDATA content of the section. 
> > type="type/curated_exon"> > 29 > 2 > href="/das/protein/volvox/2/feature/CTEL54X.1" /> > > > > So in contrast to the TYPE properties which are restricted to being > simple > string-based key:value pairs, FEATURE properties can be more complex, > which > seems reasonable, given the wild world of features. We might consider > using > 'key' rather than 'ptype' for FEATURE properties, for consistency with > TYPE > prop elements (however, read on). My thoughts on these are: - come up with a more consistent way to store key/value data - the Atom spec has a nice way to say "the data is in this CDATA as text/html/xml" vs. "this text is over there". I want to copy its way of doing things. - I'm still not clear about xlink. Another is the HTML-style <link>; Atom uses the "rel=" to encode information about the link. For example, the URL to edit a given document is See http://atomenabled.org/developers/api/atom-api-spec.php Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Mon Nov 14 19:29:22 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 14 Nov 2005 11:29:22 -0800 Subject: [DAS2] DAS/2 weekly meeting notes for 14 Nov 05 Message-ID: Notes from the weekly DAS/2 teleconference, 14 Nov 2005. $Id: das2-teleconf-2005-11-14.txt,v 1.2 2005/11/14 19:20:37 sac Exp $ Attendees: Affy: Steve Chervitz, Gregg Helt CSHL: Lincoln Stein UCBerkeley: Suzi Lewis Sweden: Andrew Dalke Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2005. Instructions on how to access this repository are at http://biodas.org ---------------------------------- AD talked with A. Prlic about registry service, we want to incorporate what he needs within DAS/2. 
What they have: - name (a few words) - for display of das track - title, description (paragraph) - synopsis - url for more info we have desc, id, doc_href, taxon Therefore, we need name attribute Need: - name (mandatory) (done - LS: adding it to spec now) - desc (optional) Coord system reg server: * in das/2 - it's not optional (0 interbase) * they find this important We have confusion between assembly and reference server LS: Need URI that points to assembly, independent of the reference server. GH: Would like to have annot servers that don't know anything about the ref server. LS: Could use the region URI to ID the assembly das/genome/sourceid/region = assembly id/uri GH: The trouble is that NCBI is a ref source for many assemblies, yet they lack a das server. They have no URI. LS: we can just make one up, or use most appropriate web page LS: When you request versioned source from a server, it should say what assembly coords it's working on and give a uri for that. In this case there's no guarantee you can do a 'get' on that URI. We want to say: 1- what is unique uri for assembly (everyone agrees to share this) 2- das URL for how to fetch it (some server's region url - trusted, faithful copy of what is at NCBI). Diff servers could assert that you can fetch it from various places. GH: assembly could be an attribute since there'd be only one. A list of ref servers that serve up that dna. LS: in versioned source response. new section between capabilities and namespaces called 'reference_sources'. Add 'assembly' attribute to version element: Message-ID: Andrew Dalke wrote on 14 Nov 05: > Steve: >> I raised some other issues regarding types and feature properties etc. >> a >> couple of weeks ago that I'd like you to chime in on: >> http://portal.open-bio.org/pipermail/das2/2005-October/000271.html >> >> The latest message on this thread is: >> http://portal.open-bio.org/pipermail/das2/2005-November/000278.html > > I'll take them part by part. 
> > That last message suggested > > xmlns:das="http://www.biodas.org/ns/das/genome/2.00" > xml:base="http://www.wormbase.org/das/genome/volvox/1/" > xmlns:xlink="http://www.w3.org/1999/xlink" > > das:prop="http://www.biodas.org/ns/das/genome/2.00/properties"> > das:type="type/curated_exon"> > 29 > 2 > xlink:type="simple" > > xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/ > CTEL54X.1 > /> > > > > I couldn't figure out why the "das:" namespace was needed for the > attributes. Why can't they be in the default namespace? Attributes don't have a default namespace (though one might think such a thing would be useful). See http://www.w3.org/TR/REC-xml-names/#defaulting This is a point which has been subject to much consternation: http://www.rpbourret.com/xml/NamespacesFAQ.htm#q5_3 http://lists.xml.org/archives/xml-dev/200002/msg00094.html > The "das:" in the value of an attribute doesn't know anything about > the currently defined namespaces. So this "das:" must be something > completely different from the xmlns:das=... definition. No, it refers to the xmlns:das definition in the parent FEATURES element. >> * the values of the 'das:id', 'das:type', and 'das:ptype' attributes >> are URLs relative to xml:base unless they begin with 'das:prop#', in >> which case they are relative to the das:prop namespace. > > And from what I can tell about XML, there's no standard way to implement > this using one of the standard XML parsers. How do you get the das:prop > namespace for a given element? You've identified the key weakness of my proposal: Knowing how to expand 'das:prop' occurring within attribute values would be a DAS-specific convention ('hack') for mapping to a controlled vocabulary for property values. So I'm not quite satisfied with this either. In another message of yours today, you propose an alternative to this: http://portal.open-bio.org/pipermail/das2/2005-November/000313.html See my reply to that for more ideas on this topic. 
Steve From td2 at sanger.ac.uk Tue Nov 15 09:14:01 2005 From: td2 at sanger.ac.uk (Thomas Down) Date: Tue, 15 Nov 2005 09:14:01 +0000 Subject: [DAS2] DAS/2 weekly meeting notes for 14 Nov 05 In-Reply-To: References: Message-ID: <21CB947F-FAE3-4D56-A110-CAB9606C9C84@sanger.ac.uk> On 14 Nov 2005, at 19:29, Steve Chervitz wrote: > > Coord system reg server: > * in das/2 - it's not optional (0 interbase) > * they find this important By "coordinate system" we're not really talking about the 0-based vs. 1-based issue, we're talking about globally unique names for sets of reference sequences (genome assemblies, protein databases, whatever). It might be possible to come up with a better name (I used to call these "namespaces"). > We have confusion between assembly and reference server > LS: Need URI that points to assembly, independent of the > reference server. > GH: Would like to have annot servers that don't know anything about > the ref server Definitely agree with this. This kind of "opaque assembly identifier" is what we've been calling a coord-system name. > LS: Could use the region URI to ID the assembly > das/genome/sourceid/region = assembly id/uri > > GH: The trouble is that NCBI is a ref source for many assemblies, yet > they lack a das server. They have no URI. > LS: we can just make one up, or use most appropriate web page This is possibly an argument for avoiding the use of URLs for assembly identifiers, if we can't be sure that the organisation that's the authority for a given assembly will be running an authoritative DAS server. URNs would be fine, as would the kind of structured but location-independent identifier that Andreas has been using. > Question: What do they mean by 'coord system'? some confusion here > e.g., Do they mean things like: 'this assembly start at 5000 relative > to this other assembly'? I think the way to provide this kind of information is in the form of a DAS alignment service between two coord-systems. 
We love the idea of putting up alignments between NCBI34 and NCBI35 then having a liftover-like tool which can go off and query the registry to discover this. Thomas. From ap3 at sanger.ac.uk Tue Nov 15 10:24:45 2005 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Tue, 15 Nov 2005 10:24:45 +0000 Subject: [DAS2] DAS/2 weekly meeting notes for 14 Nov 05 In-Reply-To: References: Message-ID: Hi! I realized there were a couple of questions regarding the way "coordinate systems" are defined in the DAS-registry, so it would have been good if I had joined yesterday. I am glad that the conference is now at a time which is better for us Europeans and want to join in the future for some of the topics like registry, coordinate systems, proteins, etc. > > AD: ebi/sanger tracks three fields related to assembly (what they need > per server): > -authority = equiv to our assembly uri > -organism = we have as taxon > -type = ? "type" refers to a "physical dimension" of an object. E.g. a chromosome, a 3D protein structure, a protein sequence. > > Permits people to query things like: find out all servers that offer > ncbi > build 35 for human. > > Question: What do they mean by 'coord system'? some confusion here > e.g., Do they mean things like: 'this assembly start at 5000 relative > to this other assembly'? no, as Thomas already mentioned these "coordinate systems" could also be called "namespaces". They should be globally unique descriptors for reference objects / databases. > > For protein DAS, authority typically defines two diff coord systems: > 'pdb resnum, interprot' > It does not permit automated translation between two coord systems. unfortunately this is not that easy in protein space. The mapping from the 3D protein structure to the protein sequence is not straightforward. Think of negative, non-consecutive, and "non-numeric" residue numbers that can appear in the 3D structures. 
Therefore we came up with the "alignment" DAS document, which allows one to map an object in one coordinate system to another one. It can also be used to map one assembly to another. > [A] - Andrew will find out what they use it for > > AD: Believes the purpose is intended for human consumption. not only - the DAS clients usually can display a certain "coordinate system" e.g. Ensembl can do Chromosomal ones, but if DAS sources are available that speak the "UniProt, Protein Sequence" coordinate system, it knows how to project these onto the genome. - an "intelligent DAS client" :-) Andreas ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 From dalke at dalkescientific.com Thu Nov 17 02:35:32 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 17 Nov 2005 03:35:32 +0100 Subject: [DAS2] (x)link Message-ID: I mentioned having a generic <link> tag, again based on Atom. Steve replied: > Not sure about this one yet. In the Atom API, the value of the rel > attribute is restricted to a controlled vocabulary of link > relationships and available services pertaining to editing and > publishing syndicated content on the web: > http://atomenabled.org/developers/api/atom-api- > spec.php#rfc.section.5.4.1 > > What would a controlled vocab for DAS resources be? I don't think I understand the Atom one. Turns out I was actually looking at the Atom publishing protocol at http://code.blogger.com/archives/atom-docs.html which defines links including: The service.post is the URI where you would send an Entry to post to your blog. The service.feed is the URI where you would make an Atom API request to see the Blog's latest entries. 
We could define similar links like: - where to edit and/or lock the given resource - how to get a list of locks - how to get from the given DAS resource to its parent (ie, how to go "up" in the tree, in the case of a cross-link from another server) These could be done as distinct elements or done as qualifications of an existing element. The advantage of the latter (using a <link>) is that others may add their own link types. > Skimming through the DAS/2 retrieval spec, our use of hrefs is > simply for pointing at the location of resources on the web > containing some specified content (e.g., documentation, database > entry, image data, etc.). But they are used in different contexts (for human browsing, for machine fetching, for "service" requests). > The next/prev/start idea for Atom might have good applicability in the > DAS world for iterating through versions of annotations or assemblies > (e.g., rel='link-to-gene-on-next-version-of-genome'). One relationship > that would be useful for DAS would be 'latest', to get the latest > version of an annotation. Hmm. So every annotation would have an optional <link> section? In the current scheme do we always get the most recent version of an annotation? I didn't realize there was any way to get another version, except if it's been edited while you weren't looking. > DAS get URLs themselves seem fairly self-documenting (it's clear a > given link is for feature, type, or sequence for example), so having a > separate rel attribute may not provide much additional value for these > links. But it might be handy for versioning and for DAS/2 writebacks. I hadn't thought of versioning; I was thinking more of writebacks and how to find the parent. I was also thinking of structure data where I might want the experimental x-ray density data for a given structure. That might be done like That's part of the newly submitted DAS proposal so should not really drive this work. Steve also mentioned xlink. 
I've been looking at the spec but still don't understand its implications. There are several^H^Hmany parts to the spec I don't understand, especially in the context of DAS. locator? "arcrole"? "actuate"? Are all our links "simple"? Do we use anything besides the href?

Also, I see no mention in that spec of content-type. One of the things in the Atom spec is support (though not in the spec proper) for alternate or multiple ways to resolve a link, or multiple formats. (That is, a link may contain subelements and these subelements, if in something other than the "das" namespace, are free to add variant meanings.)

Andrew
dalke at dalkescientific.com

From ilari.scheinin at helsinki.fi Fri Nov 18 15:22:47 2005
From: ilari.scheinin at helsinki.fi (Ilari Scheinin)
Date: Fri, 18 Nov 2005 17:22:47 +0200
Subject: [DAS2] Getting individual features in DAS/1
Message-ID:

This mail is not really about DAS/2, but the web site says the original DAS mailing list is now closed.

I am setting up a DAS server that serves CGH data from my database to visualization software, which in my case is gbrowse. I've already set up Dazzle that serves the reference data from a local copy of Ensembl. I need to be able to select individual CGH experiments to be visualized, and as the measurements from a single CGH experiment cover the entire genome, this cannot of course be done by specifying a segment along with the features command.

I noticed that there is a feature_id option for getting the features in DAS/1.5, but on a closer look, it seems to work by getting the segment that the specified feature corresponds to, and then getting all features from that segment. My next approach was to use the feature type to distinguish between different CGH experiments. As all my data is of the type CGH, I thought that I could spare this piece of information for identifying purposes.

First I tried the generic seqfeature plugin. I created a database for it with some test data.
However, getting features by type does not seem to work. I always get all the features from the segment in question.

Next I tried the LDAS plugin. Again I created a compatible database with some test data. I must have done something wrong with the data file I imported to the database, because getting the features does not work. I can get the feature types, but trying to get the features gives me an ERRORSEGMENT error.

I thought that before I go further, it might be useful to ask whether my approach seems reasonable, or is there a better way to achieve what I am trying to do? What should I do to be able to visualize individual CGH profiles?

I'm grateful for any advice,
Ilari

From ap3 at sanger.ac.uk Fri Nov 18 16:54:27 2005
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Fri, 18 Nov 2005 16:54:27 +0000
Subject: [DAS2] das registry and das2
Message-ID: <240aa4ff660b7427b6c463ffc10b1307@sanger.ac.uk>

Hi!

I would like to start a discussion of how to provide a proper DAS interface for our das-registration server at http://das.sanger.ac.uk/registry/

Currently it is possible to interact with it using SOAP, or manually via the HTML interface. We should also make it accessible using URL requests. To get this started I would propose the following query syntax. This might also provide another opportunity to have a discussion about the coordinate system descriptions. If some of the used terms are unclear, there is some documentation at http://das.sanger.ac.uk/registry/help_index.jsp

Regards,
Andreas

Request:
http://server/registry/list
http://server/registry/find?
[keyword,organism,authority,type,capability,label]=searchterm

Response:
  DS_109
  myDasSource
  some free text
  NCBI 35 chromosome Homo sapiens 9606 4:55349999,55749999
  UniProt Protein Sequence P00280
  sequence features
  2005-Nov-16
  about uniprot

-----------------------------------------------------------------------
Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891

From dalke at dalkescientific.com Fri Nov 18 18:00:12 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 18 Nov 2005 19:00:12 +0100
Subject: [DAS2] das registry and das2
In-Reply-To: <240aa4ff660b7427b6c463ffc10b1307@sanger.ac.uk>
References: <240aa4ff660b7427b6c463ffc10b1307@sanger.ac.uk>
Message-ID: <4569f5d3ff6658e5ead6b979e8b1fba9@dalkescientific.com>

Andreas Prlic:
> I would like to start a discussion of how to provide a proper DAS
> interface for our das-registration server at
> http://das.sanger.ac.uk/registry/
>
> Currently it is possible to interact with it using SOAP, or manually
> via the HTML interface. We should also make it accessible using URL
> requests.

One of the things Gregg and I talked about at ISMB was that the top-level "das-sources" format is, or can be, identical to what's needed for the registry server.

As it's structured now the top-level interface to a das2/genome URL returns a list of sources. Based on what you need for the registry, we're going to add support for data about the source itself. The resulting das-sources XML document is effectively identical to what you're looking for.

Hence I think the top-level XML format for a DAS/2 service is identical to the XML format for a registry server. A difference is the support for searches across sources. We don't have that in DAS.

This is an example, btw, of how a generic link element could be useful. Suppose we don't add this in DAS/2.0. The EBI could publish a link of that sort to say that the given URL (which would be the current URL) also supports a registry search interface.
Or we could have all DAS/2 servers implement a search. I don't think that should be a requirement.

> http://server/registry/list
> http://server/registry/find?
> [keyword,organism,authority,type,capability,label]=searchterm

My proposal doesn't affect this.

Why do "find" and "list" take different URLs? Another possibility is that the same URL returns everything if there are no filters in place.

Are multiple search terms allowed? Boolean AND or OR?

Andrew
dalke at dalkescientific.com

From ap3 at sanger.ac.uk Mon Nov 21 10:55:06 2005
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Mon, 21 Nov 2005 10:55:06 +0000
Subject: [DAS2] das registry and das2
In-Reply-To: <4569f5d3ff6658e5ead6b979e8b1fba9@dalkescientific.com>
References: <240aa4ff660b7427b6c463ffc10b1307@sanger.ac.uk> <4569f5d3ff6658e5ead6b979e8b1fba9@dalkescientific.com>
Message-ID:

Hi Andrew,

> As it's structured now the top-level interface to a das2/genome URL
> returns a list of sources. Based on what you need for the registry,
> we're going to add support for data about the source itself.
>
> The resulting das-sources XML document is effectively identical to
> what you're looking for.

That sounds good. I agree the description should look identical for both the sources and the registry. If the sources are already properly described this also makes it easier to "publish" them.

I think it is rather clear why most of the fields in the registry are there. The issue that might need most discussion is how to describe a coordinate system. This information is important because a DAS client usually understands one or multiple coordinate systems. E.g. Ensembl knows about Chromosomes and Clones, but it can also display UniProt annotations in some cases. Similarly, the SPICE DAS client can display annotations served in PDB-residue numbering and UniProt coordinates, but does not know how to deal with genomic coordinates.
Therefore the "coordinate system" or "namespace" is an important part of the description of a DAS source.

What I found in the current spec-draft that comes closest to this issue is the different "domains", e.g.

http://server/das/genome/source/version/features

so I might want to say

http://server/das/genome/homosapiens/ncbi35/features
http://server/das/genome/musmusculus/ncbim34/features

or should it be

http://server/das/genome/ncbi/homosapiens35/features
http://server/das/genome/ncbi/musmusculus34/features
?

Hm. I am not sure, but it seems that one level is missing? - either organism or authority?

The description of the data should ultimately allow the same DAS source to be used in multiple DAS clients. Some validation will be required on the descriptions, to warn people that "homo sapiens" should not be written as "human" or "homo". Or, more complicated: Ensembl does not do assemblies itself. The assembly used is currently NCBI_35. Therefore "Ensembl" cannot be used as an authority for a chromosomal coordinate system. Currently the registry provides a restricted list of allowed coordinate systems, to keep this under control.

>> http://server/registry/list
>> http://server/registry/find?
>> [keyword,organism,authority,type,capability,label]=searchterm
>
> My proposal doesn't affect this.
>
> Why do "find" and "list" take different URLs? Another possibility
> is that the same URL returns everything if there are no filters
> in place.

Yes - better to use only one URL. No filters would return all sources.

> Are multiple search terms allowed?

yes

> Boolean AND or OR?

We can add a parameter where this can be chosen.
Greetings, Andreas ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 From dalke at dalkescientific.com Mon Nov 21 17:06:25 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 21 Nov 2005 18:06:25 +0100 Subject: [DAS2] DAS/2 weekly meeting notes for 14 Nov 05 In-Reply-To: References: Message-ID: <90dff63fdc1e5b32ba97f8c18948758e@dalkescientific.com> Going through the back emails to prepare for the conference call in 30 minutes. Andreas, replying to Steve's comment: >> For protein DAS, authority typically defines two diff coord systems: >> 'pdb resnum, interprot' > >> It does not permit automated translation between two coord systems. > > unfortunately this is not that easy in protein space. The mapping from > the 3D > protein structure to the protein sequence is not straightforward. > Think of > negative, non-consecutive, and "non-numeric" residue numbers that can > appear > in the 3D structures. Therefore we came up with the "alignment" DAS - > document > that allows to map one object in one coordinate system to another one. > it can > also be used to map one assembly to another. Regarding the structure mapping, when we visited the PDB in August they said it's not a problem. The mmCIF records have the information needed for the mapping. I've not looked into this though. > not only - the DAS clients usually can display a certain "coordinate > system" e.g. Ensembl can do > Chromosomal ones, but if DAS sources are available that speak the > "UniProt, Protein Sequence" coordinate > system, it knows how to project these onto the genome. - an > "intelligent DAS client" :-) I like the use case of "user wants to merge annotations from different servers. As DAS currently doesn't have liftover support, the DAS client needs to get annotations only from servers using the same reference coordinate system." 
Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Nov 21 17:08:30 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 21 Nov 2005 18:08:30 +0100 Subject: [DAS2] Getting individual features in DAS/1 In-Reply-To: References: Message-ID: <7f239b885d3eca821639654862770c65@dalkescientific.com> Has anyone answered Ilari's question? I never used DAS/1 enough to answer it myself. If the normal DAS list is closed, is this the right place for DAS/1 questions? On Nov 18, 2005, at 4:22 PM, Ilari Scheinin wrote: > This mail is not really about DAS/2, but the web site says the > original DAS mailing list is now closed. > > I am setting up a DAS server that serves CGH data from my database to > a visualization software, which in my case is gbrowse. I've already > set up Dazzle that serves the reference data from a local copy of > Ensembl. I need to be able to select individual CGH experiments to be > visualized, and as the measurements from a single CGH experiment cover > the entire genome, this cannot of course be done by specifying a > segment along with the features command. > > I noticed that there is a feature_id option for getting the features > in DAS/1.5, but on a closer look, it seems to work by getting the > segment that the specified feature corresponds to, and then getting > all features from that segment. My next approach was to use the > feature type to distinguish between different CGH experiments. As all > my data is of the type CGH, I thought that I could use spare this > piece of information for identifying purposes. > > First I tried the generic seqfeature plugin. I created a database for > it with some test data. However, getting features by type does not > seem to work. I always get all the features from the segment in > question. > > Next I tried the LDAS plugin. Again I created a compatible database > with some test data. 
> I must have done something wrong the the data
> file I imported to the database, because getting the features does not
> work. I can get the feature types, but trying to get the features
> gives me an ERRORSEGMENT error.
>
> I thought that before I go further, it might be useful to ask whether
> my approach seems reasonable, or is there a better way to achieve what
> I am trying to do? What should I do to be able to visualize individual
> CGH profiles?
>
> I'm grateful for any advice,
> Ilari

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Mon Nov 21 17:25:06 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 21 Nov 2005 18:25:06 +0100
Subject: [DAS2] das registry and das2
In-Reply-To:
References: <240aa4ff660b7427b6c463ffc10b1307@sanger.ac.uk> <4569f5d3ff6658e5ead6b979e8b1fba9@dalkescientific.com>
Message-ID: <21a521b096330a81bfa05b0789d3c92d@dalkescientific.com>

Andreas Prlic wrote:
> Therefore the "coordinate system" or "namespace" is an important part
> of the description of a DAS source.
>
> What I found in the current spec-draft that comes closest to this
> issue is the different "domains"
> e.g
>
> http://server/das/genome/source/version/features
>
> so I might want to say
> http://server/das/genome/homosapiens/ncbi35/features
> http://server/das/genome/musmusculus/ncbim34/features
>
> or should it be
> http://server/das/genome/ncbi/homosapiens35/features
> http://server/das/genome/ncbi/musmusculus34/features
> ?
>
> Hm. I am not sure, but it seems that one level is missing? - either
> organism or authority ?

The species information is available from the data source via the 'taxon' attribute; it's not available through the URL naming. That's arbitrary in that the data provider can use any term. I think there's nothing to preclude a provider from putting the actual source data one level deeper in the tree. Personally I find that that's over-classification. Who would use it?
> Currently the registry provides a restricted list of allowed
> coordinate systems, to keep this under control.

Thomas:
> This is possibly an argument for avoiding the use of URLs for assembly
> identifiers, if we can't be sure that the organisation that's the
> authority for a given assembly will be running an authoritative DAS
> server. URNs would be fine, as would the kind of structured but
> location-independent identifer that Andreas has been using.

I think there's no reason we can't use our own names for these. E.g., http://www.biodas.org/coordinates/NCBI35 or a simple unique id like "NCBI35". Right now those are treated as opaque identifiers. There's no name resolution going on, and the coordinates are (I assume) implicit in that client software doesn't resolve the name, only check that the servers are returning data from the same coordinate system. Perhaps in the future that URL might resolve to something, but there's no current reason to do so.

In the renewal grant there is reason to compare different coordinates. When that happens a client needs to pick one reference frame and get the translation information to the other. So the liftover service needs to know about the two coordinate systems. But it can be done through hard-coded information (perhaps with some information that coordinate system X is an alias for Y). I still don't think there's any need to resolve these URLs.

Andreas:
>> Are multiple search terms allowed?
>
> yes

Then they should likely be along the same lines used for the DAS/2 searching.

>> Boolean AND or OR?
>
> We can add a parameter where this can be chosen.

The existing DAS/2 uses an AND search only. Rather, "OR" for multiple values of the same field and "AND" across different fields.
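As a sketch of the combined search semantics just described (OR across repeated values of one field, AND across different fields, and a filter-less query returning everything), assuming a hypothetical in-memory list of source records rather than any actual registry API:

```python
# Sketch of the registry search semantics discussed above: multiple values
# for the same field are OR'ed, different fields are AND'ed, and an empty
# filter set returns everything ("find" with no filters behaves like "list").
# The record layout and field names are illustrative, not from any spec.

def matches(source, filters):
    """filters maps a field name to a list of acceptable values."""
    for field, wanted in filters.items():
        # AND across fields: every filtered field must match...
        if source.get(field) not in wanted:  # ...OR within one field
            return False
    return True

sources = [
    {"organism": "Homo sapiens", "authority": "NCBI 35", "capability": "features"},
    {"organism": "Mus musculus", "authority": "NCBIM 34", "capability": "features"},
]

# organism = (human OR mouse) AND capability = features
hits = [s for s in sources if matches(s, {
    "organism": ["Homo sapiens", "Mus musculus"],
    "capability": ["features"],
})]
print(len(hits))        # both sources match
print(len([s for s in sources if matches(s, {})]))  # no filters: all sources
```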
Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Mon Nov 21 17:24:37 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 21 Nov 2005 09:24:37 -0800 Subject: [DAS2] Getting individual features in DAS/1 Message-ID: We need to discuss at today's meeting. I don't think the original DAS list should be closed, but rather continue to serve as a list to discuss the DAS/1 protocol and implementations, and the DAS2 mailing list should focus on DAS/2. If we mix DAS/1 and DAS/2 discussions in the same mailing list I think it's going to lead to a lot of confusion. gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Andrew Dalke > Sent: Monday, November 21, 2005 9:09 AM > To: DAS/2 > Subject: Re: [DAS2] Getting individual features in DAS/1 > > Has anyone answered Ilari's question? > > I never used DAS/1 enough to answer it myself. > > If the normal DAS list is closed, is this the right place for DAS/1 > questions? > > > On Nov 18, 2005, at 4:22 PM, Ilari Scheinin wrote: > > > This mail is not really about DAS/2, but the web site says the > > original DAS mailing list is now closed. > > > > I am setting up a DAS server that serves CGH data from my database to > > a visualization software, which in my case is gbrowse. I've already > > set up Dazzle that serves the reference data from a local copy of > > Ensembl. I need to be able to select individual CGH experiments to be > > visualized, and as the measurements from a single CGH experiment cover > > the entire genome, this cannot of course be done by specifying a > > segment along with the features command. > > > > I noticed that there is a feature_id option for getting the features > > in DAS/1.5, but on a closer look, it seems to work by getting the > > segment that the specified feature corresponds to, and then getting > > all features from that segment. 
My next approach was to use the > > feature type to distinguish between different CGH experiments. As all > > my data is of the type CGH, I thought that I could use spare this > > piece of information for identifying purposes. > > > > First I tried the generic seqfeature plugin. I created a database for > > it with some test data. However, getting features by type does not > > seem to work. I always get all the features from the segment in > > question. > > > > Next I tried the LDAS plugin. Again I created a compatible database > > with some test data. I must have done something wrong the the data > > file I imported to the database, because getting the features does not > > work. I can get the feature types, but trying to get the features > > gives me an ERRORSEGMENT error. > > > > I thought that before I go further, it might be useful to ask whether > > my approach seems reasonable, or is there a better way to achieve what > > I am trying to do? What should I do to be able to visualize individual > > CGH profiles? > > > > I'm grateful for any advice, > > Ilari > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Steve_Chervitz at affymetrix.com Mon Nov 21 20:15:41 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 21 Nov 2005 12:15:41 -0800 Subject: [DAS2] DAS/2 weekly meeting notes for 21 Nov 05 Message-ID: Notes from the weekly DAS/2 teleconference, 21 Nov 2005. $Id: das2-teleconf-2005-11-21.txt,v 1.3 2005/11/21 20:15:28 sac Exp $ Attendees: Affy: Steve Chervitz, Gregg Helt UCLA: Allen Day, Brian O'connor UCBerkeley: Suzi Lewis, Nomi Harris Sweden: Andrew Dalke Sanger: Andreas Prlic Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2005. 
Instructions on how to access this repository are at http://biodas.org

Today's topic: Client-Server implementation issues
----------------------------------------------------

Suzi/Nomi
---------
Questions for gregg: How to communicate styles in DAS/2?
GH: Client gets stylesheets from the server that suggest how to render things.
AD: EBI uses this a lot. Most of the DAS systems there use stylesheets.
[A] Andreas will contact folks at Sanger/EBI for stylesheet example code.
GH: The IGB client uses a preference configuration, using java preferences rather than a special XML file. Windows: sets values in the registry. Has been successful. If a client can understand DAS/2 stylesheets and client-side prefs, the client-side prefs should override the server styles (others agree).

Steve
-----
* Reported on some analysis of Affymetrix DAS server weblogs. Lots of google-bot data download. Lots of spotfire hits, too.
BO: Google bots should respect robots.txt
[A] Steve will install robots.txt in the relevant locations
* Reported on getting Gregg's DAS/2 server to run on top of apache rather than as a stand-alone server. Should be a matter of hooking apache up to tomcat using a tomcat connector. Directive for apache to defer to tomcat for servlet requests.
[A] Steve will hook up affy das server to apache/tomcat.

Gregg
-----
* Regarding Spotfire - they are working on an IGB plugin to spotfire using the http localhost API. This explains our spotfire hits. Gregg was previously integrating IGB with spotfire using a java-to-COM bridge. It works, but the COM bridges aren't free, etc. They are interested in driving IGB from spotfire since they're interested in using IGB to provide genome visualization. Are currently evaluating whether to release it to the public or not. Gregg considered putting this in the grant, but it would have required permission, etc., and time was a factor. They may eventually commit to the IGB code base directly, but still need to work out legalese.
They will be interested in tracking the interclient API work we are doing (IGB-Apollo).
* No major work on DAS this week, just some niggling IGB issues.
* Planning another IGB release by end of year that will have improvements to DAS/2 clients.
Fixed: access via quickload then access to DAS/2 causes blankout of screen
Fixed: DAS/2 interaction

Brian
-----
* Marc C has committed stuff to IGB code base (genovis). Is there a test suite we can use to verify we're not breaking anything?
GH: No, but hopefully early next year. Definitely needed.
* Also checked in the re-factor - separate namespaces for assay and ontology.
[A] Gregg will relocate das2 package to com.affy.das2 & uncouple from IGB
GH: There are a few igb dependencies to be unraveled (das2feature...). Don't want to do this in the next release since that's pretty significant given upcoming holidays.
GH: Other features to get in:
* Persistence of preferences.
* Get rid of hardwiring of DAS2 servers. Already do this for DAS/1, just need to replicate for DAS/2.

Allen
-----
* API for handling ontologies, structures. Communication with Chris Mungall.
* Have impl at stanford for autocompletion of ontology terms related to samples (Gavin Sherlock's group, SMD). What is the bioontology group doing for distributing their ontologies, and what APIs are going to be made public?
SL: Am at stanford right now to talk about that. Will offer bulk things like at the obo site, but in terms of an interactive API, will respond to the community as best we can.
Allen: Interested in more integration with the bioontology group and with his work with SMD.
Suzi: Not content, but tools, right?
Allen: Yes.
Suzi: Work with chris. Timing couldn't be better.
[A] Allen will work with Chris M re: ontology API tools for OBO & SMD
* GH: Progress on writeback? Part of grant proposal to get it done by june. Will help funding continuation.
Allen: We could start implementing some of that given the refactoring that's now done.
GH: Ed Griffith at sanger is interested in this. Hoping for his participation. In the short timeframe, your server wouldn't have to implement it as long as there is at least one server available that can do it.
Allen: Need to look at work load. There's no lack of work to be done for get requests (faster impls).
GH: Would prefer to have just one writeback server and a faster get server rather than having two writeback-capable servers.
* Allen: Optimizations involving serving files, kind of a report-version of the chado adapters.
GH: Regarding your rounding-ranges optimization for tiling, can you post to the list?
[A] Allen will post his rounding ranges optimization to DAS/2 list
GH: The idea is to help server-side caching by rounding the range requests so you're more likely to hit the same URI (e.g., stop=5010 becomes 6000). Different clients are more likely to hit the cache. Not in the spec, just a convention. Requires more smarts in the client: giving more to the user than they asked for, or throwing out what's not asked for. Throwing out what they didn't ask for would be nicer. In theory, this won't be an issue with client caching.
SC: Could make the client's rounding behavior a configuration option.
GH: Users want fewer options.
* IGB display troubles. Allen had trouble getting it to display anything besides mRNA.
GH: IGB expects 2-level or deeper annotations. For single-level annots, should connect all with a line.
Allen: May be doing this for SNPs. But also saw some strange responses.
GH: Needs a fix.
Allen: Will it be in the next release?
GH: Harder to do it generally -- easier to hardwire it for particular data types. Rendering has to guess how deep you want to go. Currently goes to the leaves and then goes 1 level up, rather than top-down. IGB uses an extra level beyond what you actually see to keep track of other things (e.g., region in query). Preferences UI: 'nested' can select two-level or one-level deep. Would like to hear what other ones you have problems with.
[A] Gregg will fix IGB display problems for single-level annots.

Andrew
------
* Emailed open-bio root list to set up cgi for online verifier. But no response yet.
* DAS/1 vs DAS/2 mailing list.
GH: Confusion may occur if we combine DAS/1 and DAS/2 discussion. Let's keep DAS/1 for all DAS/1 spec related discussion.
[A] Steve will verify whether the DAS/1 list is still alive.
[A] Steve will put a link to the DAS/1 list on biodas.org
* Locking: Plan to talk to EBI about this in January. They are doing work on stylesheets.
[A] Andrew will ask Ed G. to join these meetings
* Needs test data, mock data set.
[A] Allen will point Andrew at some data for testing.

Andreas
-------
* The current registry implementation: Written in java. Two ways to interact:
1) html: can browse available DAS sources, see details, go back to the DAS client and activate the DAS source in the DAS client.
2) soap: client contacts registry, gets list of available sources.
Is open source.
[A] Andreas will post link to source code for DAS registry impl.
GH: A central registry is good, but companies will want their own. E.g., at affy there may be 5-7.
Andreas: It's possible to have a set of registries, local vs. public.
GH: Are you OK with the idea to have an http-based interface? It can run on top of the existing core.
Andreas: Sure.
[A] Andreas will provide http-based interface to Sanger DAS registry

Agenda for next week teleconf
-----------------------------
* Talk more about registry spec issues
* Retrieval spec issues:
  - Content-type
  - DAS/2 headers
  - Feature and type properties
  - other things?
Andrew: Prefer to have most of the discussion online (DAS/2 list); then the teleconf can be more productive.
[A] Continue discussing spec issues on the list before next teleconf

From allenday at ucla.edu Mon Nov 21 20:47:51 2005
From: allenday at ucla.edu (Allen Day)
Date: Mon, 21 Nov 2005 12:47:51 -0800 (PST)
Subject: [DAS2] tiled queries for performance
Message-ID:

Hi,

I had an idea of how clients may be able to get better response from servers by using a tiled query technique. Here's the basic idea:

ClientA wants features in chr1/1010:2020, and issues a request for that range. No other clients have previously requested this range, so the server-side cache faults to the DAS/2 service (slow).

ClientB wants features in chr1/1020:2030, and issues a request for that range. Although the intersection of the resulting records with ClientA's query is large, the URIs are different and the server-side cache faults again.

If ClientA and ClientB were to each issue two separate "tiled" requests:

1. chr1/1001:2000
2. chr1/2001:3000

ClientB could take advantage of the fact that ClientA had been looking at the same tiles.

For this to work, the clients would need to be using the same tile size. The optimal tile size is likely to vary from datasource to datasource, depending on the length and density distributions of the features contained in the datasource. The "sources" or "versioned sources" payload could suggest a tiling size to prospective clients. Servers could also pre-cache all tiles by hitting each tile after an update of the datasource (or the DAS/2 service code).

The tradeoff for the performance gains is that clients may now need to do filtering on the returned records to only return those requested by the client's client.

-Allen

From ap3 at sanger.ac.uk Tue Nov 22 13:54:27 2005
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Tue, 22 Nov 2005 13:54:27 +0000
Subject: [DAS2] das registry links
Message-ID:

Hi!

There was a question yesterday where to get the source code from the das-registration server and if it is possible to have a local installation.
The source code for the registry is available under LGPL at http://www.derkholm.net/svn/repos/dasregistry/trunk/ using subversion.

To obtain a local installation, which caches/synchronizes the publicly available data and allows adding local DAS sources, see the instructions at: http://www.derkholm.net/svn/repos/dasregistry/trunk/release/install.txt

There is also a das-registry announce mailing list at http://lists.sanger.ac.uk/mailman/listinfo/das_registry_announce

Regards,
Andreas

-----------------------------------------------------------------------
Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891

From ap3 at sanger.ac.uk Tue Nov 22 17:58:08 2005
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Tue, 22 Nov 2005 17:58:08 +0000
Subject: [DAS2] ensembl & stylesheet
Message-ID:

Hi!

Another question yesterday was about Ensembl & stylesheet support. An example DAS source that provides a stylesheet is the following:

http://das.ensembl.org/das/ens_35_segdup_washu/stylesheet

A description of it is at:

http://das.ensembl.org/das/ens_35_segdup_washu/

To show how it is rendered in Ensembl, follow this "auto-activation" link:

http://www.ensembl.org/Homo_sapiens/contigview?conf_script=contigview;c=17:14149999.5:1;w=200000;h=;add_das_source=(name=SEGDUP_WASHU+url=http://das.ensembl.org/das+dsn=ens_35_segdup_washu+type=ensembl_location+color=black+strand=r+labelflag=U+stylesheet=Y+group=Y+depth=9999+score=N+active=1)

In terms of source code, Ensembl uses the Bio::DasLite perl module for fetching features and stylesheets: http://search.cpan.org/~rpettett/Bio-DasLite-0.10/

Hope this helps,
Cheers,
Andreas

-----------------------------------------------------------------------
Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891

From gilmanb at pantherinformatics.com Mon Nov 21 21:46:25 2005
From: gilmanb at pantherinformatics.com (Brian Gilman)
Date: Mon, 21 Nov 2005 16:46:25 -0500
Subject:
[DAS2] tiled queries for performance In-Reply-To: References: Message-ID: <2042BBCD-8490-461D-80C1-1BB4A1FAACB1@pantherinformatics.com> Hello Everyone, I've been lurking on the list and wanted to say hi. We're looking into this kind of implementation issue ourselves and thought that a BitTorrent-like cache makes the most sense, i.e., all servers in the "fabric" are issued the query in a certain "hop adjacency". These servers then send their data to the client, whose job it is to assemble the data. HTH, -B -- Brian Gilman President Panther Informatics Inc. E-Mail: gilmanb at pantherinformatics.com gilmanb at jforge.net AIM: gilmanb1 01000010 01101001 01101111 01001001 01101110 01100110 01101111 01110010 01101101 01100001 01110100 01101001 01100011 01101001 01100001 01101110 On Nov 21, 2005, at 3:47 PM, Allen Day wrote: > Hi, > > I had an idea of how clients may be able to get better response from > servers by using a tiled query technique. Here's the basic idea: > > ClientA wants features in chr1/1010:2020, and issues a request for > that > range. No other clients have previously requested this range, so the > server-side cache faults to the DAS/2 service (slow). > > ClientB wants features in chr1/1020:2030, and issues a request for > that > range. Although the intersection of the resulting records with > ClientA's > query is large, the URIs are different and the server-side cache > faults > again. > > If ClientA and ClientB were to each issue two separate "tiled" > requests: > > 1. chr1/1001:2000 > 2. chr1/2001:3000 > > ClientB could take advantage of the fact that ClientA had been > looking at > the same tiles. > > For this to work, the clients would need to be using the same tile > size. > The optimal tile size is likely to vary from datasource to datasource, > depending on the length and density distributions of the features > contained in the datasource. The "sources" or "versioned sources" > payload could suggest a tiling size to prospective clients.
> Servers could > also pre-cache all tiles by hitting each tile after an update of the > datasource (or the DAS/2 service code). > > The tradeoff for the performance gains is that clients may now need > to do > filtering on the returned records to only return those requested by > the > client's client. > > -Allen > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Steve_Chervitz at affymetrix.com Wed Nov 23 16:03:55 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Wed, 23 Nov 2005 08:03:55 -0800 Subject: [DAS2] Simple Sharing Extensions for RSS and OPML Message-ID: This may have some concept relevant to DAS/2 writeback: http://msdn.microsoft.com/xml/rss/sse/ Steve From allenday at ucla.edu Wed Nov 23 23:50:24 2005 From: allenday at ucla.edu (Allen Day) Date: Wed, 23 Nov 2005 15:50:24 -0800 (PST) Subject: [DAS2] tiled queries for performance In-Reply-To: References: Message-ID: More thoughts on this. The client can eliminate the redundancy in the records returned by issuing the tiling queries as previously described (query1), then issuing queries for records that are not contained within tiles, but overlap the boundaries of 1 or more tiles (query2). However, by issuing all the overlaps queries at once, we've just deferred the performance hit one step, because we can't reasonably expect the server to have cached all combinations of tile overlaps queries. I think, to get this tiling optimization to work, the burden needs to be on the client to identify and remove duplicate responses for multiple edge-overlaps queries (query3). 1000bp 2000bp 3000bp | | | | === | =====^==== | | ====#===== | | ============#=============#===== | | | <-----------> query1a <-----------> query1b query2 query3a query3b Key: | : tile boundary = : feature ^ : gap between child features # : portion of feature overlapping tile boundary. 
: client overlaps query <.> : client contains query -Allen On Mon, 21 Nov 2005, Allen Day wrote: > Hi, > > I had an idea of how clients may be able to get better response from > servers by using a tiled query technique. Here's the basic idea: > > ClientA wants features in chr1/1010:2020, and issues a request for that > range. No other clients have previously requested this range, so the > server-side cache faults to the DAS/2 service (slow). > > ClientB wants features in chr1/1020:2030, and issues a request for that > range. Although the intersection of the resulting records with ClientA's > query is large, the URIs are different and the server-side cache faults > again. > > If ClientA and ClientB were to each issue two separate "tiled" requests: > > 1. chr1/1001:2000 > 2. chr1/2001:3000 > > ClientB could take advantage of the fact that ClientA had been looking at > the same tiles. > > For this to work, the clients would need to be using the same tile size. > The optimal tile size is likely to vary from datasource to datasource, > depending on the length and density distributions of the features > contained in the datasource. The "sources" or "versioned sources" > payload could suggest a tiling size to prospective clients. Servers could > also pre-cache all tiles by hitting each tile after an update of the > datasource (or the DAS/2 service code). > > The tradeoff for the performance gains is that clients may now need to do > filtering on the returned records to only return those requested by the > client's client. 
> > -Allen > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From Steve_Chervitz at affymetrix.com Thu Nov 24 01:40:13 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Wed, 23 Nov 2005 17:40:13 -0800 Subject: [DAS2] Ontology Lookup Service Message-ID: Allen, This looks similar to what you have been working on for SMD: http://www.ebi.ac.uk/ontology-lookup/ Would be interesting to compare it with your ontology DAS-based implementation (e.g., performance, ease of installation, extending, etc.). Steve From dalke at dalkescientific.com Thu Nov 24 02:52:35 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 24 Nov 2005 03:52:35 +0100 Subject: [DAS2] tiled queries for performance In-Reply-To: References: Message-ID: Allen: > No other clients have previously requested this range, so the > server-side cache faults to the DAS/2 service (slow). Admittedly I'm curious about this. Why is this slow? What does slow mean? I assume "cannot be returned faster than the network will take it." How many annotations are in the database? Figuring one annotation for every ... 100 bases? gives me 30 million. Shouldn't a range search over < only 30 million be fast? Is this being done in the database? Which database and what's the SQL? If the DB is the bottleneck then pulling it out as a specialized search might be worthwhile. What I'm driving at for this is this. The proposal feels like a workaround for a given implementation. To use it requires more smarts in the client. Why not put that logic on the server? Andrew dalke at dalkescientific.com From allenday at ucla.edu Thu Nov 24 07:10:36 2005 From: allenday at ucla.edu (Allen Day) Date: Wed, 23 Nov 2005 23:10:36 -0800 Subject: [DAS2] tiled queries for performance In-Reply-To: References: Message-ID: <5c24dcc30511232310p1623ff4dk9088579cdf58e082@mail.gmail.com> Hi Andrew. 
I'd like to be able to consistently get network-bottlenecked response from the server. The largest (250 megabase) SQL range queries typically take ~30 seconds to complete, returning ~500K features. I'm currently working on getting the templating system (Template Toolkit aka TT2) we use to flush to the client periodically, rather than building the entire response first. This is the current bottleneck; TT2 generation of a 500K record XML document takes many minutes. Regardless of how much more optimization work we put into the server, it's never going to be as fast as serving up pre-queried, pre-rendered content. I borrowed the idea of tiling from the Google maps application (maps.google.com). In their implementation the server is dumb, and just serves up a static HTML/Javascript document (the application), and static PNG images based on latitude/longitude coordinates (the data). All of the application logic for what to display occurs client side. Classic AJAX. In the DAS protocol, the application logic is distributed between the client and server, sometimes to ill effect. Requiring both (a) the server to respond to arbitrary range queries, and (b) the client to display arbitrary ranges unnecessarily creates a bifurcation of the View component of the application. Brian was hinting at this when he mentioned the idea of BitTorrent blocks earlier in the thread. We also require code redundancy between client and server to be able to fully use the type and exacttype filters. In this case the Model component has been bifurcated -- the client needs to build a model of the ontology (from who knows where... presumably processing OBO-Edit files) so the user can issue queries, and the server needs to also have some representation of the ontology to generate a response. Hopefully the ontology DAS extension will help the latter situation outlined above by getting both client and server to be synchronized on the same data model.
As far as the tiling optimization goes, it's likely that I'll implement a preprocessor for the HTTP query so I can break it into tiles -- conceptually very similar to the log10 binning that Lincoln does in the GFF database. -Allen On 11/23/05, Andrew Dalke wrote: > > Allen: > > No other clients have previously requested this range, so the > > server-side cache faults to the DAS/2 service (slow). > > Admittedly I'm curious about this. Why is this slow? What does > slow mean? I assume "cannot be returned faster than the network > will take it." > > How many annotations are in the database? Figuring one annotation > for every ... 100 bases? gives me 30 million. Shouldn't a range > search over < only 30 million be fast? Is this being done in the > database? Which database and what's the SQL? > > If the DB is the bottleneck then pulling it out as a specialized > search might be worthwhile. > > What I'm driving at for this is this. The proposal feels like > a workaround for a given implementation. To use it requires > more smarts in the client. Why not put that logic on the server? > > > Andrew > dalke at dalkescientific.com > > From allenday at ucla.edu Thu Nov 24 07:21:48 2005 From: allenday at ucla.edu (Allen Day) Date: Wed, 23 Nov 2005 23:21:48 -0800 Subject: [DAS2] Re: Ontology Lookup Service In-Reply-To: References: Message-ID: <5c24dcc30511232321v70f77dc9y7a1ceef22bcf6edc@mail.gmail.com> Hi Steve. Yes, this is pretty similar to what we're doing. The major differences I see are (a) the query flexibility -- It only lets you retrieve terms from one ontology at a time, and does not support wildcards (b) the display -- it doesn't actually show you the dag structure of the ontology, and (c) using different tech -- Java/SOAP as opposed to Perl/ReST. 
-Allen On 11/23/05, Steve Chervitz wrote: > > Allen, > > This looks similar to what you have been working on for SMD: > > http://www.ebi.ac.uk/ontology-lookup/ > > Would be interesting to compare it with your ontology DAS-based > implementation (e.g., performance, ease of installation, extending, etc.). > > Steve > > From dalke at dalkescientific.com Thu Nov 24 13:28:00 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 24 Nov 2005 14:28:00 +0100 Subject: [DAS2] tiled queries for performance In-Reply-To: <5c24dcc30511232310p1623ff4dk9088579cdf58e082@mail.gmail.com> References: <5c24dcc30511232310p1623ff4dk9088579cdf58e082@mail.gmail.com> Message-ID: <9eb929192db24ad93fb2a7cf423aa9c3@dalkescientific.com> Allen: > I'd like to be able to consistently get network-bottlenecked response > from the server. The largest (250 megabase) SQL range queries > typically take ~30 seconds to complete, returning ~500K features. I'm > currently working on getting the templating system (Template Toolkit > aka TT2) we use to flush to the client periodically, rather than > building the entire response first. This is the current bottleneck; > TT2 generation of a 500K record XML document takes many minutes. > Regardless of how much more optimization work we put into the server, > it's never going to be as fast as serving up pre-queried, pre-rendered > content. Interesting. So I was right, in that the range search is fast, but wrong in not considering the template generation problem. Could that cause a DoS attack by asking for several large ranges at once? You're building up multi-megabyte strings in memory. (If 1 feature is 1K then that's 500MB.) Ideologically the clean solution might be to have the search return only a list of identifiers and have the client fetch each feature one-by-one. This is a tile size of 1. Implementation-wise this will cause problems unless using HTTP 1.1 pipelining since the act of opening 500K connections takes non-trivial time.
Adding a "return XML for these ids" service doesn't help either - it brings us back to the same problem. But another solution is to cache all the features as XML, leaving out only the header and footer. Skip the templating system (rather, it's upstream of the caching). Do the search, get the ids, and stream the contents directly from the cache. This would be used in feature lookup and for search results. > In the DAS protocol, the distribution of the application logic is > distributed between the client and server, sometimes to ill effect. > Requiring both (a) the server to respond to arbitrary range queries, > and (b) the client to display arbitrary ranges unnecessarily creates a > bifurcation of the View component of the application. Brian was > hinting at this when he mentioned the idea of bittorrent blocks > earlier in the thread. What application logic? There should be many ways to build different applications on top of DAS. DAS is a data model. The client provides the view (or many views). There are two reasons for query support on the server. 1. slow bandwidth and limited client resources - otherwise clients could download and search the data locally 2. easier support for (certain classes of) application developers To make the Google comparison, there's no reason Google searches couldn't take place on your personal machine except that you can't download the Internet and search it in usable time. With Google providing the service others can do things like provide domain-specific web searches via Google, include Google links in a web browser, or make something like Googlefight. > We also require code redundancy between client and server to be able > to fully use the type and exacttype filters. In this case the Model > component has been bifurcated -- the client needs to build a model the > ontology (from who knows where...
presumably processing OBO-Edit > files) so the user can issue queries, and the server needs to also > have some representation of the ontology to generate a response. > > Hopefully the ontology DAS extension will help the latter situation > outlined above by getting both client and server to be synchronized on > the same data model. As far as the tiling optimization goes, it's > likely that I'll implement a preprocessor for the HTTP query so I can > break it into tiles -- conceptually very similar to the log10 binning > that Lincoln does in the GFF database. I didn't follow this. Code redundancy means what? There's an exchange of data models - in this case the model for a query. But any client/server needs to do this. Take Entrez, for example. It supports many types of search fields, including MeSH (which I think counts as an ontology). A sophisticated client may have a GUI to help people identify MeSH terms. This obviously duplicates some work done on the server. Is that what you mean? If so, why does it matter? Note also that while Google Maps serves static images only, there's shared logic between the application (in the browser) and the tools that generated those maps. Eg, both have the same code for understanding geography/latitude&longitude. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Nov 24 13:47:26 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 24 Nov 2005 14:47:26 +0100 Subject: [DAS2] tiled queries for performance In-Reply-To: <2042BBCD-8490-461D-80C1-1BB4A1FAACB1@pantherinformatics.com> References: <2042BBCD-8490-461D-80C1-1BB4A1FAACB1@pantherinformatics.com> Message-ID: <22110007fe53238adbda91041ee1baf2@dalkescientific.com> Hi Brian, > We're looking into this kind of implementation issue ourselves and > thought that a bitorrent like cache makes the most sense. ie. all > servers in the "fabric" are issued the query in a certain "hop > adjacency".
These servers then send their data to the client who's job > it is to assemble the data. I go back and forth between the "large data set" model and the "large number of entities" model. In the first: - client requests a large data file - server returns it This can be sped up by distributing the file among many sites and using something like BitTorrent to put it together, or something like Coral ( http://www.coralcdn.org/ ) to redirect to nearby caches. But making the code for this is complicated. It's possible to build on BitTorrent and similar systems, but I have no feel for the actual implementation cost, which makes me wary. I've looked into a couple of the P2P toolkits and not gotten the feel that it's any easier than writing HTTP requests directly. Plus, who will set up the alternate servers? In the second: - make query to server - server returns list of N identifiers - make N-n requests (where 'n' is the number of identifiers already resolved) The id resolution can be done in a distributed fashion and is easily supported via web caches, either with well-configured proxies or (again) through Coral. I like the latter model in part because it's more fine grained. Eg, a progress bar can say "downloading feature 4 of 10000", and if a given feature is already present there's no need to refetch it. The downside of the 2nd is the need for HTTP 1.1 pipelining to make it be efficient. I don't know if we want to have that requirement. Gregg came up with the range restrictions because most of the massive results will be from range searches. By being a bit more clever about tracking what's known and not known, a client can get a much smaller results page. These are complementary. Using Gregg's restricted range queries can reduce the number of identifiers returned in a search, making the network overhead even smaller. 
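[Editor's note: the second model above (a search returns a list of identifiers, and the client fetches only the ones it has not already resolved) can be sketched in a few lines of Python. This is a hypothetical illustration; the FeatureCache class and the fetch callback are invented names, not part of any DAS/2 codebase.]

```python
# Hypothetical sketch of the "list of identifiers" model: a search
# returns feature ids (URLs), and the client fetches only the ids it
# has not already resolved. Invented names, not a DAS/2 implementation.

class FeatureCache:
    def __init__(self):
        self._store = {}  # feature id (URL) -> feature record

    def resolve(self, ids, fetch):
        """Return records for ids, fetching only those not yet cached.

        Also returns how many fetches were issued (the N-n requests
        from the model described above)."""
        missing = [fid for fid in ids if fid not in self._store]
        for fid in missing:  # with HTTP 1.1 pipelining these could be batched
            self._store[fid] = fetch(fid)
        return [self._store[fid] for fid in ids], len(missing)
```

[On a second, overlapping query the client pays only for identifiers it has not seen before, which is what makes this model friendly to ordinary web caches.]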
Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Fri Nov 25 15:21:21 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 25 Nov 2005 16:21:21 +0100 Subject: [DAS2] DAS intro Message-ID: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> The front of the DAS doc starts DAS 2.0 is designed to address the shortcomings of DAS 1.0, including: That kinda assumes people know what DAS 1.0 is to understand DAS 2.0. How about this instead, as an overview/introduction. ====== DAS/2 describes a data model for genome annotations. An annotation server provides information about one or more genome sources. Each source may have one or more versions. Different versions are usually based on different assemblies. As an implementation detail an assembly and corresponding sequence data may be distributed via a different machine, which is called the reference server. Portions of the assembly may have higher relative accuracy than the assembly as a whole. A reference server may supply these portions as an alternate reference frame. Annotations are located on the genome with a start and end position. The range may be specified multiple times if there are alternate reference frames. An annotation may contain multiple non-contiguous parts, making it the parent of those parts. Some parts may have more than one parent. Annotations have a type based on terms in SOFA (Sequence Ontology for Feature Annotation). Stylesheets contain a set of properties used to depict a given type. Annotations can be searched by range, type, and a properties table associated with each annotation. These are called feature filters. DAS/2 is implemented using a ReST architecture. Each entity (also called a document or object) has a name, which is a URL. Fetching the URL gets information about the entity. The DAS-specific entities are all XML documents. Other entities contain data types with an existing and frequently used file format.
Where possible, a DAS server returns data using existing formats. In some cases a server may describe how to fetch a given entity in several different formats. ====== Andrew dalke at dalkescientific.com From asims at bcgsc.ca Fri Nov 25 19:15:17 2005 From: asims at bcgsc.ca (Asim Siddiqui) Date: Fri, 25 Nov 2005 11:15:17 -0800 Subject: [DAS2] tiled queries for performance Message-ID: <86C6E520C12E52429ACBCB01546DF4D3BE3E5E@xchange1.phage.bcgsc.ca> Hi, I'm a newbie to this list, so apologies if I've missed something critical. I think this is a great idea. I don't see this as a big change to the DAS/2 spec or requiring much in the way of additional smarts on the client side. The change is simply that instead of the client getting exactly what it asks for, it may get more. My 2 cents, Asim -----Original Message----- From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Allen Day Sent: Wednesday, November 23, 2005 11:11 PM To: Andrew Dalke; DAS/2 Subject: Re: [DAS2] tiled queries for performance Hi Andrew. I'd like to be able to consistently get network-bottlenecked response from the server. The largest (250 megabase) SQL range queries typically take ~30 seconds to complete, returning ~500K features. I'm currently working on getting the templating system (Template Toolkit aka TT2) we use to flush to the client periodically, rather than building the entire response first. This is the current bottleneck; TT2 generation of a 500K record XML document takes many minutes. Regardless of how much more optimization work we put into the server, it's never going to be as fast as serving up pre-queried, pre-rendered content. I borrowed the idea of tiling from the Google maps application ( maps.google.com). In their implementation the server is dumb, and just serves up a static HTML/Javascript document (the application), and static PNG images based on latitute/longitude coordinates (the data). 
All of the application logic for what to display occurs client side. Classic AJAX. In the DAS protocol, the distribution of the application logic is distributed between the client and server, sometimes to ill effect. Requiring both (a) the server to respond to arbitrary range queries, and (b) the client to display arbitrary ranges unnecessarily creates a bifurcation of the View component of the application. Brian was hinting at this when he mentioned the idea of bittorrent blocks earlier in the thread. We also require code redundancy between client and server to be able to fully use the type and exacttype filters. In this case the Model component has been bifurcated -- the client needs to build a model the ontology (from who knows where... presumably processing OBO-Edit files) so the user can issue queries, and the server needs to also have some representation of the ontology to generate a response. Hopefully the ontology DAS extension will help the latter situation outlined above by getting both client and server to be synchronized on the same data model. As far as the tiling optimization goes, it's likely that I'll implement a preprocessor for the HTTP query so I can break it into tiles -- conceptually very similar to the log10 binning that Lincoln does in the GFF database. -Allen On 11/23/05, Andrew Dalke wrote: > > Allen: > > No other clients have previously requested this range, so the > > server-side cache faults to the DAS/2 service (slow). > > Admittedly I'm curious about this. Why is this slow? What does slow > mean? I assume "cannot be returned faster than the network will take > it." > > How many annotations are in the database? Figuring one annotation for > every ... 100 bases? gives me 30 million. Shouldn't a range search > over < only 30 million be fast? Is this being done in the database? > Which database and what's the SQL? > > If the DB is the bottleneck then pulling it out as a specialized > search might be worthwhile. 
> > What I'm driving at for this is this. The proposal feels like a > workaround for a given implementation. To use it requires more smarts > in the client. Why not put that logic on the server? > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ DAS2 mailing list DAS2 at portal.open-bio.org http://portal.open-bio.org/mailman/listinfo/das2 From suzi at fruitfly.org Fri Nov 25 22:20:29 2005 From: suzi at fruitfly.org (Suzanna Lewis) Date: Fri, 25 Nov 2005 14:20:29 -0800 Subject: [DAS2] DAS intro In-Reply-To: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> Message-ID: <59fa39752e4d792d2142fe2682813937@fruitfly.org> a few minor in-line edits below. trying to simplify and not confuse, as this is just an intro. On Nov 25, 2005, at 7:21 AM, Andrew Dalke wrote: > The front of the DAS doc starts > > DAS 2.0 is designed to address the shortcomings of DAS 1.0, > including: > > That kinda assumes people know what DAS 1.0 is to understand DAS 2.0. > > How about this instead, as an overview/introduction. > > ====== > > DAS/2 describes a data model for genome annotations , THAT IS, DESCRIPTIONS OF FEATURES LOCATED ON THE GENOMIC SEQUENCE > . An annotation > server provides SUCH > information FOR > one or more genome SEQUENCES. > Each GENOMIC SEQUENCE > may have one or more versions. Different versions are usually > based on different assemblies. As an implementation detail an > assembly and corresponding sequence data may be distributed via a > different machine, which is called the reference server. (DELETED LAST 2 SENTENCES). > > Annotations are located on the genome with a start and end position. > The range may be specified mutiple times if there are alternate > SEQUENCES THEY MAY BE PLACED UPON (REFERENCE FRAMES). 
> An annotation may contain multiple non-continguous > parts (DELETED PHRASE AND SENTENCE) > Annotations have a type based on terms in SOFA > (Sequence Ontology for Feature Annotation). Stylesheets contain a set > of properties used to depict a given type. > > Annotations can be searched by range, type, and a properties table > associated with each annotation. These are called feature filters. > > DAS/2 is implemented using a ReST architecture. Each entity (also > called a document or object) has a name, which is a URL. Fetching the > URL gets information about the entity. The DAS-specific entities are > all XML documents. Other entities contain data types with an existing > and frequently used file format. Where possible, a DAS server returns > data using existing formats. In some cases a server may describe how > to fetch a given entity in several different formats. > ====== > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From dalke at dalkescientific.com Fri Nov 25 23:43:10 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat, 26 Nov 2005 00:43:10 +0100 Subject: [DAS2] tiled queries for performance In-Reply-To: <86C6E520C12E52429ACBCB01546DF4D3BE3E5E@xchange1.phage.bcgsc.ca> References: <86C6E520C12E52429ACBCB01546DF4D3BE3E5E@xchange1.phage.bcgsc.ca> Message-ID: <9ec33e6fb3efbbe8b39adc52d2b78db7@dalkescientific.com> Asim Siddiqui > I think this is a great idea. > > I don't see this as a big change to the DAS/2 spec or requiring much in > the way of additional smarts on the client side. I agree with Allen on this - in some sense there's no effect on the spec. It ends up being an agreement among the clients to request aligned data, by rounding up/down to the nearest, say, kilobase, and for the server implementers to cache those requests.
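[Editor's note: the rounding agreement described above might look like the following sketch. It is hypothetical: the 1 kb tile size and the function name are assumptions, and per the thread a real datasource could suggest its own tile size. Half-open, 0-based coordinates are used for simplicity.]

```python
# Hypothetical sketch: snap a requested range outward to fixed tile
# boundaries so that clients asking about overlapping regions issue
# identical (and therefore cacheable) requests. The 1 kb tile size is
# an assumption; the "sources" payload could suggest a different one.

TILE = 1000

def tile_requests(start, end, tile=TILE):
    """Expand [start, end) outward to tile boundaries, one request per tile."""
    lo = (start // tile) * tile   # round start down to a boundary
    hi = -(-end // tile) * tile   # round end up to the next boundary
    return [(s, s + tile) for s in range(lo, hi, tile)]
```

[Under such an agreement, ClientA's chr1/1010:2020 and ClientB's chr1/1020:2030 from earlier in the thread collapse to the same two tile requests, so the second client hits a warm cache.]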
> The change is simply that instead of the client getting exactly what it > asks for, it may get more. While that's another matter - the client makes a request and the server is free to expand the range to something it can handle a bit better. Allen? Were you suggesting this instead? In this case there is a change to the spec, and all clients must be able to filter or otherwise ignore extra results. I personally think it's an implementation issue related to performance and there are ways to make the results be generated fast enough. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Sat Nov 26 00:35:45 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat, 26 Nov 2005 01:35:45 +0100 Subject: [DAS2] DAS intro In-Reply-To: <59fa39752e4d792d2142fe2682813937@fruitfly.org> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> Message-ID: Hi Suzi, You're supposed to be on holiday - it's Thanksgiving after all. Though I'm not celebrating it until next week. I wonder where I can find pumpkin pie mix here ... >> DAS/2 describes a data model for genome annotations > , THAT IS, DESCRIPTIONS OF FEATURES LOCATED ON THE GENOMIC SEQUENCE Changed, along with the other fixes. > (DELETED LAST 2 SENTENCES). That was the two lines about >> Portions of >> the assembly may have higher relative accuracy than the assembly as a >> whole. A reference server may supply these portions as an alternate >> reference frame. In the intro I want to mention all of the parts of DAS. The problem is that I still don't understand the /region request. These two lines were my best attempt at explaining them. Was the deletion because my understanding is wrong or because it's not needed for the intro? I think my confusion is related to the concept you mention in: >> Annotations are located on the genome with a start and end position.
>> The range may be specified mutiple times if there are alternate >> > SEQUENCES THEY MAY BE PLACED UPON (REFERENCE FRAMES). because I don't understand what I should change. I made up the term 'reference frame' because of my physics training. Is it the correct term here? Does 'reference frame' as it's normally used only refer to the full assembly or does it refer to each "/region" as well? If I give the coordinates on a contig can I say it's in the reference frame of that contig? (Hmm, David Block agrees with me, according to http://open-bio.org/bosc2001/abstracts/lightning/block The presence of a Tiling_Path table allows the loading of any arbitrary length of sequence, in the reference frame of any of the contigs that make up the tiling path. ) I thought it was important to mention that a given annotation may have "several tags if the feature's location can be represented in multiple coordinate systems (e.g. multiple builds of a genome or multiple contigs)" Then again, I don't understand how a given feature can be annotated on multiple builds because I thought that a feature was only associated with a single versioned source, and a versioned source has only one build. I would like to have something in the intro which mentions "/region". I just don't know how to do it. Why does anyone care about regions and not just point directly to the sequence? >> An annotation may contain multiple non-continguous >> parts > > (DELECTED PHRASE AND SENTENCE) The deleted text there was ", making it the parent of those parts. Some parts may have more than one parent." I put it there because I remember we talked a lot about this at CSHL a couple years back and wanted to make sure the data model handled cases where, say, there were two parents to three parts. It seems to me that that structure is important enough that someone who is trying to get a quick understanding of DAS annotations would be interested in it.
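[Editor's note: the "two parents to three parts" case mentioned above can be made concrete with a minimal sketch. The classes below are invented for illustration, not the DAS/2 schema; since a part may list several parents, the annotation structure is a directed graph rather than a strict tree.]

```python
# Minimal sketch (invented classes, not the DAS/2 schema) of features
# whose parts may have more than one parent, e.g. two parent
# annotations sharing one of three parts.

class Feature:
    def __init__(self, fid):
        self.fid = fid
        self.parents = []
        self.parts = []

    def add_part(self, part):
        """Link a child part; the part records this feature as a parent."""
        self.parts.append(part)
        part.parents.append(self)

# Two parents sharing the middle of three parts:
p1, p2 = Feature("parent1"), Feature("parent2")
a, b, c = Feature("part_a"), Feature("part_b"), Feature("part_c")
p1.add_part(a); p1.add_part(b)
p2.add_part(b); p2.add_part(c)
```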
My internal model for the expected reader is someone like Allen or Gregg - people who have some experience in data models for annotations and would like to know that DAS can handle those sorts of more complicated tree structures. I'm willing to move it further into the text, but I'm not convinced that it makes things less confusing or simpler. Features having parts and parents is an essential part of the DAS data model. Andrew dalke at dalkescientific.com From suzi at fruitfly.org Sat Nov 26 01:44:54 2005 From: suzi at fruitfly.org (Suzanna Lewis) Date: Fri, 25 Nov 2005 17:44:54 -0800 Subject: [DAS2] DAS intro In-Reply-To: References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> Message-ID: <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> Hi Andrew, so there seem to be 2 questions. it would be good to have both in the intro, but only as long as the description can be clearly stated in just a sentence or two. If it takes more then it is clearly something that requires a fuller description outside of the intro. I'll try to give my understanding (but goodness knows I am peering through different lenses). I don't think in terms of the spec at all, just the information that needs to be conveyed. #1 "reference frame" ========================================= "reference frame", is (to my mind) "reference sequence". at least, that is what i've always called it. First, accuracy has nothing at all to do with it, so we don't want the sentence in there. Second, the region of sequence that is returned is nothing more than that. Think of it as a special type of feature. This is what makes a transformation possible from one coordinate-system to another (by adding the correct offsets) Third, just think of "reference sequence" as a coordinate system. One can have the exact same feature and indicate that: on coordinate-system-A this feature starts and ends here, and on coordinate-system-B it starts and ends there. 
Thus a feature's coordinates may be given both on a chromosome, and on a contig, and on any other coordinate-system that can be derived through a transform from these. So you could change the sentence below to read "A reference server may supply features where the locations (start and end) are relative to either contigs, some other arbitrary region, or to the entire chromosome."

#2 "multiple parents" =========================================

It still is easier for me to think of this in terms of sequences. We may know that somewhere out in the world a sequence must exist, but the data/sequence we have collected is fragmentary. For example, thinly sequenced genomes (resulting in many separate contigs) or a pair of ESTs from a cDNA. In either of these cases we need to be able to have the many to many relationships you talk about. This one is perhaps too subtle for the introduction, but if we decide to include it then I think it should first be phrased in terms of the problem (biological sampling) and then in terms of the solution (multiple parents).

-S

On Nov 25, 2005, at 4:35 PM, Andrew Dalke wrote:
> Hi Suzi,
>
> You're supposed to be on holiday - it's Thanksgiving after all.
>
> Though I'm not celebrating it until next week. I wonder where
> I can find pumpkin pie mix here ...
>
>>> DAS/2 describes a data model for genome annotations
>> , THAT IS, DESCRIPTIONS OF FEATURES LOCATED ON THE GENOMIC SEQUENCE
>
> Changed, along with the other fixes.
>
>> (DELETED LAST 2 SENTENCES).
>
> That was the two lines about
>
>>> Portions of
>>> the assembly may have higher relative accuracy than the assembly as a
>>> whole. A reference server may supply these portions as an alternate
>>> reference frame.
>
> In the intro I want to mention all of the parts of DAS. The
> problem is that I still don't understand the /region request.
> These two lines were my best attempt at explaining them.
> > Was the deletion because my understanding is wrong or because it's > not needed for the intro? > > I think my confusion is related the concept you mention in: >>> Annotations are located on the genome with a start and end position. >>> The range may be specified mutiple times if there are alternate >>> >> SEQUENCES THEY MAY BE PLACED UPON (REFERENCE FRAMES). > > because I don't understand what I should change. I made up the > term 'reference frame' because of my physics training. Is it > the correct term here? Does 'reference frame' as it's normally > used only refer to the full assembly or does it refer to each > "/region" as well? If I give the coordinates on a contig can > I say it's in the reference frame of that contig? > > (Hmm, David Block agrees with me, according to > http://open-bio.org/bosc2001/abstracts/lightning/block > The presence of a Tiling_Path table allows the loading of > any arbitrary length of sequence, in the reference frame > of any of the contigs that make up the tiling path. ) > > > > I thought it was important to mention that a given annotation > may have "several tags if the feature's location can be > represented in multiple coordinate systems (e.g. multiple builds > of a genome or multiple contigs)" > > Then again, I don't understand how a given feature can be > annotated on multiple builds because I thought that a feature > was only associated with a single versioned source, and a > versioned source has only one build. > > > I would like to have something in the intro which mentions > "/region". I just don't know how to do it. Why does anyone > care about regions and not just point directly to the sequence? > >>> An annotation may contain multiple non-continguous >>> parts >> >> (DELECTED PHRASE AND SENTENCE) > > The deleted text there was ", making it the parent of those parts. > Some parts may have more than one parent." 
> > I put it there because I remember we talked a lot about this > at CSHL a couple years back and wanted to make sure the data > model handled cases where, say, there were two parents to three > parts. I seems to me that that structure is important enough > that someone who is trying to get a quick understanding of > DAS annotations would be interested in it. > > My internal model for the expected reader is someone like > Allen or Gregg - people who have some experience in data > models for annotations and would like to know that DAS > can handle those sorts of more complicated tree structures. > > I'm willing to move it further into the text, but I'm not > convinced that it makes things less confusing or simpler. > Features having parts and parents is an essential part of > the DAS data model. > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From dalke at dalkescientific.com Sun Nov 27 01:20:24 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sun, 27 Nov 2005 02:20:24 +0100 Subject: [DAS2] DAS intro In-Reply-To: <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> Message-ID: <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> Suzi: > so there seem to be 2 questions. it would be good to have both in the > intro, but only as long as the description can be clearly stated in > just a sentence or two. If it takes more then it is clearly something > that requires a fuller description outside of the intro. Agreed. > I'll try to give my understanding (but goodness knows I am peering > through different lenses). I don't think in terms of the spec at all, > just the information that needs to be conveyed. 
>
> #1 "reference frame" =========================================
>
> "reference frame", is (to my mind) "reference sequence". at least,
> that is what i've always called it.
> First, accuracy has nothing at all to do with it, so we don't want the
> sentence in there.

I'm fine with that. I've found it best to declare my ignorance early rather than to keep it hidden.

> Second, the region of sequence that is returned is nothing more than
> that. Think of it as a special type of feature. This is what makes a
> transformation possible from one coordinate-system to another (by
> adding the correct offsets)

I can think of it as a feature just fine. But then shouldn't each region also be a feature? Why wouldn't all contigs be visible as an annotation?

Contigs are in SOFA as

  @is_a@ contig ; SO:0000149  @is_a@ assembly_component ; SO:0000143  @part_of@ supercontig ; SO:0000148

What advantage is there to break this feature out at a "/region"?

One that I can see is that the reference server provides the regions while the annotation server provides the other features. But if that's the case we could have the reference server also provide the regions as features, and the annotation server makes references to those features rather than to regions.

That is, in the current scheme we have: has 0 or more element, where the 'pos' attribute links to region + start/stop range and the optional 'seq' attribute links to the sequence range, as in: is only a link to the sequence and a length, as in:

One alternate possibility is to change that so "pos" points to a /feature (instead of a /region) and have features for each contig or other assembly component. The result would look like: ...

Doing this, however, means that all features must support subranges.

As an alternate solution without ranges, use and then look up the sequence coordinates of feature/AB1234 to figure out where it starts/stops.

The other advantage to a region is you can ask for the assembly via the 'agp' format.
But because of the existing support for formats which are only valid for some features you can do that by asking for, say, all assembly_component features (via the feature filter) and return the results in 'agp' format.

> Third, just think of "reference sequence" as a coordinate system. One
> can have the exact same feature and indicate that: on
> coordinate-system-A this feature starts and ends here, and on
> coordinate-system-B it starts and ends there. Thus a feature's
> coordinates may be given both on a chromosome, and on a contig, and on
> any other coordinate-system that can be derived through a transform
> from these.

I believe I understand this. There really is only one reference frame for the entire genome sequence, for a given assembly, and all other coordinate systems are a fixed and definite offset of that single reference frame. I believe this is called the golden path?

My reference to accuracy is because I figured that given two features A and B on an assembly component X then the fuzziness in the relative distance between A and B is small if X is also small. That is, smaller terms are less likely to have changes as the golden path changes.

> So you could change the sentence below to read "A reference server
> may supply features where the locations (start and end) are relative
> to either contigs, some other arbitrary region, or to the entire
> chromosome."

Why not always supply it relative to the chromosome coordinates? The spec now allows that as an optional field. I can't figure out why you would want to do otherwise.

Is it because sometimes it's easier to work with, say, a large number of contig reference frames than with one large reference frame? Does that mean we shift the complexity of coordinate translation from the data provider to the data consumer? (Making it easier to generate data than to consume data.)
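The fixed-offset relationship between coordinate systems that Suzi and Andrew are discussing can be sketched roughly as follows. The contig names and offset values are made up for illustration:

```python
# Hypothetical offsets of assembly components on the chromosome's golden path.
CONTIG_OFFSETS = {"ctg123": 1_000_000, "ctg124": 1_050_000}

def contig_to_chrom(contig, start, end):
    """Translate a contig-relative range into chromosome coordinates.
    The 'transform' between the two coordinate systems is just the
    contig's fixed offset on the golden path."""
    off = CONTIG_OFFSETS[contig]
    return off + start, off + end

# The 1271:1507 range, placed on ctg123, expressed on the chromosome.
chrom_range = contig_to_chrom("ctg123", 1271, 1507)
```

This is the whole trick behind giving the same feature's location on "coordinate-system-A" and "coordinate-system-B": both are recoverable from one another by adding or subtracting the component's offset.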
> This one is perhaps too subtle for the introduction, but if we decide > to include it then I think it should first be phrased in terms of the > problem (biological sampling) and then in terms of the solution > (multiple parents). Oh, definitely. It's some place where I just don't have the domain knowledge to explain it or even come up with examples. Andrew dalke at dalkescientific.com From suzi at fruitfly.org Sun Nov 27 01:24:07 2005 From: suzi at fruitfly.org (Suzanna Lewis) Date: Sat, 26 Nov 2005 17:24:07 -0800 Subject: [DAS2] DAS intro In-Reply-To: <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> Message-ID: Lets add this to the agenda for Monday morning. Hopefully that will be faster than via e-mail. On Nov 26, 2005, at 5:20 PM, Andrew Dalke wrote: > Suzi: >> so there seem to be 2 questions. it would be good to have both in the >> intro, but only as long as the description can be clearly stated in >> just a sentence or two. If it takes more then it is clearly something >> that requires a fuller description outside of the intro. > > Agreed. > >> I'll try to give my understanding (but goodness knows I am peering >> through different lenses). I don't think in terms of the spec at all, >> just the information that needs to be conveyed. >> >> #1 "reference frame" ========================================= >> >> "reference frame", is (to my mind) "reference sequence". at least, >> that is what i've always called it. > > >> First, accuracy has nothing at all to do with it, so we don't want >> the sentence in there. > > I'm fine with that. I've found it best to declare my ignorance early > than to keep it hidden. > >> Second, the region of sequence that is returned is nothing more than >> that. Think of it as a special type of feature. 
This is what makes a >> transformation possible from one coordinate-system to another (by >> adding the correct offsets) > > I can think of it as a feature just fine. But then shouldn't each > region > also be a feature? Why wouldn't all contigs be visible as an > annotation? > > Contigs are in SOFA as > > @is_a at contig ; SO:0000149 @is_a@ assembly_component ; > SO:0000143 @part_of@ supercontig ; SO:0000148 > > What advantage is there to break this feature out at a "/region"? > > One that I can see is that the reference server provides the regions > while the annotation server provides the other features. But if > that's the case we could have the reference server also provide the > regions as features, and the annotation server makes references to > those features rather than to regions. > > That is, in the current scheme we have: > > has 0 or more element, where the 'pos' attribute > links to region + start/stop range and the optional 'seq' attribute > links to the sequence range, as in: > > seq="sequence/Chr3/1271:1507:1"/> > > > is only a link to the sequence and a length, as in: > > > > > One alternate possibility is to change that so "pos" points to a > /feature (instead of a /region) and have features for each contig or > other assembly component. The result would look like: > > seq="sequence/Chr3/1271:1507:1"/> > > ... > > Doing this, however, means that all features must support subranges. > > > As an alternate solution without ranges, use > > > > and then look up the sequence coordinates of feature/AB1234 to > figure out where it starts/stops. > > > The other advantage to a region is you can ask for the assembly > via the 'agp' format. But because of the the existing support for > formats which are only valid for some feature you can do that by asking > for, say, all assembly_component features (via the feature filter) and > return > the results in 'agp' format. > >> Third, just think of "reference sequence" as a coordinate system. 
One >> can have the exact same feature and indicate that: on >> coordinate-system-A this feature starts and ends here, and on >> coordinate-system-B it starts and ends there. Thus a feature's >> coordinates may be given both on a chromosome, and on a contig, and >> on any other coordinate-system that can be derived through a >> transform from these. > > I believe I understand this. There really is only one reference frame > for > the entire genome sequence, for a given assembly, and all other > coordinate > systems are a fixed and definite offset of that single reference frame. > I believe this is called the golden path? > > My reference to accuracy is because I figured that given two features > A and B on an assembly component X then the fuzziness in the relative > distance between A and B is small if X is also small. That is, smaller > terms are less likely to have changes as the golden path changes. > > >> So you could change the sentence below to read "A reference server >> may supply features where the locations (start and end) are relative >> to either contigs, some other arbitrary region, or to the entire >> chromosome." > > Why not always supply it relative to the chromosome coordinates? The > spec > now allows that as an optional field. I can't figure out why you would > want to do otherwise. > > Is it because sometimes it's easier to work with, say, a large number > of > contig reference frames than with one large reference frame? Does that > mean we shift the complexity of coordinate translation from the data > provider to the data consumer? (Making it easier to generate data than > to consume data.) > > >> This one is perhaps too subtle for the introduction, but if we decide >> to include it then I think it should first be phrased in terms of the >> problem (biological sampling) and then in terms of the solution >> (multiple parents). > > Oh, definitely. 
> It's some place where I just don't have the domain
> knowledge to explain it or even come up with examples.
>
> Andrew
> dalke at dalkescientific.com
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2

From Gregg_Helt at affymetrix.com Mon Nov 28 09:44:18 2005
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 28 Nov 2005 01:44:18 -0800
Subject: [DAS2] tiled queries for performance
Message-ID:

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke
> Sent: Thursday, November 24, 2005 5:47 AM
> To: Brian Gilman
> Cc: DAS/2
> Subject: Re: [DAS2] tiled queries for performance
>
> Hi Brian,
>
> > We're looking into this kind of implementation issue ourselves and
> > thought that a BitTorrent-like cache makes the most sense. ie. all
> > servers in the "fabric" are issued the query in a certain "hop
> > adjacency". These servers then send their data to the client whose job
> > it is to assemble the data.
>
> I go back and forth between the "large data set" model and the "large
> number of entities" model.
>
> In the first:
> - client requests a large data file
> - server returns it
>
> This can be sped up by distributing the file among many sites and
> using something like BitTorrent to put it together, or something like
> Coral ( http://www.coralcdn.org/ ) to redirect to nearby caches.
>
> But making the code for this is complicated. It's possible to build
> on BitTorrent and similar systems, but I have no feel for the actual
> implementation cost, which makes me wary. I've looked into a couple
> of the P2P toolkits and not gotten the feel that it's any easier than
> writing HTTP requests directly. Plus, who will set up the alternate
> servers?
My hope would be that any system like this could be hidden behind a single HTTP GET request and hence require no changes to the DAS/2 protocol. Standard web caches already work this way. I'm less familiar with the BitTorrent approach, but I'm guessing that the client-side code that stitches together the pieces from multiple servers could be encapsulated in a client-side daemon that responds to localhost HTTP calls. > In the second: > - make query to server > - server returns list of N identifiers > - make N-n requests (where 'n' is the number of identifiers already > resolved) > > The id resolution can be done in a distributed fashion and is easily > supported via web caches, either with well-configured proxies or (again) > through Coral. > > I like the latter model in part because it's more fine grained. Eg, > a progress bar can say "downloading feature 4 of 10000", and if a given > feature is already present there's no need to refetch it. > > The downside of the 2nd is the need for HTTP 1.1 pipelining to make it > be efficient. I don't know if we want to have that requirement. I'm wary of this "large number of entities" approach, for several reasons. Due to the overhead for TCP/IP, HTTP headers, and extra XML stuff like doctype and namespace declarations, making an HTTP GET request per feature will increase the total number of bytes that need to be transmitted. It will also increase the parsing overhead on the client side. And if the features contain little information (for example just type, parts/parents, and location) that overhead could easily exceed the time taken to process the "useful" data. As you indicated, some performance problems could be alleviated by HTTP 1.1 pipelining, but that adds additional requirements to both client and server. 
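The "large number of entities" exchange described above - the server returns a list of N identifiers, and the client fetches only the ones it has not already resolved - can be sketched as follows. `fetch_one` is a hypothetical stand-in for a real per-feature HTTP GET:

```python
def fetch_missing(ids, cache, fetch_one):
    """Resolve feature ids, fetching only those not already cached;
    returns the features in the order the server listed them."""
    for fid in ids:
        if fid not in cache:
            cache[fid] = fetch_one(fid)  # one HTTP GET per unresolved id
    return [cache[fid] for fid in ids]

calls = []
def fake_fetch(fid):  # stand-in for a network request, records each call
    calls.append(fid)
    return {"id": fid}

cache = {"feat/1": {"id": "feat/1"}}  # feat/1 was resolved earlier
feats = fetch_missing(["feat/1", "feat/2"], cache, fake_fetch)
```

The cache makes the "N-n requests" part concrete: already-known ids cost nothing, which is the attraction of the model; the per-request overhead Gregg describes is exactly the cost of each `fetch_one` call that cannot be avoided.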
Also, for persistent caching on the local machine when you start splitting up the data into hundreds of thousands of files, I suspect the additional disk seek time will far exceed disk read time and become a serious performance impediment. Having said that, in theory this approach is (almost) testable using the current DAS/2 spec. Create one DAS/2 server that in response to feature queries returns only the minimum required information for "N" features: id and type. And have feature ids returned be URLs on another DAS/2 server that _does_ return full feature information (location, alignment, etc.). Then make "N-n" single-feature queries with those URLs to get full information. Due to the current DAS/2 requirement that any parts / parents referenced also be included in the same XML doc, this would only be a reasonable test for features with no hierarchical structure, such as SNPs. > Gregg > came up with the range restrictions because most of the massive results > will be from range searches. By being a bit more clever about tracking > what's known and not known, a client can get a much smaller results > page. > > > These are complementary. Using Gregg's restricted range queries can > reduce the number of identifiers returned in a search, making the > network overhead even smaller. 
>
> Andrew
> dalke at dalkescientific.com
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2

From Gregg_Helt at affymetrix.com Mon Nov 28 10:05:33 2005
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 28 Nov 2005 02:05:33 -0800
Subject: [DAS2] das registry and das2
Message-ID:

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke
> Sent: Friday, November 18, 2005 10:00 AM
> To: DAS/2
> Subject: Re: [DAS2] das registry and das2
>
> Andreas Prlic:
> > I would like to start a discussion of how to provide a proper DAS
> > interface for our das- registration server at
> > http://das.sanger.ac.uk/registry/
> >
> > Currently it is possible to interact with it using SOAP, or manually
> > via the HTML interface. We should also make it accessible using URL
> > requests.
>
> One of the things Gregg and I talked about at ISMB was that the
> top-level "das-sources" format is, or can be, identical to what's
> needed for the registry server.

Some of what we discussed I wrote up in a post earlier this year:
http://portal.open-bio.org/pipermail/das2/2005-June/000198.html

Another post that might be useful in current discussions is a summary of what was discussed in the DAS/2 registry meeting we had in Hinxton back in September 2004:
http://portal.open-bio.org/pipermail/das2/2005-June/000197.html

gregg

From Gregg_Helt at affymetrix.com Mon Nov 28 10:58:00 2005
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 28 Nov 2005 02:58:00 -0800
Subject: [DAS2] tiled queries for performance
Message-ID:

The attachment is a PowerPoint slide showing one of the feature query optimizations that the IGB client currently uses, which combines "overlaps" and "inside" filters. When used consistently this guarantees that the same feature is not returned in multiple feature queries.
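A rough sketch of tile-aligned queries with client-side deduplication. The 1000 bp tile size and the "a tile's query claims only features that start inside it" rule are assumptions for illustration, not necessarily the exact overlaps/inside combination in Gregg's slide:

```python
TILE = 1000  # hypothetical tile size, e.g. one suggested by the server

def tiles_for(start, end):
    """Tile-aligned subranges covering [start, end), as in Allen's
    proposal: every client asking near this range issues the same URIs,
    so server-side caching actually gets hits."""
    first, last = start // TILE, (end - 1) // TILE
    return [(t * TILE, (t + 1) * TILE) for t in range(first, last + 1)]

def claimed_by(tile_start, tile_end, features):
    """One possible dedup rule: a tile's query claims only the features
    whose start falls inside that tile, so a feature spanning a tile
    boundary is returned by exactly one tile query."""
    return [f for f in features if tile_start <= f[0] < tile_end]

features = [(950, 1200), (1500, 1600), (1990, 2500)]  # (start, end) pairs
seen = []
for ts, te in tiles_for(1010, 2020):  # the chr1/1010:2020 request
    seen.extend(claimed_by(ts, te, features))
```

Note that the feature starting at 950 overlaps the requested range but starts before the first tile, so the tile queries alone miss it; that gap is what a separate boundary-overlaps pass (Allen's query2) has to cover, with the client discarding any duplicates it produces.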
However in general I agree that it is the client's responsibility to reasonably handle cases where the same feature is returned multiple times. gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Allen Day > Sent: Wednesday, November 23, 2005 3:50 PM > To: das2 at portal.open-bio.org > Subject: Re: [DAS2] tiled queries for performance > > More thoughts on this. The client can eliminate the redundancy in the > records returned by issuing the tiling queries as previously described > (query1), then issuing queries for records that are not contained within > tiles, but overlap the boundaries of 1 or more tiles (query2). > > However, by issuing all the overlaps queries at once, we've just deferred > the performance hit one step, because we can't reasonably expect the > server to have cached all combinations of tile overlaps queries. I think, > to get this tiling optimization to work, the burden needs to be on the > client to identify and remove duplicate responses for multiple > edge-overlaps queries (query3). > > 1000bp 2000bp 3000bp > | | | > | === | =====^==== | > | ====#===== | > | ============#=============#===== > | | | > > <-----------> query1a > <-----------> query1b > query2 > query3a > query3b > > Key: > > | : tile boundary > = : feature > ^ : gap between child features > # : portion of feature overlapping tile boundary. > : client overlaps query > <.> : client contains query > > -Allen > > > > On Mon, 21 Nov 2005, Allen Day wrote: > > > Hi, > > > > I had an idea of how clients may be able to get better response from > > servers by using a tiled query technique. Here's the basic idea: > > > > ClientA wants features in chr1/1010:2020, and issues a request for that > > range. No other clients have previously requested this range, so the > > server-side cache faults to the DAS/2 service (slow). 
> > > > ClientB wants features in chr1/1020:2030, and issues a request for that > > range. Although the intersection of the resulting records with > ClientA's > > query is large, the URIs are different and the server-side cache faults > > again. > > > > If ClientA and ClientB were to each issue two separate "tiled" requests: > > > > 1. chr1/1001:2000 > > 2. chr1/2001:3000 > > > > ClientB could take advantage of the fact that ClientA had been looking > at > > the same tiles. > > > > For this to work, the clients would need to be using the same tile size. > > The optimal tile size is likely to vary from datasource to datasource, > > depending on the length and density distributions of the features > > contained in the datasource. The "sources" or "versioned sources" > > payload could suggest a tiling size to prospective clients. Servers > could > > also pre-cache all tiles by hitting each tile after an update of the > > datasource (or the DAS/2 service code). > > > > The tradeoff for the performance gains is that clients may now need to > do > > filtering on the returned records to only return those requested by the > > client's client. > > > > -Allen > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/das2 > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -------------- next part -------------- A non-text attachment was scrubbed... Name: DAS2_Query_Optimization.ppt Type: application/vnd.ms-powerpoint Size: 287744 bytes Desc: DAS2_Query_Optimization.ppt URL: From ap3 at sanger.ac.uk Mon Nov 28 11:48:03 2005 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Mon, 28 Nov 2005 11:48:03 +0000 Subject: [DAS2] DAS intro In-Reply-To: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> Message-ID: Hi! 
> How about this instead, as an overview/introduction.
>
> ======
>
> DAS/2 describes a data model for genome annotations.

Can we formulate the start a little more generally? Something like: DAS/2 is a protocol to share biological data. It provides specifications for how to share annotations of genomes and proteins, assays, ontologies (space for more here...). Then I would continue with your text.

Cheers,
Andreas

-----------------------------------------------------------------------
Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891

From dalke at dalkescientific.com Mon Nov 28 17:10:30 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 28 Nov 2005 18:10:30 +0100
Subject: [DAS2] mtg topics for Nov 28
Message-ID:

Here are the spec issues I would like to talk about for today's meeting, culled from the last few weeks of emails and phone calls.

1) DAS Status Code in headers

The current spec says

> X-DAS-Status: XXX status code
>
> The list of status codes is similar, but not identical, to those used
> by DAS/1:
>
> 200 OK, data follows
> 400 Bad namespace
> 401 Bad data source
> 402 Bad data format
> 403 Unknown object ID
> 404 Invalid object ID
> 405 Region coordinate error
> 406 No lock
> 407 Access denied
> 500 Server error
> 501 Unimplemented feature

I argued that these are not needed. Some of them are duplicates of HTTP error codes, and those which are not can be covered by an error code "300" along with an (optional) XML payload. The major problem with doing this seems to be in how MS IE handles certain error codes. While IE is not a target browser, MS software may use IE as a component for fetching data. From the link Ed dug up, it looks like this won't be a problem.

Lincoln's last email on this was a tepid

> I give up arguing this one and will go with the way Andrew wants to do
> it.
Therefore I propose the following rules: > > 1) Return the HTTP 404 error for the case that any component of the > DAS2 path > is invalid. This would apply to the following situations: > > Bad namespace > Bad data source > Unknown object ID > > 2) Return HTTP 301 and 302 redirects when the requested object has > moved. > > 3) Return HTTP 403 (forbidden) for no-lock errors. > > 4) Return HTTP 500 when the server crashes. > > For all errors there should be a text/x-das-error entity returned that > describes the error in more detail. The "x-das-error" format must have an invariant string, either an error code or fixed text, and a possible optional explanatory text section. Note the "should" in that last paragraph - this is optional. 2) Content-type There was some discussion about changing the content type to "text/xml" to support viewing DAS results in a browser. We decided that that wasn't a valid use case. In doing the research for this I found that the general recommendation for these sorts of XML documents is to put the document under "application/*" instead of "text/*". One reason is from http://www.ietf.org/rfc/rfc3023.txt If an XML document -- that is, the unprocessed, source XML document -- is readable by casual users, text/xml is preferable to application/xml. MIME user agents (and web user agents) that do not have explicit support for text/xml will treat it as text/plain, for example, by displaying the XML MIME entity as plain text. Application/xml is preferable when the XML MIME entity is unreadable by casual users. Similarly, text/xml-external-parsed-entity is preferable when an external parsed entity is readable by casual users, but application/xml-external-parsed-entity is preferable when a plain text display is inappropriate. NOTE: Users are in general not used to text containing tags such as , and often find such tags quite disorienting or annoying. 
If one is not sure, the conservative principle would suggest using application/* instead of text/* so as not to put information in front of users that they will quite likely not understand. Another is the difference in how application/* and text/* handle character set encodings. We use "text/x-...+xml" - I propose changing this to "application/x-...+xml" I don't think there are any objections to this. The main objection is to the difficulty of ploughing through all the specs related to charsets and unicode. 3) Key/value data As Steve pointed out, the spec is incomplete on how to handle key/value data associated with a record. The main problem is in how it handles namespaces. It mixes an internal attribute value namespace with the xml namespace, which doesn't happen. For example, This is a telomeric repeat birx28 This is a telomeric repeat 29 This is a telomeric repeat 29 - "simple extension elements" not in the "atom:" namespace > - "structured extension elements" not in the "atom:" namespace. > > Most of the "atom:" elements share a common structure. For example: > - the type= attribute indicates of the contents are text, escaped > HTML or XHTML; or an explicit content-type like "chemical/x-pdb". > > - the src= attribute indicates that the content of the element is > empty and to go to the given URL instead (apparently the hip > term for URL these days is IRL - internationalized Resource > Identifiers. > I think we only need to use URLs) > > > These are not always used for all elements; if it's appropriate for a > given field then it's used. > > > Simple extension elements are always of the form > Content goes here > where 'element' is not part of the 'atom:' namespace. Consumers of > this data may treat it as simple key/value data. > > Structured extension elements always have at least an attribute > or a sub-element, so must look like > .. > -or- > .. .. > > If the element isn't known this field may be ignored. 
> > These three things provide for: > - a set of well-defined elements, understandable by everyone > - a simple extension for things which can be key/value data > - a way to store or refer to more complex data types 5) xlink and <link> Several places in the spec include or may include links to documents elsewhere. The XLink specification describes a general extensibility mechanism for such links. xlinks have about four properties; the most important are: - where does the link go to - what kind of link is it - what should the browser do with such a link I personally don't understand the xlink spec well enough to want to use it, and I haven't come across examples of it in use. I am wary about specs like that. Another is to use something like the <link> element from HTML 4.0 and in Atom. This looks something like <link rel="..." type="..." href="..."/> that is, it has: - a category for how the link is related to the given object ('rel') - an optional MIME type (use, eg, if the server has multiple ways to provide data for the same 'rel' category) - an href to the data As implemented in Atom the contents of a <link> are extensible, which allows people to experiment with things like mirroring. In any case we need a way to provide typed links to other documents. Such links may include: - link from a given feature to the versioned source - link from a versioned source to the lock document 6) Source filters This comes from Andreas Prlic. We can support metadata servers via the same document returned from the entry point to a DAS server. However, a metadata server may also support searches, eg, to show only H. sapiens annotations using the build 1234 assembly. Should we make this property searching part of the DAS/2 spec, which means everyone must support it, or should we say it's optional but if implemented it must be done in a standard way? Or leave it for version 2.1, once we have more experience with DAS in real-life? (Though we already have that experience.)
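Andreas's source-filter idea above — letting a metadata server answer property searches such as "only H. sapiens on build 1234" — could be sketched as a simple query URL. Everything here is an assumption for illustration: neither the `/sources` path nor these parameter names are defined by the spec.

```python
from urllib.parse import urlencode

def sources_query(base_url, **filters):
    # Build a hypothetical property-search URL against a DAS/2 sources
    # document. Parameter names are illustrative assumptions only; no
    # such query syntax has been agreed on in the spec.
    query = urlencode(sorted(filters.items()))
    return base_url + "/sources" + ("?" + query if query else "")

url = sources_query("http://das.example.org/das2",
                    organism="Homo sapiens", assembly="build1234")
# e.g. http://das.example.org/das2/sources?assembly=build1234&organism=Homo+sapiens
```

Whether such searching is mandatory, optional-but-standardized, or deferred to 2.1 is exactly the open question above.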
7) /regions Could someone please explain to me the point of the /region subtree? As far as I can tell, a region is just a type of feature. A generic feature is located somewhere on the genome (with respect to a given assembly), and may also say it's on various 'region' features. I don't see the need for a separate namespace for this. 8) Tiled queries Do they need spec changes, or spec recommendations? I think I've mentioned everything to be covered. Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Mon Nov 28 17:14:28 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 28 Nov 2005 09:14:28 -0800 Subject: [DAS2] tiled queries for performance Message-ID: I don't think we should allow servers to return features that do not meet the criteria specified in the query feature filters; it's an invitation for ambiguity. This may seem harmless with just an "overlaps" region filter, but what about "inside", "contains", "identical"? What about "type", etc? If different DAS/2 server implementations contain the same data, they should return the same set of features for a given feature query. gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Andrew Dalke > Sent: Friday, November 25, 2005 3:43 PM > To: Asim Siddiqui > Cc: DAS/2 > Subject: Re: [DAS2] tiled queries for performance > > > The change is simply that instead of the client getting exactly what it > > asks for, it may get more. > > While that's another matter - the client makes a request > and the server is free to expand the range to something it can handle > a bit better. Allen? Were you suggesting this instead? > > In this case there is a change to the spec, and all clients must > be able to filter or otherwise ignore extra results. > > I personally think it's an implementation issue related to performance > and there are ways to make the results be generated fast enough.
> > Andrew > dalke at dalkescientific.com > From dalke at dalkescientific.com Mon Nov 28 17:14:52 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 28 Nov 2005 18:14:52 +0100 Subject: [DAS2] DAS intro In-Reply-To: References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> Message-ID: Andreas Prlic: > Can we formulate the start a little more general? > > something like: > > DAS/2 is a protocol to share biological data. It provides > specifications for how > to share annotations of genomes and proteins, assays, ontologies > (space fore more here...). I thought about that, but the DAS/2.0 spec doesn't include any of those. Perhaps be more definite instead and say this is DAS/2.0? Or say "Other projects (link, link, link) extend DAS/2 to protein, assay and ontology data sets." Andrew dalke at dalkescientific.com From lstein at cshl.edu Mon Nov 28 17:24:32 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 28 Nov 2005 12:24:32 -0500 Subject: [DAS2] DAS intro In-Reply-To: <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> Message-ID: <200511281224.32885.lstein@cshl.edu> > > > > is only a link to the sequence and a length, as in: > > You know, this is still kind of ugly. I hate to revisit this so late in the game, but can't we make sequence retrieval a three-step process? 1) Feature request returns: 2) Region request returns: (where seq= could be an absolute URL if someone else owns the bases) 3) Sequence request then returns the bases Lincoln > > > One alternate possibility is to change that so "pos" points to a > /feature (instead of a /region) and have features for each contig or > other assembly component. The result would look like: > > seq="sequence/Chr3/1271:1507:1"/> > > ... > > Doing this, however, means that all features must support subranges. 
> > > As an alternate solution without ranges, use > > > > and then look up the sequence coordinates of feature/AB1234 to > figure out where it starts/stops. > > > The other advantage to a region is you can ask for the assembly > via the 'agp' format. But because of the existing support for > formats which are only valid for some feature you can do that by asking > for, say, all assembly_component features (via the feature filter) and > return > the results in 'agp' format. > > > Third, just think of "reference sequence" as a coordinate system. One > > can have the exact same feature and indicate that: on > > coordinate-system-A this feature starts and ends here, and on > > coordinate-system-B it starts and ends there. Thus a feature's > > coordinates may be given both on a chromosome, and on a contig, and on > > any other coordinate-system that can be derived through a transform > > from these. > > I believe I understand this. There really is only one reference frame > for > the entire genome sequence, for a given assembly, and all other > coordinate > systems are a fixed and definite offset of that single reference frame. > I believe this is called the golden path? > > My reference to accuracy is because I figured that given two features > A and B on an assembly component X then the fuzziness in the relative > distance between A and B is small if X is also small. That is, smaller > terms are less likely to have changes as the golden path changes. > > > So you could change the sentence below to read "A reference server > > may supply features where the locations (start and end) are relative > > to either contigs, some other arbitrary region, or to the entire > > chromosome." > > Why not always supply it relative to the chromosome coordinates? The > spec > now allows that as an optional field. I can't figure out why you would > want to do otherwise.
> > Is it because sometimes it's easier to work with, say, a large number of > contig reference frames than with one large reference frame? Does that > mean we shift the complexity of coordinate translation from the data > provider to the data consumer? (Making it easier to generate data than > to consume data.) > > > This one is perhaps too subtle for the introduction, but if we decide > > to include it then I think it should first be phrased in terms of the > > problem (biological sampling) and then in terms of the solution > > (multiple parents). > > Oh, definitely. It's some place where I just don't have the domain > knowledge to explain it or even come up with examples. > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Mon Nov 28 17:08:35 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 28 Nov 2005 12:08:35 -0500 Subject: [DAS2] DAS intro In-Reply-To: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> Message-ID: <200511281208.36204.lstein@cshl.edu> Yes, this is a better intro. Lincoln On Friday 25 November 2005 10:21 am, Andrew Dalke wrote: > The front of the DAS doc starts > > DAS 2.0 is designed to address the shortcomings of DAS 1.0, including: > > That kinda assumes people know what DAS 1.0 is to understand DAS 2.0. > > How about this instead, as an overview/introduction. > > ====== > > DAS/2 describes a data model for genome annotations. An annotation > server provides information about one or more genome sources. Each > source may have one or more versions. 
Different versions are usually > based on different assemblies. As an implementation detail an > assembly and corresponding sequence data may be distributed via a > different machine, which is called the reference server. Portions of > the assembly may have higher relative accuracy than the assembly as a > whole. A reference server may supply these portions as an alternate > reference frame. > > Annotations are located on the genome with a start and end position. > The range may be specified multiple times if there are alternate > reference frames. An annotation may contain multiple non-contiguous > parts, making it the parent of those parts. Some parts may have more > than one parent. Annotations have a type based on terms in SOFA > (Sequence Ontology for Feature Annotation). Stylesheets contain a set > of properties used to depict a given type. > > Annotations can be searched by range, type, and a properties table > associated with each annotation. These are called feature filters. > > DAS/2 is implemented using a ReST architecture. Each entity (also > called a document or object) has a name, which is a URL. Fetching the > URL gets information about the entity. The DAS-specific entities are > all XML documents. Other entities contain data types with an existing > and frequently used file format. Where possible, a DAS server returns > data using existing formats. In some cases a server may describe how > to fetch a given entity in several different formats. > ====== > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Mon Nov 28 17:11:24 2005 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 28 Nov 2005 12:11:24 -0500 Subject: [DAS2] tiled queries for performance In-Reply-To: <9ec33e6fb3efbbe8b39adc52d2b78db7@dalkescientific.com> References: <86C6E520C12E52429ACBCB01546DF4D3BE3E5E@xchange1.phage.bcgsc.ca> <9ec33e6fb3efbbe8b39adc52d2b78db7@dalkescientific.com> Message-ID: <200511281211.25239.lstein@cshl.edu> One thing to do is to add to the spec a note that the server is free to return features from a range larger than requested. This way the server is free to expand the range to the 1k boundaries. My preference, however, would be for the server to implement a filter that removes from the precalculated tiled XML output all features that are outside the range. This would be completely transparent to the client. Lincoln On Friday 25 November 2005 06:43 pm, Andrew Dalke wrote: > Asim Siddiqui > > > I think this is a great idea. > > > > I don't see this as a big change to the DAS/2 spec or requiring much in > > the way of additional smarts on the client side. > > I agree with Allen on this - in some sense there's no effect on the > spec. It ends up being an agreement among the clients to request > aligned data, by rounding up/down to the nearest, say, kilobase and > for the server implementers to cache those requests. > > > The change is simply that instead of the client getting exactly what it > > asks for, it may get more. > > While that's another matter - the client makes a request > and the server is free to expand the range to something it can handle > a bit better. Allen? Were you suggesting this instead? > > In this case there is a change to the spec, and all clients must > be able to filter or otherwise ignore extra results. 
> > I personally think it's an implementation issue related to performance > and there are ways to make the results be generated fast enough. > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From Gregg_Helt at affymetrix.com Mon Nov 28 17:30:27 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 28 Nov 2005 09:30:27 -0800 Subject: [DAS2] Agenda for today's DAS/2 meeting Message-ID: Today we're going over spec issues. Here's my short list of topics to cover: DAS-specific headers Error codes Feature properties Registry & Discovery Please feel free to add! gregg From td2 at sanger.ac.uk Mon Nov 28 17:27:31 2005 From: td2 at sanger.ac.uk (Thomas Down) Date: Mon, 28 Nov 2005 17:27:31 +0000 Subject: [DAS2] DAS intro In-Reply-To: References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> Message-ID: <83634851-73AD-454A-B027-644539CF1869@sanger.ac.uk> On 28 Nov 2005, at 17:14, Andrew Dalke wrote: > Andreas Prlic: >> Can we formulate the start a little more general? >> >> something like: >> >> DAS/2 is a protocol to share biological data. It provides >> specifications for how >> to share annotations of genomes and proteins, assays, ontologies >> (space fore more here...). > > I thought about that, but the DAS/2.0 spec doesn't include any of > those. There are pages about assay and ontology retrieval on the website. Are these not part of the spec? Or are they being counted as something else (DAS/2.1?) Thomas. 
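Lincoln's suggestion earlier in the thread — have the server keep precalculated tiles but strip features outside the requested range so tiling stays invisible to the client — could look roughly like this. A sketch only: the dict-based feature representation and half-open overlap semantics are assumptions, not spec.

```python
def overlaps(feature, start, end):
    # Overlap test on half-open [start, end) ranges; whether DAS/2
    # coordinates are inclusive is a detail glossed over here.
    return feature["start"] < end and feature["end"] > start

def filter_tile(tile_features, start, end):
    # Serve from a precalculated tile, but remove features falling
    # outside the requested range before returning the payload, so
    # the response is exactly what the feature filter asked for.
    return [f for f in tile_features if overlaps(f, start, end)]

tile = [{"id": "a", "start": 1000, "end": 1400},
        {"id": "b", "start": 1550, "end": 1580},
        {"id": "c", "start": 1900, "end": 2000}]
kept = filter_tile(tile, 1500, 1600)  # client asked for 1500..1600
# only feature "b" survives the filter
```

This is the variant Gregg's objection favors: identical data on different servers yields identical query results.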
From dalke at dalkescientific.com Mon Nov 28 18:09:17 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 28 Nov 2005 19:09:17 +0100 Subject: properties and key/value data (was Re: [DAS2] Spec issues) In-Reply-To: References: Message-ID: Here's the email I sent to Steve that I meant to send to everyone. On Nov 17, 2005, at 2:09 AM, Andrew Dalke wrote: > I think I understand the Atom spec better now. In brief, the > Atom document contains sections which are extensible and sections > which are not. > > In an extensible section there are two/three categories of elements: > - those in the "atom:" namespace > - "simple extension elements" not in the "atom:" namespace > - "structured extension elements" not in the "atom:" namespace. > > Most of the "atom:" elements share a common structure. For example: > - the type= attribute indicates of the contents are text, escaped > HTML or XHTML; or an explicit content-type like "chemical/x-pdb". > > - the src= attribute indicates that the content of the element is > empty and to go to the given URL instead (apparently the hip > term for URL these days is IRL - internationalized Resource > Identifiers. > I think we only need to use URLs) > > > These are not always used for all elements; if it's appropriate for a > given field then it's used. > > > Simple extension elements are always of the form > Content goes here > where 'element' is not part of the 'atom:' namespace. Consumers of > this data may treat it as simple key/value data. > > Structured extension elements always have at least an attribute > or a sub-element, so must look like > .. > -or- > .. .. > > If the element isn't known this field may be ignored. 
> > These three things provide for: > - a set of well-define elements, understandable by everyone > - a simple extension for things which can be key/value data > - a way to store or refer to more complex data types > > > Steve, responding to an earlier posting of mine: >> Interesting, but a problem with this is that it effectively creates a >> new version of the TYPES schema every time a new property is added to >> the DAS properties controlled vocabulary. I would hope for a solution >> that decouples the content of the controlled vocab from the data >> exchange format. > > I looked into that. Relax-NG lets you define a "can be anything > except ...". The Atom spec is defined with the following > > # Simple Extension > > simpleExtensionElement = > element * - atom:* { > text > } > > # Structured Extension > > structuredExtensionElement = > element * - atom:* { > (attribute * { text }+, > (text|anyElement)*) > | (attribute * { text }*, > (text?, anyElement+, (text|anyElement)*)) > } > > The "element * - atom:*" means "Any element except those in > the atom namespace." > > Thus we can validate anything with DAS/2 tags, and ignore > validate of anything not part of DAS/2. And we can say that > extensions are only allowed in certain parts of the spec and > not in others. > > We would need to update the schema when we add new "das:" elements, > but we already need to do that. > > We wouldn't need to change the schema to allow others to develop > their own extensions. Indeed, the schema would still let use > verify that extensions are still well-formed. 
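The Atom extension pattern quoted above could carry over to DAS/2 roughly as follows, in the same Relax-NG compact syntax. This is a sketch only; the `das:` namespace binding and the FEATURE content model shown are assumptions, not the actual schema.

```rnc
# Any element outside the das: namespace is an extension,
# mirroring Atom's "element * - atom:*" pattern.
anyElement =
   element * {
      (attribute * { text }
       | text
       | anyElement)*
   }

dasExtensionElement =
   element * - das:* {
      (attribute * { text }*,
       (text | anyElement)*)
   }

# Hypothetical: a FEATURE validates its das: content strictly,
# while permitting (and ignoring) foreign extension elements.
das.feature =
   element das:FEATURE {
      attribute id { text },
      dasExtensionElement*
   }
```

As the quoted email notes, the schema would still need updating for new `das:` elements, but third-party extensions would validate without any schema change.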
> >> Here's my next attempt, which more fully exploits xml:base to achieve >> this decoupling: >> >> > xmlns:das="http://www.biodas.org/ns/das/genome/2.00/" >> xml:base="http://www.wormbase.org/das/genome/volvox/1/" >> xmlns:xlink="http://www.w3.org/1999/xlink" >>> >> > das:type="type/curated_exon"> >> >> 29 >> >> > xml:base="http://www.biodas.org/ns/das/genome/2.00/properties"> >> 2 >> > xlink:type="simple" >> >> xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/ >> CTEL54X.1" >> /> >> >> > > Vs. > > xmlns:das="http://www.biodas.org/ns/das/genome/2.00/" > > xmlns:prop="http://www.biodas.org/ns/das/genome/2.00/properties" > xml:base="http://www.wormbase.org/das/genome/volvox/1/" > xmlns:xlink="http://www.w3.org/1999/xlink"> > das:type="type/curated_exon"> > 29 > 2 > src="http://www.wormbase.org/das/protein/volvox/2/feature/CTEL54X.1" > /> > > > > The main differences are: > - the properties are defined elements in the prop: namespace (though > I think they can just as easily be in the das: namespace) > > - I'm using lower-case since that seems to be the trend these days. > > > >> So now we have the following arrangement: >> >> * the attribute keys 'das:id', 'das:type', and 'das:ptype' are >> defined >> within the xmlns:das namespace (i.e., the full id of 'das:type' is >> derived by appending 'type' to the xmlns:das URL). > > I don't follow why the attributes have full namespaces. Is that > to allow extensibility of element attribute on a per-element basis? > > I kept "das:type" above because "type" already has too many meanings. > >> * the attributes values of 'das:id', 'das:type', and 'das:ptype' are >> URLs relative to xml:base. > > Are all attribute values relative to xml:base or only those three? > > Are xlink:href fields relative to xml:base as well? I assume "yes". 
> >> * The FEATURE element may contain zero or more PROPERTIES >> sub-elements, each with it's own xml:base attribute, effectively >> changing what xml:base is used within the containted PROP >> sub-elements. >> >> So in this example, the property >> 'das:ptype="property/genefinder-score"' >> inherits its xml:base from its grandparent FEATURES element and so >> expands to: >> >> http://www.wormbase.org/das/genome/volvox/1/property/genefinder-score >> >> while the 'das:ptype="phase"' and 'das:ptype="protein_translation"' >> properties inherit xml:base from their PROPERTIES parent element and >> so expand to: >> >> http://www.biodas.org/ns/das/genome/2.00/properties/phase >> http://www.biodas.org/ns/das/genome/2.00/properties/ >> protein_translation > > This is also what happens with the "prop:" namespaced elements, just > at the element level instead of the attribute level. > > To keep this on key/value data I've shifted the rest of the reply > to the next email. Andrew dalke at dalkescientific.com From asims at bcgsc.ca Mon Nov 28 19:21:47 2005 From: asims at bcgsc.ca (Asim Siddiqui) Date: Mon, 28 Nov 2005 11:21:47 -0800 Subject: [DAS2] tiled queries for performance Message-ID: <86C6E520C12E52429ACBCB01546DF4D3BE3EF8@xchange1.phage.bcgsc.ca> Agreed - in light of this, my suggestion doesn't make sense, though Allen's idea may be workable through some other means. Asim -----Original Message----- From: Helt,Gregg [mailto:Gregg_Helt at affymetrix.com] Sent: Monday, November 28, 2005 9:14 AM To: Andrew Dalke; Asim Siddiqui Cc: DAS/2 Subject: RE: [DAS2] tiled queries for performance I don't think we should allow servers to return features than do not meet the criteria specified in the query feature filters, it's an invitation for ambiguity. This may seem harmless with just an "overlaps" region filter, but what about "inside", "contains", "identical"? What about "type", etc? 
If different DAS/2 server implementations contain the same data, they should return the same set of features for a given feature query. gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Andrew Dalke > Sent: Friday, November 25, 2005 3:43 PM > To: Asim Siddiqui > Cc: DAS/2 > Subject: Re: [DAS2] tiled queries for performance > > > The change is simply that instead of the client getting exactly what it > > asks for, it may get more. > > While that's another matter - the client makes a request and the > server is free to expand the range to something it can handle a bit > better. Allen? Were you suggesting this instead? > > In this case there is a change to the spec, and all clients must be > able to filter or otherwise ignore extra results. > > I personally think it's an implementation issue related to performance > and there are ways to make the results be generated fast enough. > > Andrew > dalke at dalkescientific.com > From allenday at ucla.edu Mon Nov 28 20:11:59 2005 From: allenday at ucla.edu (Allen Day) Date: Mon, 28 Nov 2005 12:11:59 -0800 (PST) Subject: [DAS2] tiled queries for performance In-Reply-To: <200511281211.25239.lstein@cshl.edu> References: <86C6E520C12E52429ACBCB01546DF4D3BE3E5E@xchange1.phage.bcgsc.ca> <9ec33e6fb3efbbe8b39adc52d2b78db7@dalkescientific.com> <200511281211.25239.lstein@cshl.edu> Message-ID: On Mon, 28 Nov 2005, Lincoln Stein wrote: > One thing to do is to add to the spec a note that the server is free to return > features from a range larger than requested. This way the server is free to > expand the range to the 1k boundaries. This would require the returned payload to contain the bounds of the features actually returned. E.g. if client asks for 1500..1600, and server responds with 1001..2000, it needs a way to tell the client what the actual bounds of the response are. 
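Allen's example above (client asks for 1500..1600, server answers with 1001..2000) amounts to rounding requests outward to tile boundaries and reporting the actual bounds back. A sketch under assumptions: a 1 kb tile size and 0-based half-open coordinates, neither of which is fixed by the thread.

```python
TILE = 1000  # assumed tile size; the thread mentions ~1 kb boundaries

def expand_to_tiles(start, end, tile=TILE):
    # Round the requested range outward to tile boundaries so requests
    # become cacheable. Per Allen's point, the server would then need
    # to report these actual bounds back in the payload so the client
    # knows what it really received.
    return (start // tile) * tile, -(-end // tile) * tile

bounds = expand_to_tiles(1500, 1600)  # -> (1000, 2000)
```

Whether this belongs in the spec or stays a client/server convention is the open question of this sub-thread.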
> > My preference, however, would be for the server to implement a filter that > removes from the precalculated tiled XML output all features that are outside > the range. This would be completely transparent to the client. Yes, this is what I plan to do if we agree to use one of the tiling variants. -Allen > > Lincoln > > On Friday 25 November 2005 06:43 pm, Andrew Dalke wrote: > > Asim Siddiqui > > > > > I think this is a great idea. > > > > > > I don't see this as a big change to the DAS/2 spec or requiring much in > > > the way of additional smarts on the client side. > > > > I agree with Allen on this - in some sense there's no effect on the > > spec. It ends up being an agreement among the clients to request > > aligned data, by rounding up/down to the nearest, say, kilobase and > > for the server implementers to cache those requests. > > > > > The change is simply that instead of the client getting exactly what it > > > asks for, it may get more. > > > > While that's another matter - the client makes a request > > and the server is free to expand the range to something it can handle > > a bit better. Allen? Were you suggesting this instead? > > > > In this case there is a change to the spec, and all clients must > > be able to filter or otherwise ignore extra results. > > > > I personally think it's an implementation issue related to performance > > and there are ways to make the results be generated fast enough. 
> > > > Andrew > > dalke at dalkescientific.com > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/das2 > > From Steve_Chervitz at affymetrix.com Mon Nov 28 22:07:29 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 28 Nov 2005 14:07:29 -0800 Subject: properties and key/value data (was Re: [DAS2] Spec issues) In-Reply-To: Message-ID: To give some context to the message that Andrew recently forwarded to the list, below is the message I sent to Andrew that prompted his reply (I also meant to send it to the list instead of to just Andrew). It contains my fix to the 'namespace in attribute values' problem regarding properties which I mentioned in today's conf call, and is, I believe, the only viable alternative to Andrew's Relax-NG based solution. Basically, the trick is to enclose PROP elements that are relative to the same xml:base within a parent PROPERTIES element and then permit multiple PROPERTIES elements within a feature. This way you can allow property attribute URIs that are relative to different xml:bases. To clarify a point of possible confusion, there are really two sets of key-value pairs to keep in mind: 1. The key-value pair for the property type. 2. The key-value pair for the property itself. So in this example: <PROP das:ptype="property/genefinder-score">29</PROP> The key for the type is 'das:ptype' and its value is 'property/genefinder-score', and this value is a relative URL based on xml:base in the enclosing PROPERTIES element (or in its grandparent or great-grandparent element, etc.). The value of the property itself is 29 and its key is the whole key-value pair for the type (das:ptype="property/genefinder-score"). In Andrew's Relax-NG equivalent: <prop:genefinder-score>29</prop:genefinder-score> the element name contains both the key ('prop:') and the value of the property type ('genefinder-score'), while the element name as a whole serves as the key for the property itself (value=29).
The 'prop:genefinder-score' string is not a relative URL, but is just a namespace-scoped element name, with 'prop:' serving merely to make 'genefinder-score' globally unique, relative to the URI defined by: xmlns:prop="http://www.biodas.org/ns/das/genome/2.00/properties" A potential drawback of the Relax-NG approach, as discussed in today's conf call, is that the value of the property type is not resolvable as in the other approach using the PROPERTIES parent element. Andrew doesn't see a need for resolvability, e.g., for a dynamically discoverable schema fragment. But I thought of another use case besides the one mentioned in today's call (determining data type such as int or float, which isn't of much use in practice). The URL for the type could point to a human-readable definition of the term. A user may not need clarification of 'genefinder-score' but might for something like 'softberry-ztuple'. One could still satisfy such a use case under the Relax-NG approach by providing a resolvable URL based on the element name + namespace such as: http://www.biodas.org/ns/das/genome/2.00/properties#genefinder-score True, there's no XML spec that says this is legal, but we could declare that such a convention will hold for all biodas.org-based properties. One problem with the above convention is that it's not obvious what the URL resolves to. So we could have something like: http://www.biodas.org/ns/das/genome/2.00/properties?prop=genefinder-score&define=true http://www.biodas.org/ns/das/genome/2.00/properties?prop=genefinder-score&schema=true Just a thought.
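The xml:base mechanics Steve describes can be checked mechanically, since XML Base defers to standard URI resolution. A sketch using the base URL from the volvox example in this thread; note one subtlety worth flagging for the spec: a base without a trailing slash loses its final path segment on resolution.

```python
from urllib.parse import urljoin

# xml:base on the FEATURES element in Steve's example
feature_base = "http://www.wormbase.org/das/genome/volvox/1/"

resolved = urljoin(feature_base, "property/genefinder-score")
# -> http://www.wormbase.org/das/genome/volvox/1/property/genefinder-score

# Caveat: RFC 3986 resolution drops the final path segment of a base
# lacking a trailing slash, so an xml:base of ".../2.00/properties"
# resolves "phase" to ".../2.00/phase", not ".../properties/phase".
no_slash = urljoin("http://www.biodas.org/ns/das/genome/2.00/properties",
                   "phase")
# -> http://www.biodas.org/ns/das/genome/2.00/phase
```

So the PROPERTIES-with-xml:base scheme works as intended only if servers are careful to emit trailing slashes on their base URLs.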
Steve > From: Steve Chervitz > Date: Mon, 14 Nov 2005 17:40:28 -0800 > To: Andrew Dalke > Conversation: [DAS2] Spec issues > Subject: Re: [DAS2] Spec issues > > > Andrew Dalke wrote on 14 Nov 2005: >> >> To: DAS/2 >> Subject: Re: [DAS2] Spec issues >> >> On Nov 4 Steve wrote: >>> >> das:type="type/curated_exon"> >>> 29 >>> 2 >>> >> xlink:type="simple" >>> >>> xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/ >>> CTEL54X.1 >>> /> >>> >> >> I think we're missing something. This is XML. We can do >> >> >> > ontology="http://song.sf.net/ontologies/sofa#gene" >> source="curated" >> xml:base="gene/"> >> 29 >> 2 >> > xlink:href="http://www.wormbase.org/..." /> >> This message brought to you by >> AT&T >> > >> >> The whole point of having namespaces in XML is to keep from needing >> to define new namespaces like . >> >> In doing that, there's no problem in supporting things like "bg:glyph", >> etc. because the values are expanded as expected by the XML processor. > > Interesting, but a problem with this is that it effectively creates a > new version of the TYPES schema every time a new property is added to > the DAS properties controlled vocabulary. I would hope for a solution > that decouples the content of the controlled vocab from the data > exchange format. 
> > Here's my next attempt, which more fully exploits xml:base to achieve > this decoupling: > > xmlns:das="http://www.biodas.org/ns/das/genome/2.00/" > xml:base="http://www.wormbase.org/das/genome/volvox/1/" > xmlns:xlink="http://www.w3.org/1999/xlink" >> > das:type="type/curated_exon"> > > 29 > > xml:base="http://www.biodas.org/ns/das/genome/2.00/properties"> > 2 > xlink:type="simple" > > xlink:href="http://www.wormbase.org/das/protein/volvox/2/feature/CTEL54X.1" /> > > > > So now we have the following arrangement: > > * the attribute keys 'das:id', 'das:type', and 'das:ptype' are defined > within the xmlns:das namespace (i.e., the full id of 'das:type' is > derived by appending 'type' to the xmlns:das URL). > > * the attribute values of 'das:id', 'das:type', and 'das:ptype' are > URLs relative to xml:base. > > * The FEATURE element may contain zero or more PROPERTIES > sub-elements, each with its own xml:base attribute, effectively > changing what xml:base is used within the contained PROP > sub-elements. > > So in this example, the property 'das:ptype="property/genefinder-score"' > inherits its xml:base from its grandparent FEATURES element and so > expands to: > > http://www.wormbase.org/das/genome/volvox/1/property/genefinder-score > > while the 'das:ptype="phase"' and 'das:ptype="protein_translation"' > properties inherit xml:base from their PROPERTIES parent element and > so expand to: > > http://www.biodas.org/ns/das/genome/2.00/properties/phase > http://www.biodas.org/ns/das/genome/2.00/properties/protein_translation > > >>> Also, we might want to allow some controlled vocabulary terms to be >>> used for >>> the value of type.source (e.g., "das:curated"), to ensure that >>> different >>> users use the same term to specify that a feature type is produced by >>> curation. >> >> I talked with Andreas Prlic about what other metadata is needed for the >> registry system.
He mentioned >> >> Together with the BioSapiens DAS people we recently decided that >> there should be the possibility to assign gene-ontology evidence >> codes to each das source, so in the next update of the registry, >> this will be changed. >> >> That's at the source level, but perhaps it's also needed at the >> annotation level. > > I like this idea. Good re-use of GO technology. > >> >> >> My thoughts on these are: >> - come up with a more consistent way to store key/value data >> - the Atom spec has a nice way to say "the data is in this CDATA >> as text/html/xml" vs. "this text is over there". I want to copy its >> way of doing things. >> >> - I'm still not clear about xlink. Another is the HTML-style >> >> >> Atom uses the "rel=" to encode information about the link. For >> example, the URL to edit a given document is >> >> >> >> See http://atomenabled.org/developers/api/atom-api-spec.php > > Not sure about this one yet. In the Atom API, the value of the rel > attribute is restricted to a controlled vocabulary of link > relationships and available services pertaining to editing and > publishing syndicated content on the web: > http://atomenabled.org/developers/api/atom-api-spec.php#rfc.section.5.4.1 > > What would a controlled vocab for DAS resources be? > > Skimming through the DAS/2 retrieval spec, our use of hrefs is > simply for pointing at the location of resources on the web > containing some specified content (e.g., documentation, database > entry, image data, etc.). > > The next/prev/start idea for Atom might have good applicability in the > DAS world for iterating through versions of annotations or assemblies > (e.g., rel='link-to-gene-on-next-version-of-genome'). One relationship > that would be useful for DAS would be 'latest', to get the latest > version of an annotation.
> > DAS get URLs themselves seem fairly self-documenting (it's clear a > given link is for feature, type, or sequence for example), so having a > separate rel attribute may not provide much additional value for these > links. But it might be handy for versioning and for DAS/2 writebacks. > > Here's another link about Atom: > http://en.wikipedia.org/wiki/Atom_%28standard%29 > > Steve From ed_erwin at affymetrix.com Mon Nov 28 22:09:23 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Mon, 28 Nov 2005 14:09:23 -0800 Subject: [DAS2] DAS intro In-Reply-To: <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> Message-ID: <438B8013.3060107@affymetrix.com> Andrew Dalke wrote: > > I believe I understand this. There really is only one reference frame for > the entire genome sequence, for a given assembly, and all other coordinate > systems are a fixed and definite offset of that single reference frame. No. The coordinate transformations are often more complicated than simple offsets. The coordinate space for features on one contig can be 'backwards' with respect to a different contig, and the coordinate space for a gene may skip over one or more gaps with respect to the genomic sequence. Also, the term 'reference frame' bugs me a bit because 'frame' always makes me think of 'reading frame', which is not what you intend. From Steve_Chervitz at affymetrix.com Mon Nov 28 22:55:28 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 28 Nov 2005 14:55:28 -0800 Subject: [DAS2] DAS/1 vs DAS/2 discussion list In-Reply-To: Message-ID: The DAS/1 list is still open and working. 
I updated biodas.org to reflect this and set up a special page to inform people about which list to use: http://biodas.org/documents/biodas-lists.html Subscribers on the DAS/1 list have not been automatically added to the DAS/2 list. They must actively subscribe themselves here: http://biodas.org/mailman/listinfo/das2 Steve > From: "Helt,Gregg" > Date: Mon, 21 Nov 2005 09:24:37 -0800 > To: Andrew Dalke , DAS/2 > Conversation: [DAS2] Getting individual features in DAS/1 > Subject: RE: [DAS2] Getting individual features in DAS/1 > > We need to discuss at today's meeting. I don't think the original DAS > list should be closed, but rather continue to serve as a list to discuss > the DAS/1 protocol and implementations, and the DAS2 mailing list should > focus on DAS/2. If we mix DAS/1 and DAS/2 discussions in the same > mailing list I think it's going to lead to a lot of confusion. > > gregg > >> -----Original Message----- >> From: das2-bounces at portal.open-bio.org > [mailto:das2-bounces at portal.open- >> bio.org] On Behalf Of Andrew Dalke >> Sent: Monday, November 21, 2005 9:09 AM >> To: DAS/2 >> Subject: Re: [DAS2] Getting individual features in DAS/1 >> >> Has anyone answered Ilari's question? >> >> I never used DAS/1 enough to answer it myself. >> >> If the normal DAS list is closed, is this the right place for DAS/1 >> questions? >> >> >> On Nov 18, 2005, at 4:22 PM, Ilari Scheinin wrote: >> >>> This mail is not really about DAS/2, but the web site says the >>> original DAS mailing list is now closed. >>> >>> I am setting up a DAS server that serves CGH data from my database > to >>> a visualization software, which in my case is gbrowse. I've already >>> set up Dazzle that serves the reference data from a local copy of >>> Ensembl. 
I need to be able to select individual CGH experiments to > be >>> visualized, and as the measurements from a single CGH experiment > cover >>> the entire genome, this cannot of course be done by specifying a >>> segment along with the features command. >>> >>> I noticed that there is a feature_id option for getting the features >>> in DAS/1.5, but on a closer look, it seems to work by getting the >>> segment that the specified feature corresponds to, and then getting >>> all features from that segment. My next approach was to use the >>> feature type to distinguish between different CGH experiments. As > all >>> my data is of the type CGH, I thought that I could spare this >>> piece of information for identifying purposes. >>> >>> First I tried the generic seqfeature plugin. I created a database > for >>> it with some test data. However, getting features by type does not >>> seem to work. I always get all the features from the segment in >>> question. >>> >>> Next I tried the LDAS plugin. Again I created a compatible database >>> with some test data. I must have done something wrong with the data >>> file I imported to the database, because getting the features does > not >>> work. I can get the feature types, but trying to get the features >>> gives me an ERRORSEGMENT error. >>> >>> I thought that before I go further, it might be useful to ask > whether >>> my approach seems reasonable, or is there a better way to achieve > what >>> I am trying to do? What should I do to be able to visualize > individual >>> CGH profiles?
>>> >>> I'm grateful for any advice, >>> Ilari >> >> Andrew >> dalke at dalkescientific.com >> >> _______________________________________________ >> DAS2 mailing list >> DAS2 at portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/das2 > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From dalke at dalkescientific.com Tue Nov 29 00:01:08 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 29 Nov 2005 01:01:08 +0100 Subject: properties and key/value data (was Re: [DAS2] Spec issues) In-Reply-To: References: Message-ID: Steve: > To clarify a point of possible confusion, there are really two sets of > key-value pairs to keep in mind: > > 1. The key-value pair for the property type. > 2. The key-value pair for the property itself. I don't see that #1 is a useful distinction. > So in this example: > > 29 > > The key for the type is 'das:ptype' and its value is > 'property/genefinder-score' and this value is a relative URL based on > xml:base in the enclosing PROPERTIES element (or in its grandparent or > great-grandparent element, etc.). The value of the property itself is > 29 and > its key is the whole key-value pair for the type ( > das:ptype="property/genefinder-score"). How do I make an extension type? For example, I want to add a new property for 3D structure depiction, which can be one of "cartoon", "ribbons", or "wires". Let's say it's under my company web site in http://www.dalkescientific.com/das-types/rep3d How do I write it? I tried but couldn't figure it out. What does that URL resolve to, if anything? > In Andrew's Relax-NG equivalent: > > 29 > > the element name contains both the key ('prop:') and the value of the > property type ('genefinder-score'), while the element name as a whole > serves > as the key for the property itself (value=29).
> The 'prop:genefinder-score' string is not a relative URL, but is just a namespace-scoped element name, with 'prop:' serving merely to make 'genefinder-score' globally unique, relative to the URI defined by: > > xmlns:prop="http://www.biodas.org/ns/das/genome/2.00/properties" It took me a while to understand XML namespaces. This helped: http://www.jclark.com/xml/xmlns.htm He uses (for purposes of explanation) the so-called "Clark notation". An example from that document: an element written as <c:part xmlns:c="http://www.cars.com/xml"/> maps to <{http://www.cars.com/xml}part/> """The role of the URI in a universal name is purely to allow applications to recognize the name. There are no guarantees about the resource identified by the URI.""" Using Clark notation helps with remembering that, since { and } here are not valid for URLs. The element name "prop:genefinder-score" is a convenient way to write the full element name, and that's all. There is no meaning to the parts of the name. "prop:" is not a key, since given these two namespace definitions <... xmlns:prop="http://www.dalkescientific.com/" xmlns:wash="http://www.dalkescientific.com/"> then these two elements are identical: <prop:genefinder-score>29</prop:genefinder-score> <wash:genefinder-score>29</wash:genefinder-score> I think Steve is saying the same thing as I am - I wanted to rephrase it to make sure. > A potential drawback of the Relax-NG approach, as discussed in today's > conf > call, is that the value of the property type is not resolvable as in > the > other approach using the PROPERTIES parent element. > > Andrew doesn't see a need for resolvability, e.g., for a dynamically > discoverable schema fragment. But I thought of another use case > besides the > one mentioned in today's call (determining data type such as int or > float, > which isn't of much use in practice). The URL for the type could point > to a > human readable definition of the term. A user may not need > clarification of > 'genefinder-score' but might for something like 'softberry-ztuple'. Who is the user that would want the clarification?
That is, what human will be doing the reading? Once clarified, what does that user do with the information? In my opinion, the only people who care about this are developers, and more specifically, developers who will extend a client to support new data types. Users of, say, the web front end or of IGB don't care. That's a relatively small number of people. And the use case is solved by having the doc_href for the versioned source include a link to any extensions served. Here's another solution. Somewhere early in the results include where the schema includes links for each of the fields, including any extensions. It doesn't need to be a , just something meant as a shout out to developer people. > One could still satisfy such a use case under the Relax-NG approach by > providing a resolvable URL based on the element name + namespace such > as: > > http://www.biodas.org/ns/das/genome/2.00/properties#genefinder-score > > True, there's no XML spec that says this is legal, but we could > declare that > such a convention will hold for all biodas.org-based properties. One > problem > with the above convention is that it's not obvious what the URL > resolves to. > So we could have something like: > > http://www.biodas.org/ns/das/genome/2.00/properties?prop=genefinder-score&define=true > > http://www.biodas.org/ns/das/genome/2.00/properties?prop=genefinder-score&schema=true We could do this, though it's a bit complicated with some tools which represent elements via Clark notation - it needs a bit of string munging. I suggest that the reason why "it's not obvious what the URL resolves to" is because there's nothing which will actually use this. It is easier to just have a human-readable link either on the doc_href page or via some special "if you're a developer, look here" reference, and don't worry about automating it further.
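The prefix-irrelevance point above can be seen concretely: Python's xml.etree.ElementTree reports parsed element names in Clark notation, so two prefixes bound to the same namespace URI yield the identical name. A small sketch (not from the original thread), reusing Andrew's prop/wash example:

```python
import xml.etree.ElementTree as ET

# Two prefixes bound to the same namespace URI, as in Andrew's example
doc = """<root xmlns:prop="http://www.dalkescientific.com/"
              xmlns:wash="http://www.dalkescientific.com/">
  <prop:genefinder-score>29</prop:genefinder-score>
  <wash:genefinder-score>29</wash:genefinder-score>
</root>"""

a, b = ET.fromstring(doc)
# Both elements carry the same Clark-notation name; the prefix is gone
# after parsing.
print(a.tag)  # {http://www.dalkescientific.com/}genefinder-score
print(a.tag == b.tag)  # True
```

The prefix is only a local abbreviation for the URI; once the parser expands names, nothing of 'prop:' or 'wash:' survives.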
Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Tue Nov 29 00:16:17 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 29 Nov 2005 01:16:17 +0100 Subject: [DAS2] DAS intro In-Reply-To: <438B8013.3060107@affymetrix.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> <438B8013.3060107@affymetrix.com> Message-ID: Ed Erwin: > No. The coordinate transformations are often more complicated than > simple offsets. The coordinate space for features on one contig can > be 'backwards' with respect to a different contig, and the coordinate > space for a gene may skip over one or more gaps with respect to the > genomic sequence. The /region entities in the DAS/2 spec are defined as (zero or more) A top-level region on the genome (similar to the "entry points" of the DAS/1 protocol). id - the URI of the sequence ID length - length of the sequence name (optional) - a human-readable label for use when referring to the region doc_href (optional) - a URL that gives additional information about this region Here is an example. This is a very simple definition. As far as I can tell it does not capture the information for, say, skipping. How would you represent "the coordinate space for a gene [that skips] over one or more gaps with respect to the genomic sequence" using the current DAS/2 object model? Or goes backwards? I don't see anything like that. > Also, the term 'reference frame' bugs me a bit because 'frame' always > makes me think of 'reading frame', which is not what you intend. Oh, I agree. It's a bad term. Very very few genomics people use it, according to Google. There's a theory, popular on usenet and in some wikis, that experts rarely write the details because after all they know the topic.
The best way to get a detailed explanation is to post something in error and wait for the corrections. :) Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Tue Nov 29 03:05:40 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 28 Nov 2005 19:05:40 -0800 Subject: [DAS2] DAS/2 weekly meeting notes for 28 Nov 05 Message-ID: Notes from the weekly DAS/2 teleconference, 28 Nov 2005. $Id: das2-teleconf-2005-11-28.txt,v 1.1 2005/11/29 03:06:04 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt CSHL: Lincoln Stein UC Berkeley: Suzi Lewis Sanger: Thomas Down, Andreas Prlic Sweden: Andrew Dalke Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2005. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Today's topic: Spec issues (for DAS/2 retrievals) ------------------------------------------------- We are following the agenda summary in Andrew's email: http://portal.open-bio.org/pipermail/das2/2005-November/000352.html 1) DAS Status Code in headers ----------------------------- Use http error codes and not das-specific ones. das-error to provide more detail. GH: Do we really need a detailed response document? TD: How do you distinguish different parts of the error-causing request? AD: how detailed do we need to be? LS: If you wish to do error recovery, you could have problems with one part and not another. You give up granularity. 
GH: Willing to give up the granularity in favor of simplicity. AD: Possibilities of error LS: How about everything that can be turned into an http error should be. And have a special section to provide das details. E.g.: client is still going to have to understand das error codes GH, AD: client does need to be there. AD: Using only http error codes reduces complexity - you only need to check one place. Another benefit - you can provide a file-based das server (this was not a use case from the RFCs, just AD's pet idea he envisions as potentially useful). GH: Can't think of DAS/1 clients that did anything meaningful with those das error codes. AD: NCBI entrez server - does lots of extra error support. Don't want to go there with das. TD, LS: DAS error codes can be used to tell client which part of the URL is at fault. Now it will be just '404 not found'. AD: REST API says use the http protocol directly. LS: There are some things in the DAS API that don't translate into http error codes. AD: We can support this with error document. [A] Use HTTP error codes and x-das-error document with code and optional description. 2) Content-type --------------- [A] No objections to using: application/x-das+blah+xml 3) Key/value data ----------------- Three possibilities summarized in Andrew's email. 1) (current spec) using namespace in attrib value. 2) (steve, lincoln) all attribute values are URIs 3) (andrew) Relax-NG based, drop in well-structured XML SC: (clarified proposal #2). For more, see today's post at: http://portal.open-bio.org/pipermail/das2/2005-November/000363.html AD: What's wrong with the Relax-NG based approach? LS: I don't understand it yet. SC: Community lacks experience with Relax-NG in general. TD: Does it let you point to schema fragments for data types? AD: There are ways to define it in the schema, haven't looked at it. LS: This looks great.
Would propose having a convention that if it's a simple, single-valued key, value should be encoded in an attribute (value="blah"), not as content of a section (CDATA). Reason: It's more consistent with rest of spec, and it's easier to parse. So in the example, genefinder-score is not correctly encoded. AD: That's not in the das: namespace, hence is not under our control. We can use this convention for things in the das namespace. AD: User can put in any xml as long as it's reasonably well-formed. We can define what well-formed is. This is what atom uses. Allows some simple key val data on client as if it were native data. It permits searches without needing to know about complex data. GH: Likes idea of allowing arbitrary xml. SC: Not completely arbitrary since we limit use of das: namespace, and possibly other aspects. LS: So we're going to say we have properties represented as key/val pairs using this syntax. You'll find 'das:' as well as possibly other namespaces. I think that works. What becomes of /property url (ptype)? Does that go away and get replaced by namespace? AD: Possibly use it for data type (e.g., float). Or we could make it discoverable? LS: Easier to make it part of the spec. TD: If this can work like XML schema, we could have a pointer to an xsi. Is there a way to put a pointer to a schema url? AD: Found this to be useless. Hard coding what is expected is better than having discoverability. TD: With the xsi schema location, you can put multiple schema locations for the das schema, and your extension, separate pointers to both in a single document. AD: Never found dynamically resolved schemas useful for anything LS: In theory they are. Why not? AD: Knowing that something's an int doesn't say what that int is supposed to mean. LS: Right. Let's make sure that the common types of annotation a server would want to return are in the spec from the get-go. Anyone that doesn't care about extensions can ignore additional properties.
No doubt people will make extensions to DAS/2 that are implemented on client and server that are in-house, private extensions that only work in client-server pairs. Should we allow schema fragments to be brought in via xsi? TD: this would be in the top-level element. Or can put it on an enclosing element. AD: Is there a good reason to do it? LS: Let's not seek discoverability. [A] Andrew will flesh out his Relax-NG based property encoding approach. SC: You could put your schema at the url pointed to by 'das:' AD: Don't see a need. I found that many of the DAS/1 schema fragments/documents were invalid. This didn't seem to bother DAS/1 clients and users. LS: In the real world, people don't validate. 5) xlink and ------------------- AD: The official xlink spec is long. Have not fully grokked it. GH: Does anyone else have experience with it? (silence...) Seems like a reason to not go there. AD: Atom uses link to say, "Here's some generic linked out stuff". We could use it to say, "I'm looking for the stylesheet for this thing or the schema for the xml document." GH: We need to draw a line between generic links and specific things. E.g., feature ids, all ids are resolvable links, and so could in principle be specified with link tags. AD: Link from feature to versioned source it's a part of. Client can figure out context from url. Use case: DAS user sends email to colleague, 'look at this url for feature X'. The other user enters URL in his das browser, client can identify the das2-versioned source given the feature URL. LS: They would rely on xml:base. Nothing in the current DAS/2 spec says that the xml base is for the versioned source. LS: But it does give you the versioned source. This is absolutely part of the spec. AD: Nothing in the spec that says that features have to be on the same machine as the rest of the data. LS: Why does user want versioned source on the same machine that the feature came from?
AD: Nothing in the spec says that a feature has to be under 'feature' in the URL. GH: Generalizing the info href element to be more generic, to specify what that link means is fine as long as we don't do this for everything that can be a link. Doc hrefs are fine, not ids. LS: We're not going to demand that people specify links. (Something about giving people enough rope to hang themselves with...) GH: Ids are opaque uris to id the feature. LS: The HTML link tag has been around a long time, and used a total of two times: style sheets, copyright statements. This could have easily been done with a stylesheet tag and copyright tag (without needing a general link tag). [A] Consider the xlink/link tags issue tabled. 6) Source filters ----------------- GH: Use case: DAS/2 client is trying to discover what registry has, query can be the same as for any das server, you can just apply additional filters when dealing with a registry. AP: Client would use tags that a registry server must implement. GH: A non-registry server can implement as well. TD: say filtering is optional in general. AD: I tend to not like optional things. Filtering is required for features. GH: The spec can state the filters that a registry is required to implement on sources query. General DAS/2 servers are not required, but can if they want. What if you send a sources query with filters that it doesn't understand? LS: Return everything GH: Return error AP: Client can filter out what they want GH: It's already important to have search capability in client. Use case: On given genome, show me all gene predictions for this region. You need to go to all servers, which could be many. AD: Can you filter by type of features that can be returned? AP: Can be added. GH: Want to be able to search on ontology term, not just id of the type. AD: Need meta-data server to ask of DAS/2 servers what features do you implement? LS: Does metadata protocol need to be part of das spec, or an additional protocol on top?
There should be an optional section of DAS/2 that is implemented by metadata servers or registries that allows you to discover servers. Shouldn't overload the core server spec. GH: Concerned with the response. It's so close to the same xml, it might as well be the same. Makes it easy for clients to know about both servers and metadata servers. Could call it 'sources' or something else. LS: Filtering by feature type, do we need that info that's returned by sources document? GH: No, it's part of the query. LS: Metadata server would have to do a types request. AD: What if there's a mismatch in SOFA version? LS: We're in trouble. AD: Concerned about change in meaning. SL: Not important. LS: Use case: There's a 'restriction site' node in SOFA 1.4 with five terms underneath it. In version 1.5, now there's six terms. A metadata server running off of the old version is using an incomplete node. Metadata engine should always run off the latest version. AP: Registry at Sanger checks every 2 hrs with server. AD: How is this better than having client do it itself? What features do you know with this type and this range? GH: If lots of DAS servers, this will be time-intensive AD: Can we wait until there are lots of servers? AP: We have 17. LS: Current paradigm - EBI has many servers that just do one type of feature e.g., there's a server that just does repeat elements. So there are servers that will serve up one or a few feat types. AD: Had not considered that. LS: Happy to have optional filter syntax added to sources request supported by metadata servers. Gregg is right about returning error (unimplemented). Will not change protocol in fundamental way. Just an annex, just optional section supported by metadata servers. GH: Based on Andreas' queries in soap, can we squeeze everything into params on url? filterable? AP: yes AD: optional fields will include species, build#, type, etc. [A] Add optional filter syntax to sources request. Allow unimpl error return.
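A rough illustration of the "filters as URL params" idea agreed above. The endpoint and the parameter names (species, build, type) are guesses for illustration only; the actual filter vocabulary was left as an open action item, not fixed in the spec:

```python
from urllib.parse import urlencode

# Hypothetical registry endpoint and filter names -- illustrative only,
# not taken from the DAS/2 spec or the Sanger registry.
base = "http://das.example.org/das2/sources"
filters = {"species": "Homo sapiens", "build": "hg17", "type": "gene"}

# urlencode percent-encodes values, so spaces etc. are safe in the URL.
url = base + "?" + urlencode(filters)
print(url)
# http://das.example.org/das2/sources?species=Homo+sapiens&build=hg17&type=gene
```

A server that does not implement filtering would, per the discussion above, either ignore the unrecognized parameters and return everything, or return an "unimplemented" error.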
7) /regions ----------- LS: In sofa, a feature of type region is root of all other features - everything is a region. Has props - ref sequence it's on, start, strandedness. The reason for region is for retrieving assemblies. SC: Region is also currently the only way to get back a list of available sequence ids without getting all sequence data. The top-level sequence request returns data along with sequence. LS/GH: region could be called 'landmarks' [A] Andrew will work directly with Lincoln on revising region request. 8) Tiled queries ---------------- LS: This doesn't need to be in spec. If client filters features by a range, is there a contract such that server must return the exact range he asked for, contained in, or is it ok for the server to return more? GH: We need to be more strict. LS: Agree. Client should trim it. [A] Tiled queries should not be part of the spec. Other issues ------------ AP: There are still some other issues not addressed in this call. E.g., not possible to handle situation where protein sequence in a structure varies from genome. Can defer to the next spec discussion conf call. From ed_erwin at affymetrix.com Tue Nov 29 19:30:41 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Tue, 29 Nov 2005 11:30:41 -0800 Subject: [DAS2] DAS intro In-Reply-To: References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> <438B8013.3060107@affymetrix.com> Message-ID: <438CAC61.1090104@affymetrix.com> Andrew Dalke wrote: > Ed Erwin: > >> No. The coordinate transformations are often more complicated than >> simple offsets. The coordinate space for features on one contig can >> be 'backwards' with respect to a different contig, and the coordinate >> space for a gene may skip over one or more gaps with respect to the >> genomic sequence.
> > > The /region entities in the DAS/2 spec are defined as > > (zero or more) > A top-level region on the genome (similar to the "entry points" of > the DAS/1 protocol). > id - the URI of the sequence ID > length - length of the sequence > name (optional) - a human-readable label for use when referring > to the region > doc_href (optional) - a URL that gives additional information > about this region > > Here is an example > > > I had to go back and look up the context for this discussion. Here it is: >> [Suzi wrote] >> Third, just think of "reference sequence" as a coordinate system. One >> can have the exact same feature and indicate that: on >> coordinate-system-A this feature starts and ends here, and on >> coordinate-system-B it starts and ends there. Thus a feature's >> coordinates may be given both on a chromosome, and on a contig, and on >> any other coordinate-system that can be derived through a transform >> from these. > > [Andrew wrote] > I believe I understand this. There really is only one reference frame > for the entire genome sequence, for a given assembly, and all other > coordinate systems are a fixed and definite offset of that single > reference frame. I understand this as talking about coordinates in general, not the elements or "pos" attributes in the spec. Suzi specifically mentions chromosomes and contigs; one can definitely be backwards with respect to the other. But top-level regions in an assembly would probably all be chromosomes or all be contigs, rather than a mixture. There is not one single "reference frame" for an assembly: rather there is one coordinate axis for *each* top-level region. If those top-level regions are chromosomes, then there is no relationship between the coordinates on different ones. If those top-level regions are contigs or ESTs (which I believe is allowed by the spec), then positions on one of them can be related to positions on others through various transforms. > This is a very simple definition.
As far as I can tell it does not > capture the information for, say, skipping. > > How would you represent "the coordinate space for a gene [that skips] > over one or more gaps with respect to the genomic sequence" using the > current DAS/2 object model? > > Or goes backwards? I don't see anything like that. You represent gaps with parent-child feature relationships, and going backwards by specifying "+1" strand on one contig and "-1" strand on the other. The spec does not require a DAS/2 server to know how to perform transformations from one coordinate system to another, but your statement "there really is only one reference frame for the entire genome sequence" is wrong as I understand it. There is one coordinate axis for *each* top-level region. From ed_erwin at affymetrix.com Tue Nov 29 19:36:13 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Tue, 29 Nov 2005 11:36:13 -0800 Subject: [DAS2] DAS intro In-Reply-To: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> Message-ID: <438CADAD.8060403@affymetrix.com> Andrew Dalke wrote: > The front of the DAS doc starts > > DAS 2.0 is designed to address the shortcomings of DAS 1.0, including: > > That kinda assumes people know what DAS 1.0 is to understand DAS 2.0. > > How about this instead, as an overview/introduction. > > ====== > > DAS/2 describes a data model for genome annotations. In general I like this better than the original introduction. Thanks for writing it. But I agree with Andreas that the first line is better as: > DAS/2 is a protocol to share biological data. I definitely think of DAS as a protocol first, rather than a data model first. 
From ed_erwin at affymetrix.com Tue Nov 29 20:16:11 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Tue, 29 Nov 2005 12:16:11 -0800 Subject: [DAS2] mtg topics for Nov 28 In-Reply-To: References: Message-ID: <438CB70B.4030005@affymetrix.com> Andrew Dalke wrote: > Here are the spec issues I would like to talk about for today's meeting, > culled from the last few weeks of emails and phone calls > > 1) DAS Status Code in headers > > The current spec says > >> X-DAS-Status: XXX status code >> >> The list of status codes is similar, but not identical, to those used >> by DAS/1: >> >> 200 OK, data follows >> 400 Bad namespace >> 401 Bad data source >> 402 Bad data format >> 403 Unknown object ID >> 404 Invalid object ID >> 405 Region coordinate error >> 406 No lock >> 407 Access denied >> 500 Server error >> 501 Unimplemented feature > > > I argued that these are not needed. Some of them are duplicates with > HTTP error codes and those which are not can be covered by an error > code "300" along with an (optional) XML payload. > > The major problem with doing this seems to be in how MS IE handles > certain error codes. While IE is not a target browser, MS software > may use IE as a component for fetching data. From the link Ed dug > up, it looks like this won't be a problem. > I'm not going to argue anymore against moving the X-DAS-Status code up into the HTTP status code. I'm willing to try it and see if it works. But I want to re-iterate why I'm suspicious of this. I have experience trying this in two separate projects and it failed both times. (Still, I think those problems won't occur this time.) 1. I tried this on a project internally at Affymetrix. It didn't work in this case because the client code was (indirectly) using MS IE code, and IE was throwing away the HTTP content when the header had certain error codes. 
This doesn't bother me much now, though, because I doubt many DAS clients will be written that interface with IE, and because I now know that you can force IE to keep the HTTP content as long as you make sure the content is always at least 512 characters long. So if we ever run into this problem, there is an easy work-around. 2. I tried putting the X-DAS-Status codes into the HTTP status code in our internal DAS/1 server about a year ago. (In DAS/1 they are not supposed to be in the HTTP status codes, but I misunderstood the spec.) I ran into problems when I tried that, and that is the main reason I objected to trying that in DAS/2. Unfortunately, I can't remember what those problems were.... The problem might have been: a) the IGB client didn't understand the status codes because they weren't in the expected place. If this is the case, then the problem was benign, because we are now writing new code to support the new spec, so we can make IGB understand whatever we want. b) I use Apache's ".htaccess" files to do some URL re-direction on our DAS/1 client machine. see http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html#RewriteRule It is possible that this was causing the original HTTP status code to be replaced with a different one. I'm currently using the "proxy" form of redirect, which seems to keep the status code intact. Earlier I was using the "redirect" form of redirect, which may change the status code to 302. ----- Based on my experience with apache re-direction, I have a vague fear that we may run into cases where firewalls, or html cachers and optimizers may mangle the HTTP status codes for some users at some point. But since I have no confirmed evidence that that will happen, I have no objection to going ahead and trying to use HTTP status codes. 
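The 512-character workaround Ed describes (some IE-based components discard a short body on error status codes, but keep it once the content reaches 512 characters) can be sketched on the server side. The function name and the comment-padding tactic below are illustrative assumptions, not part of any DAS spec:

```python
# Sketch of the workaround: pad an XML error payload to at least 512
# bytes so IE-based clients do not discard it on non-200 status codes.

def pad_error_body(xml_body, minimum=512):
    """Append a trailing XML comment so the payload reaches `minimum`
    characters. Parsers ignore the comment; IE counts its bytes."""
    deficit = minimum - len(xml_body)
    if deficit > 0:
        # "<!--" + spaces + "-->" adds at least `deficit` characters.
        xml_body += "<!--" + " " * max(deficit - 7, 0) + "-->"
    return xml_body

body = pad_error_body("<error>unknown object ID</error>")
print(len(body) >= 512)  # True
```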
From Steve_Chervitz at affymetrix.com Tue Nov 29 20:33:29 2005 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Tue, 29 Nov 2005 12:33:29 -0800 Subject: [DAS2] DAS intro In-Reply-To: <438CADAD.8060403@affymetrix.com> Message-ID: Ed Erwin wrote: > Andrew Dalke wrote: >> The front of the DAS doc starts >> >> DAS 2.0 is designed to address the shortcomings of DAS 1.0, including: >> >> That kinda assumes people know what DAS 1.0 is to understand DAS 2.0. >> >> How about this instead, as an overview/introduction. >> >> ====== >> >> DAS/2 describes a data model for genome annotations. > > In general I like this better than the original introduction. Thanks > for writing it. > > But I agree with Andreas that the first line is better as: > >> DAS/2 is a protocol to share biological data. > > I definitely think of DAS as a protocol first, rather than a data model > first. I concur. The main aim of DAS is to define an API to allow clients to query servers in order to retrieve bioinformatics data objects in defined response formats. Of course, the writeback facility of DAS/2 will make DAS more of a two-way street so we could say 'sharing and editing', but I think retrieval is more fundamental and probably accounts for the majority of uses. How about this for the first line: DAS is a protocol for sharing biological data. No need to limit it to version 2. This applies to all versions. Use 'DAS/2' when talking about new features in this version, such as writeback. Steve From dalke at dalkescientific.com Tue Nov 29 22:17:02 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 29 Nov 2005 23:17:02 +0100 Subject: [DAS2] DAS intro In-Reply-To: References: Message-ID: Steve: > How about this for the first line: > > DAS is a protocol for sharing biological data. > > No need to limit it to version 2. This applies to all versions. Use > 'DAS/2' > when talking about new features in this version, such as writeback. Done. 
Made a few changes to the CVS intro text to reduce the use of "DAS/2". So that email I just sent is out of date. :) Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Wed Nov 30 00:02:07 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 30 Nov 2005 01:02:07 +0100 Subject: What are regions for? (was Re: [DAS2] DAS intro) In-Reply-To: <438CAC61.1090104@affymetrix.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> <438B8013.3060107@affymetrix.com> <438CAC61.1090104@affymetrix.com> Message-ID: <921477a6bd799b5e19b965b3cd39d239@dalkescientific.com> Ed: > I understand this as talking about coordinates in general, not the > elements or "pos" attributes in the spec. Suzi specifically > mentions chromosomes and contigs; one can definitely be backwards with > respect to the other. But top-level regions in an assembly would > probably all be chromosomes or all be contigs, rather than a mixture. I'm trying to figure out when people use the /region. In my way of understanding things there is the genomic sequence. That consists of a set of chromosomes, each with a list of bases. A chromosome is assembled from parts. One of these parts is called a 'contig'. I thought I knew what it was, but according to http://staden.sourceforge.net/contig.html there are several meanings. What I understand is that a 'contig' is a sequenced chunk of DNA which has overlaps with other contigs and when combined can be used to deduce the entire sequence (excepting regions of repeats and other ambiguities). The best such deduction is the golden path. For DAS/2 we assume sequenced genomes. When will people use top-level regions which are not chromosomes? Chromosome top-level regions are identical to the /sequence, except for the ability to get the assembly and the sequence data directly. 
Is that correct? The spec allows links from a feature into several different regions. This suggests to me that sometimes there will be regions which are a mixture of contigs and chromosomes. Else why support that ability? There is nothing in the spec (that I know of) which allows any hierarchy to the regions - all regions are top-level. Is this correct? > If those top-level regions are chromosomes, then there is no > relationship between the coordinates on different ones. While I understand that, I did get it wrong when I wrote it down. In my head I was thinking "each base has a 1-to-1 mapping to a number, and if two bases are next to each other then the corresponding two numbers are next to each other." This is invalid because the converse is not true - if one number is the end of a chromosome and the other is the start of the next then the two bases are not next to each other. > If those top-level regions are contigs or ESTs (which I believe is > allowed by the spec), then positions on one of them can be related to > positions on others through various transforms. Those are allowed. Will people use them? What advantage is there to having these be a special category instead of a feature? > You represent gaps with parent-child feature relationships, and > going backwards by specifying "+1" strand on one contig and "-1" > strand on the other. Something like this? (Yes, this is hand-wavy) Here's a feature (and note, this is NOT a region) with two subfeatures, one on the forward strand and one on the reverse. This I understand just fine. I don't understand why the positions are given in /region space instead of either: - directly to /sequence space, eg ... -or- - point to a feature of type 'region' which provides the region coordinates ... (Again, hand-wavy. I think best looking at data and code.) 
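The coordinate arithmetic behind that hand-waving can be sketched in code. The function, contig placements, and half-open interval convention below are invented for illustration, not taken from the spec: a parent feature has two parts, one on a contig placed forward ("+1" strand) in the assembly and one on a contig placed reversed ("-1" strand), and each part's span is mapped onto the single chromosome axis.

```python
# Illustrative sketch (not spec XML): map a part's half-open
# [start, end) span from contig coordinates to chromosome coordinates,
# honoring the orientation of the contig in the assembly.

def part_on_chromosome(part_start, part_end, contig_offset, contig_len, strand):
    """contig_offset: where the contig begins on the chromosome.
    strand: +1 if the contig is placed forward, -1 if reversed."""
    if strand == +1:
        return contig_offset + part_start, contig_offset + part_end
    if strand == -1:
        # On a reversed contig the span counts back from the far end.
        return (contig_offset + contig_len - part_end,
                contig_offset + contig_len - part_start)
    raise ValueError("strand must be +1 or -1")

# Hypothetical gene with two parts (the gap is whatever lies between):
#   part A: bases 100-200 on contigX, placed forward at offset 5000
#   part B: bases 100-200 on contigY (length 1000), placed reversed at 9000
print(part_on_chromosome(100, 200, 5000, 1000, +1))  # (5100, 5200)
print(part_on_chromosome(100, 200, 9000, 1000, -1))  # (9800, 9900)
```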
> The spec does not require a DAS/2 server to know how to perform > transformations from one coordinate system to another, but your > statement "there really is only one reference frame for the entire > genome sequence" is wrong as I understand it. There is one coordinate > axis for *each* top-level region. Understood. My questions, to summarize, are: - why do we need a /region space when we can 1. point directly to a sequence (for chromosome regions) and/or 2. point to a "contig" or "assembly" or "region" feature type (for other regions) - When would someone have regions which have more than one of contigs, ESTs and chromosomes? Especially given that this is the genome spec, so chromosome-level info is known, at least enough for a rough assembly. In other words, what are regions for? Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Wed Nov 30 00:26:41 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 30 Nov 2005 01:26:41 +0100 Subject: [DAS2] mtg topics for Nov 28 In-Reply-To: <438CB70B.4030005@affymetrix.com> References: <438CB70B.4030005@affymetrix.com> Message-ID: <45f7dbc8e14fa2a68af6c1d03153d715@dalkescientific.com> Ed: > I'm not going to argue anymore against moving the X-DAS-Status code up > into the HTTP status code. I'm willing to try it and see if it works. > > But I want to re-iterate why I'm suspicious of this. I have > experience trying this in two separate projects and it failed both > times. (Still, I think those problems won't occur this time.) > > 1. I tried this on a project internally at Affymetrix. It didn't > work in this case because the client code was (indirectly) using MS IE > code, and IE was throwing away the HTTP content when the header had > certain error codes. This was a two-part problem: - identifying in client code that a given error occurred - extracting the payload when the error occurred As far as I can tell, the problem you are concerned about is the second part. 
Personally I don't want an application/x-das-error+xml return document. Several others do. Thing is, when Gregg asked if anyone used the DAS/1 error codes for anything other than "there was an error", no one said anything. I could hear the proverbial crickets chirping (or in my case, snow falling). I am convinced that the actual error content will be server implementation specific and as such non-portable across clients. I will flesh out a document type for this then ask Thomas, Lincoln etc. to provide a list of defined error code extensions that their servers will return. It's likely they'll not be able to agree on it, because their code will do different styles of error checking. I'll also dodge the whole mess by saying that the error document payload is optional, so clients are highly unlikely to read it for anything meaningful. (Except perhaps some text shunted to the user.) That makes more work in the spec implementation for something I can almost guarantee will be ignored by DAS clients. > b) I use Apache's ".htaccess" files to do some URL re-direction on our > DAS/1 client machine. > > see http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html#RewriteRule > > It is possible that this was causing the original HTTP status code to > be replaced with a different one. > > I'm currently using the "proxy" form of redirect, which seems to keep > the status code intact. Earlier I was using the "redirect" form of > redirect, which may change the status code to 302. I don't understand how the old one would be a problem in the web clients I'm familiar with. It should be: send request to server get 302 "moved temporarily" response along with new URL repeat until no redirect or reached max redirect limit request new URL get headers/payload back The redirects shouldn't affect the real response code, which would be the last in the chain. If it did, it would also affect 404 and 200 responses. 
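The redirect-following loop outlined above (request, get a 302 plus a new URL, repeat until there is no redirect or a limit is hit, then read the final response) can be sketched as follows. The `get` callable stands in for whatever HTTP library a client uses, and the URLs and response tuples are illustrative, not spec-mandated:

```python
# Sketch of manual redirect following. `get(url)` is a stand-in
# transport returning (status, location_header, body).

def fetch_following_redirects(get, url, max_redirects=5):
    """Follow 3xx redirects up to a limit and return the final
    (status, body) -- the last response in the chain."""
    for _ in range(max_redirects + 1):
        status, location, body = get(url)
        if status in (301, 302, 303, 307) and location:
            url = location  # "moved" -- repeat with the new URL
            continue
        return status, body
    raise RuntimeError("too many redirects")

# Toy transport: one 302 hop, then a 200 with a payload.
responses = {
    "http://old.example/das": (302, "http://new.example/das", ""),
    "http://new.example/das": (200, None, "<SOURCES/>"),
}
print(fetch_following_redirects(responses.__getitem__,
                                "http://old.example/das"))
# (200, '<SOURCES/>')
```

Note the final status in the chain is what the client sees; an intermediate 302 only changes which URL is fetched next.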
> Based on my experience with apache re-direction, I have a vague fear > that we may run into cases where firewalls, or html cachers and > optimizers may mangle the HTTP status codes for some users at some > point. But since I have no confirmed evidence that that will happen, > I have no objection to going ahead and trying to use HTTP status > codes. I know that fear. I've had intermediate web caches misconfigured which cached any HTML page for an hour, making me unable to edit my web site and see the changes. That was with a normal 200 response code, so likely misconfigured caches will affect other response codes. But what's there to do about that? What's the error rate? We're using normal HTTP and if a web cache breaks for us - we aren't doing anything fancy; no content-negotiation, no 'If-Modified-Since', etc - then it will break for anyone doing HTTP. That's anyone exchanging HTML, sending RSS, etc. Andrew dalke at dalkescientific.com From ed_erwin at affymetrix.com Wed Nov 30 00:34:11 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Tue, 29 Nov 2005 16:34:11 -0800 Subject: [DAS2] mtg topics for Nov 28 In-Reply-To: <45f7dbc8e14fa2a68af6c1d03153d715@dalkescientific.com> References: <438CB70B.4030005@affymetrix.com> <45f7dbc8e14fa2a68af6c1d03153d715@dalkescientific.com> Message-ID: <438CF383.5050604@affymetrix.com> >> I'm currently using the "proxy" form of redirect, which seems to keep >> the status code intact. Earlier I was using the "redirect" form of >> redirect, which may change the status code to 302. > > > I don't understand how the old one would be a problem in the > web clients I'm familiar with. It should be: > > send request to server > get 302 "moved temporarily" response along with new URL > repeat until no redirect or reached max redirect limit > request new URL > get headers/payload back Unlike modern web browsers, IGB isn't smart enough to do that. Maybe someday it will need to be, but it isn't there yet. 
From dalke at dalkescientific.com Tue Nov 29 22:13:49 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 29 Nov 2005 23:13:49 +0100 Subject: [DAS2] DAS intro In-Reply-To: <438CADAD.8060403@affymetrix.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <438CADAD.8060403@affymetrix.com> Message-ID: <24b1a9183d9f344398f80839f4c71b6e@dalkescientific.com> Ed: > I definitely think of DAS as a protocol first, rather than a data > model first. Mmm. I see you all's point. All protocols express a data model, though neither side necessarily must implement it that way. Here's the updated text. This is what I just committed to CVS. Note that it's missing mention of the '/region' section. ===== DAS/2 is a protocol for sharing biological data. This version of the specification describes features located on the genomic sequence. Future extensions will add support for sharing annotations of expression data, protein sequences, 3D structures, and ontologies. A DAS/2 annotation server provides feature information about one or more genome sources. Each source may have one or more versions. Different versions are usually based on different assemblies. As an implementation detail an assembly and corresponding sequence data may be distributed via a different machine, which is called the reference server. Annotations are located on the genomic sequence with a start and end position. The range may be specified multiple times if there are alternate reference frames. An annotation may contain multiple non-contiguous parts, making it the parent of those parts. Some parts may have more than one parent. Annotations have a type based on terms in SOFA (Sequence Ontology for Feature Annotation). Stylesheets contain a set of properties used to depict a given type. Annotations can be searched by range, type, and a properties table associated with each annotation. These are called feature filters. DAS/2 is implemented using a ReST architecture. 
Each entity (also called a document or object) has a name, which is a URL. Fetching the URL gets information about the entity. The DAS-specific entities are all XML documents. Other entities contain data types with an existing and frequently used file format. Where possible, a DAS server returns data using existing formats. In some cases a server may describe how to fetch a given entity in several different formats. ===== Andrew dalke at dalkescientific.com From ed_erwin at affymetrix.com Wed Nov 30 00:37:07 2005 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Tue, 29 Nov 2005 16:37:07 -0800 Subject: What are regions for? (was Re: [DAS2] DAS intro) In-Reply-To: <921477a6bd799b5e19b965b3cd39d239@dalkescientific.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> <438B8013.3060107@affymetrix.com> <438CAC61.1090104@affymetrix.com> <921477a6bd799b5e19b965b3cd39d239@dalkescientific.com> Message-ID: <438CF433.1020707@affymetrix.com> Andrew Dalke wrote: > My questions, to summarize, are: > - why do we need a /region space when we can > 1. point directly to a sequence (for chromosome regions) and/or > 2. point to a "contig" or "assembly" or "region" feature type > (for other regions) The way I understand it, that is what region is for: to point directly to a location on a sequence and/or contig. > - When would someone have regions which have more than one of > contigs, ESTs and chromosomes? Especially given that this > is the genome spec, so chromosome-level info is known, at > least enough for a rough assembly. I think they do it mainly 1) when the assembly is incomplete or 2) to preserve annotations from the past when the assembly was incomplete. There could be more reasons. 
Here is an example of a DAS/1 server that contains both chromosomes and "other" short sequences as entry points: http://servlet.sanger.ac.uk:8080/das/ensembl_Homo_sapiens_core_28_35a/entry_points See here for some more genomes that are treated similarly: http://servlet.sanger.ac.uk:8080/das > In other words, what are regions for? > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From dalke at dalkescientific.com Wed Nov 30 01:26:29 2005 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 30 Nov 2005 02:26:29 +0100 Subject: What is /region for? (was Re: [DAS2] DAS intro) In-Reply-To: <438CF433.1020707@affymetrix.com> References: <5b3c55a976a0effc3725923122c66d4f@dalkescientific.com> <59fa39752e4d792d2142fe2682813937@fruitfly.org> <1ac71c37969c1ef9dcc0d983157746aa@fruitfly.org> <9a9ee9242a38f40049a7c5d973980e7d@dalkescientific.com> <438B8013.3060107@affymetrix.com> <438CAC61.1090104@affymetrix.com> <921477a6bd799b5e19b965b3cd39d239@dalkescientific.com> <438CF433.1020707@affymetrix.com> Message-ID: <6fd85d539c25833e9b6f7f41b3429231@dalkescientific.com> (Changed the Subject line slightly to be a bit clearer. I hope.) On Nov 30, 2005, at 1:37 AM, Ed Erwin wrote: > Andrew Dalke wrote: >> My questions, to summarize, are: >> - why do we need a /region space when we can >> 1. point directly to a sequence (for chromosome regions) and/or >> 2. point to a "contig" or "assembly" or "region" feature type >> (for other regions) > > The way I understand it, that is what region is for: to point directly > to a location on a sequence and/or contig. Am I not asking the question correctly? Am I missing the obvious? Been known to happen before! I know what regions are. I don't know why they are in a distinct /region subtree. I'm happy - enthusiastic - ecstatic - that there are different ways to identify certain regions. 
I fully accept that they are in use every day and widely understood. Why are they special enough to get their own /region subtree? Why can't they be features? Here's my proposal. Leaf node parts of a feature always point to a /sequence and optionally point to one or more /feature elements which are of type "region". (Or some other part of SOFA - perhaps assembly-component?) Want to know where the feature is on a given "region" feature? Then look up the region to find its /sequence location. Use these two /sequence locations to get the location in the region. Both /sequence locations are in the same "coordinate space" of "identifier + start/end offset". BTW, if regions are a type of feature then you can search for them. Eg, search for all top-level regions in the range 100000 to 2000000. Can't do that with the /region container. Can if the region data is in the /feature container. >> - When would someone have regions which have more than one of >> contigs, ESTs and chromosomes? Especially given that this >> is the genome spec, so chromosome-level info is known, at >> least enough for a rough assembly. > > I think they do it mainly 1) when the assembly is incomplete or 2) to > preserve annotations from the past when the assembly was incomplete. > There could be more reasons. > > Here is an example of a DAS/1 server that contains both chromosomes > and "other" short sequences as entry points: Okay, I'm fine with that. Thanks. Is a goal of DAS to support incomplete genomes? Note, btw, that the /sequence subtree does not need to contain only chromosomes. From the spec seqid is the sequence ID, and can correspond to an assembled chromosome, a contig, a clone, or any other accessionable chunk of sequence. Hence for incomplete genomes, put the sequence data as best you can under /sequence and have the /feature subtree point to it. >> In other words, what are regions for? Still don't understand the need for a /region namespace. 
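Andrew's two-lookup proposal (the feature's location on /sequence plus the region's location on /sequence give the feature's location within the region) amounts to a subtraction. A minimal sketch, assuming 0-based half-open intervals on a shared sequence axis (a convention chosen here for illustration, not dictated by the spec):

```python
# Sketch of the two-lookup idea: both the feature and the "region"
# feature are located on the same /sequence axis; subtracting the
# region's origin gives region-relative coordinates.

def locate_in_region(feat_start, feat_end, region_start, region_end):
    """All coordinates are 0-based, half-open, on one sequence axis.
    Returns the feature's span relative to the region's origin."""
    if not (region_start <= feat_start and feat_end <= region_end):
        raise ValueError("feature does not lie within the region")
    return feat_start - region_start, feat_end - region_start

# Feature at 10500-10800 on the chromosome, region at 10000-20000:
print(locate_in_region(10500, 10800, 10000, 20000))  # (500, 800)
```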
Repeat: I understand regions, I just don't see why they go in their own subtree and aren't part of some other data chunk. Please, someone sketch out some example with hand-waving XML that shows how having a /region is the appropriate solution. That's what I'm worried about now - the representation in XML. Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Wed Nov 30 02:08:47 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Tue, 29 Nov 2005 18:08:47 -0800 Subject: [DAS2] mtg topics for Nov 28 Message-ID: Actually I think by default the java networking library that IGB uses follows most redirections automatically without IGB having to worry about it. I'm not familiar with what different forms of redirection might do to the status codes, but I expect that as long as the redirection is successful the code IGB would actually see would be 200 OK. IGB does have a ways to go to properly respond to all possible HTTP status codes though... gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Ed Erwin > Sent: Tuesday, November 29, 2005 4:34 PM > To: Andrew Dalke > Cc: DAS/2 > Subject: Re: [DAS2] mtg topics for Nov 28 > > > >> I'm currently using the "proxy" form of redirect, which seems to keep > >> the status code intact. Earlier I was using the "redirect" form of > >> redirect, which may change the status code to 302. > > > > > > I don't understand how the old one would be a problem in the > > web clients I'm familiar with. It should be: > > > > send request to server > > get 302 "moved temporarily" response along with new URL > > repeat until no redirect or reached max redirect limit > > request new URL > > get headers/payload back > > Unlike modern web browsers, IGB isn't smart enough to do that. Maybe > someday it will need to be, but it isn't there yet. 
> > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Gregg_Helt at affymetrix.com Wed Nov 30 02:17:24 2005 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Tue, 29 Nov 2005 18:17:24 -0800 Subject: [DAS2] mtg topics for Nov 28 Message-ID: > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Ed Erwin > Sent: Tuesday, November 29, 2005 12:16 PM > To: Andrew Dalke > Cc: DAS/2 > Subject: Re: [DAS2] mtg topics for Nov 28 ... > 2. I tried putting the X-DAS-Status codes into the HTTP status code in > our internal DAS/1 server about a year ago. (In DAS/1 they are not > supposed to be in the HTTP status codes, but I misunderstood the spec.) > I ran into problems when I tried that, and that is the main reason I > objected to trying that in DAS/2. > > Unfortunately, I can't remember what those problems were.... > > The problem might have been: > a) the IGB client didn't understand the status codes because they > weren't in the expected place. > > If this is the case, then the problem was benign, because we are now > writing new code to support the new spec, so we can make IGB understand > whatever we want. I'm pretty sure this was the problem (IGB didn't know where to find the status codes). gregg
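A client that must talk to both conventions could look in both places: trust the HTTP status first, and fall back to a DAS/1-style X-DAS-Status header when the transport says 200 but the DAS layer reports an error. The header name comes from the draft spec quoted earlier in the thread; the fallback logic itself is an assumption sketched here, not behavior either spec mandates:

```python
# Sketch: reconcile an HTTP status code with a legacy X-DAS-Status
# header. Purely illustrative; neither spec defines this precedence.

def effective_das_status(http_status, headers):
    """Prefer the HTTP status; if the transport reports 200 but an
    X-DAS-Status header is present (DAS/1-style servers), use the
    numeric code at the start of that header instead."""
    das_header = headers.get("X-DAS-Status")
    if http_status == 200 and das_header is not None:
        return int(das_header.split()[0])
    return http_status

print(effective_das_status(200, {"X-DAS-Status": "403 Unknown object ID"}))  # 403
print(effective_das_status(404, {}))  # 404
```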