From Gregg_Helt at affymetrix.com  Sun Oct  1 22:26:20 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Sun, 1 Oct 2006 19:26:20 -0700
Subject: [DAS2] No Monday teleconference this week -- switched to biweekly
	call
Message-ID: <C71929195D04BF48BAECD499AF717B480198CBF3@msex02.affymetrix.com>

Just wanted to remind everyone that we decided last month to switch from
a weekly to a biweekly DAS/2 teleconference schedule.  So the next DAS/2
conference call will be on Monday, October 9th at 9:30 AM PST.

Conference phone #, US: 800-531-3250
Conference phone #, International: 303-928-2693
Conference ID: 2879055
Passcode: 1365

	Thanks,
	Gregg


From Steve_Chervitz at affymetrix.com  Wed Oct  4 13:42:46 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Wed, 04 Oct 2006 10:42:46 -0700
Subject: [DAS2] Updated java runtimes for timezone change in 2007
Message-ID: <C14940A6.21B5D%Steve_Chervitz@affymetrix.com>

Yes, the Bush administration's reach extends into the lives of Java
developers, changing when DST starts and stops in 2007.

Here's a link for updated Java runtimes for a variety of versions:
http://java.sun.com/developer/technicalArticles/Intl/USDST/

This could be an issue for DAS, particularly for writeback. Some
implementations may rely on consistent time-stamping, e.g., to determine
which edit request was submitted first.

May not make a difference within a server, but it would be an issue across
multiple servers.
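
One way to keep ordering consistent across servers is to compare edit
times as UTC instants rather than local wall-clock times. The following is
a minimal Python sketch, not part of the DAS spec; the EditRequest class,
the first_submitted function, and the server names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class EditRequest:
    """Hypothetical writeback edit; not defined by the DAS/2 spec."""
    server: str
    submitted: datetime  # must be timezone-aware


def first_submitted(a, b):
    """Return the edit submitted first, comparing absolute instants."""
    # Aware datetimes compare by instant, so the local UTC offset (and
    # whatever DST rule produced it) does not affect the ordering.
    return a if a.submitted <= b.submitted else b


# Two servers in the same region on 12 Mar 2007: one applies the new DST
# rules (UTC-7), the other still applies the old ones (UTC-8).
new_rules = timezone(timedelta(hours=-7))
old_rules = timezone(timedelta(hours=-8))
edit_a = EditRequest("server-a", datetime(2007, 3, 12, 10, 0, tzinfo=new_rules))
edit_b = EditRequest("server-b", datetime(2007, 3, 12, 9, 30, tzinfo=old_rules))

# Wall-clock times suggest edit_b (9:30) came first, but in UTC edit_a is
# at 17:00 and edit_b at 17:30.
winner = first_submitted(edit_a, edit_b)
```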

Steve 


From Steve_Chervitz at affymetrix.com  Mon Oct  9 13:30:42 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 09 Oct 2006 10:30:42 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 9 Oct 2006
Message-ID: <C14FD552.21D8F%Steve_Chervitz@affymetrix.com>

Notes from the weekly DAS/2 teleconference, 9 Oct 2006

$Id: das2-teleconf-2006-10-09.txt,v 1.1 2006/10/09 17:24:14 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
  UCLA: Allen Day, Brian O'Connor

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


Agenda
-------
* Status reports


Topic: Status reports
---------------------
gh: Funding thru end of may. shifting times around a bit here at
affy. gh going up to a greater percentage during this period.
going down to half time for next month due to house-related work.

Focusing now on cleaning up impl of writeback on igb client. clean
impl based on ideas sketched out at code sprint in Aug.

Spec issue:
-----------
gh: was there a resolution to the feature group assembly conversation
on email thread. 
aday: died out. so the assumption is: no change.

[A] Ask andrew about feature group assembly resolution, if any.


ee: new release of IGB. bug fix then patch release. rapid turn
around. Exposed need for more thorough testing.
Specifying multiple urls for get more info links. sources for urls:
track lines in psl/bed files. Also supporting das files (1 and
probably 2)
noticed: feature tag can give feat label and ID. IGB ignores these
labels, because they seem to be attached to wrong thing. feat in das/1
is like 'exon' group is 'mrna'. it's the mrna we want the label on,
not exon where the labels are on.

gh: if people just label parent. names don't have to be unique. id is
unique uri, name is displayed name. parser isn't looking into that now.

[A] Ed will look into using feature name as label in IGB client


sc: Installed updated das2_server code on the affy das/2 server
(netaffxdas.affymetrix.com). Installed new, efficient version of exon
array data for hg18 (Mar 2006) assembly on this server. (igb's 'Bprobe1'
parser, generates new bp2 format files). Probe and probeset data
loaded fine, but exon/transcript cluster data failed with exception
about 'Probe_count is zero for <probesetID>'.

gh: problem: the bp2 data format isn't designed for representing
transcripts/exon just probe. problem in the part that generates the
bp2 files. can take a look at that.

[A] Gregg will look into steve's Bprobe1 parser error. Needs source gff.

ee: Can you verify that the gff data you are loading doesn't have
unmapped probes, probe sets? Some are not mapped after lifting from
previous genome assembly.

[A] Steve will remove unmapped objects in the source gff used for bp2


aday: working on UML for integrating the writeback and the read
features. Also retrieval of dynamic features as well. Sent out example
query. working on getting them all into a single model, determines
what to do based on input query.

will impl own block caching rather than apache caching.
If I see a writeback coming in, can see which types have been
modified, within each region. can fork off process to re-generate them
after doing the writeback. will be a lot faster.

Have a flowchart. partway through creating UML classes, functions,
return types. Using poseidon.

[A] Allen will distribute uml diagrams for das/2 modeling when ready

gh: will locking be a part of that?
aday: can make sure it's compatible. don't know how much of that to
impl now.
gh: useful to think about how to model that too.

[A] Allen will include locking in his UML modelling.

aday: flowchart is pretty generic. can be used by other servers.


bo: no das work because of work on manuscript.
started sourceforge project for das/2 assay "gyrax" (nee hyrax --
already taken at sf).
The motivation for this project is to take the das/2 objects in igb
and make them more generic. This project can host these objects. They
could then be used for other apps (igb, gyrax, others). Mark
Carlson in lab is working on the gyrax client.  Could be a nice
library for use by other apps, gui or not, that are built on top of a
das server.

gh: parts of the igb objects are tied into genometry model, a separate
package also. but both of these could be separated from igb.

ee: There was some email on genoviz forum where someone is writing an
app based on old NGSDK objects, on the help forum on
sourceforge. problems with >30,000 glyphs. advice: switch to efficient
glyph versions (special drawing alg if children are too small to see).

gh: Lots of caveats...There is code that hasn't been touched in a
while.

gh: question about hardware quote for UCLA

[A] Allen will send gregg hardware quote for UCLA (<$5k)

sc: status of hardware for affy das server upgrade?

gh: plan to order end of oct, should have in place in first two weeks
of nov. 


From allenday at ucla.edu  Tue Oct 10 18:30:14 2006
From: allenday at ucla.edu (Allen Day)
Date: Tue, 10 Oct 2006 15:30:14 -0700
Subject: [DAS2] biopackages server UML
Message-ID: <5c24dcc30610101530p3bb3b686p4055a813d0ad39ff@mail.gmail.com>

Hi,

I'm attaching my first draft for the UML of a server rewrite.  Aside from
all the spec churn, there are two main types of requests that need to be
handled that spurred me to do this rewrite.  The third reason I'm doing this
is to rework the caching mechanism on the server.  With the current code
base there is a lot of custom table clustering and denormalization to get
decent performance out of the Chado database.  I did some experimenting
(discussed in an earlier thread and on conf. calls) with a "tiling" or
"block" caching strategy that turns out to work really well, and I
wanted to integrate that with the writeback functionality.

1) tighter integration of writeback, including locking.
2) configurability of feature types to be
  * dynamic (e.g. for on-the-fly gene prediction)
  * non-cacheable
  * cacheable
3) caching
  * segment range/type tiled caching
  * ability of writeback events to trigger cache flush events

See attached UML.  There is a .zuml file, you can view/edit with Poseidon,
or if you need a .xml I can send another attachment.
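
The tiled caching and writeback-triggered flush in item 3 can be
illustrated with a small Python sketch. It is not the biopackages
implementation: the TileCache class, the tile size, and the load_tile
callback are assumptions made for illustration only.

```python
TILE_SIZE = 100_000  # assumed tile width in bases; not from the post


class TileCache:
    """Toy cache keyed by (segment, feature type, tile index)."""

    def __init__(self, tile_size=TILE_SIZE):
        self.tile_size = tile_size
        self._tiles = {}

    def _tile_indexes(self, start, end):
        # Tiles overlapped by the half-open interval [start, end).
        return range(start // self.tile_size, (end - 1) // self.tile_size + 1)

    def get(self, segment, ftype, start, end, load_tile):
        """Return cached tile contents for a range, loading missing tiles.

        Whole tiles are returned; trimming to [start, end) is left out.
        """
        features = []
        for idx in self._tile_indexes(start, end):
            key = (segment, ftype, idx)
            if key not in self._tiles:
                lo = idx * self.tile_size
                self._tiles[key] = load_tile(segment, ftype, lo, lo + self.tile_size)
            features.extend(self._tiles[key])
        return features

    def flush(self, segment, ftype, start, end):
        """Drop tiles touched by a writeback; return how many were dropped."""
        dropped = 0
        for idx in self._tile_indexes(start, end):
            if self._tiles.pop((segment, ftype, idx), None) is not None:
                dropped += 1
        return dropped


calls = []


def load_tile(segment, ftype, lo, hi):
    # Stand-in for the expensive database query.
    calls.append((lo, hi))
    return [f"{ftype}@{segment}:{lo}-{hi}"]


cache = TileCache()
cache.get("chr1", "mRNA", 50_000, 150_000, load_tile)    # loads tiles 0 and 1
cache.get("chr1", "mRNA", 60_000, 90_000, load_tile)     # served from cache
dropped = cache.flush("chr1", "mRNA", 120_000, 121_000)  # writeback hits tile 1
```

Only the tile covering the edited region is discarded, so a later read of
the same range reloads one tile rather than the whole segment.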

-Allen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: das2_refactor.zuml
Type: application/octet-stream
Size: 34991 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/das2/attachments/20061010/5d7a1c0b/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: das2_refactor.png
Type: image/png
Size: 130013 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/das2/attachments/20061010/5d7a1c0b/attachment.png>

From boconnor at ucla.edu  Tue Oct 10 18:51:54 2006
From: boconnor at ucla.edu (Brian O'Connor)
Date: Tue, 10 Oct 2006 15:51:54 -0700
Subject: [DAS2] biopackages server UML
In-Reply-To: <5c24dcc30610101530p3bb3b686p4055a813d0ad39ff@mail.gmail.com>
References: <5c24dcc30610101530p3bb3b686p4055a813d0ad39ff@mail.gmail.com>
Message-ID: <452C240A.4060603@ucla.edu>

Hi Allen,

I have a few questions.

* How does feature (and other data types) filtering take place?  Does
the controller pass info into read_features() in Das2::Model::Genome?
Where is the actual filtering implementation?  In
Das2::Model::Genome::Feature?

* Where will the SQL queries live?  In the current implementation we 
have an object where many of the prepared statements live.  Do you plan 
on using something similar here?  Or will the SQL generally be embedded 
in Das2::Model::Record objects and Das2::Model::Genome::Chado?

* For the Das2::Model::Record subclasses, should there be another layer 
of inheritance with a Das2::Model::Chado::Record object?  In case you 
want additional data adapters for other DBs/flat files in the future?

--Brian	



From dalke at dalkescientific.com  Mon Oct 23 12:19:03 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 23 Oct 2006 17:19:03 +0100
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 9 Oct 2006
In-Reply-To: <C14FD552.21D8F%Steve_Chervitz@affymetrix.com>
References: <C14FD552.21D8F%Steve_Chervitz@affymetrix.com>
Message-ID: <f6d6a5b59c0a06ec848065b95dd94988@dalkescientific.com>

On Oct 9, 2006, at 6:30 PM, Steve Chervitz wrote:
> [A] Ask andrew about feature group assembly resolution, if any.

As far as I know there was no resolution.

At last standing the problem is as follows.  Consider a complex annotation
with a single parent A and a single child B.

There are several ways to represent this

Option 1:

   <FEATURE uri="A" part="B"/>
   <FEATURE uri="B" parent="A"/>

This is the current spec.  Parents point to children and children to
parents.  This was different than the GFF-style where only the children
have a parent reference.  My hope was to assemble complex annotations
while reading the data from the remote server.

In practice this streaming assembly proved hard to implement.  The
algorithm is non-trivial for complex structures so most people will
do the assembly only after reading all features.  Also, there's a
possible error when parents don't list all children or vice versa,
and likely most clients won't fully validate, so a top-down and a
bottom-up assembly may give different results for the same server.
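
The mismatch risk in Option 1 can be made concrete with a short Python
sketch that compares the links asserted by parents with those asserted
by children. The dictionary layout and the function name are
hypothetical, not something the spec defines.

```python
def inconsistent_links(features):
    """Return (parent, child) pairs asserted on only one side.

    features maps a URI to {"parents": set of URIs, "parts": set of URIs}.
    """
    # Links as seen top-down (from each parent's part list) ...
    down = {(uri, part) for uri, f in features.items() for part in f["parts"]}
    # ... and bottom-up (from each child's parent list).
    up = {(parent, uri) for uri, f in features.items() for parent in f["parents"]}
    return sorted(down ^ up)


features = {
    "A": {"parents": set(), "parts": {"B", "C"}},
    "B": {"parents": {"A"}, "parts": set()},
    "C": {"parents": set(), "parts": set()},  # forgot to point back at A
}
mismatches = inconsistent_links(features)
```

Here a top-down assembly would attach C to A while a bottom-up assembly
would leave C on its own, which is the divergence described above.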

Option 2:

   <FEATURE uri="A"/>
   <FEATURE uri="B" parent="A"/>

This is the GFF-style.  The main limitations are support for streaming
data, such as showing partial results while downloading and converting
to/from other formats.  In both cases this is because parent nodes may
(and do) occur after children nodes, and there's no knowledge that all
children have been seen.

There is a problem in both Option 1 and Option 2 of not easily detecting
cycles or multi-rooted structures.

Variation: require that children are listed after parents.

Option 3:

<FEATURE-GROUP>
   <FEATURE uri="A"/>
   <FEATURE uri="B" parent="A"/>
</FEATURE-GROUP>

That is, put all features which are part of the same feature group into
a single element.  This is essentially like the ### "no forward references"
token in GFF3.

It's cumbersome because either there are two kinds of elements
("FEATURE-GROUP" and "FEATURE") under the root or there are a lot of
FEATURE-GROUPs containing a single sequence.  There's still the need for
cycle detection and checking that the parent/part relationships are valid.

Option 4:

<FEATURE uri="A">
   <FEATURE uri="B"/>
</FEATURE>

Break the DAG into a tree structure (a spanning tree).  In this case
"B" is a child of "A".  For a more complex structure where "C" is a
child of "A" and "B",

<FEATURE uri="A">
   <FEATURE uri="B">
     <FEATURE uri="C" parent="A"/>
   </FEATURE>
</FEATURE>

This doesn't fit well with relational databases.  There's still the need
to check for cycles but it's much simpler.
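
As an illustration of how a client might read the nested form of Option 4,
here is a Python sketch using the standard xml.etree.ElementTree module.
It treats the enclosing element as one parent and the optional "parent"
attribute as an additional one; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """
<FEATURE uri="A">
   <FEATURE uri="B">
     <FEATURE uri="C" parent="A"/>
   </FEATURE>
</FEATURE>
"""


def parent_links(elem, enclosing=None, links=None):
    """Collect (parent, child) pairs from nesting plus 'parent' attributes."""
    links = set() if links is None else links
    uri = elem.get("uri")
    if enclosing is not None:
        links.add((enclosing, uri))          # parent implied by nesting
    if elem.get("parent") is not None:
        links.add((elem.get("parent"), uri)) # extra parent outside the tree
    for child in elem.findall("FEATURE"):
        parent_links(child, uri, links)
    return links


links = sorted(parent_links(ET.fromstring(DOC.strip())))
```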


Given the feedback I've heard, the use cases for streaming the data are
not seen as important.  Hence I'm willing to go with #2 (GFF-style, children
point to parents) and have nothing like the no-forward-references of GFF3.
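
With Option 2, assembly happens after all features have been read. The
Python sketch below shows one possible way to do that, including checks
for dangling parent references and cycles. The input layout (a dict from
feature URI to its list of parent URIs) and the function name are
assumptions, and this is not code from any existing client.

```python
from collections import defaultdict


def assemble(parents_of):
    """Build parent -> children links from child -> parents links.

    Returns (children_of, roots). Raises ValueError on a reference to an
    unknown parent or on a cycle.
    """
    children_of = defaultdict(list)
    for uri, parents in parents_of.items():
        for parent in parents:
            if parent not in parents_of:
                raise ValueError(f"{uri} refers to unknown parent {parent}")
            children_of[parent].append(uri)

    roots = [uri for uri, parents in parents_of.items() if not parents]

    # Depth-first walk from the roots; meeting a feature that is still on
    # the current path means the links form a cycle.
    done, on_path = set(), set()

    def visit(uri):
        if uri in on_path:
            raise ValueError(f"cycle through {uri}")
        if uri in done:
            return
        on_path.add(uri)
        for child in children_of.get(uri, ()):
            visit(child)
        on_path.discard(uri)
        done.add(uri)

    for root in roots:
        visit(root)
    # Features never reached from a root must sit on a rootless cycle.
    if len(done) != len(parents_of):
        raise ValueError("cycle among features with no root")
    return dict(children_of), roots


children_of, roots = assemble({"A": [], "B": ["A"], "C": ["A", "B"]})
```

A caller that wants to reject multi-rooted groups can inspect the
returned roots list; this sketch does not split the features into
separate groups.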


					Andrew
					dalke at dalkescientific.com


From lstein at cshl.edu  Mon Oct 23 10:01:01 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 23 Oct 2006 10:01:01 -0400
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 9 Oct 2006
In-Reply-To: <C14FD552.21D8F%Steve_Chervitz@affymetrix.com>
References: <AcbryKTy4z+4ole7EduEhgAKlXZSNg==>
	<C14FD552.21D8F%Steve_Chervitz@affymetrix.com>
Message-ID: <6dce9a0b0610230701q1898dc79wa3a3ff56814ff37e@mail.gmail.com>

Hi Folks,

I'm going to miss today's conference call again. I've been scheduled to
interview a job candidate and I can't change it.

Lincoln



-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From Steve_Chervitz at affymetrix.com  Mon Oct 23 21:17:46 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 23 Oct 2006 18:17:46 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 23 Oct 2006
Message-ID: <C162B7CA.22332%Steve_Chervitz@affymetrix.com>

Notes from the weekly DAS/2 teleconference, 23 Oct 2006

$Id: das2-teleconf-2006-10-23.txt,v 1.1 2006/10/24 01:15:21 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Gregg Helt, Ed Erwin
  UCLA: Allen Day
  Dalke Scientific: Andrew Dalke


Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


Agenda
-------
* Status reports
* Spec discussion


Status Reports
---------------
[Note: lots of digressions within status reports]

ad: Have been looking at how Tim Hubbard's group is using das/1.

gh: you are acting as our proxy to the uk group.

gh: andreas has been working on das registry.
ad: yes, in use for both das/1 and 2 servers.
gh: am interested in his work to ping servers to test for live-ness.

gh: see my response on das discussion list to Brian Gilman's
message. where to find das/2 servers to hit on. biopackages was not
giving correct answers for sources query.
ee: was true two weeks ago.
aday: just a bug.

gh: we need to get both servers fixed. need an automated way to figure
out when servers are down, such as what andreas is doing with das/1.

[A] Andrew will ask Andreas about live-ness test for das/2 as well.

gh: andrew's validator could be scripted to do this, too.
gh: your validator is not running, btw.
ad: server rebooted, not set up to restart automatically.

[A] andrew will see that his validator server is up (done).

gh: affy server is serving up incorrect xml base now. code is set up
to allow specifying which xml base to use.

[A] steve will fix xml base on affy server

gh: need to use four arg version:
port, data dir, email for maintainer, xml:base
without xml:base, everything goes screwy

gh: Andrew's validator should catch this since xml:base resolution of
capabilities would resolve to local host which would throw an error.
ad: yes.

gh: Andrew: you are focusing on das now?
ad: this week at EBI, then next month focusing on DAS work.

Status (continued)
-------------------
gh: this week - distracted by igb issues, also on 1/2 time this month,
so no new das work to report.

ee: gff3 parser, got feedback from lincoln. adding support for
track lines. in several of our parsers there is a diff between the way
igb puts things into tracks and the way the ucsc browser puts things
into tracks. in igb: we put things into tracks based on source
field. so one file can lead to multiple tiers. in ucsc: everything
below track line goes into one track. Soln: if there are track lines,
do it the way UCSC does it. Otherwise, do it the igb way. Also worked
on coloring by score (affects gff, bed, and one other). Makes it
similar to ucsc. Assumption is white background. It is rigged to be
based on normal foreground and background colors. white = ucsc
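
The track rule described above (UCSC-style grouping when track lines are
present, grouping by source field otherwise) could be sketched in Python
as follows. This is not IGB's parser code; the function names and the
"default" track name are assumptions for illustration.

```python
import shlex


def is_track_line(line):
    return line.split(None, 1)[0] == "track"


def track_name(line, default):
    """Return the name=... value from a track line, or a default."""
    for token in shlex.split(line)[1:]:
        key, _, value = token.partition("=")
        if key == "name" and value:
            return value
    return default


def assign_tracks(lines):
    """Map track name -> data lines, following the rule in the notes."""
    data = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    ucsc_style = any(is_track_line(ln) for ln in data)
    tracks = {}
    current = "default"  # assumed name for data seen before any track line
    for ln in data:
        if ucsc_style and is_track_line(ln):
            # UCSC way: everything below a track line goes into that track.
            current = track_name(ln, default=f"track{len(tracks) + 1}")
            tracks.setdefault(current, [])
        elif ucsc_style:
            tracks.setdefault(current, []).append(ln)
        else:
            # IGB way: one tier per value of the GFF source field (column 2).
            source = ln.split("\t")[1]
            tracks.setdefault(source, []).append(ln)
    return tracks


GFF = [
    "chr1\tgenscan\texon\t100\t200\t.\t+\t.\tID=e1",
    "chr1\ttwinscan\texon\t300\t400\t.\t+\t.\tID=e2",
]
by_source = assign_tracks(GFF)
by_track_line = assign_tracks(['track name="my genes"'] + GFF)
```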

Also participated in the java "ask the experts" thing: asked about
swing, but they didn't answer.

gh: das2 style sheets?
ee: yes, how free am I to change that spec?
ad: go for it.
ee: don't want spec to say you need to use certain shaped glyphs --
hard to support. just simple things - colors, labels.

ad: asked uk folks about style sheets, they haven't done anything.
gh: gbrowse (lincoln) uses style sheets for das/1.
ee: the stuff in das/2 come from das/1?
ad: yes, with some changes.
ee: also need to do documentation.

sc: worked on adding data for currently unsupported arrays on the Affy
DAS/1 server to the quickload directory. Got some requests for mouse
assembly aug 2005, RG-U34 rat arrays. Didn't update the annots.txt
yet, so IGB users won't know they are available.

[A] steve will update affy quickload annots.txt

sc: ideally, this should be automated.
gh/ee: could possibly have IGB detect these without needing to update
an extra file. But there was no standard way to read directory
contents.

gh: chp files have no genomic location for probe sets, so igb needs
to look this up, likely via das/2 server. primary way for people to
look at results in igb.

sc: did some work on loading exon array annotations into das/2 server
using gregg's new bp2 format (reported last time). Didn't see any
justification for the "probeset with zero probes" error it threw.

[A] gregg and steve will look into bp2 format parsing issues

[A] gregg will put in order for new hardware for affy das server

aday: porting gff3 into writeback server as an alt format for loading
data in. Email thread with Ed - ambiguities in the gff3 specification

[A] Allen will forward email to list.

aday: some communication with lincoln's group, re: validator. I need
to create some sample gff3 docs to make sure validator can parse them
all. will add support to parser in bioperl (likely).
Re: alignments: target and source have to be stranded, length of one has
to be equal to or less than the one it's aligned to, etc.
No work on server uml. hold off until spec is finalized before
committing to uml model. Eg., fasta response not mentioned, broken
hyperlinks, no response from Andrew.

gh: fasta?
aday: referred to but not described. properties response mentioned but
not described. fasta has been replaced by segments, properties
gone. See email on list.

sc: sequence retrieval command used to return fasta format, hence the
fasta request. this has been replaced with segments, but spec not
updated.

gh: property capability?
aday: yes. not sure how to proceed yet.

[A] Andrew will fix/respond to issues raised by Allen.

gh: another spec issue: last code sprint I didn't like semantics of
range feature filters, I eventually caved to majority. caveat: I
wanted an optional attrib in types doc to say: "here's a type but you
can or cannot use it in search filter."  I.e., optionally restrict which
types you can use in those filters. If false, it indicates to client
it shouldn't use it as a searchable thing.
ad: if it does anyway?
gh: server could throw an error
ad: or not return any results of that type?
gh: ok
ad: reason for this? is there a better word than 'searchable'?
w/r/t the problem domain.
gh: the reason: I want people to search for 'genscan transcripts' not
'genscan exon' because of how we decided to do range queries.
ad: not sure why someone would want to do this.
gh: it was agreed on at last code sprint...

[A] gregg will write up use case for range feature filters underlying his
need


ad: Regarding parent and child bidirectional feature pointers: I'm
willing to say that there's no need to assemble features dynamically
on streaming approach. so we can get rid of parent or child
relationship. make it more like gff3 to have parent link only.

gh: worried about not having full closure. could get parents that don't
know about child. if you have child, do you then have to have every
parent in the response?

ad: I thought we required it? if there is a feature then all features
in that group must be returned.
ee: never a fan of specifying both parents and children. can lead to
mistakes - not compatible. andrew says parsing is more difficult...
ad: when processing input you know when done with a feature
group. this is useful.
if no one impls it why have the overhead?
ee: impl doesn't seem difficult
gh: my impl doesn't catch cycles. still have to do cycle check
regardless if it was bi-directional.
ad: can't find a simple algorithm for doing it.
gh: keep children around. check if tree is complete. bidirectionality
allows me to crawl tree.
ad: you don't check for cycles or multiply rooted trees.
ee: just assume there are not such problems.
ad: I don't like bogus data.

ee: my gff3 parsing, I wait until end to assemble things.
ad: as mine does, too. worried about extra fields means more
possibilities of breaking things. bad data.
ee: should be able to detect bad data.
ad: duplicate links means you can't assemble from one but not
other. most people will not check both.
gh: main justification was to get complete feats before end of doc.
lincoln was the one who wanted this ability.
ad: several ways to do it. eg. contained feature elements with all
children, spanning tree, etc.

ee: catching loops is hard, need to wait till end.

gh: let's wait till lincoln comes in.

[A] Everyone will revisit bidirectional parent-child pointers with Lincoln


Other issues:
-------------

ad: Regarding Brian's question from email, the xml document he sent.
gh: my reply: document was otherwise correct but xml:base was wrong.
ad: also: lowercase close types element at end.

ad: know anything about brian's deadline mentioned by lincoln?
gh: no.

[A] Someone will send Brian pointer to Andrew's validator.

ee: das/2 impl is not usable by igb now. need to fix top-level
document.

gh: we really need an automated way to know when server is having problems.

gh: conf call with Andreas and others in UK? can set up a conf call to
talk about registry. Also coordinate mapping - when one system is the
same as the other. ties into registry stuff.

[A] Gregg/Andrew maybe will have conf call with Andreas while Andrew is in
UK


From dalke at dalkescientific.com  Tue Oct 24 05:17:58 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 24 Oct 2006 10:17:58 +0100
Subject: [DAS2] das2 diagrams, questions
In-Reply-To: <5c24dcc30609210030k5324378fy18990dc41a1f1b1e@mail.gmail.com>
References: <5c24dcc30609210030k5324378fy18990dc41a1f1b1e@mail.gmail.com>
Message-ID: <57ca007c161fd08f104c8bb87e4127ac@dalkescientific.com>

Allen:
> I have a few questions, mostly targeted at Andrew, regarding the 
> current
> HTML version of the spec on the biodas.org site.  It hasn't been 
> updated in
> about 5 months, and looks pretty out of date.

Strange.  The last changes were in August.

> * Is the HTML document in sync with the "new_spec.txt" document in CVS?

It should not be.  That was a text document I was working on back in
Jan/Feb as part of the update to the current version of the spec.  I've
removed it from CVS.

(Even though I know it's CVS, my fingers keep typing "svn" :)


> * There is mention of a "fasta" command, and its fragment is linked 
> from the
> ToC of the genome retrievals document, but it does not appear in the
> document.  Does this command exist?  My understanding from conference 
> calls
> is that the sequence/fasta/segment/dna stuff has all merged into the
> "segment" response. Is this correct?

That is correct.  There is a segments request.  Passing "format=fasta"
to a segment request returns the sequence in FASTA format.

I didn't catch that line when I was doing the changes.  I've removed it
from CVS.

> * The "property" command seems to have disappeared.  Is that correct?  
> Are
> property keys no longer URIs?  Also the "prop-*" feature filters could 
> be
> better described, it is not clear to me if they are meant as some sort 
> of
> replacement for the property command.

The property command has disappeared.  Notes are at
   das2-teleconf-2005-11-28.txt

It was replaced by two things.  One is the key/value PROP table, which is
meant to store simple string data.  It should be considered to be
user-editable, eg, as a property sheet.  The "prop-*" commands are used
to search that table.

The other is the non-DAS namespace'd XML extensions.  For example,

<FEATURE ...>
   ...
   <PROP key="gene_region_length" value="5398" />
   <fly:map xmlns:fly="http://flybase.org/"
      physical_map="4: 26,994..32,391[-]"
      cytogenetic_map="4: 101F1--102A1" />
</FEATURE>

In this case there is no default search mechanism.  Instead the server
may declare that it implements a map-specific search extension to the
DAS query language, or a new search interface, and clients which
understand the extension can add support for it.
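
As a rough sketch of how a client might separate the two kinds of data,
the Python below parses a FEATURE like the one above with the standard
library's ElementTree.  The uri attribute value and the function name are
made up for illustration, and the sample leaves FEATURE without a
namespace, as the example above does.

```python
import xml.etree.ElementTree as ET

# Sample modelled on the FEATURE above; the uri value is hypothetical.
SAMPLE = """
<FEATURE uri="feature/example">
   <PROP key="gene_region_length" value="5398" />
   <fly:map xmlns:fly="http://flybase.org/"
      physical_map="4: 26,994..32,391[-]"
      cytogenetic_map="4: 101F1--102A1" />
</FEATURE>
"""

def split_feature(xml_text):
    """Return (props, extensions) for a single FEATURE element."""
    feature = ET.fromstring(xml_text)
    props = {}
    extensions = []
    for child in feature:
        if child.tag == "PROP":
            # Simple key/value data, the kind the prop-* filters search.
            props[child.get("key")] = child.get("value")
        elif child.tag.startswith("{"):
            # ElementTree writes namespaced tags as "{namespace}name";
            # treat these as server-specific extensions.
            extensions.append((child.tag, dict(child.attrib)))
    return props, extensions

props, extensions = split_feature(SAMPLE)
```

A client that does not know the flybase namespace can still show the PROP
values and skip the extension.  If the real document puts FEATURE and PROP
in a default namespace, the comparison against "PROP" would need that
namespace prefixed in the same "{namespace}name" form.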


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Tue Oct 24 10:03:54 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 24 Oct 2006 15:03:54 +0100
Subject: [DAS2] XML-RPC based DAS2 validator
Message-ID: <4c0809629f5d0e26693547964e86d6c9@dalkescientific.com>

I've added an XML-RPC service to the DAS validator.  Andreas will be able
to use it to verify new DAS2 entries in his registry.

The entry point to the XML-RPC server is
   http://cgi.biodas.org:8080/RPC2/

The trailing "/" is important - use ".../RPC2" and the server will do
an HTTP redirect to ".../RPC2/", which not all XML-RPC clients understand.

At present the server implements a single RPC method named "validate_url".
It takes two positional parameters.  The first is the required URL to
validate.  The second is the optional document type to validate against.
If not given then the server will attempt to guess.

The response is a list of 2-element tuples.  In each pair the first is
the severity level and will be one of
  "info"
  "warning"
  "error"
  "fatal"

"fatal" means the validator normally should not continue.  I can 
override that, which I do in the XML-RPC service in order to generate
more messages.

"error" means the result does not meet the spec but the validator will
continue checking, at least in the normal case.  (That too is user-defined.)

"warning" is for things which are suspicious but not wrong, like using
"application/xml" instead of the DAS2 content-type, or having a uri 
field with empty content.  (This is legal; it refers to the document
itself.  It's just strange and likely indicates an error in the server.)

The "info" is for niggling details, like that the validator guessed the
document type (in the case of an application/xml response) by looking at
the tag for the top-level element.


Here's an example in Python's interactive shell.  I'll first make a proxy
to the remote server

 >>> import xmlrpclib
 >>> server = xmlrpclib.Server("http://cgi.biodas.org:8080/RPC2/")

then call the new method with a single parameter; the URL to validate.

 >>> server.validate_url("http://das.biopackages.net/das/genome/human/")
[['info', "Assuming doctype of 'sources' based on Content-Type"]]

That's a list with a single element containing the (severity, message) pair.
The info statement came because it guessed the document type based on the
content-type from the server.  I can specify the document type directly
and skip that info statement

 >>> server.validate_url("http://das.biopackages.net/das/genome/human/", "sources")
[]

Here's an example of validating a server with the wrong document type, to
show what the error messages look like.  I've added newlines so the results
aren't all on one line

 >>> server.validate_url("http://www.dasregistry.org/registry/das1/sources", "types")
[['fatal', "Received Content-Type 'application/x-das-sources+xml', expected 'application/x-das-types+xml'."],
 ['fatal', "Expected element '{http://biodas.org/documents/das2}TYPES' but got '{http://biodas.org/documents/das2}SOURCES' at byte 41, line 2, column 2"],
 ['error', 'element "SOURCES" from namespace "http://biodas.org/documents/das2" not allowed in this context at byte 41, line 2, column 2']]
 >>>
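
A caller such as a registry will probably want to boil the list of
(severity, message) pairs down to a yes/no decision.  The sketch below
shows one way to do that.  The function names and the acceptance policy
are assumptions for illustration, not part of the validator.  (In Python 3
the xmlrpclib module used above is named xmlrpc.client.)

```python
# Severity names as listed above, from least to most serious.
SEVERITY_ORDER = ["info", "warning", "error", "fatal"]

def worst_severity(results):
    """Return the most serious severity in a validate_url result, or None if empty."""
    worst = None
    for severity, _message in results:
        if worst is None or SEVERITY_ORDER.index(severity) > SEVERITY_ORDER.index(worst):
            worst = severity
    return worst

def is_acceptable(results):
    """Hypothetical policy: reject a document only on an error or fatal message."""
    return worst_severity(results) not in ("error", "fatal")

# Results copied from the interactive session above.
guessed = [['info', "Assuming doctype of 'sources' based on Content-Type"]]
clean = []
```

With this policy both results above are acceptable, while the "types"
example with its two fatal messages is not.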


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Wed Oct 25 13:42:32 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Wed, 25 Oct 2006 18:42:32 +0100
Subject: [DAS2] DAS2 validation service
Message-ID: <fc8c7c650fbc7ce088867a216e8303e2@dalkescientific.com>

I've updated the DAS2 validation service in a couple of ways.
One was to improve the error handling, eg, point it to slashdot.org
(not XML), slashdot.org/blahblah (404 - not found) or to
blahblah.blah (host does not exist) and it reports an error
instead of raising an exception.

There was a problem of sorts with the XML-RPC server.  I chose
XML-RPC yesterday because I thought it would be dead simple to use
in any environment.  It's old, stable technology.  Andreas tried
a few Java XML-RPC clients and found there were various hard-to-resolve
dependencies.  Eg, the most modern one requires Java 1.5 but his
system runs 1.4, and the older one requires some XML DOM parser
which isn't included with the system and proved hard to track down.

Rather than struggle to make that work, I've added a new HTTP
interface for automated validation.

The URL is
   http://cgi.biodas.org:8080/validate_url

It has a required parameter, "url", which is the URL to validate

%curl 'http://cgi.biodas.org:8080/validate_url?url=http://slashdot.org/'
<?xml version="1.0" encoding="utf-8"?>
<DAS_VALIDATION url="http://slashdot.org/">
  <MESSAGE text="Unknown Content-Type 'text/html'." severity="error" />
  <MESSAGE text="expat: mismatched tag at byte 1794, line 29, column 3" severity="fatal" />
</DAS_VALIDATION>

It has an optional parameter "doctype" which is the document type to expect

%curl 'http://cgi.biodas.org:8080/validate_url?url=http://das.biopackages.net/das/genome/human/;doctype=sources'
<?xml version="1.0" encoding="utf-8"?>
<DAS_VALIDATION url="http://das.biopackages.net/das/genome/human/" doctype="sources" />

In that last case there were no messages.
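
For scripted use, the request URL can be assembled rather than typed by
hand.  This is a minimal sketch that follows the curl examples: parameters
are joined with ";" and the target URL keeps its ":" and "/" characters.
The function name is hypothetical, and escaping the remaining characters
is a precaution added here, not something the message asks for.

```python
from urllib.parse import quote

VALIDATOR = "http://cgi.biodas.org:8080/validate_url"

def validation_request(url, doctype=None):
    """Build a validate_url request in the same style as the curl examples."""
    # Keep ":" and "/" readable, as in the examples; escape everything else
    # (including ";" and "&") so it cannot be mistaken for a separator.
    params = ["url=" + quote(url, safe=":/")]
    if doctype is not None:
        params.append("doctype=" + quote(doctype, safe=""))
    return VALIDATOR + "?" + ";".join(params)

request = validation_request("http://das.biopackages.net/das/genome/human/", "sources")
```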


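The XML document format is below, where "?" marks an optional attribute
and "*" marks an element which may occur zero or more times

<DAS_VALIDATION url="URL-used-for-the-validation" doctype="the-document-type"? >
   <MESSAGE severity="one of info, warning, error, fatal"
            text="the error message" />  *
</DAS_VALIDATION>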

A note about the doctype.  If the server could not get the document then
the validation will not have a doctype even if you gave it one.

%curl 'http://cgi.biodas.org:8080/validate_url?url=http://slashdot.org;doctype=types'
<?xml version="1.0" encoding="utf-8"?>
<DAS_VALIDATION url="http://slashdot.org">
  <MESSAGE text="Received Content-Type 'text/html', expected 'application/x-das-types+xml'." severity="fatal" />
  <MESSAGE text="expat: mismatched tag at byte 1794, line 29, column 3" severity="fatal" />
</DAS_VALIDATION>

If you tell it the wrong doctype and it gets something in XML then it
assumes the response is in the given doctype

%curl 'http://cgi.biodas.org:8080/validate_url?url=http://das.biopackages.net/das/genome/human/;doctype=types'
<?xml version="1.0" encoding="utf-8"?>
<DAS_VALIDATION url="http://das.biopackages.net/das/genome/human/" doctype="types">
  <MESSAGE text="Received Content-Type 'application/x-das-sources+xml', expected 'application/x-das-types+xml'." severity="fatal" />
  <MESSAGE text="Expected element '{http://biodas.org/documents/das2}TYPES' but got '{http://biodas.org/documents/das2}SOURCES' at byte 41, line 3, column 2" severity="fatal" />
  <MESSAGE text="element &quot;SOURCES&quot; from namespace &quot;http://biodas.org/documents/das2&quot; not allowed in this context at byte 41, line 3, column 2" severity="error" />
</DAS_VALIDATION>

If no input doctype is given then it will guess at the doctype based on
analysis of what it got from the remote server

%curl 'http://cgi.biodas.org:8080/validate_url?url=http://das.biopackages.net/das/genome/human/'
<?xml version="1.0" encoding="utf-8"?>
<DAS_VALIDATION url="http://das.biopackages.net/das/genome/human/" doctype="sources">
  <MESSAGE text="Assuming doctype of 'sources' based on Content-Type" severity="info" />
</DAS_VALIDATION>

This XML should be easy for anyone to parse.
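
As an illustration, here is one way to read a response with Python's
standard ElementTree module.  The function name and the shape of the
returned dictionary are choices made for this sketch only.

```python
import xml.etree.ElementTree as ET

# Response copied from the last example above.
RESPONSE = """<?xml version="1.0" encoding="utf-8"?>
<DAS_VALIDATION url="http://das.biopackages.net/das/genome/human/" doctype="sources">
  <MESSAGE text="Assuming doctype of 'sources' based on Content-Type" severity="info" />
</DAS_VALIDATION>
"""

def parse_validation(xml_text):
    """Turn a DAS_VALIDATION document into a plain dictionary."""
    # Parse from bytes so the encoding declaration is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {
        "url": root.get("url"),
        # doctype is absent when the document could not be fetched.
        "doctype": root.get("doctype"),
        "messages": [(m.get("severity"), m.get("text"))
                     for m in root.findall("MESSAGE")],
    }

result = parse_validation(RESPONSE)
```

The messages come back as (severity, text) pairs, the same order as the
XML-RPC interface returns them.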

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Thu Oct 26 05:06:33 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 26 Oct 2006 10:06:33 +0100
Subject: [DAS2] stylesheets meeting
Message-ID: <22090f570d5179afc3fe71a0768ed2ec@dalkescientific.com>

I met yesterday afternoon with Andreas Prlic, Andreas Kahari and
Eugene Kulesha to get information about their stylesheet needs.
Ed said he would work more on the spec and this should provide
some relevant information.

We ended up talking about the stylesheet using a sort of CSS
approach.  There are selectors (feature uri, type uri, etc.)
and properties (color, glyph shape, ...).  Some of the properties
inherit/cascade and others don't.  There's nothing new in this;
we talked about it during the 2nd sprint.

The details of inheritance prove tricky.  For example, consider

[ Feature A ]   ---- is of ---> [ Type 1 ]
     |
   contains
     |
[ Feature B ]   ---- is of ---> [ Type 2 ]

where each feature and type has a style sheet.  The property
(say "color") for Feature B is determined first by the stylesheet
for Feature B, then that of Type 2.  If still not present,
does it come from the parent(s) of Feature B and the parent's
type?

Given that this requires correct traversal in the face of multiple
inheritance, I'll now argue "no".  Even though this is an
effectively solved problem in OO programming ("C3 method resolution
order", from Dylan and also used in Python, Perl6, and others),
it's complex enough to make it unjustifiable.
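
To make the "no" answer concrete, here is a small sketch of a lookup that
checks the feature's own stylesheet entry, then its type's entry, and then
stops.  The dictionaries, uris and function name are hypothetical; the
real stylesheet format was not decided in this meeting.

```python
def resolve_property(name, feature_uri, type_uri, feature_styles, type_styles, default=None):
    """Look up a style property for a feature without consulting parent features."""
    # 1. The stylesheet entry for the feature itself wins.
    feature_style = feature_styles.get(feature_uri, {})
    if name in feature_style:
        return feature_style[name]
    # 2. Otherwise fall back to the entry for the feature's type.
    type_style = type_styles.get(type_uri, {})
    if name in type_style:
        return type_style[name]
    # 3. No traversal of parent features: use the viewer's default.
    return default

# Hypothetical data for the diagram above.
feature_styles = {"feature/A": {"color": "red"}, "feature/B": {}}
type_styles = {"type/1": {"color": "blue"}, "type/2": {"glyph": "box"}}

color_b = resolve_property("color", "feature/B", "type/2", feature_styles, type_styles)
```

Here Feature B gets no color from Feature A or Type 1, so color_b is the
default and the viewer decides.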

The selectors people wanted are:
    - the feature type, based on its uri
    - the feature itself, based on its uri
    - view type, that is, "2D" vs "3D".  Akin to "screen" and "print"
        in CSS.  Andreas P's DAS-based structure viewer uses
        very different stylings ("ribbon", "vdw") than sequence.

Note: only "and" selections are requested.  There seems to be no
need for selection like "features of type T1 which are descended
from feature F2"

Other possibilities are:
    - selectors based on the type ontology uri
    - application-specific styles (but this is probably handled
         best through properties and not through a selector; on the other
         hand, it would enable workarounds for app-specific bugs)
    - level of detail (but Eugene didn't even know this option existed
         in DAS1, so perhaps it's not needed for DAS2)
    - support for overrides in case of stylesheet conflicts (user
       overrides server overrides application, most recent definition
       overrides previous)

For the view and the application selectors a space separated list
seems reasonable, as
    view="2D 3D" ... color as yellow
meaning that the feature is drawn in yellow in both the 2D and 3D views.
Or just leave out the selector.
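
Matching against such a list is simple.  A minimal sketch, with a
hypothetical function name, treating a missing selector as "applies to
every view":

```python
def view_matches(selector, view):
    """True if a rule with this view selector applies to the given view."""
    if selector is None:
        # Leaving out the selector means the rule applies everywhere.
        return True
    # The selector is a space separated list of view names.
    return view in selector.split()

applies_in_3d = view_matches("2D 3D", "3D")
```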

One question was how to find the stylesheet.  They can be listed in
the SOURCES document but I was thinking they could also be listed
in the FEATURES response, as

<FEATURES xmlns="http:// ... /">
   <link rel="das-stylesheet" type="application/x-das-stylesheet"
          href="http://example.com/stylesheet" />
   ...
</FEATURES>


Another question is the format of that selection language.
That was quickly answered: "in XML".

I brought up Ed's comment about (if I understand correctly) making
the shape language a bit more abstract.  For example, in DAS1
there's a GLYPH called "PRIMERS", while the others are names like
"EX" and "ARROW".  The general view is that this level of abstraction
isn't useful.  Andreas Prlic summarized it nicely as (reworded) "the
goal of a stylesheet is to make things concrete".  Though perhaps an
SVG-style set of drawing commands may be useful.

That said, there may be a few things which need a more domain-specific
name.  The example which came up is in color.  EBI has "contig blue"
as a color name.  Are there other colors like that?

On the topic of colors, the desired colors are the CSS color names
(though in-house they also have the X11 names) and the CSS-style
#-prefixed hex notation, as #0FF for cyan.  The #RGB and #RRGGBB color
forms are sufficient.  Other CSS variations, like rgb(255, 0, 0) and
rgb(10%, 45%, 82%), are not needed.
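
A viewer only needs a few lines to accept those two hex forms.  The
sketch below is illustrative (the function name is made up) and
deliberately rejects everything else, including the rgb(...) forms.

```python
import re

# Exactly three or exactly six hex digits after the "#".
_HEX_COLOR = re.compile(r"#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})\Z")

def parse_hex_color(text):
    """Convert "#RGB" or "#RRGGBB" into an (r, g, b) tuple of 0-255 integers."""
    match = _HEX_COLOR.match(text)
    if match is None:
        raise ValueError("not a #RGB or #RRGGBB color: %r" % (text,))
    digits = match.group(1)
    if len(digits) == 3:
        # CSS expands #RGB by doubling each digit: #0FF means #00FFFF.
        digits = "".join(d * 2 for d in digits)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

cyan = parse_hex_color("#0FF")
```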

In the meeting I mentioned alpha/opacity values in CSS as #RGBA and
#RRGGBBAA.  In writing these notes up I see that CSS does not support
that syntax.  Alpha is a "wouldn't it be cool if .." feature and not
one which is needed or specifically requested.

I outlined support for more complex font information for DAS2.
Feedback here says that's not important.  There's no desire to change
the font size, style, etc.  Nor desire for super/subscript, underscore,
italics, bold, condensed, etc.

I asked about standardizing the drawing model so there is more
consistency between different viewers.  For example, if there is
a glyph and a piece of text, where is the text drawn in relationship
to the glyph?  Does the height of the glyph include both?  There
was no desire for this.

On the other hand, a current user-specified option is where to
draw the text, which corresponds to a stylesheet override.

What they want is support for plots and color gradients.  See the
"Gradient" and "TilingArray" entries at

http://www.ensembl.org/Homo_sapiens/contigview?conf_script=contigview;vc_start=25422500;vc_end=25447499;region=17;add_das_source=(name=Gradient+url=http://das.ensembl.org/das+dsn=hydraeuf_00001350+type=ensembl_location_chromosome+stylesheet=y+score=c+fg_merge=a+fg_grades=50+fg_data=l+fg_max=310+fg_min=-143+active=1);add_das_source=(name=TilingArray+url=http://das.ensembl.org/das+dsn=hydraeuf_00001350+type=ensembl_location_chromosome+stylesheet=y+score=s+fg_merge=m+active=1

I can think of several ways to handle that.  One is to declare a feature
for the entire chromosome, as

<FEATURE id="asdfasdf">
   <LOC segment="../chromosome1" />
   <extension:tiling_data href="http://somewhere/else" />
</FEATURE>

and viewers can use some agreed upon protocol to get the right data from
somewhere/else.

Another is

<FEATURE id="asdfasdf">
   <LOC segment="../chromosome1" />
   <extension:tiling_data>
R0lGODdhOABkAPMAABq15RaU14za5O3391660P////7//vz+/gAAAOCP4XJgv// 
10AAAAACP4XRM
j+F1ICwAAAAAOABkAAAE/xDJSau9OGtZuv9gKI4kyZVoqorn6r4sAs90S9+pje8x7/e/ 
YEcn3BGL
tyNyply+ms4VNJqTUZPWKzOrfXK70i+4OvaWXdOzJ60usNXvc7w8H9fB925eu7/ 
2qX9RgU6DS4VI
h0WJQotBjT+PPpE8k0ZibSGVOJpYlm5DkJdhblaiMH1ZFKGba6Cmo1gTQ4GNspuvSZS4mJw1 
u229
W5gowae/ 
OwYDBmZUAwIBBAMHsE7OANcA0dPExzMHzwAB4tkC2ybdMALX4uMBAgTmQEXJ7AHW2NnK
I48D9QIGyQhgczdgHzoVBwiwK+dhgMAA5OKtOZiin7h/IL4pJCgvyDd3EtQ/ 
HJBm0Em8Ac4KqqiU
7Nm4aGSqqau3juG5ag8hEpwpoAQ/ 
gQNhFpipsuMPA+rWgfSQEEDPkkWahhP6YUC2kLOQOJyKtR8A
rKS0PsQY4hk8qEK2Xiva8JrNTBRRfMNGVSO5m0je4SMANBtfsGGR2A23ly9buDIFvAPKV8Bh 
xCZR
OlOMEvDEMwYOaPYZd0XmyYv5EngLgt9iiISvjU6JVgg4fAN1ulsGuYjFdkrZUS3dGYVLcY0d 
aybZ
Oqpmy8WH8VaOl3lt5x+KMYMevTegDdiza7cQAQA7
   </extension:tiling_data>
</FEATURE>

with an agreed upon definition of how to interpret the in-line data.
But for the entire genome this could be rather big.
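
The inline data above looks like base64 text (its first characters are
the base64 form of a GIF header), although the message does not say which
encoding would be agreed on.  Assuming base64, a reader has to drop the
line breaks before decoding.  A minimal sketch with a hypothetical
function name:

```python
import base64

def decode_inline(text):
    """Decode base64 element text, ignoring whitespace added by line wrapping."""
    compact = "".join(text.split())
    return base64.b64decode(compact)

# First eight characters of the inline data above, split across two
# lines the way a mail client might wrap them.
header = decode_inline("R0lG\nODdh")
```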

Another is to break it down into parts, as

<FEATURE id="a00001">
  <LOC segment="../chromosome1" range="0:10000" />
   ... data for the first 10,000 bases ...
</FEATURE>
<FEATURE id="a00002">
  <LOC segment="../chromosome1" range="10000:20000" />
   ... data for the second 10,000 bases ...
</FEATURE>
   ...
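
The ranges for such parts can be generated mechanically.  This sketch
assumes, as "0:10000" followed by "10000:20000" suggests, that a range is
written start:end with the end excluded.  The function names are
hypothetical.

```python
def chunk_ranges(length, size=10000):
    """Yield (start, end) pairs covering 0..length in half-open chunks."""
    for start in range(0, length, size):
        # The final chunk is shortened so it does not run past the end.
        yield start, min(start + size, length)

def loc_range(start, end):
    """Format a pair the way the range attributes above are written."""
    return "%d:%d" % (start, end)

ranges = [loc_range(s, e) for s, e in chunk_ranges(25000)]
```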

There is already the need for displaying images on the display, but
the current use is to click on a point to bring up an image and not
showing the image as a glyph.  The current solution is a hack,
embedding HTML in the NOTE field.  Only a couple of HTML elements are
supported.

This can easily be moved into a property or a local extension
in DAS2.

If a viewer does not understand one of the extensions, what does
it display?

There are two things in DAS1 which I don't know well enough to ask
reasonable questions.  One is the BUMP, which I think specifies if
multiple glyphs of the same type may overlap.  I think Eugene said
they wanted more control over that, like limiting to at most 5
overlaps.

Another is the GROUP, which in DAS1 was used to merge multiple
feature types into a single track.  Quoting from the DAS1 spec

     The canonical example is the CDS, exons and introns of a
     transcribed gene, which logically belong together.

DAS1 has specialized stylesheet language for depicting groups.
DAS2 uses hierarchical features instead.  Does/can DAS2
do the right thing for depicting those?

I think I've covered the major points.  Please chime in if I've
missed anything relevant.

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Thu Oct 26 09:46:24 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 26 Oct 2006 14:46:24 +0100
Subject: [DAS2] TYPE[@source] -> TYPE[@method]
Message-ID: <4098539a2681ec2c3243e4008dac7855@dalkescientific.com>

I would like to change the existing TYPE attribute of "source"
and have it use a different attribute name.  Its meaning conflicts
with the other uses of "source" in DAS2.

The best alternative is "method" because (I believe) it is supposed
to store the same information as the corresponding DAS1 TYPE attribute.


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Fri Oct 27 15:56:27 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 27 Oct 2006 20:56:27 +0100
Subject: [DAS2] segments and types
Message-ID: <91244d1fb88f2b49939a9d10f15d2b03@dalkescientific.com>

A couple of observations about what I've seen in existing
DAS1 servers.  Nothing here concerns format changes.

There are four different ways to handle segments:
   1) Don't provide segment information
        "Our clients know the segment because of the id
         so they don't need a segments document"
   2) use "size" (pre-DAS 1.0 spec)
   3) use "start"/"stop" (DAS 1.0 spec)
       - with variations, like "0", "0" meaning the length is undefined
           (and even "1", "0", with a size="2", for one server!)
   4) use a "version" field

The last is mostly used for protein sequences, that I've seen.
It's an aspect of #1 ("9pti" means "bovine pancreatic trypsin
inhibitor structure from PDB") as an abstract identifier, with
the version used to make it concrete ("with the update because
the first release had a typo").  I think it can be encapsulated
in the uri scheme we now use because each version gets its own
identifier, and since the client knows all versions there's no
problem.
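
A client or proxy that wants one notion of "segment length" has to cope
with the first three variations.  The sketch below is one possible
policy, not something the servers or the spec define: trust start/stop
when they form a sane 1-based inclusive range, otherwise fall back to
"size", otherwise report the length as unknown.  The function name and
the attribute dictionary are hypothetical.

```python
def segment_length(attrs):
    """Return the length of a DAS1 segment from its attributes, or None if unknown."""
    start, stop = attrs.get("start"), attrs.get("stop")
    if start is not None and stop is not None:
        start, stop = int(start), int(stop)
        # DAS 1.0 style: 1-based, inclusive coordinates.
        if 1 <= start <= stop:
            return stop - start + 1
    if "size" in attrs:
        # Pre-DAS 1.0 style, also used here when start/stop make no sense.
        return int(attrs["size"])
    # No segment information at all, or "0", "0" meaning undefined.
    return None

lengths = [
    segment_length({"size": "5000"}),
    segment_length({"start": "1", "stop": "5000"}),
    segment_length({"start": "0", "stop": "0"}),
    segment_length({"start": "1", "stop": "0", "size": "2"}),
]
```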


The folks at EBI/Sanger (what's the correct collective term;
Hinxton? Genome Campus?) know which servers provide which
coordinate systems, so many servers don't provide coordinates.

In some cases, like rabbit, the server will generate about
120,000 segments, one for each scaffold.  It takes quite some time
(a minute or more) to generate the output.  In theory this is
static and can be precomputed by the server.

For my own knowledge, when do people want the complete list
of segments?  When do they want the length?  You, yes, you
there, in front of the computer.  When do you want to
use it?

Let me stress -- this is not a request to change anything.  I
would like to know for my own sake, for writing the documentation,
and for how much emphasis to put on this for the validation.

As another observation, the Sanger/EBI servers also don't
do much with the types document. Some don't even handle the
request.  Eugene said that no one had asked him to add it.
It's there now (thanks Eugene).

I think this is because most of their servers only had a single
type and the solution was "display everything."  They are
running into difficulties with this for a few new servers and
will need type support, and type filter support, soonish.

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Fri Oct 27 16:01:01 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 27 Oct 2006 21:01:01 +0100
Subject: [DAS2] das1->das2 proxy adapter
Message-ID: <a10d51062eb5731fa6f828d9937f1d86@dalkescientific.com>

As part of my effort to make sure DAS2 supports at least what
DAS1 can do, and to simplify migration from DAS1 to DAS2,
I have over this week developed a partial proxy adapter.  It's
a DAS2 server which translates the request then forwards it
to a DAS1 server (including the "segment" and "overlaps"
feature filters).

It takes the results and reformats them into DAS2 format.  I
had used a template approach for this but that proved slow for
large responses.  I rewrote the code so I generate the XML
by hand, which also gives me a chance to put in a lot more
validation code for DAS1.  The goal there is to ensure that
I catch all the extensions people added to DAS1.
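
The coordinate part of such a translation might look like the sketch
below.  It is not the adapter's actual code.  The function name is
hypothetical, and it assumes DAS2 ranges are written start:end, 0-based
with the end excluded, while DAS1 segment arguments are written
id:start,stop, 1-based and inclusive.

```python
def das1_segment_arg(segment_id, overlaps):
    """Translate a DAS2-style 'start:end' range into a DAS1 'id:start,stop' argument."""
    start_text, end_text = overlaps.split(":")
    start, end = int(start_text), int(end_text)
    # Assumed conventions: DAS2 is 0-based and half-open, DAS1 is 1-based
    # and inclusive, so only the start coordinate shifts by one.
    return "%s:%d,%d" % (segment_id, start + 1, end)

arg = das1_segment_arg("chr1", "0:10000")
```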


					Andrew
					dalke at dalkescientific.com


From ed_erwin at affymetrix.com  Mon Oct 30 17:26:38 2006
From: ed_erwin at affymetrix.com (Ed Erwin)
Date: Mon, 30 Oct 2006 14:26:38 -0800
Subject: [DAS2] das1->das2 proxy adapter
In-Reply-To: <a10d51062eb5731fa6f828d9937f1d86@dalkescientific.com>
References: <a10d51062eb5731fa6f828d9937f1d86@dalkescientific.com>
Message-ID: <45467C1E.1000705@affymetrix.com>

Thanks Andrew,

That sounds really useful.  It might be nice to try to run the current 
NetAffx DAS/1 server through this translation and see what comes out the 
other end.  How would we need to do that?  Do we download your code and 
run it ourselves, or will you have some server that we can pass the data 
through?

Ed


Andrew Dalke wrote:
> As part of my effort to make sure DAS2 supports at least what
> DAS1 can do, and to simplify migration from DAS1 to DAS2,
> I have over this week developed a partial proxy adapter.  It's
> a DAS2 server which translates the request then forwards it
> to a DAS1 server (including the "segment" and "overlaps"
> feature filters).
>
> It takes the results and reformats them into DAS2 format.  I
> had used a template approach for this but that proved slow for
> large responses.  I rewrote the code so I generate the XML
> by hand, which also gives me a chance to put in a lot more
> validation code for DAS1.  The goal there is to ensure that
> I catch all the extensions people added to DAS1.
>
>
> 					Andrew
> 					dalke at dalkescientific.com


From Steve_Chervitz at affymetrix.com  Mon Oct  9 17:30:42 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 09 Oct 2006 10:30:42 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 9 Oct 2006
Message-ID: <C14FD552.21D8F%Steve_Chervitz@affymetrix.com>

Notes from the weekly DAS/2 teleconference, 9 Oct 2006

$Id: das2-teleconf-2006-10-09.txt,v 1.1 2006/10/09 17:24:14 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
  UCLA: Allen Day, Brian O'connor

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


Agenda
-------
* Status reports


Topic: Status reports
---------------------
gh: Funding thru end of may. shifting times around a bit here at
affy. gh going up to a greater percentage during this period.
going down to half time for next month due to house-related work.

Focusing now on cleaning up impl of writeback on igb client. clean
impl based on ideas sketched out at code sprint in Aug.

Spec issue:
-----------
gh: was there a resolution to the feature group assembly conversation
on email thread. 
aday: died out. so the assumption is: no change.

[A] Ask andrew about feature group assembly resolution, if any.


ee: new release of IGB. bug fix then patch release. rapid turn
around. Exposed need for more thorough testing.
Specifying multiple urls for get more info links. sources for urls:
track lines in psl/bed files. Also supporting das files (1 and
probably 2)
noticed: feature tag can give feat label and ID. IGB ignores these
labels, because they seem to be attached to wrong thing. feat in das/1
is like 'exon' group is 'mrna'. it's the mrna we want the label on,
not exon where the labels are on.

gh: if people just label parent. names don't have to be unique. id is
unique uri, name is displayed name. parser isn't looking into that now.

[A] Ed will look into using feature name as label in IGB client


sc: Installed updated das2_server code on the affy das/2 server
(netaffxdas.affymetrix.com). Installed new, efficient version of exon
array data for hg18 (Mar 2006) assembly on this server. (igb's 'Bprobe1'
parser, generates new bp2 format files). Probe and probeset data
loaded fine, but exon/transcript cluster data failed with exception
about 'Probe_count is zero for <probesetID>'.

gh: problem: the bp2 data format isn't designed for representing
transcripts/exon just probe. problem in the part that generates the
bp2 files. can take a look at that.

[A] Gregg will look into steve's Bprobe1 parser error. Needs source gff.

ee: Can you verify that the gff data you are loading doesn't have
unmapped probes, probe sets? Some are not mapped after lifting from
previous genome assembly.

[A] Steve will remove unmapped objects in the source gff used for bp2


aday: working on UML for integrating the writeback and the read
features. Also retrieval of dynamic features as well. Sent out example
query. working on getting them all into a single model, determines
what to do based on input query.

will impl own block caching rather than apache caching.
If I see a writeback coming in, can see which types have been
modified, within each region. can fork off process to re-generate them
after doing the writeback. will be a lot faster.

Have a flowchart. partway through creating UML classes, functions,
return types. Using poseidon.

[A] Allen will distribute uml diagrams for das/2 modeling when ready

gh: will locking be a part of that?
aday: can make sure it's compatible. don't know how much of that to
impl now.
gh: useful to think about how to model that too.

[A] Allen will include locking in his UML modelling.

aday: flowchart is pretty generic. can be used by other servers.


bo: no das work because of work on manuscript.
started sourceforge project for das/2 assay "gyrax" (nee hyrax --
already taken at sf).
The motivation for this project is to take the das/2 objects in igb
and make them more generic. This project can host these objects. They
could then be used for other apps (igb, gyrax, others). Mark
Carlson in lab is working on the gyrax client.  Could be a nice
library for use by other apps, gui or not, that are built on top of a
das server.

gh: parts of the igb objects are tied into genometry model, a separate
package also. but both of these could be separated from igb.

ee: There was some email on genoviz forum where someone is writing an
app based on old NGSDK objects, on the help forum on
sourceforge. problems with >30,000 glyphs. advice: switch to efficient
glyph versions (special drawing alg if children are too small to see).

gh: Lots of caveats...There is code that hasn't been touched in a
while.

gh: question about hardware quote for UCLA

[A] Allen will send gregg hardware quote for UCLA (<$5k)

sc: status of hardware for affy das server upgrade?

gh: plan to order end of oct, should have in place in first two weeks
of nov. 


From allenday at ucla.edu  Tue Oct 10 22:30:14 2006
From: allenday at ucla.edu (Allen Day)
Date: Tue, 10 Oct 2006 15:30:14 -0700
Subject: [DAS2] biopackages server UML
Message-ID: <5c24dcc30610101530p3bb3b686p4055a813d0ad39ff@mail.gmail.com>

Hi,

I'm attaching my first draft for the UML of a server rewrite.  Aside from
all the spec churn, there are two main types of requests that need to be
handled that spurred me to do this rewrite.  The third reason I'm doing this
is to rework the caching mechanism on the server.  With the current code
base there is a lot of custom table clustering and denormalization to get
decent performance out of the Chado database.  I did some experimenting
(discussed in an earlier thread and on conf. calls) with a "tiling" or
"block" caching strategy that turns out to work really well, and I
wanted to integrate that with the writeback functionality.

1) tighter integration of writeback, including locking.
2) configurability of feature types to be
  * dynamic (e.g. for on-the-fly gene prediction)
  * non-cacheable
  * cacheable
3) caching
  * segment range/type tiled caching
  * ability of writeback events to trigger cache flush events
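
To illustrate how tiled caching and writeback-triggered flushes could fit
together, here is a small sketch.  It is an illustration only, not the
design in the attached UML; the class name, method names and tile size
are all hypothetical.

```python
class TileCache:
    """Cache responses per (segment, feature type, tile index)."""

    def __init__(self, tile_size=100000):
        self.tile_size = tile_size
        self._tiles = {}

    def _indices(self, start, end):
        # Tiles touched by the half-open range start..end (end > start).
        return range(start // self.tile_size, (end - 1) // self.tile_size + 1)

    def get(self, segment, ftype, index):
        return self._tiles.get((segment, ftype, index))

    def put(self, segment, ftype, index, data):
        self._tiles[(segment, ftype, index)] = data

    def flush(self, segment, ftype, start, end):
        """Drop every cached tile that a writeback to start..end could affect."""
        flushed = []
        for index in self._indices(start, end):
            if self._tiles.pop((segment, ftype, index), None) is not None:
                flushed.append(index)
        return flushed

cache = TileCache(tile_size=1000)
for i in range(3):
    cache.put("chr1", "gene", i, "<FEATURES tile %d/>" % i)

# A writeback touching bases 900..1100 spans tiles 0 and 1 only.
flushed = cache.flush("chr1", "gene", 900, 1100)
```

Only the flushed tiles need to be regenerated after the writeback; the
rest of the segment stays cached.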

See attached UML.  There is a .zuml file, you can view/edit with Poseidon,
or if you need a .xml I can send another attachment.

-Allen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: das2_refactor.zuml
Type: application/octet-stream
Size: 34991 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/das2/attachments/20061010/5d7a1c0b/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: das2_refactor.png
Type: image/png
Size: 130013 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/das2/attachments/20061010/5d7a1c0b/attachment-0001.png>

From boconnor at ucla.edu  Tue Oct 10 22:51:54 2006
From: boconnor at ucla.edu (Brian O'Connor)
Date: Tue, 10 Oct 2006 15:51:54 -0700
Subject: [DAS2] biopackages server UML
In-Reply-To: <5c24dcc30610101530p3bb3b686p4055a813d0ad39ff@mail.gmail.com>
References: <5c24dcc30610101530p3bb3b686p4055a813d0ad39ff@mail.gmail.com>
Message-ID: <452C240A.4060603@ucla.edu>

Hi Allen,

I have a few questions.

* How does filtering of features (and other data types) take place?  Does
the controller pass info into read_features() in Das2::Model::Genome?
Where is the actual filtering implementation?  In
Das2::Model::Genome::Feature?

* Where will the SQL queries live?  In the current implementation we 
have an object where many of the prepared statements live.  Do you plan 
on using something similar here?  Or will the SQL generally be embedded 
in Das2::Model::Record objects and Das2::Model::Genome::Chado?

* For the Das2::Model::Record subclasses, should there be another layer 
of inheritance with a Das2::Model::Chado::Record object?  In case you 
want additional data adapters for other DBs/flat files in the future?

--Brian	

Allen Day wrote:
> Hi,
> 
> I'm attaching my first draft for the UML of a server rewrite.  Aside 
> from all the spec churn, there are two main types of requests that need 
> to be handled that spurred me to do this rewrite.  The third reason I'm 
> doing this is to rework the caching mechanism on the server.  With the 
> current code base there is a lot of custom table clustering and 
> denormalization to get decent performance out of the Chado database.  I 
> did some experimenting (discussed in an earlier thread and on conf. 
> calls) with a "tiling" or "block" caching strategy of cache that turns 
> out to work really well, and I wanted to integrate that with the 
> writeback functionality.
> 
> 1) tighter integration of writeback, including locking.
> 2) configurability of feature types to be
>   * dynamic (e.g. for on-the-fly gene prediction)
>   * non-cacheable
>   * cacheable
> 3) caching
>   * segment range/type tiled caching
>   * ability of writeback events to trigger cache flush events
> 
> See attached UML.  There is a .zuml file, you can view/edit with 
> Poseidon, or if you need a .xml I can send another attachment.
> 
> -Allen
> 
> ------------------------------------------------------------------------
> 


From dalke at dalkescientific.com  Mon Oct 23 16:19:03 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 23 Oct 2006 17:19:03 +0100
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 9 Oct 2006
In-Reply-To: <C14FD552.21D8F%Steve_Chervitz@affymetrix.com>
References: <C14FD552.21D8F%Steve_Chervitz@affymetrix.com>
Message-ID: <f6d6a5b59c0a06ec848065b95dd94988@dalkescientific.com>

On Oct 9, 2006, at 6:30 PM, Steve Chervitz wrote:
> [A] Ask andrew about feature group assembly resolution, if any.

As far as I know there was no resolution.

As it last stood, the problem is as follows.  Consider a complex annotation
with a single parent A and a single child B.

There are several ways to represent this:

Option 1:

   <FEATURE uri="A" part="B"/>
   <FEATURE uri="B" parent="A"/>

This is the current spec.  Parents point to children and children to
parents.  This was different than the GFF-style where only the children
have a parent reference.  My hope was to assemble complex annotations
while reading the data from the remote server.

In practice this streaming assembly proved hard to implement.  The
algorithm is non-trivial for complex structures so most people will
do the assembly only after reading all features.  Also, there's a
possible error when parents don't list all children or vice versa,
and likely most clients won't fully validate, so a top-down and a
bottom-up assembly may give different results for the same server.

Option 2:

   <FEATURE uri="A"/>
   <FEATURE uri="B" parent="A"/>

This is the GFF-style.  The main limitations are support for streaming
data, such as showing partial results while downloading and converting
to/from other formats.  In both cases this is because parent nodes may
(and do) occur after children nodes, and there's no knowledge that all
children have been seen.

There is a problem in both option1 and option2 of not easily detecting
cycles or multi-rooted structures.

Variation: require that children are listed after parents.

Option 3:

<FEATURE-GROUP>
   <FEATURE uri="A"/>
   <FEATURE uri="B" parent="A"/>
</FEATURE-GROUP>

That is, put all features which are part of the same feature group into
a single element.  This is essentially like the ### "no forward references"
token in GFF3.

It's cumbersome because either there are two element types ("FEATURE-GROUP"
and "FEATURE") under the root or there are a lot of FEATURE-GROUPs
containing a single sequence.  There's still the need for cycle detection
and checking that the parent/part relationships are valid.

Option 4:

<FEATURE uri="A">
   <FEATURE uri="B"/>
</FEATURE>

Break the DAG into a tree structure (a spanning tree).  In this case
"B" is a child of "A".  For a more complex structure where "C" is a
child of "A" and "B",

<FEATURE uri="A">
   <FEATURE uri="B">
     <FEATURE uri="C" parent="A"/>
   </FEATURE>
</FEATURE>

This doesn't fit well with relational databases.  There's still the need
to check for cycles but it's much simpler.


Given the feedback I've heard, the use cases for streaming the data are
not seen as important.  Hence I'm willing to go with #2 (GFF-style,
children point to parents) and have nothing like the no-forward-references
of GFF3.
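
To make the consequence of option 2 concrete, here is a minimal Python
sketch of assembling feature groups after all features have been read,
using only child-to-parent links.  The function name and the input shape (a
dict from feature URI to a list of parent URIs) are assumptions for
illustration and are not part of the spec.

```python
from collections import defaultdict


def assemble(features):
    """Build parent-to-children links from child-to-parent links.

    `features` maps each feature URI to a list of its parent URIs.
    Returns (roots, children).  Raises ValueError on a reference to an
    unknown parent or on a cycle.
    """
    children = defaultdict(list)
    for uri, parents in features.items():
        for parent in parents:
            if parent not in features:
                raise ValueError("%s refers to unknown parent %s" % (uri, parent))
            children[parent].append(uri)

    roots = [uri for uri, parents in features.items() if not parents]

    # Depth-first walk from the roots.  Meeting a feature that is still
    # on the current path means the links form a cycle.
    state = {}

    def visit(uri):
        if state.get(uri) == "done":
            return
        if state.get(uri) == "active":
            raise ValueError("cycle through %s" % uri)
        state[uri] = "active"
        for child in children.get(uri, ()):
            visit(child)
        state[uri] = "done"

    for root in roots:
        visit(root)

    # Anything never reached from a root must sit on a rootless cycle.
    unreachable = set(features) - set(state)
    if unreachable:
        raise ValueError("cycle among %s" % sorted(unreachable))
    return roots, dict(children)
```

A caller that requires single-rooted groups can additionally check which
root each feature is reachable from; this sketch only reports the roots.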


					Andrew
					dalke at dalkescientific.com


From lstein at cshl.edu  Mon Oct 23 14:01:01 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 23 Oct 2006 10:01:01 -0400
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 9 Oct 2006
In-Reply-To: <C14FD552.21D8F%Steve_Chervitz@affymetrix.com>
References: <AcbryKTy4z+4ole7EduEhgAKlXZSNg==>
	<C14FD552.21D8F%Steve_Chervitz@affymetrix.com>
Message-ID: <6dce9a0b0610230701q1898dc79wa3a3ff56814ff37e@mail.gmail.com>

Hi Folks,

I'm going to miss today's conference call again. I've been scheduled to
interview a job candidate and I can't change it.

Lincoln

On 10/9/06, Steve Chervitz <Steve_Chervitz at affymetrix.com> wrote:
>
> Notes from the weekly DAS/2 teleconference, 9 Oct 2006
>
> $Id: das2-teleconf-2006-10-09.txt,v 1.1 2006/10/09 17:24:14 sac Exp $
>
> Note taker: Steve Chervitz
>
> Attendees:
>   Affy: Steve Chervitz, Ed Erwin, Gregg Helt
>   UCLA: Allen Day, Brian O'connor
>
> Action items are flagged with '[A]'.
>
> These notes are checked into the biodas.org CVS repository at
> das/das2/notes/2006. Instructions on how to access this
> repository are at http://biodas.org
>
> DISCLAIMER:
> The note taker aims for completeness and accuracy, but these goals are
> not always achievable, given the desire to get the notes out with a
> rapid turnaround. So don't consider these notes as complete minutes
> from the meeting, but rather abbreviated, summarized versions of what
> was discussed. There may be errors of commission and omission.
> Participants are welcome to post comments and/or corrections to these
> as they see fit.
>
>
> Agenda
> -------
> * Status reports
>
>
> Topic: Status reports
> ---------------------
> gh: Funding thru end of may. shifting times around a bit here at
> affy. gh going up to a greater percentage during this period.
> going down to half time for next month due to house-related work.
>
> Focusing now on cleaning up impl of writeback on igb client. clean
> impl based on ideas sketched out at code sprint in Aug.
>
> Spec issue:
> -----------
> gh: was there a resolution to the feature group assembly conversation
> on email thread.
> aday: died out. so the assumption is: no change.
>
> [A] Ask andrew about feature group assembly resolution, if any.
>
>
> ee: new release of IGB. bug fix then patch release. rapid turn
> around. Exposed need for more throurough testing.
> Specifying multiple urls for get more info links. sources for urls:
> track lines in psl/bed files. Also supporting das files (1 and
> probably 2)
> noticed: feature tag can give feat label and ID. IGB ignores these
> labels, because they seem to be attached to wrong thing. feat in das/1
> is like 'exon' group is 'mrna'. it's the mrna we want the label on,
> not exon where the labels are on.
>
> gh: if people just label parent. names don't have to be unique. id is
> unique uri, name is displayed name. parser isn't looking into that now.
>
> [A] Ed will look into using feature name as label in IGB client
>
>
> sc: Installed updated das2_server code on affy the das/2 server
> (netaffxdas.affymetrix.com). Installed new, efficient version of exon
> array data for hg18 (Mar 2006) assembly on this server. (igb's 'Bprobe1'
> parser, generates new bp2 format files). Probe and probeset data
> loaded fine, but exon/transcript cluster data failed with exception
> about 'Probe_count is zero for <probesetID>'.
>
> gh: problem: the bp2 data format isn't designed for representing
> transcripts/exon just probe. problem in the part that generates the
> bp2 files. can take a look at that.
>
> [A] Gregg will look into steve's Bprobe1 parser error. Needs source gff.
>
> ee: Can you verify that the gff data you are loading doesn't have
> unmapped probes, probe sets? Some are not mapped after lifting from
> previous genome assembly.
>
> [A] Steve will remove unmapped objects in the source gff used for bp2
>
>
> aday: working on UML for integrating the writeback and the read
> features. Also retrieval of dynamic features as well. Sent out example
> query. working on getting them all into a single model, determines
> what do do based on input query.
>
> will impl own block caching rather than apache caching.
> If I see a writeback coming in , can see which types have been
> modified, within each region. can fork off process to re-generate them
> after doing the writeback. will be a lot faster.
>
> Have a flowchart. partway through creating UML classes, functions,
> return types. Using poseidon.
>
> [A] Allen will distribute uml diagrams for das/2 modeling when ready
>
> gh: will locking be a part of that?
> aday: can make sure it's compatible. don't know how much of that to
> impl now.
> gh: useful to think about how to model that too.
>
> [A] Allen will include locking in his UML modelling.
>
> aday: flowchart is pretty generic. can be used by other servers.
>
>
> bo: no das work because of work on manuscript.
> started sourceforge project for das/2 assay "gyrax" (nee hyrax --
> already taken at sf).
> The motivation for this project is to take the das/2 objects in igb
> and make them more generic. This project can host these objects. They
> could then be used for other apps (igb, gyrax, others). Mark
> Carlson in lab is working on the gyrax client.  Could be a nice
> library for use by other apps, gui or not, that are built on top of a
> das server.
>
> gh: parts of the igb objects are tied into genometry model, a separate
> package also. but both of these could be separated from igb.
>
> ee: There was some email on genoviz forum where someone is writing an
> app based on old NGSDK objects, on the help forum on
> sourceforge. problems with >30,000 glyphs. advice: switch to efficient
> glyph versions (special drawing alg if children are too small to see).
>
> gh: Lots of caveats...There is code that hasn't been touched in a
> while.
>
> gh: question about hardware quote for UCLA
>
> [A] Allen will send gregg hardware quote for UCLA (<$5k)
>
> sc: status of hardware for affy das server upgrade?
>
> gh: plan to order end of oct, should have in place in first two weeks
> of nov.
>
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/das2
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From Steve_Chervitz at affymetrix.com  Tue Oct 24 01:17:46 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 23 Oct 2006 18:17:46 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 23 Oct 2006
Message-ID: <C162B7CA.22332%Steve_Chervitz@affymetrix.com>

Notes from the weekly DAS/2 teleconference, 23 Oct 2006

$Id: das2-teleconf-2006-10-23.txt,v 1.1 2006/10/24 01:15:21 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Gregg Helt, Ed Erwin
  UCLA: Allen Day
  Dalke Scientific: Andrew Dalke


Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


Agenda
-------
* Status reports
* Spec discussion


Status Reports
---------------
[Note: lots of digressions within status reports]

ad: Have been looking at how Tim Hubbard's group is using das/1.

gh: you are acting as our proxy to the uk group.

gh: andreas has been working on das registry.
ad: yes, in use for both das/1 and 2 servers.
gh: am interested in his work to ping servers to test for live-ness.

gh: see my response on das discussion list to Brian Gilman's
message. where to find das/2 servers to hit on. biopackages was not
giving correct answers for sources query.
ee: was true two weeks ago.
aday: just a bug.

gh: we need to get both servers fixed. need an automated way to figure
out when servers are down, such as what andreas is doing with das/1.

[A] Andrew will ask Andreas about live-ness test for das/2 as well.

gh: andrew's validator could be scripted to do this, too.
gh: your validator is not running, btw.
ad: server rebooted, not set up to restart automatically.

[A] andrew will see that his validator server is up (done).

gh: affy server is serving up incorrect xml base now. code is set up
to allow specifying which xml base to use.

[A] steve will fix xml base on affy server

gh: need to use four arg version:
port, data dir, email for maintainer, xml:base
without xml:base, everything goes screwy

gh: Andrew's validator should catch this since xml:base resolution of
capabilities would resolve to local host which would throw an error.
ad: yes.

gh: Andrew: you are focusing on das now?
ad: this week at EBI, then next month focusing on DAS work.

Status (continued)
-------------------
gh: this week - distracted by igb issues, also on 1/2 time this month,
so no new das work to report.

ee: gff3 parser, got feedback from lincoln. adding support for
track lines. in several of our parsers there is a diff between the way
igb puts things into tracks and the way the ucsc browser puts things
into tracks. in igb: we put things into tracks based on source
field. so one file can lead to multiple tiers. in ucsc: everything
below track line goes into one track. Soln: if there are track lines,
do it the way UCSC does it. Otherwise, do it the igb way. Also worked
on coloring by score (affects gff, bed, and one other). Makes it
similar to ucsc. Assumption is white background. It is rigged to be
based on normal foreground and background colors. white = ucsc
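
[Illustration, not from the call: a minimal Python sketch of the track
assignment rule described above. It assumes tab-delimited GFF lines with
the source in the second column, and it uses the whole track line as the
track key. The function name and the "untitled" key for features that
appear before the first track line are hypothetical; this is not IGB's
code.]

```python
from collections import defaultdict


def assign_tracks(lines):
    """Group feature lines into tracks.

    If the input has any "track" lines, everything below a track line
    goes into that track (UCSC style).  Otherwise features are grouped
    by the GFF source field (IGB style).
    """
    records = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def is_track(ln):
        return ln.split(None, 1)[0] == "track"

    has_track_lines = any(is_track(ln) for ln in records)
    tracks = defaultdict(list)
    current = "untitled"  # hypothetical key for features before any track line
    for ln in records:
        if is_track(ln):
            current = ln.strip()
        elif has_track_lines:
            tracks[current].append(ln)
        else:
            source = ln.split("\t")[1]  # GFF column 2 is the source
            tracks[source].append(ln)
    return dict(tracks)
```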

Also participated in the java "ask the experts" thing: asked about
swing, but they didn't answer.

gh: das2 style sheets?
ee: yes, how free am I to change that spec?
ad: go for it.
ee: don't want spec to say you need to use certain shaped glyphs --
hard to support. just simple things - colors, labels.

ad: asked uk folks about style sheets, they haven't done anything.
gh: gbrowse (lincoln) uses style sheets for das/1.
ee: the stuff in das/2 come from das/1?
ad: yes, with some changes.
ee: also need to do documentation.

sc: worked on adding data for currently unsupported arrays on the Affy
DAS/1 server to the quickload directory. Got some requests for mouse
assembly aug 2005, RG-U34 rat arrays. Didn't update the annots.txt
yet, so IGB users won't know they are available.

[A] steve will update affy quickload annots.txt

sc: ideally, this should be automated.
gh/ee: could possibly have IGB detect these without needing to update
an extra file. But there was no standard way to read directory
contents.

gh: chp files have no genomic location for probe sets, so igb needs
to look this up, likely via das/2 server. primary way for people to
look at results in igb.

sc: did some work on loading exon array annotations into das/2 server
using gregg's new bp2 format (reported last time). Didn't see any
justification for the "probeset with zero probes" error it threw.

[A] gregg and steve will look into bp2 format parsing issues

[A] gregg will put in order for new hardware for affy das server

aday: porting gff3 into writeback server as an alt format for loading
data in. Email thread with Ed - ambiguities in the gff3 specification

[A] Allen will forward email to list.

aday: some communication with lincoln's group, re: validator. I need
to create some sample gff3 docs to make sure validator can parse them
all. will add support to parser in bioperl (likely).
Re: alignments: target and source have to be stranded, length of one has
to be equal to or less than the one it's aligned to, etc.
No work on server uml. hold off until spec is finalized before
committing to uml model. Eg., fasta response not mentioned, broken
hyperlinks, no response from Andrew.

gh: fasta?
aday: referred to but not described. properties response mentioned but
not described. fasta has been replaced by segments, properties
gone. See email on list.

sc: sequence retrieval command used to return fasta format, hence the
fasta request. this has been replaced with segments, but spec not
updated.

gh: property capability?
aday: yes. not sure how to proceed yet.

[A] Andrew will fix/respond to issues raised by Allen.

gh: another spec issue: last code sprint I didn't like semantics of
range feature filters, I eventually caved to majority. caveat: I
wanted an optional attrib in types doc to say: "here's a type but you
can or cannot use it in search filter."  I.e., optionally restrict which
types you can use in those filters. If false, it indicates to client
it shouldn't use it as a searchable thing.
ad: if it does anyway?
gh: server could throw an error
ad: or not return any results of that type?
gh: ok
ad: reason for this? is there a better word than 'searchable'?
w/r/t the problem domain.
gh: the reason: I want people to search for 'genscan transcripts' not
'genscan exon' because of how we decided to do range queries.
ad: not sure why someone would want to do this.
gh: it was agreed on at last code sprint...

[A] gregg will write up use case for range feature filters underlying his
need


ad: Regarding parent and child bidirectional feature pointers: I'm
willing to say that there's no need to assemble features dynamically
on streaming approach. so we can get rid of parent or child
relationship. make it more like gff3 to have parent link only.

gh: worried about not having full closure. could get parents that don't
know about child. if you have child, do you then have to have every
parent in the response?

ad: I thought we required it? if there is a feature then all features
in that group must be returned.
ee: never a fan of specifying both parents and children. can lead to
mistakes - not compatible. andrew says parsing is more difficult...
ad: when processing input you know when done with a feature
group. this is useful.
if no one impls it why have the overhead?
ee: impl doesn't seem difficult
gh: my impl doesn't catch cycles. still have to do cycle check
regardless if it was bi-directional.
ad: can't find a simple algorithm for doing it.
gh: keep children around. check if tree is complete. bidirectionality
allows me to crawl tree.
ad: you don't check for cycles or multiply rooted trees.
ee: just assume there are not such problems.
ad: I don't like bogus data.

ee: my gff3 parsing, I wait until end to assemble things.
ad: as mine does, too. worried about extra fields means more
possibilities of breaking things. bad data.
ee: should be able to detect bad data.
ad: duplicate links means you can't assemble from one but not
other. most people will not check both.
gh: main justification was to get complete feats before end of doc.
lincoln was the one who wanted this ability.
ad: several ways to do it. eg. contained feature elements with all
children, spanning tree, etc.

ee: catching loops is hard, need to wait till end.

gh: let's wait till lincoln comes in.

[A] Everyone will revisit bidirectional parent-child pointers with Lincoln


Other issues:
-------------

ad: Regarding Brian's question from email, the xml document he sent.
gh: my reply: document was otherwise correct but xml:base was wrong.
ad: also: lowercase close types element at end.

ad: know anything about brian's deadline mentioned by lincoln?
gh: no.

[A] Someone will send Brian pointer to Andrew's validator.

ee: das/2 impl is not usable by igb now. need to fix top-level
document.

gh: we really need an automated way to know when server is having problems.

gh: conf call with Andreas and others in UK? can set up a conf call to
talk about registry. Also coordinate mapping - when one system is the
same as the other. ties into registry stuff.

[A] Gregg/Andrew maybe will have conf call with Andreas while Andrew is in
UK


From dalke at dalkescientific.com  Tue Oct 24 09:17:58 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 24 Oct 2006 10:17:58 +0100
Subject: [DAS2] das2 diagrams, questions
In-Reply-To: <5c24dcc30609210030k5324378fy18990dc41a1f1b1e@mail.gmail.com>
References: <5c24dcc30609210030k5324378fy18990dc41a1f1b1e@mail.gmail.com>
Message-ID: <57ca007c161fd08f104c8bb87e4127ac@dalkescientific.com>

Allen:
> I have a few questions, mostly targeted at Andrew, regarding the 
> current
> HTML version of the spec on the biodas.org site.  It hasn't been 
> updated in
> about 5 months, and looks pretty out of date.

Strange.  The last changes were in August.

> * Is the HTML document in sync with the "new_spec.txt" document in CVS?

It should not be.  That was a text document I was working on back in
Jan/Feb as part of the update to the current version of the spec.  I've
removed it from CVS.

(Even though I know it's CVS, my fingers keep typing "svn" :)


> * There is mention of a "fasta" command, and its fragment is linked 
> from the
> ToC of the genome retrievals document, but it does not appear in the
> document.  Does this command exist?  My understanding from conference 
> calls
> is that the sequence/fasta/segment/dna stuff has all merged into the
> "segment" response. Is this correct?

That is correct.  There is a segments request.  Passing "format=fasta"
to a segment request returns the sequence in FASTA format.

I didn't catch that line when I was doing the changes.  I've removed it
from CVS.

> * The "property" command seems to have disappeared.  Is that correct?  
> Are
> property keys no longer URIs?  Also the "prop-*" feature filters could 
> be
> better described, it is not clear to me if they are meant as some sort 
> of
> replacement for the property command.

The property command has disappeared.  Notes are at
   das2-teleconf-2005-11-28.txt

It was replaced by two things.  One is the key/value PROP table, which is
meant to store simple string data.  It should be considered to be
user-editable, eg, as a property sheet.  The "prop-*" commands are used
to search that table.

The other is the non-DAS namespace'd XML extensions.  For example,

<FEATURE ...>
   ...
   <PROP key="gene_region_length" value="5398" />
   <fly:map xmlns:fly="http://flybase.org/"
      physical_map="4: 26,994..32,391[-]"
      cytogenetic_map="4: 101F1--102A1" />
</FEATURE>

In this case there is no default search mechanism.  Instead the server
may declare that it implements a map-specific search extension to the
DAS query language, or a new search interface, and clients which
understand the extension can add support for it.


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Tue Oct 24 14:03:54 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 24 Oct 2006 15:03:54 +0100
Subject: [DAS2] XML-RPC based DAS2 validator
Message-ID: <4c0809629f5d0e26693547964e86d6c9@dalkescientific.com>

I've added an XML-RPC service to the DAS validator.  Andreas will be able
to use it to verify new DAS2 entries in his registry.

The entry point to the XML-RPC server is
   http://cgi.biodas.org:8080/RPC2/

The trailing "/" is important - use ".../RPC2" and the server will do
an HTTP redirect to ".../RPC2/", which not all XML-RPC clients understand.

At present the server implements a single RPC method named "validate_url".
It takes two positional fields.  The first is the required URL to validate.
The second is the optional document type to validate against.  If not given
then the server will attempt to guess.

The response is a list of 2-element tuples.  In each pair the first is
the severity level and the second is the message text.  The severity
will be one of
  "info"
  "warning"
  "error"
  "fatal"

"fatal" means the validator normally should not continue.  I can override
that, which I do in the XML-RPC service in order to generate more messages.

"error" means the result does not meet the spec but the validator will
continue checking, at least in the normal case.  (That too is user-defined.)

"warning" is for things which are suspicious but not wrong, like using
"application/xml" instead of the DAS2 content-type, or having a uri field
with empty content.  (This is legal; it refers to the document itself.
It's just strange and likely indicates an error in the server.)

"info" is for niggling details, like that the server guessed the document
type (in the case of an application/xml response) by looking at the tag for
the top-level element.
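
For a registry-style check, a client will probably want to reduce that list
to a pass/fail answer.  Here is a minimal Python sketch; treating the four
levels as an ordered scale and using "error" as the failure threshold are
assumptions of this sketch, and the helper names are hypothetical rather
than part of the validator.

```python
# Severity levels from least to most serious (ordering assumed here).
SEVERITY_ORDER = ["info", "warning", "error", "fatal"]


def at_least(messages, threshold="error"):
    """Keep the (severity, message) pairs at or above `threshold`."""
    floor = SEVERITY_ORDER.index(threshold)
    return [
        (severity, text)
        for severity, text in messages
        if SEVERITY_ORDER.index(severity) >= floor
    ]


def is_valid(messages):
    """Treat a response as passing when nothing is 'error' or worse."""
    return not at_least(messages, "error")
```

The `messages` argument has the same shape as the value returned by
validate_url, so the result of the RPC call can be passed in directly.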


Here's an example in Python's interactive shell.  I'll first make a proxy
to the remote server

 >>> import xmlrpclib
 >>> server = xmlrpclib.Server("http://cgi.biodas.org:8080/RPC2/")

then call the new method with a single parameter; the URL to validate.

 >>> server.validate_url("http://das.biopackages.net/das/genome/human/")
[['info', "Assuming doctype of 'sources' based on Content-Type"]]

That's a list with a single element containing the (severity, message) tuple.
The info statement came because it guessed the document type based on the
content-type from the server.  I can specify the document type directly
and skip that info statement

 >>> server.validate_url("http://das.biopackages.net/das/genome/human/", "sources")
[]

Here's an example of validating a server with the wrong document type, to show
what the error messages look like.  I've added newlines so the results aren't
all on one string

 >>> server.validate_url("http://www.dasregistry.org/registry/das1/sources", "types")
[['fatal', "Received Content-Type 'application/x-das-sources+xml', 
expected 'application/x-das-types+xml'."],
['fatal', "Expected element '{http://biodas.org/documents/das2}TYPES' 
but got '{http://biodas.org/documents/das2}SOURCES' at byte 41, line 2, 
column 2"],
['error', 'element "SOURCES" from namespace 
"http://biodas.org/documents/das2" not allowed in this context at byte 
41, line 2, column 2']]
 >>>


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Wed Oct 25 17:42:32 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Wed, 25 Oct 2006 18:42:32 +0100
Subject: [DAS2] DAS2 validation service
Message-ID: <fc8c7c650fbc7ce088867a216e8303e2@dalkescientific.com>

I've updated the DAS2 validation service a couple of ways.
One was to improve the error handling, eg, point it to slashdot.org
(not XML), slashdot.org/blahblah (404 - not found) or to
blahblah.blah (host does not exist) and it reports an error
instead of raising an exception.

There was a problem of sorts with the XML-RPC server.  I chose
XML-RPC yesterday because I thought it would be dead simple to use
in any environment.  It's old, stable technology.  Andreas tried
a few Java XML-RPC clients and found there were various hard-to-resolve
dependencies.  Eg, the most modern one requires Java 1.5 but his
system runs 1.4, and the older one requires some XML DOM parser
which isn't included with the system and proved hard to track down.

Rather than struggle to make that work, I've added a new HTTP
interface for automated validation.

The URL is
   http://cgi.biodas.org:8080/validate_url

It has a required parameter, "url", which is the URL to validate.

%curl 'http://cgi.biodas.org:8080/validate_url?url=http://slashdot.org/'
<?xml version="1.0" encoding="utf-8"?>
<DAS_VALIDATION url="http://slashdot.org/">
  <MESSAGE text="Unknown Content-Type 'text/html'." severity="error" />
  <MESSAGE text="expat: mismatched tag at byte 1794, line 29, column 3" severity="fatal" />
</DAS_VALIDATION>

It has an optional parameter "doctype" which is the document type to expect.


%curl 'http://cgi.biodas.org:8080/validate_url?\
url=http://das.biopackages.net/das/genome/human/;doctype=sources'
<?xml version="1.0" encoding="utf-8"?>
<DAS_VALIDATION url="http://das.biopackages.net/das/genome/human/"  
doctype="sources" />

In that last case there were no messages.


The XML document is

<DAS_VALIDATION url="URL-used-for-the-validation"  
doctype="the-document-type"? >
   <MESSAGE severity="one of info, warning, error, fatal"
            text="the error message" />  *
</DAS_VALIDATION>

A note about the doctype.  If the server could not get the document then
the validation will not have a doctype even if you gave it one.

%curl 'http://cgi.biodas.org:8080/validate_url?url=http://slashdot.org;doctype=types'
<?xml version="1.0" encoding="utf-8"?>
<DAS_VALIDATION url="http://slashdot.org">
  <MESSAGE text="Received Content-Type 'text/html', expected  
'application/x-das-types+xml'." severity="fatal" />
  <MESSAGE text="expat: mismatched tag at byte 1794, line 29, column 3"  
severity="fatal" />
</DAS_VALIDATION>

If you tell it the wrong doctype and it gets something in XML then it
assumes the response is in the given doctype

%curl 'http://cgi.biodas.org:8080/validate_url?url=http://das.biopackages.net/das/genome/human/;doctype=types'
<?xml version="1.0" encoding="utf-8"?>
<DAS_VALIDATION url="http://das.biopackages.net/das/genome/human/"  
doctype="types">
  <MESSAGE text="Received Content-Type 'application/x-das-sources+xml',  
expected 'application/x-das-types+xml'." severity="fatal" />
  <MESSAGE text="Expected element  
'{http://biodas.org/documents/das2}TYPES' but got  
'{http://biodas.org/documents/das2}SOURCES' at byte 41, line 3, column  
2" severity="fatal" />
  <MESSAGE text="element &quot;SOURCES&quot; from namespace  
&quot;http://biodas.org/documents/das2&quot; not allowed in this  
context at byte 41, line 3, column 2" severity="error" />
</DAS_VALIDATION>

If no input doctype is given then it will guess at the doctype based on
analysis of what it got from the remote server

%curl 'http://cgi.biodas.org:8080/validate_url?url=http://das.biopackages.net/das/genome/human/'
<?xml version="1.0" encoding="utf-8"?>
<DAS_VALIDATION url="http://das.biopackages.net/das/genome/human/"  
doctype="sources">
  <MESSAGE text="Assuming doctype of 'sources' based on Content-Type"  
severity="info" />
</DAS_VALIDATION>

This XML should be easy for anyone to parse.
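
As an illustration, here is a minimal Python sketch that parses a response
with the standard library's ElementTree.  The function name and the shape
of the returned dictionary are choices made for this sketch, not part of
the service.

```python
import xml.etree.ElementTree as ET


def parse_validation(xml_text):
    """Parse a DAS_VALIDATION document into a plain dictionary.

    "doctype" is None when the attribute is absent, which the service
    does when it could not get the document.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "DAS_VALIDATION":
        raise ValueError("unexpected root element %r" % root.tag)
    messages = [
        (message.get("severity"), message.get("text"))
        for message in root.findall("MESSAGE")
    ]
    return {
        "url": root.get("url"),
        "doctype": root.get("doctype"),
        "messages": messages,
    }
```

An empty "messages" list corresponds to the case above where the
validation produced no messages.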

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Thu Oct 26 09:06:33 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 26 Oct 2006 10:06:33 +0100
Subject: [DAS2] stylesheets meeting
Message-ID: <22090f570d5179afc3fe71a0768ed2ec@dalkescientific.com>

I met yesterday afternoon with Andreas Prlic, Andreas Kahari and
Eugene Kulesha to get information about their stylesheet needs.
Ed said he would work more on the spec and this should provide
some relevant information.

We ended up talking about the stylesheet using a sort of CSS
approach.  There are selectors (feature uri, type uri, etc.)
and properties (color, glyph shape, ...).  Some of the properties
inherit/cascade and others don't.  There's nothing new in this;
we talked about it during the 2nd sprint.

The details of inheritance prove tricky.  For example, consider

[ Feature A ]   ---- is of ---> [ Type 1 ]
     |
   contains
     |
[ Feature B ]   ---- is of ---> [ Type 2 ]

where each feature and type has a style sheet.  The property
(say "color") for Feature B is determined first by the stylesheet
for Feature B, then that of Type 2.  If still not present,
does it come from the parent(s) of Feature B and the parent's
type?

Given that this requires correct traversal in the face of multiple
inheritance, I'll now argue "no".  This is an effectively solved
problem in OO programming ("C3 method resolution order", from Dylan
and also used in Python, Perl6, and others), but it's complex enough
to make it unjustifiable here.
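
A rough sketch of the "no" rule, to make it concrete: look at the
feature's own stylesheet, then at its type's stylesheet, and stop.
The dictionaries, the URIs and the function name lookup_property are
all hypothetical; nothing here is in the spec.

```python
def lookup_property(name, feature_uri, type_uri, feature_styles, type_styles):
    """Resolve one style property without consulting parent features.

    feature_styles and type_styles map a URI to a dict of properties.
    The feature's own stylesheet wins, then its type's stylesheet.
    Parent features and their types are deliberately never searched.
    """
    for styles, key in ((feature_styles, feature_uri), (type_styles, type_uri)):
        value = styles.get(key, {}).get(name)
        if value is not None:
            return value
    return None


# Hypothetical data mirroring the diagram: Feature A contains Feature B.
feature_styles = {
    "feature/A": {"height": "20"},
    "feature/B": {"glyph": "box"},
}
type_styles = {
    "type/1": {"color": "red"},
    "type/2": {"color": "blue"},
}

print(lookup_property("glyph", "feature/B", "type/2", feature_styles, type_styles))
print(lookup_property("color", "feature/B", "type/2", feature_styles, type_styles))
print(lookup_property("height", "feature/B", "type/2", feature_styles, type_styles))
```

The last lookup finds nothing because "height" is only set on Feature A,
the parent, which this rule never visits.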

The selectors people wanted are:
    - the feature type, based on its uri
    - the feature itself, based on its uri
    - view type, that is, "2D" vs "3D".  Akin to "screen", "paper",
        in CSS.  Andreas P's DAS-based structure viewer uses
        very different stylings ("ribbon", "vdw") than sequence.

Note: only "and" selections are requested.  There seems to be no
need for selection like "features of type T1 which are descended
from feature F2"

Other possibilities are:
    - selectors based on the type ontology uri
    - application-specific styles (but this is probably handled
         best through properties and not through a selector; on the other
         hand, it would enable workarounds for app-specific bugs)
    - level of detail (but Eugene didn't even know this option existed
         in DAS1, so perhaps it's not needed for DAS2)
    - support for overrides in case of stylesheet conflicts (user
       overrides server overrides application, most recent definition
       overrides previous)

For the view and the application selectors a space separated list
seems reasonable, as
    view="2D 3D" ... color as yellow
meaning to draw the feature in yellow for both 2D and 3D.  Or just
leave out the selector.
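
A sketch of how "and"-only matching with a space separated view list
might work.  The rule is represented here as a plain dict with optional
"feature", "type" and "view" keys; that representation, the URIs and
the name rule_matches are assumptions for illustration only.

```python
def rule_matches(rule, feature_uri, type_uri, view):
    """Return True if every selector present in the rule matches.

    A selector that is left out matches anything.  The "view" selector
    is a space separated list, so "2D 3D" matches either view.
    """
    if "feature" in rule and rule["feature"] != feature_uri:
        return False
    if "type" in rule and rule["type"] != type_uri:
        return False
    if "view" in rule and view not in rule["view"].split():
        return False
    return True


yellow_rule = {"type": "type/2", "view": "2D 3D", "color": "yellow"}

print(rule_matches(yellow_rule, "feature/B", "type/2", "3D"))
print(rule_matches(yellow_rule, "feature/B", "type/1", "3D"))
print(rule_matches({"color": "yellow"}, "feature/B", "type/1", "2D"))
```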

One question was how to find the stylesheet.  They can be listed in
the SOURCES document but I was thinking they could also be listed
in the FEATURES response, as

<FEATURES xmlns="http:// ... /">
   <link rel="das-stylesheet" type="application/x-das-stylesheet"
          href="http://example.com/stylesheet" />


Another question is the format of that selection language.
That was quickly answered: "in XML".

I brought up Ed's comment about (if I understand correctly) making
the shape language a bit more abstract.  For example, in DAS1
there's a GLYPH called "PRIMERS", while the others are names like
"EX" and "ARROW".  The general view is that this level of abstraction
isn't useful.  Andreas Prlic summarized it nicely as (reworded) "the
goal of a stylesheet is to make things concrete".  Though perhaps an
SVG-style set of drawing commands may be useful.

That said, there may be a few things which need a more domain-specific
name.  The example which came up is in color.  EBI has "contig blue"
as a color name.  Are there other colors like that?

On the topic of colors, the desired colors are the CSS color names
(though in-house they also have the X11 names) and the CSS-style
"#" hex notation, as #0FF for cyan.  The #RGB and #RRGGBB forms
are sufficient.  Other CSS variations, like rgb(255, 0, 0) and
rgb(10%, 45%, 82%), are not needed.
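
For what it's worth, handling just the #RGB and #RRGGBB forms is
small.  Here is a sketch, assuming the usual CSS rule that each digit
of the short form is doubled (#0FF means #00FFFF).  Color names are
not handled, and parse_hex_color is only an illustrative name.

```python
import re

# Exactly three or exactly six hex digits after the "#".
_HEX_COLOR = re.compile(r"#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})\Z")


def parse_hex_color(text):
    """Convert '#RGB' or '#RRGGBB' into an (r, g, b) tuple of 0..255 ints."""
    match = _HEX_COLOR.match(text)
    if match is None:
        raise ValueError("not a #RGB or #RRGGBB color: %r" % (text,))
    digits = match.group(1)
    if len(digits) == 3:
        # Short form: double each digit, so "0FF" becomes "00FFFF".
        digits = "".join(ch * 2 for ch in digits)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(parse_hex_color("#0FF"))
print(parse_hex_color("#00FFFF"))
```

Anything else, including the rgb(...) forms, raises ValueError.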

In the meeting I mentioned alpha/opacity values in CSS as #RGBA and
#RRGGBBAA.  In writing these notes up I see that CSS does not support
that syntax.  Alpha is a "wouldn't it be cool if .." feature and not
one which is needed or specifically requested.

I outlined support for more complex font information for DAS2.
Feedback here says that's not important.  There's no desire to change
the font size, style, etc.  Nor is there any desire for super/subscript,
underscore, italics, bold, condensed, etc.

I asked about standardizing the drawing model so there is more
consistency between different viewers.  For example, if there is
a glyph and a piece of text, where is the text drawn in relationship
to the glyph?  Does the height of the glyph include both?  There
was no desire for this.

On the other hand, a current user-specified option is where to
draw the text, which corresponds to a stylesheet override.

What they want is support for plots and color gradients.  See the
"Gradient" and "TilingArray" entries at

http://www.ensembl.org/Homo_sapiens/contigview?conf_script=contigview; 
vc_start=25422500;vc_end=25447499;region=17; 
add_das_source=(name=Gradient+url=http://das.ensembl.org/ 
das+dsn=hydraeuf_00001350+type=ensembl_location_chromosome+stylesheet=y+ 
score=c+fg_merge=a+fg_grades=50+fg_data=l+fg_max=310+fg_min= 
-143+active=1);add_das_source=(name=TilingArray+url=http:// 
das.ensembl.org/ 
das+dsn=hydraeuf_00001350+type=ensembl_location_chromosome+stylesheet=y+ 
score=s+fg_merge=m+active=1

I can think of several ways to handle that.  One is to declare a feature
for the entire chromosome, as

<FEATURE id="asdfasdf">
   <LOC segment="../chromosome1" />
   <extension:tiling_data href="http://somewhere/else" />
</FEATURE>

and viewers can use some agreed upon protocol to get the right data from
somewhere/else.

Another is

<FEATURE id="asdfasdf">
   <LOC segment="../chromosome1" />
   <extension:tiling_data>
R0lGODdhOABkAPMAABq15RaU14za5O3391660P////7//vz+/gAAAOCP4XJgv// 
10AAAAACP4XRM
j+F1ICwAAAAAOABkAAAE/xDJSau9OGtZuv9gKI4kyZVoqorn6r4sAs90S9+pje8x7/e/ 
YEcn3BGL
tyNyply+ms4VNJqTUZPWKzOrfXK70i+4OvaWXdOzJ60usNXvc7w8H9fB925eu7/ 
2qX9RgU6DS4VI
h0WJQotBjT+PPpE8k0ZibSGVOJpYlm5DkJdhblaiMH1ZFKGba6Cmo1gTQ4GNspuvSZS4mJw1 
u229
W5gowae/ 
OwYDBmZUAwIBBAMHsE7OANcA0dPExzMHzwAB4tkC2ybdMALX4uMBAgTmQEXJ7AHW2NnK
I48D9QIGyQhgczdgHzoVBwiwK+dhgMAA5OKtOZiin7h/IL4pJCgvyDd3EtQ/ 
HJBm0Em8Ac4KqqiU
7Nm4aGSqqau3juG5ag8hEpwpoAQ/ 
gQNhFpipsuMPA+rWgfSQEEDPkkWahhP6YUC2kLOQOJyKtR8A
rKS0PsQY4hk8qEK2Xiva8JrNTBRRfMNGVSO5m0je4SMANBtfsGGR2A23ly9buDIFvAPKV8Bh 
xCZR
OlOMEvDEMwYOaPYZd0XmyYv5EngLgt9iiISvjU6JVgg4fAN1ulsGuYjFdkrZUS3dGYVLcY0d 
aybZ
Oqpmy8WH8VaOl3lt5x+KMYMevTegDdiza7cQAQA7
   </extension:tiling_data>
</FEATURE>

with an agreed upon definition of how to interpret the in-line data.
But for the entire genome this could be rather big.

Another is to break it down into parts, as

<FEATURE id="a00001">
  <LOC segment="../chromosome1" range="0:10000" />
   ... data for the first 10,000 bases ...
</FEATURE>
<FEATURE id="a00002">
  <LOC segment="../chromosome1" range="10000:20000" />
   ... data for the second 10,000 bases ...
</FEATURE>
   ...
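
A sketch of how a server might generate those parts, assuming
half-open ranges written as "start:end" as in the example, with the
last part allowed to be shorter.  The function names, the id pattern
and the empty FEATURE body are placeholders, not a proposed format.

```python
from xml.sax.saxutils import quoteattr


def chunk_ranges(length, chunk_size=10000):
    """Yield (start, end) pairs covering 0..length in chunk_size steps."""
    for start in range(0, length, chunk_size):
        yield start, min(start + chunk_size, length)


def feature_stubs(segment, length, chunk_size=10000):
    """Return FEATURE elements, one per chunk, with the data left out."""
    lines = []
    for i, (start, end) in enumerate(chunk_ranges(length, chunk_size), 1):
        lines.append('<FEATURE id="a%05d">' % i)
        lines.append('  <LOC segment=%s range="%d:%d" />'
                     % (quoteattr(segment), start, end))
        lines.append('</FEATURE>')
    return "\n".join(lines)


print(list(chunk_ranges(25000)))
print(feature_stubs("../chromosome1", 20000))
```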

There is already the need for displaying images on the display, but
the current use is to click on a point to bring up an image and not
showing the image as a glyph.  The current solution is a hack,
embedding HTML in the NOTE field.  Only a couple of HTML elements are
supported.

This can easily be moved into a property or a local extension
in DAS2.

If a viewer does not understand one of the extensions, what does
it display?

There are two things in DAS1 which I don't know well enough to ask
reasonable questions.  One is the BUMP, which I think specifies if
multiple glyphs of the same type may overlap.  I think Eugene said
they wanted more control over that, like limiting to at most 5
overlaps.

Another is the GROUP, which in DAS1 was used to merge multiple
feature types into a single track.  Quoting from the DAS1 spec

     The canonical example is the CDS, exons and introns of a
     transcribed gene, which logically belong together.

DAS1 has specialized stylesheet language for depicting groups.
DAS2 uses hierarchical features instead.  Does/can DAS2
do the right thing for depicting those?

I think I've covered the major points.  Please chime in if I've
missed anything relevant.

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Thu Oct 26 13:46:24 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 26 Oct 2006 14:46:24 +0100
Subject: [DAS2] TYPE[@source] -> TYPE[@method]
Message-ID: <4098539a2681ec2c3243e4008dac7855@dalkescientific.com>

I would like to change the existing TYPE attribute of "source"
and have it use a different attribute name.  Its meaning conflicts
with the other uses of "source" in DAS2.

The best alternative is "method" because (I believe) it is supposed
to store the same information as the corresponding DAS1 TYPE attribute.


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Fri Oct 27 19:56:27 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 27 Oct 2006 20:56:27 +0100
Subject: [DAS2] segments and types
Message-ID: <91244d1fb88f2b49939a9d10f15d2b03@dalkescientific.com>

A couple of observations about what I've seen in existing
DAS1 servers.  Nothing here concerns format changes.

There are four different ways to handle segments:
   1) Don't provide segment information
        "Our clients know the segment because of the id
         so they don't need a segments document"
   2) use "size" (pre-DAS 1.0 spec)
   3) use "start"/"stop" (DAS 1.0 spec)
       - with variations, like "0", "0" meaning the length is undefined
           (and even "1", "0", with a size="2", for one server!)
   4) use a "version" field
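
To show what a client ends up doing with variations 2 and 3, here is
a sketch that turns the attributes of a DAS1 segment into a length, or
None when the length is undefined.  It assumes start/stop are 1-based
and inclusive, and it falls back to "size" when start/stop make no
sense; that fallback is my choice for the sketch, not something any
server has promised.  segment_length is an illustrative name.

```python
def segment_length(attrs):
    """Return the segment length from DAS1-style attributes, or None.

    attrs is a dict of attribute strings, e.g. {"start": "1", "stop": "1000"}.
    """
    start = attrs.get("start")
    stop = attrs.get("stop")
    if start is not None and stop is not None:
        start, stop = int(start), int(stop)
        if stop >= start >= 1:
            # Assumes 1-based, inclusive coordinates.
            return stop - start + 1
        # "0", "0" (or "1", "0") carries no usable length; try "size".
    size = attrs.get("size")
    if size is not None and int(size) > 0:
        return int(size)
    return None


print(segment_length({"start": "1", "stop": "1000"}))
print(segment_length({"size": "500"}))
print(segment_length({"start": "0", "stop": "0"}))
print(segment_length({"start": "1", "stop": "0", "size": "2"}))
```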

The last is mostly used for protein sequences, that I've seen.
It's an aspect of #1 ("9pti" means "bovine pancreatic trypsin
inhibitor structure from PDB") as an abstract identifier, with
the version used to make it concrete ("with the update because
the first release had a typo").  I think it can be encapsulated
in the uri scheme we now use because each version gets its own
identifier, and since the client knows all versions there's no
problem.


The folks at EBI/Sanger (what's the correct collective term;
Hinxton? Genome Campus?) know which servers provide which
coordinate systems, so many servers don't provide coordinates.

In some cases, like rabbit, the server will generate about
120,000 segments, one for each scaffold.  It takes quite some time
(a minute or more) to generate the output.  In theory this is
static and can be precomputed by the server.

For my own knowledge, when do people want the complete list
of segments?  When do they want the length?  You, yes, you
there, in front of the computer.  When do you want to
use it?

Let me stress -- this is not a request to change anything.  I
would like to know for my own sake, for writing the documentation,
and for how much emphasis to put on this for the validation.

As another observation, the Sanger/EBI servers also don't
do much with the types document. Some don't even handle the
request.  Eugene said that no one had asked him to add it.
It's there now (thanks Eugene).

I think this is because most of their servers only had a single
type and the solution was "display everything."  They are
running into difficulties with this for a few new servers and
will need type support, and type filter support, soonish.

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Fri Oct 27 20:01:01 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 27 Oct 2006 21:01:01 +0100
Subject: [DAS2] das1->das2 proxy adapter
Message-ID: <a10d51062eb5731fa6f828d9937f1d86@dalkescientific.com>

As part of my effort to make sure DAS2 supports at least what
DAS1 can do, and to simplify migration from DAS1 to DAS2,
I have over this week developed a partial proxy adapter.  It's
a DAS2 server which translates the request then forwards it
to a DAS1 server (including the "segment" and "overlaps"
feature filters).

It takes the results and reformats them into DAS2 format.  I
had used a template approach for this but that proved slow for
large responses.  I rewrote the code so I generate the XML
by hand, which also gives me a chance to put in a lot more
validation code for DAS1.  The goal there is to ensure that
I catch all the extensions people added to DAS1.


					Andrew
					dalke at dalkescientific.com


From ed_erwin at affymetrix.com  Mon Oct 30 22:26:38 2006
From: ed_erwin at affymetrix.com (Ed Erwin)
Date: Mon, 30 Oct 2006 14:26:38 -0800
Subject: [DAS2] das1->das2 proxy adapter
In-Reply-To: <a10d51062eb5731fa6f828d9937f1d86@dalkescientific.com>
References: <a10d51062eb5731fa6f828d9937f1d86@dalkescientific.com>
Message-ID: <45467C1E.1000705@affymetrix.com>

Thanks Andrew,

That sounds really useful.  It might be nice to try to run the current 
NetAffx DAS/1 server through this translation and see what comes out the 
other end.  How would we need to do that?  Do we download your code and 
run it ourselves, or will you have some server that we can pass the data 
through?

Ed


Andrew Dalke wrote:
> As part of my effort to make sure DAS2 supports at least what
> DAS1 can do, and to simplify migration from DAS1 to DAS2,
> I have over this week developed a partial proxy adapter.  It's
> a DAS2 server which translates the request then forwards it
> to a DAS1 server (including the "segment" and "overlaps"
> feature filters).
>
> It takes the results and reformats them into DAS2 format.  I
> had used a template approach for this but that proved slow for
> for large responses.  I rewrote the code so I generate the XML
> by hand, which also gives me a chance to put in a lot more
> validation code for DAS1.  The goal there is to ensure that
> I catch all the extensions people added to DAS1.
>
>
> 					Andrew
> 					dalke at dalkescientific.com
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/das2
>