[DAS2] Notes from the weekly DAS/2 teleconference, 23 Oct 2006

Steve Chervitz Steve_Chervitz at affymetrix.com
Tue Oct 24 01:17:46 UTC 2006

Notes from the weekly DAS/2 teleconference, 23 Oct 2006

$Id: das2-teleconf-2006-10-23.txt,v 1.1 2006/10/24 01:15:21 sac Exp $

Note taker: Steve Chervitz

  Affy: Steve Chervitz, Gregg Helt, Ed Erwin
  UCLA: Allen Day
  Dalke Scientific: Andrew Dalke

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 

* Status reports
* Spec discussion

Status Reports
[Note: lots of digressions within status reports]

ad: Have been looking at how Tim Hubbard's group is using das/1.

gh: you are acting as our proxy to the uk group.

gh: andreas has been working on das registry.
ad: yes, in use for both das/1 and 2 servers.
gh: am interested in his work to ping servers to test for live-ness.

gh: see my response on the das discussion list to Brian Gilman's
message about where to find das/2 servers to hit. biopackages was not
giving correct answers for the sources query.
ee: was true two weeks ago.
aday: just a bug.

gh: we need to get both servers fixed. need an automated way to figure
out when servers are down, such as what andreas is doing with das/1.

[A] Andrew will ask Andreas about live-ness test for das/2 as well.
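
[Note: a minimal sketch of what an automated live-ness test for a
das/2 server might look like. It is not Andreas's registry code. It
assumes a server counts as "live" if its sources query returns
well-formed XML whose root element is SOURCES; the function names and
that criterion are assumptions made for illustration.]

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_url(url, timeout=10):
    """Fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def is_alive(sources_url, fetch=fetch_url):
    """Return True if the sources query answers with a SOURCES document.

    `fetch` is injectable so the check can be exercised without a network.
    Any failure (connection error, timeout, malformed XML) counts as "down".
    """
    try:
        root = ET.fromstring(fetch(sources_url))
    except Exception:
        return False
    # ElementTree reports namespaced tags as "{namespace}NAME";
    # compare on the local name only.
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name == "SOURCES"
```

A cron job could call is_alive() for each registered server and mail
the maintainer when it returns False.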

gh: andrew's validator could be scripted to do this, too.
gh: your validator is not running, btw.
ad: server rebooted, not set up to restart automatically.

[A] andrew will see that his validator server is up (done).

gh: affy server is serving up incorrect xml base now. code is set up
to let you specify which xml base to use.

[A] steve will fix xml base on affy server

gh: need to use four arg version:
port, data dir, email for maintainer, xml:base
without xml:base, everything goes screwy

gh: Andrew's validator should catch this since xml:base resolution of
capabilities would resolve to local host which would throw an error.
ad: yes.
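
[Note: a minimal sketch of why a wrong xml:base breaks things.
Relative URIs in a response are resolved against xml:base, so a base
pointing at local host yields capability URIs no outside client can
reach. The URLs and function names below are made up for illustration
and are not taken from either server or from Andrew's validator.]

```python
from urllib.parse import urljoin, urlparse


def resolve(xml_base, relative_uri):
    """Resolve a relative URI from a document against its xml:base."""
    return urljoin(xml_base, relative_uri)


def points_at_localhost(uri):
    """Return True if the URI's host is a loopback name or address."""
    host = urlparse(uri).hostname
    return host in ("localhost", "127.0.0.1", "::1")


# Hypothetical bases: one misconfigured, one publicly reachable.
bad = resolve("http://localhost:9092/das2/", "H_sapiens/features")
good = resolve("http://das.example.org/das2/", "H_sapiens/features")
```

Here points_at_localhost(bad) is True and points_at_localhost(good) is
False, which is the kind of condition a validator could flag.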

gh: Andrew: you are focusing on das now?
ad: this week at EBI, then next month focusing on DAS work.

Status (continued)
gh: this week - distracted by igb issues, also on 1/2 time this month,
so no new das work to report.

ee: gff3 parser, got feedback from lincoln. adding support for
track lines. in several of our parsers there is a difference between
the way igb puts things into tracks and the way the ucsc browser does.
in igb: we put things into tracks based on the source field, so one
file can lead to multiple tiers. in ucsc: everything below a track
line goes into one track. Soln: if there are track lines, do it the
way UCSC does it. Otherwise, do it the igb way. Also worked on
coloring by score (affects gff, ed, and one other). Makes it similar
to ucsc. Assumption is white background. It is rigged to be based on
normal foreground and background colors. white = ucsc
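
[Note: a minimal sketch of the track-line rule described above, not
the actual igb parser. It assumes tab-separated GFF lines with the
source in column 2 and UCSC-style "track name=..." lines. The
"(no track)" bucket for features that appear before the first track
line is an assumption made for illustration.]

```python
import shlex


def track_name(track_line):
    """Pull the name=... value out of a UCSC-style track line."""
    for token in shlex.split(track_line)[1:]:
        key, _, value = token.partition("=")
        if key == "name":
            return value
    return "(unnamed track)"


def is_track_line(line):
    return line.split(None, 1)[0] == "track"


def assign_tracks(lines):
    """Group feature lines into tracks; returns {track name: [lines]}."""
    lines = [ln.rstrip("\n") for ln in lines]
    lines = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    has_track_lines = any(is_track_line(ln) for ln in lines)
    tracks = {}
    current = "(no track)"
    for line in lines:
        if is_track_line(line):
            # UCSC way: everything below this line goes into this track.
            current = track_name(line)
            tracks.setdefault(current, [])
            continue
        if has_track_lines:
            key = current
        else:
            # igb way: one tier per value of the GFF source column.
            key = line.split("\t")[1]
        tracks.setdefault(key, []).append(line)
    return tracks
```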

Also participated in the java "ask the experts" thing: asked about
swing, but they didn't answer.

gh: das2 style sheets?
ee: yes, how free am I to change that spec?
ad: go for it.
ee: don't want spec to say you need to use certain shaped glyphs --
hard to support. just simple things - colors, labels.

ad: asked uk folks about style sheets, they haven't done anything.
gh: gbrowse (lincoln) uses style sheets for das/1.
ee: the stuff in das/2 comes from das/1?
ad: yes, with some changes.
ee: also need to do documentation.

sc: worked on adding data for arrays currently unsupported on the Affy
DAS/1 server to the quickload directory. Got some requests for mouse
assembly aug 2005, RG-U34 rat arrays. Didn't update the annots.txt
yet, so IGB users won't know they are available.

[A] steve will update affy quickload annots.txt

sc: ideally, this should be automated.
gh/ee: could possibly have IGB detect these without needing to update
an extra file. But there was no standard way to read directory

gh: chp files have no genomic location for probe sets, so igb needs
to look this up, likely via das/2 server. primary way for people to
look at results in igb.

sc: did some work on loading exon array annotations into das/2 server
using gregg's new bp2 format (reported last time). Didn't see any
justification for the "probeset with zero probes" error it threw.

[A] gregg and steve will look into bp2 format parsing issues

[A] gregg will put in order for new hardware for affy das server

aday: porting gff3 into writeback server as an alt format for loading
data in. Email thread with Ed - ambiguities in the gff3 specification

[A] Allen will forward email to list.

aday: some communication with lincoln's group, re: validator. I need
to create some sample gff3 docs to make sure validator can parse them
all. will likely be adding support to the parser in bioperl.
Re: alignments: target and source have to be stranded, the length of
one has to be equal to or less than the one it's aligned to, etc.
No work on server uml. hold off until spec is finalized before
committing to uml model. Eg., fasta response not mentioned, broken
hyperlinks, no response from Andrew.

gh: fasta?
aday: referred to but not described. properties response mentioned but
not described. fasta has been replaced by segments, properties
gone. See email on list.

sc: sequence retrieval command used to return fasta format, hence the
fasta request. this has been replaced with segments, but spec not

gh: property capability?
aday: yes. not sure how to proceed yet.

[A] Andrew will fix/respond to issues raised by Allen.

gh: another spec issue: last code sprint I didn't like semantics of
range feature filters, I eventually caved to majority. caveat: I
wanted an optional attrib in types doc to say: "here's a type but you
can or cannot use it in search filter."  I.e., optionally restrict which
types you can use in those filters. If false, it indicates to client
it shouldn't use it as a searchable thing.
ad: if it does anyway?
gh: server could throw an error
ad: or not return any results of that type?
gh: ok
ad: reason for this? is there a better word than 'searchable'?
w/r/t the problem domain.
gh: the reason: I want people to search for 'genscan transcripts' not
'genscan exon' because of how we decided to do range queries.
ad: not sure why someone would want to do this.
gh: it was agreed on at last code sprint...
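
[Note: a minimal sketch of the two server behaviors floated above for
a type flagged as not searchable: throw an error, or return no
results of that type. The attribute name "searchable" is only the
working word from this discussion and is not in the spec. The dict
layout, function name and exception are assumptions made for
illustration.]

```python
class UnsearchableTypeError(Exception):
    """Raised when a filter names a type flagged as not searchable."""


def filter_types(requested, types, strict=True):
    """Return the requested types a server would actually search on.

    `types` maps a type name to its attributes. The flag is optional,
    so a type without it (or an unknown type) is treated as searchable.
    strict=True models "server could throw an error"; strict=False
    models "not return any results of that type".
    """
    allowed = []
    for name in requested:
        if types.get(name, {}).get("searchable", True):
            allowed.append(name)
        elif strict:
            raise UnsearchableTypeError(name)
    return allowed


# Example based on gregg's use case: search transcripts, not exons.
TYPES = {
    "genscan transcript": {},
    "genscan exon": {"searchable": False},
}
```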

[A] gregg will write up use case for range feature filters underlying his

ad: Regarding parent and child bidirectional feature pointers: I'm
willing to say that there's no need to assemble features dynamically
in a streaming approach. so we can get rid of either the parent or the
child link. make it more like gff3 and have the parent link only.

gh: worried about not having full closure. could get parents that don't
know about child. if you have child, do you then have to have every
parent in the response?

ad: I thought we required it? if there is a feature then all features
in that group must be returned.
ee: never a fan of specifying both parents and children. can lead to
mistakes - not compatible. andrew says parsing is more difficult...
ad: when processing input you know when done with a feature
group. this is useful.
if no one impls it why have the overhead?
ee: impl doesn't seem difficult
gh: my impl doesn't catch cycles. still have to do cycle check
regardless of whether it was bi-directional.
ad: can't find a simple algorithm for doing it.
gh: keep children around. check if tree is complete. bidirectionality
allows me to crawl tree.
ad: you don't check for cycles or multiply rooted trees.
ee: just assume there are not such problems.
ad: I don't like bogus data.

ee: my gff3 parsing, I wait until end to assemble things.
ad: as mine does, too. worried that extra fields mean more
possibilities for breaking things. bad data.
ee: should be able to detect bad data.
ad: duplicate links means you can't assemble from one but not
other. most people will not check both.
gh: main justification was to get complete feats before end of doc.
lincoln was the one who wanted this ability.
ad: several ways to do it. eg. contained feature elements with all
children, spanning tree, etc.

ee: catching loops is hard, need to wait till end.
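
[Note: a minimal sketch of an end-of-document check for the two
problems raised above: parent and child links that disagree, and
cycles. It is not anyone's actual implementation. It assumes all
features have been read into memory as a dict of id -> {"parents":
[...], "children": [...]}; that layout and the function name are
assumptions made for illustration. It does not address the streaming
case or multiply rooted trees.]

```python
def check_feature_links(features):
    """Return a list of problem descriptions (empty if none found)."""
    problems = []

    # 1. Every link must be present in both directions.
    for fid, feat in features.items():
        for p in feat["parents"]:
            if p not in features:
                problems.append(f"{fid}: parent {p} missing from document")
            elif fid not in features[p]["children"]:
                problems.append(f"{fid}: parent {p} does not list it as a child")
        for c in feat["children"]:
            if c not in features:
                problems.append(f"{fid}: child {c} missing from document")
            elif fid not in features[c]["parents"]:
                problems.append(f"{fid}: child {c} does not list it as a parent")

    # 2. Cycle check: iterative depth-first walk along parent links.
    #    A node seen again while still on the current path means a cycle.
    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state = dict.fromkeys(features, UNSEEN)
    for start in features:
        if state[start] != UNSEEN:
            continue
        state[start] = ON_PATH
        stack = [(start, iter(features[start]["parents"]))]
        while stack:
            node, parents = stack[-1]
            for p in parents:
                if p not in features:
                    continue  # already reported above
                if state[p] == ON_PATH:
                    problems.append(f"cycle through {p}")
                elif state[p] == UNSEEN:
                    state[p] = ON_PATH
                    stack.append((p, iter(features[p]["parents"])))
                    break
            else:
                # All parents of this node handled; leave the path.
                state[node] = DONE
                stack.pop()
    return problems
```

With parent links only (the gff3 way), step 1 goes away and step 2
still applies unchanged.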

gh: let's wait till lincoln comes in.

[A] Everyone will revisit bidirectional parent-child pointers with Lincoln

Other issues:

ad: Regarding Brian's question from email, the xml document he sent.
gh: my reply: document was otherwise correct but xml:base was wrong.
ad: also: the closing types element at the end was lowercase.

ad: know anything about brian's deadline mentioned by lincoln?
gh: no.

[A] Someone will send Brian pointer to Andrew's validator.

ee: das/2 impl is not usable by igb now. need to fix top-level

gh: we really need an automated way to know when server is having problems.

gh: conf call with Andreas and others in UK? can set up a conf call to
talk about registry. Also coordinate mapping - when one system is the
same as the other. ties into registry stuff.

[A] Gregg/Andrew maybe will have conf call with Andreas while Andrew is in
