[DAS2] Notes from DAS/2 code sprint #3, day one, 14 Aug 2006

Mon Aug 14 18:30:35 UTC 2006

$Id: das2-teleconf-2006-08-14.txt,v 1.2 2006/08/14 18:28:47 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed E., Gregg Helt
  CSHL: Lincoln Stein
  Dalke Scientific: Andrew Dalke
  Panther Informatics: Brian Gilman
  UAB: Ann Loraine
  UCLA: Allen Day, Brian O'Connor

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 

Agenda:
* Status reports, including what you want/need to focus on for this
  sprint, progress from last sprint.

Status Reports
---------------

gh: have done writeback work. IGB can create curation, post to
biopackages writeback server, das/2 client can see curations. no
editing yet. client can edit own data models, can't post those
edits. to work on ID mapping stuff: client can't accept newly created
ids from server. currently just holds onto temporary ids.
IGB client has had one or more releases since last sprint.
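
A minimal sketch of the ID-mapping step mentioned above, assuming the
server hands back a mapping from temporary client ids to newly created
ids. The dict layout, the "temp:" prefix and the example.org URIs are
hypothetical and not taken from the spec.

```python
def apply_id_mapping(features, id_map):
    """Replace temporary client ids with server-assigned ids where known."""
    remapped = {}
    for temp_id, feature in features.items():
        # Keep the temporary id if the server returned no mapping for it.
        new_id = id_map.get(temp_id, temp_id)
        updated = dict(feature, id=new_id)
        # Parent references may also point at temporary ids.
        updated["parents"] = [id_map.get(p, p) for p in feature.get("parents", [])]
        remapped[new_id] = updated
    return remapped


# Hypothetical client-side features created during curation.
features = {
    "temp:1": {"id": "temp:1", "parents": []},
    "temp:2": {"id": "temp:2", "parents": ["temp:1"]},
}
# Hypothetical mapping returned by a writeback server.
id_map = {
    "temp:1": "http://example.org/feature/101",
    "temp:2": "http://example.org/feature/102",
}
result = apply_id_mapping(features, id_map)
```

Remapping parent references in the same pass keeps the client's model
consistent once it stops holding onto temporary ids.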

priorities - mainly writeback for client.

ls: continue working on perl client interface to das/2, not functional
at present. need to backout changes since last sprint. das/2 tracks in
gbrowse. About 10hrs needed.

sc: have been working on keeping data on Affymetrix public das servers
up to date, dealing with memory issues caused by increasing amount of
array data to support.

Gregg has new efficient format for modeling exon array features with
lower memory requirements. Will work on getting the das server to use
it. Long-term plan is to remove our das/1 server and just have das/2,
easier to use and maintain. Complete transition will take time though.

Have continued working to automate the pipeline for updating the
affy das servers. Have a new page that lists available data on the
servers, currently manually created but plan to automate.

ad: web dev in python, taught course on that. plan: getting python
server up, to experiment with writeback. updating spec as per a couple
of months ago. 

gh: andrew will make spec a top priority, grant is funding for that.

bg: tasked to take das/2 data and produce set of objects to use within
caCORE system at NCI. Have objects for das/2 data and service. can
retrieve das/2 data from affy server. present in simple web page.
Using java and ruby.

gh: good week to ask questions as you flesh out the impl.

ee: gregg and I will put out new IGB release this week. can work on
style sheets (left over from last time). Or can build a gff3 parser
into IGB (lots of excitement!).

al: two things: demo applications for self and collaborators and das
newbies. retrieve genomic locations for targets of affy probe sets and
then retrieve promoter regions upstream.

gh: promoter data in das2 server?
al: can just say 500bp upstream of gene. not identifying control. Just
retrieve seq to pipe into control analysis.
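
A minimal sketch of the "500bp upstream of gene" calculation, assuming
0-based, half-open coordinates and a '+'/'-' strand flag. The function
name and signature are hypothetical.

```python
def upstream_region(start, end, strand, length=500):
    """Return (start, end) of the region `length` bp upstream of a gene.

    Coordinates are assumed to be 0-based and half-open (start < end).
    """
    if strand == "+":
        # Upstream of a forward-strand gene lies before its start;
        # clamp at the beginning of the sequence.
        return (max(0, start - length), start)
    if strand == "-":
        # Upstream of a reverse-strand gene lies after its end.
        return (end, end + length)
    raise ValueError("strand must be '+' or '-'")


forward = upstream_region(1000, 2000, "+")
reverse = upstream_region(1000, 2000, "-")
```

For reverse-strand genes the result is not clamped to the sequence
length, which the caller would need to know from the segments document.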

Second one: meta analysis, results from diff groups for associated
phenotypes. Input: list of markers, output: annotations associated
with these. Statistical analysis. Ultimately obtain candidate genes
associated with markers. Some preliminary work on obesity that looks
promising. 

[A] Steve will help Ann convert fly probe set ids into genome locations.

Goal is to write something that can do random sampling of gene
annotations. ideal world: das server gets region, returns gene ids and
go ids. Less ideal: just get genes within the peaks (from association
studies). 
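
A rough sketch of the "less ideal" case plus the random sampling goal,
assuming gene annotations have already been retrieved as (sequence,
start, end) tuples. All names and the data layout are hypothetical.

```python
import random


def genes_in_peaks(genes, peaks):
    """Return sorted ids of genes overlapping any peak on the same sequence."""
    hits = []
    for gene_id, (seq, start, end) in genes.items():
        # Half-open intervals overlap when each starts before the other ends.
        if any(seq == pseq and start < pend and pstart < end
               for pseq, pstart, pend in peaks):
            hits.append(gene_id)
    return sorted(hits)


def sample_genes(genes, k, seed=None):
    """Draw k gene ids at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(sorted(genes), k)


genes = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 5000, 6000),
    "geneC": ("chr2", 100, 500),
}
peaks = [("chr1", 400, 800)]
in_peaks = genes_in_peaks(genes, peaks)
background = sample_genes(genes, 2, seed=1)
```

Sampling from the full gene set gives a background to compare against
the genes found within the peaks.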

bo: doing rpm packaging for the mac (tgen). so people can set up das2
server on a mac. update rpm packages with results of work this
week. clean up bug queue on biopackages server impl, bringing it up to
spec. can talk about analysis part of server.

internal hirax client for retrieval of assay data. communication with
server is out of sync.

Spec issues:
------------
gh: want to focus on writeback. wants full xml features rather than
mapping document. 

aday: work on writes as well as deletes. Impl HTTP 413 (request entity
too large): adding this for requests that exceed some size threshold
(10kb, 100kb). if at or below, OK.
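
A minimal sketch of the size check described above, assuming the server
knows the request body length up front. The 100kb limit is one of the
two thresholds mentioned; the function name is hypothetical.

```python
MAX_BODY_BYTES = 100 * 1024  # assumed threshold (100kb)


def status_for_request(content_length, limit=MAX_BODY_BYTES):
    """Return the HTTP status a server might send for a writeback body."""
    if content_length > limit:
        return 413  # Request Entity Too Large
    return 200  # at or below the threshold: OK


at_limit = status_for_request(100 * 1024)
over_limit = status_for_request(100 * 1024 + 1)
```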

gh: need to coord with me on writeback, I focus on client writeback,
you on server. Editing is ok. Deletes are harder.

Other Issues:
-------------
gh: Contact peter good about funding. Extending from 2yr to 3yr. talk
with lincoln and suzi about plans for next grant.

sc: status of bugzilla open bugs on spec?

[A] Someone should go through and update bugzilla list for spec

bg: version field.
gh: not too understandable. at last sprint, two freezes, the version
tells which v of spec freeze the server is using. assumption is that
now the servers are using the most recent spec. If they're not
compliant, please let us know.

affy server: won't give back a list of all features. requires an
overlaps and types restrictor.

biopackages: should be good with latest spec.

bg: sources document, source tag has version. if you do a query like
types, also has version?  No.

ad: sources document: worm 161 (data source). capabilities describe
things like writeback support for v161, but not v160.

bg: that version seems to have different semantics given query. biggest
issue was parsing and populating my object model.

gh: coordinate subelement in version elem. has a version attr. my
client does not deal with coord stuff. meant to make sure that annots
from two servers are referring to same coords, so you can overlay
annots from different servers. my client is using version URIs for
that instead. 

bg: other issue: in order to know what server you're hitting, you have
to know name space of doc, which has base URI. XML base in segments
query. xmlns biodas.org/das2. to have traceability in documents you
receive, you as implementer must track urls, converting relative to
absolute.  can be a problem when hitting 5 different servers.

gh: my obj model (client) has model of server with root url of the das
server, sources objects which has xml base of each source.

bg: you could get back a 404 from xml:base. Perfectly
appropriate. server could put whatever it wants in xml:base. currently
it's the document. 

ad: we're using the xml:base spec, so you can put xml:base on any node
you want to. construct full url by.

gh: in our schema is it clear which attribs are resolved by xml:base?
ad: no.

bg: would like to see one big document with every element, not several
different files. relaxNG isn't best format. would like a w3c XSD that
defines the elements. from coders standpoint, don't have to go and
look at 5 different docs. Have to have multiple windows up, figure out
how they are connected to each other. semantics within each query, who
is calling what.

ad: I gave brian one. using trang to spit it out.
bg: trang is not best xml schema writer. I could work on this.
why do you use relaxNG?
ad: I can read it and understand it. there were good examples.
bg: I can autogenerate code that is in XSD, soap and other wservices
stuff does that for you. Can generate a parser, point it at a uri, get
doc, generate a parser and object model.

ad: parser would break if server returns extra attributes.
In spec there are some extension points. can put any element that is
in a separate namespace. I know how to do that in relaxNG, but not in
XSD.
bg: you just have to add another xmlns. define an extension point with
that namespace.
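
A small sketch of a client that tolerates extension points, assuming
extensions arrive as elements or attributes in a separate namespace.
The namespace URIs and element names below are placeholders for
illustration only.

```python
import xml.etree.ElementTree as ET

CORE_NS = "http://example.org/das2-ns"  # placeholder, not the real namespace


def known_children(elem, ns=CORE_NS):
    """Return child elements in the core namespace, skipping extensions."""
    prefix = "{" + ns + "}"
    return [c for c in elem if c.tag.startswith(prefix)]


def known_attributes(elem):
    """Return unqualified attributes, skipping namespaced extension ones."""
    return {k: v for k, v in elem.attrib.items() if not k.startswith("{")}


doc = """<FEATURES xmlns="http://example.org/das2-ns"
          xmlns:ext="http://example.org/my-extension">
  <FEATURE uri="f1" ext:score="0.9">
    <ext:NOTE>lab-specific annotation</ext:NOTE>
    <LOC segment="chr1" range="100:200"/>
  </FEATURE>
</FEATURES>"""

root = ET.fromstring(doc)
feature = known_children(root)[0]
core_tags = [c.tag for c in known_children(feature)]
core_attrs = known_attributes(feature)
```

A hand-written parser like this skips what it does not recognize; a
generated parser would need the schema to declare the extension point.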

ad: should be able to resolve it into one.

bg: Three items.
1. will ask w3c people about XSD to relaxNG.
2. semantics confusion.
3. xml:base appropriate to supply a 404 if client was dependent on that
   attribute.

ad: version tag is problem if there are duplicates. should be changed
so there are no duplicates. can build parser on rng
bg: it's experimental, alpha s'ware. don't want to use for production.

bg: when you put a relative url inside a xml:base.
ad: resolvable via http, or in absolute url.
gh: if you resolve it up to the top level doc, then use the url of the
document itself. whether clients actually do this, depends on impl.
say to implementers, we could state that the top level document should
resolve to absolute url.  we wanted to say, "Das/2 uses xml:base
spec. period."
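
A minimal sketch of the resolution gh describes: each xml:base is
resolved against the enclosing base, falling back to the URL of the
document itself at the top level. The element names, the "uri"
attribute and the example.org URLs are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_uris(elem, base, attr="uri", out=None):
    """Collect absolute URLs for `attr` on elem and all its descendants."""
    if out is None:
        out = []
    # An xml:base on this node is itself resolved against the inherited
    # base; when it is absent, urljoin returns the inherited base unchanged.
    base = urljoin(base, elem.get(XML_BASE, ""))
    if attr in elem.attrib:
        out.append(urljoin(base, elem.get(attr)))
    for child in elem:
        resolve_uris(child, base, attr, out)
    return out


doc = """<SOURCES xml:base="http://example.org/das2/">
  <SOURCE uri="worm/">
    <VERSION uri="worm161" xml:base="genome/"/>
  </SOURCE>
</SOURCES>"""

document_url = "http://example.org/das2/sources"
uris = resolve_uris(ET.fromstring(doc), document_url)
```

With no xml:base anywhere, every relative URL resolves against
document_url, so a client hitting several servers only has to remember
where each document was fetched from.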

bg: put this in the spec, how you want it to be used.
ad: don't like saying, "we use xml:base with these additional things"
bg: can put off for now.

ls: In my library when I see a url and can't resolve, I fall back to a
hard coded url.