[DAS2] Notes from DAS/2 code sprint #2, day two, 14 Mar 2006

Wed Mar 15 16:37:51 UTC 2006

$Id: das2-teleconf-2006-03-14.txt,v 1.1 2006/03/15 16:47:50 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed E.
  Sanger: Andreas Prlic, Thomas Down
  Dalke Scientific: Andrew Dalke (at Affy)
  UC Berkeley: Nomi Harris (at Affy)
  UCLA: Allen Day (at Affy)

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 

Agenda:
----------
See Andrew's email. Here's a summary.

* segment ids
* coord systems and how to handle

[Gregg is out, Andrew is leading the teleconf.]

ap: the changes ad proposed re: coords and capabilities are, i think,
not really needed. the question is whether annotation servers need to
provide a link back to reference servers. If the link is apparent from, c

ad: summary: moving coord element inside capabilities element (one
part of 4 things mentioned). the reason: coords and capabilities are
tied together. They refer to the same thing. E.g., you need to know which
of the segments are tied to which coords.

ap: annotation server doesn't need to, it can find the reference server
by the coordinates.

ad: if you have local coords, and you want to point to a local server,
how do you specify that this segment corresponds to these coords?
ap: you should have a reference server that speaks the coords you want
to annotate.
td: if you have your own assembly you have your own coord system,
ad: yes, and i set up my own ref server for it.

ad: if I have mult coords, won't I have multiple segments? isn't there
a 1:1 relationship between coords and segments?
ap: I think many:many.... wait
td: each segment is a member of one coord system, a coord system
contains many segments.
ad: andreas has features, some annotated on scaffold, some annotated
on chromosome. So, you need the ability to have two segments provided
by server.
ap: coords should contain segment capabilities, i.e., the other way
around.
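
The relationship td describes (each segment belongs to exactly one
coord system; a coord system contains many segments) can be sketched
roughly as below. This is only an illustration: the class names, field
names, and example.org URIs are invented here and are not taken from
the DAS/2 spec.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Segment:
    segment_id: str   # hypothetical identifier for the segment
    coords_uri: str   # the single coord system this segment belongs to


@dataclass
class CoordSystem:
    uri: str          # hypothetical unique id for the coord system
    segments: Dict[str, Segment] = field(default_factory=dict)

    def add_segment(self, segment_id: str) -> Segment:
        """Create a segment that is a member of this coord system only."""
        seg = Segment(segment_id, self.uri)
        self.segments[segment_id] = seg
        return seg


# A server with features on both chromosomes and scaffolds would expose
# two coord systems, each with its own segments (URIs are made up).
chromosomes = CoordSystem("http://example.org/coords/chromosome")
scaffolds = CoordSystem("http://example.org/coords/scaffold")
chr22 = chromosomes.add_segment("chr22")
scaffold_1 = scaffolds.add_segment("scaffold_1")
```

Storing the coord system uri on the segment is what lets a client tell
which of several coord systems a given segment came from.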

ad: proposing to have a uri to id the coords, capability should have
a field to say the coord uri is 'this'
mailed out the idea to have a unique identifier for coords.
keep them separate now, have the ability
sc: optional?
ad: yes only needed if you have mult coord systems.

ad: like features and feature type. segment is saying it's of that
type

ad: will add optional id to the capability, so that you can figure out
what the segments are.

in proposal this am,
1) timestamp to coord info (optional) -- use case: sort by most recent
coord system for a given build.
2) unique id for the coord (

ap: this will be useful for searches as well. can request only results
from a particular coord system. (see email discussion this am)
td: server alignment btwn human and mouse, you can say whether you are
referencing human or mouse just by specifying coord system.
ad: also two different human assemblies.

ap: I have to leave now.

Topic: Segment identifiers email

td: segment had a name and url form id so that feature server doesn't
have to give a concrete url for the seq of chrm22, nice for
lightweight server sans sequence. getting rid of ability to reference
sequence by name instead of url breaks this. You need a concrete url
if you just want to serve features on a sequence.
You end up having to rewrite urls rather than saying this feature is
attached to chr22 in xxx coord system.

ad: one thing gregg and I discussed, the fact that url is by itself an
opaque id, you have to resolve it someway, http, or something else
too. You can use any mechanism you want to turn the name you want.
ad: in segments list, if you have your own local copy. Your segments
section says my local copy is
td: you need a segments capability. I can't have a server that uses
only features capabilities.
ad: if you have your own segments.
if all your features are described using standard names/ids, no you
don't need a segments capability.
td: ok, my assembly is human build 35, and feature lives on chr22.
ad: yes. every place you see optional alias attribute link back to
primary id of segment, that id can be anything.
td: arbitrary string scoped by the coord system, which now has a uri
id string.
ad: yes. and it's also globally unique, not scoped just by coord
system .

td: I don't see what's wrong with ....
ad: we were discussing yesterday having diff names for the same
chromosome. chrI vs chr1.
td: that can be addressed using aliases
ad: alias of field provides a synonym table for what you map locally
to a global id. 
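
A minimal sketch of the synonym-table idea, assuming the aliases are
held in a plain dictionary. The global id below is a made-up
example.org URI and the function name is hypothetical; the notes do
not say how a real server stores its aliases.

```python
# Hypothetical globally unique id for one chromosome.
CHR1_GLOBAL_ID = "http://example.org/coords/some-build/chrI"

# Local names used by different groups, all mapped to the same global id.
ALIASES = {
    "chrI": CHR1_GLOBAL_ID,
    "chr1": CHR1_GLOBAL_ID,
}


def resolve_segment(name, aliases=ALIASES):
    """Return the global id for a local name.

    Names with no alias entry are returned unchanged, on the assumption
    that they are already global ids.
    """
    return aliases.get(name, name)
```

With this table, resolve_segment("chrI") and resolve_segment("chr1")
give the same global id, so features described with either local name
end up on the same segment.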
td: you're saying the global ids have to be universally unique even
when taken out of the coord system
ad: yes. feat server providing feats from two diff coord systems, you
need a way to distinguish one segment from another segment, in a
global sense.
td: I don't totally understand cases involving mult coord systems. How
do I find out which of three possible coord systems a given segment
came from?
ad:
td: all clones in embl system. could be a lot.
ad: your client will have to know how to look up the right one.
if you have one coord system that has all your clones, you have to do
the look up anyway to know where to display the features from the
various clones.
td: suppose looking for gene names: you get back a feature on clone
AL19823. I want to start from that feature and build a meaningful
display. So  I need to work out what coord system this feature lives
on. If my server speaks multiple coord systems, one for all embl
accessions and gi ids, I have to test for membership in the set.
My server could put the coord system id on each feature. Would be
optional for servers only attached to one coord system.

ad: right. Andreas also wants coord uri part of feature filter. Could
add it to the feature filter.
td: yes. give me all genes called xyz. Do you always want to limit to
one coord system?
ad: I see your point. Having to search
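
The two ideas in this exchange (an optional coord system id on each
feature, and an optional coord restriction on a search) can be
sketched as follows. The Feature fields, the function name, and the
example.org URIs are assumptions made for illustration, not the DAS/2
feature format or filter syntax.

```python
from dataclasses import dataclass
from typing import List, Optional

# Made-up coord system ids.
EMBL_COORDS = "http://example.org/coords/embl-clones"
HUMAN_COORDS = "http://example.org/coords/human-build-35"


@dataclass(frozen=True)
class Feature:
    name: str
    segment: str
    start: int
    end: int
    # Optional: a server attached to only one coord system could omit it.
    coords_uri: Optional[str] = None


def find_features(features: List[Feature], name: str,
                  coords_uri: Optional[str] = None) -> List[Feature]:
    """Return features with the given name.

    If coords_uri is given, keep only features from that coord system;
    if it is None, search across all coord systems.
    """
    return [
        f for f in features
        if f.name == name and (coords_uri is None or f.coords_uri == coords_uri)
    ]


features = [
    Feature("xyz", "AL19823", 100, 900, EMBL_COORDS),
    Feature("xyz", "chr22", 5000, 5800, HUMAN_COORDS),
    Feature("abc", "chr22", 10, 20, HUMAN_COORDS),
]
all_xyz = find_features(features, "xyz")
embl_xyz = find_features(features, "xyz", coords_uri=EMBL_COORDS)
```

Because each returned feature carries its coord system id, the client
does not have to test the segment for membership in every coord system
before building a display.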

ad: New thing called title for humans to read.
Also proposed inside, overlaps, contains so they don't

td: to avoid a nastiness in query lang, I like that. Removes an issue
that scares me about having urls in the query. pathological case:
client has a good reason to retrieve features on parts of two
sequences that have lots of features, e.g., all cutting sites for
all restriction enzymes. Very high density. If the genome is made of
10kb clones, the user may want to get features that span clone
boundaries. server may do lots of extra fetching that's not really
necessary. 
ad: it's the number of requests that's the issue, same amount of
info. so it's an issue of network overhead.

advantage: makes servers easier to implement since it eliminates
searching partial regions. Some use cases exist, but can be done on
the client side. 
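
A rough sketch of doing the range filtering on the client side,
assuming features and regions are plain (start, end) tuples with
half-open coordinates. The coordinate convention and the exact
semantics of the three predicates are assumptions here; the notes only
name them (inside, overlaps, contains).

```python
def overlaps(feature, region):
    """True if the feature shares at least one position with the region."""
    (fs, fe), (rs, re_) = feature, region
    return fs < re_ and rs < fe


def inside(feature, region):
    """True if the feature lies entirely within the region."""
    (fs, fe), (rs, re_) = feature, region
    return rs <= fs and fe <= re_


def contains(feature, region):
    """True if the feature entirely covers the region."""
    (fs, fe), (rs, re_) = feature, region
    return fs <= rs and re_ <= fe


def filter_features(features, region, predicate):
    """Filter an already-fetched feature list on the client."""
    return [f for f in features if predicate(f, region)]


# Features fetched for a whole segment, then narrowed locally.
fetched = [(0, 100), (50, 150), (60, 80), (200, 300), (0, 400)]
region = (40, 120)
hits_overlaps = filter_features(fetched, region, overlaps)
hits_inside = filter_features(fetched, region, inside)
hits_contains = filter_features(fetched, region, contains)
```

The cost is that the client fetches more than it needs and discards
the rest, which is the trade-off discussed above.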
td: seems a shame to lose the capability, but not a huge loss.
the alternative would be to say that you parse the query string left
to right. overlaps=5000-10000; ... puts limits on how server parses.
ad: or we propose a new query interface
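
One possible reading of the left-to-right alternative, sketched under
heavy assumptions: the query is a list of key=value pairs separated by
';', and each range filter applies to the most recent segment named
before it. The "segment" key and the whole syntax are hypothetical and
are not the DAS/2 query language.

```python
RANGE_KEYS = ("overlaps", "inside", "contains")


def parse_in_order(query):
    """Split a query into (key, value) pairs, preserving their order."""
    pairs = []
    for part in query.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        pairs.append((key, value))
    return pairs


def parse_range(value):
    """Turn '5000-10000' into (5000, 10000)."""
    start, _, end = value.partition("-")
    return int(start), int(end)


def group_by_segment(pairs):
    """Attach each range filter to the segment that precedes it."""
    grouped = []
    for key, value in pairs:
        if key == "segment":
            grouped.append((value, []))
        elif key in RANGE_KEYS and grouped:
            grouped[-1][1].append((key, parse_range(value)))
    return grouped


query = "segment=chr22;overlaps=5000-10000;segment=chr21;inside=1-2000"
grouped = group_by_segment(parse_in_order(query))
```

This shows the constraint td mentions: the server can no longer treat
the query as an unordered set of parameters, because the meaning of a
range depends on its position.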

ad: this sounds like I should go ahead with segment ids.

ad: using uri vs id (internal link id vs link to something else)
td: seems to be enough impl-breaking changes, not a big argument
either way.
ad: enough changes going on now, but probably won't change much more.
td: if you want to make a small change that's quick to implement, no
objections. Also fine with using id, since all dom stuff about id
refers to things marked id in the scheme, not attrib names. Changing
to uri, won't cause much effect.
nh: like a global replace.
ad: in general there's been lots of changes, want people to get
clients/servers going.
ad: spec writing is going slow, would like to show examples that
people can use.
nh: feature parsing can use canned examples.
aday: would prefer to have spec written, trouble with ambiguity
ad: you need to impl before you can figure out how to write it.
nh: server people need full spec, client can use examples

ad: previous slow going since lincoln had little time to work on it.
aday: would like a snapshot, version number. impl after last code
sprint.
nh: don't have time to work on das after this. will just break when/if
allen's server changes.
This just happens when working on developing spec.

ad: the idea is to get code and examples up today.
td: waiting for spec to stabilize a bit.
ad: changes made this week won't have major impact on people's work in
UK?
td: no.

nh: can you provide a changes document?
ad: those would be my emails. a pain.

nh: registry, I was surprised to find versioned sources in it. won't
there be an explosion of org x versions x server? It provides
convenience
td: as long as it's not thousands and thousands of data sources, it
won't be a problem.
ad: 2k per server x 1000 servers, = 2M
td: if it gets to point where retrieving whole registry is a problem,
we could add capability to restrict what you get.
nh: need human-friendly title for each data source.
would be nice if that explained more to the person who was choosing
that data source (e.g., date).
ad: Andreas' system (web-based) has a description.

Status reports
--------------

sc: adding more data to affy das server, working on building
das2_server code recently checked into genoviz code base by
gregg. Then will work on setting it up on a publicly accessible
server at affy.

ee: will be working on style sheets in igb.

aday: spent time on setting up dev environment since laptop died
yesterday. 

bo: got food poisoning -- bad pizza?, was up till 4am.

td: not much das-related stuff yet.