[DAS2] Notes from the DAS/2 teleconference for the code sprint, 9 Feb 2006

Thu Feb 9 19:06:03 UTC 2006

$Id: das2-teleconf-2006-02-09.txt,v 1.1 2006/02/09 19:13:39 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed E., Gregg Helt
  CSHL: Lincoln Stein
  Sanger: Thomas Down, Roy
  Sweden: Andrew Dalke
  UC Berkeley: Nomi Harris, Suzi Lewis
  UCLA: Allen Day, Brian O'Connor

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 

[note taker missed the first 5-10 minutes]

Topic: encoded URLs
-------------------

ls: apache bug - unescaped //. must be percent encoded or apache can
run into problems
gh: most people don't bother escaping, we should make this clear in
the spec. every major library has ways of doing this automatically.

[A] update spec to state: contained urls w/in das query urls should be
encoded
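
A minimal sketch of what the action item asks for, assuming a Python
client. The server address and the "type" parameter name are
hypothetical and are not taken from the spec; urllib.parse.quote is the
standard-library call that does the escaping gh refers to.

```python
from urllib.parse import quote, unquote


def embed_url(base_query, param, contained_url):
    """Append contained_url to base_query as a fully percent-encoded parameter."""
    # safe="" makes quote() escape "/" and ":" as well, so no literal "//"
    # from the contained URL survives inside the query string.
    encoded = quote(contained_url, safe="")
    separator = "&" if "?" in base_query else "?"
    return f"{base_query}{separator}{param}={encoded}"


# Hypothetical server, path, and parameter name, for illustration only.
contained = "http://das.example.org/das2/genome/type/exon"
query = embed_url("http://das.example.org/das2/genome/features", "type", contained)
print(query)
```

The receiving side can recover the contained URL with
urllib.parse.unquote.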

Topic: Style sheets
-------------------

ad: see Jan 26/27 email, "style sheet question"
what i described is not the same as what das/1 style sheets supply.
we already have a mechanism
gh: embed ss in types element?
ad: or, new capability or link server for a given source.
gh: prefer this
td: easy to have a single style element
gh: would a types elem have ptr to ss or do you query for the
capability?
ad: if no one's interested we don't have to answer the
question. sounds like no one's interested in style sheets.
gh: we'll keep what you have in the spec for style sheets and move on.
ls: what is it? 
ad: yes. style is embedded in type record. it's now on a per-element
basis. 
ls: ok with this. attributes of types. is there a need for a separate
ss? true it mixes presentation with data model. people will look for the
info they need and can ignore.
ls: transition to separate sheets - visual style id pointing to ss
url. same as with html. instead of 'i' tag moved to font style info.

Topic: Writeback
----------------

gh: discussion in progress in uk. how big a change from current
writeback spec?
ad: spec: server does modification to data. this proposal: client can
now do more stuff with the data.
gh: writeback for client is considerably harder, rarer to impl.
ad: issues: can you still do searches for modified data on server?
ls: building objs from bottom up (children, to parent) so everything
has a url.
ad: each feat has parent and a part.
ls: true. temporary id mechanism, response indicates what the mapping
to local id is.
what happens is: client locks, uploads parents, children with temp
ids, does referential integrity checking, then reports mapping from
temp to local id.
gh: doing http DELETE imposes a constraint
ls: how handling id issue?
gh: you need something to create new, real id
ad: b/c they're in one transaction, server can
ls: delete is a problem because http delete only permits one at a
time. updates a problem too. post that creates new objs allows you to
create multiple new objs at same time, but push and delete only
operate one at time.
ad: at this point don't want to change data model.
ls: so everything will be a post then, under your proposal, for
writeback url.
ad: a single post.
gh: moving from http delete to a
trying to understand how this is a delta model.
ad: only updates things that changed, and listed deletions
ls: fine. writeback, create update and delete sections
td: granularity. not single characters. one feature.
ls: three transactions we previously had, put, post, and delete, and
roll up into a single transaction.
gh: when you send back a feat you've already seen, do you restate all
the xml for that feature, since otherwise it is deleted?
ad: yes.
gh: would like the unit of ro
ls: this achieves per transaction integrity, since you don't have to
do multiple deletes. the lock idea, had to persist over multiple
transactions to allow for that atomicity.
gh: we need to keep lock so curators can guarantee that nothing
changes underneath them.
td: lock corresponds to a db transaction as well.
ls: no one's impl this writeback so there's no friction against
changing it. i'm fine with it. as long as people don't mind we're
losing a cute feature described in a grant.
gh: what does roy or ed g. think?
roy: have been involved in this. this mirrors some features that otter
does. a good idea. deletes and put aren't big winners, if updating
multiple feats and they refer to each other.
roy: whole xml doc is the transaction
ls: if anything doesn't make sense, all requests in the writeback doc
are rolled back.
roy: yes. some error messages to understand what might be going wrong.
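
The single-document writeback discussed above can be sketched roughly
as follows. This is an in-memory toy, not the spec and not any existing
server: the FeatureStore class, the dict layout of the document, and
the "feature/N" id format are all assumptions made for illustration,
and locking is left out. It shows create, update, and delete sections
applied as one transaction, temporary ids mapped to real ids, a
referential integrity check, and a full rollback on any error.

```python
import copy


class WritebackError(Exception):
    """Raised when any part of a writeback document cannot be applied."""


class FeatureStore:
    """Toy in-memory stand-in for a server's feature database (hypothetical)."""

    def __init__(self):
        self.features = {}  # real id -> {"type": ..., "parent": ...}
        self.next_id = 1

    def apply(self, doc):
        """Apply one writeback document atomically; return temp id -> real id."""
        saved = (copy.deepcopy(self.features), self.next_id)
        try:
            id_map = {}
            # Hand out real ids first so children can name parents by temp id.
            for feat in doc.get("create", []):
                id_map[feat["id"]] = f"feature/{self.next_id}"
                self.next_id += 1
            for feat in doc.get("create", []):
                self._store(id_map[feat["id"]], feat, id_map)
            for feat in doc.get("update", []):
                if feat["id"] not in self.features:
                    raise WritebackError(f"cannot update unknown {feat['id']}")
                self._store(feat["id"], feat, id_map)
            for fid in doc.get("delete", []):
                if fid not in self.features:
                    raise WritebackError(f"cannot delete unknown {fid}")
                del self.features[fid]
            # Referential integrity: every parent must still exist.
            for fid, feat in self.features.items():
                parent = feat["parent"]
                if parent is not None and parent not in self.features:
                    raise WritebackError(f"{fid} refers to missing {parent}")
            return id_map
        except WritebackError:
            # Roll back everything in the document, not just the failing part.
            self.features, self.next_id = saved
            raise

    def _store(self, real_id, feat, id_map):
        parent = feat.get("parent")
        self.features[real_id] = {
            "type": feat["type"],
            "parent": id_map.get(parent, parent),  # translate temp parent ids
        }


store = FeatureStore()
mapping = store.apply({
    "create": [
        {"id": "tmp:gene", "type": "gene", "parent": None},
        {"id": "tmp:exon", "type": "exon", "parent": "tmp:gene"},
    ],
})
print(mapping)
```

A document that deletes the gene while its exon still points at it
would raise WritebackError and leave the store exactly as it was.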

gh: splits and merges work too? merging one feature from two, or
splitting one transcript into two.
roy: fits in well. get back two ids of new features. otter give a lot
back in the xml after posting the data.
gh: treats id in feat as a placeholder and it sends a real id back to
you. 
ls: you're given a temporary placeholder then it gives you real id.
might want to put a formal merge and split commands. because in
proposed new system (and old) to split one exon to two, you have to
either delete the original one, or update it to change one boundary
and create a new one. you've lost the ability to keep track of the
original and the two new ones.
ad: feats have place for arbitrary annotations. creational history log
could be maintained.
ls: how upload this to a server. splitting exon into two daughters is
different from deleting and creating two new ones.
ad: no one needs this now; it's for the future.
gh: it's needed now.
ls: splitting genes into two pieces is important. people want to keep
track of this. formal merges and splits permits this tracking.
gh: my take, prefer as few verbs as possible. if we can formally define
splits and merges as combos of deletes and creates, prefer this.
ls: semantically difficult for server to know that a delete followed
by two creates is different than a split.
td: ancestor id on the features can solve this.
ad: haven't heard about this use case. features have place where you
can stick in new data. database can read it to understand history.
gh: like idea of curational track of ancestors. before, people said
we can't require dbs to do this.
td: optional property
ls: could thread it through feature properties.
ad: this version, or for 2.1?
gh: initial write back must support splits and merges.
[broad agreement]
ls: make sure it will work.
what happens when you keep track of ancestors and the ancestor object
disappears?
gh: can't assume a db has identifier for every curation in its past
state.
roy: weakness of the current otter schema, james is working on a
fix. tag a release and go back to genes as of that release.
ls: acedb had this feature to rollback to older versions of gene
model.
aday: the schema we're using has support for previous versions.
roy: tedious. big script, but a good thing to have.
ls: a few hours of more discussion to see what's involved in
supporting tracking curational merges, splits, renames, etc. to make
sure it's the right decision to put it into a curational property of
feature rather than having a formal database merges and split
operations. i'm ok doing it this way if it seems ok.
gh, aday: me too
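
One way to picture the "ancestor id as an optional property" idea from
the discussion above: if created features carry the ids of the features
they came from, a server could tell a split or merge apart from an
unrelated delete plus create. The sketch below is only an illustration
under that assumption. The "ancestors" key, the id strings, and the
classify_edits function are hypothetical and are not part of the spec.

```python
def classify_edits(deleted_ids, created):
    """Label splits and merges in one writeback from optional ancestor ids.

    deleted_ids: ids of features removed in the document.
    created: dicts with an "id" and an optional "ancestors" list.
    """
    deleted = set(deleted_ids)
    events = []

    # Collect, for each ancestor, the new features that claim it.
    children_of = {}
    for feat in created:
        for ancestor in feat.get("ancestors", []):
            children_of.setdefault(ancestor, []).append(feat["id"])

    # A deleted feature claimed by several new features was split.
    for ancestor, children in children_of.items():
        if ancestor in deleted and len(children) > 1:
            events.append(("split", ancestor, tuple(children)))

    # A new feature claiming several deleted features is a merge.
    for feat in created:
        gone = [a for a in feat.get("ancestors", []) if a in deleted]
        if len(gone) > 1:
            events.append(("merge", tuple(gone), feat["id"]))

    return events


split = classify_edits(
    ["exon/1"],
    [
        {"id": "tmp:a", "ancestors": ["exon/1"]},
        {"id": "tmp:b", "ancestors": ["exon/1"]},
    ],
)
print(split)
```

Features created without an "ancestors" entry produce no events, which
matches td's point that the property would be optional.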

Topic: NIH grant proposal
-------------------------

gh: i'm the bottleneck

Status reports:
---------------

gh: igb das client still. checked in code. you can get das2 client in
igb pointing to codesprint das2 server. sources, segments, types. no
features yet. working on this today. should go faster today.
ad: sent email to allen about some things about server that don't
agree with spec. properties
aday: features have no properties associated with them. do we need
valtype or href.
nh: a key with no value doesn't make sense. using 'true' if no value.
aday: ok. but need an agreement on what to do for properties with no
associated value or type
ad: can make it so.
aday: now put in empty string
ad: use for both value and href
aday: can't have both.
ad: what's interpretation if you have both?
can take out href part and have value= empty string
nh: client deals with empty value.
ad: leave it as a string
suzi: uneasy about this.
td: it does have a value, empty string.
suzi: some places where empty string doesn't make sense. data gets
dirty. if you're gonna have a tag-value structure, and may or may not
be a value, it's bad. some things are tag-value, some things just have
a value. it seems ambiguous, no guaranteed behavior.
ad: guarantee is for all keys to have a value. can be empty string.
gh: string or empty string is ok
ad: only used for clients who know what it means.
may have to update apollo
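
The rule the group settled on (every key has a string value, which may
be the empty string) could look like this on the client side. This is a
small sketch only; normalize_properties and the example keys are
hypothetical names, not anything from the spec or from apollo.

```python
def normalize_properties(raw):
    """Return a copy of raw where every key maps to a string, possibly empty."""
    # A key that arrived with no value is kept, with "" as its value.
    return {key: "" if value is None else str(value) for key, value in raw.items()}


props = normalize_properties({"note": "curated", "verified": None})
print(props)
```
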
gh: if we allow arbitrary xml in features, client will have to
remember this xml or it will disappear.
ls: a huge issue w/ apollo in past. when communicating w/ db's that
have extra stuff, in the xml that isn't on client side data model.
suzi: my take, the client should not have to pass it all through.
nh: it forces client to be a complete database
gh: then the delta writeback
ls: works ok for deletes, updates become an issue
ad: you have to deal with text you don't understand.
ls: you have to keep track of tags you don't understand, otherwise
they are deleted.
gh: trade off, simplicity of writeback, and what client has to
remember.
ls: client says: i don't understand it, but i can't delete it.
gh: how hard is it for the client to hold an arbitrary xml chunk?
ls: give it an empty tag to say you want it to go away.
nh: how do you delete things that came in empty and you want to delete
them?
ls: can have attribute="delete me". this creates a burden on server
side. 
[client folks like this..]
decided to keep everything you know and send it back. round trip
it.
ad: client can throw away what it wants. can go back to server
ls: boomerang.
gh: a variety of ways to make sure the data gets stored.
roy: will be in feature. just hold a pointer to it.
suzi: hard for apollo. passive round tripping is fine.. difficulty is
with deletes. ignoring stuff, don't know what it is. delete a
transcript or whole gene. some of that stuff you don't know what it
is, describes a mutant phenotype. you deleted from genomic record, but
there's other data that shouldn't be deleted. client would have to be
fully cognizant of it, beyond genome sequence features. client now
needs to model all the other data too.
ls: difficult to understand how a client could deal with it.
ad: the xml is just an opaque chunk.
why can't client send back full record?
suzi: won't solve the full problem. if annotator said delete it
gh: client says delete that feature. it won't pass back any stuff
underneath the feature. some stuff underneath it that shouldn't be
deleted.
ad: that's what you have back ups for.
suzi: beyond this.
to deal with this, we made deletes be more atomic. had to be handled
at server side, otherwise, we have to put all that knowledge into
client. gets tied to a particular group.
ad: knowledge of what?
suzi: additional information
if you delete whole thing at top, any pass through data is also gone.
gh: not hard on client, just what does the server do with that?
suzi: this is why it belongs on server side. knows what matters and
what doesn't matter. if you don't want clients tied to a particular
db. that solution will be inadequate. we had to put the info on the
client and make the operations as fine grained as we could.
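
The passive round tripping that the group agreed on can be sketched as
below, assuming a Python client that keeps the parsed XML tree and
edits only the elements it understands. The FEATURE, LOC, and PHENOTYPE
element names and their attributes are made up for illustration and do
not come from the DAS/2 schema. The sketch covers updates only; it does
not address suzi's concern about what a server should do with
pass-through data when a whole feature is deleted.

```python
import xml.etree.ElementTree as ET


def move_feature(feature_xml, new_start, new_end):
    """Change a feature's location and return the XML with all else intact."""
    feature = ET.fromstring(feature_xml)
    loc = feature.find("LOC")  # the only child this client understands
    if loc is None:
        raise ValueError("feature has no LOC element")
    loc.set("start", str(new_start))
    loc.set("end", str(new_end))
    # Children the client does not understand stay in the tree untouched,
    # so they are sent back to the server as they arrived.
    return ET.tostring(feature, encoding="unicode")


original = (
    '<FEATURE id="feature/7">'
    '<LOC start="100" end="200"/>'
    '<PHENOTYPE note="mutant"/>'
    '</FEATURE>'
)
updated = move_feature(original, 120, 200)
print(updated)
```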

ap: writeback issues have been discussed. suggest to take this up
tomorrow. 
ad: could someone write up why a client couldn't just track the things
that it wanted? then we can consider.

Status reports, cont'd
----------------------

roy: zmap client. can get sources and types from server. parsing it
creating internal objects. can't draw features yet. long discussion
about write back today.
ad: validator stuff
td: talking about writeback.
ap: working on registry. first das/2 server. distinguish between das/1
and das/2 via accession points.

brian: rpm build for allen's server. will post today at
biopackages.net
suzi: spoke to chris about web services for ontology. he will talk
with allen. thing about ids to deal with. also, if we do a web service
that isn't das like, it should be doable. should be able to get the
terms. also, if we want to have stop codon replacement, you also have
to say what position, what it's replaced with (uridine). how is this
done in das spec?
gh: can you post to the list?
suzi: yes. 
aday: will raise writeback issues as well.
suzi: small point mutations, indel, substitution (base and position)
aday: nearly got apache config file done, impl new std error
documents, 300, with error document.
nh: more apollo client progress. haven't dealt with types yet.
ee: igb improvements.
sc: pipeline for populating affy das server with array data. completed
pipeline for exon array design data.