[DAS2] Notes from DAS/2 code sprint #2, day five, 17 Mar 2006

Mon Mar 20 17:27:59 UTC 2006

Hi Folks,

I will join the DAS2 call a little late today (no more than 10 min). I'm 
assuming that we're on?

Lincoln

On Sunday 19 March 2006 23:54, Steve Chervitz wrote:
> Notes from DAS/2 code sprint #2, day five, 17 Mar 2006
>
> $Id: das2-teleconf-2006-03-17.txt,v 1.2 2006/03/20 05:05:22 sac Exp $
>
> Note taker: Steve Chervitz
>
> Attendees:
>   Affy: Steve Chervitz, Ed E., Gregg Helt
>   Dalke Scientific: Andrew Dalke (at Affy)
>   UCLA: Allen Day, Brian O'Connor (at Affy)
>
> Action items are flagged with '[A]'.
>
> These notes are checked into the biodas.org CVS repository at
> das/das2/notes/2006. Instructions on how to access this
> repository are at http://biodas.org
>
> DISCLAIMER:
> The note taker aims for completeness and accuracy, but these goals are
> not always achievable, given the desire to get the notes out with a
> rapid turnaround. So don't consider these notes as complete minutes
> from the meeting, but rather abbreviated, summarized versions of what
> was discussed. There may be errors of commission and omission.
> Participants are welcome to post comments and/or corrections to these
> as they see fit.
>
> Agenda:
>
> * Status reports
> * Writeback progress
>
>
> Status reports:
> ---------------
>
> gh: This is the last mtg of code sprint. For the status reports, focus
> on where you are at and what you are hoping to accomplish post-sprint.
>
> gh: working on version of affy server that impls das/2 v300 spec for
> all xml responses. sample responses passed andrew's validation.
> steve rolled it out to public server.
>
> updated igb client to handle v300 xml.
> worked more on server to impl v300 query syntax using full uri for
> type segment, segment separate from overlaps and inside.
> only impls a subset of the feature query. requires one and only one
> segment, type, insides.
>
> hoping to do for rest of sprint and after:
> 1. supporting name feat filters in igb client
> 2. remove restrictions from the server
> 3. making sure new version of server gets rolled out,
> 4. roll out jar for this version of igb. maybe put on genoviz sf site for
> testing purposes.
>
> bo: looked at xml docs that andrew checked in, updating ucla templates
> on server, not rolled out to biopackages.net, waiting to make rpm,
> hoping to do code cleanup in igb.
> getting andrew's help running validator on local copy of server.
>
> gh: igb would like to support v300, but one server is v200+ (ucla),
> one at v300 (affy) complicates things. so getting your server good to
> go would be my priority.
>
> bo: code clean up involves assay and ontology interface.
>
> gh: we're planning an igb release at end of march. as long as the code
> is clean by then it's ok.
>
> aday: code cleanup, things removed from protocol. exporting data
> matrices from assay part of server.
> validate sources document w/r/t v300 validator. work with brian to
> make sure everything is updated to v300. probably working on filter
> query, since we now treat things as names not full uri's.
>
> ad: what extra config info do you need in server for that? can you get
> it from the http headers?
> gh: mine is being promiscuous, just name of type will work. might give
> the wrong thing back, but for data we're serving back now, it can't be
> wrong.
>
> ad: how much trouble does the uri handling cause for you?
>
> gh: has to be full uri of the type, doing otherwise is not an option
> (in the spec).
> ad: you could just use name internally, then put together full uri
> when you go to the outside world.
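
Andrew's suggestion above (bare names internally, full URIs at the
boundary) could be sketched roughly as follows. This is only an
illustration in Python: TYPE_BASE and both function names are made up,
and the real base URI would come from the server's own configuration.

```python
from urllib.parse import quote, unquote

# Hypothetical base URI under which this server publishes its types.
TYPE_BASE = "http://example.org/das2/genome/type/"

def type_uri(name: str) -> str:
    """Expand an internal type name to the full URI sent to clients."""
    return TYPE_BASE + quote(name, safe="")

def type_name(uri: str) -> str:
    """Reduce a full type URI from a query back to the internal name."""
    if not uri.startswith(TYPE_BASE):
        raise ValueError(f"not a type on this server: {uri}")
    return unquote(uri[len(TYPE_BASE):])
```

A query arriving with a full type URI is reduced with type_name() on the
way in, and type_uri() is applied to every type on the way out.
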
>
> ad: I updated comments in schema definitions, updated query lang
> spec. string searches are substring searches not word-substring
> searches.
> abc = whole field must be equal
> *abc = suffix match
> abc* = prefix match
>
> previously said it was word match, but that's too complicated on
> server.
> worked with gregg to pin down what inside search means.
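
The string-search rules listed above are small enough to sketch. The
function name is hypothetical, case-sensitive comparison is an
assumption, and so is the "*abc*" (substring anywhere) form, which the
notes imply but do not list.

```python
def field_matches(pattern: str, value: str) -> bool:
    """Match a field value against a query pattern.

    abc   -> whole field must be equal
    *abc  -> suffix match
    abc*  -> prefix match
    *abc* -> substring match (assumed, not listed in the notes)
    """
    leading = pattern.startswith("*")
    trailing = pattern.endswith("*")
    if leading and trailing and len(pattern) >= 2:
        return pattern[1:-1] in value
    if leading:
        return value.endswith(pattern[1:])
    if trailing:
        return value.startswith(pattern[:-1])
    return value == pattern
```

Note there is no word splitting anywhere, which is the simplification
Andrew describes: the server only needs equality, prefix, suffix and
substring tests on the whole field.
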
>
> I'm thinking about the possibility of a validating proxy server,
> configure das client to go through proxy before outside world, the
> server would sniff everything going by.
> Support for proxies can enable lots of things w/o needing
> additional config for each client.
>
> gh: how do you do proxy in java? i.e., redirect all network calls to a
> proxy.
> bo: there's a way to set proxy options via the system object in the
> java vm. can show you some examples of this.
>
> aday: performance.
> gh: current webstart based igb works with the existing public das/2
> server, [comment pertaining to: the new version of igb and a new
> version of the affy das/2 server].
>
> ad: when will we get reference names from lincoln?
> gh: should happen yesterday. poke him about this.
> would be really nice to be able to overlay annotations!
>
> The current version of igb can turn off v300 options, and then it can
> load stuff from the ucla server. The version of igb in cvs now can hit
> both biopackages.net and affy server in the dmz. and there's
> hardwiring to get things to overlay. temporary patch.
>
> ee: two things:
> 1. style sheets. info from andrew yesterday. looking over that. will
>    discuss questions w/ andrew.
> 2. making sure that when we do a new release of igb in a couple of
>    weeks (when I'm not here) that it will go smoothly. go over w/
>    gregg, steve. lots of testing.
> made changes in parser code, should still work.
>
> sc: I updated jars for das/1 not das/2 on netaffxdas.affymetrix.com.
> ee: it's the das/1 I'm most concerned about.
>
> sc: installed and updated gregg's new das/2 server on a publicly
> accessible machine (separate box from the production das/1 and das/2
> servers on netaffxdas.affymetrix.com).
> Also spent time loading data for new affy arrays (mouse, rat
> exons). this required lots of memory, had to disable support for some
> other arrays. [gregg's das servers load all annotations into memory at
> start up, hence the big memory requirements for arrays with lots of
> probe sets.]
>
> [A] gregg optimize affy das server memory reqts for exon arrays.
>
> gh: we've gotten a lot done this week. I think we have a stable spec.
>
> gh: serving alignments, no cigars, but blat alignment to genome as
> coords on mrna and coords on the genome. igb doesn't use it yet, but
> it's there.
> ad: xid in region elements.
> gh: we haven't exercised the xids. so 'link' in das/1 is equivalent to
> xid in das/2?
> ad: yes, i believe.
> gh: if you have links in das/1. without links it can build links from
> feature id using a template. This is used for building links from
> within IGB back to netaffx, for example.
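
The template fallback Gregg describes might look something like this.
The template URL, the "{id}" placeholder and the function name are all
invented for illustration; the notes do not say what the real template
looks like.

```python
from urllib.parse import quote

# Hypothetical template; "{id}" marks where the feature id is substituted.
LINK_TEMPLATE = "http://example.org/probeset?id={id}"

def feature_link(feature_id, explicit_link=None, template=LINK_TEMPLATE):
    """Prefer a link supplied with the feature; otherwise build one."""
    if explicit_link:
        return explicit_link
    # Percent-encode the id so it is safe inside a query string.
    return template.replace("{id}", quote(feature_id, safe=""))
```
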
>
> Topic: Writebacks
> -----------------
>
> gh: writebacks haven't been mentioned at all this week.
> ad: we need people committed to writing a server to implement it.
> gh: we decided that since ed griffith would be working on it at
> Sanger, we wouldn't worry about it for ucla server.
> bo: we started prototyping. locking mechanism. persisting part of a
> mage document. the spec changed after that. andrew's delta model. a
> little different from what we were prototyping.
> actual persistence will be done in the assay portion of our server.
> gh: grant focuses on write back for genome portion, and this was a big
> chunk of the grant. ends in end of may or june.
>
> ad: delta model was: here's a list of add, delete, modify in one
> document. An issue was if you change an existing record, do you give
> it a new identifier?
> gh: you never modify something with an existing id, just make a new
> one, new id, with a pointer back to old one. Ed Griffith said this a
> month ago. I like this idea. but told we cannot make this requirement
> on the database. but very few dbs will be writeback, so it's not
> affecting all servers
>
> ad: making new uris, client has to know the new uri for the old
> one. needs to return a mapping document.
> if network crashes partway through, client won't know what the mapping
> is and will be lost.
> gh: server doesn't know if client got it. it could act(?) back.
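
The "never modify, always mint a new id" model with a mapping document
can be sketched as below. This is a toy in-memory version under my own
assumptions: the class, the "feature/N" id scheme and the record layout
are hypothetical, not anything from the spec.

```python
import itertools

class FeatureStore:
    """Toy store in which an edit never overwrites an existing record."""

    def __init__(self):
        self._counter = itertools.count(1)
        self.records = {}  # id -> {"data": ..., "previous": old id or None}

    def _new_id(self):
        return f"feature/{next(self._counter)}"

    def add(self, data):
        fid = self._new_id()
        self.records[fid] = {"data": data, "previous": None}
        return fid

    def modify(self, changes):
        """changes maps old id -> new data; returns old id -> new id."""
        missing = [old for old in changes if old not in self.records]
        if missing:
            raise KeyError(f"unknown feature ids: {missing}")
        mapping = {}
        for old_id, data in changes.items():
            new_id = self._new_id()
            # The new record points back at the one it supersedes.
            self.records[new_id] = {"data": data, "previous": old_id}
            mapping[old_id] = new_id
        return mapping
```

The dict returned by modify() plays the role of the mapping document.
Because each new record keeps a "previous" pointer, a client that lost
the response could in principle recover the mapping by re-reading the
region, which is close to what Ed and Andrew suggest further down.
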
>
> gh: if a response from http server dies, server has no way to know.
> ad: There could be a proxy in the middle, or isp's proxy server. The
> server sent it successfully to the proxy, but never made it to the
> client.
>
> gh: how is this dealt with for commits into relational dbs? same thing
> applies
> ad: don't know
> ee: could ask for everything in this region.
> ad: have a new element that says 'i used to be this'.
> bo: you do an insert in a db, to get last pk that was issued. client
> talks back to server, give me last feature uri that was provisioned on
> my connection. so the client is in control.
>
> sc: it's up to client to get confirmation from server. If it failed to
> get the response after sending in the modification request, it could
> request that the server send it again.
>
> ad: (drawing on whiteboard) two stage strategy, get a transaction state.
>
>      post "get transaction url"
>     <---------------
>     post (put?) to transaction URL
>     ------------->
>     can do multiple (if identical)
>        ---------->
>        ---------->
>     Get was successful and here's transformation info
>     <---------------
>
> ad: server can hold transformation info for some timespan in case
> client needs to re-fetch.
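
For reference, the two-stage exchange from the whiteboard can be modeled
without any networking. Everything below is a hypothetical sketch: the
class, the URL shape and the placeholder _apply() are mine, and the
expiry of held results after "some timespan" is left out.

```python
import uuid

class WritebackServer:
    """Toy in-memory stand-in for a server; every name here is hypothetical."""

    def __init__(self):
        self.transactions = {}  # transaction URL -> {"payload", "result"}
        self.applied = 0        # how many times edits were really applied

    def open_transaction(self):
        """Stage 1: hand the client a fresh transaction URL."""
        url = f"/transaction/{uuid.uuid4().hex}"
        self.transactions[url] = {"payload": None, "result": None}
        return url

    def submit(self, url, payload):
        """Stage 2: apply the edits once; identical repeats replay the result."""
        tx = self.transactions[url]
        if tx["result"] is None:
            tx["payload"] = payload
            tx["result"] = self._apply(payload)
        elif payload != tx["payload"]:
            raise ValueError("transaction URL reused with a different payload")
        return tx["result"]

    def _apply(self, payload):
        # Placeholder for the real write: map each old id to a made-up new id.
        self.applied += 1
        return {old_id: f"{old_id}-r{self.applied}" for old_id in payload}
```

A client that never saw the response simply posts the identical document
to the same transaction URL again. The edits are applied only once, and
the held transformation info is returned each time.
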
>
> gh: I'm more interested in getting a server up than a client
> regarding writeback. complex parts of the client are already
> implemented (apollo).
>
> gh: locks are region based not feature based.
> ad: can't lock...
>
> gh: we can talk about how to trigger ucla locking mechanism.
> bo: did the flock transactional locking suggested in the perl
> cookbook. mage document has content. server locks an id using flock
> (for assay das).
> gh: to lock a region on the genome, lock on all ids for features in
> this region.
> bo: make a file containing all the ids that are locked. flock this
> file.
>
> ad: file locking is fraught with problems. why not keep it in the
> database and let the db lock it for you. don't let perl + file system
> do it for you. there could be fs problems. nfs isn't good at that. a
> database is much more reliable.
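
Here is one way the "let the database do the locking" idea could look,
combined with Gregg's rule that locking a region means locking every
feature in it. It is a sketch using SQLite from Python; the table and
column names, and the half-open coordinate convention, are assumptions.

```python
import sqlite3

def open_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE feature "
               "(id TEXT PRIMARY KEY, seg TEXT, start INTEGER, stop INTEGER)")
    # The primary key guarantees at most one lock row per feature.
    db.execute("CREATE TABLE feature_lock "
               "(feature_id TEXT PRIMARY KEY, owner TEXT NOT NULL)")
    return db

def lock_region(db, owner, seg, start, stop):
    """Lock every feature overlapping seg:start-stop, all or nothing."""
    try:
        with db:  # one transaction: commit on success, roll back on error
            db.execute(
                "INSERT INTO feature_lock (feature_id, owner) "
                "SELECT id, ? FROM feature "
                "WHERE seg = ? AND start < ? AND stop > ?",
                (owner, seg, stop, start))
        return True
    except sqlite3.IntegrityError:
        return False  # some feature in the region is already locked

def unlock(db, owner):
    with db:
        db.execute("DELETE FROM feature_lock WHERE owner = ?", (owner,))
```

If any feature in the region already has a lock row, the insert violates
the primary key and the whole statement is undone, so no partial lock is
left behind. No lock files or NFS semantics are involved.
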
>
> bo: I went with perl flock mechanism since you could have other
> non-database sources (though so far it's all db).
>
> [A] steve, allen send brian code tips regarding locking.
>
> gh: putting aside pushing large data chunks into the server, for
> curation it's ok if protocol is a little error prone, since the
> curator-caused errors will be much more likely/common.
>
> ad: UK folks haven't done any writeback work as far as I know.
> gh: they haven't billed us in 2 years. Tony Cox is contact, Ed
> Griffith is main developer.
> ad: andreas and thomas are not funded by this grant or the next one.
> gh: they are already funded by other means.
>
> ad: if someone wants to change an annotation should they need to get
> a lock first or can it work like cvs? do it if it can, get lock,
> release lock in one transaction.
> ee: that's my preference.
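
The cvs-style option (no lock held across requests; check, write and
release inside one call) is essentially optimistic concurrency. A
minimal sketch, assuming each feature carries a version number, which
the notes do not specify; the class and method names are hypothetical.

```python
import threading

class Annotations:
    """Toy store where an edit succeeds only against the current version."""

    def __init__(self):
        self._guard = threading.Lock()
        self._items = {}  # feature id -> (version, data)

    def add(self, fid, data):
        with self._guard:
            self._items[fid] = (1, data)
            return 1

    def read(self, fid):
        with self._guard:
            return self._items[fid]

    def commit(self, fid, based_on, data):
        """Apply the edit if fid is still at version based_on, else None."""
        with self._guard:  # lock is taken and released within this call
            version, _ = self._items[fid]
            if version != based_on:
                return None  # someone else committed first
            self._items[fid] = (version + 1, data)
            return version + 1
```

A curator whose commit returns None re-reads the feature and retries,
much like a cvs update after a failed commit. If conflicts are as rare
as Ed expects, this costs very little.
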
>
> ad: if every feature has its own id, you know if it's...
>
> ee: some servers might not have any writeback facility at
> all. conflicts will be rare.
>
> [A] ask ed/tony on whether they plan to have any writeback facility
>
> gh: ed g wanted to work on client to do writeback, don't know who
> would work on a server there.
> ad: someone else, can't remember - roy?
> gh: unless we hear back from sanger, the highest priority for ucla
> folks after updating server for v300, is working server-side
> writeback.
>
> gh: spec freeze is for the read portion. the writeback portion will
> have to change as needed.
> ad: and arithmetic? ;-)
>
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu (516 367-5008)