[DAS2] Notes from the weekly DAS/2 teleconference, 5 Jun 2006

Chervitz, Steve Steve_Chervitz at affymetrix.com
Tue Jun 6 00:54:21 UTC 2006


Notes from the weekly DAS/2 teleconference, 5 Jun 2006

$Id: das2-teleconf-2006-06-05.txt,v 1.2 2006/06/06 00:52:14 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
  CSHL: Lincoln Stein
  Dalke Scientific: Andrew Dalke
  UCLA: Allen Day, Brian O'Connor

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


Topic: status reports
---------------------

gh: waiting to hear back from peter good re: grant. He thinks we have
a decent chance of additional funding (bridge funding), would fund
till new grant kicked in in June 2007 with suzi as a PI (revised
grant). Total funding would still be less than the amount originally
requested for this grant.

Definitely will have funding through september this year. our grant
folks beefed up the dalke consulting and cshl accounts.
Will let people know re: funding past september when I find out.

Impl wise, not much done in last 2 weeks. about to start testing
writeback from the client side. write new features back to das/2
server (the easiest thing to test).

New release of IGB is out now with a testing curation feature.
Go into preferences to turn it on. (Ed worked on this)

ls: sent in example of das2 features request that returns
alignments. discovered that i needed to add a new attribute to the LOC
tag. have to indicate that alignments use the cigar gap
string. whether you gap the ref or target sequence and indicate which
one's which. there's a target attr in LOC that indicates which one is
the target (a little asymmetrical).

gh: you can get both target and query?
ls: yes. the cigar string uses D and I, so you have to indicate which
one is which.
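
A minimal sketch of why the D/I convention has to be stated. It assumes
a SAM-style gap string (a count followed by M, I or D); the syntax
actually used in the DAS/2 LOC tag may differ, and cigar_spans and its
i_consumes parameter are hypothetical names used only for illustration.
The same string covers different lengths on the reference and on the
target depending on which sequence an I is taken to extend.

```python
import re

# Assumed syntax: one or more <count><op> pairs, op in M, I, D.
_CIGAR_OP = re.compile(r"(\d+)([MID])")


def cigar_spans(cigar, i_consumes="target"):
    """Return (reference_length, target_length) covered by a gap string.

    i_consumes says which sequence an I operation advances along;
    a D operation then advances along the other one.
    """
    if i_consumes not in ("target", "reference"):
        raise ValueError("i_consumes must be 'target' or 'reference'")
    ref_len = tgt_len = 0
    pos = 0
    for match in _CIGAR_OP.finditer(cigar):
        if match.start() != pos:
            raise ValueError("unparseable gap string: %r" % cigar)
        pos = match.end()
        count, op = int(match.group(1)), match.group(2)
        if op == "M":
            # Aligned columns advance along both sequences.
            ref_len += count
            tgt_len += count
        elif (op == "I") == (i_consumes == "target"):
            tgt_len += count
        else:
            ref_len += count
    if pos != len(cigar):
        raise ValueError("unparseable gap string: %r" % cigar)
    return ref_len, tgt_len
```

With "8M2D5M1I3M" this gives (18, 17) when I extends the target and
(17, 18) when I extends the reference, which is the ambiguity the new
attribute is meant to remove.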

another thing: das/2 project for caBIG is pulling das2 into the
core, has a kickoff meeting this wednesday. I will be on that
meeting. we'll reiterate goals, timeline with adopters (Wistar
institute) 

gh: it's been a while since we talked about that. is the intent to
have das2 servers that can sit on top of caBIG?

ls: no, das2 clients via caBIG. we won't need it for a couple of
months, hoping we'll be able to use the biopackages das2 server to
serve out the data. Is this reasonable?

aday: yes.

ad: nothing new to report. settling in Sweden. plan to incorporate
Lincoln's things into the spec. server writeback work.

bo: working on hyrax client that retrieves microarray data from
a das server. functional now and is now in
sourceforge. http://sourceforge.net/projects/nelsonlab. uses allen's
formatted output rather than netCDF. can browse ontology annotation
examples. can download. focusses on individual researcher needs in
Nelson lab. plan to do it as a generic plugin, data import tool.

gh: for ontology stuff, any progress with suzi and chris re: how das
ontology stuff will work with center for biomedical ontologies?
aday: no. will touch base with her. we're continuing to operate as
previously. basically just a formatting issue.

[A] allen will contact suzi re: hooking up das ontology work with NCBO

bo: the document format (XML) right?
gh: i think yes. to me the goal is to have NCBO adopt it
aday: even if they don't we can still link to them
gh: it will take encouragement from you setting that up.
aday: you can load the data brian's talking about, egr format.
doesn't have location
gh: igb should figure it out
aday: 25,000 microarrays are available at egr. ids of probe set
prefixed with the platform. we have a bed formatter, so you can
request in bed too.
bo: need to add a pulldown for bed.
netCDF is broken now, will fix it. egr is working

aday: genotyping array support in igb?
gh: chromosome copy number output in igb now. gtype outputs into cnat,
which outputs a graph in sgr format, read by igb.
also have files with locations of snps. should be on quickload
servers. near bottom entries for 10, 100, 500k arrays. nice way to
visualize when zoomed way out.

aday: if you load a bed file with ids, then an egr without
locations. i.e., can bed files be used as identifiers for egr files?
ed: yes
gh: takes up more memory, but is useful.
aday: working with genotyping arrays lately. will produce more files
for it in the next few weeks.
basically doing lots of microarray data processing now.
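
A rough sketch of the bed-as-identifier idea discussed above: locations
come from the name column of a BED file, and location-less scores are
attached to them by id. BED's first four columns (chrom, start, end,
name) are standard; the layout of the scores is an assumption here (a
plain id -> value mapping), since the egr format is not described in
these notes, and the function names and probe ids are made up for
illustration.

```python
def read_bed_locations(bed_lines):
    """Map the BED name column to (chrom, start, end)."""
    locations = {}
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # no name column, so nothing to use as an identifier
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        locations[name] = (chrom, start, end)
    return locations


def place_scores(locations, scores):
    """Attach a location to every score whose id is known from the BED file.

    Returns (placed, unplaced): placed is a list of
    (chrom, start, end, id, value) tuples, unplaced is a list of ids
    with no matching BED entry.
    """
    placed, unplaced = [], []
    for probe_id, value in scores.items():
        if probe_id in locations:
            chrom, start, end = locations[probe_id]
            placed.append((chrom, start, end, probe_id, value))
        else:
            unplaced.append(probe_id)
    return placed, unplaced
```

Holding the id -> location table is the extra memory cost gh mentions.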

gh: das2 writeback server?
aday: xml processing code is there, not rigged up to a webserver
yet. can partially translate into insert statements.
gh: can it send back mapping of temp ids to final?
aday: in progress
gh: i can start testing creation of features now.
aday: can put it as a standalone cgi script, can point it to any url.
gh: the beauty of rest.

[A] allen will put writeback server on public url

ed: new version of igb last week (4.38). automatic reloading via jws
not working for some clients.
bo: can delete your cache from jws console.
ed: shortcut from desktop sometimes causes problems with updates.
starting to look at better loading info about colors from different
types of data files. segues into stylesheets from das. and other
igb-related things.

sc: installed new version of affy das2 server on the dmz. Has gregg's
temporary fix for xml:base, but currently doesn't rely on it since
there's no url rewriting happening. need to test it out and do same
thing on production server. Also wrote script to make deploying
servers easier (e.g., posting new jars, re-starting server via single
make command).

[A] steve will test gregg's xml:base fix on dev server

Topic: BOSC submission for a talk
---------------------------------

ad: planning to go, waiting to determine expenses
aday: will go if main conf talk is accepted. otherwise not.
gh: sounds like it's up to you (dalke)
ad: this is what biodas is, tools, how things fit together, how rest
is cool.
few submissions now (ISMB and BOSC). only 4 now. usually 12 by now.
ad: bod for bosc is discussing what to do
gh: do you need help from any of us for bosc submission?
ad: no. will send you copies to review it.
gh: I gave a talk last year on das. will send it to you as a
reference.
sc: part of talk can be progress since then.
cause of the low turnout?
ad: people waiting to see if they are accepted before registering.
ls: for me it's a cost issue. 90% of people who practice bioinfo are in
northern hemisphere. was low in brisbane, will be low in china (rumors
of 2008 ismb in china, can't confirm).


Topic: Code sprint #3
---------------------
gh: how do people feel about having another code sprint? possibly
before or after CSB in august at Stanford.
the last two sprints were very good.

ls: I'm at csb in aug, but right after i'll be on a retreat to work on
a sequencing grant. right before will be on honeymoon.
gh: maybe we need to push it farther out.
ad: will be in europe until 15 july. not in us until february.
bo: definitely at stanford?
gh: no. august seemed like a good time/location.
might make more sense to have a euro-led one.
sc: august is a big vacation time for europeans
ad: july is for swedes.

ad: there's a late breaking poster session for ismb
gh: das poster?
ad: need to decide on cost today if I'm going.

Topic: writeback 
----------------

gh: how far behind is website vs our current thinking. that's what I'm
using for my impl.
ad: doesn't have idea of microdeltas. other stuff is the same.
ls: does it still have the mapping idea which I thought went away
(local to global)? during last codesprint.
gh: it did?

ad: returns back the complete feature with additional attribute.
so instead of a mapping, server returns back all features which
changed, along with attribute: old id ---> new id
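
A minimal sketch of how a client might consume such a response,
assuming each returned feature is represented as a dict with an "id"
key and, when the server renamed it, an "old_id" key. These key names
and the function are hypothetical; the writeback document format is
not specified in these notes.

```python
def apply_id_updates(local, returned):
    """Replace local features with the versions the server sent back.

    local    -- dict mapping feature id -> feature dict (client state)
    returned -- list of feature dicts from the server; a renamed
                feature carries its previous id under "old_id"
    Returns the old id -> new id mapping implied by the response.
    """
    id_map = {}
    for feat in returned:
        new_id = feat["id"]
        old_id = feat.get("old_id", new_id)
        # Drop the stale entry (for example one stored under a temporary id).
        local.pop(old_id, None)
        local[new_id] = {k: v for k, v in feat.items() if k != "old_id"}
        if old_id != new_id:
            id_map[old_id] = new_id
    return id_map
```

The mapping of temporary to final ids then falls out of the returned
features instead of being sent as a separate structure.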

gh: whether you delete things that aren't posted in feature when you
submit a new post.
ad: what you post is a complete replacement of what was there.
gh: that verbiage needs to be added. doesn't say anything about it.

[A] andrew will add text to writeback spec re: new feat being a complete
replacement
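
To make the "complete replacement" rule concrete, a small sketch
contrasting it with merge semantics. Features are modelled as plain
dicts in an in-memory store; both function names are hypothetical and
only the first reflects the behaviour described above.

```python
def post_feature(store, feature):
    """Replacement semantics: the posted feature is all that remains."""
    store[feature["id"]] = dict(feature)


def merge_feature(store, feature):
    """Contrast only: merge semantics keep fields the post did not mention."""
    merged = dict(store.get(feature["id"], {}))
    merged.update(feature)
    store[feature["id"]] = merged
```

Under replacement, a property left out of the post is gone afterwards,
so a client has to send the full feature every time.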

ad: other change: complex features all need a link back to the root
feature. when parsing you can build the parent-part
relationship. otherwise, you do a lot more work to figure out who's in
the same group.
gh: seems like a hack.
ls: this is not in the current writeback doc?
ad: correct. additional attribute for complex features. affects reads
too (not just writeback)
ls: bidirectional pointers are still there, correct? parent -> child,
child -> feature.
ad: that's still there (unlike gff: unidirectional)
if you know the root, it saves you from having to traverse links.
gh: doesn't add that much. may create disagreement, errors between the
parent-child hierarchy.
I don't think the root thing is necessary.

ls: pointer to parent and the root: like a closure across it. don't
see a compelling need, makes it harder to impl.
gh: if it's optional, will create other difficulties.
ad: makes it easy to find out where the root is.
ls: just go up until you find no parent. cycles would be a bug. the
issue would be if during reading from remote server, gives you
children first, middle layer, then root layer, will require some
merging of features. depends on data structures. in perl with gbrowse,
every feat or part of a feat is held as a node in a graph. it never
merges, just updates pointers. after parse finishes, finds everything
without parent and recursively traverses them.
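
A sketch of the approach ls describes, under simplifying assumptions:
each feature has at most one parent, and the parse has produced a dict
mapping every feature id to its parent id (None for a feature with no
parent). The function names and this representation are illustrative
and are not taken from gbrowse.

```python
from collections import defaultdict


def find_root(feature_id, parent_of):
    """Walk parent pointers upward until a feature with no parent is reached.

    If an ancestor has not been seen yet, the topmost known id is returned.
    """
    seen = set()
    current = feature_id
    while parent_of.get(current) is not None:
        if current in seen:
            # Cycles would be a bug in the data.
            raise ValueError("cycle in parent pointers at %r" % current)
        seen.add(current)
        current = parent_of[current]
    return current


def groups_after_parse(parent_of):
    """After the parse, start at each parentless feature and walk down."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:
            children[parent].append(child)

    def collect(node, out):
        out.append(node)
        for child in children.get(node, []):
            collect(child, out)
        return out

    return {root: collect(root, [])
            for root, parent in parent_of.items() if parent is None}
```

Arrival order does not matter here, since nothing is grouped until the
whole document has been read.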

gh: if you want to attach annotations as features while parsing rather
than waiting till parse is done. reference counting. don't think root
thing would help then. still need to figure out do I have all children.

ad: when you get a failure you can throw away just the failures rather
than everything. can count parents and parts as they're coming in.
gh: every feature with no parent is a root.
ad: yes. assuming it comes early.
ls: in general case, you cannot go on and process a feature until you
reached the end of the parse. because you could have multiple
layers. you can say you have found any pair of layers, not everything
in between. the root ptr doesn't help either. could still be in a
situation where you think you processed everything that belongs to a...
ad: something comes along later "i'm still a part of that group"
gh: every time you get a feature, can add it to the feature tree, can
tell when you're done with group by checking pointers.
ad: ok. not as useful as I thought.
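
A minimal sketch of gh's point that the bidirectional pointers are
enough to tell when a group is done, without a root attribute. It
assumes each incoming feature names at most one parent and lists the
ids of its parts; the FeatureTree class and its methods are
hypothetical.

```python
class FeatureTree:
    """Collect features as they arrive and report when a group is whole."""

    def __init__(self):
        # feature id -> (parent id or None, tuple of part ids)
        self.features = {}

    def add(self, feature_id, parent=None, parts=()):
        """Record a feature; return True if its group is now complete."""
        self.features[feature_id] = (parent, tuple(parts))
        return self.group_complete(feature_id)

    def group_complete(self, feature_id):
        # Climb to the root; an unseen ancestor means we are not done.
        current, seen = feature_id, set()
        while True:
            if current not in self.features or current in seen:
                return False
            seen.add(current)
            parent = self.features[current][0]
            if parent is None:
                break
            current = parent
        # Walk down from the root; every referenced part must have arrived.
        stack, visited = [current], set()
        while stack:
            node = stack.pop()
            if node in visited:
                continue
            visited.add(node)
            if node not in self.features:
                return False
            stack.extend(self.features[node][1])
        return True
```

The check only answers "complete" once every pointer in the group
resolves, whatever order the server sends the layers in.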

[A] andrew won't add root feat attribute to complex features
[so the latter is actually an 'inaction' item ;-]




