[DAS2] Notes from the weekly DAS/2 teleconference, 11 Sep 2006

Steve Chervitz Steve_Chervitz at affymetrix.com
Mon Sep 11 18:11:04 UTC 2006


Notes from the weekly DAS/2 teleconference, 11 Sep 2006

$Id: das2-teleconf-2006-09-11.txt,v 1.1 2006/09/11 18:10:11 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
  Dalke Scientific: Andrew Dalke
  UCLA: Allen Day, Brian O'Connor

(sc, aday, bo calling in from Seattle at MGED9 jamboree)

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 

Agenda:
--------
* grant update
* status reports

Topic: Grant update
-------------------

gh: p good gave the funding outlook for sep '06
to may '07. $250K. not completely official, but more so.
no grant to be submitted in october. still major issues to resolve:
rewriting, pi decision. size was a concern. decision about what to
drop (6 sections). 
ad: new project starting dec/jan for 1 year. Can't work on das/2 past
end of this year. product for chemical informatics.
gh: can you put in more time before then? full time? 2-3 mos.
ad: need to look at my schedule. will get back to you

[A] andrew talk with gregg re: increasing his das/2 time commitment

Topic: Status reports (and general discussion)
----------------------------------------------

gh: client to do curation in igb, write back to test server. implementing
the thing I drew on the board back at the last code sprint. editing
curations. making sure undo/redo capabilities in igb work. will
translate into what writeback needs are. turned off in igb by
default. prefs -> turn on exptl curations. can edit things, but can't
connect to server. must modify code, but don't

ee: gff3 parser. trouble: gff3 files in wild don't follow spec. refseq
website, repository, all three fail in different ways. ucsc mailing
list helped, but it wasn't their files.
aday: failed on validator?
ee: yes
gh: the only request we had
ee: not trying to write a full gff3 parser. just need gene, exon, cds,
mRNA. ignore other lines and it seems compliant. but a second problem:
very flexible: exon parent can be mRNA, gene, or nothing. jibes with
igb data model. 
also worked on: released new igb version. graph support handling,
parsing affy files.
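
A minimal sketch of the partial-parser approach ee describes: keep only
gene, mRNA, exon, and CDS lines, skip everything else, and allow an exon's
parent to be an mRNA, a gene, or nothing. The function and field names
below are hypothetical and are not taken from the igb code.

```python
from collections import defaultdict

# Feature types the notes say are needed; all other lines are skipped.
WANTED_TYPES = {"gene", "mRNA", "exon", "CDS"}


def parse_attributes(column9):
    """Split the GFF3 attributes column into a dict of key -> list of values."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        attrs[key.strip()] = value.strip().split(",")
    return attrs


def parse_gff3_subset(lines):
    """Return (features, children) for the wanted feature types only.

    features: list of dicts, one per kept line.
    children: parent ID -> list of child features; features without a
              Parent attribute are filed under the key None.
    """
    features = []
    children = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank lines, comments, and directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] not in WANTED_TYPES:
            continue  # malformed or uninteresting line: skip, do not fail
        if not (cols[3].isdigit() and cols[4].isdigit()):
            continue  # non-numeric coordinates: skip, do not fail
        attrs = parse_attributes(cols[8])
        feature = {
            "seqid": cols[0],
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "id": attrs.get("ID", [None])[0],
            "parents": attrs.get("Parent", []),
        }
        features.append(feature)
        # A parentless feature (e.g. a bare exon) is grouped under None.
        for parent in feature["parents"] or [None]:
            children[parent].append(feature)
    return features, children
```

Grouping children by parent ID, whatever the parent's type, means the
caller does not need to know in advance whether exons hang off an mRNA, a
gene, or nothing.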

ad: flybase files are gff3 compliant, parent/part relationship
requires full file parsing. 800mb file. had to insert marker mid-file
to inform parser.
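
The GFF3 spec defines a '###' directive meaning that all forward
references to feature IDs seen so far have been resolved. Whether that is
the marker ad inserted is an assumption; the notes do not say. Under that
assumption, a parser can hand off each block as soon as it sees the
directive instead of holding the whole file in memory. A minimal sketch,
with a hypothetical function name:

```python
def iter_feature_groups(lines):
    """Yield lists of raw feature lines, one list per '###'-delimited block.

    Each yielded group can be resolved into parent/part relationships and
    then discarded, so memory use is bounded by the largest block.
    """
    group = []
    for line in lines:
        line = line.rstrip("\n")
        if line == "###":
            if group:
                yield group
                group = []
        elif line and not line.startswith("#"):
            group.append(line)
    if group:
        yield group  # trailing block with no closing '###'
```

Passing an open file object as `lines` keeps the reading lazy; a file with
no '###' lines at all comes back as a single group, which matches the
full-file parsing described above.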
ee: space reduction during parsing.
they have a recommended canonical rep of gene, but not required to do
it. haven't found an example that follows the rec.
gh: the wormbase stuff should be canonical, since lincoln did gff3 and
wormbase. 
ad: more people writing gff3 than reading
ee: ucsc discussion: grant to support more mod orgs, to include gff3
parser support. 
gh: that's the kind of grant we'd like to fold das grant work into if
we don't do a separate das/2 grant

[A] gregg look into ucsc grant, possibly fold das stuff into it

ad: gff3 -> das2xml converter. some things in gff3 i don't know how to
handle. key-value. Need to figure out why things aren't passing validator.

[A] andrew will write up questions, post to list, discuss there and/or with
lincoln at the next das/2 teleconf.

ad: modeling alignments. need a recommended way to model alignments.
gh: when to use locations vs subfeatures.
aday: why care about gff3?
ee: igb
ad: people need to convert data for das2xml.
aday: need a model mapping doc. we can hash it out next week with
lincoln.

ad: working with berkeley xml database. liking it a lot.
gh: also cool: SOLR - java thing built on top of lucene and xml db
stuff. cool thing is that it layers on top of that a rest-ful approach
to retrieving and writing data to a db, through http urls. queries are
GETs; all writes/updates/deletes are POSTs.
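
The convention gh describes (reads via GET; writes, updates, and deletes
via POST) can be sketched without any network code. The URL layout and the
function name below are made up for illustration and are not SOLR's actual
API.

```python
def handle_request(method, path, store, body=None):
    """Map an HTTP-style request onto an in-memory document store.

    Hypothetical URL layout (an assumption for this sketch):
      GET  /docs/<id>         -> read
      POST /docs/<id>         -> create or update with `body`
      POST /docs/<id>/delete  -> delete
    Returns a (status_code, payload) tuple.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2 or parts[0] != "docs":
        return 404, None
    doc_id = parts[1]
    action = parts[2] if len(parts) > 2 else None

    if method == "GET" and action is None:
        if doc_id in store:
            return 200, store[doc_id]
        return 404, None
    if method == "POST" and action is None:
        store[doc_id] = body
        return 200, body
    if method == "POST" and action == "delete":
        if doc_id not in store:
            return 404, None
        del store[doc_id]
        return 200, None
    return 405, None  # any other method/action combination is rejected
```

Because GET never changes the store, read requests stay safe to repeat or
cache, while every state change goes through POST.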
ad: xQuery 
aday: generalization of xpath
ad: xslt is another generalization.
sc: there was a poster at MGED9 meeting from stanford group using
Berkeley XML db to map between 'flavors' of MAGE-ML, since
organizations use different ways to represent the same thing in
MAGE-ML. Represented the transformation using pairs of xQueries, one
targeting format A, the other format B. All the smarts about the
format was confined to the xqueries. nice.

ad: I want to get feedback regarding modeling for das2, recommendation to
store certain data (alignments, gff3).
gh: gff3 - too open ended. lots of stuff can be in there
ad: given flybase, what is the recommended way to post gff3 data?
gh: i can answer your alignments issue, can't do gff3.

[A] andrew will contact folks as needed regarding gff3/flybase modeling
issues:  suzi, chris mungall, lincoln, scott cain <cain at cshl.edu>

Other status:
-------------
sc: no major progress given Netaffx update work, MGED travel. Plan is
to update das/2 server code on affy server, load it with some exon
array design data using gregg's new parser which is more memory
efficient, and test it out. Then we'll need to migrate it off the
das/1 server where the exon data hogs lots of memory, and then migrate
Netaffx links to use das/2.
gh: new box end of october with das grant money.
have run das2 server on 64bit. on 32bit have gotten 8g in single java
process. riva. should be able to get 16g in one process. or have 2x8g

bo: allen updated assay portion, bringing igb objects up to date. mark
carlson is updating hyrax client to retrieve microarray data back. he's
taking das/2 client, making it embeddable. eg., into the MeV tool from
John Quackenbush at Harvard (java). should be embeddable in igb to
browse celsius to d/l data. plan to have webstart for it.

aday: updating assay portion of server. mage-ml to be inline with
changes. adding/modifying element attribs, lowercase 'uri'. data
loaders to get ncbi data into server for microarray expts. client lib
in R for talking to das server. requires parsing xml. extremely slow,
uses lots of memory, so
eg., viz bed files in R, genomic location. good plotting support in
R. look at distribution.
regarding writeback server: on hold until you report any
problems. basic stuff is working. let me know.
gh: read part: caching improvements?
aday: no more work on that since jamboree.
public server doesn't have these improvements.
plan to rewrite controller and view part. junk on this end. want to
integrate block mechanism into that as well. not sure when it will
happen.
time estimate: maybe 1-1.5 months with bo and i working half time.
bo: this rewrite will help a lot.
aday: lots of little things changed, 'segment' etc. server domain
source, capabilities, formats. huge mess. need more looking before i
can get an accurate time estimate for patching vs. rewriting.
think the rewrite wouldn't be that expensive.
gh: machine?
aday: dual core opteron, maybe 16g ram?
load is increasing, may move off to a dedicated server. webserver is
the issue, not db.

Next teleconf: 
--------------
In two weeks. 25 Sep 2006


Special dedication:
-------------------
To those who tragically lost their lives on this day five years ago... 



