[DAS2] Notes from the biweekly DAS/2 teleconference, 22 Jan 2007

Fri Feb 2 19:17:55 UTC 2007

[These are from the teleconf from *last* week. Apologies for tardiness.
Next meeting is this coming Monday, 5 Feb.
DAS grant folks: Don't forget to write up your goals and timeline thru May!
-Steve]

$Id: das2-teleconf-2007-01-22.txt,v 1.1 2007/02/02 19:14:21 sac Exp $

Teleconference Info:
   * Schedule:         Biweekly on Monday
   * Time of Day:      9:30 AM PST, 17:30 GMT
   * Dialin (US):      800-531-3250
   * Dialin (Intl):    303-928-2693
   * Toll-free UK:     08 00 40 49 467
   * Toll-free France: 08 00 907 839
   * Conference ID:    2879055
   * Passcode:         1365

Attendees:
  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
  UCLA: Allen Day

Note taker: Steve Chervitz

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 

Agenda
-------
 * das/biosapiens meeting in hinxton
 * status reports

das/biosapiens mtg:
ee and gh: interested in attending. will check with Andreas and other
das folks at Sanger about it.

gh: timelines and milestones for the remainder of the das/2 grant extension
(ending in May): what parts we are going to complete by then and the
timeline to achieve them. publication: a paper should be submitted by the
end of the grant period, to an open access journal: PLoS or BioMed Central.
on the affy side, a publication on igb and the underlying data models. the
highest priority is a paper that focuses on the spec and uses examples.

[A] everyone outline goals/timeline till end of May, due before next
teleconf

aday: yes, doing that today. I will be working full time on the das2
grant for gregg for the next two weeks: doing an extreme re-factor,
putting it into a new framework to do writeback and block-level caching,
and going through other docs, UML diagrams, emails, etc.

ee: will go over goals at Affy this week.

gh: general issue: forward from Suzanna re: ontology URIs. NCBO is
starting to work on making their ontologies addressable as URIs. good
idea to be in contact with suzi or chris over there, and to join in a
meeting.

aday: subscribes to that mailing list; no traffic on it last week.

[A] Allen will ping NCBO folks again about URIs

gh: client UI: want to organize annotation types into something more than a
flat list, using an ontology to present a tree or DAG. now it seems the
ontologies are not sufficient to get the full graph I want: different cell
lines, rna expression expts for some, chip-chip for some, methylation
for some. some experiments look at nuclear rna vs cytoplasmic rna vs
polyA+, etc., which is not addressed by SO. looking at combining the
annotation field with the ontology.

aday: other ontologies may be suitable, e.g. for biochemical molecule and
treatment.
gh: crossproduct ontologies: combining terms from orthogonal
ontologies on the fly so you don't get a combinatorial explosion.
aday: they have ideas about how to do that.
gh: right now das type ontology attrib has to be from SO.
aday: annotate affy data at the sample level. there are many features in the
genome feature space; better to annotate the sample with the sample
treatment, then point the features at it. put it as a label on the
genome track.

gh: now have this in DAS: type uri, type ontology, type title, type
method. features are typed.
aday: you want to put all features into the same data source, but are worried
about typing them differently.
gh: yes. could make them diff versioned sources, but can get
confusing.
aday: i'd do that. each cell line as a diff source.
gh: but trouble: crossproduct of genome assembly x cell line. need to
map to multiple assemblies.
gh: can use the title and/or method to break down things further

sc: allen, are you considering the mged ontology?
aday: doesn't mesh well with what obo has; it's verb oriented. it describes a
treatment, but the participants are free text fields or uris.
sc: put uri's in the free text field?
aday: yes. you can put perturbation on cell line, drug name, conc, units,
time duration, units of time; all are free text. for units you want to
draw from an ontology (netCDF), for the drug, an accession from RxNorm or
another ontology or reference to index it. they don't constrain that;
basically punting. not good integration.

gh: my issue is: how do you present this to the user when there is a lot of
different data on the server? search space interface: default = flat list of
types, but you can search on them. that's what I do for the interface to
the transcriptome db: tighter integration with their rdbms allows
searching on all fields being displayed. for this case, cell lines
could be tagged with a property of cell location = nuc, cytosol, or
both. then search based on those when you're trying to narrow down the
types you're looking at from the server.

gh: soon, when i add graph data, i'll end up having hundreds of data
types. from das/1 server experience, e.g. at ucsc: a tremendous amount of
data is available as data types, several hundred, and it's hard to find
what you want. but in the browser they are organized by categories
(alignments, etc.), which is more navigable.

Other Topics:
---------------

sc: when can we announce the completion of genome retrieval spec?
gh: I need to add a cigar string example. I'd vote for mid Feb, in
advance of the das/biosapiens meeting.

sc: other thing: move the global seq ids page into the biodas.org wiki and
set the wiki up as the main page for biodas.org.

[A] complete retrieval spec and wikification of biodas.org by mid-Feb

Status:
-----------

ee: igb manual. feedback welcome
gh: looking for gff directives for igb you can put into a gff file.
ee: not in manual. hidden from user for now. better to use our gff
parser.

gh: working on two things, and will focus on das for the next two
weeks. first: handling CHP format data from affy expression console. it's
das related because those file formats lack genome location info, so I'm
using das to look up the info to merge into that data. currently using a
simple hierarchy of probesets and probes, but we need something more
sophisticated: looking at a 4- or 5-level hierarchy, now represented in
gff2 by embedding the hierarchy in the tags.

ee: gff3 can represent the hierarchy. we didn't have a parser for it
back then.
gh: could insert that into the pipeline at some point.
my intent: a more efficient binary format.
would be a good test in das of multi-level hierarchies. das server
should then output it in a multi-level das2xml document. there hasn't
been an example of multi-level > 2 hierarchy yet (affy or
biopackages).
issue now is: how to render something that's >2 levels of hierarchy?
for now, just rendering the last two levels.
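The GFF3 route ee mentions relies on the standard ID and Parent attributes: each child row names its parent's ID, so nesting of any depth can be expressed without packing the hierarchy into gff2 tags. Below is a minimal Python sketch, not the Affy parser or IGB code, of how such a file could be read into a hierarchy and how the two deepest levels (what is currently rendered) could be picked out. The feature types (transcript_cluster, exon_cluster, probeset, probe) and IDs in the sample are hypothetical.

```python
from collections import defaultdict


def parse_gff3_hierarchy(lines):
    """Collect features plus parent/child links from GFF3 ID and Parent attributes.

    Sketch only: skips comments and directives, ignores %-escaping and FASTA sections.
    """
    features = {}
    parents = defaultdict(list)
    children = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        # Rows without an ID (often leaf features) get a synthetic key.
        fid = attrs.get("ID") or "_anon%d" % len(features)
        features[fid] = (cols[0], cols[2], int(cols[3]), int(cols[4]))
        # GFF3 allows a feature to list several comma-separated parents.
        for parent in filter(None, attrs.get("Parent", "").split(",")):
            parents[fid].append(parent)
            children[parent].append(fid)
    return features, parents, children


def feature_depths(features, parents, children):
    """Depth of each feature below its root; with multiple parents, keep the deepest path."""
    depth = {}
    stack = [(fid, 0) for fid in features if not parents.get(fid)]
    while stack:
        fid, d = stack.pop()
        depth[fid] = max(depth.get(fid, 0), d)
        for child in children.get(fid, []):
            stack.append((child, d + 1))
    return depth


def last_two_levels(depth):
    """IDs of features in the two deepest levels, i.e. what gets rendered for now."""
    deepest = max(depth.values(), default=0)
    return {fid for fid, d in depth.items() if d >= deepest - 1}


# Hypothetical 4-level example: transcript_cluster > exon_cluster > probeset > probe.
rows = [
    ("chr1", "affy", "transcript_cluster", "100", "900", ".", "+", ".", "ID=tc1"),
    ("chr1", "affy", "exon_cluster", "100", "300", ".", "+", ".", "ID=ec1;Parent=tc1"),
    ("chr1", "affy", "probeset", "100", "200", ".", "+", ".", "ID=ps1;Parent=ec1"),
    ("chr1", "affy", "probe", "100", "124", ".", "+", ".", "ID=pr1;Parent=ps1"),
    ("chr1", "affy", "probe", "150", "174", ".", "+", ".", "ID=pr2;Parent=ps1"),
]
gff3 = ["##gff-version 3"] + ["\t".join(r) for r in rows]
features, parents, children = parse_gff3_hierarchy(gff3)
depth = feature_depths(features, parents, children)
print(sorted(last_two_levels(depth)))  # prints ['pr1', 'pr2', 'ps1']
```

Rendering "the last two levels" here simply means keeping features whose depth is within one of the maximum; a client could instead collapse the upper levels into a label on the track.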

gh: second thing: support in affy server and igb to access graph data
via das/2. am amassing transcriptome data, no serious processing of it
yet. for each expt, a set of 90+ chips, 300 million data points, 16 expts
+ replicates. lots of data. getting affy das server to do smart
indexing of that data. server right now expects there to be 1 file for
a whole annotation type, but these graphs are too big, must be broken
by chromosome as well. Need to address this issue. Solution is in
sight.
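One way to break graph data down by chromosome, sketched in Python under loudly stated assumptions: the directory layout (<type>/<chrom>.bin) and the record format (little-endian int32 position plus float32 score) are hypothetical and are not the Affy DAS server's actual storage format. Each chromosome gets its own position-sorted file, so a range query reads only one file and can binary-search it instead of loading the whole annotation type.

```python
import bisect
import os
import struct
import tempfile
from collections import defaultdict

# Hypothetical record layout: little-endian int32 position, float32 score.
RECORD = struct.Struct("<if")


def write_by_chromosome(root, type_name, points):
    """Split (chrom, pos, score) points into one position-sorted file per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, pos, score in points:
        by_chrom[chrom].append((pos, score))
    type_dir = os.path.join(root, type_name)
    os.makedirs(type_dir, exist_ok=True)
    for chrom, rows in by_chrom.items():
        rows.sort()  # sorted positions allow binary search at query time
        with open(os.path.join(type_dir, chrom + ".bin"), "wb") as fh:
            for pos, score in rows:
                fh.write(RECORD.pack(pos, score))


def query(root, type_name, chrom, start, end):
    """Return (pos, score) points on chrom with start <= pos < end, reading one file only."""
    path = os.path.join(root, type_name, chrom + ".bin")
    if not os.path.exists(path):
        return []
    with open(path, "rb") as fh:
        records = list(RECORD.iter_unpack(fh.read()))
    positions = [pos for pos, _ in records]
    lo = bisect.bisect_left(positions, start)
    hi = bisect.bisect_left(positions, end)
    return records[lo:hi]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        points = [("chr2", 500, 0.5), ("chr1", 300, 1.25), ("chr1", 100, 2.0), ("chr1", 900, 4.0)]
        write_by_chromosome(root, "expt1_rep1", points)  # hypothetical type name
        print(query(root, "expt1_rep1", "chr1", 0, 500))  # prints [(100, 2.0), (300, 1.25)]
```

Sorting each file by position is what makes the bisect lookup valid; with hundreds of millions of points a real server would more likely memory-map or index the files rather than read each one whole.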

sc: helping configure the affy das/2 server, working with gregg to
support transcript-level annotations for exon arrays (required for CHP
file support). biodas.org wikification, looking into moving the global
seq id page into biodas.org space. No new progress on page since Dec.

aday: gmod meeting in sd last wed, heard ucsc folk talk about genome
browser stuff, new things being added. they have < 20 TB of data, but have
lots of hardware. we have more than that at our lab. we keep images,
all cel files, more recent dat files.
gh: anything else of gmod interest at gmod meeting?
aday: brian osborne was hired to do full-time user support, starting by
doing documentation. of the 21 little projects that are part of gmod, 9
were selected as needing doc clean up and additions. das was selected as
one to be documented. he'll send doc packages to the project site.
sc: yes he's been in contact with me and has offered to help.

Wrapup:
-------
gh: next meeting in two weeks, 5 Feb 2007.

Everyone: send in goals/milestones before next meeting

aday: gregg, can we meet earlier than that?