[DAS2] Notes from DAS/2 code sprint #2, day four, 16 Mar 2006

Steve Chervitz Steve_Chervitz at affymetrix.com
Thu Mar 16 20:38:13 UTC 2006

Notes from DAS/2 code sprint #2, day four, 16 Mar 2006

$Id: das2-teleconf-2006-03-16.txt,v 1.1 2006/03/16 20:45:48 sac Exp $

Note taker: Steve Chervitz

  Affy: Steve Chervitz, Gregg Helt
  CSHL: Lincoln Stein
  Dalke Scientific: Andrew Dalke (at Affy)
  Sanger: Andreas Prlic
  UC Berkeley: Nomi Harris (at Affy)
  UCLA: Allen Day, Brian O'Connor (at Affy)

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 

Status reports

nh: apollo work, reading the registry, saving
capabilities. modifications to code that was based on prototype das
adaptor. Generally lots of under the hood work to bring it up to spec.

bo: diffed functionality between allen's biopackages.net server
and andrew's sample xml. Updated templates in allen's das server to
match andrew's sample xml.

ad: worked on validation server, all stuff is in cvs. the
http://cgi.openbio.org:8080 server is built off cvs, just check out
and rebuild. 

gh: worked on affy das2 server and client up to current spec based on
whatever the rnc documents say (schema doc) as for xml. no chance to read
andrew's email on query syntax, will incorporate that today.

sc: got latest version of gregg's das/2 server up at affy. serving
hg17, hg16, dm2. Updated code that the das1 server is using based on
latest genoviz jars. Getting some errors when loading data for new
affy arrays. Investigating.

aday: minor bug fixes for spec v200. exporting assay data in different
formats: ucsc browser can viz expression data out of das server in bed
format. das viewer can view as egr format. working on single chip at a time.

ls: here's a great use case for you: there's a cshl fellow creating dna
spectrographs of oligo frequencies presented as audiographs. can really
tell diffs between coding vs non-coding, CpG triplets, microsatellite
harmonics. big matrices of floating point data tied to genome.
consider this a challenge to das to serve this up.
my postdoc sheldon mckay is serving this up: it gives you a heatmap back
given a genomic region. new glyph for spectrographic data.

aday: netCDF format is good for this, but clients out there don't
visualize it.
gh: would like to support netCDF in igb. not sure if this is the default
way to represent quantitative data for das.

[A] allen will send lincoln pointer to netCDF.

aday: netCDF is great for cross-lang, cross platform support.
gh: people are pushing wiggle format to ucsc, so we don't want to
restrict to just netCDF.

aday: my refactor yesterday allows treatment of these as templates.
gh: how to do this via region query in das?
ls: feature query, tag says here comes binary data, each column
corresponds to a base (or maybe a scaling factor to indicate # of bp
per column). 
tag says here comes binary quantitative data, scale is 1:1.
gh: better way is to use alternative content format stuff (already in
spec for types)
ls: if you do feat request and don't filter by type, you'll get a mix
of binary and non binary.
aday: not in genome domain. do the genome/sequence fetch, then fetch to
assay service to get quant data. then do intersection to find overlap.
performance goes out window if you make the query too complex.
fine to do just two fetches.

ls: how do you indicate scale for numerical data?
aday: good question. units are not encoded now.
ls: spectrographic data is one value per window where window is 100 bp
aday: so two diff units
window size, amplitude value and frequency, and that's in four
channels for the bases. we're representing as 4 matrices.
aday: one matrix per channel. many formats don't support n-dimensional
data. only 2d at most.
ls: in das1 did base64 encoded string in the notes. It worked.
gh: we can't require all clients to know how to interpret it.
This is why we have the alt content functionality...

[A] das should support dense numeric data across regions, format specified
by the existing alternative format mechanism
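
As an illustration of the dense-data discussion above (one value per
window, a scaling factor giving bp per column, and the base64 approach
that worked in das1), here is a minimal sketch. The payload layout, the
field names bp_per_column and data, and the float32 encoding are
assumptions made for the example, not part of the spec.

```python
import base64
import struct


def encode_windows(values, bp_per_column):
    """Pack one float per window into a base64 string plus its scale."""
    # Little-endian 32-bit floats; an assumption, not a DAS/2 requirement.
    raw = struct.pack("<%df" % len(values), *values)
    return {
        "bp_per_column": bp_per_column,
        "data": base64.b64encode(raw).decode("ascii"),
    }


def decode_windows(payload, region_start):
    """Return (start, end, value) triples, one per window."""
    raw = base64.b64decode(payload["data"])
    values = struct.unpack("<%df" % (len(raw) // 4), raw)
    step = payload["bp_per_column"]
    return [
        (region_start + i * step, region_start + (i + 1) * step, value)
        for i, value in enumerate(values)
    ]


# Three windows of 100 bp each, starting at position 1000.
payload = encode_windows([0.5, 1.25, 2.0], bp_per_column=100)
windows = decode_windows(payload, region_start=1000)
```

A client that does not understand this encoding would simply not request
it, which is the point of serving it through the alternative format
mechanism rather than in the default feature xml.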

Topic: Spec Freeze

ls: can we talk about freezing spec?
ad: what good will it do?
ls: allow us to code to a fixed spec. you freeze spec, people write
code for a defined period of time, during that time we compare notes,
then make changes, freeze, and repeat.
ad: concerned there hasn't been enough work since the changes in jan/feb.
ls: now that i'm 'on the other side of the fence' of spec writing,
i'd like to see it not change, and have time to make an informed view
of what its strengths and weaknesses are.
ad: haven't gotten feedback about my questions, until the
codesprints. two months ago, only now being addressed.
ls: these issues don't become pressing until we start
implementing. this is why we do code sprints.
ad: worry because there's been no extensive data modeling for
ls: can do a 1 month freeze
gh: comfortable with 1 mon freeze of schemas as they are in the rnc's
now. issues will come up.
ls: announce on biodas.org - march 18th das/2 is frozen for 1 month.
gh: we'll have to live with ambiguity in how server does certain
ls: hence the time limited 'trial' freeze.
ad: would have liked people to write code from last feb so I could get
ls: you very much improved the spec. grateful for what you've done. I
wasn't getting feedback when I was writing either.
gh: validation website is great for implementers, rather than having
to read a spec document everyday.
ad: schemas aren't going to change after today (pm). would like to
clear some things up about filter language, today?
ls: most urgent freeze

[A] spec will freeze as of end of today (3/16/06, PST) for one month.

Topic: Feature filters

ad: feature filters is most important, and how do we define global
names? schema is a simple change - which is req'd and which is
optional but for impls makes a big diff.
ls: global is req'd and local is optional.
ad: who comes up with global names
ls: first person to do it has naming rights.
people have been able to do it for the ensembl service.
ad: I need documented names
gh: it means you don't know whether two names are the same thing until
this document comes out.

ls: filter language?
ad: gregg needs inside and contains,
- type and exact type: das type or ontology type?
ls: das type
gh: uri attribute of the type
ad: that type or its subtype makes no sense for das types
ls: it's just an exact match. client can use ontology to get a series
of types
ls: should be an exact match, does not traverse ontology.
client should ask user: do you want all exons or a specific type of
ls: client goes through ontology as necessary

[A] drop exacttype, type now has exacttype semantics
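
A rough sketch of what this decision means for a client: the server
matches type exactly, so a client wanting "all exons" walks the ontology
itself and sends a series of types. The ontology fragment, the short type
names (real das types are identified by uri), and the region syntax in
the query string are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical ontology fragment: parent term -> direct child terms.
CHILDREN = {
    "exon": ["coding_exon", "noncoding_exon"],
    "coding_exon": ["five_prime_coding_exon"],
}


def expand_type(term, children=CHILDREN):
    """Return the term plus all of its descendants, without duplicates."""
    seen = []
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Reversed so that children are visited in their listed order.
        stack.extend(reversed(children.get(current, [])))
    return seen


def feature_query(types, overlaps):
    """Build a query string with one type parameter per requested type."""
    params = [("overlaps", overlaps)] + [("type", t) for t in types]
    return urlencode(params)


# Exact match only: the server does not traverse the ontology.
exact = feature_query(["exon"], "chr1/1000:2000")
# The client expands the term itself when the user wants every kind of exon.
expanded = feature_query(expand_type("exon"), "chr1/1000:2000")
```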

Topic: XID, feature ids

ad: xid in features. no one used yet. gives a ref to some other
db. all it is is a url/uri. feels like there should be more info
ad: primary name field for feature, feels like should be name
ls: name is human readable. title would be ok
ad: but the feature filter called name searches the name and id fields
ls: this is correct behavior, you can do a fetch on the url/uri
this is ok.
ad: the name feature searches title and alias.

gh: if feature id is resolvable and you resolve it, there's no
guarantee it gives back a das2xml document.
if the feature uri is resolvable, and you fetch it, you will get back
a das2xml document right?
can you put uri in the feature query?
aday: feels that having auto-generated names
ad: do all features have a human readable name?
gh/ls: optional
ad: why would you want to put a url in a name field?
gh: rdf
ad: should be a resolvable resource, das2xml for that feature.

ad: features with aliases, do aliases need type pk or accession?
prosite has false match to ...
ls: this is a property or xid, not alias
ad: suggests that xid needs extra stuff to it.
gh: fine with an optional type attribute on xid
ad: let's wait until someone has a need.

Topic: Feature filters (continued)

gh: feature filters, inside, contains, identical. Which do we need,
which can we drop?

[A] overlaps - keep (all agree)

inside - gregg needs
contains - dropping, maybe
identical - dropping

ad: what about excludes - the complement of overlap?
gh: haven't had time to investigate whether I can use excludes rather
than the inside + overlaps (contains?) combination I need now.
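
For reference while deciding which filters to keep, the range filters
under discussion can be written as simple predicates. This is a sketch
that assumes half-open [start, end) coordinates and reads each filter as
"feature <relation> query region"; the spec wording governs the actual
semantics.

```python
def overlaps(f_start, f_end, q_start, q_end):
    """Feature shares at least one base with the query region."""
    return f_start < q_end and q_start < f_end


def inside(f_start, f_end, q_start, q_end):
    """Feature lies entirely within the query region."""
    return q_start <= f_start and f_end <= q_end


def contains(f_start, f_end, q_start, q_end):
    """Feature entirely covers the query region."""
    return f_start <= q_start and q_end <= f_end


def identical(f_start, f_end, q_start, q_end):
    """Feature has exactly the query region's coordinates."""
    return f_start == q_start and f_end == q_end


def excludes(f_start, f_end, q_start, q_end):
    """Complement of overlaps: feature shares no base with the region."""
    return not overlaps(f_start, f_end, q_start, q_end)


# Hypothetical features as (id, start, end); the query region is [100, 200).
features = [("a", 120, 150), ("b", 50, 250), ("c", 180, 220), ("d", 300, 400)]
query = (100, 200)

hits = {
    name: [fid for fid, s, e in features if test(s, e, *query)]
    for name, test in [
        ("overlaps", overlaps),
        ("inside", inside),
        ("contains", contains),
        ("excludes", excludes),
    ]
}
```

With these definitions identical is just inside and contains together,
which is one argument for dropping it.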

ls: use case: pointing to children and they haven't arrived yet.
gh: my client keeps stuff around, when you get parent/child if you
have parent + all children you can construct feature.
ls: the spec requires single parent, right?
gh: no you can have multiple.
ls: gff3 spec also allows mult parent and children
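
A minimal sketch of the client behavior gregg describes: keep features
around as they arrive and treat a parent as constructible only once all
of its children (recursively) are present. The class, the record layout
and the feature ids are hypothetical, and the sketch assumes the
parent/child graph has no cycles.

```python
class FeatureAssembler:
    """Holds features until a parent and all its descendants have arrived."""

    def __init__(self):
        self.features = {}  # feature id -> {"parents": [...], "children": [...]}

    def add(self, feature_id, parents=(), children=()):
        """Record an arrived feature and the ids it points to."""
        self.features[feature_id] = {
            "parents": list(parents),
            "children": list(children),
        }

    def is_complete(self, feature_id):
        """True once the feature and every descendant it names is present."""
        record = self.features.get(feature_id)
        if record is None:
            return False  # pointed to, but has not arrived yet
        return all(self.is_complete(child) for child in record["children"])


asm = FeatureAssembler()
asm.add("tx1", children=["exon1", "exon2"])
# exon1 has two parents, which the discussion above says is allowed.
asm.add("exon1", parents=["tx1", "tx2"])
before = asm.is_complete("tx1")  # exon2 has not arrived yet
asm.add("exon2", parents=["tx1"])
after = asm.is_complete("tx1")
```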

[A] Lincoln will provide use cases/examples of these feature scenarios:
- feature hierarchies three or more levels deep
- multiple parents
- alignments

Topic: Registry 

ap: still here.
gh: looking at registry, having trouble retrieving in a normal
browser. when looking at it in client, I only see biopackages server
registered as a server. Lincoln said there was more?
ap: this is related to mime types, changed from text plain to
gh: I get an error: source file could not be read.
lincoln said you added other test das2 servers to it.
ap: working on interface so users can upload servers.
half way through it now. upload a link to sources.
will send email once it's there.

[A] Steve will add gregg's new affy das/2 server to registry when Andreas'
web interface is ready

gh: same time tomorrow.
