[DAS2] Notes from the DAS/2 teleconference for the code sprint, 7 Feb 2006

Steve Chervitz Steve_Chervitz at affymetrix.com
Wed Feb 8 00:30:52 UTC 2006

Notes from the DAS/2 teleconference for the code sprint, 7 Feb 2006

$Id: das2-teleconf-2006-02-07.txt,v 1.1 2006/02/08 00:37:41 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Ed E., Gregg Helt
  Sanger: Andreas Prlic, Thomas Down
  Sweden: Andrew Dalke
  UC Berkeley: Nomi Harris, Suzi Lewis
  UCLA: Allen Day

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 

Agenda:
* Vote on constructing URLs/URIs to query segments, types, features
* Status report from people
* Ontologies
* Feat property changes

Topic: Constructing URLs/URIs to query segments, types, features
1.) specified by query_id
2.) hardwired to ~/segments, ~/types, ~/features
3.) ?

ad: lots of people have left here so the vote won't include all.
see email why a query url is useful
agree w/ gregg: short names could be a nice to have.
shouldn't have to worry about how you organize your urls
gh: yes it does: this/types this/segments etc.
ad: can take it out if there's confusion
gh: recommended structure is good.
ee/gh: people will look at the examples and do it that way. they won't
look at .rnc file
gh: make it clearer in the spec that these are merely suggestions of the
hierarchy, you don't have to do it this way.

ad: roy's view: likes the query id url for doing search for all
features, or all types.
query id is the url used to do search against features.
uri could be relative or absolute.
gh: category element defines a query id for a subset of das.
it's the attribute query id in the category

ad: I also want to rename category back to capability.
how do we arrange urls in a versioned source.
construction off of strings or via attributes in a url
gh: votes for hardwired, but feels less strong today about it.
ad: majority vote is for query id, spec czar goes with that.

[A] query id
[A] andrew will update spec to have less mention of hierarchical structure
[A] allen will update server to do it the recommended way

gh: in addition to have an arbitrary query id to get segments, types,
features, there's a recommended way to do it via the hierarchy. server
should do it the recommended way (hierarchy)

ee: we should be flexible about it.
gh/ad: ok take out recommendation.
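
A minimal sketch of what the query id decision could mean for a client,
assuming each capability in the sources document carries a query id that
may be relative or absolute. The element names, attribute names and URLs
below are made up for illustration and are not taken from the spec. The
client never builds ~/segments by hand; it resolves whatever the server
advertises.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Hypothetical sources fragment; element/attribute names are assumptions.
SOURCES_XML = """
<VERSION id="volvox/1">
  <CAPABILITY type="segments" query_id="segments"/>
  <CAPABILITY type="types" query_id="/das2/volvox/1/type-list"/>
  <CAPABILITY type="features" query_id="http://other.example.org/feat"/>
</VERSION>
"""


def query_urls(xml_text, doc_url):
    """Map capability type -> absolute query URL, resolved against doc_url."""
    root = ET.fromstring(xml_text)
    return {
        cap.get("type"): urljoin(doc_url, cap.get("query_id"))
        for cap in root.iter("CAPABILITY")
    }


# doc_url is the URL the sources document was fetched from (assumed here).
urls = query_urls(SOURCES_XML, "http://example.org/das2/volvox/1/")
for kind, url in urls.items():
    print(kind, url)
```

A server that keeps the hierarchical layout still works with this client,
since a relative query id such as "segments" resolves to the same place.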

Topic: Status reports

ad: see his emails.
gh: we need examples in spec document and scratch to be better
ad: should be, i've been trying to keep these in sync.
gh: plan to push into html, incorporate scratch into doc?
ad: yes, eventually.
will also add andreas' work to scratch too.

td: java xml binding libraries, how to put it into a workable server
ap: das registry, sources command, attribute handling, people can
connect to a publicly available toy server.
gh: registry will respond?
ap: yes. toy server, toy data like das1, returning sources command.
gh: can you add allen's codesprint server? consider it registered.
ap: is fully working?
gh: can allen send a command to it to register it?
ap: no.
gh: would like to tell my client to do discovery rather than hard

gh: commits to igb das/2 client to handle seq, segment, types. not
features query yet. given decision about url construction, can do this
fast so we can test on codesprint server seq, seg, types to bring up
something meaningful in gui. not features by today. affy das/2 server
is running behind. will sync up today as well.

nh: apollo working out sequence, segment, types request. now does
versioned sources. integrating those into query gui as well.

aday: changes early this am. server running under /codesprint is now a
static doc pointing back to the old server. adding segment command,
merging region and seq command. has made everything except
capabilities writeback stuff.
ad: there's another request recently, see my email.
aday: have gotten 40 emails from you in the last day!

aday: brian oconnor is working on bundling dependencies for an rpm
based release.
gh: I also did significant refactoring/moving assay/ontology stuff
into subclasses on client side. haven't seen brian's code, but should
run fine. 

Topic: Integrating Sequence Ontology with DAS/2

suzi: national center for biomedical ontology, one of 7
natl centers for biomedical computing. focus on needs regarding
developing and using ontologies.

gh: hoping to have a typing system in das/2 via types queries that
references SO but doesn't require client to fully understand
ontologies. too much of a burden. that's the challenge. this
translates into referring to ontology terms as opaque uris
suzi: 'understands' means they're ignoring any relationships between
gh: yes.
currently type has attrib for id, attrib for ontology.
ad: uri or arbitrary string
suzi: can use uri or string, preprocessed
ad: one or the other
gh: prefers uri
suzi: from uri you can get the string
gh: not clear how to construct uri for particular terms in an ontology
suzi: this will happen in next few months. talking with daniel rubin
about this.
gh: this is where allen comes in. ontology das.
aday: next step is getting it hosted on NCBO server.
currently communicating with chris mungall. said they're planning on
implementing something similar soon, not sure if they'd accept allen's
solution. unclear.
working with gavin sherlock on ontology support for microarray samples,
tissue type, phenotype. was hoping people could pick this up and use
suzi: gavin and I could help push this.
gh: chris m posted concerns that the obo xml in allen's scheme
isn't the same as what he's using. re: how you make absolute uris.
aday: there's not much docs on obo xml format. did the best I could.
suzi: should be able to sort it out. just an inertia problem of
getting it installed. not a competition issue. fine with me. not
aday: by end of week we'll have an rpm.
suzi: let's keep pushing on this to make it happen. I'll talk to gavin
tomorrow. can we install on sf site, or do we need to set it up
aday: could conceivably set up a cgi on sf. uses custom apache
handler tho.

gh: more ontology q's can wait till tomorrow w/ lincoln.
concern: how do we deal w/ types that represent more
than one ontology term. defer discussion till tomorrow.

Topic: Feature Properties

See andrew's post today.

ad: this ties into ontologies. two ontology related issues: two different
ways to query. ontology of a feature, and two diff ways to search a db
for that property: exactly equal, or a subtype.
this is a property with two diff searches you may want to do on it.
properties like note, alias, phase have ability to search key/val
properties, e.g., att:alias=something.
score is a floating point number you may want to support > or < on it.
regular exp searches, identical, etc.
td says use xml query language, but worried about complexity of this.
99% of time this is way more than you need.
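
A minimal sketch of the two ontology searches mentioned above (exactly
equal vs. subtype), assuming the server holds a child -> parent is_a
table. The table here is a toy stand-in, not the real Sequence Ontology,
and the function names are hypothetical.

```python
# Toy is_a table (child -> parent); a real server would load the ontology.
IS_A = {"mRNA": "transcript", "ncRNA": "transcript", "transcript": "region"}


def is_subtype(term, ancestor):
    """True if term equals ancestor or reaches it by following is_a links."""
    while term is not None:
        if term == ancestor:
            return True
        term = IS_A.get(term)  # None once we walk past the root
    return False


def search(features, wanted, exact=True):
    """Return features whose type equals wanted, or is a subtype of it."""
    if exact:
        return [f for f in features if f["type"] == wanted]
    return [f for f in features if is_subtype(f["type"], wanted)]


features = [
    {"id": "f1", "type": "mRNA"},
    {"id": "f2", "type": "transcript"},
    {"id": "f3", "type": "region"},
]
exact_hits = [f["id"] for f in search(features, "transcript", exact=True)]
subtype_hits = [f["id"] for f in search(features, "transcript", exact=False)]
print(exact_hits, subtype_hits)
```

In this sketch the subtype search also returns exact matches; whether a
server should do that is a design choice the notes do not settle.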

scenario: given 4 different notes in a feature, is order important?
extensions: curation point gives curator's name and time stamp.
e.g., search for all features modified by andrew in 2004.
discussion: pull this into a note element, perhaps phase and alias
property table only supports a substring search. give me an author
name, e.g.
not saying getting rid of tag values.
server supporting new data types, extensions, feat search w/ sanger
curation elements for query. or thomas xml search.
this is why I want to move categories back to capabilities.
gh: more appropriate as capabilities than header.
ad: someone can get a document. andreas can combine many servers into
one, say: which one supports which.

to summarize: 
- properties are simple strings
- only substring searches
- change att: to prop:
- note and alias and phase are elements
- advertise that a server has extension to das query lang

gh: what about phase? lincoln needs it.
ad: if it's something that people will be editing, make it an element.
gh: phase is inappropriate for certain types. would like a formal way
to say when it should be there or not.
ad: this is formalizing a way for server to tell client that there are
more types of searches available.
can't see how to do it automatically: eg for a given score, knowing
what is considered significant (low or high, e.g.).
td: if he needs a phase he re-infers it. doesn't work for partial CDS
gh: how much spec churn will this generate?
ad: [various things, half a dozen or so, some simplifying]
gh: does a colon in a query string need to be escaped? if so, this
makes it hard to read.
ad: could use prop_ rather than prop:
thomas and I had long discussion about this.
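
A small sketch on the colon question, using Python's urllib as one
example of client library behaviour. RFC 3986 allows ':' unescaped in the
query component, but some libraries escape it by default, which is where
the readability concern comes from. The property name is hypothetical.

```python
from urllib.parse import urlencode, parse_qsl

params = [("prop:curator", "andrew")]

escaped = urlencode(params)               # library default escapes ':'
readable = urlencode(params, safe=":")    # ask the library to leave ':' alone
underscore = urlencode([("prop_curator", "andrew")])  # the prop_ alternative

print(escaped)
print(readable)
print(underscore)

# Both colon forms decode to the same key on the server side.
same = parse_qsl(escaped) == parse_qsl(readable) == params
print(same)
```

So a server has to accept both spellings of the colon form, while prop_
avoids the question at the cost of a less distinctive prefix.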

[A] andrew will incorporate these changes into feature properties

Topic: Maintainer information

ad: modified examples under scratch
gh: maintainer at source or version level
ad: one for all sources level
ap: at sanger we have one central server with lots of sources. notes
who's responsible for which server.
gh: ownership cascades down to sub elements?
ad: yes

Topic: XML Base

gh: can be in any element. as well as xml:lang, don't really
ad: it's what the atom spec does, so we copied. maybe for
bidirectional languages.
gh: flexible uri resolution scheme w/ xml base. implementation in java
tools is spotty for xml:base. curious about java obj binding of xml
what support they have for resolving xml base. at this point will have
to roll it myself. want to ask thomas about this.
ap: he's using StAX parser, gets global namespace.
gh: bigger concern for when we have to use sax, need to do xml:base
resolution, eg. when we need to retrieve lots of features.
ad: it can be done with sax.
gh: not hard, but it is a multistep process.
ad: multiple levels of xml:base
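
A minimal sketch of xml:base resolution in a SAX handler, shown in Python
rather than Java to keep it short. It assumes a non-namespace-aware parser
(so the attribute arrives as the literal name "xml:base") and a "uri"
attribute on elements; element and attribute names are made up. The
multistep part is the stack: push the effective base on every start tag,
pop on every end tag.

```python
import xml.sax
from urllib.parse import urljoin

XML_BASE = "xml:base"  # qualified name as delivered without namespace mode


class BaseTracker(xml.sax.ContentHandler):
    """Resolve each element's 'uri' attribute against the nearest xml:base."""

    def __init__(self, doc_url):
        super().__init__()
        self.bases = [doc_url]  # stack; bottom entry is the document URL
        self.resolved = []

    def startElement(self, name, attrs):
        base = self.bases[-1]
        if XML_BASE in attrs:
            # A nested xml:base may itself be relative to the enclosing one.
            base = urljoin(base, attrs[XML_BASE])
        self.bases.append(base)
        if "uri" in attrs:
            self.resolved.append(urljoin(base, attrs["uri"]))

    def endElement(self, name):
        self.bases.pop()


DOC = b"""<FEATURES xml:base="http://example.org/das2/volvox/1/">
  <FEATURE uri="feature/f1"/>
  <GROUP xml:base="other/">
    <FEATURE uri="f2"/>
  </GROUP>
  <FEATURE uri="feature/f3"/>
</FEATURES>"""

handler = BaseTracker("http://localhost/doc.xml")
xml.sax.parseString(DOC, handler)
for uri in handler.resolved:
    print(uri)
```

Because the handler keeps only the stack, this works while streaming a
large features response without building a tree.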

ad: tomorrow's agenda: go through roy's otter stuff, convert into new
das format. to get a feel for how data will look. see roy's email. to
use experience gathered from otter to make sure we're sufficiently
covering features.

gh: talking about writeback?
ad: premature. let's talk style sheets wed, and writeback
thursday. plus anything else that's come up about the spec.
want to know how style sheets will look. lincoln should be able to
help out there.
