[DAS2] Notes from the DAS/2 teleconference for the code sprint, 8 Feb 2006

Wed Feb 8 21:47:18 UTC 2006

Notes from the DAS/2 teleconference for the code sprint, 8 Feb 2006

$Id: das2-teleconf-2006-02-08.txt,v 1.1 2006/02/08 21:51:14 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed E., Gregg Helt
  CSHL: Lincoln Stein
  Sanger: Thomas Down
  Sweden: Andrew Dalke
  UC Berkeley: Nomi Harris
  UCLA: Allen Day, Brian O'Connor

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 

Agenda:

* progress report for grant renewal
* ontologies
* ids and urls
* style sheets
* status reports

Topic: Progress report for grant
--------------------------------

gh: needs to be in the mail by 5pm tomorrow, to be included as a hard
copy addendum to grant. will improve chances of funding for next cycle.
review will be done by end of feb.
nh: no later than 4pm pst today. state what you've accomplished since
Nov 1 and now, in particular this week. one or two paragraphs.
gh: 
1. highlight significant enhancements
2. involvement of sanger, ebi
3. registry work from andreas, http spec for that registry
4. writeback 

ad: andreas worked on registry server, will send write-up soon after
teleconference.

[A] Everyone write up 1-2 paragraphs of progress and send to Nomi ASAP

Topic: Ontologies
-----------------

gh: concerned about ontol attrib in types doc: do we want it
to be possible for a type to be an instantiation of multiple terms in
the ontology?
ls: will make it hard to validate. one type = many ontol terms. don't
like it. types will be specializations of SO terms and will not have
multiple parents.
gh: thinking about people doing curation. if a type is anchored to one
term in the ontol, and a feat can have only one type, a feat won't be
able to refer to >1 term in SO.
ls: any use case for this?
gh: still exploring this. eg., both a computed feature and an exon?
ls: no. separate category for predicted genes.
gh: is there something for 'computed exon' or 'computed cds'?
ls: think so.
sc: multiple branches like go?
ls: multiple relationship types do exist. something can be is_a or
part_of.
I wanted das/2 to be limited to what you can say in SO, with notion
that you can extend it. e.g., three predicted exons one with genefinder,
exonerate, etc.

ad: given a string 'exon' how does that get used to query server?
ls: find exon SO term, download list of types from das server, find
everything that inherits from exon ontology term.
clients need to know how to search the SO list.
they will have a local copy of SO that they'll refresh from time to
time.
gh: client isn't required to know the full structure, except maybe to
search higher-level terms. but the term in the ontology attribute is
sufficient. 
ls: could just search types and desc to find exons, but that relies on
implementer describing their types correctly.
gh: if a client wants to understand an ontol, the best way to go is
via what allen's proposing, searching via ontology das, preferably via
NCBO server.
ad: what is the actual string we're searching on?
aday: name or definition, or id.
ls: client should have a copy of the SO. unambiguous in this opinion.
client has SO, looks through types XML to find what the local types
are which the server supports which match what it's looking for in the
SO.
here's a flowchart:

- client downloads SO, caches.
- client downloads seq types list, caches.
- user searches to find exon
- client looks to find matches against 'exon', maybe 5 hits.
- prompts user to select which he's looking for
- client looks thru cached types xml to find server types of SO term
  that user selected
- client does feature query.
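
The flowchart above can be sketched roughly as follows. This is only an
illustration under stated assumptions: the cached SO excerpt, its
placeholder accessions ("SO:A" etc.), the local type ids, and all
function names are hypothetical and do not come from the DAS/2 spec.

```python
# Hypothetical cached SO excerpt: accession -> (name, parent accessions).
# The accessions are placeholders, not real SO ids.
SO_CACHE = {
    "SO:A": ("region", []),
    "SO:B": ("exon", ["SO:A"]),
    "SO:C": ("coding_exon", ["SO:B"]),
    "SO:D": ("intron", ["SO:A"]),
}

# Hypothetical cached types list from one server: local type id -> SO accession.
TYPES_CACHE = {
    "foobar": "SO:B",
    "genefinder_exon": "SO:C",
    "my_introns": "SO:D",
}


def search_terms(so, text):
    """Return accessions whose term name contains the user's search text."""
    text = text.lower()
    return sorted(acc for acc, (name, _) in so.items() if text in name.lower())


def inherits_from(so, acc, ancestor):
    """True if acc is the ancestor term or reaches it by following parents."""
    stack, seen = [acc], set()
    while stack:
        cur = stack.pop()
        if cur == ancestor:
            return True
        if cur in seen or cur not in so:
            continue
        seen.add(cur)
        stack.extend(so[cur][1])
    return False


def types_for_term(types, so, acc):
    """Local server types whose SO term is acc or inherits from it."""
    return sorted(t for t, t_acc in types.items() if inherits_from(so, t_acc, acc))


hits = search_terms(SO_CACHE, "exon")         # candidates to show the user
chosen = hits[0]                              # stand-in for the user's pick
local_types = types_for_term(TYPES_CACHE, SO_CACHE, chosen)
```

The resulting local type ids are what the client would then pass to the
feature query.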

ad: what is the string that the user is looking for, URL or string?
ls: in type xml how do we indicate the term?
gh: we've been discussing this the past few days
ls: why not replace the term with SO accession number? then we don't
have to figure out the correct representation of ontology in an
xml. can finish this by friday. chris mungall has weighed in, and xml
version of SO ontology is not completely stable.

gh: perfectly ok for client to know nothing about SO and treat these
as unique strings.
ls: right. names will eventually be things like 'exon'.
aday: chris's main complaint is that the doc didn't validate. I didn't
have a dtd. got it and now it validates. I thought this was a done
deal. there is a document written that describes how to do what we're
talking about.
ls: the only thing to be resolved, in types xml document, how do we
refer to SO terms?
aday: an attribute there that allows you to put in uri. it's a
relative url that points to ontology das server to get obo xml for
that term.
ad: how do I go from string 'exon' to find out what that is?
aday: 
ls: lets say administrator of das server has local type called
foobar. associated w/ url for SO 'exon' term. andrew's question is,
user wants to search for exons, how to go from 'exon' to correct url
in SO to find what types correspond to that? how to go from 'exon'
to foobar?
aday: search SO for exon, local types.
there's a filter on ontology that lets you search all terms and
definitions.
gh: there's a requirement now that server must understand parent-child
relationships in ontology.
aday: server could do xpath query to pull out the terms you're
interested in w/o understanding ontology
ls: user types 'exon' returns all feats in the genome that are exons.
aday: two servers, feat and ontol server
gets all types from feat server, each has url to ontology das server,
maybe multiple ontology das servers. each must have its ontology
searched, returning supported or not. client assembles all search
results from static obo xml documents.
gh: for most clients this will be irrelevant. user will get a list of
types - genscan, blat alignment, for things they may be interested
in. they don't need to understand ontology nor does client. there may
be a url to look up info about the term. this is the typical
case. more sophisticated use cases can be put off till later.
ls: in types xml can we have two attributes, url and accession
so_accession="SO:12414", other will be url for obo xml.

[A] types will have separate attributes for URI and SO accession number
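
A minimal sketch of how a client might read a types document carrying
both attributes. The so_accession attribute name and the "SO:12414"
value are taken from the discussion above; the TYPES/TYPE element
names, the ontology_uri attribute name, the second accession, and the
relative URIs are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical types document; only so_accession is named in the notes.
TYPES_XML = """
<TYPES>
  <TYPE id="foobar" so_accession="SO:12414" ontology_uri="ontology/SO/12414"/>
  <TYPE id="genscan" so_accession="SO:99999" ontology_uri="ontology/SO/99999"/>
</TYPES>
"""


def types_by_accession(xml_text):
    """Index local types by SO accession: accession -> [(type id, ontology URI)]."""
    root = ET.fromstring(xml_text.strip())
    index = {}
    for t in root.iter("TYPE"):
        entry = (t.get("id"), t.get("ontology_uri"))
        index.setdefault(t.get("so_accession"), []).append(entry)
    return index


index = types_by_accession(TYPES_XML)
```

A client that knows nothing about SO can still treat the accession as
an opaque key, and only follow the URI when it wants the obo xml.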

Topic: IDs and URLs
-------------------

ad: discussion about searching for exon, use case: client goes to
server to get list of all types, wants all features of a
given type in a given range. may filter based on contains or inside,
das-type=xxxxx. 
talking about that being a URL to get full name for it.
what is the thing you send to server to ask for the types?
gh: url
ad: make this an id so it's not a long complex url. just an id
specific to that server. such that you go to feat query url and get
it.
ls: can just choose the last component of the url, type id.
ad: why have ability to get feature type individually?
ls: will have to be uniquified, by adding url to types query.
ad: feat query =
ls: isn't this the way it was?
gh: every feat has unique uri.
ad: talking about filtering and querying.
ls: just give it the id not the whole url.
ad: now it is the url
ls: should be the id
does it make sense to be something that another server has defined?
probably not. just a local type.

[lots of back and forth here, didn't catch it all...]

ad: do we need ability to refer to feature or type by url?
gh: yes. for making rdf statements about das2 features.
ad: who will do this?
gh: I will if no one else does. web technology is moving in this direction.
ls: we want every object a das server serves to be referencable as a
url/uri. as for filtering mechanism, for type filter we can just use
the id of the type, a short string.
ad: agree, as of this morning the url and id are same thing.
ls: a relative uri, by definition the server should implicitly attach
the versioned data source url to it.
ad: xml processors
ls: define the way the filter query mechanism, hard code implicit
paths into it.
ls: featuresquery?type=something if 'something' has no slashes, server
implicitly adds http://myserver/das/types/...
ad: don't like pasting urls and strings together to get things.
don't like queries with implicit logic like that.
ls: perfectly happy saying you can use urls in the query strings. I'd
go with short ids
ad: proposing we have both, id and href. here's the case: people
uploading to server want to provide a das track, can provide two
documents. works well for < 1000 features
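
The implicit expansion ls describes for featuresquery?type=something,
and the kind of clash ad is worried about, can be sketched as below.
The function names and the second server's URL are hypothetical; the
base URL follows the http://myserver/das/types/ example from the
discussion, and plain string pasting is used because that is the
mechanism being debated, not because it is settled.

```python
TYPES_BASE = "http://myserver/das/types/"  # example base from the discussion


def resolve_type(value, types_base=TYPES_BASE):
    """Expand a slash-free type id into a full URI; pass anything else through."""
    if "/" not in value:
        return types_base + value
    return value


def short_id(uri):
    """Last path component of a type URI, i.e. the short id being proposed."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


local = resolve_type("exon")
remote = "http://otherserver/das/types/exon"  # hypothetical second source

# Both URIs collapse to the same short id, so the id alone is ambiguous
# once types from two sources are mixed.
same_short_id = short_id(local) == short_id(remote)
```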

gh: we have to have uri for features.
ad: why?
gh: I will send you the page from the first grant.
ls: main reason is: to avoid namespace clashes when integrating data sets.
td: what do you mean by integrate?
ls: view of features from 4 diff annotation groups, want to search for
a particular feature by its id, need to indicate which data source
it's coming from.
td: won't you be keeping track of which data source anyway?
you never get a track that's a mixture of diff sources.
gh: dangerous to do this.
td: there must be something keeping track of where each track is from.
gh: my assumption is that this is with uri
td: there's nothing that constrains a server to only use uris from itself.
gh: we sacrificed this when we went with capabilities.
ls: a server can emit a set of features, some use relative uris and
some absolute ones. if my server starts emitting features with
affymetrix uris, the assumption is these originate from affymetrix.
uris indicate that they originate from diff places even though you may
physically get them from a das server at a different location.
gh: thomas is right. given a feature uri you have no way to tell which
das server it came from. clients must keep track of this themselves.
ls: we wanted to divorce the origin of the feat from the server that
serves it. should be possible to serve features that come from
somewhere else.
gh: making feature uri opaque was deliberate.
ad: when you do a feat query it could return the whole db. so the
server must know how to return a feature document that contains all
features. that server must know all the data.
gh: don't see problem
ad: all features and types have id and url. different. url is optional
gh: no, required. also, not url, but uri.
ad: ok. why should all records have a uri?
gh: compatibility with semantic web/rdf, lsid, future proofing.
ad: if they want to they can, if not they shouldn't be required. no
one is doing rdf now.

ls: what issue are you concerned about with respect to uri?
ad: like ontology search. give me all features of this das type, you
then have to give the url. this is different than id.
ls: completely happy treating id as the last component of uri and
doing a paste. why don't you like the paste?
ad: you can get features from two diff places, each ending with same
last word.
ls: what query is it that allows you to filter by feature id? we have
positional, type filtering and getting a single feature from server of
origin.
gh: there shouldn't be an id filter. just resolving uri for that
feature.
ls: we can't search a feature by regex match on its id.
ad: i'm not saying that. I'm suggesting that the url be optional.
ls: I don't understand the point.
gh: why can't uri be required?
ad: see use case in email today subject="ids and urls". involves
uploading das tracks to a server.

[some trouble: not everyone has seen it]

ls: I say we have a policy that if there is big discussion, the email
should come more than 30 minutes before conf call.
gh: I've read most of it and am still confused.
ls: I still don't understand it after reading. you'll have to rephrase
it.
ad: all types and features have id and url.
ls: no, explain in a follow up email.
ad: ok

[A] Andrew will send follow up email to elaborate on his "ids and urls" use
case

[A] Everyone will try to absorb andrew's ids and urls use case

Topic: Style Sheets
-------------------

ad: how do you refer to elements in style sheets, by id or url?
gh: no opinion
ad: if everything is referred to by id, that makes style sheets easier to
write.
gh: has anyone gotten to implementation of style sheets for das/2?
ad: my proposal was a straw man.

Topic: Status reports
---------------------

gh: reading lots of specs. after yesterday's rant about xml:base,
implemented a stack last night. works fine for our current server.
we shouldn't throw out xml:base because of a few edge cases. we might
want to specify which subset of xml:base we use.
checked in code for igb client, does capabilities, specify feat,
types, segments. trouble when modeling sequences.
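
A rough sketch of the stack approach to xml:base with a streaming
parser. This is not gregg's IGB code: the handler, the element names,
the "id" attribute, and the example URLs are all assumptions for
illustration. Each start tag pushes the base URI in effect (resolving
a new xml:base against its parent's), and each end tag pops it.

```python
import xml.sax
from urllib.parse import urljoin


class BaseStackHandler(xml.sax.ContentHandler):
    """Track xml:base scoping with a stack while streaming through a document."""

    def __init__(self, document_uri):
        super().__init__()
        self.bases = [document_uri]  # bottom of stack: where the doc was fetched
        self.resolved = []           # absolute URIs built from relative ids

    def startElement(self, name, attrs):
        base = self.bases[-1]
        if "xml:base" in attrs:
            # A nested xml:base may itself be relative to the enclosing base.
            base = urljoin(base, attrs["xml:base"])
        self.bases.append(base)
        if "id" in attrs:
            self.resolved.append(urljoin(base, attrs["id"]))

    def endElement(self, name):
        self.bases.pop()


def resolve_ids(xml_text, document_uri):
    handler = BaseStackHandler(document_uri)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.resolved


# Hypothetical feature document with nested xml:base attributes.
DOC = """<FEATURES xml:base="http://example.org/das/genome/1/">
  <FEATURE id="feature/f1"/>
  <GROUP xml:base="other/">
    <FEATURE id="f2"/>
  </GROUP>
  <FEATURE id="feature/f3"/>
</FEATURES>"""

ids = resolve_ids(DOC, "http://example.org/das/genome/1/features")
```

The third feature resolves against the outer base again because the
inner base was popped when its enclosing element closed.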

ee: working on das/2 client. building new widget as gregg asked for.

ad: working with andreas on write-up for registry.

td: understanding the spec. xml parsing.
gh: you are using stacks, have experience with it?
td: yes, less painful. streaming api for xml.
gh: tried xom. picky about namespaces. difficult to use with spec
that's not stable.
td: some trouble with dom
gh: sources, types, segments I use dom (small document). for features
use sax

nh: progress with apollo. list of versioned sources, show segments,
user picks, gets features. something that the parser doesn't like.
not sure where the problem comes from.

sc: working on setting up internal das server on 64bit machine
here. refining the pipeline for generating files for loading the affy
das server with updated data for various public and affy data
sources. also writing up and posting meeting notes.

aday: message from gavin about ontology responses. caching issue caused
trouble with model/controller. chris's obo dtd.
dependencies for server rpm were finished. now building the rpm.

td: parsing xml from codesprint server. a few things are matching the
spec from a few weeks back. prop, loc elements. will these be changed?
aday: feature xml?
td: yes. I'm still absorbing the changes, dozens of mails about feat
properties.
gh: more important is loc element, splitting into id and range. used
to be one thing, now is two. one is id, other is start,end,strand.
aday: will look into today.

nh: I'm also taking charge of getting grant progress report
done. especially need allen re: server, andreas via registry.

gh: any reports for write back?
brian: some work on that. not ready for prime time.
gh: roy?
ad: some talk about this: puts and deletes on the urls.
gh: let's talk about it tomorrow.