[DAS2] Notes from the weekly DAS/2 teleconference, 13 Nov 2006

Steve Chervitz Steve_Chervitz at affymetrix.com
Fri Dec 8 03:11:00 UTC 2006


[These are notes from a critical meeting last month. In the frenzy of
activity surrounding the freezing of the retrieval spec schema, I had lost
track of the notes I took.... until today.    -Steve]

Notes from the weekly DAS/2 teleconference, 13 Nov 2006

$Id: das2-teleconf-2006-11-13.txt,v 1.1 2006/12/08 03:02:58 sac Exp $

Teleconference Info:
   * Schedule:         Biweekly on Monday
   * Time of Day:      9:30 AM PST, 17:30 GMT
   * Dialin (US):      800-531-3250
   * Dialin (Intl):    303-928-2693
   * Toll-free UK:     08 00 40 49 467
   * Toll-free France: 08 00 907 839
   * Conference ID:    2879055
   * Passcode:         1365

Attendees: 
  Affy: Steve Chervitz, Ed E., Gregg Helt
  CSHL: Lincoln Stein
  Dalke Scientific: Andrew Dalke
  Sanger: Andreas Prlic
  UAB: Ann Loraine
  UCLA: Brian O'Connor

Note taker: Steve Chervitz

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 

Agenda:
-------

Specification
      Status of schema (das2_schemas.rnc)
      Ratification of schema freeze
      Status of XML Schema translation (das2_schemas.xsd)
      Formalizing query syntax?
 
      Status of genome retrieval specification doc (das2_get.html)
      Review of remaining issues in genome retrieval spec.
            Coordinates URIs
            Segment reference URIs
            Ontology URIs
            Revising example queries / responses
      Timeline for DAS/2 genome retrieval spec freeze.
      Other docs?

Implementation status
      Validator
      Genome retrieval servers
            NetAffx
                  queries
                  responses
            biopackages
                  queries
                  responses
            DAS/1 --> DAS/2 conversion server
            cgi.biodas.org test server
            Sanger registry
            others?
            Example queries
      Biopackages ontology server
      Genome retrieval clients
            IGB
                  queries
                  responses
            others?
 
Topic: Specification
-----------------------

gh: regarding the xid/link changes - no servers were using that,
therefore not a major issue.

ee: can't use ucla server
gh: compliance issues with both servers.

sc: was working on re-organizing the html das2 document but got stuck
in CVS commit hell (lots of commit activity going on...)
gh: we'll focus on that in a few mins.

gh: any edits to das2 schema other than by andrew and me?

ls: looked over things. questions about html doc, but schema looks good.
ee: draft3 dir?
ad: files in draft3 dir are old, should be removed.
gh: those are the ones that got combined before creating the
das2_schemas.rnc

ee: did good bit of work on style sheet. not ready to freeze.
gh: not concerned about freezing stylesheet

gh: any objections?
sc: can you reiterate the xid/link stuff?

ad: xid element had lots of "should haves". no feedback on this
yet. referring to other database, 'false positive'. decided better to not
have this, pulled it out. recommend html attribs for link element.
gh: human readable tag is important to igb.
ad: rnc has examples for reasons to use. features result 'link to rss
feed' so you can get new results for that feature.
gives freedom to add new kinds of links to it.

ad: allen here? no
bo: no feedback re: freezing the spec. final doc is das2_schemas.rnc?
ad: yes.

[A] clean out old, obsolete docs in that dir

[A] add a link near top of html doc to the schema doc.

gh: schemas document is now frozen! opinions on how long it should
stay frozen?
ad: depends on feedback we get.
bo: don't change it at all.
ad: errata, 2.1, community
gh: can we agree that no changes to it unless discussed on the conf
call.
all: yes.

gh: would like to discuss XML schema translation of the rnc and query
syntax when Brian Gilman joins in.

Topic: Review status of genome retrieval spec (das2_get.html)
--------------------------------------------------------------
gh: looking at CVS commit log from prev week.
most of this was to reflect changes in the rnc.
gh: 
1.35 - done by capabilities now
1.36 - remove reqt to return seq in fasta fmt. want to be able to
       specify a segments doc but not have to return the residues.

also did polishing error responses, server decides when response is
too large and sends error messages.
1.38 - started putting in ontology URIs. we had discussions with Chris
       Mungall about how to refer to ontology entries via
       URIs. he said it would happen via NCBO but not until next
       year. Updated to refer to ontology server that allen and brian
       (UCLA) are working on.

[A] brian/allen (ucla) will work with ncbo on uri access to ontology
terms when they're ready

related issue: segment reference URIs. we still don't have ref uris
for anything but worm and fly. lincoln created at last code sprint.

ls: did human and mouse, too. on the wiki.
ad: global seq ids wiki doc: http://www.open-bio.org/wiki/DAS:GlobalSeqIDs
gh: I was looking at doc checked into CVS.

[A] will change examples in spec to start working with these

gh: how this relates to registry, uris maintained by andreas. no
connection to andreas' registry.
ad: ziltch
ap: this is concerning the uri for coordinates.
gh: this has to connect with a uri that gives these lists of
sequences.
right now, no way for someone to look at coord uri, or
source/version/authority and see which of items in this list of global
seq ids to use. they can guess, but there's no formal way to do that
now.
ap: ok

gh: for each of these sets of segments, have a pointer to the
coord uri at sanger.
ap: uri of coord should be resolvable to additional info like
organism, version of assembly, etc.
ls: diff between uri and gsid
gh: gsid not an id for the whole assembly. wait... it is, but it is diff
from the ones andreas is using.
ls: so his registry needs to be updated to use all builds/releases
listed on this page.
ap: ok. so names can be resolved?
ls: all are uri's, some can be resolved, but that's accidental.
ap: fine.
gh: coordinates element, like everything else, are allowed to have a
doc_href, right? so you can have a pointer to a doc that does describe it.
ad: nope. uri, taxid, source, auth, version, created, test range
gh: some readable page describing coordinate system
ad: can either use an extension, or a link
ap: link is fine.
ad: segments are resolvable, but reference ones are not.
ap: makes sense for the reference coordinate uris to be resolvable, too.

gh: don't think they need to be resolvable. but it's nice to point to
the website of the authority that is owner of that assembly. getting
them to put up a resolvable is problematic.
ls: not nec a problem, but that it will never break is a problem.
I could provide doc_href for each ncbi build, that should be fine. why
must the uri resolve to anything? eg, documents to describe build
statistics. 
ad: people only need a unique string.
ls: how about a doc_href for each one, and put that in the coord
system. 
ad: in coordinate tag where you supply uri for assembly, there is no
space for doc_href.
ls: withdraw
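
Illustration (not from the call): a minimal Python sketch of the point
made above, that a coordinates URI only needs to be a unique string and
is compared, not resolved. The element name, the exact attribute
spellings (authority, test_range) and all values are assumptions for
illustration; das2_schemas.rnc is the authority on the real form.

```python
import xml.etree.ElementTree as ET

# Hypothetical example element; every value here is made up.
COORDINATES_XML = (
    '<COORDINATES uri="http://example.org/coordsys/human-36" '
    'taxid="9606" source="Chromosome" authority="NCBI" version="36" '
    'created="2006-11-13T00:00:00Z" test_range="chr1/1000:2000" />'
)

# Attributes as listed in the discussion (uri, taxid, source, auth,
# version, created, test range). Note that doc_href is not among them.
COORD_ATTRS = ("uri", "taxid", "source", "authority", "version",
               "created", "test_range")


def describe_coordinates(xml_text):
    """Return a dict of the known attributes present on the element."""
    elem = ET.fromstring(xml_text)
    return {name: elem.get(name)
            for name in COORD_ATTRS
            if elem.get(name) is not None}


def same_assembly(coords_a, coords_b):
    """Treat two coordinate systems as the same iff their URIs match.

    The URI is compared as an opaque string and never dereferenced.
    """
    return coords_a["uri"] == coords_b["uri"]


info = describe_coordinates(COORDINATES_XML)
print(info["authority"], info["version"])
```

A human-readable description page would have to be attached some other
way (an extension or a link, as suggested above), since the sketch finds
no doc_href on the element.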

gh: registry and server must agree on the names used for
coordinates. that's all I need. means I need to change my server,
ucla, andreas must change registry.
ad: changes to that wiki page, adding new assemblies
ls: can be done. this gsid page was just a starter.
ap: could parse html page to get a list of uris.
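
Illustration (not from the call): a rough sketch of what "parse html
page to get a list of uris" could look like using only the Python
standard library. It assumes the identifiers appear as href targets of
anchor tags, which may not be how the wiki page is laid out; the sample
markup and URIs are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_uris(html_text, prefix="http://"):
    """Return unique hrefs starting with prefix, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    seen = set()
    result = []
    for link in collector.links:
        if link.startswith(prefix) and link not in seen:
            seen.add(link)
            result.append(link)
    return result


# Hypothetical page fragment; the URIs are placeholders.
SAMPLE_PAGE = """
<ul>
  <li><a href="http://example.org/gsid/human-36/">Human build 36</a></li>
  <li><a href="http://example.org/gsid/mouse-36/">Mouse build 36</a></li>
  <li><a href="/wiki/Main_Page">Main page</a></li>
  <li><a href="http://example.org/gsid/human-36/">Human (repeat)</a></li>
</ul>
"""

print(extract_uris(SAMPLE_PAGE))
```

Scraping a wiki page is fragile; a plain-text or XML list of the same
identifiers would be easier for a registry to consume.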

bo: what needs to be added to biopackages.
gh: to have registry know that your ref seq is same as everyone else's,
need uri for a given assembly.
bo: in v source or v document. there is a coord element that has uri
pointing to the assembly uri.
gh: segments response, each seg type has a ref attribute to the
appropriate uri using these gsids.
bo: this is already in there.
gh: you're good to go, but the affy server needs updating.
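
Illustration (not from the call): a small sketch of how a client could
use the per-segment reference URIs to decide that segments on two
servers are the same sequence. The namespace, the attribute name
"reference" (called "ref" in the discussion) and all URIs below are
assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = "{http://biodas.org/documents/das2}"  # assumed namespace

# Hypothetical segments responses from two servers.
SERVER_A = """<SEGMENTS xmlns="http://biodas.org/documents/das2">
  <SEGMENT uri="segment/chr1" title="chr1" length="1000"
           reference="http://example.org/gsid/human-36/chr1"/>
  <SEGMENT uri="segment/chr2" title="chr2" length="2000"
           reference="http://example.org/gsid/human-36/chr2"/>
</SEGMENTS>"""

SERVER_B = """<SEGMENTS xmlns="http://biodas.org/documents/das2">
  <SEGMENT uri="seq/1" title="1" length="1000"
           reference="http://example.org/gsid/human-36/chr1"/>
  <SEGMENT uri="seq/X" title="X" length="3000"/>
</SEGMENTS>"""


def reference_map(segments_xml):
    """Map each global reference URI to the server-local segment URI."""
    root = ET.fromstring(segments_xml)
    mapping = {}
    for seg in root.iter(NS + "SEGMENT"):
        ref = seg.get("reference")
        if ref is not None:  # segments without a reference are skipped
            mapping[ref] = seg.get("uri")
    return mapping


def shared_segments(xml_a, xml_b):
    """Return {reference: (local uri on A, local uri on B)}."""
    a, b = reference_map(xml_a), reference_map(xml_b)
    return {ref: (a[ref], b[ref]) for ref in sorted(set(a) & set(b))}


print(shared_segments(SERVER_A, SERVER_B))
```

Local names differ between the two servers ("segment/chr1" vs "seq/1"),
so only the shared reference string lets a client overlay their data.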

[A] gregg/steve update affy server to use the canonical list of global seq
identifiers.

Topic: other changes
---------------------
gh: cigar strings, added ref to document to quote. need to put in
examples of it (alignments).
those are the major things that changed.
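
Illustration (not from the call): until alignment examples are in the
spec, here is a generic sketch of parsing a cigar-style string. It
assumes whitespace-separated tokens of an operation letter followed by a
length (e.g. "M8 D3 M6"); the notation DAS/2 actually uses is defined by
the document the spec refers to and may differ.

```python
import re
from collections import Counter

# One token: a single operation letter followed by a positive length.
_TOKEN = re.compile(r"([A-Za-z])(\d+)")


def parse_cigar(text):
    """Split a cigar-style string into (operation, length) pairs."""
    ops = []
    for token in text.split():
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"bad cigar token: {token!r}")
        ops.append((match.group(1).upper(), int(match.group(2))))
    return ops


def totals(ops):
    """Sum the lengths per operation letter."""
    counts = Counter()
    for op, length in ops:
        counts[op] += length
    return dict(counts)


ops = parse_cigar("M8 D3 M6")
print(ops)
print(totals(ops))
```

The sketch deliberately does not interpret the letters (match, insert,
delete), since which side an insert or delete refers to depends on the
convention chosen.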

todo: coord, segment, ref ontology uris.
revising examples in the spec.
architectural re-org of the doc.

ee: some places where the lang could be clarified. not changing
meaning.
ad: ok.

sc: doc re-org stuff. described.

[A] steve will post message to list when re-org is done

gh: "more examples" needed sections. I'll focus on these.
target freezing html doc by end of week.

aloraine: updating website with all working servers?
gh: better to point to registry to say go there.
people will then know to contact andreas to get their server there.
al: interested in plant das servers.
ap: I don't know about these. wrote an email to them to put their
servers there. 
al: does EBI have any? there was a das site associated with ensembl
(for plants). the Iowa state das server needs fixing (xml is
malformed). 

[A] Ann will send info about Iowa state das server to Andreas

Topic: Implementation status
----------------------------
gh: validator has been helping, is it on latest rnc?
ad: not yet, but it should be easy, just a cvs update.

[A] andrew will update to latest spec

gh: impl das2 servers: need changes to affy server to bring into more
compliance. all responses pass validator now, but is breaking what
it needs re: errors. will coordinate with steve when ready to deploy
today or tomorrow.
sc: error codes.
gh: certain things it can't respond to, but if I throw the right
error, it's considered valid.

ucla: 
[A] run responses from biopackages server through andrew's validator when
it's updated

gh: segment syntax is a full uri
bo: biopackages. full uri should be usable, for feat and type filter.
when I updated server to fix v/source, I turned off caching. I can't
clear out cache completely. there may be old response documents that
need to be cleared out. Will leave it off for time being and figure
out how to clear out the cache.
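
Illustration (not from the call): a sketch of building a feature query
in which the segment and type filters are full URIs, as discussed above.
The parameter names (segment, overlaps, type), the ';' separator and the
example URLs are assumptions for illustration; das2_get.html defines the
actual query syntax.

```python
from urllib.parse import quote, urlsplit


def build_feature_query(features_url, segment_uri, type_uri=None,
                        overlaps=None):
    """Build a query URL whose filter values are percent-encoded.

    segment_uri and type_uri must be absolute URIs (have a scheme);
    relative names such as 'segment/chr1' are rejected.
    """
    for uri in (segment_uri, type_uri):
        if uri is not None and not urlsplit(uri).scheme:
            raise ValueError(f"not a full URI: {uri!r}")

    params = [("segment", segment_uri)]
    if overlaps is not None:
        start, end = overlaps
        params.append(("overlaps", f"{start}:{end}"))
    if type_uri is not None:
        params.append(("type", type_uri))

    # safe='' so that ':' and '/' inside filter values are encoded too.
    query = ";".join(f"{key}={quote(value, safe='')}"
                     for key, value in params)
    return f"{features_url}?{query}"


url = build_feature_query(
    "http://example.org/das2/human/features",           # hypothetical
    "http://example.org/das2/human/segment/chr1",       # hypothetical
    type_uri="http://example.org/das2/human/type/mRNA",  # hypothetical
    overlaps=(0, 1000),
)
print(url)
```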

ad: I have updated schema on the validator.

bo: another issue when reading thru rnc/html under ver source.

gh: capability type element has to match. would cause igb to fail.
bo: will update this.
gh: getting servers to pass validator is more important than freezing
html spec now. time pressure for Brian doing caBio development. has
no servers to hit against, so they need to be back in action ASAP.

gh: andrew's das1/2 proxy?
ad: das1 to das2 proxy. does it on demand. not publicly accessible
now. feature conversion was too slow. need to re-write to no longer
use feature template.
gh: would want to consider putting it on a fast machine. would be a
nice thing to have to support all old das1 servers.

[A] make das1-2 proxy public

ee: not much client use of das2 now, so load shouldn't be bad.

andreas status of sanger registry.
ap: not much work for das2 since code sprint. now that spec is frozen,
planning to use andrew's validator when rewriting server. also
interested in the das1->2 proxy.
gh: could have registry make use of the proxy
gh: checked igb using sanger registry, was using recently. not sure if
it's passing the validator.

gh: lincoln plan for serving hapmap data?
ls: import the essential part of data into a hapmap server that brian
gilman is writing. then exported data will be brought into a caCore
client for re-exportation into the caBig grid.
gh: spec freeze helps timeline?
ls: brian gilman says he will have das2 client out by later
today. he should have joined this teleconf.
gh: I worked on xsd schema and talked to him via phone. it is now up
to date with frozen rnc spec. can use it to generate java or other
programmatic bindings.

gh: status of biopackages ontology server, but it is up and
running. it serves uri's so it is sufficient for das/2 needs now.

genome retrieval clients:
[A] gregg needs to see why igb is having problems with biopackages server.

am updating local server for local igb testing, will coord with steve
to post on public server.

ee: don't break it, am doing a presentation to cytoscape folks
gh: will get a new server going and keep the old one going

gh: will coord with steve

Wrapup
-------

gh: lots of good progress this week.
igb release planned for next mon, when ed will be back.

[A] meet next monday to freeze html doc.








