[DAS2] Notes from the biweekly DAS/2 teleconference, 8 Jan 2007

Mon Jan 8 21:32:00 UTC 2007

$Id: das2-teleconf-2007-01-08.txt,v 1.1 2007/01/08 20:05:30 sac Exp $

Teleconference Info:
   * Schedule:         Biweekly on Monday
   * Time of Day:      9:30 AM PST, 17:30 GMT
   * Dialin (US):      800-531-3250
   * Dialin (Intl):    303-928-2693
   * Toll-free UK:     08 00 40 49 467
   * Toll-free France: 08 00 907 839
   * Conference ID:    2879055
   * Passcode:         1365

Attendees:
  Affy: Steve Chervitz, Gregg Helt
  CSHL: Lincoln Stein
  Dalke Scientific: Andrew Dalke

Note taker: Steve Chervitz

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 

Agenda
-------
 * Status reports
 * URIs for global seq IDs

gh: On the das2 get html spec, I took out all visible xxx comments. Just
committed today. not all are fixed, but I adjusted language as
needed. The remaining xxx comments are still present as hidden html
comments. we can keep editing, looking for hidden comments, adding
info as we can.

I also refined some text, discussing coordinate elements, where to
find uri for those. put in pointer to global seq ids page on biowiki,
said it will soon be incorporated in sanger registry.

Todo: alignments, and address global seq id fuzziness.
should be able to clean up the seq id stuff based on today's
teleconf. so really just alignment examples.

[A] Gregg will add alignment examples to das2 get spec html version

[A] Announce das2 get spec (html and xml) when alignment examples are
complete

[A] discuss with andreas how to incorporate global seq ids into registry

gh: now we have global seq ids in biowiki, added by lincoln, but editable
by anyone (handy). I want to integrate them into the sanger das2 registry,
and generate from that (or vice versa) a das2 server that serves those
things up. makes sense to have those uris served up as versioned
sources in a das2 document. right now, coord uris are only available in the
biowiki page.

ad: only need to be abstract strings if there's no need to resolve them.
gh: want these to be uris for a das2 versioned source.
ad: get ncbi to do it?
gh: not necessary. just make them accessible through the das2
registry. doesn't matter where it is.
ad: how use?
gh: someone can then go to das2 registry and see all the ones that
have a global coordinate system.
ad: can be done now.
gh: want to avoid screen scraping the html page.
ls: need to have a set of url's for parsable documents with coord systems?

gh: we have xml for versioned sources in segments xml
document. there's direct mapping from v-source id to coord uri, and a
direct mapping from segment reference uri to segment id. we need a
v-source document where each has a uri for global coord system. there
would be just one capability -- segments, retrievable in the segments
xml.
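A minimal sketch of the v-source to coord uri mapping gh describes,
assuming a sources document along the lines of the das2 get spec. The
element and attribute names (SOURCES, SOURCE, VERSION, COORDINATES,
CAPABILITY), the namespace, and all example.org uris below are
assumptions for illustration and should be checked against the spec.

```python
import xml.etree.ElementTree as ET

# Assumed das2 namespace; verify against the current spec.
NS = {"das": "http://biodas.org/documents/das2"}

# Hypothetical sources document; every uri here is made up.
SAMPLE = """<SOURCES xmlns="http://biodas.org/documents/das2">
  <SOURCE uri="http://example.org/das2/human" title="human">
    <VERSION uri="http://example.org/das2/human/36" title="build 36">
      <COORDINATES uri="http://example.org/coords/human/36"
                   taxid="9606" authority="NCBI" version="36"
                   source="Chromosome"/>
      <CAPABILITY type="segments"
                  query_uri="http://example.org/das2/human/36/segments"/>
    </VERSION>
  </SOURCE>
</SOURCES>"""


def coords_by_version(xml_text):
    """Map each versioned source uri to its coordinates uri."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for version in root.iterfind(".//das:VERSION", NS):
        coords = version.find("das:COORDINATES", NS)
        if coords is not None:
            mapping[version.get("uri")] = coords.get("uri")
    return mapping


print(coords_by_version(SAMPLE))
```

A client reading such a document would not need to screen scrape the
biowiki page to learn which coordinate system a v-source uses.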

ls: right. who's maintaining? there are 1000s of genomes.
ad: coordinate element has other slots, attributes, what else is
needed?
gh: there is no central way to look at all coords that are available?
ad: why needed? can't you get it from the sources document without
looking something else up?

ls: registry could compile a unique list of that and produce a report
which has a list of all coordinates followed by all data sources that
use that coord system. would be useful.
would show who is using various systems. Spot bugs like two servers
using diff coord systems for the same taxon. Server could show coord
systems on a per-taxon basis, no additional query needed.
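The report ls describes could be sketched roughly as follows. This is
an illustration only: the record fields (server, taxid, coord_uri), the
server names, and the uris are hypothetical, and it is not how the
sanger registry actually stores its entries.

```python
from collections import defaultdict

# Hypothetical registry entries; the field names are assumptions.
SAMPLE_SOURCES = [
    {"server": "serverA", "taxid": "9606",
     "coord_uri": "http://example.org/coords/human/36"},
    {"server": "serverB", "taxid": "9606",
     "coord_uri": "http://example.org/coords/human/36"},
    {"server": "serverC", "taxid": "9606",
     "coord_uri": "http://example.org/coordinates/hs36"},
    {"server": "serverD", "taxid": "10090",
     "coord_uri": "http://example.org/coords/mouse/36"},
]


def coordinate_report(sources):
    """Group servers by coord uri and list taxa with more than one coord uri."""
    servers_by_coord = defaultdict(list)
    coords_by_taxon = defaultdict(set)
    for src in sources:
        servers_by_coord[src["coord_uri"]].append(src["server"])
        coords_by_taxon[src["taxid"]].add(src["coord_uri"])
    # Taxa seen with several coord uris are candidates for review.
    to_review = {taxid: sorted(uris)
                 for taxid, uris in coords_by_taxon.items()
                 if len(uris) > 1}
    return dict(servers_by_coord), to_review


servers_by_coord, to_review = coordinate_report(SAMPLE_SOURCES)
for uri, servers in servers_by_coord.items():
    print(uri, servers)
print("review:", to_review)
```

Different genome builds for one taxon legitimately have different coord
uris, so a real check would probably also compare authority and version
before calling something a bug.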

gh: another example: part of the problem for me is relationship between
coord uris and segment reference uris.
ls: why? no consistency enforced in spec.

gh: no guarantee that if you use the same coord uri that you'll use
the same reference uri. another problem: looking at biowiki doesn't
tell you how to construct a coord element. coord elem has attribs for
taxonomy, authority, etc.

ls: the coordinates url should point to the biowiki page with correct
anchor, need a line for each coord system. I didn't give url to coord
system. 
gh: attribs aren't on the wiki page.
gh: people need to know that's what they should use in the das2
documents. we should show on page: "name for das2 full coordinates
element is 'such and such'". people can then cut and paste.
ls: ok.

[A] lincoln will post additional attributes on global seq id biowiki page

gh: then screen scraping by registry to grab all coordinates to make
them available. 
ad: why do screen scraping?
gh: ...
ls: when people add info in the registry, then that goes in.
gh: if using wrong authority?
ls: it goes in wrong.
gh: would like the server to catch it.
ad: when registering a server, it presents a drop down field.
gh: where does it get that list?
ad: best to ask andreas.
gh: should be within biodas.org
sc: wikification progress of biodas.org - can migrate global seq ids
page there when it's ready.
gh: might be best to let Andreas decide, to manage it on his server.
sc: he's been active in the initial wikification work, so maybe he'll
be ok maintaining/migrating it there.

ls: some confusion over where to add this info (biodas.org, wiki of
that, or open-bio.org wiki). only want to do this once.

[A] steve/andreas will complete biodas.org wiki migration and notify all

gh: status continued - alignment examples for the spec are still to do. igb
release in december pays attention to coord uris, so it should be able to
match up v-sources from the biopackages and affymetrix das2 servers on the
same genome and overlay them rather than making a new genome. in the next
month: working on getting transcriptome data in a das2 server. lincoln
had mentioned NCI interest in this. most of code is in place to serve
up affy transcriptome data as graphs, a datum every 20 bp, e.g.,
efficient slicing from whole chromosomes, to get what you need for das
range query, and bring them into igb as slices. working on serving up
in alternative formats (e.g., UCSC's wiggle) rather than just affy
binary format. Return options: a graph the size of a whole chromosome
(as now), or data per region with each score in a das score element,
which would be a very large document.
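A minimal sketch of the range slicing idea, assuming the graph is held
as a list with one score per fixed step (20 bp in the example gh gives)
and that the range is half-open. The function name and the in-memory
layout are hypothetical; the actual affy server code and binary format
are not shown in these notes.

```python
def slice_scores(scores, step, start, end):
    """Return (position, score) pairs with start <= position < end.

    scores[i] is taken to be the score at position i * step.
    """
    if step <= 0 or start < 0 or end < start:
        raise ValueError("bad step or range")
    first = -(-start // step)  # ceiling of start / step
    last = -(-end // step)     # ceiling of end / step, exclusive
    last = min(last, len(scores))
    return [(i * step, scores[i]) for i in range(first, last)]


# Ten data points at positions 0, 20, ..., 180.
chromosome = list(range(10))
print(slice_scores(chromosome, 20, 30, 100))
```

Only the indexes that fall inside the requested range are touched, so
the cost depends on the size of the region rather than the chromosome.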

ls: be prepared to return in a das2 xml document, each score in an
element, not sensible over whole chromosome, but ok for a limited
region. A good form of compatibility.
gh: size issues - could give a 'request too large' error.
ls: could use http compression. NCI will likely never support the
specialized format, so if you don't give das2 xml format, it will not
be available to that client.
ls: brian gilman - can't pay him to do more work on that contract.
gh: might be able to pay him via affy. need to get this going within
next month, couple of publications need it.
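On the http compression point ls raised above: a rough sketch of how
much a one-element-per-score document might shrink under gzip. The
SCORES and SCORE element names and their attributes are placeholders
made up for this example, not the das2 score element from the spec.

```python
import gzip


def build_scores_xml(scores, step=20):
    """Build a placeholder xml document with one element per score."""
    rows = "".join(
        '<SCORE pos="%d" value="%.2f"/>' % (i * step, s)
        for i, s in enumerate(scores)
    )
    return ("<SCORES>" + rows + "</SCORES>").encode("utf-8")


# 10000 synthetic scores, i.e. a 200 kb region at one datum per 20 bp.
raw = build_scores_xml([(i % 7) / 7.0 for i in range(10000)])
packed = gzip.compress(raw)
print(len(raw), len(packed))
```

Real score data is less regular than this synthetic series, so the
ratio seen here should not be read as a prediction for affy data.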

ls: status: - took xml parser for perl das2 client, cleaned it up, put
it on perl cpan website, underlies parsing and processing of das2
streams. pure perl, no requirements for c libraries, not validating
sax client (so it's faster than the c libraries). it does namespace
handling, multi-threaded. offered as a standard reference. missing
handling of features (types, sources, segments) -- big hole. NCI java
client library went thru its approval process, still doing various
tests and qualifications before folding into main NCI source code
repository. Hapmap data source still in progress.  Just xml parser on
cpan, full server not complete.

gh: using biopackages server?
ls: yes, an instance at CSHL (or will).

ad: holidays and was sick for last month. working on proxy code,
rewriting, fixing. 
sc: regarding your das2 commitment?
ad: making up for sick time last month. plan is to get proxy stuff
done, then that's it for das.

gh: (more status) also working on getting the das2 feature query
support fleshed out. handle any combination of filters. coming soon,
moving to new affy server hardware. on steve's plate.

sc: worked on biodas.org wikification. some server configuration
issues. have made a good start with Andreas' help, but more to
do. should be in place later this month. some html get spec edits,
fixes. planning to help gregg set up new hardware for affy das
server. can then support more arrays, genome versions, organisms.

gh: hardware - hoping it would be here. approved by purchasing on
1/5/07. PO likely went out to vendor, so should be in within a week.
we have requests in to support more versions, probe set location
for exons on older genome versions.
sc: transcript annotations?
gh: background - affy chp data has no genome location, just probe set
id and score. IGB takes that data and merges with genome info to build
heat maps to look at data. Been tricky to determine most efficient way to
do that. Need to have both probe set level and transcript level
data. in progress.

[A] steve will talk with UCSC about a meeting focused on das in feb/march

[A] Next das2 teleconf: 22 Jan 2007