From Gregg_Helt at affymetrix.com  Mon Mar  5 11:40:26 2007
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 5 Mar 2007 08:40:26 -0800
Subject: [DAS2] DAS/2 Teleconference today at 9:30 AM PST
Message-ID: <C71929195D04BF48BAECD499AF717B480198CD51@msex02.affymetrix.com>

Just a reminder that the DAS/2 teleconference will be at the regular
time today, 9:30 AM Pacific time.  Ed and I will be summarizing the DAS
developer and BioSapiens feature classification workshops we attended
last week in Hinxton.  Hopefully others who attended will join in and
give their perspectives as well.

Conference phone # 
    USA: 800-531-8250
    International: 303-928-2693
Conference ID: 2879055
Passcode: 1365


	Gregg


From Gregg_Helt at affymetrix.com  Mon Mar  5 12:30:10 2007
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 5 Mar 2007 09:30:10 -0800
Subject: [DAS2] Brief summary of DAS/BioSapiens workshops from a DAS/2
	perspective
Message-ID: <C71929195D04BF48BAECD499AF717B480198CD52@msex02.affymetrix.com>

Summary of DAS & Feature Classification workshops, February 26-28 2007,
Hinxton
 
DAS Developers Workshop:
http://www.sanger.ac.uk/Users/ap3/dasworkshop.html
 
BioSapiens Feature Type Classification Workshop:
http://www.ebi.ac.uk/~hhe/tmp/BioSapiensFeatureMeeting.htm


DAS1 clients discussed:
          Dasty2, JalView, VectorBase, IGB, Pepper, Spice, ProView,
Ensembl ContigView, ...
DAS1 servers discussed:
          PFam, Ensembl, ProServer, Sisyphus, ...
 
DAS1 extensions:
          Gene DAS
          Protein DAS
          Alignment DAS
          Structure DAS
          3D-EM DAS
          Interaction DAS
          MaDAS (writeback?)
"simple" DAS


DAS/2

BioSapiens Overview:  http://www.biosapiens.info/
  Large-scale genome/protein annotation, 25 institutions from 14
countries across Europe participating
  Currently 23 DAS servers within BioSapiens project serving 69 DAS
sources.
  4 servers appear to be down (21 sources fail features query)
  See http://www.biosapiens.info/page.php?page=biosapiensdir for more
DAS server stats


Major concerns for Ensembl / Sanger / BioSapiens I think we've addressed
well in DAS/2
          Gene DAS
          Protein DAS
          Alignment DAS          
"simple" DAS 

Major concerns for Ensembl / Sanger / BioSapiens that surprised me:

    A) In general the use of a smaller subset of DAS1 than expected
        Many BioSapiens DAS servers don't support "entry_points" query
(64 fail|NA)
        Many BioSapiens DAS servers don't support "types query" (49
fail|NA)
               in DAS1 features themselves can carry most of the types
info
        Some BioSapiens DAS servers don't support "features" query
parameters (only the features query with no params)
        Many BioSapiens clients don't use "entry_points" query, "types"
query, or any feature filters (always get all features for a given
segment)
        BioSapiens protein annotation almost exclusively uses flat
(one-level) features
               very little or no use of "group" attribute to make
two-level features
               example: disulfide bond annotation - relies on rendering
or prior knowledge to differentiate
        Ensembl DAS servers are in general serving one type per source
        These simplifications of clients and servers are reinforcing
each other
        If using subset of DAS1, does this mean that DAS/2 might be too
complex?
        But with these simplifications, the complexity is getting pushed
into other places
    
  B) Data overload
        Number of servers, sources, types
             Ensembl: will have 1000s of sources soon
        Redundancy concerns
             example: Pfam domain 
   Many sources with same / similar annotation type - "Pfam domain"
          Slight differences in feature ranges
          Which is the authority?
          Is there a way to help clients decide which can be combined
        Mirrors
  
  C) Feature Classification / Ontology issues
        SO currently inadequate for describing protein annotation
               developing PAO (Protein Annotation Ontology)
        types proliferation
            example: one feature type for each PFam domain?
                ~9K PFam-A domains
                If look at PFam-B (PRODOM that don't overlap PFam-A),
then ~70K / 450K more (>2 proteins in family / not)
            if not in unique type, where does that information go?
       Need multiple ontology terms to describe a single type?
 
------------------------------------------------------------------------
 
DAS WishList (last session of DAS workshop, people listed desired
improvements on whiteboard)

Multi-level features (Gregg)
Multi-level stylesheets (Ed)
Caching (last-modified, if-modified-since, TTL)
Provenance of features from other sources (features from different
sources with same IDs? types?)
Large analysis / Scalability
       1000s of seqs + 1000s sources + types ?
More queries: feature types / date
Entry point support
Encryption support
Stats-query interface -- count # of features of type for a source
ID ref external (URI / URN)
Proper error / exception handling
Asynchronous requests
       process
       batches
Better Stylesheets
Mapping servers

We've discussed most of these wishlist issues before while developing
DAS/2, though we certainly haven't completely solved all of them...
 
 
From Steve_Chervitz at affymetrix.com  Mon Mar  5 14:03:03 2007
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 05 Mar 2007 11:03:03 -0800
Subject: [DAS2] Notes from the biweekly DAS/2 teleconference, 5 Mar 2007
Message-ID: <C211A967.25541%Steve_Chervitz@affymetrix.com>

Notes from the biweekly DAS/2 teleconference, 5 Mar 2007

$Id: das2-teleconf-2007-03-05.txt,v 1.2 2007/03/05 19:01:59 sac Exp $

Teleconference Info:
   * Schedule:         Biweekly on Monday
   * Time of Day:      9:30 AM PST, 17:30 GMT
   * Dialin (US):      800-531-3250
   * Dialin (Intl):    303-928-2693
   * Toll-free UK:     08 00 40 49 467
   * Toll-free France: 08 00 907 839
   * Conference ID:    2879055
   * Passcode:         1365

Attendees:
    Affy: Steve Chervitz, Ed Erwin, Gregg Helt
    CSHL: Lincoln Stein
  Sanger: Andreas Prlic
    UCLA: Allen Day

Note taker: Steve Chervitz

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


Agenda
-------
* Review of BioSapiens DAS workshop
* Status updates


gh: I sent my summary of the biosapiens das workshop and feature
classification workshop I attended with Ed in Hinxton:
http://lists.open-bio.org/pipermail/das2/2007-March/000982.html

"das developers workshop from a das/2 perspective", summarizes what I
took home from these meetings, how well das/2 meets needs of people in
europe (ensembl, sanger, biosapiens -- the focus of these
meetings). and a quick biosapiens overview: a big european project ,
25 institutions, large scale genome protein annotation. decided early
on to use das to distribute annotations between organizations. can
check the stats on their das servers -- andreas' registry -- 23
servers serving up 69 das sources -- a major das investment!

In developing das/2 we haven't had too much experience with the kind
of data they're dealing with (protein annotations).

das/1 clients under study:
 - dasty2, dasty1 - ajax-based viz clients
 - jalview - alignment viewer, editor
 - igb - Ed gave presentation
 - pepper and spice - das viewers, also use alignment and 3d structure
   info
 - proview - protein annotation,
 - ensembl viewer

servers presented/discussed:
 - pfam, ensembl, proserver, Andreas',
 - Extensions to das/1 protocol discussed: gene das, protein das,
   structure das, 3d-em das (arbitrary 3d volumes), interaction das for
   prot-prot interactions. Moddas - writeback in das/1. Alignment das
   (Andreas). 
 - Simple das - das servers that don't impl all of das/1 (entry_points,
   or types, e.g.,).

Gregg presented on das/2, will put up ppt later. Tailored it assuming

[A] Gregg will send out powerpoint for his talk from BioSapiens DAS workshop

Focussed on familiarity with das/1, how big the diffs are with an eye
towards how hard it would be to move to das/2. Conceptually, not that
big a switch, though XML is a lot different.

Also discussed how well das/2 addresses some of the problems with
das/1 that came up at the workshop.

extensions for das/1:
- das/2 addressed some of them very well. E.g., gene das (das w/o
  specifying location of feature). this is addressed well in
  das/2. can have features w/o location, or w/o range.
- protein das - das/2 did a good job of removing nucleotide specific
  parts of das features (orientation, phase are not required). das/2
  is much more agnostic about dna vs protein.
- alignment das - pairwise or multiple - locations with features in
  das/2 addresses some of these issues (0,1,or more locations for a
  feature) each location can have optional gap attribute (cigar
  string). so if you can describe it with a cigar string, you can
  describe it in das/2. Can use multiple locations to do mult
  alignments. Not dealt with in das/2: 3d-threading of an alignment
  through a structure.  Need to look at this in the future

[A] Look at how to handle 3D structure alignment threading in DAS/2 spec

- simple das stuff handled better in das/2 - in das/1 the assumption
  is you support all things unless. but in das/2 there is a
  capabilities header, you must indicate support there, if not stated,
  the default is you don't support it. Can also say you support
  feature filters, so there's more formal support for that.
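
To illustrate the gap attribute mentioned in the alignment item above,
here is a minimal Python sketch that expands a cigar string into
aligned and gap blocks on the reference. The token syntax (an operation
letter followed by a length, e.g. "M10 D3 M5") and the function name
are assumptions made for illustration; these notes do not give the
DAS/2 syntax, so check the spec before relying on it.

```python
import re

# Assumed token syntax: operation letter then length, e.g. "M10 D3 M5".
# M = match, I = insertion, D = deletion. Illustrative only, not the spec.
_TOKEN = re.compile(r"([MID])(\d+)")

def cigar_blocks(cigar, start=0):
    """Return (op, ref_start, ref_end) tuples, half-open, on the reference."""
    blocks = []
    pos = start
    for op, length in _TOKEN.findall(cigar):
        n = int(length)
        if op == "I":
            # An insertion consumes no reference positions.
            blocks.append((op, pos, pos))
        else:
            blocks.append((op, pos, pos + n))
            pos += n
    return blocks
```

Each location of a feature could carry its own string like this, which
is how several locations would together describe a multiple alignment.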

Surprises:
- smaller subset of das/1 is in use than expected. of 69 sources, 64
  either fail entry points or say not applicable. types query: 49
  fail/not applicable

ls: for types query. only one type?
gh: for ensembl, this is the case.
ap: lack of consistency of types is addressed in the other workshop
related to features.

gh: in types in das/1 it is less necessary because all info is
replicated in each feature, type-method, category, id
ls: use case for types query is to present user with set of
checkboxes, select which type to retrieve from source. if in practice
das sources are being used for one type or a set of types that only
make sense together, no reason to turn off a part of it, then makes
sense to not support types query.
ls: have heard that types query is expensive. computationally. simple
db backends with no normalization/indexing, finding all types involves
visiting each record.
gh: part of justification with 1 type / source is because those types
are stored in separate db. so having a das server to integrate them
make sense.

gh: Re: using smaller subset of das/1 than I expected:
types can be expensive in another way, example: representing pfam in
das. feat type for each pfam domain type (9000 primary domains).
Pfam b - there are 70-400K more!

ls: in das/2 create a single type 'protein domain' then use attribute
pointing to an ontology saying which pfam domain it is.
gh: concern there is, assuming clients will do something useful for
particular attributes. For rendering, I could do diff rendering based
on diff attribs (color diff domains differently). but for clients to
really understand that they're different, that's a more complicated
issue.
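
A rough Python sketch of the "single type plus attribute" idea
discussed above. The class, field names, and the "pfam" property key
are hypothetical and not taken from the DAS/2 schema; the point is only
that a client has to look at the property, not the type, to tell
domains apart.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Feature:
    uri: str
    type_uri: str          # one shared type, e.g. "protein_domain"
    start: int
    end: int
    properties: dict = field(default_factory=dict)

def group_by_property(features, key):
    """Group features that share one type by the value of a property."""
    groups = defaultdict(list)
    for f in features:
        # Features lacking the property end up under the key None.
        groups[f.properties.get(key)].append(f)
    return dict(groups)
```

A client that only understands types would see one undifferentiated
"protein_domain" track here; coloring or filtering per domain depends
on it knowing which property to group on, which is the concern gh
raises.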

gh: not using types or entry_points by clients because servers don't,
feedback loop.
ap: low coverage genomes (e.g., elephant) may have several 100K entry
points. 
gh: in das/2 we are more formal and say that you don't support
it. Creates problem: how do you know what to query in the first place?
Then you have to know what you're looking for.

gh: feature hierarchies handled in das/2 -- this is not an issue for
protein das, where annotations are completely flat. even protein
disulfide bond is one level, just rendered differently so it doesn't
span all residues in between. But doing non-visual things (unions,
intersections) this could be a problem.
ls: flat in terms of location or ontology?
gh: location. there is no feature ontology yet (no consistent, agreed
upon yet, just proposed at this meeting).
ls: they aren't creating discontinuous features because too hard, or
don't care.
gh: just not needed for most protein annotations. even when it could
be needed, just not being used.
ls: for nucleotide, it's needed frequently
gh: not an issue for das/2
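
A small Python sketch of why the flat representation matters for
non-visual operations such as intersections. The coordinates and names
are made up for illustration (0-based, half-open ranges); no client or
server discussed here is known to use this exact code.

```python
def overlaps(a, b):
    """True if two half-open (start, end) ranges share a position."""
    return a[0] < b[1] and b[0] < a[1]

def feature_overlaps(locations, other):
    """A feature is a list of ranges; it overlaps if any range does."""
    return any(overlaps(loc, other) for loc in locations)

# Hypothetical disulfide bond between residues 10 and 50.
flat_bond = [(10, 51)]                 # one-level: spans everything between
two_part_bond = [(10, 11), (50, 51)]   # one location per bonded cysteine
helix = (20, 30)                       # unrelated annotation in between

print(feature_overlaps(flat_bond, helix))      # True
print(feature_overlaps(two_part_bond, helix))  # False
```

With the flat span, the helix appears to intersect the bond even though
neither bonded residue lies inside it; a renderer can hide that, but a
set operation cannot.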

gh: ensembl collapses type and source into one thing. what does this
mean? das/2 could be over complicated.
ls: no doubt that it is too complicated for the biosapiens use
case. we could make it easy for them to use by providing tool kits to
read and write. could also argue that postscript is too complicated to
draw simple rectangles on the page. You wouldn't expect them to
simplify postscript. There are tools to ease simple rendering.
The complexity of das/2 won't interfere with adoption, but not having
toolkits, middleware layers to read/write will. Not getting ensembl buy-in
to das/2 could be a problem
gh: tim hubbard was there and was on-board to transition to
das/2. 
ls: would have been better to have buy in now (i.e., Tony Cox dropping
out)
gh: we've made it more formal to say, here is the subset of das/2 that
this server supports. for other use cases, we do need the added
complexity.

gh: re: ensembl support for das/2. I mentioned andrew's das/1 - das/2
transformational proxy server. not released yet, but making progress
on it. So if you have a das/1 server, you can put a das/2 front end on
it.
ls: can you go the other way, provide das/1 interface on das/2?
gh: want to do this for the affy public das/2 server. Andrew's doesn't
do that yet, but I'd like to do this. Another thing: integrate that
proxy into the registry, so the registry makes it into a das/2
server. then we don't have a burden on servers to support two versions
of the protocol. 
got email from andrew about his proxy on that.

sc: I put a note about Andrew's proxy server on the biodas.org wiki.
gh: he needs to have a place to keep it.
sc: open-bio server would work. Just need a better mechanism to
ensure it stays up. I think it's not getting started when the machine
gets rebooted.

[A] Steve/Andrew work on stable home for the proxy server

[Correction: In my note in the teleconf, I was thinking about Andrew's
validation server, which is hosted on open-bio and has a problem with
not being up reliably. The proxy server is another issue. There's a
mention of it on the DAS FAQ page, but not pointer to any server
yet. -steve] 

gh: data overload and redundancy from the user perspective. clients
where default for protein annotation is to go to all servers, you have
way too many tracks showing up. Lots of servers and types. Ensembl is
moving to expose even more data via das, thousands of new tracks
(organisms, type, assembly version). Concern with biosapiens is
replication of the same annotation data. E.g., pfam domains in
different biosapiens data sources, may return same thing or slight
diffs in feature ranges. how does user decide which is authoritative?
Which can be left out? A big concern at the biosapiens meeting --
redundant information.
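
One way a client might flag near-duplicate annotations from different
sources is by range overlap. This is a minimal Python sketch under
stated assumptions: the tuple layout, the 0.9 threshold, and the
"keep the first one seen" policy are all hypothetical, and it does not
answer the question of which source is authoritative.

```python
def jaccard(a, b):
    """Overlap / union length for two half-open (start, end) ranges."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def collapse_redundant(features, threshold=0.9):
    """features: list of (source, type, start, end) tuples.

    Keeps the first feature of each group whose type matches and whose
    range overlap with an already kept feature meets the threshold.
    """
    kept = []
    for f in features:
        is_dup = any(
            k[1] == f[1] and jaccard((k[2], k[3]), (f[2], f[3])) >= threshold
            for k in kept
        )
        if not is_dup:
            kept.append(f)
    return kept
```

Comparing only by type name assumes the sources agree on type naming,
which is exactly the feature classification problem discussed at the
workshop.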

gh: another issue: mirrors for the data. discussed in early days of
das/2, not resolved how to deal with mirrors, http redirection
mechanism. This can lead to redundant data when you hit all mirrors.

gh: feature classification and ontologies around that. My take was
that the sequence ontology is inadequate to describe protein
annotation as it stands now. PAO - protein annotation ontology
ls: are they doing this with NCBO involved?
gh: talked to them about getting hold of lincoln and suzi and
integrating with SO as an extension.
ap: for 3rd version of SO we will contact lincoln and suzi to discuss
ls: great
gh: for biosapiens, Janet Thornton is the person to contact about
that.

gh: more about types (proliferation causing data overload issue mentioned
above.)
also discussion about dag vs hierarchical tree. pointing to multiple
terms in the ontology for a particular type. in SO, how much has
multiple parents come up? may need a type that can point to multiple
ontology terms for that type. das/2 cannot do it yet, only one term
per type.
ls: the more flexible we make it the less coherent it will be. data
overload will get even worse. to reduce data overload, need a way to
take data from servers and deciding if same or different. are they
reachable in same ontology? allowing set arithmetic will create
ambiguity. biosapiens can be allowed with an attribute, multiple
attributes that point at different ontologies.

gh: combining cellular location with protein classification
ontologies. 
ls: certainly, but those are separate attributes. what we created is
essentially an RDF. Actually, terminology is 'property' not
attribute. Types property is the correct way to do this.

gh: use of subset of das/1, what it means for das/2
data overload for users,
feature classification issues

gh: das wish list, people wrote up what they feel what das is
inadequate for. Das/2 group was aware of these.

ls: encryption, asynchronous request seem like impl issues, not part of
protocol.
gh: some people complained that das is inadequate because it relies on
http(s). you can do much more high-level things with soap-based
system. I think this is correct, but wrong that no one in our space
needs that.
ls: no pharma that cares about this will entrust it to the public
internet with any thing, soap or otherwise.
gh: at affy, we've done das/1 servers with https and no one has ever
complained. 
ls: identity theft problems via people stealing from encrypted streams
never emerged as a problem. they steal it from your physical trash,
setting up phony banking sites. Not related to strength of encryption.
gh: regarding asynch request - discussed 2 years ago -- yes, it's
outside of das/2 spec, but we say, use http as you will. redirect and
say "your request has been accepted, check back here in a while."
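
The "accepted, check back later" pattern can be sketched in Python as
follows. It assumes the server answers 202 Accepted with an optional
Location header to poll and an optional Retry-After given in seconds;
none of this is specified by DAS/2, and the fetch callable is a
stand-in for whatever HTTP client is in use.

```python
import time

def poll_until_ready(fetch, url, max_tries=5, default_wait=1.0,
                     sleep=time.sleep):
    """Poll until a 200 arrives. fetch(url) -> (status, headers, body)."""
    for _ in range(max_tries):
        status, headers, body = fetch(url)
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected HTTP status {status}")
        # Follow the polling URL if the server supplied one.
        url = headers.get("Location", url)
        # Assumes Retry-After is a number of seconds, not an HTTP date.
        sleep(float(headers.get("Retry-After", default_wait)))
    raise TimeoutError("result not ready after polling")
```

Passing fetch and sleep in as parameters keeps the sketch independent
of any particular HTTP library and easy to exercise without a network.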

gh: wish list (sent out in email to the list noted above):
- multi-level features, stylesheets
- caching - use http caching as you will
- features from other sources - dealt with since we use URIs. a
  problem for das/1

ls: provenance requires people to put in effort to maintain the
provenance, but it doesn't free you of responsibility of having to
track it.

- scalability and large analysis - the data overload issue. the
answer to me is smarter clients.

- more queries -- addressed in das/2
- entry point supports - in das/2 we have a less ambiguous way to say
  whether a server supports it or not.
- counting number of features of each type per source -- have the
  'count' format in das/2
- referring to id's externally (das/2 uri's)
- errors and exception handling - we have http error codes -- remains
  to be seen how well it works out. done a reasonable job to map it to
  http error codes
- better stylesheets - in progress for das/2
- mapping servers - different genome assembly versions or mapping from
  protein to nucleotide space. -- under discussion with data
  providers.
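
For the caching item in the list above ("use http caching as you
will"), a client-side conditional GET might look like this Python
sketch. The cache dictionary layout and function names are
hypothetical; the only standard pieces are the Last-Modified and
If-Modified-Since headers and the 304 Not Modified status.

```python
import urllib.error
import urllib.request

def build_request(url, last_modified=None):
    """Attach If-Modified-Since when a previous Last-Modified is known."""
    req = urllib.request.Request(url)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    return req

def fetch_if_changed(url, cached, opener=urllib.request.urlopen):
    """cached: dict with optional 'last_modified' and 'body' entries."""
    req = build_request(url, cached.get("last_modified"))
    try:
        with opener(req) as resp:
            cached["body"] = resp.read()
            cached["last_modified"] = resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code != 304:
            raise
        # 304 Not Modified: the cached copy is still current.
    return cached["body"]
```

urllib reports a 304 as an HTTPError, which is why the unchanged case
is handled in the except branch.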

ap: Another thing on wish list: people want to know stats per server,
uptime, hits, etc. (server stats).
gh: andreas' registry does a good job for das/1. biosapiens registry
is built on Andreas' registry. How many are up, which requests they
support, the data the server. Very nice.

ap: Gregg's coverage was good. Also gave a very good advertisement for
das/2!

gh: the das/1 to das/2 transformational proxy was quite
popular. doesn't take advantage of das/2 power, but gets people started.

Other Topics:
--------------
sc: biodas.org wiki is now officially up.
gh: mentioned to Tim Hubbard. He said, "I know. I already edited it."

sc: globalseqids page needs das2xml snippets for coordinates.

[A] lincoln will add das2xml coordinate snippets to globalseqids page on
wiki

sc: might also be good to have notice of the next teleconf on the
site. Maybe pointers to the notes as well.
gh: maybe have an automatic email sent out reminding folks?
sc: maybe not, if we have a list of the dates for upcoming meetings on
the site. 

[A] Steve post list of dates of upcoming DAS/2 teleconferences on wiki

Next meeting in two weeks: 19 mar 2007


From Gregg_Helt at affymetrix.com  Wed Mar  7 16:21:48 2007
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 7 Mar 2007 13:21:48 -0800
Subject: [DAS2] Stable URIs coming from NCBI?
Message-ID: <C71929195D04BF48BAECD499AF717B480198CD58@msex02.affymetrix.com>

Some good news (or at least rumor of good news) from NCBI -- plans to
expose stable URIs for all their resources:
http://lists.w3.org/Archives/Public/public-semweb-lifesci/2007Feb/0123.html
 
Which would fit nicely with the URI-centric approach of DAS/2...
 
            Gregg


From lstein at cshl.edu  Mon Mar 12 13:02:51 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 12 Mar 2007 13:02:51 -0400
Subject: [DAS2] Notes from the biweekly DAS/2 teleconference, 5 Mar 2007
In-Reply-To: <C211A967.25541%Steve_Chervitz@affymetrix.com>
References: <AcdfWOZdJPCC8MtMEduuXAAKlXZSNg==>
	<C211A967.25541%Steve_Chervitz@affymetrix.com>
Message-ID: <6dce9a0b0703121002h4f866b10jb160044260ea812e@mail.gmail.com>

>
> lincoln will add das2xml coordinate snippets to globalseqids page on
> wiki
>

I added one line to the description of the H. sapiens source. Is this what
you're looking for? If it is, I'll go ahead and add the rest.

Note that the contents of the XML are not defined anywhere. I'm not sure why
there should be a URI that looks like it is fetchable.

Lincoln


On 3/5/07, Steve Chervitz <Steve_Chervitz at affymetrix.com> wrote:
>
> Notes from the biweekly DAS/2 teleconference, 5 Mar 2007
>
> $Id: das2-teleconf-2007-03-05.txt,v 1.2 2007/03/05 19:01:59 sac Exp $
>
> Teleconference Info:
>    * Schedule:         Biweekly on Monday
>    * Time of Day:      9:30 AM PST, 17:30 GMT
>    * Dialin (US):      800-531-3250
>    * Dialin (Intl):    303-928-2693
>    * Toll-free UK:     08 00 40 49 467
>    * Toll-free France: 08 00 907 839
>    * Conference ID:    2879055
>    * Passcode:         1365
>
> Attendees:
>     Affy: Steve Chervitz, Ed Erwin, Gregg Helt
>     CSHL: Lincoln Stein
>   Sanger: Andreas Prlic
>     UCLA: Allen Day
>
> Note taker: Steve Chervitz
>
> Action items are flagged with '[A]'.
>
> These notes are checked into the biodas.org CVS repository at
> das/das2/notes/. Instructions on how to access this
> repository are at http://biodas.org
>
> DISCLAIMER:
> The note taker aims for completeness and accuracy, but these goals are
> not always achievable, given the desire to get the notes out with a
> rapid turnaround. So don't consider these notes as complete minutes
> from the meeting, but rather abbreviated, summarized versions of what
> was discussed. There may be errors of commission and omission.
> Participants are welcome to post comments and/or corrections to these
> as they see fit.
>
>
> Agenda
> -------
> * Review of BioSapiens DAS workshop
> * Status updates
>
>
> gh: I sent my summary of the biosapiens das workshop and feature
> classification workshop I attended with Ed in Hinxton:
> http://lists.open-bio.org/pipermail/das2/2007-March/000982.html
>
> "das developers workshop from a das/2 perspective", summarizes what I
> took home from these meetings, how well das/2 meets needs of people in
> europe (ensembl, sanger, biosapiens -- the focus of these
> meetings). and a quick biosapiens overview: a big european project ,
> 25 institutions, large scale genome protein annotation. decided early
> on to use das to distribute annotations between organizations. can
> check the stats on their das servers -- andreas' registry -- 23
> servers serving up 69 das sources -- a major das investment!
>
> In developing das/2 we haven't had too much experience with the kind
> of data they're dealing with (protein annotations).
>
> das/1 clients under study:
> - dasty2, dasty1 - ajax-based viz clients
> - jalview - alignment viewer, editor
> - igb - Ed gave presentation
> - pepper and spice - das viewers, also use alignment and 3d structure
>    info
> - proview - protein annotation,
> - ensembl viewer
>
> servers presented/discussed:
> - pfam, ensembl, proserver, Andreas',
> - Extensions to das/1 protocol discussed: gene das, protein das,
>    structure das, 3d-em das (arbitrary 3d volumes), interaction das for
>    prot-prot interactions. Moddas - writeback in das/1. Alignment das
>    (Andreas).
> - Simple das - das servers that don't impl all of das/1 (entry_points,
>    or types, e.g.,).
>
> Gregg presented on das/2, will put up ppt later. Tailored it assuming
>
> [A] Gregg will send out powerpoint for his talk from BioSapiens DAS
> workshop
>
> Focussed on familiarity with das/1, how big the diffs are with an eye
> towards how hard it would be to move to das/2. Conceptually, not that
> big a switch, though XML is a lot different.
>
> Also discussed how well das/2 addresses some of the problems with
> das/1 that came up at the workshop.
>
> extensions for das/1:
> - das/2 addressed some of them very well. E.g., gene das (das w/o
>   specifying location of feature). this is addressed well in
>   das/2. can have features w/o location, or w/o range.
> - protein das - das/2 did a good job of removing nucleotide specific
>   parts of das features (orientation, phase are not required). das/2
>   is much more agnostic about dna vs protein.
> - alignment das - pairwise or multiple - locations with features in
>   das/2 addresses some of these issues (0,1,or more locations for a
>   feature) each location can have optional gap attribute (cigar
>   string). so if you can describe it with a cigar string, you can
>   describe it in das/2. Can use multiple locations to do mult
>   alignments. Not dealt with in das/2: 3d-threading of an alignment
> through
> a
>   structure.  Need to look at this in the future
>
> [A] Look at how to handle 3D structure alignment threading in DAS/2 spec
>
> - simple das stuff handled better in das/2 - in das/1 the assumption
>   is you support all things unless. but in das/2 there is a
>   capabilities header, you must indicate support there, if not stated,
>   the default is you don't support it. Can also say you support
>   feature filters, so there's more formal support for that.
>
> Surprises:
> - smaller subset of das/1 is in use than expected. of 69 sources, 64
>   either fail entry points or say not applicable. types query: 49
>   fail/not applicable
>
> ls: for types query. only one type?
> gh: for ensembl, this is the case.
> ap: lack of consistency of types is addressed in the other workshop
> related to features.
>
> gh: in types in das/1 it is less necessary because all info is
> replicated in each feature, type-method, category, id
> ls: use case for types query is to present user with set of
> checkboxes, select which type to retrieve from source. if in practice
> das sources are being use to for one type or a set of types that only
> make sense together, no reason to turn off a part of it, then makes
> sense to not support types query.
> ls: have heard that types query is expensive. computationally. simple
> db backends with no normalization/indexins, finding all types involves
> visiting each record.
> gh: part of justification with 1 type / source is because those types
> are stored in separate db. so having a das server to integrate them
> make sense.
>
> gh: Re: using smaller subset of das/1 than I expected:
> types can be expensive in another way, example: representing pfam in
> das. feat type for each pfam domain type (9000 primary domains).
> Pfam b - there are 70-400K more!
>
> ls: in das/2 create a single type 'protein domain' then use attribute
> pointing to an ontology saying which pfam domain it is.
> gh: concern there is, assuming clients will do something useful for
> particular attributes. For rendering, I could do diff rendering based
> on diff attribs (color diff domains differently). but for clients to
> really understand that they're different, that's a more complicated
> issue.
>
> gh: not using types or entry_points by clients because servers don't,
> feedback loop.
> ap: low coverage genomes (e.g., elephant) may have several 100K entry
> points.
> gh: in das/2 we are more formal and say that you don't support
> it. Creates problem: how do you know what to query in the first place?
> Then you have to know what you're looking for.
>
> gh: feature hierarchies handled in das/2 -- this is not an issue for
> protein das, where annotations are completely flat. even protein
> disulfide bond is one level, just rendered differently so it doesn't
> span all residues in between. But doing non-visual things (unions,
> intersections) this could be a problem.
> ls: flat in terms of location or ontology?
> gh: location. there is no feature ontology yet (no consistent, agreed
> upon yet, just proposed at this meeting).
> ls: they aren't creating discontinuous features because too hard, or
> don't care.
> gh: just not needed for most protein annotations. even when it could
> be needed, just not being used.
> ls: for nucleotide, it's needed frequently
> gh: not an issue for das/2
>
> gh: ensembl collapses type and source into one thing. what does this
> mean? das/2 could be over complicated.
> ls: no doubt that it is too complicated for the biosapiens use
> case. we could make it easy for them to use by providing tool kits to
> read and write. could also argue that postscript is too complicate to
> draw simple rectangles on the page. You wouldn't expect then to
> simplify postscript. There are tools to ease simple rendering.
> The complexity of das/2 won't interfere with adoption, but not having
> toolkits, middleware layers to read/write. Not getting ensembl buy-in
> to das/2 could be a problem
> gh: tim hubbard was there and was on-board to transition to
> das/2.
> ls: would have be better to have buy in now (i.e., Tony Cox dropping
> out)
> gh: we've made it more formal to say, here is the subset of das/2 that
> this server supports. for other use cases, we do need the added
> complexity.
>
> gh: re: ensembl support for das/2. I mentioned andrew's das/1 - das/2
> transformational proxy server. not released yet, but making progress
> on it. So if you have a das/1 server, you can put a das/2 front end on
> it.
> ls: can you go the other way, provide das/1 interface on das/2?
> gh: want to do this for the affy public das/2 server. Andrew's doesn't
> do that yet, but I'd like to do this. Another thing: integrate that
> proxy into the registry, so the registry makes it into a das/2
> server. then we don't have a burden on servers to support two versions
> of the protocol.
> got email from andrew about his proxy on that.
>
> sc: I put a note about Andrew's proxy server on the biodas.org wiki.
> gh: he needs to have a place to keep it.
> sc: open-bio server would work. Just need a beetter mechanism to
> ensure it stays up. I think it's not getting started when the machine
> gets rebooted.
>
> [A] Steve/Andrew work on stable home for the proxy server
>
> [Correction: In my note in the teleconf, I was thinking about Andrew's
> validation server, which is hosted on open-bio and has a problem with
> not being up reliably. The proxy server is another issue. There's a
> mention of it on the DAS FAQ page, but not pointer to any server
> yet. -steve]
>
> gh: data overload and redundancy from the user perspective. clients
> where default for protein annotation is to go to all servers, you have
> way too many track showing up. Lots of servers and types. Ensembl is
> moving to expose even more data via das, thousands of new tracks
> (organisms, type, assembly version). Concern with biosapiens is
> replication of the same annotation data. E.g., pfam domains in
> different biosapiens data sources, may return same thing or slight
> diffs in feature ranges. how does user decide which is authoritative?
> Which can be left out? A big concern at the biosapiens meeting --
> redundant information.
>
> gh: another issue: mirrors for the data. discussed in early days of
> das/2, not resolved how to deal with mirrors, http redirection
> mechanism. This can lead to redundant data when you hit all mirrors.
>
> gh: feature classification and ontologies around that. My take was
> that the sequence ontology is inadequate to describe protein
> annotation as it stands now. PAO - protein annotation ontology
> ls: are they doing this with NCBO involved?
> gh: talked to them about getting hold of lincoln and suzi and
> integrating with SO as an extension.
> ap: for 3rd version of SO we will contact lincoln and suzi to discuss
> ls: great
> gh: for biosapiens, Janet Thornton is the person to contact about
> that.
>
> gh: more about types (proliferation causing data overload issue mentioned
> above.)
> also discussion about dag vs hierarchical tree. pointing to multiple
> terms in the ontology for a particular type. in SO, how much has
> multiple parents come up? may need a type that can point to multiple
> ontology terms for that type. das/2 cannot do it yet, only one term
> per type.
> ls: the more flexible we make it the less coherent it will be. data
> overload will get even worse. to reduce data overload, need a way to
> take data from servers and deciding if same or different. are they
> reachable in same ontology? allowing set arithmetic will create
> ambiguity. biosapiens can be allowed with an attribute, multiple
> attributes that point at different ontologies.
>
> gh: combining cellular location with protein classification
> ontologies.
> ls: certainly, but those are separate attributes. what we created is
> essentially an RDF. Actually, terminology is 'property' not
> attribute. Types property is the correct way to do this.
>
> gh: use of subset of das/1, what it means for das/2
> data overload for users,
> feature classification issues
>
> gh: das wish list, people wrote up what they feel das is
> inadequate for. Das/2 group was aware of these.
>
> ls: encryption, asynchronous request seem like impl issues, not part of
> protocol.
> gh: some people complained that das is inadequate because it relies on
> http(s). you can do much more high-level things with soap-based
> system. I think this is correct, but wrong that no one in our space
> needs that.
> ls: no pharma that cares about this will entrust it to the public
> internet with anything, soap or otherwise.
> gh: at affy, we've done das/1 servers with https and no one has ever
> complained.
> ls: identity theft problems via people stealing from encrypted streams
> never emerged as a problem. they steal it from your physical trash,
> setting up phony banking sites. Not related to strength of encryption.
> gh: regarding asynch request - discussed 2 years ago -- yes, it's
> outside of das/2 spec, but we say, use http as you will. redirect and
> say "your request has been accepted, check back here in a while."
>
> gh: wish list (sent out in email to the list noted above):
> - multi-level features, stylesheets
> - caching - use http caching as you will
> - features from other sources - dealt with since we use URIs. a
>   problem for das/1
>
> ls: provenance requires people to put in effort to maintain the
> provenance, but it doesn't free you of responsibility of having to
> track it.
>
> - scalability and large analysis - the data overload issue. the
> answer to me is smarter clients.
>
> - more queries -- addressed in das/2
> - entry point supports - in das/2 we have a less ambiguous way to say
>   whether a server supports it or not.
> - counting number of features of each type per source -- have the
>   'count' format in das/2
> - referring to id's externally (das/2 uri's)
> - errors and exception handling - we have http error codes -- remains
>   to be seen how well it works out. done a reasonable job to map it to
>   http error codes
> - better stylesheets - in progress for das/2
> - mapping servers - different genome assembly versions or mapping from
>   protein to nucleotide space. -- under discussion with data
>   providers.
>
> ap: Another thing on wish list: people want to know stats per server,
> uptime, hits, etc. (server stats).
> gh: andreas' registry does a good job for das/1. biosapiens registry
> is built on Andreas' registry. How many are up, which requests they
> support, the data the server. Very nice.
>
> ap: Gregg's coverage was good. Also gave a very good advertisement for
> das/2!
>
> gh: the das/1 to das/2 transformational proxy was quite
> popular. doesn't take advantage of das/2 power, but gets people started.
>
> Other Topics:
> --------------
> sc: biodas.org wiki is now officially up.
> gh: mentioned to Tim Hubbard. He said, "I know. I already edited it."
>
> sc: globalseqids page needs das2xml snippets for coordinates.
>
> [A] lincoln will add das2xml coordinate snippets to globalseqids page on
> wiki
>
> sc: might also be good to have notice of the next teleconf on the
> site. Maybe pointers to the notes as well.
> gh: maybe have an automatic email sent out reminding folks?
> sc: maybe not, if we have a list of the dates for upcoming meetings on
> the site.
>
> [A] Steve post list of dates of upcoming DAS/2 teleconferences on wiki
>
> Next meeting in two weeks: 19 mar 2007
>
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/das2
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From Steve_Chervitz at affymetrix.com  Mon Mar 19 13:47:57 2007
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 19 Mar 2007 10:47:57 -0700
Subject: [DAS2] Notes from the biweekly DAS/2 teleconference, 19 Mar 2007
Message-ID: <C2241ADD.259E4%Steve_Chervitz@affymetrix.com>

Notes from the biweekly DAS/2 teleconference, 19 Mar 2007

$Id: das2-teleconf-2007-03-19.txt,v 1.2 2007/03/19 17:46:41 sac Exp $

Teleconference Info:
   * Schedule:         Biweekly on Monday
   * Time of Day:      9:30 AM PST, 17:30 GMT
   * Dialin (US):      800-531-3250
   * Dialin (Intl):    303-928-2693
   * Toll-free UK:     08 00 40 49 467
   * Toll-free France: 08 00 907 839
   * Conference ID:    2879055
   * Passcode:         1365

Attendees: 
    Affy: Steve Chervitz, Ed Erwin, Gregg Helt
    CSHL: Lincoln Stein

Note taker: Steve Chervitz

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/ and are viewable on-line at
http://biodas.org/documents/das2/notes/

Instructions on how to access the DAS/2 CVS repository are at
http://www.biodas.org/wiki/DAS/2#CVS_Access

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


Agenda
-------
 * General issues
 * Status reports, including report from Lincoln on hapmap and das2
 * Gregg's post-grant status
 * IGB support post-March


Topic: General Issues
----------------------

ls: Regarding the coordinate stuff for global seq ids, need
clarification (see my message on list).

gh: for each release we should have the xml snippet for the
coordinates, four attribs for authority, etc. so people can see
directly what they need to provide in their DAS/2 request.

[A] gregg will send global seq ID coordinate XML example to Lincoln

Topic: Status reports
----------------------

gh: working on getting good representations of graphs for Affy das/2
server serving up tiling array data. Serving up slices of
graphs. Working well on my test server, better than expected. Slow
thing is the indexing the first time it sees a file. Chrm1 at 5bp
resolution tiling array data, 120M data points, slicing indexing takes
a couple of seconds the first time, other times there's no delay. this
is serving up in an optimized format. Need to serve in std das/2
format with a feature per data point. Not too hard.
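
A minimal sketch of the slicing idea described above, assuming the
graph is held as position-sorted arrays and that a one-time sort
stands in for the slow first-time indexing pass. The class and method
names are hypothetical; this is not the Affy server's implementation.

```python
from bisect import bisect_left


class GraphIndex:
    """Hypothetical in-memory index over (position, score) graph points."""

    def __init__(self, points):
        # One-time cost: sort by position (stands in for the first-time indexing).
        ordered = sorted(points)
        self.positions = [pos for pos, _ in ordered]
        self.scores = [score for _, score in ordered]

    def slice(self, start, end):
        """Return points with start <= position < end, found by binary search."""
        lo = bisect_left(self.positions, start)
        hi = bisect_left(self.positions, end)
        return list(zip(self.positions[lo:hi], self.scores[lo:hi]))


if __name__ == "__main__":
    # 5 bp spacing, loosely mirroring the tiling array example.
    index = GraphIndex((pos, pos // 5) for pos in range(0, 100, 5))
    print(index.slice(10, 25))
```

After the sort, each slice costs two binary searches plus the size of
the slice, which is consistent with later requests showing no delay.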

Planning to deploy in April when Steve gets new server
running. Drosophila time-course public data. 8-9 time points RNA
expression tiling arrays. When phase 3 ENCODE paper comes out, we'll
have a pointer to our server for viewing that data.
Also need to beef up feat filter queries to support full spec on the
Affy das/2 server. transition IGB from using quickload and replace all
quickload stuff with das/2, so we don't need to maintain two code
bases and data repositories.

ls: hapmap das/2 server is up and running. temporarily at Brian Gilman's
consultancy business. He's coming here to CSHL to get a permanent version
running on hapmap.org by next week. There's a whole API for accessing that
data in the form that's required by NCI's caBIO project
(caCORE). After server goes up, I'll point coordinates that location,
documentation. It works with other das/2 sources as well, (Affy,
biopackages).

gh: So it will put any of that DAS-available data into caCORE object model?
ls: yes. It also can give data as DOM models, might be easier for some
users/apps.

gh: Rolling this into the next caBIO release?
ls: yes. 

ls: Will provide snp's and haplotype blocks as features.
one track per population. we can put as many tracks in as you
need. Just one set now. There are 4 populations grouped into three
panels, since two pop's don't have enough diffs to break them out.

[A] lincoln send gregg pointer to current hapmap server for testing

sc: Working on configuring the new affy das/2 public server, a
replacement machine with a lot more RAM than current box. Have been
busy with other Affy work (new Netaffx release, new product support,
etc.) but should be mostly done with this by end of March. Should be
able to devote some solid blocks to DAS work (target: 3wks). Plan is
to support as many Affy products as we can. Less focus on supporting
UCSC-provided annotations (since they're the best source for them).

sc: Gregg, have you considered using the same approach for serving
annotations by your das/2 server as you are doing to support graphs?
Could ease memory requirements.
gh: possible, but not practical, since it would require a new format
for every feature type. Graphs are relatively straightforward to serve
up via an indexing strategy. Doing something similar for features
would mean essentially writing a database app.

Other Items:
-------------

gh: grant admin says our burn rate is lower than anticipated. we can
apply for a no-cost extension. should last at least till the end of
June as for funding. We'll apply for that. Not sure what it means for
CSHL. last time it took 3-4 mos to sort it out.

ls: start working on it now. there were communication problems in the
past. would be great if Allen could extend another month or two.

gh: Andrew will come visit me in the next day or two. Will get the
latest from him. He's been working on the transformational das1-> das2
proxy. Want to get the Ensembl people to use it ASAP.

[A] get a usable das1->das2 proxy server, deploy at Ensembl

gh: Need to look at how to support scores in das/2. we dropped score
element. You can add arbitrary elements to das/2. You can put in
multiple diff scores that way, or use XML namespaces to bring in a
das/2 score element. Want to have a recommended way of doing this.
Need more input from others. In Europe they're using score a lot more
than here in the States.
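
One way the namespace option could look, as a sketch only. The
namespace URI, the "score" element, and its "method" and "value"
attributes are all invented for illustration; no recommended form had
been agreed at the time of these notes.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for a score extension; not defined by the DAS/2 spec.
SCORE_NS = "http://example.org/das2-score-extension"


def add_score(feature, method, value):
    """Attach one namespaced <score> child to a feature element."""
    score = ET.SubElement(feature, f"{{{SCORE_NS}}}score")
    score.set("method", method)
    score.set("value", str(value))
    return score


def read_scores(feature):
    """Return {method: value} for every namespaced score child of the feature."""
    return {
        el.get("method"): float(el.get("value"))
        for el in feature.findall(f"{{{SCORE_NS}}}score")
    }


if __name__ == "__main__":
    ET.register_namespace("sc", SCORE_NS)
    # Element and attribute names on the feature are illustrative only.
    feature = ET.Element("FEATURE", {"uri": "feature/example1"})
    add_score(feature, "percent-identity", 97.5)
    add_score(feature, "bit-score", 210)
    print(ET.tostring(feature, encoding="unicode"))
    print(read_scores(feature))
```

Because each score is its own element, a feature can carry several
different scores side by side, which is the flexibility mentioned
above.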

[A] come up with recommended way to support scores in DAS/2

Topic: Gregg's agenda
----------------------

gh: I am planning to leave Affy at end of the grant. Will focus on
doing hands-on DAS/2 evangelism, ideally work with UCSC. Then will
take some time off.  Affy wasn't interested in supporting das w/o some
outside funding. Therefore, it's a good time to transition.

Regarding UCSC? ready to go down there and write some code. They have
a das/1 server, they just need someone with DAS/2 expertise that I can
provide. biggest prob with das/2 is adoption outside of the
grant people.

sc: considered using Andrew's proxy?
gh: might be OK for a temporary solution, but it wouldn't be as
efficient as directly supporting das/2, and I know Jim et al are
interested in efficiency. Since I'm in the area, I can help them get
into DAS/2 directly, which would help with DAS/2 acceptance by the
community. 

gh: Another goal was to have a DAS/2 paper ready and submitted before
I leave, want to have a rough draft in april. Plan to submit to an
open source journal: BioMed Central, PLoS, or other.

[A] Gregg will circulate draft of DAS/2 paper, draft in April.

Topic: IGB Support
-------------------

ee: Regarding IGB support, Affy is not supporting IGB after March,
they are moving me to a different project. Support for IGB could
return if there's enough interest.

sc: how self-supporting is the igb community?
ee: not much.
gh: Ann Loraine has interest as do internal Affy users.

sc: Sourceforge has a new wiki project that's in beta now, for adding
a wiki to your project's web page. Could help make the IGB community
self-supporting, on-line docs, FAQ, etc. I volunteered to participate,
but haven't done anything with it yet.

gh: IGB has a good user's guide now, thanks to Ed's recent update.

ee: I'm also working on plugin interface and documenting the http API
protocol, things that will make it easier for others to use IGB with other
programs.  


From Gregg_Helt at affymetrix.com  Mon Mar  5 17:30:10 2007
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 5 Mar 2007 09:30:10 -0800
Subject: [DAS2] Brief summary of DAS/BioSapiens workshops from a DAS/2
	perspective
Message-ID: <C71929195D04BF48BAECD499AF717B480198CD52@msex02.affymetrix.com>

Summary of DAS & Feature Classification workshops, February 26-28 2007,
Hinxton
 
DAS Developers Workshop:
http://www.sanger.ac.uk/Users/ap3/dasworkshop.html
 
BioSapiens Feature Type Classification Workshop:
http://www.ebi.ac.uk/~hhe/tmp/BioSapiensFeatureMeeting.htm


DAS1 clients discussed:
          Dasty2, JalView, VectorBase, IGB, Pepper, Spice, ProView,
Ensembl ContigView, ...
DAS1 servers discussed:
          PFam, Ensembl, ProServer, Sisyphus, ...
 
DAS1 extensions:
          Gene DAS
          Protein DAS
          Alignment DAS
          Structure DAS
          3D-EM DAS
          Interaction DAS
          MaDAS (writeback?)
"simple" DAS


DAS/2

BioSapiens Overview:  http://www.biosapiens.info/
  Large-scale genome/protein annotation, 25 institutions from 14
countries across Europe participating
  Currently 23 DAS servers within BioSapiens project serving 69 DAS
sources.
  4 servers appear to be down (21 sources fail features query)
  See http://www.biosapiens.info/page.php?page=biosapiensdir for more
DAS server stats


Major concerns for Ensembl / Sanger / BioSapiens I think we've addressed
well in DAS/2
          Gene DAS
          Protein DAS
          Alignment DAS          
"simple" DAS 

Major concerns for Ensembl / Sanger / BioSapiens that surprised me:

    A) In general the use of a smaller subset of DAS1 than expected
        Many BioSapiens DAS servers don't support "entry_points" query
(64 fail|NA)
        Many BioSapiens DAS servers don't support "types query" (49
fail|NA)
               in DAS1 features themselves can carry most of the types
info
        Some BioSapiens DAS servers don't support "features" query
parameters (only the features query with no params)
        Many BioSapiens clients don't use "entry_points" query, "types"
query, or any feature filters (always get all features for a given
segment)
        BioSapiens protein annotation almost exclusively uses flat
(one-level) features
very little or no use of "group" attribute to make two-level features
example: disulfide bond annotation- relies on rendering or prior
knowledge to differentiate
        Ensembl DAS servers are in general serving one type per source
        These simplifications of clients and servers are reinforcing
each other
        If using subset of DAS1, does this mean that DAS/2 might be too
complex?
        But with these simplifications, the complexity is getting pushed
into other places
    
  B) Data overload
        Number of servers, sources, types
             Ensembl: will have 1000s of sources soon
        Redundancy concerns
             example: Pfam domain 
   Many sources with same / similar annotation type - "Pfam domain"
          Slight differences in feature ranges
          Which is the authority?
          Is there a way to help clients decide which can be combined
        Mirrors
  
  C) Feature Classification / Ontology issues
        SO currently inadequate for describing protein annotation
               developing PAO (Protein Annotation Ontology)
        types proliferation
            example: one feature type for each PFam domain?
                ~9K PFam-A domains
                If look at PFam-B (PRODOM that don't overlap PFam-A),
then ~70K / 450K more (>2 proteins in family / not)
            if not in unique type, where does that information go?
       Need multiple ontology terms to describe a single type?
 
------------------------------------------------------------------------
 
DAS WishList (last session of DAS workshop, people listed desired
improvements on whiteboard)

Multi-level features (Gregg)
Multi-level stylesheets (Ed)
Caching (last-modified, if-modified-since, TTL)
Provenance of features from other sources (features from different
sources with same IDs? types?)
Large analysis / Scalability
       1000s of seqs + 1000s sources + types ?
More queries: feature types / date
Entry point support
Encryption support
Stats-query interface -- count # of features of type for a source
ID ref external (URI / URN)
Proper error / exception handling
Asynchronous requests
       process
       batches
Better Stylesheets
Mapping servers

We've discussed most of these wishlist issues before while developing
DAS/2, though we certainly haven't completely solved all of them...
 
 
From Steve_Chervitz at affymetrix.com  Mon Mar  5 19:03:03 2007
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 05 Mar 2007 11:03:03 -0800
Subject: [DAS2] Notes from the biweekly DAS/2 teleconference, 5 Mar 2007
Message-ID: <C211A967.25541%Steve_Chervitz@affymetrix.com>

Notes from the biweekly DAS/2 teleconference, 5 Mar 2007

$Id: das2-teleconf-2007-03-05.txt,v 1.2 2007/03/05 19:01:59 sac Exp $

Teleconference Info:
   * Schedule:         Biweekly on Monday
   * Time of Day:      9:30 AM PST, 17:30 GMT
   * Dialin (US):      800-531-3250
   * Dialin (Intl):    303-928-2693
   * Toll-free UK:     08 00 40 49 467
   * Toll-free France: 08 00 907 839
   * Conference ID:    2879055
   * Passcode:         1365

Attendees:
    Affy: Steve Chervitz, Ed Erwin, Gregg Helt
    CSHL: Lincoln Stein
  Sanger: Andreas Prlic
    UCLA: Allen Day

Note taker: Steve Chervitz

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


Agenda
-------
* Review of BioSapiens DAS workshop
* Status updates


gh: I sent my summary of the biosapiens das workshop and feature
classification workshop I attended with Ed in Hinxton:
http://lists.open-bio.org/pipermail/das2/2007-March/000982.html

"das developers workshop from a das/2 perspective", summarizes what I
took home from these meetings, how well das/2 meets needs of people in
europe (ensembl, sanger, biosapiens -- the focus of these
meetings). and a quick biosapiens overview: a big european project,
25 institutions, large scale genome protein annotation. decided early
on to use das to distribute annotations between organizations. can
check the stats on their das servers -- andreas' registry -- 23
servers serving up 69 das sources -- a major das investment!

In developing das/2 we haven't had too much experience with the kind
of data they're dealing with (protein annotations).

das/1 clients under study:
 - dasty2, dasty1 - ajax-based viz clients
 - jalview - alignment viewer, editor
 - igb - Ed gave presentation
 - pepper and spice - das viewers, also use alignment and 3d structure
   info
 - proview - protein annotation,
 - ensembl viewer

servers presented/discussed:
 - pfam, ensembl, proserver, Andreas',
 - Extensions to das/1 protocol discussed: gene das, protein das,
   structure das, 3d-em das (arbitrary 3d volumes), interaction das for
   prot-prot interactions. Moddas - writeback in das/1. Alignment das
   (Andreas). 
 - Simple das - das servers that don't impl all of das/1 (entry_points,
   or types, e.g.,).

Gregg presented on das/2, will put up ppt later. Tailored it assuming
familiarity with das/1. Focussed on how big the diffs are with an eye
towards how hard it would be to move to das/2. Conceptually, not that
big a switch, though XML is a lot different.

[A] Gregg will send out powerpoint for his talk from BioSapiens DAS workshop

Also discussed how well das/2 addresses some of the problems with
das/1 that came up at the workshop.

extensions for das/1:
- das/2 addressed some of them very well. E.g., gene das (das w/o
  specifying location of feature). this is addressed well in
  das/2. can have features w/o location, or w/o range.
- protein das - das/2 did a good job of removing nucleotide specific
  parts of das features (orientation, phase are not required). das/2
  is much more agnostic about dna vs protein.
- alignment das - pairwise or multiple - locations with features in
  das/2 addresses some of these issues (0,1,or more locations for a
  feature) each location can have optional gap attribute (cigar
  string). so if you can describe it with a cigar string, you can
  describe it in das/2. Can use multiple locations to do mult
  alignments. Not dealt with in das/2: 3d-threading of an alignment
  through a structure.  Need to look at this in the future

[A] Look at how to handle 3D structure alignment threading in DAS/2 spec

- simple das stuff handled better in das/2 - in das/1 the assumption
  is you support all things unless stated otherwise. but in das/2 there is a
  capabilities header, you must indicate support there, if not stated,
  the default is you don't support it. Can also say you support
  feature filters, so there's more formal support for that.
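
The "not stated means not supported" default in the last item can be
sketched from the client side as follows. The capability names used
here are placeholders, not the identifiers defined by the DAS/2 spec.

```python
def supports(declared_capabilities, capability):
    """DAS/2-style check: anything the server did not declare is unsupported."""
    return capability in declared_capabilities


def plan_requests(declared_capabilities, wanted):
    """Split the queries a client would like to make into send and skip lists."""
    send = [q for q in wanted if supports(declared_capabilities, q)]
    skip = [q for q in wanted if not supports(declared_capabilities, q)]
    return send, skip


if __name__ == "__main__":
    # Placeholder capability names for a "simple" server.
    declared = {"features"}
    wanted = ["segments", "types", "features", "feature-filters"]
    print(plan_requests(declared, wanted))
```

A client written this way never has to probe a server with a query
just to find out that it fails.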

Surprises:
- smaller subset of das/1 is in use than expected. of 69 sources, 64
  either fail entry points or say not applicable. types query: 49
  fail/not applicable

ls: for types query. only one type?
gh: for ensembl, this is the case.
ap: lack of consistency of types is addressed in the other workshop
related to features.

gh: in types in das/1 it is less necessary because all info is
replicated in each feature, type-method, category, id
ls: use case for types query is to present user with set of
checkboxes, select which type to retrieve from source. if in practice
das sources are being used for one type or a set of types that only
make sense together, no reason to turn off a part of it, then makes
sense to not support types query.
ls: have heard that types query is expensive. computationally. simple
db backends with no normalization/indexing, finding all types involves
visiting each record.
gh: part of justification with 1 type / source is because those types
are stored in separate db. so having a das server to integrate them
makes sense.
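
The cost Lincoln describes for the types query can be illustrated
with a small sketch. It assumes feature records are plain dicts with a
"type" key (a hypothetical layout) and contrasts a full scan with a
tally kept up to date as records are loaded.

```python
from collections import Counter


def types_by_scan(records):
    """Unindexed backend: every record must be visited to list the types."""
    return sorted({record["type"] for record in records})


class TypeIndex:
    """Tally of types maintained at load time, so listing them needs no scan."""

    def __init__(self):
        self.counts = Counter()

    def add(self, record):
        self.counts[record["type"]] += 1

    def types(self):
        return sorted(self.counts)


if __name__ == "__main__":
    records = [{"type": "exon"}, {"type": "SNP"}, {"type": "exon"}]
    index = TypeIndex()
    for record in records:
        index.add(record)
    print(types_by_scan(records), index.types(), index.counts["exon"])
```

The same tally also gives a per-type feature count without a second
pass over the data.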

gh: Re: using smaller subset of das/1 than I expected:
types can be expensive in another way, example: representing pfam in
das. feat type for each pfam domain type (9000 primary domains).
Pfam b - there are 70-400K more!

ls: in das/2 create a single type 'protein domain' then use attribute
pointing to an ontology saying which pfam domain it is.
gh: concern there is, assuming clients will do something useful for
particular attributes. For rendering, I could do diff rendering based
on diff attribs (color diff domains differently). but for clients to
really understand that they're different, that's a more complicated
issue.
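
A sketch of the rendering side of this exchange, assuming a single
'protein domain' type whose features carry a property naming the
Pfam family. The property name "pfam_accession" and the dict layout
are hypothetical. It only picks colors; it does not give the client
any understanding of how the domains differ, which is Gregg's concern.

```python
PALETTE = ["red", "blue", "green", "orange"]


def color_for(feature, property_name="pfam_accession"):
    """Pick a stable color from one property value of a feature."""
    value = feature.get("properties", {}).get(property_name)
    if value is None:
        return "gray"
    # Sum of bytes is deterministic across runs, unlike the built-in hash().
    return PALETTE[sum(value.encode("utf-8")) % len(PALETTE)]


if __name__ == "__main__":
    domain_a = {"type": "protein domain", "properties": {"pfam_accession": "PF00001"}}
    domain_b = {"type": "protein domain", "properties": {"pfam_accession": "PF00069"}}
    untyped = {"type": "protein domain", "properties": {}}
    print(color_for(domain_a), color_for(domain_b), color_for(untyped))
```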

gh: not using types or entry_points by clients because servers don't,
feedback loop.
ap: low coverage genomes (e.g., elephant) may have several 100K entry
points. 
gh: in das/2 we are more formal and say that you don't support
it. Creates problem: how do you know what to query in the first place?
Then you have to know what you're looking for.

gh: feature hierarchies handled in das/2 -- this is not an issue for
protein das, where annotations are completely flat. even protein
disulfide bond is one level, just rendered differently so it doesn't
span all residues in between. But doing non-visual things (unions,
intersections) this could be a problem.
ls: flat in terms of location or ontology?
gh: location. there is no feature ontology yet (no consistent, agreed
upon yet, just proposed at this meeting).
ls: they aren't creating discontinuous features because too hard, or
don't care.
gh: just not needed for most protein annotations. even when it could
be needed, just not being used.
ls: for nucleotide, it's needed frequently
gh: not an issue for das/2

gh: ensembl collapses type and source into one thing. what does this
mean? das/2 could be over complicated.
ls: no doubt that it is too complicated for the biosapiens use
case. we could make it easy for them to use by providing tool kits to
read and write. could also argue that postscript is too complicated to
draw simple rectangles on the page. You wouldn't expect them to
simplify postscript. There are tools to ease simple rendering.
The complexity of das/2 won't interfere with adoption, but not having
toolkits, middleware layers to read/write. Not getting ensembl buy-in
to das/2 could be a problem
gh: tim hubbard was there and was on-board to transition to
das/2. 
ls: would have been better to have buy in now (i.e., Tony Cox dropping
out)
gh: we've made it more formal to say, here is the subset of das/2 that
this server supports. for other use cases, we do need the added
complexity.

gh: re: ensembl support for das/2. I mentioned andrew's das/1 - das/2
transformational proxy server. not released yet, but making progress
on it. So if you have a das/1 server, you can put a das/2 front end on
it.
ls: can you go the other way, provide das/1 interface on das/2?
gh: want to do this for the affy public das/2 server. Andrew's doesn't
do that yet, but I'd like to do this. Another thing: integrate that
proxy into the registry, so the registry makes it into a das/2
server. then we don't have a burden on servers to support two versions
of the protocol. 
got email from andrew about his proxy on that.

sc: I put a note about Andrew's proxy server on the biodas.org wiki.
gh: he needs to have a place to keep it.
sc: open-bio server would work. Just need a better mechanism to
ensure it stays up. I think it's not getting started when the machine
gets rebooted.

[A] Steve/Andrew work on stable home for the proxy server

[Correction: In my note in the teleconf, I was thinking about Andrew's
validation server, which is hosted on open-bio and has a problem with
not being up reliably. The proxy server is another issue. There's a
mention of it on the DAS FAQ page, but no pointer to any server
yet. -steve] 

gh: data overload and redundancy from the user perspective. clients
where default for protein annotation is to go to all servers, you have
way too many tracks showing up. Lots of servers and types. Ensembl is
moving to expose even more data via das, thousands of new tracks
(organisms, type, assembly version). Concern with biosapiens is
replication of the same annotation data. E.g., pfam domains in
different biosapiens data sources, may return same thing or slight
diffs in feature ranges. how does user decide which is authoritative?
Which can be left out? A big concern at the biosapiens meeting --
redundant information.
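
One possible client-side heuristic for the redundancy problem, as a
sketch only. It assumes features are dicts with "type", "source" and
a half-open "range", and it treats two features of the same type as
redundant when their ranges overlap by at least an arbitrary
threshold. It does not answer which source is authoritative.

```python
def overlap_fraction(a, b):
    """Jaccard overlap of two half-open (start, end) ranges."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0


def group_redundant(features, threshold=0.9):
    """Greedy grouping: join the first group whose first member matches closely."""
    groups = []
    for feat in features:
        for group in groups:
            rep = group[0]
            same_type = feat["type"] == rep["type"]
            if same_type and overlap_fraction(feat["range"], rep["range"]) >= threshold:
                group.append(feat)
                break
        else:
            groups.append([feat])
    return groups


if __name__ == "__main__":
    features = [
        {"source": "server1", "type": "Pfam domain", "range": (10, 110)},
        {"source": "server2", "type": "Pfam domain", "range": (12, 110)},
        {"source": "server3", "type": "Pfam domain", "range": (300, 400)},
    ]
    for group in group_redundant(features):
        print([f["source"] for f in group])
```

A client could then show one track per group and let the user expand
it to see the individual sources.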

gh: another issue: mirrors for the data. discussed in early days of
das/2, not resolved how to deal with mirrors, http redirection
mechanism. This can lead to redundant data when you hit all mirrors.

gh: feature classification and ontologies around that. My take was
that the sequence ontology is inadequate to describe protein
annotation as it stands now. PAO - protein annotation ontology
ls: are they doing this with NCBO involved?
gh: talked to them about getting hold of lincoln and suzi and
integrating with SO as an extension.
ap: for 3rd version of SO we will contact lincoln and suzi to discuss
ls: great
gh: for biosapiens, Janet Thornton is the person to contact about
that.

gh: more about types (proliferation causing data overload issue mentioned
above.)
also discussion about dag vs hierarchical tree. pointing to multiple
terms in the ontology for a particular type. in SO, how much has
multiple parents come up? may need a type that can point to multiple
ontology terms for that type. das/2 cannot do it yet, only one term
per type.
ls: the more flexible we make it the less coherent it will be. data
overload will get even worse. to reduce data overload, need a way to
take data from servers and deciding if same or different. are they
reachable in same ontology? allowing set arithmetic will create
ambiguity. biosapiens can be allowed with an attribute, multiple
attributes that point at different ontologies.

gh: combining cellular location with protein classification
ontologies. 
ls: certainly, but those are separate attributes. what we created is
essentially an RDF. Actually, terminology is 'property' not
attribute. Types property is the correct way to do this.

gh: use of subset of das/1, what it means for das/2
data overload for users,
feature classification issues

gh: das wish list, people wrote up what they feel das is
inadequate for. Das/2 group was aware of these.

ls: encryption, asynchronous request seem like impl issues, not part of
protocol.
gh: some people complained that das is inadequate because it relies on
http(s). you can do much more high-level things with soap-based
system. I think this is correct, but wrong that no one in our space
needs that.
ls: no pharma that cares about this will entrust it to the public
internet with anything, soap or otherwise.
gh: at affy, we've done das/1 servers with https and no one has ever
complained. 
ls: identity theft problems via people stealing from encrypted streams
never emerged as a problem. they steal it from your physical trash,
setting up phony banking sites. Not related to strength of encryption.
gh: regarding asynch request - discussed 2 years ago -- yes, it's
outside of das/2 spec, but we say, use http as you will. redirect and
say "your request has been accepted, check back here in a while."
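
A minimal client-side sketch of the "accepted, check back here in a
while" pattern described above. It assumes the server answers with
HTTP 202 (Accepted) while the result is pending and 200 once it is
ready; that choice of status codes is an assumption, not something
taken from the das/2 spec. The names poll_until_ready and
make_fake_server are hypothetical, and the fake server only stands in
for a real HTTP call.

```python
import time


def poll_until_ready(fetch, max_tries=5, delay=0.0):
    """Call fetch() until it stops answering 202 Accepted.

    fetch is any callable returning (status_code, body); in a real
    client it would wrap an HTTP GET on the "check back here" URL.
    """
    for _ in range(max_tries):
        status, body = fetch()
        if status == 200:
            return body              # result is ready
        if status != 202:
            raise RuntimeError(f"unexpected HTTP status {status}")
        time.sleep(delay)            # still pending, wait and retry
    raise TimeoutError(f"result not ready after {max_tries} tries")


def make_fake_server(ready_after):
    """Hypothetical stand-in for a slow server (illustration only)."""
    calls = {"n": 0}

    def fetch():
        calls["n"] += 1
        if calls["n"] <= ready_after:
            return 202, "your request has been accepted"
        return 200, "<FEATURES/>"

    return fetch
```

With a fake server that needs two polls before the data is ready,
poll_until_ready(make_fake_server(2)) returns "<FEATURES/>" on the
third call.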

gh: wish list (sent out in email to the list noted above):
- multi-level features, stylesheets
- caching - use http caching as you will
- features from other sources - dealt with since we use URIs. a
  problem for das/1

ls: provenance requires people to put in effort to maintain the
provenance, but it doesn't free you of responsibility of having to
track it.

- scalability and large analysis - the data overload issue. the
answer to me is smarter clients.

- more queries -- addressed in das/2
- entry point support - in das/2 we have a less ambiguous way to say
  whether a server supports it or not.
- counting number of features of each type per source -- have the
  'count' format in das/2
- referring to ids externally (das/2 URIs)
- errors and exception handling - we have http error codes -- remains
  to be seen how well it works out. done a reasonable job to map it to
  http error codes
- better stylesheets - in progress for das/2
- mapping servers - different genome assembly versions or mapping from
  protein to nucleotide space. -- under discussion with data
  providers.

ap: Another thing on wish list: people want to know stats per server,
uptime, hits, etc. (server stats).
gh: andreas' registry does a good job for das/1. biosapiens registry
is built on Andreas' registry. How many are up, which requests they
support, the data the server. Very nice.

ap: Gregg's coverage was good. Also gave a very good advertisement for
das/2!

gh: the das/1 to das/2 transformational proxy was quite
popular. doesn't take advantage of das/2 power, but gets people started.

Other Topics:
--------------
sc: biodas.org wiki is now officially up.
gh: mentioned to Tim Hubbard. He said, "I know. I already edited it."

sc: globalseqids page needs das2xml snippets for coordinates.

[A] lincoln will add das2xml coordinate snippets to globalseqids page on
wiki

sc: might also be good to have notice of the next teleconf on the
site. Maybe pointers to the notes as well.
gh: maybe have an automatic email sent out reminding folks?
sc: maybe not, if we have a list of the dates for upcoming meetings on
the site. 

[A] Steve post list of dates of upcoming DAS/2 teleconferences on wiki

Next meeting in two weeks: 19 mar 2007


From Gregg_Helt at affymetrix.com  Wed Mar  7 21:21:48 2007
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 7 Mar 2007 13:21:48 -0800
Subject: [DAS2] Stable URIs coming from NCBI?
Message-ID: <C71929195D04BF48BAECD499AF717B480198CD58@msex02.affymetrix.com>

Some good news (or at least rumor of good news) from NCBI -- plans to
expose stable URIs for all their resources:
http://lists.w3.org/Archives/Public/public-semweb-lifesci/2007Feb/0123.html
 
Which would fit nicely with the URI-centric approach of DAS/2...
 
            Gregg


From lstein at cshl.edu  Mon Mar 12 17:02:51 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 12 Mar 2007 13:02:51 -0400
Subject: [DAS2] Notes from the biweekly DAS/2 teleconference, 5 Mar 2007
In-Reply-To: <C211A967.25541%Steve_Chervitz@affymetrix.com>
References: <AcdfWOZdJPCC8MtMEduuXAAKlXZSNg==>
	<C211A967.25541%Steve_Chervitz@affymetrix.com>
Message-ID: <6dce9a0b0703121002h4f866b10jb160044260ea812e@mail.gmail.com>

>
> lincoln will add das2xml coordinate snippets to globalseqids page on
> wiki
>

I added one line to the description of the H. sapiens source. Is this what
you're looking for? If it is, I'll go ahead and add the rest.

Note that the contents of the XML are not defined anywhere. I'm not sure why
there should be a URI that looks like it is fetchable.

Lincoln


On 3/5/07, Steve Chervitz <Steve_Chervitz at affymetrix.com> wrote:
>
> Notes from the biweekly DAS/2 teleconference, 5 Mar 2007
>
> $Id: das2-teleconf-2007-03-05.txt,v 1.2 2007/03/05 19:01:59 sac Exp $
>
> Teleconference Info:
>    * Schedule:         Biweekly on Monday
>    * Time of Day:      9:30 AM PST, 17:30 GMT
>    * Dialin (US):      800-531-3250
>    * Dialin (Intl):    303-928-2693
>    * Toll-free UK:     08 00 40 49 467
>    * Toll-free France: 08 00 907 839
>    * Conference ID:    2879055
>    * Passcode:         1365
>
> Attendees:
>     Affy: Steve Chervitz, Ed Erwin, Gregg Helt
>     CSHL: Lincoln Stein
>   Sanger: Andreas Prlic
>     UCLA: Allen Day
>
> Note taker: Steve Chervitz
>
> Action items are flagged with '[A]'.
>
> These notes are checked into the biodas.org CVS repository at
> das/das2/notes/. Instructions on how to access this
> repository are at http://biodas.org
>
> DISCLAIMER:
> The note taker aims for completeness and accuracy, but these goals are
> not always achievable, given the desire to get the notes out with a
> rapid turnaround. So don't consider these notes as complete minutes
> from the meeting, but rather abbreviated, summarized versions of what
> was discussed. There may be errors of commission and omission.
> Participants are welcome to post comments and/or corrections to these
> as they see fit.
>
>
> Agenda
> -------
> * Review of BioSapiens DAS workshop
> * Status updates
>
>
> gh: I sent my summary of the biosapiens das workshop and feature
> classification workshop I attended with Ed in Hinxton:
> http://lists.open-bio.org/pipermail/das2/2007-March/000982.html
>
> "das developers workshop from a das/2 perspective", summarizes what I
> took home from these meetings, how well das/2 meets needs of people in
> europe (ensembl, sanger, biosapiens -- the focus of these
> meetings). and a quick biosapiens overview: a big european project,
> 25 institutions, large scale genome protein annotation. decided early
> on to use das to distribute annotations between organizations. can
> check the stats on their das servers -- andreas' registry -- 23
> servers serving up 69 das sources -- a major das investment!
>
> In developing das/2 we haven't had too much experience with the kind
> of data they're dealing with (protein annotations).
>
> das/1 clients under study:
> - dasty2, dasty1 - ajax-based viz clients
> - jalview - alignment viewer, editor
> - igb - Ed gave presentation
> - pepper and spice - das viewers, also use alignment and 3d structure
>    info
> - proview - protein annotation,
> - ensembl viewer
>
> servers presented/discussed:
> - pfam, ensembl, proserver, Andreas',
> - Extensions to das/1 protocol discussed: gene das, protein das,
>    structure das, 3d-em das (arbitrary 3d volumes), interaction das for
>    prot-prot interactions. MaDAS - writeback in das/1. Alignment das
>    (Andreas).
> - Simple das - das servers that don't impl all of das/1 (entry_points,
>    or types, e.g.,).
>
> Gregg presented on das/2, will put up ppt later. Tailored it assuming
> familiarity with das/1, and focussed on how big the diffs are with an
> eye towards how hard it would be to move to das/2. Conceptually, not
> that big a switch, though XML is a lot different.
>
> [A] Gregg will send out powerpoint for his talk from BioSapiens DAS
> workshop
>
> Also discussed how well das/2 addresses some of the problems with
> das/1 that came up at the workshop.
>
> extensions for das/1:
> - das/2 addressed some of them very well. E.g., gene das (das w/o
>   specifying location of feature). this is addressed well in
>   das/2. can have features w/o location, or w/o range.
> - protein das - das/2 did a good job of removing nucleotide specific
>   parts of das features (orientation, phase are not required). das/2
>   is much more agnostic about dna vs protein.
> - alignment das - pairwise or multiple - locations with features in
>   das/2 addresses some of these issues (0,1,or more locations for a
>   feature) each location can have optional gap attribute (cigar
>   string). so if you can describe it with a cigar string, you can
>   describe it in das/2. Can use multiple locations to do mult
>   alignments. Not dealt with in das/2: 3d-threading of an alignment
>   through a structure.  Need to look at this in the future
>
> [A] Look at how to handle 3D structure alignment threading in DAS/2 spec
>
> - simple das stuff handled better in das/2 - in das/1 the assumption
>   is you support all things unless stated otherwise. but in das/2
>   there is a capabilities header, you must indicate support there, if
>   not stated, the default is you don't support it. Can also say you
>   support feature filters, so there's more formal support for that.
>
> Surprises:
> - smaller subset of das/1 is in use than expected. of 69 sources, 64
>   either fail entry points or say not applicable. types query: 49
>   fail/not applicable
>
> ls: for types query. only one type?
> gh: for ensembl, this is the case.
> ap: lack of consistency of types is addressed in the other workshop
> related to features.
>
> gh: in types in das/1 it is less necessary because all info is
> replicated in each feature, type-method, category, id
> ls: use case for types query is to present user with set of
> checkboxes, select which type to retrieve from source. if in practice
> das sources are being used for one type or a set of types that only
> make sense together, no reason to turn off a part of it, then makes
> sense to not support types query.
> ls: have heard that types query is expensive computationally. simple
> db backends with no normalization/indexing, finding all types involves
> visiting each record.
> gh: part of justification with 1 type / source is because those types
> are stored in separate db. so having a das server to integrate them
> makes sense.
>
> gh: Re: using smaller subset of das/1 than I expected:
> types can be expensive in another way, example: representing pfam in
> das. feat type for each pfam domain type (9000 primary domains).
> Pfam b - there are 70-400K more!
>
> ls: in das/2 create a single type 'protein domain' then use attribute
> pointing to an ontology saying which pfam domain it is.
> gh: concern there is, assuming clients will do something useful for
> particular attributes. For rendering, I could do diff rendering based
> on diff attribs (color diff domains differently). but for clients to
> really understand that they're different, that's a more complicated
> issue.
>
> gh: not using types or entry_points by clients because servers don't,
> feedback loop.
> ap: low coverage genomes (e.g., elephant) may have several 100K entry
> points.
> gh: in das/2 we are more formal and say that you don't support
> it. Creates problem: how do you know what to query in the first place?
> Then you have to know what you're looking for.
>
> gh: feature hierarchies handled in das/2 -- this is not an issue for
> protein das, where annotations are completely flat. even protein
> disulfide bond is one level, just rendered differently so it doesn't
> span all residues in between. But doing non-visual things (unions,
> intersections) this could be a problem.
> ls: flat in terms of location or ontology?
> gh: location. there is no feature ontology yet (no consistent, agreed
> upon yet, just proposed at this meeting).
> ls: they aren't creating discontinuous features because too hard, or
> don't care.
> gh: just not needed for most protein annotations. even when it could
> be needed, just not being used.
> ls: for nucleotide, it's needed frequently
> gh: not an issue for das/2
>
> gh: ensembl collapses type and source into one thing. what does this
> mean? das/2 could be over complicated.
> ls: no doubt that it is too complicated for the biosapiens use
> case. we could make it easy for them to use by providing tool kits to
> read and write. could also argue that postscript is too complicated to
> draw simple rectangles on the page. You wouldn't expect them to
> simplify postscript. There are tools to ease simple rendering.
> The complexity of das/2 won't interfere with adoption, but not having
> toolkits, middleware layers to read/write will. Not getting ensembl
> buy-in to das/2 could be a problem
> gh: tim hubbard was there and was on-board to transition to
> das/2.
> ls: would have been better to have buy in now (i.e., Tony Cox dropping
> out)
> gh: we've made it more formal to say, here is the subset of das/2 that
> this server supports. for other use cases, we do need the added
> complexity.
>
> gh: re: ensembl support for das/2. I mentioned andrew's das/1 - das/2
> transformational proxy server. not released yet, but making progress
> on it. So if you have a das/1 server, you can put a das/2 front end on
> it.
> ls: can you go the other way, provide das/1 interface on das/2?
> gh: want to do this for the affy public das/2 server. Andrew's doesn't
> do that yet, but I'd like to do this. Another thing: integrate that
> proxy into the registry, so the registry makes it into a das/2
> server. then we don't have a burden on servers to support two versions
> of the protocol.
> got email from andrew about his proxy on that.
>
> sc: I put a note about Andrew's proxy server on the biodas.org wiki.
> gh: he needs to have a place to keep it.
> sc: open-bio server would work. Just need a better mechanism to
> ensure it stays up. I think it's not getting started when the machine
> gets rebooted.
>
> [A] Steve/Andrew work on stable home for the proxy server
>
> [Correction: In my note in the teleconf, I was thinking about Andrew's
> validation server, which is hosted on open-bio and has a problem with
> not being up reliably. The proxy server is another issue. There's a
> mention of it on the DAS FAQ page, but no pointer to any server
> yet. -steve]
>
> gh: data overload and redundancy from the user perspective. clients
> where default for protein annotation is to go to all servers, you have
> way too many tracks showing up. Lots of servers and types. Ensembl is
> moving to expose even more data via das, thousands of new tracks
> (organisms, type, assembly version). Concern with biosapiens is
> replication of the same annotation data. E.g., pfam domains in
> different biosapiens data sources, may return same thing or slight
> diffs in feature ranges. how does user decide which is authoritative?
> Which can be left out? A big concern at the biosapiens meeting --
> redundant information.
>
> gh: another issue: mirrors for the data. discussed in early days of
> das/2, not resolved how to deal with mirrors, http redirection
> mechanism. This can lead to redundant data when you hit all mirrors.
>
> gh: feature classification and ontologies around that. My take was
> that the sequence ontology is inadequate to describe protein
> annotation as it stands now. PAO - protein annotation ontology
> ls: are they doing this with NCBO involved?
> gh: talked to them about getting hold of lincoln and suzi and
> integrating with SO as an extension.
> ap: for 3rd version of SO we will contact lincoln and suzi to discuss
> ls: great
> gh: for biosapiens, Janet Thornton is the person to contact about
> that.
>
> gh: more about types (proliferation causing data overload issue mentioned
> above.)
> also discussion about dag vs hierarchical tree. pointing to multiple
> terms in the ontology for a particular type. in SO, how much has
> multiple parents come up? may need a type that can point to multiple
> ontology terms for that type. das/2 cannot do it yet, only one term
> per type.
> ls: the more flexible we make it the less coherent it will be. data
> overload will get even worse. to reduce data overload, need a way to
> take data from servers and decide if it is the same or different. are
> they reachable in same ontology? allowing set arithmetic will create
> ambiguity. biosapiens can be allowed with an attribute, multiple
> attributes that point at different ontologies.
>
> gh: combining cellular location with protein classification
> ontologies.
> ls: certainly, but those are separate attributes. what we created is
> essentially an RDF. Actually, terminology is 'property' not
> attribute. Types property is the correct way to do this.
>
> gh: use of subset of das/1, what it means for das/2
> data overload for users,
> feature classification issues
>
> gh: das wish list, people wrote up what they feel das is
> inadequate for. Das/2 group was aware of these.
>
> ls: encryption, synchronous request seem like impl issues, not part of
> protocol.
> gh: some people complained that das is inadequate because it relies on
> http(s). you can do much more high-level things with soap-based
> system. I think this is correct, but wrong that no one in our space
> needs that.
> ls: no pharma that cares about this will entrust it to the public
> internet with anything, soap or otherwise.
> gh: at affy, we've done das/1 servers with https and no one has ever
> complained.
> ls: identity theft problems via people stealing from encrypted streams
> never emerged as a problem. they steal it from your physical trash,
> setting up phony banking sites. Not related to strength of encryption.
> gh: regarding asynch request - discussed 2 years ago -- yes, it's
> outside of das/2 spec, but we say, use http as you will. redirect and
> say "your request has been accepted, check back here in a while."
>
> gh: wish list (sent out in email to the list noted above):
> - multi-level features, stylesheets
> - caching - use http caching as you will
> - features from other sources - dealt with since we use URIs. a
>   problem for das/1
>
> ls: provenance requires people to put in effort to maintain the
> provenance, but it doesn't free you of responsibility of having to
> track it.
>
> - scalability and large analysis - the data overload issue. the
> answer to me is smarter clients.
>
> - more queries -- addressed in das/2
> - entry point support - in das/2 we have a less ambiguous way to say
>   whether a server supports it or not.
> - counting number of features of each type per source -- have the
>   'count' format in das/2
> - referring to ids externally (das/2 URIs)
> - errors and exception handling - we have http error codes -- remains
>   to be seen how well it works out. done a reasonable job to map it to
>   http error codes
> - better stylesheets - in progress for das/2
> - mapping servers - different genome assembly versions or mapping from
>   protein to nucleotide space. -- under discussion with data
>   providers.
>
> ap: Another thing on wish list: people want to know stats per server,
> uptime, hits, etc. (server stats).
> gh: andreas' registry does a good job for das/1. biosapiens registry
> is built on Andreas' registry. How many are up, which requests they
> support, the data the server. Very nice.
>
> ap: Gregg's coverage was good. Also gave a very good advertisement for
> das/2!
>
> gh: the das/1 to das/2 transformational proxy was quite
> popular. doesn't take advantage of das/2 power, but gets people started.
>
> Other Topics:
> --------------
> sc: biodas.org wiki is now officially up.
> gh: mentioned to Tim Hubbard. He said, "I know. I already edited it."
>
> sc: globalseqids page needs das2xml snippets for coordinates.
>
> [A] lincoln will add das2xml coordinate snippets to globalseqids page on
> wiki
>
> sc: might also be good to have notice of the next teleconf on the
> site. Maybe pointers to the notes as well.
> gh: maybe have an automatic email sent out reminding folks?
> sc: maybe not, if we have a list of the dates for upcoming meetings on
> the site.
>
> [A] Steve post list of dates of upcoming DAS/2 teleconferences on wiki
>
> Next meeting in two weeks: 19 mar 2007
>
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From Steve_Chervitz at affymetrix.com  Mon Mar 19 17:47:57 2007
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 19 Mar 2007 10:47:57 -0700
Subject: [DAS2] Notes from the biweekly DAS/2 teleconference, 19 Mar 2007
Message-ID: <C2241ADD.259E4%Steve_Chervitz@affymetrix.com>

Notes from the biweekly DAS/2 teleconference, 19 Mar 2007

$Id: das2-teleconf-2007-03-19.txt,v 1.2 2007/03/19 17:46:41 sac Exp $

Teleconference Info:
   * Schedule:         Biweekly on Monday
   * Time of Day:      9:30 AM PST, 17:30 GMT
   * Dialin (US):      800-531-3250
   * Dialin (Intl):    303-928-2693
   * Toll-free UK:     08 00 40 49 467
   * Toll-free France: 08 00 907 839
   * Conference ID:    2879055
   * Passcode:         1365

Attendees: 
    Affy: Steve Chervitz, Ed Erwin, Gregg Helt
    CSHL: Lincoln Stein

Note taker: Steve Chervitz

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/ and are viewable on-line at
http://biodas.org/documents/das2/notes/

Instructions on how to access the DAS/2 CVS repository are at
http://www.biodas.org/wiki/DAS/2#CVS_Access

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


Agenda
-------
 * General issues
 * Status reports, including report from Lincoln on hapmap and das2
 * Gregg's post-grant status
 * IGB support post-March


Topic: General Issues
----------------------

ls: Regarding the coordinate stuff for global seq ids, need
clarification (see my message on list).

gh: for each release we should have the xml snippet for the
coordinates, four attribs for authority, etc. so people can see
directly what they need to provide in their DAS/2 request.

[A] gregg will send global seq ID coordinate XML example to Lincoln

Topic: Status reports
----------------------

gh: working on getting good representations of graphs for Affy das/2
server serving up tiling array data. Serving up slices of
graphs. Working well on my test server, better than expected. Slow
part is the indexing the first time it sees a file. For Chrm1 at 5bp
resolution tiling array data, 120M data points, indexing takes
a couple of seconds the first time, other times there's no delay. this
is serving up in an optimized format. Need to serve in std das/2
format with a feature per data point. Not too hard.
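
A small sketch of the "index once, then slice" idea for graph data,
assuming each data point is a (position, score) pair. This is not the
Affy server's code or file format; GraphSlicer and its loader argument
are hypothetical names. The index is built lazily on the first request
(the slow step) and reused afterwards, and each slice is found by
binary search.

```python
from bisect import bisect_left


class GraphSlicer:
    """Serve range slices of (position, score) graph data."""

    def __init__(self, loader):
        self._loader = loader      # callable returning (position, score) pairs
        self._positions = None     # sorted positions, built on first use
        self._scores = None
        self.index_builds = 0      # counts how often the index was built

    def _ensure_index(self):
        # Slow step: sort all points once, the first time data is needed.
        if self._positions is None:
            points = sorted(self._loader())
            self._positions = [p for p, _ in points]
            self._scores = [s for _, s in points]
            self.index_builds += 1

    def slice(self, start, end):
        """Return points with start <= position < end."""
        self._ensure_index()
        lo = bisect_left(self._positions, start)
        hi = bisect_left(self._positions, end)
        return list(zip(self._positions[lo:hi], self._scores[lo:hi]))
```

For example, with a loader returning [(15, 0.9), (5, 0.1), (10, 0.4),
(20, 0.2)], slice(10, 20) gives [(10, 0.4), (15, 0.9)], and a second
slice call does not rebuild the index.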

Planning to deploy in April when Steve gets new server
running. Drosophila time-course public data. 8-9 time points RNA
expression tiling arrays. When phase 3 ENCODE paper comes out, we'll
have a pointer to our server for viewing that data.
Also need to beef up feat filter queries to support full spec on the
Affy das/2 server. Want to transition IGB away from quickload and
replace all quickload stuff with das/2, so we don't need to maintain
two code bases and data repositories.

ls: hapmap das/2 server is up and running. temporarily at Brian Gilman's
consultancy business. He's coming here to CSHL to get a permanent version
running on hapmap.org by next week. There's a whole API for accessing that
data in the form that's required by NCI's caBIO project
(caCORE). After server goes up, I'll point coordinates that location,
documentation. It works with other das/2 sources as well, (Affy,
biopackages).

gh: So it will put any of that DAS-available data into caCORE object model?
ls: yes. It also can give data as DOM models, might be easier for some
users/apps.

gh: Rolling this into the next caBIO release?
ls: yes. 

ls: Will provide snp's and haplotype blocks as features.
one track per population. we can put as many tracks in as you
need. Just one set now. There are 4 populations grouped into three
panels, since two pop's don't have enough diffs to break them out.

[A] lincoln send gregg pointer to current hapmap server for testing

sc: Working on configuring the new affy das/2 public server, a
replacement machine with a lot more RAM than current box. Have been
busy with other Affy work (new Netaffx release, new product support,
etc.) but should be mostly done with this by end of March. Should be
able to devote some solid blocks to DAS work (target: 3wks). Plan is
to support as many Affy products as we can. Less focus on supporting
UCSC-provided annotations (since they're the best source for them).

sc: Gregg, have you considered using the same approach for serving
annotations by your das/2 server as you are doing to support graphs?
Could ease memory requirements.
gh: possible, but not practical, since it would require a new format
for every feature type. Graphs are relatively straightforward to serve
up via an indexing strategy. Doing something similar for features
would mean essentially writing a database app.

Other Items:
-------------

gh: grant admin says our burn rate is lower than anticipated. we can
apply for a no-cost extension. funding should last at least till the
end of June. We'll apply for that. Not sure what it means for
CSHL. last time it took 3-4 mos to sort it out.

ls: start working on it now. there were communication problems in the
past. would be great if Allen could extend another month or two.

gh: Andrew will come visit me in the next day or two. Will get the
latest from him. He's been working on the transformational das1-> das2
proxy. Want to get the Ensembl people to use it ASAP.

[A] get a usable das1->das2 proxy server, deploy at Ensembl

gh: Need to look at how to support scores in das/2. we dropped score
element. You can add arbitrary elements to das/2. You can put in
multiple diff scores that way, or use XML namespaces to bring in a
das/2 score element. Want to have a recommended way of doing this.
Need more input from others. In Europe they're using score a lot more
than here in the States.
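
One possible shape for the "extra namespaced element" option, sketched
with Python's standard xml.etree.ElementTree. Nothing here is a
recommendation from the group: the SCORE element, its method attribute,
and the example.org namespace are hypothetical, and the das/2 namespace
URI and the uri attribute on FEATURE are written from memory and should
be treated as assumptions.

```python
import xml.etree.ElementTree as ET

DAS2_NS = "http://biodas.org/documents/das2"   # assumed das/2 namespace
SCORE_NS = "http://example.org/das2-score"     # hypothetical extension


def feature_with_scores(uri, scores):
    """Build a FEATURE element carrying one SCORE child per method."""
    feature = ET.Element(f"{{{DAS2_NS}}}FEATURE", {"uri": uri})
    for method, value in scores.items():
        el = ET.SubElement(feature, f"{{{SCORE_NS}}}SCORE",
                           {"method": method})
        el.text = str(value)
    return feature


def read_scores(feature):
    """Collect {method: score} from the namespaced SCORE children."""
    return {el.get("method"): float(el.text)
            for el in feature.findall(f"{{{SCORE_NS}}}SCORE")}
```

A client that does not know the extension namespace can skip the SCORE
children, while one that does can read several different scores off
the same feature.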

[A] come up with recommended way to support scores in DAS/2

Topic: Gregg's agenda
----------------------

gh: I am planning to leave Affy at end of the grant. Will focus on
doing hands-on DAS/2 evangelism, ideally work with UCSC. Then will
take some time off.  Affy wasn't interested in supporting das w/o some
outside funding. Therefore, it's a good time to transition.

Regarding UCSC: ready to go down there and write some code. They have
a das/1 server, they just need someone with DAS/2 expertise, which I
can provide. biggest prob with das/2 is adoption outside of the
grant people.

sc: considered using Andrew's proxy?
gh: might be OK for a temporary solution, but it wouldn't be as
efficient as directly supporting das/2, and I know Jim et al are
interested in efficiency. Since I'm in the area, I can help them get
into DAS/2 directly, which would help with DAS/2 acceptance by the
community. 

gh: Another goal was to have a DAS/2 paper ready and submitted before
I leave, want to have a rough draft in April. Plan to submit to an
open access journal: BioMed Central, PLoS, or other.

[A] Gregg will circulate rough draft of DAS/2 paper in April.

Topic: IGB Support
-------------------

ee: Regarding IGB support, Affy is not supporting IGB after March,
they are moving me to a different project. Support for IGB could
return if there's enough interest.

sc: how self-supporting is the igb community?
ee: not much.
gh: Ann Loraine has interest as do internal Affy users.

sc: Sourceforge has a new wiki project that's in beta now, for adding
a wiki to your project's web page. Could help make the IGB community
self-supporting, on-line docs, FAQ, etc. I volunteered to participate,
but haven't done anything with it yet.

gh: IGB has a good user's guide now, thanks to Ed's recent update.

ee: I'm also working on plugin interface and documenting the http API
protocol, things that will make it easier for others to use IGB with other
programs.