[DAS2] Notes from the biweekly DAS/2 teleconference, 5 Mar 2007

Lincoln Stein lstein at cshl.edu
Mon Mar 12 17:02:51 UTC 2007


>
> lincoln will add das2xml coordinate snippets to globalseqids page on
> wiki
>

I added one line to the description of the H. sapiens source. Is this what
you're looking for? If it is, I'll go ahead and add the rest.

Note that the contents of the XML are not defined anywhere. I'm not sure why
there should be a URI that looks like it is fetchable.

Lincoln


On 3/5/07, Steve Chervitz <Steve_Chervitz at affymetrix.com> wrote:
>
> Notes from the biweekly DAS/2 teleconference, 5 Mar 2007
>
> $Id: das2-teleconf-2007-03-05.txt,v 1.2 2007/03/05 19:01:59 sac Exp $
>
> Teleconference Info:
>    * Schedule:         Biweekly on Monday
>    * Time of Day:      9:30 AM PST, 17:30 GMT
>    * Dialin (US):      800-531-3250
>    * Dialin (Intl):    303-928-2693
>    * Toll-free UK:     08 00 40 49 467
>    * Toll-free France: 08 00 907 839
>    * Conference ID:    2879055
>    * Passcode:         1365
>
> Attendees:
>     Affy: Steve Chervitz, Ed Erwin, Gregg Helt
>     CSHL: Lincoln Stein
>   Sanger: Andreas Prlic
>     UCLA: Allen Day
>
> Note taker: Steve Chervitz
>
> Action items are flagged with '[A]'.
>
> These notes are checked into the biodas.org CVS repository at
> das/das2/notes/. Instructions on how to access this
> repository are at http://biodas.org
>
> DISCLAIMER:
> The note taker aims for completeness and accuracy, but these goals are
> not always achievable, given the desire to get the notes out with a
> rapid turnaround. So don't consider these notes as complete minutes
> from the meeting, but rather abbreviated, summarized versions of what
> was discussed. There may be errors of commission and omission.
> Participants are welcome to post comments and/or corrections to these
> as they see fit.
>
>
> Agenda
> -------
> * Review of BioSapiens DAS workshop
> * Status updates
>
>
> gh: I sent my summary of the biosapiens das workshop and feature
> classification workshop I attended with Ed in Hinxton:
> http://lists.open-bio.org/pipermail/das2/2007-March/000982.html
>
> "das developers workshop from a das/2 perspective", summarizes what I
> took home from these meetings, how well das/2 meets needs of people in
> europe (ensembl, sanger, biosapiens -- the focus of these
> meetings). and a quick biosapiens overview: a big european project,
> 25 institutions, large scale genome protein annotation. decided early
> on to use das to distribute annotations between organizations. can
> check the stats on their das servers -- andreas' registry -- 23
> servers serving up 69 das sources -- a major das investment!
>
> In developing das/2 we haven't had too much experience with the kind
> of data they're dealing with (protein annotations).
>
> das/1 clients under study:
> - dasty2, dasty1 - ajax-based viz clients
> - jalview - alignment viewer, editor
> - igb - Ed gave presentation
> - pepper and spice - das viewers, also use alignment and 3d structure
>    info
> - proview - protein annotation,
> - ensembl viewer
>
> servers presented/discussed:
> - pfam, ensembl, proserver, Andreas',
> - Extensions to das/1 protocol discussed: gene das, protein das,
>    structure das, 3d-em das (arbitrary 3d volumes), interaction das for
>    prot-prot interactions. Moddas - writeback in das/1. Alignment das
>    (Andreas).
> - Simple das - das servers that don't impl all of das/1 (entry_points,
>    or types, e.g.,).
>
> Gregg presented on das/2, will put up ppt later. Tailored it assuming
>
> [A] Gregg will send out powerpoint for his talk from BioSapiens DAS
> workshop
>
> Focussed on familiarity with das/1, how big the diffs are with an eye
> towards how hard it would be to move to das/2. Conceptually, not that
> big a switch, though XML is a lot different.
>
> Also discussed how well das/2 addresses some of the problems with
> das/1 that came up at the workshop.
>
> extensions for das/1:
> - das/2 addressed some of them very well. E.g., gene das (das w/o
>   specifying location of feature). this is addressed well in
>   das/2. can have features w/o location, or w/o range.
> - protein das - das/2 did a good job of removing nucleotide specific
>   parts of das features (orientation, phase are not required). das/2
>   is much more agnostic about dna vs protein.
> - alignment das - pairwise or multiple - locations with features in
>   das/2 addresses some of these issues (0,1,or more locations for a
>   feature) each location can have optional gap attribute (cigar
>   string). so if you can describe it with a cigar string, you can
>   describe it in das/2. Can use multiple locations to do mult
>   alignments. Not dealt with in das/2: 3d-threading of an alignment
>   through a structure.  Need to look at this in the future
>
> [A] Look at how to handle 3D structure alignment threading in DAS/2 spec
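As an illustration of the "if you can describe it with a cigar string" point above, here is a minimal Python sketch of reading a gap string. It assumes the compact count-then-operation form (e.g. "8M3D6M") with only M (match), I (insertion) and D (deletion); the notes do not say which cigar dialect das/2 uses, so the syntax and the function names parse_cigar and aligned_lengths are assumptions for illustration only.

```python
import re

# Assumed token form: a positive integer followed by one of M, I, D.
_CIGAR_TOKEN = re.compile(r"(\d+)([MID])")


def parse_cigar(cigar):
    """Split a compact cigar string into (count, op) pairs, rejecting junk."""
    ops = []
    pos = 0
    for match in _CIGAR_TOKEN.finditer(cigar):
        if match.start() != pos:
            raise ValueError("unexpected text at offset %d in %r" % (pos, cigar))
        ops.append((int(match.group(1)), match.group(2)))
        pos = match.end()
    if pos != len(cigar):
        raise ValueError("unexpected text at offset %d in %r" % (pos, cigar))
    return ops


def aligned_lengths(ops):
    """Return (reference_length, query_length) covered by the alignment."""
    reference = sum(count for count, op in ops if op in "MD")
    query = sum(count for count, op in ops if op in "MI")
    return reference, query
```

A pairwise alignment would carry one such string on a location; a multiple alignment would carry one per location.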
>
> - simple das stuff handled better in das/2 - in das/1 the assumption
>   is you support all things unless stated otherwise. but in das/2 there is a
>   capabilities header, you must indicate support there, if not stated,
>   the default is you don't support it. Can also say you support
>   feature filters, so there's more formal support for that.
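The difference in defaults described above can be sketched in a few lines of Python. This only models the rule as stated in the notes (das/1: supported unless said otherwise; das/2: unsupported unless declared). The function names and the capability labels are hypothetical and are not taken from either spec.

```python
def das1_supports(declared_unsupported, command):
    """das/1-style default: assume support unless the server opted out."""
    return command not in declared_unsupported


def das2_supports(declared_capabilities, capability):
    """das/2-style default: assume no support unless the server declared it."""
    return capability in declared_capabilities


# Hypothetical server that says nothing about entry points either way.
das1_opt_outs = set()
das2_declared = {"features", "types"}

print(das1_supports(das1_opt_outs, "entry_points"))  # client assumes it works
print(das2_supports(das2_declared, "entry_points"))  # client knows not to ask
```

Under the second rule a client can skip queries that are bound to fail, which matters given how many sources fail entry points and types.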
>
> Surprises:
> - smaller subset of das/1 is in use than expected. of 69 sources, 64
>   either fail entry points or say not applicable. types query: 49
>   fail/not applicable
>
> ls: for types query. only one type?
> gh: for ensembl, this is the case.
> ap: lack of consistency of types is addressed in the other workshop
> related to features.
>
> gh: in types in das/1 it is less necessary because all info is
> replicated in each feature, type-method, category, id
> ls: use case for types query is to present user with set of
> checkboxes, select which type to retrieve from source. if in practice
> das sources are being used for one type or a set of types that only
> make sense together, no reason to turn off a part of it, then makes
> sense to not support types query.
> ls: have heard that types query is expensive. computationally. simple
> db backends with no normalization/indexing, finding all types involves
> visiting each record.
> gh: part of justification with 1 type / source is because those types
> are stored in separate db. so having a das server to integrate them
> makes sense.
>
> gh: Re: using smaller subset of das/1 than I expected:
> types can be expensive in another way, example: representing pfam in
> das. feat type for each pfam domain type (9000 primary domains).
> Pfam b - there are 70-400K more!
>
> ls: in das/2 create a single type 'protein domain' then use attribute
> pointing to an ontology saying which pfam domain it is.
> gh: concern there is, assuming clients will do something useful for
> particular attributes. For rendering, I could do diff rendering based
> on diff attribs (color diff domains differently). but for clients to
> really understand that they're different, that's a more complicated
> issue.
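A rough Python sketch of the single-type idea discussed above: every feature shares one type, and the specific pfam family lives in a property. The dict layout, the "pfam" property name and the sample records are made up for illustration; they are not das/2 XML, and real clients would still have to agree on what the property means, which is the concern gh raises.

```python
from collections import defaultdict

# Illustrative records only: one shared type, family carried as a property.
features = [
    {"id": "f1", "type": "protein domain", "props": {"pfam": "PF00001"}},
    {"id": "f2", "type": "protein domain", "props": {"pfam": "PF00069"}},
    {"id": "f3", "type": "protein domain", "props": {"pfam": "PF00001"}},
]


def group_by_property(records, key):
    """Group feature ids by the value of one property (None if absent)."""
    groups = defaultdict(list)
    for record in records:
        groups[record["props"].get(key)].append(record["id"])
    return dict(groups)


distinct_types = {record["type"] for record in features}
by_family = group_by_property(features, "pfam")
```

The types query then returns a single entry instead of thousands, and a client that wants per-family rendering can key colors on by_family.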
>
> gh: not using types or entry_points by clients because servers don't,
> feedback loop.
> ap: low coverage genomes (e.g., elephant) may have several 100K entry
> points.
> gh: in das/2 we are more formal and say that you don't support
> it. Creates problem: how do you know what to query in the first place?
> Then you have to know what you're looking for.
>
> gh: feature hierarchies handled in das/2 -- this is not an issue for
> protein das, where annotations are completely flat. even protein
> disulfide bond is one level, just rendered differently so it doesn't
> span all residues in between. But doing non-visual things (unions,
> intersections) this could be a problem.
> ls: flat in terms of location or ontology?
> gh: location. there is no feature ontology yet (no consistent, agreed
> upon yet, just proposed at this meeting).
> ls: they aren't creating discontinuous features because too hard, or
> don't care.
> gh: just not needed for most protein annotations. even when it could
> be needed, just not being used.
> ls: for nucleotide, it's needed frequently
> gh: not an issue for das/2
>
> gh: ensembl collapses type and source into one thing. what does this
> mean? das/2 could be over complicated.
> ls: no doubt that it is too complicated for the biosapiens use
> case. we could make it easy for them to use by providing tool kits to
> read and write. could also argue that postscript is too complicated to
> draw simple rectangles on the page. You wouldn't expect them to
> simplify postscript. There are tools to ease simple rendering.
> The complexity of das/2 won't interfere with adoption, but not having
> toolkits, middleware layers to read/write will. Not getting ensembl buy-in
> to das/2 could be a problem
> gh: tim hubbard was there and was on-board to transition to
> das/2.
> ls: would have been better to have buy in now (i.e., Tony Cox dropping
> out)
> gh: we've made it more formal to say, here is the subset of das/2 that
> this server supports. for other use cases, we do need the added
> complexity.
>
> gh: re: ensembl support for das/2. I mentioned andrew's das/1 - das/2
> transformational proxy server. not released yet, but making progress
> on it. So if you have a das/1 server, you can put a das/2 front end on
> it.
> ls: can you go the other way, provide das/1 interface on das/2?
> gh: want to do this for the affy public das/2 server. Andrew's doesn't
> do that yet, but I'd like to do this. Another thing: integrate that
> proxy into the registry, so the registry makes it into a das/2
> server. then we don't have a burden on servers to support two versions
> of the protocol.
> got email from andrew about his proxy on that.
>
> sc: I put a note about Andrew's proxy server on the biodas.org wiki.
> gh: he needs to have a place to keep it.
> sc: open-bio server would work. Just need a better mechanism to
> ensure it stays up. I think it's not getting started when the machine
> gets rebooted.
>
> [A] Steve/Andrew work on stable home for the proxy server
>
> [Correction: In my note in the teleconf, I was thinking about Andrew's
> validation server, which is hosted on open-bio and has a problem with
> not being up reliably. The proxy server is another issue. There's a
> mention of it on the DAS FAQ page, but no pointer to any server
> yet. -steve]
>
> gh: data overload and redundancy from the user perspective. clients
> where default for protein annotation is to go to all servers, you have
> way too many tracks showing up. Lots of servers and types. Ensembl is
> moving to expose even more data via das, thousands of new tracks
> (organisms, type, assembly version). Concern with biosapiens is
> replication of the same annotation data. E.g., pfam domains in
> different biosapiens data sources, may return same thing or slight
> diffs in feature ranges. how does user decide which is authoritative?
> Which can be left out? A big concern at the biosapiens meeting --
> redundant information.
>
> gh: another issue: mirrors for the data. discussed in early days of
> das/2, not resolved how to deal with mirrors, http redirection
> mechanism. This can lead to redundant data when you hit all mirrors.
>
> gh: feature classification and ontologies around that. My take was
> that the sequence ontology is inadequate to describe protein
> annotation as it stands now. PAO - protein annotation ontology
> ls: are they doing this with NCBO involved?
> gh: talked to them about getting hold of lincoln and suzi and
> integrating with SO as an extension.
> ap: for 3rd version of SO we will contact lincoln and suzi to discuss
> ls: great
> gh: for biosapiens, Janet Thornton is the person to contact about
> that.
>
> gh: more about types (proliferation causing data overload issue mentioned
> above.)
> also discussion about dag vs hierarchical tree. pointing to multiple
> terms in the ontology for a particular type. in SO, how much has
> multiple parents come up? may need a type that can point to multiple
> ontology terms for that type. das/2 cannot do it yet, only one term
> per type.
> ls: the more flexible we make it the less coherent it will be. data
> overload will get even worse. to reduce data overload, need a way to
> take data from servers and decide if same or different. are they
> reachable in same ontology? allowing set arithmetic will create
> ambiguity. biosapiens can be allowed with an attribute, multiple
> attributes that point at different ontologies.
>
> gh: combining cellular location with protein classification
> ontologies.
> ls: certainly, but those are separate attributes. what we created is
> essentially an RDF. Actually, terminology is 'property' not
> attribute. Types property is the correct way to do this.
>
> gh: use of subset of das/1, what it means for das/2
> data overload for users,
> feature classification issues
>
> gh: das wish list, people wrote up what they feel what das is
> inadequate for. Das/2 group was aware of these.
>
> ls: encryption, synchronous request seem like impl issues, not part of
> protocol.
> gh: some people complained that das is inadequate because it relies on
> http(s). you can do much more high-level things with soap-based
> system. I think this is correct, but wrong that no one in our space
> needs that.
> ls: no pharma that cares about this will entrust it to the public
> internet with any thing, soap or otherwise.
> gh: at affy, we've done das/1 servers with https and no one has ever
> complained.
> ls: identity theft problems via people stealing from encrypted streams
> never emerged as a problem. they steal it from your physical trash,
> setting up phony banking sites. Not related to strength of encryption.
> gh: regarding asynch request - discussed 2 years ago -- yes, it's
> outside of das/2 spec, but we say, use http as you will. redirect and
> say "your request has been accepted, check back here in a while."
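The "accepted, check back later" pattern mentioned above can be sketched on the client side as follows. As the notes say, this is outside the das/2 spec, so everything here is an assumption: the sketch uses plain HTTP conventions (202 Accepted while work is pending, an optional Location header naming where to check back, 200 when ready), and poll_until_ready and its fetch callable are hypothetical names. The fetch callable is injected so the logic can be tried without a live server.

```python
import time


def poll_until_ready(fetch, url, max_attempts=5, delay_seconds=0.0,
                     sleep=time.sleep):
    """Call fetch(url) until it reports 200, following 'check back' hints.

    fetch must return a (status, headers, body) tuple, where headers is a
    dict. This is an illustrative convention, not part of any DAS spec.
    """
    for _ in range(max_attempts):
        status, headers, body = fetch(url)
        if status == 200:
            return body
        if status == 202:
            # Server accepted the request; poll where it tells us to, if given.
            url = headers.get("Location", url)
            sleep(delay_seconds)
            continue
        raise RuntimeError("unexpected HTTP status %d" % status)
    raise TimeoutError("result not ready after %d attempts" % max_attempts)
```

A real client would wrap an HTTP library call in fetch and choose a sensible delay between attempts.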
>
> gh: wish list (sent out in email to the list noted above):
> - multi-level features, stylesheets
> - caching - use http caching as you will
> - features from other sources - dealt with since we use URIs. a
>   problem for das/1
>
> ls: provenance requires people to put in effort to maintain the
> provenance, but it doesn't free you of responsibility of having to
> track it.
>
> - scalability and large analysis - the data overload issue. the
> answer to me is smarter clients.
>
> - more queries -- addressed in das/2
> - entry point supports - in das/2 we have a less ambiguous way to say
>   whether a server supports it or not.
> - counting number of features of each type per source -- have the
>   'count' format in das/2
> - referring to id's externally (das/2 uri's)
> - errors and exception handling - we have http error codes -- remains
>   to be seen how well it works out. done a reasonable job to map it to
>   http error codes
> - better stylesheets - in progress for das/2
> - mapping servers - different genome assembly versions or mapping from
>   protein to nucleotide space. -- under discussion with data
>   providers.
>
> ap: Another thing on wish list: people want to know stats per server,
> uptime, hits, etc. (server stats).
> gh: andreas' registry does a good job for das/1. biosapiens registry
> is built on Andreas' registry. How many are up, which requests they
> support, the data the server. Very nice.
>
> ap: Gregg's coverage was good. Also gave a very good advertisement for
> das/2!
>
> gh: the das/1 to das/2 transformational proxy was quite
> popular. doesn't take advantage of das/2 power, but gets people started.
>
> Other Topics:
> --------------
> sc: biodas.org wiki is now officially up.
> gh: mentioned to Tim Hubbard. He said, "I know. I already edited it."
>
> sc: globalseqids page needs das2xml snippets for coordinates.
>
> [A] lincoln will add das2xml coordinate snippets to globalseqids page on
> wiki
>
> sc: might also be good to have notice of the next teleconf on the
> site. Maybe pointers to the notes as well.
> gh: maybe have an automatic email sent out reminding folks?
> sc: maybe not, if we have a list of the dates for upcoming meetings on
> the site.
>
> [A] Steve post list of dates of upcoming DAS/2 teleconferences on wiki
>
> Next meeting in two weeks: 19 mar 2007
>
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/das2
>



-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


