From Gregg_Helt at affymetrix.com Mon Mar 5 11:40:26 2007
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 5 Mar 2007 08:40:26 -0800
Subject: [DAS2] DAS/2 Teleconference today at 9:30 AM PST
Message-ID:

Just a reminder that the DAS/2 teleconference will be at the regular
time today, 9:30 AM Pacific time. Ed and I will be summarizing the DAS
developer and BioSapiens feature classification workshops we attended
last week in Hinxton. Hopefully others who attended will join in and
give their perspectives as well.

Conference phone #
USA: 800-531-8250
International: 303-928-2693
Conference ID: 2879055
Passcode: 1365

Gregg

From Gregg_Helt at affymetrix.com Mon Mar 5 12:30:10 2007
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 5 Mar 2007 09:30:10 -0800
Subject: [DAS2] Brief summary of DAS/BioSapiens workshops from a DAS/2 perspective
Message-ID:

Summary of DAS & Feature Classification workshops, February 26-28 2007, Hinxton

DAS Developers Workshop:
http://www.sanger.ac.uk/Users/ap3/dasworkshop.html
BioSapiens Feature Type Classification Workshop:
http://www.ebi.ac.uk/~hhe/tmp/BioSapiensFeatureMeeting.htm

DAS1 clients discussed:
  Dasty2, JalView, VectorBase, IGB, Pepper, Spice, ProView, Ensembl ContigView, ...

DAS1 servers discussed:
  PFam, Ensembl, ProServer, Sisyphus, ...

DAS1 extensions:
  Gene DAS
  Protein DAS
  Alignment DAS
  Structure DAS
  3D-EM DAS
  Interaction DAS
  MaDAS (writeback?)
  "simple" DAS
  DAS/2

BioSapiens Overview:
  http://www.biosapiens.info
  Large-scale genome/protein annotation, 25 institutions from 14 countries across Europe participating
  Currently 23 DAS servers within BioSapiens project serving 69 DAS sources.
  4 servers appear to be down (21 sources fail features query)
  See http://www.biosapiens.info/page.php?page=biosapiensdir for more DAS server stats

Major concerns for Ensembl / Sanger / BioSapiens I think we've addressed well in DAS/2
  Gene DAS
  Protein DAS
  Alignment DAS
  "simple" DAS

Major concerns for Ensembl / Sanger / BioSapiens that surprised me:

A) In general the use of a smaller subset of DAS1 than expected
  Many BioSapiens DAS servers don't support "entry_points" query (64 fail|NA)
  Many BioSapiens DAS servers don't support "types" query (49 fail|NA)
    in DAS1 features themselves can carry most of the types info
  Some BioSapiens DAS servers don't support "features" query parameters
    (only the features query with no params)
  Many BioSapiens clients don't use "entry_points" query, "types" query, or any feature filters
    (always get all features for a given segment)
  BioSapiens protein annotation almost exclusively uses flat (one-level) features
    very little or no use of "group" attribute to make two-level features
    example: disulfide bond annotation - relies on rendering or prior knowledge to differentiate
  Ensembl DAS servers are in general serving one type per source
  These simplifications of clients and servers are reinforcing each other
  If using subset of DAS1, does this mean that DAS/2 might be too complex?
  But with these simplifications, the complexity is getting pushed into other places

B) Data overload
  Number of servers, sources, types
  Ensembl: will have 1000s of sources soon
  Redundancy concerns
    example: Pfam domain
    Many sources with same / similar annotation type - "Pfam domain"
    Slight differences in feature ranges
    Which is the authority?
    Is there a way to help clients decide which can be combined
  Mirrors

C) Feature Classification / Ontology issues
  SO currently inadequate for describing protein annotation
    developing PAO (Protein Annotation Ontology)
  types proliferation
    example: one feature type for each PFam domain?
    ~9K PFam-A domains
    If look at PFam-B (PRODOM that don't overlap PFam-A), then ~70K / 450K more (>2 proteins in family / not)
  if not in unique type, where does that information go?
  Need multiple ontology terms to describe a single type?

------------------------------------------------------------------------------

DAS WishList
(last session of DAS workshop, people listed desired improvements on whiteboard)

  Multi-level features (Gregg)
  Multi-level stylesheets (Ed)
  Caching (last-modified, if-modified-since, TTL)
  Provenance of features from other sources (features from different sources with same IDs? types?)
  Large analysis / Scalability 1000s of seqs + 1000s sources + types ?
  More queries: feature types / date
  Entry point support
  Encryption support
  Stats-query interface -- count # of features of type for a source
  ID ref external (URI / URN)
  Proper error / exception handling
  Asynchronous requests process batches
  Better Stylesheets
  Mapping servers

We've discussed most of these wishlist issues before while developing
DAS/2, though we certainly haven't completely solved all of them...

From Steve_Chervitz at affymetrix.com Mon Mar 5 14:03:03 2007
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 05 Mar 2007 11:03:03 -0800
Subject: [DAS2] Notes from the biweekly DAS/2 teleconference, 5 Mar 2007
Message-ID:

Notes from the biweekly DAS/2 teleconference, 5 Mar 2007

$Id: das2-teleconf-2007-03-05.txt,v 1.2 2007/03/05 19:01:59 sac Exp $

Teleconference Info:
  * Schedule: Biweekly on Monday
  * Time of Day: 9:30 AM PST, 17:30 GMT
  * Dialin (US): 800-531-3250
  * Dialin (Intl): 303-928-2693
  * Toll-free UK: 08 00 40 49 467
  * Toll-free France: 08 00 907 839
  * Conference ID: 2879055
  * Passcode: 1365

Attendees:
  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
  CSHL: Lincoln Stein
  Sanger: Andreas Prlic
  UCLA: Allen Day

Note taker: Steve Chervitz

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/.
Instructions on how to access this repository are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.

Agenda
-------
* Review of BioSapiens DAS workshop
* Status updates

gh: I sent my summary of the biosapiens das workshop and feature
classification workshop I attended with Ed in Hinxton:
http://lists.open-bio.org/pipermail/das2/2007-March/000982.html

"das developers workshop from a das/2 perspective", summarizes what I
took home from these meetings, how well das/2 meets needs of people in
europe (ensembl, sanger, biosapiens -- the focus of these
meetings). and a quick biosapiens overview: a big european project,
25 institutions, large scale genome protein annotation. decided early
on to use das to distribute annotations between organizations. can
check the stats on their das servers -- andreas' registry -- 23
servers serving up 69 das sources -- a major das investment!

In developing das/2 we haven't had too much experience with the kind
of data they're dealing with (protein annotations).

das/1 clients under study:
- dasty2, dasty1 - ajax-based viz clients
- jalview - alignment viewer, editor
- igb - Ed gave presentation
- pepper and spice - das viewers, also use alignment and 3d structure
  info
- proview - protein annotation,
- ensembl viewer

servers presented/discussed:
- pfam, ensembl, proserver, Andreas',
- Extensions to das/1 protocol discussed: gene das, protein das,
  structure das, 3d-em das (arbitrary 3d volumes), interaction das for
  prot-prot interactions. Moddas - writeback in das/1. Alignment das
  (Andreas).
- Simple das - das servers that don't impl all of das/1 (entry_points,
  or types, e.g.,).

Gregg presented on das/2, will put up ppt later. Tailored it assuming

[A] Gregg will send out powerpoint for his talk from BioSapiens DAS
workshop

Focussed on familiarity with das/1, how big the diffs are with an eye
towards how hard it would be to move to das/2. Conceptually, not that
big a switch, though XML is a lot different.

Also discussed how well das/2 addresses some of the problems with
das/1 that came up at the workshop.

extensions for das/1:
- das/2 addressed some of them very well. E.g., gene das (das w/o
  specifying location of feature). this is addressed well in
  das/2. can have features w/o location, or w/o range.
- protein das - das/2 did a good job of removing nucleotide specific
  parts of das features (orientation, phase are not required). das/2
  is much more agnostic about dna vs protein.
- alignment das - pairwise or multiple - locations with features in
  das/2 addresses some of these issues (0, 1, or more locations for a
  feature). each location can have optional gap attribute (cigar
  string). so if you can describe it with a cigar string, you can
  describe it in das/2. Can use multiple locations to do mult
  alignments. Not dealt with in das/2: 3d-threading of an alignment
  through a structure. Need to look at this in the future

[A] Look at how to handle 3D structure alignment threading in DAS/2 spec

- simple das stuff handled better in das/2 - in das/1 the assumption
  is you support all things unless. but in das/2 there is a
  capabilities header, you must indicate support there, if not stated,
  the default is you don't support it. Can also say you support
  feature filters, so there's more formal support for that.

Surprises:
- smaller subset of das/1 is in use than expected. of 69 sources, 64
  either fail entry points or say not applicable. types query: 49
  fail/not applicable

ls: for types query. only one type?
gh: for ensembl, this is the case.
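
A minimal sketch of the gap attribute idea from the alignment das item
above, showing how a client might read a cigar-style string and work out
how much of each sequence the alignment covers. The token syntax used
here (an operation letter followed by a length, as in the GFF3 "Gap"
attribute) is an assumption and may not match what the das/2 spec
defines; parse_gap and aligned_lengths are hypothetical helper names,
not part of any das library.

```python
import re

# Assumed token syntax: operation letter then length, e.g. "M8 D3 M6".
# This mirrors the GFF3 "Gap" attribute; the das/2 gap syntax may differ.
_TOKEN = re.compile(r"([MID])(\d+)")


def parse_gap(gap):
    """Return a list of (operation, length) pairs from a gap string."""
    ops = []
    for token in gap.split():
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognised gap token: {token!r}")
        ops.append((match.group(1), int(match.group(2))))
    return ops


def aligned_lengths(ops):
    """Return (reference_length, target_length) consumed by the alignment."""
    # M (match) consumes both sequences; D consumes only the reference;
    # I consumes only the target.
    ref = sum(n for op, n in ops if op in "MD")
    tgt = sum(n for op, n in ops if op in "MI")
    return ref, tgt
```

For a multiple alignment, the same parsing would simply be applied to
the gap string of each location on the feature.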
ap: lack of consistency of types is addressed in the other workshop
related to features.

gh: the types query in das/1 is less necessary because all info is
replicated in each feature, type-method, category, id
ls: use case for types query is to present user with set of
checkboxes, select which type to retrieve from source. if in practice
das sources are being used for one type or a set of types that only
make sense together, no reason to turn off a part of it, then makes
sense to not support types query.
ls: have heard that types query is expensive computationally. with
simple db backends with no normalization/indexing, finding all types
involves visiting each record.
gh: part of justification with 1 type / source is because those types
are stored in separate db. so having a das server to integrate them
makes sense.

gh: Re: using smaller subset of das/1 than I expected:
types can be expensive in another way, example: representing pfam in
das. feat type for each pfam domain type (9000 primary domains).
Pfam b - there are 70-400K more!

ls: in das/2 create a single type 'protein domain' then use attribute
pointing to an ontology saying which pfam domain it is.
gh: concern there is, assuming clients will do something useful for
particular attributes. For rendering, I could do diff rendering based
on diff attribs (color diff domains differently). but for clients to
really understand that they're different, that's a more complicated
issue.

gh: not using types or entry_points by clients because servers don't,
feedback loop.
ap: low coverage genomes (e.g., elephant) may have several 100K entry
points.
gh: in das/2 we are more formal and say that you don't support
it. Creates problem: how do you know what to query in the first place?
Then you have to know what you're looking for.

gh: feature hierarchies handled in das/2 -- this is not an issue for
protein das, where annotations are completely flat.
even protein disulfide bond is one level, just rendered differently so
it doesn't span all residues in between. But doing non-visual things
(unions, intersections) this could be a problem.
ls: flat in terms of location or ontology?
gh: location. there is no feature ontology yet (no consistent, agreed
upon yet, just proposed at this meeting).
ls: they aren't creating discontinuous features because too hard, or
don't care.
gh: just not needed for most protein annotations. even when it could
be needed, just not being used.
ls: for nucleotide, it's needed frequently
gh: not an issue for das/2

gh: ensembl collapses type and source into one thing. what does this
mean? das/2 could be over complicated.
ls: no doubt that it is too complicated for the biosapiens use
case. we could make it easy for them to use by providing tool kits to
read and write. could also argue that postscript is too complicated to
draw simple rectangles on the page. You wouldn't expect them to
simplify postscript. There are tools to ease simple rendering.
The complexity of das/2 won't interfere with adoption, but not having
toolkits and middleware layers to read/write will. Not getting ensembl
buy-in to das/2 could be a problem
gh: tim hubbard was there and was on-board to transition to
das/2.
ls: would have been better to have buy in now (i.e., Tony Cox dropping
out)
gh: we've made it more formal to say, here is the subset of das/2 that
this server supports. for other use cases, we do need the added
complexity.

gh: re: ensembl support for das/2. I mentioned andrew's das/1 - das/2
transformational proxy server. not released yet, but making progress
on it. So if you have a das/1 server, you can put a das/2 front end on
it.
ls: can you go the other way, provide das/1 interface on das/2?
gh: want to do this for the affy public das/2 server. Andrew's doesn't
do that yet, but I'd like to do this. Another thing: integrate that
proxy into the registry, so the registry makes it into a das/2 server.
then we don't have a burden on servers to support two versions
of the protocol.
got email from andrew about his proxy on that.

sc: I put a note about Andrew's proxy server on the biodas.org wiki.
gh: he needs to have a place to keep it.
sc: open-bio server would work. Just need a better mechanism to
ensure it stays up. I think it's not getting started when the machine
gets rebooted.

[A] Steve/Andrew work on stable home for the proxy server

[Correction: In my note in the teleconf, I was thinking about Andrew's
validation server, which is hosted on open-bio and has a problem with
not being up reliably. The proxy server is another issue. There's a
mention of it on the DAS FAQ page, but no pointer to any server
yet. -steve]

gh: data overload and redundancy from the user perspective. with clients
where default for protein annotation is to go to all servers, you have
way too many tracks showing up. Lots of servers and types. Ensembl is
moving to expose even more data via das, thousands of new tracks
(organisms, type, assembly version). Concern with biosapiens is
replication of the same annotation data. E.g., pfam domains in
different biosapiens data sources, may return same thing or slight
diffs in feature ranges. how does user decide which is authoritative?
Which can be left out? A big concern at the biosapiens meeting --
redundant information.

gh: another issue: mirrors for the data. discussed in early days of
das/2, not resolved how to deal with mirrors, http redirection
mechanism. This can lead to redundant data when you hit all mirrors.

gh: feature classification and ontologies around that. My take was
that the sequence ontology is inadequate to describe protein
annotation as it stands now. PAO - protein annotation ontology
ls: are they doing this with NCBO involved?
gh: talked to them about getting hold of lincoln and suzi and
integrating with SO as an extension.
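
One way a client might approach the redundancy concern raised above
(the same pfam domain served by several sources with slightly different
ranges) is to group features of the same type whose ranges overlap
almost completely, and then let the user pick one source per group. The
sketch below is only an illustration under stated assumptions: the
Feature record, the 0.9 threshold, and the function names are all
hypothetical, and nothing here is defined by das/1 or das/2.

```python
from collections import namedtuple

# Hypothetical minimal feature record; real das features carry more fields.
Feature = namedtuple("Feature", "source type start end")


def overlap_fraction(a, b):
    """Jaccard overlap of two closed integer ranges (0.0 when disjoint)."""
    inter = min(a.end, b.end) - max(a.start, b.start) + 1
    if inter <= 0:
        return 0.0
    union = max(a.end, b.end) - min(a.start, b.start) + 1
    return inter / union


def group_redundant(features, threshold=0.9):
    """Group same-type features whose ranges nearly coincide.

    Each feature is compared with the first member of every existing
    group; the 0.9 threshold is an arbitrary assumption.
    """
    groups = []
    for feat in sorted(features, key=lambda f: (f.type, f.start, f.end)):
        for group in groups:
            rep = group[0]
            if rep.type == feat.type and overlap_fraction(rep, feat) >= threshold:
                group.append(feat)
                break
        else:
            groups.append([feat])
    return groups
```

This does not answer which source is authoritative; it only reduces the
number of tracks a client would have to show by default.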
ap: for 3rd version of SO we will contact lincoln and suzi to discuss
ls: great
gh: for biosapiens, Janet Thornton is the person to contact about
that.

gh: more about types (proliferation causing data overload issue mentioned
above.)
also discussion about dag vs hierarchical tree. pointing to multiple
terms in the ontology for a particular type. in SO, how much has
multiple parents come up? may need a type that can point to multiple
ontology terms for that type. das/2 cannot do it yet, only one term
per type.
ls: the more flexible we make it the less coherent it will be. data
overload will get even worse. to reduce data overload, need a way to
take data from servers and decide if same or different. are they
reachable in same ontology? allowing set arithmetic will create
ambiguity. biosapiens can be allowed with an attribute, multiple
attributes that point at different ontologies.

gh: combining cellular location with protein classification
ontologies.
ls: certainly, but those are separate attributes. what we created is
essentially an RDF. Actually, terminology is 'property' not
attribute. Types property is the correct way to do this.

gh: use of subset of das/1, what it means for das/2
data overload for users,
feature classification issues

gh: das wish list, people wrote up what they feel das is
inadequate for. Das/2 group was aware of these.

ls: encryption, asynchronous request seem like impl issues, not part of
protocol.
gh: some people complained that das is inadequate because it relies on
http(s). you can do much more high-level things with soap-based
system. I think this is correct, but wrong that no one in our space
needs that.
ls: no pharma that cares about this will entrust it to the public
internet with any thing, soap or otherwise.
gh: at affy, we've done das/1 servers with https and no one has ever
complained.
ls: identity theft problems via people stealing from encrypted streams
never emerged as a problem.
they steal it from your physical trash,
setting up phony banking sites. Not related to strength of encryption.
gh: regarding asynch request - discussed 2 years ago -- yes, it's
outside of das/2 spec, but we say, use http as you will. redirect and
say "your request has been accepted, check back here in a while."

gh: wish list (sent out in email to the list noted above):
- multi-level features, stylesheets
- caching - use http caching as you will
- features from other sources - dealt with since we use URIs. a
  problem for das/1

ls: provenance requires people to put in effort to maintain the
provenance, but it doesn't free you of responsibility of having to
track it.

- scalability and large analysis - the data overload issue. the
  answer to me is smarter clients.
- more queries -- addressed in das/2
- entry point support - in das/2 we have a less ambiguous way to say
  whether a server supports it or not.
- counting number of features of each type per source -- have the
  'count' format in das/2
- referring to id's externally (das/2 uri's)
- errors and exception handling - we have http error codes -- remains
  to be seen how well it works out. done a reasonable job to map it to
  http error codes
- better stylesheets - in progress for das/2
- mapping servers - different genome assembly versions or mapping from
  protein to nucleotide space. -- under discussion with data
  providers.

ap: Another thing on wish list: people want to know stats per server,
uptime, hits, etc. (server stats).
gh: andreas' registry does a good job for das/1. biosapiens registry
is built on Andreas' registry. How many are up, which requests they
support, the data the server. Very nice.

ap: Gregg's coverage was good. Also gave a very good advertisement for
das/2!

gh: the das/1 to das/2 transformational proxy was quite
popular. doesn't take advantage of das/2 power, but gets people started.

Other Topics:
--------------
sc: biodas.org wiki is now officially up.
gh: mentioned to Tim Hubbard. He said, "I know.
I already edited it."

sc: globalseqids page needs das2xml snippets for coordinates.

[A] lincoln will add das2xml coordinate snippets to globalseqids page on
wiki

sc: might also be good to have notice of the next teleconf on the
site. Maybe pointers to the notes as well.
gh: maybe have an automatic email sent out reminding folks?
sc: maybe not, if we have a list of the dates for upcoming meetings on
the site.

[A] Steve post list of dates of upcoming DAS/2 teleconferences on wiki

Next meeting in two weeks: 19 mar 2007

From Gregg_Helt at affymetrix.com Wed Mar 7 16:21:48 2007
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 7 Mar 2007 13:21:48 -0800
Subject: [DAS2] Stable URIs coming from NCBI?
Message-ID:

Some good news (or at least rumor of good news) from NCBI -- plans to
expose stable URIs for all their resources:
http://lists.w3.org/Archives/Public/public-semweb-lifesci/2007Feb/0123.html

Which would fit nicely with the URI-centric approach of DAS/2...

Gregg

From lstein at cshl.edu Mon Mar 12 13:02:51 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 12 Mar 2007 13:02:51 -0400
Subject: [DAS2] Notes from the biweekly DAS/2 teleconference, 5 Mar 2007
In-Reply-To:
References:
Message-ID: <6dce9a0b0703121002h4f866b10jb160044260ea812e@mail.gmail.com>

> lincoln will add das2xml coordinate snippets to globalseqids page on
> wiki

I added one line to the description of the H. sapiens source. Is this
what you're looking for? If it is, I'll go ahead and add the rest. Note
that the contents of the XML are not defined anywhere. I'm not sure why
there should be a URI that looks like it is fetchable.
Lincoln On 3/5/07, Steve Chervitz wrote: > > Notes from the biweekly DAS/2 teleconference, 5 Mar 2007 > > $Id: das2-teleconf-2007-03-05.txt,v 1.2 2007/03/05 19:01:59 sac Exp $ > > Teleconference Info: > * Schedule: Biweekly on Monday > * Time of Day: 9:30 AM PST, 17:30 GMT > * Dialin (US): 800-531-3250 > * Dialin (Intl): 303-928-2693 > * Toll-free UK: 08 00 40 49 467 > * Toll-free France: 08 00 907 839 > * Conference ID: 2879055 > * Passcode: 1365 > > Attendees: > Affy: Steve Chervitz, Ed Erwin, Gregg Helt > CSHL: Lincoln Stein > Sanger: Andreas Prlic > UCLA: Allen Day > > Note taker: Steve Chervitz > > Action items are flagged with '[A]'. > > These notes are checked into the biodas.org CVS repository at > das/das2/notes/. Instructions on how to access this > repository are at http://biodas.org > > DISCLAIMER: > The note taker aims for completeness and accuracy, but these goals are > not always achievable, given the desire to get the notes out with a > rapid turnaround. So don't consider these notes as complete minutes > from the meeting, but rather abbreviated, summarized versions of what > was discussed. There may be errors of commission and omission. > Participants are welcome to post comments and/or corrections to these > as they see fit. > > > Agenda > ------- > * Review of BioSapiens DAS workshop > * Status updates > > > gh: I sent my summary of the biosapiens das workshop and feature > classification workshop I attended with Ed in Hinxton: > http://lists.open-bio.org/pipermail/das2/2007-March/000982.html > > "das developers workshop from a das/2 perspective", summarizes what I > took home from these meetings, how well das/2 meets needs of people in > europe (ensembl, sanger, biosapiens -- the focus of these > meetings). and a quick biosapiens overview: a big european project , > 25 institutions, large scale genome protein annotation. decided early > on to use das to distribute annotations between organizations. 
can > check the stats on their das servers -- andreas' registry -- 23 > servers serving up 69 das sources -- a major das investment! > > In developing das/2 we haven't had too much experience with the kind > of data they're dealing with (protein annotations). > > das/1 clients under study: > - dasty2, dasty1 - ajax-based viz clients > - jalview - alignment viewer, editor > - igb - Ed gave presentation > - pepper and spice - das viewers, also use alignment and 3d structure > info > - proview - protein annotation, > - ensembl viewer > > servers presented/discussed: > - pfam, ensembl, proserver, Andreas', > - Extensions to das/1 protocol discussed: gene das, protein das, > structure das, 3d-em das (arbitrary 3d volumes), interaction das for > prot-prot interactions. Moddas - writeback in das/1. Alignment das > (Andreas). > - Simple das - das servers that don't impl all of das/1 (entry_points, > or types, e.g.,). > > Gregg presented on das/2, will put up ppt later. Tailored it assuming > > [A] Gregg will send out powerpoint for his talk from BioSapiens DAS > workshop > > Focussed on familiarity with das/1, how big the diffs are with an eye > towards how hard it would be to move to das/2. Conceptually, not that > big a switch, though XML is a lot different. > > Also discussed how well das/2 addresses some of the problems with > das/1 that came up at the workshop. > > extensions for das/1: > - das/2 addressed some of them very well. E.g., gene das (das w/o > specifying location of feature). this is addressed well in > das/2. can have features w/o location, or w/o range. > - protein das - das/2 did a good job of removing nucleotide specific > parts of das features (orientation, phase are not required). das/2 > is much more agnostic about dna vs protein. > - alignment das - pairwise or multiple - locations with features in > das/2 addresses some of these issues (0,1,or more locations for a > feature) each location can have optional gap attribute (cigar > string). 
so if you can describe it with a cigar string, you can > describe it in das/2. Can use multiple locations to do mult > alignments. Not dealt with in das/2: 3d-threading of an alignment > through > a > structure. Need to look at this in the future > > [A] Look at how to handle 3D structure alignment threading in DAS/2 spec > > - simple das stuff handled better in das/2 - in das/1 the assumption > is you support all things unless. but in das/2 there is a > capabilities header, you must indicate support there, if not stated, > the default is you don't support it. Can also say you support > feature filters, so there's more formal support for that. > > Surprises: > - smaller subset of das/1 is in use than expected. of 69 sources, 64 > either fail entry points or say not applicable. types query: 49 > fail/not applicable > > ls: for types query. only one type? > gh: for ensembl, this is the case. > ap: lack of consistency of types is addressed in the other workshop > related to features. > > gh: in types in das/1 it is less necessary because all info is > replicated in each feature, type-method, category, id > ls: use case for types query is to present user with set of > checkboxes, select which type to retrieve from source. if in practice > das sources are being use to for one type or a set of types that only > make sense together, no reason to turn off a part of it, then makes > sense to not support types query. > ls: have heard that types query is expensive. computationally. simple > db backends with no normalization/indexins, finding all types involves > visiting each record. > gh: part of justification with 1 type / source is because those types > are stored in separate db. so having a das server to integrate them > make sense. > > gh: Re: using smaller subset of das/1 than I expected: > types can be expensive in another way, example: representing pfam in > das. feat type for each pfam domain type (9000 primary domains). > Pfam b - there are 70-400K more! 
> > ls: in das/2 create a single type 'protein domain' then use attribute > pointing to an ontology saying which pfam domain it is. > gh: concern there is, assuming clients will do something useful for > particular attributes. For rendering, I could do diff rendering based > on diff attribs (color diff domains differently). but for clients to > really understand that they're different, that's a more complicated > issue. > > gh: not using types or entry_points by clients because servers don't, > feedback loop. > ap: low coverage genomes (e.g., elephant) may have several 100K entry > points. > gh: in das/2 we are more formal and say that you don't support > it. Creates problem: how do you know what to query in the first place? > Then you have to know what you're looking for. > > gh: feature hierarchies handled in das/2 -- this is not an issue for > protein das, where annotations are completely flat. even protein > disulfide bond is one level, just rendered differently so it doesn't > span all residues in between. But doing non-visual things (unions, > intersections) this could be a problem. > ls: flat in terms of location or ontology? > gh: location. there is no feature ontology yet (no consistent, agreed > upon yet, just proposed at this meeting). > ls: they aren't creating discontinuous features because too hard, or > don't care. > gh: just not needed for most protein annotations. even when it could > be needed, just not being used. > ls: for nucleotide, it's needed frequently > gh: not an issue for das/2 > > gh: ensembl collapses type and source into one thing. what does this > mean? das/2 could be over complicated. > ls: no doubt that it is too complicated for the biosapiens use > case. we could make it easy for them to use by providing tool kits to > read and write. could also argue that postscript is too complicate to > draw simple rectangles on the page. You wouldn't expect then to > simplify postscript. There are tools to ease simple rendering. 
> The complexity of das/2 won't interfere with adoption, but not having > toolkits, middleware layers to read/write. Not getting ensembl buy-in > to das/2 could be a problem > gh: tim hubbard was there and was on-board to transition to > das/2. > ls: would have be better to have buy in now (i.e., Tony Cox dropping > out) > gh: we've made it more formal to say, here is the subset of das/2 that > this server supports. for other use cases, we do need the added > complexity. > > gh: re: ensembl support for das/2. I mentioned andrew's das/1 - das/2 > transformational proxy server. not released yet, but making progress > on it. So if you have a das/1 server, you can put a das/2 front end on > it. > ls: can you go the other way, provide das/1 interface on das/2? > gh: want to do this for the affy public das/2 server. Andrew's doesn't > do that yet, but I'd like to do this. Another thing: integrate that > proxy into the registry, so the registry makes it into a das/2 > server. then we don't have a burden on servers to support two versions > of the protocol. > got email from andrew about his proxy on that. > > sc: I put a note about Andrew's proxy server on the biodas.org wiki. > gh: he needs to have a place to keep it. > sc: open-bio server would work. Just need a beetter mechanism to > ensure it stays up. I think it's not getting started when the machine > gets rebooted. > > [A] Steve/Andrew work on stable home for the proxy server > > [Correction: In my note in the teleconf, I was thinking about Andrew's > validation server, which is hosted on open-bio and has a problem with > not being up reliably. The proxy server is another issue. There's a > mention of it on the DAS FAQ page, but not pointer to any server > yet. -steve] > > gh: data overload and redundancy from the user perspective. clients > where default for protein annotation is to go to all servers, you have > way too many track showing up. Lots of servers and types. 
Ensembl is > moving to expose even more data via das, thousands of new tracks > (organisms, type, assembly version). Concern with biosapiens is > replication of the same annotation data. E.g., pfam domains in > different biosapiens data sources, may return same thing or slight > diffs in feature ranges. how does user decide which is authoritative? > Which can be left out? A big concern at the biosapiens meeting -- > redundant information. > > gh: another issue: mirrors for the data. discussed in early days of > das/2, not resolved how to deal with mirrors, http redirection > mechanism. This can lead to redundant data when you hit all mirrors. > > gh: feature classification and ontologies around that. My take was > that the sequence ontology is inadequate to describe protein > annotation as it stands now. PAO - protein annotation ontology > ls: are they doing this with NCBO involved? > gh: talked to them about getting hold of lincoln and suzi and > integrating with SO as an extension. > ap: for 3rd version of SO we will contact lincoln and suzi to discuss > ls: great > gh: for biosapiens, Janet Thornton is the person to contact about > that. > > gh: more about types (proliferation causing data overload issue mentioned > above.) > also discussion about dag vs hierarchical tree. pointing to multiple > terms in the ontology for a particular type. in SO, how much has > multiple parents come up? may need a type that can point to multiple > ontology terms for that type. das/2 cannot do it yet, only one term > per type. > ls: the more flexible we make it the less coherent it will be. data > overload will get even worse. to reduce data overload, need a way to > take data from servers and deciding if same or different. are they > reachable in same ontology? allowing set arithematic will create > ambiguity. biosapiens can be allowed with an attribute, multiple > attributes that point at different ontologies. 
>
> gh: combining cellular location with protein classification
> ontologies.
> ls: certainly, but those are separate attributes. what we created is
> essentially an RDF. Actually, terminology is 'property' not
> attribute. Types property is the correct way to do this.
>
> gh: use of subset of das/1, what it means for das/2
> data overload for users,
> feature classification issues
>
> gh: das wish list, people wrote up what they feel das is
> inadequate for. Das/2 group was aware of these.
>
> ls: encryption, asynchronous requests seem like impl issues, not part of
> protocol.
> gh: some people complained that das is inadequate because it relies on
> http(s). you can do much more high-level things with soap-based
> system. I think this is correct, but wrong that no one in our space
> needs that.
> ls: no pharma that cares about this will entrust it to the public
> internet with anything, soap or otherwise.
> gh: at affy, we've done das/1 servers with https and no one has ever
> complained.
> ls: identity theft problems via people stealing from encrypted streams
> never emerged as a problem. they steal it from your physical trash,
> setting up phony banking sites. Not related to strength of encryption.
> gh: regarding asynch request - discussed 2 years ago -- yes, it's
> outside of das/2 spec, but we say, use http as you will. redirect and
> say "your request has been accepted, check back here in a while."
>
> gh: wish list (sent out in email to the list noted above):
> - multi-level features, stylesheets
> - caching - use http caching as you will
> - features from other sources - dealt with since we use URIs. a
>   problem for das/1
>
> ls: provenance requires people to put in effort to maintain the
> provenance, but it doesn't free you of responsibility of having to
> track it.
>
> - scalability and large analysis - the data overload issue. the
>   answer to me is smarter clients.
>
> - more queries -- addressed in das/2
> - entry point support - in das/2 we have a less ambiguous way to say
>   whether a server supports it or not.
> - counting number of features of each type per source -- have the
>   'count' format in das/2
> - referring to id's externally (das/2 uri's)
> - errors and exception handling - we have http error codes -- remains
>   to be seen how well it works out. done a reasonable job to map it to
>   http error codes
> - better stylesheets - in progress for das/2
> - mapping servers - different genome assembly versions or mapping from
>   protein to nucleotide space. -- under discussion with data
>   providers.
>
> ap: Another thing on wish list: people want to know stats per server,
> uptime, hits, etc. (server stats).
> gh: andreas' registry does a good job for das/1. biosapiens registry
> is built on Andreas' registry. How many are up, which requests they
> support, the data the server. Very nice.
>
> ap: Gregg's coverage was good. Also gave a very good advertisement for
> das/2!
>
> gh: the das/1 to das/2 transformational proxy was quite
> popular. doesn't take advantage of das/2 power, but gets people started.
>
> Other Topics:
> --------------
> sc: biodas.org wiki is now officially up.
> gh: mentioned to Tim Hubbard. He said, "I know. I already edited it."
>
> sc: globalseqids page needs das2xml snippets for coordinates.
>
> [A] lincoln will add das2xml coordinate snippets to globalseqids page on
> wiki
>
> sc: might also be good to have notice of the next teleconf on the
> site. Maybe pointers to the notes as well.
> gh: maybe have an automatic email sent out reminding folks?
> sc: maybe not, if we have a list of the dates for upcoming meetings on
> the site.
>
> [A] Steve post list of dates of upcoming DAS/2 teleconferences on wiki
>
> Next meeting in two weeks: 19 mar 2007
>
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/das2
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From Steve_Chervitz at affymetrix.com Mon Mar 19 13:47:57 2007
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 19 Mar 2007 10:47:57 -0700
Subject: [DAS2] Notes from the biweekly DAS/2 teleconference, 19 Mar 2007
Message-ID:

Notes from the biweekly DAS/2 teleconference, 19 Mar 2007

$Id: das2-teleconf-2007-03-19.txt,v 1.2 2007/03/19 17:46:41 sac Exp $

Teleconference Info:
  * Schedule: Biweekly on Monday
  * Time of Day: 9:30 AM PST, 17:30 GMT
  * Dialin (US): 800-531-3250
  * Dialin (Intl): 303-928-2693
  * Toll-free UK: 08 00 40 49 467
  * Toll-free France: 08 00 907 839
  * Conference ID: 2879055
  * Passcode: 1365

Attendees:
  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
  CSHL: Lincoln Stein

Note taker: Steve Chervitz

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/ and are viewable on-line at
http://biodas.org/documents/das2/notes/
Instructions on how to access the DAS/2 CVS repository are at
http://www.biodas.org/wiki/DAS/2#CVS_Access

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.

Agenda
-------
  * General issues
  * Status reports, including report from Lincoln on hapmap and das2
  * Gregg's post-grant status
  * IGB support post-March

Topic: General Issues
----------------------
ls: Regarding the coordinate stuff for global seq ids, need
clarification (see my message on list).
gh: for each release we should have the xml snippet for the
coordinates, four attribs for authority, etc. so people can see
directly what they need to provide in their DAS/2 request.

[A] gregg will send global seq ID coordinate XML example to Lincoln

Topic: Status reports
----------------------
gh: working on getting good representations of graphs for Affy das/2
server serving up tiling array data. Serving up slices of
graphs. Working well on my test server, better than expected. Slow
thing is the indexing the first time it sees a file. Chrm1 at 5bp
resolution tiling array data, 120M data points, slicing indexing takes
a couple of seconds the first time, other times there's no delay. this
is serving up in an optimized format. Need to serve in std das/2
format with a feature per data point. Not too hard. Planning to deploy
in April when Steve gets new server running. Drosophila time-course
public data. 8-9 time points RNA expression tiling arrays. When phase
3 ENCODE paper comes out, we'll have a pointer to our server for
viewing that data. Also need to beef up feat filter queries to support
full spec on the Affy das/2 server. transition IGB from using
quickload and replace all quickload stuff with das/2, so we don't need
to maintain two code bases and data repositories.
ls: hapmap das/2 server is up and running. temporarily at Brian
Gilman's consultancy business. He's coming here to CSHL to get a
permanent version running on hapmap.org by next week. There's a whole
API for accessing that data in the form that's required by NCI's caBIO
project (caCORE). After server goes up, I'll point coordinates that
location, documentation.
It works with other das/2 sources as well, (Affy, biopackages).
gh: So it will put any of that DAS-available data into caCORE object
model?
ls: yes. It also can give data as DOM models, might be easier for some
users/apps.
gh: Rolling this into the next caBIO release?
ls: yes.
ls: Will provide snp's and haplotype blocks as features. one track per
population. we can put as many tracks in as you need. Just one set
now. There are 4 populations grouped into three panels, since two
pop's don't have enough diffs to break them out.

[A] lincoln send gregg pointer to current hapmap server for testing

sc: Working on configuring the new affy das/2 public server, a
replacement machine with a lot more RAM than current box. Have been
busy with other Affy work (new Netaffx release, new product support,
etc.) but should be mostly done with this by end of March. Should be
able to devote some solid blocks to DAS work (target: 3wks). Plan is
to support as many Affy products as we can. Less focus on supporting
UCSC-provided annotations (since they're the best source for them).
sc: Gregg, have you considered using the same approach for serving
annotations by your das/2 server as you are doing to support graphs?
Could ease memory requirements.
gh: possible, but not practical, since it would require a new format
for every feature type. Graphs are relatively straightforward to serve
up via an indexing strategy. Doing something similar for features
would mean essentially writing a database app.

Other Items:
-------------
gh: grant admin says our burn rate is lower than anticipated. we can
apply for a no-cost extension. should last at least till the end of
June as for funding. We'll apply for that. Not sure what it means for
CSHL. last time it took 3-4 mos to sort it out.
ls: start working on it now. there were communication problems in the
past. would be great if Allen could extend another month or two.
gh: Andrew will come visit me in the next day or two. Will get the
latest from him.
He's been working on the transformational das1-> das2 proxy. Want to
get the Ensembl people to use it ASAP.

[A] get a usable das1->das2 proxy server, deploy at Ensembl

gh: Need to look at how to support scores in das/2. we dropped score
element. You can add arbitrary elements to das/2. You can put in
multiple diff scores that way, or use XML namespaces to bring in a
das/2 score element. Want to have a recommended way of doing
this. Need more input from others. In Europe they're using score a lot
more than here in the States.

[A] come up with recommended way to support scores in DAS/2

Topic: Gregg's agenda
----------------------
gh: I am planning to leave Affy at end of the grant. Will focus on
doing hands-on DAS/2 evangelism, ideally work with UCSC. Then will
take some time off. Affy wasn't interested in supporting das w/o some
outside funding. Therefore, it's a good time to transition. Regarding
UCSC? ready to go down there and write some code. They have a das/1
server, they just need someone with DAS/2 expertise that I can
provide. biggest prob with das/2 is adoption outside of the grant
people.
sc: considered using Andrew's proxy?
gh: might be OK for a temporary solution, but it wouldn't be as
efficient as directly supporting das/2, and I know Jim et al are
interested in efficiency. Since I'm in the area, I can help them get
into DAS/2 directly, which would help with DAS/2 acceptance by the
community.
gh: Another goal was to have a DAS/2 paper ready and submitted before
I leave, want to have a rough draft in april. Plan to submit to an
open source journal: Biomedcentral, PLoS, or other.

[A] Gregg will circulate draft of DAS/2 paper, draft in April.

Topic: IGB Support
-------------------
ee: Regarding IGB support, Affy is not supporting IGB after March,
they are moving me to a different project. Support for IGB could
return if there's enough interest.
sc: how self-supporting is the igb community?
ee: not much.
gh: Ann Loraine has interest as do internal Affy users.
sc: Sourceforge has a new wiki project that's in beta now, for adding
a wiki to your project's web page. Could help make the IGB community
self-supporting, on-line docs, FAQ, etc. I volunteered to participate,
but haven't done anything with it yet.
gh: IGB has a good user's guide now, thanks to Ed's recent update.
ee: I'm also working on plugin interface and documenting the http API
protocol, things that will make it easier for others to use IGB with
other programs.

From Steve_Chervitz at affymetrix.com Mon Mar 5 19:03:03 2007
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 05 Mar 2007 11:03:03 -0800
Subject: [DAS2] Notes from the biweekly DAS/2 teleconference, 5 Mar 2007
Message-ID:

Notes from the biweekly DAS/2 teleconference, 5 Mar 2007

$Id: das2-teleconf-2007-03-05.txt,v 1.2 2007/03/05 19:01:59 sac Exp $

Teleconference Info:
  * Schedule: Biweekly on Monday
  * Time of Day: 9:30 AM PST, 17:30 GMT
  * Dialin (US): 800-531-3250
  * Dialin (Intl): 303-928-2693
  * Toll-free UK: 08 00 40 49 467
  * Toll-free France: 08 00 907 839
  * Conference ID: 2879055
  * Passcode: 1365

Attendees:
  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
  CSHL: Lincoln Stein
  Sanger: Andreas Prlic
  UCLA: Allen Day

Note taker: Steve Chervitz

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.


Agenda
-------
  * Review of BioSapiens DAS workshop
  * Status updates


gh: I sent my summary of the biosapiens das workshop and feature
classification workshop I attended with Ed in Hinxton:
http://lists.open-bio.org/pipermail/das2/2007-March/000982.html

"das developers workshop from a das/2 perspective", summarizes what I
took home from these meetings, how well das/2 meets needs of people in
europe (ensembl, sanger, biosapiens -- the focus of these
meetings). and a quick biosapiens overview: a big european project,
25 institutions, large scale genome protein annotation.
decided early on to use das to distribute annotations between
organizations. can check the stats on their das servers -- andreas'
registry -- 23 servers serving up 69 das sources -- a major das
investment!

In developing das/2 we haven't had too much experience with the kind
of data they're dealing with (protein annotations).

das/1 clients under study:
- dasty2, dasty1 - ajax-based viz clients
- jalview - alignment viewer, editor
- igb - Ed gave presentation
- pepper and spice - das viewers, also use alignment and 3d structure
  info
- proview - protein annotation,
- ensembl viewer

servers presented/discussed:
- pfam, ensembl, proserver, Andreas',
- Extensions to das/1 protocol discussed: gene das, protein das,
  structure das, 3d-em das (arbitrary 3d volumes), interaction das for
  prot-prot interactions. Moddas - writeback in das/1. Alignment das
  (Andreas).
- Simple das - das servers that don't impl all of das/1 (entry_points,
  or types, e.g.,).

Gregg presented on das/2, will put up ppt later. Tailored it assuming

[A] Gregg will send out powerpoint for his talk from BioSapiens DAS
workshop

Focussed on familiarity with das/1, how big the diffs are with an eye
towards how hard it would be to move to das/2. Conceptually, not that
big a switch, though XML is a lot different. Also discussed how well
das/2 addresses some of the problems with das/1 that came up at the
workshop.

extensions for das/1:
- das/2 addressed some of them very well. E.g., gene das (das w/o
  specifying location of feature). this is addressed well in
  das/2. can have features w/o location, or w/o range.
- protein das - das/2 did a good job of removing nucleotide specific
  parts of das features (orientation, phase are not required). das/2
  is much more agnostic about dna vs protein.
- alignment das - pairwise or multiple - locations with features in
  das/2 addresses some of these issues (0,1,or more locations for a
  feature) each location can have optional gap attribute (cigar
  string).
  so if you can describe it with a cigar string, you can describe it
  in das/2. Can use multiple locations to do mult alignments.
  Not dealt with in das/2: 3d-threading of an alignment through a
  structure. Need to look at this in the future

[A] Look at how to handle 3D structure alignment threading in DAS/2 spec

- simple das stuff handled better in das/2 - in das/1 the assumption
  is you support all things unless. but in das/2 there is a
  capabilities header, you must indicate support there, if not stated,
  the default is you don't support it. Can also say you support
  feature filters, so there's more formal support for that.

Surprises:
- smaller subset of das/1 is in use than expected. of 69 sources, 64
  either fail entry points or say not applicable. types query: 49
  fail/not applicable

ls: for types query. only one type?
gh: for ensembl, this is the case.
ap: lack of consistency of types is addressed in the other workshop
related to features.
gh: in types in das/1 it is less necessary because all info is
replicated in each feature, type-method, category, id
ls: use case for types query is to present user with set of
checkboxes, select which type to retrieve from source. if in practice
das sources are being used for one type or a set of types that only
make sense together, no reason to turn off a part of it, then makes
sense to not support types query.
ls: have heard that types query is expensive, computationally. simple
db backends with no normalization/indexing, finding all types involves
visiting each record.
gh: part of justification with 1 type / source is because those types
are stored in separate db. so having a das server to integrate them
make sense.

gh: Re: using smaller subset of das/1 than I expected: types can be
expensive in another way, example: representing pfam in das. feat type
for each pfam domain type (9000 primary domains). Pfam b - there are
70-400K more!
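
To make the cost argument about the types query concrete, here is a
minimal sketch in Python. It assumes a hypothetical flat list of
feature records; the record layout and the names FEATURES,
types_by_scan, build_type_index and types_from_index are illustrative
only and are not part of DAS/1 or DAS/2. Without an index, answering
"which types does this source have" means visiting every record,
while a per-type counter built once at load time answers the same
question directly and could also back a per-type feature count.

```python
from collections import Counter

# Hypothetical flat feature records, as a simple backend with no
# normalization or type index might hold them (layout is an assumption).
FEATURES = [
    {"id": "f1", "type": "pfam_domain", "start": 10, "end": 80},
    {"id": "f2", "type": "disulfide_bond", "start": 22, "end": 57},
    {"id": "f3", "type": "pfam_domain", "start": 120, "end": 200},
]


def types_by_scan(features):
    """Answer a types-style query with no index: every record is visited."""
    seen = set()
    for feature in features:
        seen.add(feature["type"])
    return sorted(seen)


def build_type_index(features):
    """Build per-type counts once, e.g. when the data is loaded."""
    return Counter(feature["type"] for feature in features)


def types_from_index(index):
    """Answer the same query from the prebuilt index, without a scan."""
    return sorted(index)


TYPE_INDEX = build_type_index(FEATURES)
print(types_by_scan(FEATURES))
print(types_from_index(TYPE_INDEX))
print(dict(TYPE_INDEX))
```

The scan cost grows with the number of features, while the index cost
grows with the number of distinct types, which is why a source with
very many types is expensive in a different way.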
ls: in das/2 create a single type 'protein domain' then use attribute
pointing to an ontology saying which pfam domain it is.
gh: concern there is, assuming clients will do something useful for
particular attributes. For rendering, I could do diff rendering based
on diff attribs (color diff domains differently). but for clients to
really understand that they're different, that's a more complicated
issue.

gh: not using types or entry_points by clients because servers don't,
feedback loop.
ap: low coverage genomes (e.g., elephant) may have several 100K entry
points.
gh: in das/2 we are more formal and say that you don't support
it. Creates problem: how do you know what to query in the first place?
Then you have to know what you're looking for.

gh: feature hierarchies handled in das/2 -- this is not an issue for
protein das, where annotations are completely flat. even protein
disulfide bond is one level, just rendered differently so it doesn't
span all residues in between. But doing non-visual things (unions,
intersections) this could be a problem.
ls: flat in terms of location or ontology?
gh: location. there is no feature ontology yet (no consistent, agreed
upon yet, just proposed at this meeting).
ls: they aren't creating discontinuous features because too hard, or
don't care.
gh: just not needed for most protein annotations. even when it could
be needed, just not being used.
ls: for nucleotide, it's needed frequently
gh: not an issue for das/2

gh: ensembl collapses type and source into one thing. what does this
mean? das/2 could be over complicated.
ls: no doubt that it is too complicated for the biosapiens use
case. we could make it easy for them to use by providing tool kits to
read and write. could also argue that postscript is too complicated to
draw simple rectangles on the page. You wouldn't expect them to
simplify postscript. There are tools to ease simple rendering.
The complexity of das/2 won't interfere with adoption, but not having
toolkits, middleware layers to read/write. Not getting ensembl buy-in
to das/2 could be a problem
gh: tim hubbard was there and was on-board to transition to
das/2.
ls: would have been better to have buy in now (i.e., Tony Cox dropping
out)
gh: we've made it more formal to say, here is the subset of das/2 that
this server supports. for other use cases, we do need the added
complexity.

gh: re: ensembl support for das/2. I mentioned andrew's das/1 - das/2
transformational proxy server. not released yet, but making progress
on it. So if you have a das/1 server, you can put a das/2 front end on
it.
ls: can you go the other way, provide das/1 interface on das/2?
gh: want to do this for the affy public das/2 server. Andrew's doesn't
do that yet, but I'd like to do this. Another thing: integrate that
proxy into the registry, so the registry makes it into a das/2
server. then we don't have a burden on servers to support two versions
of the protocol.
got email from andrew about his proxy on that.

sc: I put a note about Andrew's proxy server on the biodas.org wiki.
gh: he needs to have a place to keep it.
sc: open-bio server would work. Just need a better mechanism to
ensure it stays up. I think it's not getting started when the machine
gets rebooted.

[A] Steve/Andrew work on stable home for the proxy server

[Correction: In my note in the teleconf, I was thinking about Andrew's
validation server, which is hosted on open-bio and has a problem with
not being up reliably. The proxy server is another issue. There's a
mention of it on the DAS FAQ page, but no pointer to any server
yet. -steve]

gh: data overload and redundancy from the user perspective. clients
where default for protein annotation is to go to all servers, you have
way too many tracks showing up. Lots of servers and types. Ensembl is
moving to expose even more data via das, thousands of new tracks
(organisms, type, assembly version).
Concern with biosapiens is replication of the same annotation
data. E.g., pfam domains in different biosapiens data sources, may
return same thing or slight diffs in feature ranges. how does user
decide which is authoritative? Which can be left out? A big concern at
the biosapiens meeting -- redundant information.

gh: another issue: mirrors for the data. discussed in early days of
das/2, not resolved how to deal with mirrors, http redirection
mechanism. This can lead to redundant data when you hit all mirrors.

gh: feature classification and ontologies around that. My take was
that the sequence ontology is inadequate to describe protein
annotation as it stands now. PAO - protein annotation ontology
ls: are they doing this with NCBO involved?
gh: talked to them about getting hold of lincoln and suzi and
integrating with SO as an extension.
ap: for 3rd version of SO we will contact lincoln and suzi to discuss
ls: great
gh: for biosapiens, Janet Thornton is the person to contact about
that.

gh: more about types (proliferation causing data overload issue mentioned
above.)
also discussion about dag vs hierarchical tree. pointing to multiple
terms in the ontology for a particular type. in SO, how much has
multiple parents come up? may need a type that can point to multiple
ontology terms for that type. das/2 cannot do it yet, only one term
per type.
ls: the more flexible we make it the less coherent it will be. data
overload will get even worse. to reduce data overload, need a way to
take data from servers and decide if same or different. are they
reachable in same ontology? allowing set arithmetic will create
ambiguity. biosapiens can be allowed with an attribute, multiple
attributes that point at different ontologies.

gh: combining cellular location with protein classification
ontologies.
ls: certainly, but those are separate attributes. what we created is
essentially an RDF. Actually, terminology is 'property' not
attribute.
Types property is the correct way to do this.

gh: use of subset of das/1, what it means for das/2
data overload for users,
feature classification issues

gh: das wish list, people wrote up what they feel das is
inadequate for. Das/2 group was aware of these.

ls: encryption, asynchronous requests seem like impl issues, not part of
protocol.
gh: some people complained that das is inadequate because it relies on
http(s). you can do much more high-level things with soap-based
system. I think this is correct, but wrong that no one in our space
needs that.
ls: no pharma that cares about this will entrust it to the public
internet with anything, soap or otherwise.
gh: at affy, we've done das/1 servers with https and no one has ever
complained.
ls: identity theft problems via people stealing from encrypted streams
never emerged as a problem. they steal it from your physical trash,
setting up phony banking sites. Not related to strength of encryption.
gh: regarding asynch request - discussed 2 years ago -- yes, it's
outside of das/2 spec, but we say, use http as you will. redirect and
say "your request has been accepted, check back here in a while."

gh: wish list (sent out in email to the list noted above):
- multi-level features, stylesheets
- caching - use http caching as you will
- features from other sources - dealt with since we use URIs. a
  problem for das/1

ls: provenance requires people to put in effort to maintain the
provenance, but it doesn't free you of responsibility of having to
track it.

- scalability and large analysis - the data overload issue. the
  answer to me is smarter clients.

- more queries -- addressed in das/2
- entry point support - in das/2 we have a less ambiguous way to say
  whether a server supports it or not.
- counting number of features of each type per source -- have the
  'count' format in das/2
- referring to id's externally (das/2 uri's)
- errors and exception handling - we have http error codes -- remains
  to be seen how well it works out.
  done a reasonable job to map it to http error codes
- better stylesheets - in progress for das/2
- mapping servers - different genome assembly versions or mapping from
  protein to nucleotide space. -- under discussion with data
  providers.

ap: Another thing on wish list: people want to know stats per server,
uptime, hits, etc. (server stats).
gh: andreas' registry does a good job for das/1. biosapiens registry
is built on Andreas' registry. How many are up, which requests they
support, the data the server. Very nice.

ap: Gregg's coverage was good. Also gave a very good advertisement for
das/2!

gh: the das/1 to das/2 transformational proxy was quite
popular. doesn't take advantage of das/2 power, but gets people started.

Other Topics:
--------------
sc: biodas.org wiki is now officially up.
gh: mentioned to Tim Hubbard. He said, "I know. I already edited it."

sc: globalseqids page needs das2xml snippets for coordinates.

[A] lincoln will add das2xml coordinate snippets to globalseqids page on
wiki

sc: might also be good to have notice of the next teleconf on the
site. Maybe pointers to the notes as well.
gh: maybe have an automatic email sent out reminding folks?
sc: maybe not, if we have a list of the dates for upcoming meetings on
the site.

[A] Steve post list of dates of upcoming DAS/2 teleconferences on wiki

Next meeting in two weeks: 19 mar 2007

From Gregg_Helt at affymetrix.com Wed Mar 7 21:21:48 2007
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 7 Mar 2007 13:21:48 -0800
Subject: [DAS2] Stable URIs coming from NCBI?
Message-ID:

Some good news (or at least rumor of good news) from NCBI -- plans to
expose stable URIs for all their resources:
http://lists.w3.org/Archives/Public/public-semweb-lifesci/2007Feb/0123.html

Which would fit nicely with the URI-centric approach of DAS/2...
Gregg From lstein at cshl.edu Mon Mar 12 17:02:51 2007 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 12 Mar 2007 13:02:51 -0400 Subject: [DAS2] Notes from the biweekly DAS/2 teleconference, 5 Mar 2007 In-Reply-To: References: Message-ID: <6dce9a0b0703121002h4f866b10jb160044260ea812e@mail.gmail.com> > > lincoln will add das2xml coordinate snippets to globalseqids page on > wiki > I added one line to the description of the H. sapiens source. Is this what you're looking for? If it is, I'll go ahead and add the rest. Note that the contents of the XML are not defined anywhere. I'm not sure why there should be a URI that looks like it is fetchable. Lincoln On 3/5/07, Steve Chervitz wrote: > > Notes from the biweekly DAS/2 teleconference, 5 Mar 2007 > > $Id: das2-teleconf-2007-03-05.txt,v 1.2 2007/03/05 19:01:59 sac Exp $ > > Teleconference Info: > * Schedule: Biweekly on Monday > * Time of Day: 9:30 AM PST, 17:30 GMT > * Dialin (US): 800-531-3250 > * Dialin (Intl): 303-928-2693 > * Toll-free UK: 08 00 40 49 467 > * Toll-free France: 08 00 907 839 > * Conference ID: 2879055 > * Passcode: 1365 > > Attendees: > Affy: Steve Chervitz, Ed Erwin, Gregg Helt > CSHL: Lincoln Stein > Sanger: Andreas Prlic > UCLA: Allen Day > > Note taker: Steve Chervitz > > Action items are flagged with '[A]'. > > These notes are checked into the biodas.org CVS repository at > das/das2/notes/. Instructions on how to access this > repository are at http://biodas.org > > DISCLAIMER: > The note taker aims for completeness and accuracy, but these goals are > not always achievable, given the desire to get the notes out with a > rapid turnaround. So don't consider these notes as complete minutes > from the meeting, but rather abbreviated, summarized versions of what > was discussed. There may be errors of commission and omission. > Participants are welcome to post comments and/or corrections to these > as they see fit. 
> > > Agenda > ------- > * Review of BioSapiens DAS workshop > * Status updates > > > gh: I sent my summary of the biosapiens das workshop and feature > classification workshop I attended with Ed in Hinxton: > http://lists.open-bio.org/pipermail/das2/2007-March/000982.html > > "das developers workshop from a das/2 perspective", summarizes what I > took home from these meetings, how well das/2 meets needs of people in > europe (ensembl, sanger, biosapiens -- the focus of these > meetings). and a quick biosapiens overview: a big european project , > 25 institutions, large scale genome protein annotation. decided early > on to use das to distribute annotations between organizations. can > check the stats on their das servers -- andreas' registry -- 23 > servers serving up 69 das sources -- a major das investment! > > In developing das/2 we haven't had too much experience with the kind > of data they're dealing with (protein annotations). > > das/1 clients under study: > - dasty2, dasty1 - ajax-based viz clients > - jalview - alignment viewer, editor > - igb - Ed gave presentation > - pepper and spice - das viewers, also use alignment and 3d structure > info > - proview - protein annotation, > - ensembl viewer > > servers presented/discussed: > - pfam, ensembl, proserver, Andreas', > - Extensions to das/1 protocol discussed: gene das, protein das, > structure das, 3d-em das (arbitrary 3d volumes), interaction das for > prot-prot interactions. Moddas - writeback in das/1. Alignment das > (Andreas). > - Simple das - das servers that don't impl all of das/1 (entry_points, > or types, e.g.,). > > Gregg presented on das/2, will put up ppt later. Tailored it assuming > > [A] Gregg will send out powerpoint for his talk from BioSapiens DAS > workshop > > Focussed on familiarity with das/1, how big the diffs are with an eye > towards how hard it would be to move to das/2. Conceptually, not that > big a switch, though XML is a lot different. 
> > Also discussed how well das/2 addresses some of the problems with > das/1 that came up at the workshop. > > extensions for das/1: > - das/2 addressed some of them very well. E.g., gene das (das w/o > specifying location of feature). this is addressed well in > das/2. can have features w/o location, or w/o range. > - protein das - das/2 did a good job of removing nucleotide specific > parts of das features (orientation, phase are not required). das/2 > is much more agnostic about dna vs protein. > - alignment das - pairwise or multiple - locations with features in > das/2 addresses some of these issues (0,1,or more locations for a > feature) each location can have optional gap attribute (cigar > string). so if you can describe it with a cigar string, you can > describe it in das/2. Can use multiple locations to do mult > alignments. Not dealt with in das/2: 3d-threading of an alignment > through > a > structure. Need to look at this in the future > > [A] Look at how to handle 3D structure alignment threading in DAS/2 spec > > - simple das stuff handled better in das/2 - in das/1 the assumption > is you support all things unless. but in das/2 there is a > capabilities header, you must indicate support there, if not stated, > the default is you don't support it. Can also say you support > feature filters, so there's more formal support for that. > > Surprises: > - smaller subset of das/1 is in use than expected. of 69 sources, 64 > either fail entry points or say not applicable. types query: 49 > fail/not applicable > > ls: for types query. only one type? > gh: for ensembl, this is the case. > ap: lack of consistency of types is addressed in the other workshop > related to features. > > gh: in types in das/1 it is less necessary because all info is > replicated in each feature, type-method, category, id > ls: use case for types query is to present user with set of > checkboxes, select which type to retrieve from source. 
if in practice
> das sources are being used for one type or a set of types that only
> make sense together, no reason to turn off a part of it, then makes
> sense to not support types query.
> ls: have heard that types query is expensive computationally. simple
> db backends with no normalization/indexing, finding all types involves
> visiting each record.
> gh: part of justification with 1 type / source is because those types
> are stored in separate db. so having a das server to integrate them
> makes sense.
>
> gh: Re: using smaller subset of das/1 than I expected:
> types can be expensive in another way, example: representing pfam in
> das. feat type for each pfam domain type (9000 primary domains).
> Pfam-B - there are 70-400K more!
>
> ls: in das/2 create a single type 'protein domain' then use attribute
> pointing to an ontology saying which pfam domain it is.
> gh: concern there is, assuming clients will do something useful for
> particular attributes. For rendering, I could do diff rendering based
> on diff attribs (color diff domains differently). but for clients to
> really understand that they're different, that's a more complicated
> issue.
>
> gh: not using types or entry_points by clients because servers don't,
> feedback loop.
> ap: low coverage genomes (e.g., elephant) may have several 100K entry
> points.
> gh: in das/2 we are more formal and say that you don't support
> it. Creates problem: how do you know what to query in the first place?
> Then you have to know what you're looking for.
>
> gh: feature hierarchies handled in das/2 -- this is not an issue for
> protein das, where annotations are completely flat. even protein
> disulfide bond is one level, just rendered differently so it doesn't
> span all residues in between. But doing non-visual things (unions,
> intersections) this could be a problem.
> ls: flat in terms of location or ontology?
> gh: location.
there is no feature ontology yet (no consistent, agreed
> upon one yet, just proposed at this meeting).
> ls: they aren't creating discontinuous features because too hard, or
> don't care.
> gh: just not needed for most protein annotations. even when it could
> be needed, just not being used.
> ls: for nucleotide, it's needed frequently
> gh: not an issue for das/2
>
> gh: ensembl collapses type and source into one thing. what does this
> mean? das/2 could be over complicated.
> ls: no doubt that it is too complicated for the biosapiens use
> case. we could make it easy for them to use by providing tool kits to
> read and write. could also argue that postscript is too complicated to
> draw simple rectangles on the page. You wouldn't expect then to
> simplify postscript. There are tools to ease simple rendering.
> The complexity of das/2 won't interfere with adoption, but not having
> toolkits, middleware layers to read/write would. Not getting ensembl
> buy-in to das/2 could be a problem
> gh: tim hubbard was there and was on-board to transition to
> das/2.
> ls: would have been better to have buy in now (i.e., Tony Cox dropping
> out)
> gh: we've made it more formal to say, here is the subset of das/2 that
> this server supports. for other use cases, we do need the added
> complexity.
>
> gh: re: ensembl support for das/2. I mentioned andrew's das/1 - das/2
> transformational proxy server. not released yet, but making progress
> on it. So if you have a das/1 server, you can put a das/2 front end on
> it.
> ls: can you go the other way, provide das/1 interface on das/2?
> gh: want to do this for the affy public das/2 server. Andrew's doesn't
> do that yet, but I'd like to do this. Another thing: integrate that
> proxy into the registry, so the registry makes it into a das/2
> server. then we don't have a burden on servers to support two versions
> of the protocol.
> got email from andrew about his proxy on that.
> > sc: I put a note about Andrew's proxy server on the biodas.org wiki. > gh: he needs to have a place to keep it. > sc: open-bio server would work. Just need a beetter mechanism to > ensure it stays up. I think it's not getting started when the machine > gets rebooted. > > [A] Steve/Andrew work on stable home for the proxy server > > [Correction: In my note in the teleconf, I was thinking about Andrew's > validation server, which is hosted on open-bio and has a problem with > not being up reliably. The proxy server is another issue. There's a > mention of it on the DAS FAQ page, but not pointer to any server > yet. -steve] > > gh: data overload and redundancy from the user perspective. clients > where default for protein annotation is to go to all servers, you have > way too many track showing up. Lots of servers and types. Ensembl is > moving to expose even more data via das, thousands of new tracks > (organisms, type, assembly version). Concern with biosapiens is > replication of the same annotation data. E.g., pfam domains in > different biosapiens data sources, may return same thing or slight > diffs in feature ranges. how does user decide which is authoritative? > Which can be left out? A big concern at the biosapiens meeting -- > redundant information. > > gh: another issue: mirrors for the data. discussed in early days of > das/2, not resolved how to deal with mirrors, http redirection > mechanism. This can lead to redundant data when you hit all mirrors. > > gh: feature classification and ontologies around that. My take was > that the sequence ontology is inadequate to describe protein > annotation as it stands now. PAO - protein annotation ontology > ls: are they doing this with NCBO involved? > gh: talked to them about getting hold of lincoln and suzi and > integrating with SO as an extension. > ap: for 3rd version of SO we will contact lincoln and suzi to discuss > ls: great > gh: for biosapiens, Janet Thornton is the person to contact about > that. 
> > gh: more about types (proliferation causing data overload issue mentioned > above.) > also discussion about dag vs hierarchical tree. pointing to multiple > terms in the ontology for a particular type. in SO, how much has > multiple parents come up? may need a type that can point to multiple > ontology terms for that type. das/2 cannot do it yet, only one term > per type. > ls: the more flexible we make it the less coherent it will be. data > overload will get even worse. to reduce data overload, need a way to > take data from servers and deciding if same or different. are they > reachable in same ontology? allowing set arithematic will create > ambiguity. biosapiens can be allowed with an attribute, multiple > attributes that point at different ontologies. > > gh: combining cellular location with protien classification > ontologies. > ls: certainly, but those are separate attributes. what we created is > essentially an RDF. Actually, terminology is 'property' not > attribute. Types property is the correct way to do this. > > gh: use of subset of das/1, what it means for das/2 > data overload for users, > featu classification issues > > gh: das wish list, people wrote up what they feel what das is > inadequate for. Das/2 group was aware of these. > > ls: encryption, synchronous request seem like impl issues, not part of > protocol. > gh: some people complained that das is inadequate because it relies on > http(s). you can do much more high-level things with soap-based > system. I think this is correct, but wrong that no one in our space > needs that. > ls: no pharma that cares about this will entrust it to the public > internet with any thing, soap or otherwise. > gh: at affy, we've done das/1 servers with https and no one has ever > complained. > ls: identity theft problems via people stealing from encrypted streams > never emerged as a problem. they steal it from your physical trash, > setting up phony banking sites. Not related to strength of encryption. 
> gh: regarding asynch request - discussed 2 years ago -- yes, it's > outside of das/2 spec, but we say, use http as you will. redirect and > say "your request has been accepted, check back here in a while." > > gh: wish list (sent out in email to the list noted above): > - multi-level features, stylesheets > - caching - use http caching as you will > - features from other sources - dealth with since we use URIs. a > problem for das/1 > > ls: providence requires people to put in effort to maintain the > providence, but it doesn't free you of responsibility of having to > track it. > > - scalability and large analysis - the data overload issue. the > answer to me is smarter clients. > > - more queries -- addressed in das/2 > - entry point supports - in das/2 we have a less ambiguous way to say > whether a server points it or not. > - counting number of features of each type per source -- have the > 'count' format in das/2 > - refering to id's externally (das/2 uri's) > - errors and exception handling - we have http error codes -- remains > to be seen how well it works out. done a reasonable job to map it to > http error codes > - better stylesheets - in progress for das/2 > - mapping servers - different genome assembly versions or mapping from > protein to nucleotide space. -- under discussion with data > providers. > > ap: Another thing on wish list: people want to know stats per server, > uptime, hits, etc. (server stats). > gh: andreas' registry does a good job for das/1. biosapiens registry > is built on Andreas' registry. How many are up, which requests they > support, the data the server. Very nice. > > ap: Gregg's coverage was good. Also gave a very good advertisement for > das/2! > > gh: the das/1 to das/2 transformational proxy was quite > popular. doesn't take advantage of das/2 power, but gets people started. > > Other Topics: > -------------- > sc: biodas.org wiki is now officially up. > gh: mentioned to Tim Hubbard. He said, "I know. 
I already edited it." > > sc: globalseqids page needs das2xml snippets for coordinates. > > [A] lincoln will add das2xml coordinate snippets to globalseqids page on > wiki > > sc: might also be good to have notice of the next teleconf on the > site. Maybe pointers to the notes as well. > gh: maybe have an automatic email sent out reminding folks? > sc: maybe not, if we have a list of the dates for upcoming meetings on > the site. > > [A] Steve post list of dates of upcoming DAS/2 teleconferences on wiki > > Next meeting in two weeks: 19 mar 2007 > > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From Steve_Chervitz at affymetrix.com Mon Mar 19 17:47:57 2007 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 19 Mar 2007 10:47:57 -0700 Subject: [DAS2] Notes from the biweekly DAS/2 teleconference, 19 Mar 2007 Message-ID: Notes from the biweekly DAS/2 teleconference, 19 Mar 2007 $Id: das2-teleconf-2007-03-19.txt,v 1.2 2007/03/19 17:46:41 sac Exp $ Teleconference Info: * Schedule: Biweekly on Monday * Time of Day: 9:30 AM PST, 17:30 GMT * Dialin (US): 800-531-3250 * Dialin (Intl): 303-928-2693 * Toll-free UK: 08 00 40 49 467 * Toll-free France: 08 00 907 839 * Conference ID: 2879055 * Passcode: 1365 Attendees: Affy: Steve Chervitz, Ed Erwin, Gregg Helt CSHL: Lincoln Stein Note taker: Steve Chervitz Action items are flagged with '[A]'. 
These notes are checked into the biodas.org CVS repository at
das/das2/notes/ and are viewable on-line at
http://biodas.org/documents/das2/notes/

Instructions on how to access the DAS/2 CVS repository are at
http://www.biodas.org/wiki/DAS/2#CVS_Access

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.

Agenda
-------
* General issues
* Status reports, including report from Lincoln on hapmap and das2
* Gregg's post-grant status
* IGB support post-March

Topic: General Issues
----------------------
ls: Regarding the coordinate stuff for global seq ids, need
clarification (see my message on list).
gh: for each release we should have the xml snippet for the
coordinates, four attribs for authority, etc. so people can see
directly what they need to provide in their DAS/2 request.

[A] gregg will send global seq ID coordinate XML example to Lincoln

Topic: Status reports
----------------------
gh: working on getting good representations of graphs for Affy das/2
server serving up tiling array data. Serving up slices of
graphs. Working well on my test server, better than expected. Slow
thing is the indexing the first time it sees a file. Chrm1 at 5bp
resolution tiling array data, 120M data points, slicing indexing takes
a couple of seconds the first time, other times there's no delay. this
is serving up in an optimized format. Need to serve in std das/2
format with a feature per data point. Not too hard. Planning to deploy
in April when Steve gets new server running. Drosophila time-course
public data. 8-9 time points RNA expression tiling arrays.
When phase 3 ENCODE paper comes out, we'll have a pointer to our server
for viewing that data. Also need to beef up feat filter queries to
support full spec on the Affy das/2 server. transition IGB from using
quickload and replace all quickload stuff with das/2, so we don't need
to maintain two code bases and data repositories.

ls: hapmap das/2 server is up and running. temporarily at Brian
Gilman's consultancy business. He's coming here to CSHL to get a
permanent version running on hapmap.org by next week. There's a whole
API for accessing that data in the form that's required by NCI's caBIO
project (caCORE). After server goes up, I'll point coordinates that
location, documentation. It works with other das/2 sources as well
(Affy, biopackages).
gh: So it will put any of that DAS-available data into caCORE object
model?
ls: yes. It also can give data as DOM models, might be easier for some
users/apps.
gh: Rolling this into the next caBIO release?
ls: yes.

ls: Will provide snp's and haplotype blocks as features. one track per
population. we can put as many tracks in as you need. Just one set
now. There are 4 populations grouped into three panels, since two
pop's don't have enough diffs to break them out.

[A] lincoln send gregg pointer to current hapmap server for testing

sc: Working on configuring the new affy das/2 public server, a
replacement machine with a lot more RAM than current box. Have been
busy with other Affy work (new Netaffx release, new product support,
etc.) but should be mostly done with this by end of March. Should be
able to devote some solid blocks to DAS work (target: 3wks). Plan is
to support as many Affy products as we can. Less focus on supporting
UCSC-provided annotations (since they're the best source for them).

sc: Gregg, have you considered using the same approach for serving
annotations by your das/2 server as you are doing to support graphs?
Could ease memory requirements.
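
For readers unfamiliar with the idea, the kind of indexing strategy
being referred to for graph slices can be sketched roughly as below.
This is only an illustration, not the Affy server's code: the notes do
not describe its file format or index layout, so the GraphIndex class,
its method names, and the in-memory sorted-array design are all
assumptions made for the example. The one-time sort stands in for the
slow first-time indexing step; every later slice is two binary
searches.

```python
from bisect import bisect_left


class GraphIndex:
    """Hypothetical in-memory index over one sequence's graph data.

    Assumption for illustration only: data arrives as parallel lists of
    positions and scores. The real server's format is not described.
    """

    def __init__(self, positions, scores):
        # One-time cost: order the data points by position.
        order = sorted(range(len(positions)), key=positions.__getitem__)
        self.positions = [positions[i] for i in order]
        self.scores = [scores[i] for i in order]

    def slice(self, start, end):
        """Return (position, score) pairs with start <= position < end."""
        lo = bisect_left(self.positions, start)
        hi = bisect_left(self.positions, end)
        return list(zip(self.positions[lo:hi], self.scores[lo:hi]))
```

A slice request then only touches the data points inside the requested
range. Features lack this single uniform shape (one position, one
value), which is part of why the same trick does not carry over
directly.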
gh: possible, but not practical, since it would require a new format
for every feature type. Graphs are relatively straightforward to serve
up via an indexing strategy. Doing something similar for features
would mean essentially writing a database app.

Other Items:
-------------
gh: grant admin says our burn rate is lower than anticipated. we can
apply for a no-cost extension. should last at least till the end of
June as for funding. We'll apply for that. Not sure what it means for
CSHL. last time it took 3-4 mos to sort it out.
ls: start working on it now. there were communication problems in the
past. would be great if Allen could extend another month or two.

gh: Andrew will come visit me in the next day or two. Will get the
latest from him. He's been working on the transformational das1->das2
proxy. Want to get the Ensembl people to use it ASAP.

[A] get a usable das1->das2 proxy server, deploy at Ensembl

gh: Need to look at how to support scores in das/2. we dropped score
element. You can add arbitrary elements to das/2. You can put in
multiple diff scores that way, or use XML namespaces to bring in a
das/2 score element. Want to have a recommended way of doing
this. Need more input from others. In Europe they're using score a lot
more than here in the States.

[A] come up with recommended way to support scores in DAS/2

Topic: Gregg's agenda
----------------------
gh: I am planning to leave Affy at end of the grant. Will focus on
doing hands-on DAS/2 evangelism, ideally work with UCSC. Then will
take some time off. Affy wasn't interested in supporting das w/o some
outside funding. Therefore, it's a good time to transition.
Regarding UCSC: ready to go down there and write some code. They have
a das/1 server, they just need someone with DAS/2 expertise that I can
provide. biggest prob with das/2 is adoption outside of the grant
people.

sc: considered using Andrew's proxy?
gh: might be OK for a temporary solution, but it wouldn't be as
efficient as directly supporting das/2, and I know Jim et al are
interested in efficiency. Since I'm in the area, I can help them get
into DAS/2 directly, which would help with DAS/2 acceptance by the
community.

gh: Another goal was to have a DAS/2 paper ready and submitted before
I leave, want to have a rough draft in April. Plan to submit to an
open source journal: BioMed Central, PLoS, or other.

[A] Gregg will circulate draft of DAS/2 paper, draft in April.

Topic: IGB Support
-------------------
ee: Regarding IGB support, Affy is not supporting IGB after March,
they are moving me to a different project. Support for IGB could
return if there's enough interest.
sc: how self-supporting is the igb community?
ee: not much.
gh: Ann Loraine has interest as do internal Affy users.
sc: Sourceforge has a new wiki project that's in beta now, for adding
a wiki to your project's web page. Could help make the IGB community
self-supporting, on-line docs, FAQ, etc. I volunteered to participate,
but haven't done anything with it yet.
gh: IGB has a good user's guide now, thanks to Ed's recent update.
ee: I'm also working on plugin interface and documenting the http API
protocol, things that will make it easier for others to use IGB with
other programs.