From Gregg_Helt at affymetrix.com Sun Oct 1 22:26:20 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Sun, 1 Oct 2006 19:26:20 -0700
Subject: [DAS2] No Monday teleconference this week -- switched to biweekly call
Message-ID:

Just wanted to remind everyone that we decided last month to switch from a
weekly to a biweekly DAS/2 teleconference schedule. So the next DAS/2
conference call will be on Monday, October 9th at 9:30 AM PST.

Conference phone #, US: 800-531-3250
Conference phone #, International: 303-928-2693
Conference ID: 2879055
Passcode: 1365

Thanks,
Gregg

From Steve_Chervitz at affymetrix.com Wed Oct 4 13:42:46 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Wed, 04 Oct 2006 10:42:46 -0700
Subject: [DAS2] Updated java runtimes for timezone change in 2007
Message-ID:

Yes, the Bush administration's reach extends into the lives of Java
developers, changing when DST starts and stops in 2007. Here's a link
for updated Java runtimes for a variety of versions:

http://java.sun.com/developer/technicalArticles/Intl/USDST/

This could be an issue for DAS, particularly for writeback. Some
implementations may rely on consistent time-stamping, e.g., to
determine which edit request was submitted first. May not make a
difference within a server, but it would be an issue across multiple
servers.

Steve

From Steve_Chervitz at affymetrix.com Mon Oct 9 13:30:42 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 09 Oct 2006 10:30:42 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 9 Oct 2006
Message-ID:

Notes from the weekly DAS/2 teleconference, 9 Oct 2006

$Id: das2-teleconf-2006-10-09.txt,v 1.1 2006/10/09 17:24:14 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
  UCLA: Allen Day, Brian O'Connor

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006.
Instructions on how to access this repository are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.


Agenda
-------
* Status reports


Topic: Status reports
---------------------
gh: Funding thru end of May. shifting times around a bit here at
affy. gh going up to a greater percentage during this period.
going down to half time for next month due to house-related work.

Focusing now on cleaning up impl of writeback on igb client. clean
impl based on ideas sketched out at code sprint in Aug.

Spec issue:
-----------
gh: was there a resolution to the feature group assembly conversation
on email thread.
aday: died out. so the assumption is: no change.

[A] Ask andrew about feature group assembly resolution, if any.


ee: new release of IGB. bug fix then patch release. rapid turn
around. Exposed need for more thorough testing.
Specifying multiple urls for get more info links. sources for urls:
track lines in psl/bed files. Also supporting das files (1 and
probably 2)
noticed: feature tag can give feat label and ID. IGB ignores these
labels, because they seem to be attached to wrong thing. feat in das/1
is like 'exon' group is 'mrna'. it's the mrna we want the label on,
not exon where the labels are on.

gh: if people just label parent. names don't have to be unique. id is
unique uri, name is displayed name. parser isn't looking into that now.

[A] Ed will look into using feature name as label in IGB client


sc: Installed updated das2_server code on the affy das/2 server
(netaffxdas.affymetrix.com).
Installed new, efficient version of exon
array data for hg18 (Mar 2006) assembly on this server. (igb's 'Bprobe1'
parser, generates new bp2 format files). Probe and probeset data
loaded fine, but exon/transcript cluster data failed with exception
about 'Probe_count is zero for '.

gh: problem: the bp2 data format isn't designed for representing
transcripts/exon just probe. problem in the part that generates the
bp2 files. can take a look at that.

[A] Gregg will look into steve's Bprobe1 parser error. Needs source gff.

ee: Can you verify that the gff data you are loading doesn't have
unmapped probes, probe sets? Some are not mapped after lifting from
previous genome assembly.

[A] Steve will remove unmapped objects in the source gff used for bp2


aday: working on UML for integrating the writeback and the read
features. Also retrieval of dynamic features as well. Sent out example
query. working on getting them all into a single model, determines
what to do based on input query.

will impl own block caching rather than apache caching.
If I see a writeback coming in, can see which types have been
modified, within each region. can fork off process to re-generate them
after doing the writeback. will be a lot faster.

Have a flowchart. partway through creating UML classes, functions,
return types. Using poseidon.

[A] Allen will distribute uml diagrams for das/2 modeling when ready

gh: will locking be a part of that?
aday: can make sure it's compatible. don't know how much of that to
impl now.
gh: useful to think about how to model that too.

[A] Allen will include locking in his UML modelling.

aday: flowchart is pretty generic. can be used by other servers.


bo: no das work because of work on manuscript.
started sourceforge project for das/2 assay "gyrax" (nee hyrax --
already taken at sf).
The motivation for this project is to take the das/2 objects in igb
and make them more generic. This project can host these objects.
They could then be used for other apps (igb, gyrax, others). Mark
Carlson in lab is working on the gyrax client. Could be a nice
library for use by other apps, gui or not, that are built on top of a
das server.

gh: parts of the igb objects are tied into genometry model, a separate
package also. but both of these could be separated from igb.

ee: There was some email on genoviz forum where someone is writing an
app based on old NGSDK objects, on the help forum on
sourceforge. problems with >30,000 glyphs. advice: switch to efficient
glyph versions (special drawing alg if children are too small to see).

gh: Lots of caveats...There is code that hasn't been touched in a
while.

gh: question about hardware quote for UCLA

[A] Allen will send gregg hardware quote for UCLA (<$5k)

sc: status of hardware for affy das server upgrade?

gh: plan to order end of oct, should have in place in first two weeks
of nov.

From allenday at ucla.edu Tue Oct 10 18:30:14 2006
From: allenday at ucla.edu (Allen Day)
Date: Tue, 10 Oct 2006 15:30:14 -0700
Subject: [DAS2] biopackages server UML
Message-ID: <5c24dcc30610101530p3bb3b686p4055a813d0ad39ff@mail.gmail.com>

Hi,

I'm attaching my first draft for the UML of a server rewrite. Aside
from all the spec churn, there are two main types of requests that need
to be handled that spurred me to do this rewrite. The third reason I'm
doing this is to rework the caching mechanism on the server. With the
current code base there is a lot of custom table clustering and
denormalization to get decent performance out of the Chado database. I
did some experimenting (discussed in an earlier thread and on conf.
calls) with a "tiling" or "block" caching strategy that turns out to
work really well, and I wanted to integrate that with the writeback
functionality.

1) tighter integration of writeback, including locking.
2) configurability of feature types to be
   * dynamic (e.g.
     for on-the-fly gene prediction)
   * non-cacheable
   * cacheable
3) caching
   * segment range/type tiled caching
   * ability of writeback events to trigger cache flush events

See attached UML. There is a .zuml file, you can view/edit with
Poseidon, or if you need a .xml I can send another attachment.

-Allen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: das2_refactor.zuml
Type: application/octet-stream
Size: 34991 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: das2_refactor.png
Type: image/png
Size: 130013 bytes
Desc: not available
URL:

From boconnor at ucla.edu Tue Oct 10 18:51:54 2006
From: boconnor at ucla.edu (Brian O'Connor)
Date: Tue, 10 Oct 2006 15:51:54 -0700
Subject: [DAS2] biopackages server UML
In-Reply-To: <5c24dcc30610101530p3bb3b686p4055a813d0ad39ff@mail.gmail.com>
References: <5c24dcc30610101530p3bb3b686p4055a813d0ad39ff@mail.gmail.com>
Message-ID: <452C240A.4060603@ucla.edu>

Hi Allen,

I have a few questions.

* How does feature (and other data types) filtering take place? Does
  the controller pass info into read_features() in
  Das2::Model::Genome? Where is the actual filtering implementation?
  In Das2::Model::Genome::Feature?

* Where will the SQL queries live? In the current implementation we
  have an object where many of the prepared statements live. Do you
  plan on using something similar here? Or will the SQL generally be
  embedded in Das2::Model::Record objects and
  Das2::Model::Genome::Chado?

* For the Das2::Model::Record subclasses, should there be another
  layer of inheritance with a Das2::Model::Chado::Record object? In
  case you want additional data adapters for other DBs/flat files in
  the future?

--Brian

Allen Day wrote:
> Hi,
>
> I'm attaching my first draft for the UML of a server rewrite. Aside
> from all the spec churn, there are two main types of requests that need
> to be handled that spurred me to do this rewrite.
> The third reason I'm doing this is to rework the caching mechanism on
> the server. With the current code base there is a lot of custom table
> clustering and denormalization to get decent performance out of the
> Chado database. I did some experimenting (discussed in an earlier
> thread and on conf. calls) with a "tiling" or "block" caching strategy
> that turns out to work really well, and I wanted to integrate that
> with the writeback functionality.
>
> 1) tighter integration of writeback, including locking.
> 2) configurability of feature types to be
>    * dynamic (e.g. for on-the-fly gene prediction)
>    * non-cacheable
>    * cacheable
> 3) caching
>    * segment range/type tiled caching
>    * ability of writeback events to trigger cache flush events
>
> See attached UML. There is a .zuml file, you can view/edit with
> Poseidon, or if you need a .xml I can send another attachment.
>
> -Allen
>
> ------------------------------------------------------------------------
>

From dalke at dalkescientific.com Mon Oct 23 12:19:03 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 23 Oct 2006 17:19:03 +0100
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 9 Oct 2006
In-Reply-To:
References:
Message-ID:

On Oct 9, 2006, at 6:30 PM, Steve Chervitz wrote:
> [A] Ask andrew about feature group assembly resolution, if any.

As far as I know there was no resolution. At last standing the
problem is as follows.

Consider a complex annotation with a single parent A and a single
child B. There are several ways to represent this.

Option 1:

This is the current spec. Parents point to children and children
to parents. This was different than the GFF-style where only
the children have a parent reference.

My hope was to assemble complex annotations while reading the
data from the remote server. In practice this streaming assembly
proved hard to implement.
The algorithm is non-trivial for complex
structures so most people will do the assembly only after reading
all features.

Also, there's a possible error when parents don't list all children
or vice versa, and likely most clients won't fully validate, so
a top-down and a bottom-up assembly may give different results for
the same server.

Option 2:

This is the GFF-style. The main limitations are support for streaming
data, such as showing partial results while downloading and converting
to/from other formats. In both cases this is because parent nodes may
(and do) occur after children nodes, and there's no knowledge that all
children have been seen.

There is a problem in both Option 1 and Option 2 of not easily
detecting cycles or multi-rooted structures.

Variation: require that children are listed after parents.

Option 3:

That is, put all features which are part of the same feature group
into a single element. This is essentially like the ### "no forward
references" token in GFF3.

It's cumbersome because either there are two data types
("FEATURE-GROUP" and "FEATURE") elements under the root or
there are a lot of FEATURE-GROUPs containing a single sequence.

There's still the need for cycle detection and checking that the
parent/part relationships are valid.

Option 4:

Break the DAG into a tree structure (a spanning tree).

In this case "B" is a child of "A". For a more complex structure
where "C" is a child of "A" and "B",

This doesn't fit well with relational databases. There's still
the need to check for cycles but it's much simpler.

Given the feedback I've heard, the use cases for streaming the data
are not seen as important. Hence I'm willing to go with #2 (GFF-style,
children point to parents) and have nothing like the
no-forward-references of GFF3.
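
A minimal sketch of what option 2 could look like on the client side,
assuming all features have already been read and each one lists only
its parents. The function name and the dict-based input are
hypothetical, chosen for illustration; they are not part of the DAS/2
spec or of any existing client. It builds the child lists, then checks
for unknown parents, cycles, and features that never reach a root.

```python
from collections import defaultdict


def assemble(parents_of):
    """Assemble parent-only (GFF-style) links after all features are read.

    parents_of maps each feature id to the list of its parent ids
    (an empty list for a top-level feature). Returns (roots, children_of).
    Raises ValueError on an unknown parent or on a cycle.
    """
    children_of = defaultdict(list)
    for feature_id, parent_ids in parents_of.items():
        for parent_id in parent_ids:
            if parent_id not in parents_of:
                raise ValueError("unknown parent %r of %r" % (parent_id, feature_id))
            children_of[parent_id].append(feature_id)

    roots = [fid for fid, pids in parents_of.items() if not pids]

    # Depth-first walk from the roots. Meeting a feature that is still
    # on the current path means the links form a cycle.
    on_path, done = set(), set()

    def visit(fid):
        if fid in done:
            return
        if fid in on_path:
            raise ValueError("cycle through %r" % fid)
        on_path.add(fid)
        for child in children_of.get(fid, ()):
            visit(child)
        on_path.discard(fid)
        done.add(fid)

    for root in roots:
        visit(root)

    # A feature that no root reaches has an ancestor chain that never
    # ends, so it is on, or descends from, a cycle with no root.
    unreachable = set(parents_of) - done
    if unreachable:
        raise ValueError("not reachable from any root: %r" % sorted(unreachable))
    return roots, dict(children_of)
```

For the two-level case above, assemble({"A": [], "B": ["A"]}) gives
one root "A" whose only child is "B". A feature with two parents is
accepted, since the structure is a DAG rather than a tree.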
Andrew
dalke at dalkescientific.com

From lstein at cshl.edu Mon Oct 23 10:01:01 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 23 Oct 2006 10:01:01 -0400
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 9 Oct 2006
In-Reply-To:
References:
Message-ID: <6dce9a0b0610230701q1898dc79wa3a3ff56814ff37e@mail.gmail.com>

Hi Folks,

I'm going to miss today's conference call again. I've been scheduled to
interview a job candidate and I can't change it.

Lincoln

On 10/9/06, Steve Chervitz wrote:
>
> Notes from the weekly DAS/2 teleconference, 9 Oct 2006
>
> $Id: das2-teleconf-2006-10-09.txt,v 1.1 2006/10/09 17:24:14 sac Exp $
>
> Note taker: Steve Chervitz
>
> Attendees:
>   Affy: Steve Chervitz, Ed Erwin, Gregg Helt
>   UCLA: Allen Day, Brian O'Connor
>
> Action items are flagged with '[A]'.
>
> These notes are checked into the biodas.org CVS repository at
> das/das2/notes/2006. Instructions on how to access this
> repository are at http://biodas.org
>
> DISCLAIMER:
> The note taker aims for completeness and accuracy, but these goals are
> not always achievable, given the desire to get the notes out with a
> rapid turnaround. So don't consider these notes as complete minutes
> from the meeting, but rather abbreviated, summarized versions of what
> was discussed. There may be errors of commission and omission.
> Participants are welcome to post comments and/or corrections to these
> as they see fit.
>
>
> Agenda
> -------
> * Status reports
>
>
> Topic: Status reports
> ---------------------
> gh: Funding thru end of may. shifting times around a bit here at
> affy. gh going up to a greater percentage during this period.
> going down to half time for next month due to house-related work.
>
> Focusing now on cleaning up impl of writeback on igb client. clean
> impl based on ideas sketched out at code sprint in Aug.
>
> Spec issue:
> -----------
> gh: was there a resolution to the feature group assembly conversation
> on email thread.
> aday: died out.
>
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/das2
>

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From Steve_Chervitz at affymetrix.com Mon Oct 23 21:17:46 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 23 Oct 2006 18:17:46 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 23 Oct 2006
Message-ID:

Notes from the weekly DAS/2 teleconference, 23 Oct 2006

$Id: das2-teleconf-2006-10-23.txt,v 1.1 2006/10/24 01:15:21 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Gregg Helt, Ed Erwin
  UCLA: Allen Day
  Dalke Scientific: Andrew Dalke

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.


Agenda
-------
* Status reports
* Spec discussion


Status Reports
---------------
[Note: lots of digressions within status reports]

ad: Have been looking at how Tim Hubbard's group is using das/1.
gh: you are acting as our proxy to the uk group.

gh: andreas has been working on das registry.
ad: yes, in use for both das/1 and 2 servers.
gh: am interested in his work to ping servers to test for live-ness.

gh: see my response on das discussion list to Brian Gilman's message.
where to find das/2 servers to hit on. biopackages was not giving
correct answers for sources query.
ee: was true two weeks ago.
aday: just a bug.
gh: we need to get both servers fixed. need an automated way to figure
out when servers are down, such as what andreas is doing with das/1.

[A] Andrew will ask Andreas about live-ness test for das/2 as well.

gh: andrew's validator could be scripted to do this, too.
gh: your validator is not running, btw.
ad: server rebooted, not set up to restart automatically.

[A] andrew will see that his validator server is up (done).

gh: affy server is serving up incorrect xml base now. code is set up
to allow specifying which xml base to use.

[A] steve will fix xml base on affy server

gh: need to use four arg version: port, data dir, email for
maintainer, xml:base
without xml:base, everything goes screwy
gh: Andrew's validator should catch this since xml:base resolution of
capabilities would resolve to local host which would throw an error.
ad: yes.

gh: Andrew: you are focusing on das now?
ad: this week at EBI, then next month focusing on DAS work.


Status (continued)
-------------------
gh: this week - distracted by igb issues, also on 1/2 time this
month, so no new das work to report.

ee: gff3 parser, got feedback from lincoln.
adding support for track lines, several of our parsers
there is a diff between the way igb puts things into tracks and the
way the ucsc browser puts things into tracks.
in igb: we put things into tracks based on source field. so one file
can lead to multiple tiers.
in ucsc: everything below track line goes into one track.
Soln: if there are track lines, do it the way UCSC does it. Otherwise,
do it the igb way.

Also worked on coloring by score (affects gff, ed, and one
other). Makes it similar to ucsc. Assumption is white background. It
is rigged to be based on normal foreground and background
colors. white = ucsc

Also participated in the java "ask the experts" thing: asked about
swing, but they didn't answer.
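
The track-line rule Ed describes above might be sketched as follows.
This is an illustration only, not IGB code: the function name is
hypothetical, the input is assumed to be tab-separated GFF-style lines
with the source in the second column, and features that appear before
any track line are assumed to go into a track called "default".

```python
import shlex
from collections import OrderedDict


def assign_tracks(lines):
    """Group annotation lines into tracks.

    If the file has any track lines, follow the UCSC convention: every
    feature goes into the track named by the most recent track line.
    Otherwise follow the IGB convention: group by the source field.
    """
    lines = [ln.rstrip("\n") for ln in lines
             if ln.strip() and not ln.startswith("#")]

    def is_track_line(ln):
        return ln.split(None, 1)[0] == "track"

    has_track_lines = any(is_track_line(ln) for ln in lines)
    tracks = OrderedDict()
    current = "default"  # assumed name for features before any track line
    for ln in lines:
        if is_track_line(ln):
            # Track lines look like: track name="my track" key=value ...
            attrs = dict(tok.split("=", 1)
                         for tok in shlex.split(ln)[1:] if "=" in tok)
            current = attrs.get("name", "unnamed")
            continue
        key = current if has_track_lines else ln.split("\t")[1]
        tracks.setdefault(key, []).append(ln)
    return tracks
```

With no track lines, a file holding genscan and twinscan features
yields two tracks; the same features below a single track line all
land in that one track.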
gh: das2 style sheets?
ee: yes, how free am I to change that spec?
ad: go for it.
ee: don't want spec to say you need to use certain shaped glyphs --
hard to support. just simple things - colors, labels.
ad: asked uk folks about style sheets, they haven't done anything.
gh: gbrowse (lincoln) uses style sheets for das/1.
ee: the stuff in das/2 comes from das/1?
ad: yes, with some changes.

ee: also need to do documentation.

sc: worked on adding data for currently unsupported arrays on the Affy
DAS/1 server to the quickload directory. Got some requests for mouse
assembly aug 2005, RG-U34 rat arrays. Didn't update the annots.txt
yet, so IGB users won't know they are available.

[A] steve will update affy quickload annots.txt

sc: ideally, this should be automated.
gh/ee: could possibly have IGB detect these without needing to update
an extra file. But there was no standard way to read directory
contents.

gh: chp files have no genomic location for probe sets, so igb needs to
look this up, likely via das/2 server. primary way for people to look
at results in igb.

sc: did some work on loading exon array annotations into das/2 server
using gregg's new bp2 format (reported last time). Didn't see any
justification for the "probeset with zero probes" error it threw.

[A] gregg and steve will look into bp2 format parsing issues

[A] gregg will put in order for new hardware for affy das server

aday: porting gff3 into writeback server as an alt format for loading
data in. Email thread with Ed - ambiguities in the gff3 specification

[A] Allen will forward email to list.

aday: some communication with lincoln's group, re: validator. I need
to create some sample gff3 docs to make sure validator can parse them
all. will add support to parser in bioperl (likely).
Re: alignments: target and source have to be stranded, length of one
has to be equal to or less than the one it's aligned to, etc.

No work on server uml. hold off until spec is finalized before
committing to uml model.
Eg., fasta response not mentioned, broken hyperlinks, no response from
Andrew.
gh: fasta?
aday: referred to but not described. properties response mentioned but
not described. fasta has been replaced by segments, properties
gone. See email on list.
sc: sequence retrieval command used to return fasta format, hence the
fasta request. this has been replaced with segments, but spec not
updated.
gh: property capability?
aday: yes. not sure how to proceed yet.

[A] Andrew will fix/respond to issues raised by Allen.

gh: another spec issue: last code sprint I didn't like semantics of
range feature filters, I eventually caved to majority.
caveat: I wanted an optional attrib in types doc to say: "here's a
type but you can or cannot use it in search filter." I.e., optionally
restrict which types you can use in those filters. If false, it
indicates to client it shouldn't use it as a searchable thing.
ad: if it does anyway?
gh: server could throw an error
ad: or not return any results of that type?
gh: ok
ad: reason for this? is there a better word than 'searchable'? w/r/t
the problem domain.
gh: the reason: I want people to search for 'genscan transcripts' not
'genscan exon' because of how we decided to do range queries.
ad: not sure why someone would want to do this.
gh: it was agreed on at last code sprint...

[A] gregg will write up use case for range feature filters underlying
his need

ad: Regarding parent and child bidirectional feature pointers: I'm
willing to say that there's no need to assemble features dynamically
on streaming approach. so we can get rid of parent or child
relationship. make it more like gff3 to have parent link only.
gh: worried about not having full closure. could get parents that
don't know about child. if you have child, do you then have to have
every parent in the response?
ad: I thought we required it? if there is a feature then all features
in that group must be returned.
ee: never a fan of specifying both parents and children.
can lead to mistakes - not compatible. andrew says parsing is more
difficult...
ad: when processing input you know when done with a feature
group. this is useful. if no one impls it why have the overhead?
ee: impl doesn't seem difficult
gh: my impl doesn't catch cycles. still have to do cycle check
regardless if it was bi-directional.
ad: can't find a simple algorithm for doing it.
gh: keep children around. check if tree is complete. bidirectionality
allows me to crawl tree.
ad: you don't check for cycles or multiply rooted trees.
ee: just assume there are not such problems.
ad: I don't like bogus data.
ee: my gff3 parsing, I wait until end to assemble things.
ad: as mine does, too. worried about extra fields means more
possibilities of breaking things. bad data.
ee: should be able to detect bad data.
ad: duplicate links means you can't assemble from one but not
other. most people will not check both.
gh: main justification was to get complete feats before end of
doc. lincoln was the one who wanted this ability.
ad: several ways to do it. eg. contained feature elements with all
children, spanning tree, etc.
ee: catching loops is hard, need to wait till end.
gh: let's wait till lincoln comes in.

[A] Everyone will revisit bidirectional parent-child pointers with
Lincoln


Other issues:
-------------
ad: Regarding Brian's question from email, the xml document he sent.
gh: my reply: document was otherwise correct but xml:base was wrong.
ad: also: lowercase close types element at end.

ad: know anything about brian's deadline mentioned by lincoln?
gh: no.

[A] Someone will send Brian pointer to Andrew's validator.

ee: das/2 impl is not usable by igb now. need to fix top-level
document.
gh: we really need an automated way to know when server is having
problems.

gh: conf call with Andreas and others in UK? can set up a conf call
to talk about registry. Also coordinate mapping - when one system is
the same as the other. ties into registry stuff.
[A] Gregg/Andrew maybe will have conf call with Andreas while Andrew
is in UK

From dalke at dalkescientific.com Tue Oct 24 05:17:58 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 24 Oct 2006 10:17:58 +0100
Subject: [DAS2] das2 diagrams, questions
In-Reply-To: <5c24dcc30609210030k5324378fy18990dc41a1f1b1e@mail.gmail.com>
References: <5c24dcc30609210030k5324378fy18990dc41a1f1b1e@mail.gmail.com>
Message-ID: <57ca007c161fd08f104c8bb87e4127ac@dalkescientific.com>

Allen:
> I have a few questions, mostly targeted at Andrew, regarding the
> current HTML version of the spec on the biodas.org site. It hasn't
> been updated in about 5 months, and looks pretty out of date.

Strange. The last changes were in August.

> * Is the HTML document in sync with the "new_spec.txt" document in CVS?

It should not be. That was a text document I was working on
back in Jan/Feb as part of the update to the current version of
the spec.

I've removed it from CVS. (Even though I know it's CVS, my fingers
keep typing "svn" :)

> * There is mention of a "fasta" command, and its fragment is linked
> from the ToC of the genome retrievals document, but it does not
> appear in the document. Does this command exist? My understanding
> from conference calls is that the sequence/fasta/segment/dna stuff
> has all merged into the "segment" response. Is this correct?

That is correct. There is a segments request. Passing "format=fasta"
to a segment request returns the sequence in FASTA format.

I didn't catch that line when I was doing the changes. I've removed
it from CVS.

> * The "property" command seems to have disappeared. Is that correct?
> Are property keys no longer URIs? Also the "prop-*" feature filters
> could be better described, it is not clear to me if they are meant
> as some sort of replacement for the property command.

The property command has disappeared. Notes are at
das2-teleconf-2005-11-28.txt

It was replaced by two things.
One is the key/value PROP table,
which is meant to store simple string data. It should be considered
to be user-editable, eg, as a property sheet. The "prop-*" commands
are used to search that table.

The other is the non-DAS namespace'd XML extensions. For example,

...

In this case there is no default search mechanism. Instead the
server may declare that it implements a map-specific search extension
to the DAS query language, or a new search interface, and clients
which understand the extension can add support for it.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Tue Oct 24 10:03:54 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 24 Oct 2006 15:03:54 +0100
Subject: [DAS2] XML-RPC based DAS2 validator
Message-ID: <4c0809629f5d0e26693547964e86d6c9@dalkescientific.com>

I've added an XML-RPC service to the DAS validator. Andreas will be
able to use it to verify new DAS2 entries in his registry.

The entry point to the XML-RPC server is

  http://cgi.biodas.org:8080/RPC2/

The trailing "/" is important - use ".../RPC" and the server will
do an HTTP redirect to ".../RPC/", which not all XML-RPC clients
understand.

At present the server implements a single RPC method named
"validate_url". It takes two positional fields. The first is the
required URL to validate. The second is the optional document type to
validate against. If not given then the server will attempt to guess.

The response is a list of 2-element tuples. In each pair the first
is the severity level and will be one of

  "info"
  "warning"
  "error"
  "fatal"

"fatal" means the validator normally should not continue. I can
override that, which I do in the XML-RPC service in order to generate
more messages.

"error" means the result does not meet the spec but the validator
will continue checking, at least in the normal case. (That too is
user-defined.)

"warning" is for things which are suspicious but not wrong, like
using "application/xml" instead of the DAS2 content-type, or having
a uri field with empty content. (This is legal; it refers to the
document itself. It's just strange and likely indicates an error in
the server.)

The "info" is for niggling details, like that the server guesses the
document type (in the case of application/xml response) by looking
at the tag for the top-level element.


Here's an example in Python's interactive shell. I'll first make a
proxy to the remote server

>>> import xmlrpclib
>>> server = xmlrpclib.Server("http://cgi.biodas.org:8080/RPC2/")

then call the new method with a single parameter; the URL to validate.

>>> server.validate_url("http://das.biopackages.net/das/genome/human/")
[['info', "Assuming doctype of 'sources' based on Content-Type"]]

That's a list with a single element containing the (severity, message)
tuple. The info statement came because it guessed the document type
based on the content-type from the server. I can specify the document
type directly and skip that warning statement

>>> server.validate_url("http://das.biopackages.net/das/genome/human/", "sources")
[]

Here's an example of validating a server with the wrong document type,
to show what the error messages look like.
I've added newlines so the results aren't all on one string

>>> server.validate_url("http://www.dasregistry.org/registry/das1/sources", "types")
[['fatal', "Received Content-Type 'application/x-das-sources+xml', expected 'application/x-das-types+xml'."],
 ['fatal', "Expected element '{http://biodas.org/documents/das2}TYPES' but got '{http://biodas.org/documents/das2}SOURCES' at byte 41, line 2, column 2"],
 ['error', 'element "SOURCES" from namespace "http://biodas.org/documents/das2" not allowed in this context at byte 41, line 2, column 2']]
>>>

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Wed Oct 25 13:42:32 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Wed, 25 Oct 2006 18:42:32 +0100
Subject: [DAS2] DAS2 validation service
Message-ID:

I've updated the DAS2 validation service in a couple of ways. One was to improve the error handling, eg, point it to slashdot.org (not XML), slashdot.org/blahblah (404 - not found) or to blahblah.blah (host does not exist) and it reports an error instead of raising an exception.

There was a problem of sorts with the XML-RPC server. I chose XML-RPC yesterday because I thought it would be dead simple to use in any environment. It's old, stable technology.

Andreas tried a few Java XML-RPC clients and found there were various hard-to-resolve dependencies. Eg, the most modern one requires Java 1.5 but his system runs 1.4, and the older one requires some XML DOM parser which isn't included with the system and proved hard to track down.
Rather than struggle to make that work, I've added a new HTTP interface for automated validation.

The URL is

  http://cgi.biodas.org:8080/validate_url

It has a required parameter, "url", which is the URL to validate

% curl 'http://cgi.biodas.org:8080/validate_url?url=http://slashdot.org/'

It has an optional parameter "doctype" which is the document type to expect

% curl 'http://cgi.biodas.org:8080/validate_url?url=http://das.biopackages.net/das/genome/human/;doctype=sources'

In that last case there were no messages. The XML document is

* A note about the doctype.

If the server could not get the document then the validation will not have a doctype even if you gave it one.

% curl 'http://cgi.biodas.org:8080/validate_url?url=http://slashdot.org;doctype=types'

If you tell it the wrong doctype and it gets something in XML then it assumes the response is in the given doctype

% curl 'http://cgi.biodas.org:8080/validate_url?url=http://das.biopackages.net/das/genome/human/;doctype=types'

If no input doctype is given then it will guess at the doctype based on analysis of what it got from the remote server

% curl 'http://cgi.biodas.org:8080/validate_url?url=http://das.biopackages.net/das/genome/human/'

This XML should be easy for anyone to parse.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Thu Oct 26 05:06:33 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 26 Oct 2006 10:06:33 +0100
Subject: [DAS2] stylesheets meeting
Message-ID: <22090f570d5179afc3fe71a0768ed2ec@dalkescientific.com>

I met yesterday afternoon with Andreas Prlic, Andreas Kahari and Eugene Kulesha to get information about their stylesheet needs. Ed said he would work more on the spec and this should provide some relevant information.

We ended up talking about the stylesheet using a sort of CSS approach. There are selectors (feature uri, type uri, etc.) and properties (color, glyph shape, ...). Some of the properties inherit/cascade and others don't.
There's nothing new in this; we talked about it during the 2nd sprint.

The details of inheritance prove tricky. For example, consider

  [ Feature A ] ---- is of ---> [ Type 1 ]
        |
     contains
        |
  [ Feature B ] ---- is of ---> [ Type 2 ]

where each feature and type has a style sheet. The property (say "color") for Feature B is determined first by the stylesheet for Feature B, then that of Type 2. If still not present, does it come from the parent(s) of Feature B and the parent's type?

Given that that requires correct traversal in the face of multiple inheritance, I'll now argue "no", even though this is an effectively solved problem in OO programming ("C3 method resolution order", from Dylan and also used in Python, Perl6, and others). It's complex enough to make it unjustifiable.

The selectors people wanted are:
  - the feature type, based on its uri
  - the feature itself, based on its uri
  - view type, that is, "2D" vs "3D". Akin to "screen", "paper", in CSS. Andreas P's DAS-based structure viewer uses very different stylings ("ribbon", "vdw") than sequence.

Note: only "and" selections are requested. There seems to be no need for selection like "features of type T1 which are descended from feature F2".

Other possibilities are:
  - selectors based on the type ontology uri
  - application-specific styles (but this is probably handled best through properties and not through a selector; on the other hand, it would enable workarounds for app-specific bugs)
  - level of detail (but Eugene didn't even know this option existed in DAS1, so perhaps it's not needed for DAS2)
  - support for overrides in case of stylesheet conflicts (user overrides server overrides application, most recent definition overrides previous)

For the view and the application selectors a space separated list seems reasonable, as

  view="2D 3D" ... color as yellow

meaning that for 2D and 3D to draw the feature in yellow. Or just leave out the selector.

One question was how to find the stylesheet.
They can be listed in the SOURCES document but I was thinking they could also be listed in the FEATURES response, as

Another question is the format of that selection language. That was quickly answered: "in XML".

I brought up Ed's comment about (if I understand correctly) making the shape language a bit more abstract. For example, in DAS1 there's a GLYPH called "PRIMERS", while the others are names like "EX" and "ARROW". The general view is that this level of abstraction isn't useful. Andreas Prlic summarized it nicely as (reworded) "the goal of a stylesheet is to make things concrete". Though perhaps an SVG-style set of drawing commands may be useful.

That said, there may be a few things which need a more domain-specific name. The example which came up is in color. EBI has "contig blue" as a color name. Are there other colors like that?

On the topic of colors, the desired colors are the CSS color names (though in-house they also have the X11 names) and the CSS-style #color #selection, as #0FF for cyan. The #RGB and #RRGGBB color names are sufficient. Other CSS variations, like rgb(255, 0, 0) and rgb(10%, 45%, 82%), are not needed.

In the meeting I mentioned alpha/opacity values in CSS as #RGBA and #RRGGBBAA. In writing these notes up I see that CSS does not support that syntax. Alpha is a "wouldn't it be cool if .." feature and not one which is needed or specifically requested.

I outlined support for more complex font information for DAS2. Feedback here says that's not important. There's no desire to change the font size, style, etc. Nor desire for super/subscript, underscore, italics, bold, condensed, etc.

I asked about standardizing the drawing model so there is more consistency between different viewers. For example, if there is a glyph and a piece of text, where is the text drawn in relationship to the glyph? Does the height of the glyph include both? There was no desire for this.
On the other hand, a current user-specified option is where to draw the text, which corresponds to a stylesheet override.

What they want is support for plots and color gradients. See the "Gradient" and "TilingArray" entries at

  http://www.ensembl.org/Homo_sapiens/contigview?conf_script=contigview;vc_start=25422500;vc_end=25447499;region=17;add_das_source=(name=Gradient+url=http://das.ensembl.org/das+dsn=hydraeuf_00001350+type=ensembl_location_chromosome+stylesheet=y+score=c+fg_merge=a+fg_grades=50+fg_data=l+fg_max=310+fg_min=-143+active=1);add_das_source=(name=TilingArray+url=http://das.ensembl.org/das+dsn=hydraeuf_00001350+type=ensembl_location_chromosome+stylesheet=y+score=s+fg_merge=m+active=1

I can think of several ways to handle that. One is to declare a feature for the entire chromosome, as and viewers can use some agreed upon protocol to get the right data from somewhere/else.

Another is

  [base64-encoded GIF data omitted]

with an agreed upon definition of how to interpret the in-line data. But for the entire genome this could be rather big. Another is to break it down into parts, as

  ... data for the first 10,000 bases ...
  ... data for the second 10,000 bases ...
  ...
There is already the need for displaying images on the display, but the current use is to click on a point to bring up an image and not showing the image as a glyph. The current solution is a hack, embedding HTML in the NOTE field. Only a couple of HTML elements are supported. This can easily be moved into a property or a local extension in DAS2.

If a viewer does not understand one of the extensions, what does it display?

There are two things in DAS1 which I don't know well enough to ask reasonable questions about. One is the BUMP, which I think specifies if multiple glyphs of the same type may overlap. I think Eugene said they wanted more control over that, like limiting to at most 5 overlaps.

Another is the GROUP, which in DAS1 was used to merge multiple feature types into a single track. Quoting from the DAS1 spec:

  The canonical example is the CDS, exons and introns of a transcribed gene, which logically belong together.

DAS1 has specialized stylesheet language for depicting groups. DAS2 uses hierarchical features instead. Does/can DAS2 do the right thing for depicting those?

I think I've covered the major points. Please chime in if I've missed anything relevant.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Thu Oct 26 09:46:24 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 26 Oct 2006 14:46:24 +0100
Subject: [DAS2] TYPE[@source] -> TYPE[@method]
Message-ID: <4098539a2681ec2c3243e4008dac7855@dalkescientific.com>

I would like to change the existing TYPE attribute of "source" and have it use a different attribute name. Its meaning conflicts with the other uses of "source" in DAS2.

The best alternative is "method" because (I believe) it is supposed to store the same information as the corresponding DAS1 TYPE attribute.
Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Fri Oct 27 15:56:27 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 27 Oct 2006 20:56:27 +0100
Subject: [DAS2] segments and types
Message-ID: <91244d1fb88f2b49939a9d10f15d2b03@dalkescientific.com>

A couple of observations about what I've seen in existing DAS1 servers. Nothing here concerns format changes.

There are four different ways to handle segments:

 1) Don't provide segment information
      "Our clients know the segment because of the id so they don't need a segments document"
 2) use "size" (pre-DAS 1.0 spec)
 3) use "start"/"stop" (DAS 1.0 spec)
      - with variations, like "0", "0" meaning the length is undefined (and even "1", "0", with a size="2", for one server!)
 4) use a "version" field

The last is mostly used for protein sequences, from what I've seen. It's an aspect of #1 ("9pti" means "bovine pancreatic trypsin inhibitor structure from PDB") as an abstract identifier, with the version used to make it concrete ("with the update because the first release had a typo").

I think it can be encapsulated in the uri scheme we now use because each version gets its own identifier, and since the client knows all versions there's no problem.

The folks at EBI/Sanger (what's the correct collective term; Hinxton? Genome Campus?) know which servers provide which systems so many servers don't provide coordinates.

In some cases, like rabbit, the server will generate about 120,000 segments, one for each scaffold. It takes quite some time (a minute or more) to generate the output. In theory this is static and can be precomputed by the server.

For my own knowledge, when do people want the complete list of segments? When do they want the length? You, yes, you there, in front of the computer. When do you want to use it?

Let me stress -- this is not a request to change anything.
I would like to know for my own sake, for writing the documentation, and for how much emphasis to put on this for the validation.

As another observation, the Sanger/EBI servers also don't do much with the types document. Some don't even handle the request. Eugene said that no one had asked him to add it. It's there now (thanks Eugene).

I think this is because most of their servers only had a single type and the solution was "display everything." They are running into difficulties with this for a few new servers and will need type support, and type filter support, soonish.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Fri Oct 27 16:01:01 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 27 Oct 2006 21:01:01 +0100
Subject: [DAS2] das1->das2 proxy adapter
Message-ID:

As part of my effort to make sure DAS2 supports at least what DAS1 can do, and to simplify migration from DAS1 to DAS2, I have over this week developed a partial proxy adapter. It's a DAS2 server which translates the request then forwards it to a DAS1 server (including the "segment" and "overlaps" feature filters).

It takes the results and reformats them into DAS2 format. I had used a template approach for this but that proved slow for large responses. I rewrote the code so I generate the XML by hand, which also gives me a chance to put in a lot more validation code for DAS1. The goal there is to ensure that I catch all the extensions people added to DAS1.

Andrew
dalke at dalkescientific.com

From ed_erwin at affymetrix.com Mon Oct 30 17:26:38 2006
From: ed_erwin at affymetrix.com (Ed Erwin)
Date: Mon, 30 Oct 2006 14:26:38 -0800
Subject: [DAS2] das1->das2 proxy adapter
In-Reply-To:
References:
Message-ID: <45467C1E.1000705@affymetrix.com>

Thanks Andrew,

That sounds really useful. It might be nice to try to run the current NetAffx DAS/1 server through this translation and see what comes out the other end.

How would we need to do that?
Do we download your code and run it ourselves, or will you have some server that we can pass the data through? Ed Andrew Dalke wrote: > As part of my effort to make sure DAS2 supports at least what > DAS1 can do, and to simplify migration from DAS1 to DAS2, > I have over this week developed a partial proxy adapter. It's > a DAS2 server which translates the request then forwards it > to a DAS1 server (including the "segment" and "overlaps" > feature filters). > > It takes the results and reformats them into DAS2 format. I > had used a template approach for this but that proved slow for > for large responses. I rewrote the code so I generate the XML > by hand, which also gives me a chance to put in a lot more > validation code for DAS1. The goal there is to ensure that > I catch all the extensions people added to DAS1. > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > From Gregg_Helt at affymetrix.com Mon Oct 2 02:26:20 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Sun, 1 Oct 2006 19:26:20 -0700 Subject: [DAS2] No Monday teleconference this week -- switced to biweekly call Message-ID: Just wanted to remind everyone that we decided last month to switch from a weekly to a biweekly DAS/2 teleconference schedule. So the next DAS/2 conference call will be on Monday, October 9th at 9:30 AM PST. Conference phone #, US: 800-531-3250 Conference phone #, International: 303-928-2693 Conference ID: 2879055 Passcode: 1365 Thanks, Gregg From Steve_Chervitz at affymetrix.com Wed Oct 4 17:42:46 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Wed, 04 Oct 2006 10:42:46 -0700 Subject: [DAS2] Updated java runtimes for timezone change in 2007 Message-ID: Yes, the Bush administration's reach extends into the lives of Java developers, changing when DST starts and stops in 2007. 
Here's a link for updated Java runtimes for a variety of versions: http://java.sun.com/developer/technicalArticles/Intl/USDST/ This could be an issue for DAS, particularly for writeback. Some implementations may rely on consistent time-stamping, e.g., to determine which edit request was submitted first. May not make a difference within a server, but it would be an issue across multiple servers. Steve From Steve_Chervitz at affymetrix.com Mon Oct 9 17:30:42 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 09 Oct 2006 10:30:42 -0700 Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 9 Oct 2006 Message-ID: Notes from the weekly DAS/2 teleconference, 9 Oct 2006 $Id: das2-teleconf-2006-10-09.txt,v 1.1 2006/10/09 17:24:14 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed Erwin, Gregg Helt UCLA: Allen Day, Brian O'connor Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Agenda ------- * Status reports Topic: Status reports --------------------- gh: Funding thru end of may. shifting times around a bit here at affy. gh going up to a greater percentage during this period. going down to half time for next month due to house-related work. Focusing now on cleaning up impl of writeback on igb client. clean impl based on ideas sketched out at code sprint in Aug. 
Spec issue:
-----------
gh: was there a resolution to the feature group assembly conversation on email thread.
aday: died out. so the assumption is: no change.

[A] Ask andrew about feature group assembly resolution, if any.

ee: new release of IGB. bug fix then patch release. rapid turn around. Exposed need for more thorough testing.
Specifying multiple urls for "get more info" links. sources for urls: track lines in psl/bed files. Also supporting das files (1 and probably 2)
noticed: feature tag can give feat label and ID. IGB ignores these labels, because they seem to be attached to wrong thing. feat in das/1 is like 'exon', group is 'mrna'. it's the mrna we want the label on, not exon where the labels are on.

gh: if people just label parent. names don't have to be unique. id is unique uri, name is displayed name. parser isn't looking into that now.

[A] Ed will look into using feature name as label in IGB client

sc: Installed updated das2_server code on the affy das/2 server (netaffxdas.affymetrix.com). Installed new, efficient version of exon array data for hg18 (Mar 2006) assembly on this server. (igb's 'Bprobe1' parser, generates new bp2 format files). Probe and probeset data loaded fine, but exon/transcript cluster data failed with exception about 'Probe_count is zero for '.

gh: problem: the bp2 data format isn't designed for representing transcripts/exon, just probe. problem in the part that generates the bp2 files. can take a look at that.

[A] Gregg will look into steve's Bprobe1 parser error. Needs source gff.

ee: Can you verify that the gff data you are loading doesn't have unmapped probes, probe sets? Some are not mapped after lifting from previous genome assembly.

[A] Steve will remove unmapped objects in the source gff used for bp2

aday: working on UML for integrating the writeback and the read features. Also retrieval of dynamic features as well. Sent out example query.
working on getting them all into a single model, determines what to do based on input query.

will impl own block caching rather than apache caching. If I see a writeback coming in, can see which types have been modified, within each region. can fork off process to re-generate them after doing the writeback. will be a lot faster.

Have a flowchart. partway through creating UML classes, functions, return types. Using poseidon.

[A] Allen will distribute uml diagrams for das/2 modeling when ready

gh: will locking be a part of that?
aday: can make sure it's compatible. don't know how much of that to impl now.
gh: useful to think about how to model that too.

[A] Allen will include locking in his UML modelling.

aday: flowchart is pretty generic. can be used by other servers.

bo: no das work because of work on manuscript. started sourceforge project for das/2 assay "gyrax" (nee hyrax -- already taken at sf).
The motivation for this project is to take the das/2 objects in igb and make them more generic. This project can host these objects. They could then be used for other apps (igb, gyrax, others). Mark Carlson in lab is working on the gyrax client. Could be a nice library for use by other apps, gui or not, that are built on top of a das server.

gh: parts of the igb objects are tied into genometry model, a separate package also. but both of these could be separated from igb.

ee: There was some email on genoviz forum where someone is writing an app based on old NGSDK objects, on the help forum on sourceforge. problems with >30,000 glyphs. advice: switch to efficient glyph versions (special drawing alg if children are too small to see).

gh: Lots of caveats... There is code that hasn't been touched in a while.

gh: question about hardware quote for UCLA

[A] Allen will send gregg hardware quote for UCLA (<$5k)

sc: status of hardware for affy das server upgrade?

gh: plan to order end of oct, should have in place in first two weeks of nov.
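
The block-caching idea in aday's status report above (cache results per feature type and region, and regenerate only what a writeback touches) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the biopackages server code: the names TileCache, tile_size and loader are hypothetical, coordinates are assumed to be 0-based and half-open, and the "fork off process to re-generate" step is reduced to dropping the affected tiles so that the next read rebuilds them.

```python
class TileCache:
    """Cache features in fixed-width tiles keyed by (segment, type, tile index)."""

    def __init__(self, loader, tile_size=100_000):
        # loader(segment, ftype, start, end) -> list of feature dicts with
        # "id", "start" and "end" keys. It is a hypothetical stand-in for
        # the real database query behind the server.
        self.loader = loader
        self.tile_size = tile_size
        self.tiles = {}
        self.loads = 0  # number of backend calls, kept only for illustration

    def _tile_indexes(self, start, end):
        # Indexes of the tiles overlapped by the half-open range [start, end).
        return range(start // self.tile_size, (end - 1) // self.tile_size + 1)

    def read(self, segment, ftype, start, end):
        """Return the features of one type that overlap [start, end)."""
        seen = {}
        for i in self._tile_indexes(start, end):
            key = (segment, ftype, i)
            if key not in self.tiles:
                lo = i * self.tile_size
                self.tiles[key] = self.loader(segment, ftype, lo, lo + self.tile_size)
                self.loads += 1
            for feat in self.tiles[key]:
                # A feature spanning several tiles is stored in each of them,
                # so results are de-duplicated by id.
                if feat["start"] < end and feat["end"] > start:
                    seen[feat["id"]] = feat
        return sorted(seen.values(), key=lambda f: (f["start"], f["id"]))

    def writeback(self, segment, ftype, start, end):
        """Drop the tiles touched by an edit; return how many were dropped."""
        dropped = 0
        for i in self._tile_indexes(start, end):
            if self.tiles.pop((segment, ftype, i), None) is not None:
                dropped += 1
        return dropped
```

Keying the tiles by feature type as well as region means a writeback to one type leaves the cached tiles for every other type in place, which is the property the notes describe ("can see which types have been modified, within each region").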
From allenday at ucla.edu Tue Oct 10 22:30:14 2006 From: allenday at ucla.edu (Allen Day) Date: Tue, 10 Oct 2006 15:30:14 -0700 Subject: [DAS2] biopackages server UML Message-ID: <5c24dcc30610101530p3bb3b686p4055a813d0ad39ff@mail.gmail.com> Hi, I'm attaching my first draft for the UML of a server rewrite. Aside from all the spec churn, there are two main types of requests that need to be handled that spurred me to do this rewrite. The third reason I'm doing this is to rework the caching mechanism on the server. With the current code base there is a lot of custom table clustering and denormalization to get decent performance out of the Chado database. I did some experimenting (discussed in an earlier thread and on conf. calls) with a "tiling" or "block" caching strategy of cache that turns out to work really well, and I wanted to integrate that with the writeback functionality. 1) tighter integration of writeback, including locking. 2) configurability of feature types to be * dynamic (e.g. for on-the-fly gene prediction) * non-cacheable * cacheable 3) caching * segment range/type tiled caching * ability of writeback events to trigger cache flush events See attached UML. There is a .zuml file, you can view/edit with Poseidon, or if you need a .xml I can send another attachment. -Allen -------------- next part -------------- A non-text attachment was scrubbed... Name: das2_refactor.zuml Type: application/octet-stream Size: 34991 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: das2_refactor.png
Type: image/png
Size: 130013 bytes
Desc: not available
URL:

From boconnor at ucla.edu Tue Oct 10 22:51:54 2006
From: boconnor at ucla.edu (Brian O'Connor)
Date: Tue, 10 Oct 2006 15:51:54 -0700
Subject: [DAS2] biopackages server UML
In-Reply-To: <5c24dcc30610101530p3bb3b686p4055a813d0ad39ff@mail.gmail.com>
References: <5c24dcc30610101530p3bb3b686p4055a813d0ad39ff@mail.gmail.com>
Message-ID: <452C240A.4060603@ucla.edu>

Hi Allen,

I have a few questions.

* How does feature (and other data types) filtering take place? Does the controller pass info into read_features() in Das2::Model::Genome? Where is the actual filtering implementation? In Das2::Model::Genome::Feature?

* Where will the SQL queries live? In the current implementation we have an object where many of the prepared statements live. Do you plan on using something similar here? Or will the SQL generally be embedded in Das2::Model::Record objects and Das2::Model::Genome::Chado?

* For the Das2::Model::Record subclasses, should there be another layer of inheritance with a Das2::Model::Chado::Record object? In case you want additional data adapters for other DBs/flat files in the future?

--Brian

Allen Day wrote:
> Hi,
>
> I'm attaching my first draft for the UML of a server rewrite. Aside
> from all the spec churn, there are two main types of requests that need
> to be handled that spurred me to do this rewrite. The third reason I'm
> doing this is to rework the caching mechanism on the server. With the
> current code base there is a lot of custom table clustering and
> denormalization to get decent performance out of the Chado database. I
> did some experimenting (discussed in an earlier thread and on conf.
> calls) with a "tiling" or "block" caching strategy of cache that turns
> out to work really well, and I wanted to integrate that with the
> writeback functionality.
>
> 1) tighter integration of writeback, including locking.
> 2) configurability of feature types to be > * dynamic (e.g. for on-the-fly gene prediction) > * non-cacheable > * cacheable > 3) caching > * segment range/type tiled caching > * ability of writeback events to trigger cache flush events > > See attached UML. There is a .zuml file, you can view/edit with > Poseidon, or if you need a .xml I can send another attachment. > > -Allen > > ------------------------------------------------------------------------ > From dalke at dalkescientific.com Mon Oct 23 16:19:03 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 23 Oct 2006 17:19:03 +0100 Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 9 Oct 2006 In-Reply-To: References: Message-ID: On Oct 9, 2006, at 6:30 PM, Steve Chervitz wrote: > [A] Ask andrew about feature group assembly resolution, if any. As far as I know there was no resolution. At last standing the problem is as follows. Consider a complex annotation with a single parent A and a single child B. There are several ways to represent this Option 1: This is the current spec. Parents point to children and children to parents. This was different than the GFF-style where only the children have a parent reference. My hope was to assemble complex annotations while reading the data from the remote server. In practice this streaming assembly proved hard to implement. The algorithm is non-trivial for complex structures so most people will do the assembly only after reading all features. Also, there's a possible error when parents don't list all children or vice versa, and likely most clients won't fully validate, so a top-down and a bottom-up assembly may give different results for the same server. Option 2: This is the GFF-style. The main limitations are support for streaming data, such as showing partial results while downloading and converting to/from other formats. 
In both cases this is because parent nodes may (and do) occur after children nodes, and there's no knowledge that all children have been seen.

There is a problem in both Option 1 and Option 2 of not easily detecting cycles or multi-rooted structures.

Variation: require that children are listed after parents.

Option 3:

That is, put all features which are part of the same feature group into a single element. This is essentially like the ### "no forward references" token in GFF3.

It's cumbersome because either there are two data types ("FEATURE-GROUP" and "FEATURE" elements) under the root or there are a lot of FEATURE-GROUPs containing a single sequence. There's still the need for cycle detection and checking that the parent/part relationships are valid.

Option 4: Break the DAG into a tree structure (a spanning tree).

In this case "B" is a child of "A". For a more complex structure where "C" is a child of "A" and "B",

This doesn't fit well with relational databases. There's still the need to check for cycles but it's much simpler.

Given the feedback I've heard, the use cases for streaming the data are not seen as important. Hence I'm willing to go with #2 (GFF-style, children point to parents) and have nothing like the no-forward-references of GFF3.

Andrew
dalke at dalkescientific.com

From lstein at cshl.edu Mon Oct 23 14:01:01 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 23 Oct 2006 10:01:01 -0400
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 9 Oct 2006
In-Reply-To:
References:
Message-ID: <6dce9a0b0610230701q1898dc79wa3a3ff56814ff37e@mail.gmail.com>

Hi Folks,

I'm going to miss today's conference call again. I've been scheduled to interview a job candidate and I can't change it.
Lincoln

-- Lincoln D.
Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From Steve_Chervitz at affymetrix.com Tue Oct 24 01:17:46 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 23 Oct 2006 18:17:46 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 23 Oct 2006
Message-ID:

Notes from the weekly DAS/2 teleconference, 23 Oct 2006

$Id: das2-teleconf-2006-10-23.txt,v 1.1 2006/10/24 01:15:21 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Gregg Helt, Ed Erwin
  UCLA: Allen Day
  Dalke Scientific: Andrew Dalke

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.

Agenda
-------
* Status reports
* Spec discussion

Status Reports
---------------
[Note: lots of digressions within status reports]

ad: Have been looking at how Tim Hubbard's group is using das/1.
gh: you are acting as our proxy to the uk group.

gh: andreas has been working on das registry.
ad: yes, in use for both das/1 and 2 servers.
gh: am interested in his work to ping servers to test for live-ness.

gh: see my response on das discussion list to Brian Gilman's
message. where to find das/2 servers to hit on. biopackages was not
giving correct answers for sources query.
ee: was true two weeks ago.
aday: just a bug.
gh: we need to get both servers fixed. need an automated way to figure
out when servers are down, such as what andreas is doing with das/1.

[A] Andrew will ask Andreas about live-ness test for das/2 as well.

gh: andrew's validator could be scripted to do this, too.
gh: your validator is not running, btw.
ad: server rebooted, not set up to restart automatically.

[A] andrew will see that his validator server is up (done).

gh: affy server is serving up incorrect xml base now. code is set up
to allow specifying which xml base to use.

[A] steve will fix xml base on affy server

gh: need to use four arg version: port, data dir, email for
maintainer, xml:base.
without xml:base, everything goes screwy

gh: Andrew's validator should catch this since xml:base resolution of
capabilities would resolve to local host which would throw an error.
ad: yes.

gh: Andrew: you are focusing on das now?
ad: this week at EBI, then next month focusing on DAS work.

Status (continued)
-------------------

gh: this week - distracted by igb issues, also on 1/2 time this month,
so no new das work to report.

ee: gff3 parser, got feedback from lincoln.
adding support for track lines, several of our parsers.
there is a diff between the way igb puts things into tracks and the way
the ucsc browser puts things into tracks.
in igb: we put things into tracks based on source field. so one file can
lead to multiple tiers.
in ucsc: everything below track line goes into one track.
Soln: if there are track lines, do it the way UCSC does it. Otherwise,
do it the igb way.

Also worked on coloring by score (affects gff, ed, and one other).
Makes it similar to ucsc. Assumption is white background. It is rigged
to be based on normal foreground and background colors. white = ucsc

Also participated in the java "ask the experts" thing: asked about
swing, but they didn't answer.

gh: das2 style sheets?
ee: yes, how free am I to change that spec?
ad: go for it.
ee: don't want spec to say you need to use certain shaped glyphs --
hard to support. just simple things - colors, labels.
ad: asked uk folks about style sheets, they haven't done anything.
gh: gbrowse (lincoln) uses style sheets for das/1.
ee: the stuff in das/2 comes from das/1?
ad: yes, with some changes.
ee: also need to do documentation.

sc: worked on adding data for currently unsupported arrays on the Affy
DAS/1 server to the quickload directory. Got some requests for mouse
assembly aug 2005, RG-U34 rat arrays. Didn't update the annots.txt
yet, so IGB users won't know they are available.

[A] steve will update affy quickload annots.txt

sc: ideally, this should be automated.
gh/ee: could possibly have IGB detect these without needing to update
an extra file. But there was no standard way to read directory
contents.

gh: chp files have no genomic location for probe sets, so igb needs to
look this up, likely via das/2 server. primary way for people to look
at results in igb.

sc: did some work on loading exon array annotations into das/2 server
using gregg's new bp2 format (reported last time). Didn't see any
justification for the "probeset with zero probes" error it threw.

[A] gregg and steve will look into bp2 format parsing issues

[A] gregg will put in order for new hardware for affy das server

aday: porting gff3 into writeback server as an alt format for loading
data in.
Email thread with Ed - ambiguities in the gff3 specification

[A] Allen will forward email to list.

aday: some communication with lincoln's group, re: validator. I need
to create some sample gff3 docs to make sure validator can parse them
all. will be adding support to parser in bioperl (likely).
Re: alignments: target and source have to be stranded, length of one
has to be equal to or less than the one it's aligned to, etc.

No work on server uml. hold off until spec is finalized before
committing to uml model. Eg., fasta response not mentioned, broken
hyperlinks, no response from Andrew.

gh: fasta?
aday: referred to but not described. properties response mentioned but
not described. fasta has been replaced by segments, properties
gone. See email on list.

sc: sequence retrieval command used to return fasta format, hence the
fasta request. this has been replaced with segments, but spec not
updated.

gh: property capability?
aday: yes. not sure how to proceed yet.

[A] Andrew will fix/respond to issues raised by Allen.

gh: another spec issue: last code sprint I didn't like semantics of
range feature filters, I eventually caved to majority.
caveat: I wanted an optional attrib in types doc to say: "here's a
type but you can or cannot use it in search filter." I.e., optionally
restrict which types you can use in those filters. If false, it
indicates to client it shouldn't use it as a searchable thing.

ad: if it does anyway?
gh: server could throw an error
ad: or not return any results of that type?
gh: ok
ad: reason for this? is there a better word than 'searchable'? w/r/t
the problem domain.
gh: the reason: I want people to search for 'genscan transcripts' not
'genscan exon' because of how we decided to do range queries.
ad: not sure why someone would want to do this.
gh: it was agreed on at last code sprint...

[A] gregg will write up use case for range feature filters underlying
his need

ad: Regarding parent and child bidirectional feature pointers:
I'm willing to say that there's no need to assemble features
dynamically on streaming approach. so we can get rid of parent or
child relationship. make it more like gff3 to have parent link only.

gh: worried about not having full closure. could get parents that don't
know about child. if you have child, do you then have to have every
parent in the response?
ad: I thought we required it? if there is a feature then all features
in that group must be returned.
ee: never a fan of specifying both parents and children. can lead to
mistakes - not compatible. andrew says parsing is more difficult...
ad: when processing input you know when done with a feature
group. this is useful. if no one impls it why have the overhead?
ee: impl doesn't seem difficult
gh: my impl doesn't catch cycles. still have to do cycle check
regardless if it was bi-directional.
ad: can't find a simple algorithm for doing it.
gh: keep children around. check if tree is complete. bidirectionality
allows me to crawl tree.
ad: you don't check for cycles or multiply rooted trees.
ee: just assume there are not such problems.
ad: I don't like bogus data.
ee: my gff3 parsing, I wait until end to assemble things.
ad: as mine does, too. worried about extra fields means more
possibilities of breaking things. bad data.
ee: should be able to detect bad data.
ad: duplicate links means you can't assemble from one but not
other. most people will not check both.
gh: main justification was to get complete feats before end of
doc. lincoln was the one who wanted this ability.
ad: several ways to do it. eg. contained feature elements with all
children, spanning tree, etc.
ee: catching loops is hard, need to wait till end.
gh: let's wait till lincoln comes in.

[A] Everyone will revisit bidirectional parent-child pointers with Lincoln

Other issues:
-------------
ad: Regarding Brian's question from email, the xml document he sent.
gh: my reply: document was otherwise correct but xml:base was wrong.
ad: also: lowercase close types element at end.

ad: know anything about brian's deadline mentioned by lincoln?
gh: no.

[A] Someone will send Brian pointer to Andrew's validator.

ee: das/2 impl is not usable by igb now. need to fix top-level document.
gh: we really need an automated way to know when server is having
problems.

gh: conf call with Andreas and others in UK? can set up a conf call to
talk about registry. Also coordinate mapping - when one system is the
same as the other. ties into registry stuff.
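
[Editorial sketch] As an illustration of the end-of-document check
discussed under the parent/child pointers item above (cycles and
multiply rooted groups), here is a minimal Python sketch. It assumes the
parent-only, GFF3-style model that was proposed; the function name
check_feature_group and the dict-of-parent-ids input are hypothetical
and are not part of the DAS/2 spec or of any implementation mentioned in
these notes.

```python
def check_feature_group(parents):
    """Check parent links once the whole document has been read.

    `parents` maps each feature id to the list of its parent ids
    (an empty list for a root). Returns (roots, problems).
    """
    problems = []

    # Every parent that is referenced must itself be in the response.
    for fid, parent_ids in parents.items():
        for pid in parent_ids:
            if pid not in parents:
                problems.append(f"{fid}: unknown parent {pid}")

    # Depth-first walk along parent links; meeting a feature that is
    # still "in progress" means the links loop back on themselves.
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {fid: UNSEEN for fid in parents}

    def visit(fid):
        state[fid] = IN_PROGRESS
        for pid in parents[fid]:
            if pid not in parents:
                continue  # already reported above
            if state[pid] == IN_PROGRESS:
                problems.append(f"cycle through {pid}")
            elif state[pid] == UNSEEN:
                visit(pid)
        state[fid] = DONE

    for fid in parents:
        if state[fid] == UNSEEN:
            visit(fid)

    # More than one root is left for the caller to accept or reject.
    roots = sorted(fid for fid, parent_ids in parents.items() if not parent_ids)
    return roots, problems
```

Like the gff3 parsers mentioned above, this only works after the end of
the document, since a parent may appear after its children. The
recursion is fine for small feature groups; a very deep hierarchy would
need an explicit stack.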
[A] Gregg/Andrew maybe will have conf call with Andreas while Andrew
is in UK

From dalke at dalkescientific.com Tue Oct 24 09:17:58 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 24 Oct 2006 10:17:58 +0100
Subject: [DAS2] das2 diagrams, questions
In-Reply-To: <5c24dcc30609210030k5324378fy18990dc41a1f1b1e@mail.gmail.com>
References: <5c24dcc30609210030k5324378fy18990dc41a1f1b1e@mail.gmail.com>
Message-ID: <57ca007c161fd08f104c8bb87e4127ac@dalkescientific.com>

Allen:
> I have a few questions, mostly targeted at Andrew, regarding the current
> HTML version of the spec on the biodas.org site. It hasn't been updated in
> about 5 months, and looks pretty out of date.

Strange. The last changes were in August.

> * Is the HTML document in sync with the "new_spec.txt" document in CVS?

It should not be. That was a text document I was working on back in
Jan/Feb as part of the update to the current version of the spec.

I've removed it from CVS. (Even though I know it's CVS, my fingers keep
typing "svn" :)

> * There is mention of a "fasta" command, and its fragment is linked from the
> ToC of the genome retrievals document, but it does not appear in the
> document. Does this command exist? My understanding from conference calls
> is that the sequence/fasta/segment/dna stuff has all merged into the
> "segment" response. Is this correct?

That is correct. There is a segments request. Passing "format=fasta" to
a segment request returns the sequence in FASTA format.

I didn't catch that line when I was doing the changes. I've removed it
from CVS.

> * The "property" command seems to have disappeared. Is that correct? Are
> property keys no longer URIs? Also the "prop-*" feature filters could be
> better described, it is not clear to me if they are meant as some sort of
> replacement for the property command.

The property command has disappeared. Notes are at
das2-teleconf-2005-11-28.txt

It was replaced by two things.
One is the key/value PROP table, which is meant to store simple string
data. It should be considered to be user-editable, eg, as a property
sheet. The "prop-*" commands are used to search that table.

The other is the non-DAS namespace'd XML extensions. For example,

...

In this case there is no default search mechanism. Instead the server
may declare that it implements a map-specific search extension to the
DAS query language, or a new search interface, and clients which
understand the extension can add support for it.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Tue Oct 24 14:03:54 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 24 Oct 2006 15:03:54 +0100
Subject: [DAS2] XML-RPC based DAS2 validator
Message-ID: <4c0809629f5d0e26693547964e86d6c9@dalkescientific.com>

I've added an XML-RPC service to the DAS validator. Andreas will be able
to use it to verify new DAS2 entries in his registry.

The entry point to the XML-RPC server is

  http://cgi.biodas.org:8080/RPC2/

The trailing "/" is important - use ".../RPC" and the server will do an
HTTP redirect to ".../RPC/", which not all XML-RPC clients understand.

At present the server implements a single RPC method named
"validate_url". It takes two positional fields. The first is the
required URL to validate. The second is the optional document type to
validate against. If not given then the server will attempt to guess.

The response is a list of 2-element tuples. In each pair the first is
the severity level and will be one of

  "info"
  "warning"
  "error"
  "fatal"

"fatal" means the validator normally should not continue. I can override
that, which I do in the XML-RPC service in order to generate more
messages.

"error" means the result does not meet the spec but the validator will
continue checking, at least in the normal case. (That too is
user-defined.)

"warning" is for things which are suspicious but not wrong, like using
"application/xml" instead of the DAS2 content-type, or having a uri
field with an empty content. (This is legal; it refers to the document
itself. It's just strange and likely indicates an error in the server.)

The "info" is for niggling details, like that the server guesses the
document type (in the case of application/xml response) by looking at
the tag for the top-level element.

Here's an example in Python's interactive shell. I'll first make a proxy
to the remote server

 >>> import xmlrpclib
 >>> server = xmlrpclib.Server("http://cgi.biodas.org:8080/RPC2/")

then call the new method with a single parameter; the URL to validate.

 >>> server.validate_url("http://das.biopackages.net/das/genome/human/")
 [['info', "Assuming doctype of 'sources' based on Content-Type"]]

That's a list with a single element containing the (severity, message)
tuple. The info statement came because it guessed the document type
based on the content-type from the server.

I can specify the document type directly and skip that warning statement

 >>> server.validate_url("http://das.biopackages.net/das/genome/human/", "sources")
 []

Here's an example of validating a server with the wrong document type,
to show what the error messages look like.
I've added newlines so the results aren't all on one string

 >>> server.validate_url("http://www.dasregistry.org/registry/das1/sources", "types")
 [['fatal', "Received Content-Type 'application/x-das-sources+xml',
    expected 'application/x-das-types+xml'."],
  ['fatal', "Expected element '{http://biodas.org/documents/das2}TYPES'
    but got '{http://biodas.org/documents/das2}SOURCES' at byte 41,
    line 2, column 2"],
  ['error', 'element "SOURCES" from namespace
    "http://biodas.org/documents/das2" not allowed in this context at
    byte 41, line 2, column 2']]
 >>>

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Wed Oct 25 17:42:32 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Wed, 25 Oct 2006 18:42:32 +0100
Subject: [DAS2] DAS2 validation service
Message-ID:

I've updated the DAS2 validation service in a couple of ways. One was to
improve the error handling, eg, point it to slashdot.org (not XML),
slashdot.org/blahblah (404 - not found) or to blahblah.blah (host does
not exist) and it reports an error instead of raising an exception.

There was a problem of sorts with the XML-RPC server. I chose XML-RPC
yesterday because I thought it would be dead simple to use in any
environment. It's old, stable technology.

Andreas tried a few Java XML-RPC clients and found there were various
hard-to-resolve dependencies. Eg, the most modern one requires Java 1.5
but his system runs 1.4, and the older one requires some XML DOM parser
which isn't included with the system and proved hard to track down.
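
[Editorial sketch] For clients that can use XML-RPC, the list of
[severity, message] pairs shown in the validate_url examples can be
reduced to a single verdict. The following Python 3 sketch works on an
already-fetched result list (in Python 3 the xmlrpclib module is called
xmlrpc.client). The helper names worst_severity and meets_spec are
hypothetical, and both the ranking info < warning < error < fatal and
the rule that only "error" and "fatal" count as failures are one reading
of the severity descriptions, not something the validator itself
defines.

```python
# Assumed ranking of the four severity levels, mildest first.
SEVERITY_ORDER = ["info", "warning", "error", "fatal"]


def worst_severity(results):
    """Return the most severe level in a validate_url result, or None."""
    worst = None
    for severity, _message in results:
        if severity not in SEVERITY_ORDER:
            raise ValueError(f"unknown severity level: {severity!r}")
        if worst is None or SEVERITY_ORDER.index(severity) > SEVERITY_ORDER.index(worst):
            worst = severity
    return worst


def meets_spec(results):
    """True if the result holds no 'error' or 'fatal' messages (assumed rule)."""
    return worst_severity(results) not in ("error", "fatal")


if __name__ == "__main__":
    # Shortened copy of the wrong-doctype result shown in the example.
    sample = [
        ["fatal", "Received Content-Type 'application/x-das-sources+xml', "
                  "expected 'application/x-das-types+xml'."],
        ["error", 'element "SOURCES" from namespace '
                  '"http://biodas.org/documents/das2" not allowed in this context'],
    ]
    print(worst_severity(sample), meets_spec(sample))
```

A registry could store only the worst level per server and re-run the
check periodically, which would also serve as a crude live-ness test.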
Rather than struggle to make that work, I've added a new HTTP interface
for automated validation. The URL is

  http://cgi.biodas.org:8080/validate_url

It has a required parameter, "url", which is the URL to validate

 %curl 'http://cgi.biodas.org:8080/validate_url?url=http://slashdot.org/'

It has an optional parameter "doctype" which is the document type to
expect

 %curl 'http://cgi.biodas.org:8080/validate_url?url=http://das.biopackages.net/das/genome/human/;doctype=sources'

In that last case there were no messages.

The XML document is

* A note about the doctype. If the server could not get the document
then the validation will not have a doctype even if you gave it one.

 %curl 'http://cgi.biodas.org:8080/validate_url?url=http://slashdot.org;doctype=types'

If you tell it the wrong doctype and it gets something in XML then it
assumes the response is in the given doctype

 %curl 'http://cgi.biodas.org:8080/validate_url?url=http://das.biopackages.net/das/genome/human/;doctype=types'

If no input doctype is given then it will guess at the doctype based on
analysis of what it got from the remote server

 %curl 'http://cgi.biodas.org:8080/validate_url?url=http://das.biopackages.net/das/genome/human/'

This XML should be easy for anyone to parse.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Thu Oct 26 09:06:33 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 26 Oct 2006 10:06:33 +0100
Subject: [DAS2] stylesheets meeting
Message-ID: <22090f570d5179afc3fe71a0768ed2ec@dalkescientific.com>

I met yesterday afternoon with Andreas Prlic, Andreas Kahari and Eugene
Kulesha to get information about their stylesheet needs. Ed said he
would work more on the spec and this should provide some relevant
information.

We ended up talking about the stylesheet using a sort of CSS approach.
There are selectors (feature uri, type uri, etc.) and properties (color,
glyph shape, ...). Some of the properties inherit/cascade and others
don't.
There's nothing new in this; we talked about it during the 2nd sprint.

The details of inheritance prove tricky. For example, consider

  [ Feature A ] ---- is of ---> [ Type 1 ]
        |
     contains
        |
  [ Feature B ] ---- is of ---> [ Type 2 ]

where each feature and type has a style sheet. The property (say
"color") for Feature B is determined first by the stylesheet for
Feature B, then that of Type 2.

If still not present, does it come from the parent(s) of Feature B and
the parent's type? Given that this requires correct traversal in the
face of multiple inheritance, I'll now argue "no", even though this is
an effectively solved problem in OO programming ("C3 method resolution
order", from Dylan and also used in Python, Perl6, and others). It's
complex enough to make it unjustifiable.

The selectors people wanted are:
  - the feature type, based on its uri
  - the feature itself, based on its uri
  - view type, that is, "2D" vs "3D". Akin to "screen", "paper" in CSS.
    Andreas P's DAS-based structure viewer uses very different stylings
    ("ribbon", "vdw") than sequence.

Note: only "and" selections are requested. There seems to be no need for
selection like "features of type T1 which are descended from feature F2"

Other possibilities are:
  - selectors based on the type ontology uri
  - application-specific styles (but this is probably handled best
    through properties and not through a selector; on the other hand,
    it would enable workarounds for app-specific bugs)
  - level of detail (but Eugene didn't even know this option existed in
    DAS1, so perhaps it's not needed for DAS2)
  - support for overrides in case of stylesheet conflicts (user
    overrides server overrides application, most recent definition
    overrides previous)

For the view and the application selectors a space separated list seems
reasonable, as

  view="2D 3D" ... color as yellow

meaning that for 2D and 3D to draw the feature in yellow. Or just leave
out the selector.

One question was how to find the stylesheet.
They can be listed in the SOURCES document but I was thinking they could
also be listed in the FEATURES response, as

Another question is the format of that selection language. That was
quickly answered: "in XML".

I brought up Ed's comment about (if I understand correctly) making the
shape language a bit more abstract. For example, in DAS1 there's a GLYPH
called "PRIMERS", while the others are names like "EX" and "ARROW".

The general view is that this level of abstraction isn't useful. Andreas
Prlic summarized it nicely as (reworded) "the goal of a stylesheet is to
make things concrete". Though perhaps an SVG-style set of drawing
commands may be useful.

That said, there may be a few things which need a more domain-specific
name. The example which came up is in color. EBI has "contig blue" as a
color name. Are there other colors like that?

On the topic of colors, the desired colors are the CSS color names
(though in-house they also have the X11 names) and the CSS-style #color
#selection, as #0FF for cyan. The #RGB and #RRGGBB color names are
sufficient. Other CSS variations, like rgb(255, 0, 0) and
rgb(10%, 45%, 82%) are not needed.

In the meeting I mentioned alpha/opacity values in CSS as #RGBA and
#RRGGBBAA. In writing these notes up I see that CSS does not support
that syntax. Alpha is a "wouldn't it be cool if .." feature and not one
which is needed or specifically requested.

I outlined support for more complex font information for DAS2. Feedback
here says that's not important. There's no desire to change the font
size, style, etc. Nor desire for super/subscript, underscore, italics,
bold, condensed, etc.

I asked about standardizing the drawing model so there is more
consistency between different viewers. For example, if there is a glyph
and a piece of text, where is the text drawn in relationship to the
glyph? Does the height of the glyph include both? There was no desire
for this.
On the other hand, a current user-specified option is where to draw the
text, which corresponds to a stylesheet override.

What they want is support for plots and color gradients. See the
"Gradient" and "TilingArray" entries at

  http://www.ensembl.org/Homo_sapiens/contigview?conf_script=contigview;vc_start=25422500;vc_end=25447499;region=17;add_das_source=(name=Gradient+url=http://das.ensembl.org/das+dsn=hydraeuf_00001350+type=ensembl_location_chromosome+stylesheet=y+score=c+fg_merge=a+fg_grades=50+fg_data=l+fg_max=310+fg_min=-143+active=1);add_das_source=(name=TilingArray+url=http://das.ensembl.org/das+dsn=hydraeuf_00001350+type=ensembl_location_chromosome+stylesheet=y+score=s+fg_merge=m+active=1

I can think of several ways to handle that. One is to declare a feature
for the entire chromosome, as

and viewers can use some agreed upon protocol to get the right data from
somewhere/else. Another is

  [inline base64-encoded GIF image data]

with an agreed upon definition of how to interpret the in-line data. But
for the entire genome this could be rather big. Another is to break it
down into parts, as

  ... data for the first 10,000 bases ...
  ... data for the second 10,000 bases ...
  ...
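
[Editorial sketch] Returning to the color formats listed above: the only
numeric forms asked for are #RGB and #RRGGBB, which makes the parsing
side small. The Python sketch below illustrates this; the function name
parse_hex_color is hypothetical, and it deliberately handles neither
color names (CSS, X11, or in-house ones such as "contig blue") nor the
rgb(...) forms that were said not to be needed.

```python
import re

# Exactly three or six hexadecimal digits after a leading '#'.
_HEX_COLOR = re.compile(r"#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})")


def parse_hex_color(text):
    """Convert '#RGB' or '#RRGGBB' into an (r, g, b) tuple of 0-255 ints."""
    match = _HEX_COLOR.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a #RGB or #RRGGBB color: {text!r}")
    digits = match.group(1)
    if len(digits) == 3:
        # CSS shorthand: each digit is doubled, so #0FF means #00FFFF.
        digits = "".join(ch * 2 for ch in digits)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


if __name__ == "__main__":
    print(parse_hex_color("#0FF"))     # cyan, the example used above
    print(parse_hex_color("#FF0000"))
```

Named colors would need a separate lookup table agreed between servers
and viewers, which is where a domain-specific name like "contig blue"
would have to be defined.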
There is already the need for displaying images on the display, but the
current use is to click on a point to bring up an image and not showing
the image as a glyph.

The current solution is a hack, embedding HTML in the NOTE field. Only a
couple of HTML elements are supported. This can easily be moved into a
property or a local extension in DAS2.

If a viewer does not understand one of the extensions, what does it
display?

There are two things in DAS1 which I don't know well enough to ask
reasonable questions. One is the BUMP, which I think specifies if
multiple glyphs of the same type may overlap. I think Eugene said they
wanted more control over that, like limiting to at most 5 overlaps.

Another is the GROUP, which in DAS1 was used to merge multiple feature
types into a single track. Quoting from the DAS1 spec

  The canonical example is the CDS, exons and introns of a
  transcribed gene, which logically belong together.

DAS1 has specialized stylesheet language for depicting groups. DAS2 uses
hierarchical features instead. Does/can DAS2 do the right thing for
depicting those?

I think I've covered the major points. Please chime in if I've missed
anything relevant.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Thu Oct 26 13:46:24 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 26 Oct 2006 14:46:24 +0100
Subject: [DAS2] TYPE[@source] -> TYPE[@method]
Message-ID: <4098539a2681ec2c3243e4008dac7855@dalkescientific.com>

I would like to change the existing TYPE attribute of "source" and have
it use a different attribute name. Its meaning conflicts with the other
uses of "source" in DAS2.

The best alternative is "method" because (I believe) it is supposed to
store the same information as the corresponding DAS1 TYPE attribute.
Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Fri Oct 27 19:56:27 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 27 Oct 2006 20:56:27 +0100
Subject: [DAS2] segments and types
Message-ID: <91244d1fb88f2b49939a9d10f15d2b03@dalkescientific.com>

A couple of observations about what I've seen in existing DAS1 servers.
Nothing here concerns format changes.

There are four different ways to handle segments:

  1) Don't provide segment information
       "Our clients know the segment because of the id so they
       don't need a segments document"
  2) use "size" (pre-DAS 1.0 spec)
  3) use "start"/"stop" (DAS 1.0 spec)
       - with variations, like "0", "0" meaning the length is
         undefined (and even "1", "0", with a size="2", for one server!)
  4) use a "version" field

The last is mostly used for protein sequences, that I've seen. It's an
aspect of #1 ("9pti" means "bovine pancreatic trypsin inhibitor
structure from PDB") as an abstract identifier, with the version used to
make it concrete ("with the update because the first release had a
typo")

I think it can be encapsulated in the uri scheme we now use because each
version gets its own identifier, and since the client knows all versions
there's no problem.

The folks at EBI/Sanger (what's the correct collective term; Hinxton?
Genome Campus?) know which servers provide which systems so many servers
don't provide coordinates.

In some cases, like rabbit, the server will generate about 120,000
segments, one for each scaffold. It takes quite some time (a minute or
more) to generate the output. In theory this is static and can be
precomputed by the server.

For my own knowledge, when do people want the complete list of segments?
When do they want the length? You, yes, you there, in front of the
computer. When do you want to use it?

Let me stress -- this is not a request to change anything.
I would like to know for my own sake, for writing the documentation, and
for how much emphasis to put on this for the validation.

As another observation, the Sanger/EBI servers also don't do much with
the types document. Some don't even handle the request. Eugene said that
no one had asked him to add it. It's there now (thanks Eugene).

I think this is because most of their servers only had a single type and
the solution was "display everything." They are running into
difficulties with this for a few new servers and will need type support,
and type filter support, soonish.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Fri Oct 27 20:01:01 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 27 Oct 2006 21:01:01 +0100
Subject: [DAS2] das1->das2 proxy adapter
Message-ID:

As part of my effort to make sure DAS2 supports at least what DAS1 can
do, and to simplify migration from DAS1 to DAS2, I have over this week
developed a partial proxy adapter. It's a DAS2 server which translates
the request then forwards it to a DAS1 server (including the "segment"
and "overlaps" feature filters).

It takes the results and reformats them into DAS2 format. I had used a
template approach for this but that proved slow for large responses. I
rewrote the code so I generate the XML by hand, which also gives me a
chance to put in a lot more validation code for DAS1. The goal there is
to ensure that I catch all the extensions people added to DAS1.

Andrew
dalke at dalkescientific.com

From ed_erwin at affymetrix.com Mon Oct 30 22:26:38 2006
From: ed_erwin at affymetrix.com (Ed Erwin)
Date: Mon, 30 Oct 2006 14:26:38 -0800
Subject: [DAS2] das1->das2 proxy adapter
In-Reply-To:
References:
Message-ID: <45467C1E.1000705@affymetrix.com>

Thanks Andrew,

That sounds really useful.

It might be nice to try to run the current NetAffx DAS/1 server through
this translation and see what comes out the other end. How would we need
to do that?
Do we download your code and run it ourselves, or will you have some
server that we can pass the data through?

Ed