From aloraine at gmail.com Tue Aug 1 13:02:53 2006 From: aloraine at gmail.com (Ann Loraine) Date: Tue, 1 Aug 2006 12:02:53 -0500 Subject: [DAS2] Fwd: [MOBY-dev] Java Web Services: part 2 In-Reply-To: <44CF83BD.90103@ucalgary.ca> References: <44CF83BD.90103@ucalgary.ca> Message-ID: <83722dde0608011002k4d2f67dfmc65e37e97a2bb851@mail.gmail.com> Greetings, If you are interested in keeping up with BioMoby developments, this may be of interest. Cheers, Ann On 8/1/06, Paul Gordon wrote: > Hi all, > > I have just committed some new code for creating MOBY Java servlets. > It's intended for Extremely Lazy Programmers (such as myself), requiring > that you download just a particular WAR. No CVS, Axis, Ant, etc. > required. Some of my coworkers who have never deployed an servlet, or > knew anything about MOBY were able to have a registered, tested service > within 30 minutes! Hopefully this will be of use to some of you too... > > http://biomoby.open-bio.org/CVS_CONTENT/moby-live/Java/docs/deployingServices.html > > Regards, > > Paul > > _______________________________________________ > MOBY-dev mailing list > MOBY-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/moby-dev > -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From gilmanb at pantherinformatics.com Thu Aug 10 11:09:43 2006 From: gilmanb at pantherinformatics.com (Brian Gilman) Date: Thu, 10 Aug 2006 11:09:43 -0400 Subject: [DAS2] DAS/2 Code Sprint, August 14-18 In-Reply-To: References: Message-ID: <44DB4C37.6040704@pantherinformatics.com> Trying to get a features document? Hello Greg et al. I'm desperately trying to get a features document out of one of the DAS 2 servers and have not been able to do it yet. Can someone help me out!? Thanks! -B Helt,Gregg wrote: >Affymetrix is hosting a DAS/2 code sprint on August 14-18, to coincide >with the CSB conference at Stanford. The sprint will be held at Affy's >Santa Clara location, which is about a 20 minute drive from the Stanford >campus. For those attending CSB, the proximity should make it easy to >join in, even if it's just for a morning or afternoon. We can provide >transportation to and from CSB if needed. If you are interested in >attending please email me, and specify whether you'll need a workstation >or will be bringing your own laptop. > >This is a code sprint, so the focus will be on DAS/2 client and server >implementations. As with previous sprints I'd like to start each day >with a teleconference at 9 AM Pacific time. If you can't be there >physically but still want to participate, please join in! > > Gregg > > >_______________________________________________ >DAS2 mailing list >DAS2 at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/das2 > > > > From Gregg_Helt at affymetrix.com Thu Aug 10 12:39:12 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 10 Aug 2006 09:39:12 -0700 Subject: [DAS2] DAS/2 Code Sprint, August 14-18 Message-ID: Apologies, it looks like we're currently having some problem with proxy redirection on the Affy DAS/2 server. Steve, can you check on this? When I request anything but the top level ~/sequence, I'm getting back HTTP error 502 "Bad Gateway" with the message: "The proxy server received an invalid response from an upstream server." However, I just tried the biopackages server and it is working, though response times are slower than usual (unless the response has already been cached). 
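If it helps while the proxy gets sorted out, a quick status check can be scripted with nothing but the Python standard library -- the URL below is only a placeholder, so substitute whichever feature query you are testing; a 502 raised here is coming from the gateway, not from the DAS/2 server itself:

    import urllib.request, urllib.error

    # Placeholder URL -- substitute the feature query you actually want to test.
    url = ("http://example.org/das2/genome/features"
           "?overlaps=chr21/26027736:26068042;type=SO:mRNA")
    try:
        with urllib.request.urlopen(url, timeout=60) as response:
            print(response.getcode(), response.headers.get("Content-Type"))
            document = response.read()  # the das2xml features document
    except urllib.error.HTTPError as err:
        # A 502 "Bad Gateway" raised here is the proxy talking, not the DAS/2 server.
        print("HTTP error:", err.code, err.reason)
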
Here's a feature query I recently ran, so it will be returned quickly from the server cache: http://das.biopackages.net/das/genome/human/17/feature?overlaps=chr21/26 027736:26068042;type=SO:mRNA hope that helps, gregg > -----Original Message----- > From: Brian Gilman [mailto:gilmanb at pantherinformatics.com] > Sent: Thursday, August 10, 2006 8:10 AM > To: Helt,Gregg > Cc: DAS/2 > Subject: Re: [DAS2] DAS/2 Code Sprint, August 14-18 > > Trying to get a features document? > > Hello Greg et al. I'm desperately trying to get a features document > out of one of the DAS 2 servers and have not been able to do it yet. Can > someone help me out!? > > Thanks! > > -B > > Helt,Gregg wrote: > > >Affymetrix is hosting a DAS/2 code sprint on August 14-18, to coincide > >with the CSB conference at Stanford. The sprint will be held at Affy's > >Santa Clara location, which is about a 20 minute drive from the Stanford > >campus. For those attending CSB, the proximity should make it easy to > >join in, even if it's just for a morning or afternoon. We can provide > >transportation to and from CSB if needed. If you are interested in > >attending please email me, and specify whether you'll need a workstation > >or will be bringing your own laptop. > > > >This is a code sprint, so the focus will be on DAS/2 client and server > >implementations. As with previous sprints I'd like to start each day > >with a teleconference at 9 AM Pacific time. If you can't be there > >physically but still want to participate, please join in! > > > > Gregg > > > > > >_______________________________________________ > >DAS2 mailing list > >DAS2 at lists.open-bio.org > >http://lists.open-bio.org/mailman/listinfo/das2 > > > > > > > > From Steve_Chervitz at affymetrix.com Thu Aug 10 16:14:48 2006 From: Steve_Chervitz at affymetrix.com (Chervitz, Steve) Date: Thu, 10 Aug 2006 13:14:48 -0700 Subject: [DAS2] DAS/2 Code Sprint, August 14-18 In-Reply-To: Message-ID: The netaffxdas das/2 server is back up now. Turned out to be a memory trouble. The server got some whomping queries thrown at it, such as these: M_musculus_Aug_2005/features?overlaps=chr1/0:194923535;type=mrna;format=bps H_sapiens_Mar_2006/features?overlaps=chr20/0:62435964;type=mrna;format=bps Which it could not complete due to out of memory errors. But it could handle this sizeable query even after the above failed: H_sapiens_May_2004/features?overlaps=chr20/0:62435964;type=refseq;format=brs Eventually, Jetty just decided it had enough and shut down it's connection, shouting: WARN!! Stopping Acceptor ServerSocket My fix was to restart the das/2 server giving the java process another 200M of maximal heap. However, both das/1 and das/2 servers can now potentially claim 89% of physical ram on that box, which could become unhealthy. Ed notes that we might want to prevent such big queries in the first place. There is an error code in the das spec for this (HTTP error 413 "Request Entity Too Large"). But how do we determine the what's a reasonable maximum allowable query result? It will depend on the feature density on a particular assembly. This could be a good action item for the code sprint. Steve > From: "Helt,Gregg" > Date: Thu, 10 Aug 2006 09:39:12 -0700 > To: Brian Gilman > Cc: DAS/2 , "Chervitz, Steve" > > Conversation: [DAS2] DAS/2 Code Sprint, August 14-18 > Subject: RE: [DAS2] DAS/2 Code Sprint, August 14-18 > > Apologies, it looks like we're currently having some problem with proxy > redirection on the Affy DAS/2 server. Steve, can you check on this? 
> When I request anything but the top level ~/sequence, I'm getting back > HTTP error 502 "Bad Gateway" with the message: > "The proxy server received an invalid response from an upstream server." > > However, I just tried the biopackages server and it is working, though > response times are slower than usual (unless the response has already > been cached). Here's a feature query I recently ran, so it will be > returned quickly from the server cache: > > http://das.biopackages.net/das/genome/human/17/feature?overlaps=chr21/26 > 027736:26068042;type=SO:mRNA > > hope that helps, > gregg > >> -----Original Message----- >> From: Brian Gilman [mailto:gilmanb at pantherinformatics.com] >> Sent: Thursday, August 10, 2006 8:10 AM >> To: Helt,Gregg >> Cc: DAS/2 >> Subject: Re: [DAS2] DAS/2 Code Sprint, August 14-18 >> >> Trying to get a features document? >> >> Hello Greg et al. I'm desperately trying to get a features > document >> out of one of the DAS 2 servers and have not been able to do it yet. > Can >> someone help me out!? >> >> Thanks! >> >> -B >> >> Helt,Gregg wrote: >> >>> Affymetrix is hosting a DAS/2 code sprint on August 14-18, to > coincide >>> with the CSB conference at Stanford. The sprint will be held at > Affy's >>> Santa Clara location, which is about a 20 minute drive from the > Stanford >>> campus. For those attending CSB, the proximity should make it easy > to >>> join in, even if it's just for a morning or afternoon. We can > provide >>> transportation to and from CSB if needed. If you are interested in >>> attending please email me, and specify whether you'll need a > workstation >>> or will be bringing your own laptop. >>> >>> This is a code sprint, so the focus will be on DAS/2 client and > server >>> implementations. As with previous sprints I'd like to start each day >>> with a teleconference at 9 AM Pacific time. If you can't be there >>> physically but still want to participate, please join in! >>> >>> Gregg >>> >>> >>> _______________________________________________ >>> DAS2 mailing list >>> DAS2 at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/das2 >>> >>> >>> >>> > From gilmanb at pantherinformatics.com Thu Aug 10 16:18:20 2006 From: gilmanb at pantherinformatics.com (Brian Gilman) Date: Thu, 10 Aug 2006 16:18:20 -0400 Subject: [DAS2] DAS/2 Code Sprint, August 14-18 In-Reply-To: References: Message-ID: Eh hem, sorry...I was playing around.... -B -- Brian Gilman President Panther Informatics Inc. E-Mail: gilmanb at pantherinformatics.com gilmanb at jforge.net AIM: gilmanb1 01000010 01101001 01101111 01001001 01101110 01100110 01101111 01110010 01101101 01100001 01110100 01101001 01100011 01101001 01100001 01101110 On Aug 10, 2006, at 4:14 PM, Steve Chervitz wrote: > The netaffxdas das/2 server is back up now. Turned out to be a memory > trouble. The server got some whomping queries thrown at it, such as > these: > > M_musculus_Aug_2005/features? > overlaps=chr1/0:194923535;type=mrna;format=bps > > H_sapiens_Mar_2006/features? > overlaps=chr20/0:62435964;type=mrna;format=bps > > Which it could not complete due to out of memory errors. But it > could handle > this sizeable query even after the above failed: > > H_sapiens_May_2004/features? > overlaps=chr20/0:62435964;type=refseq;format=brs > > Eventually, Jetty just decided it had enough and shut down it's > connection, > shouting: WARN!! Stopping Acceptor ServerSocket > > My fix was to restart the das/2 server giving the java process > another 200M > of maximal heap. 
However, both das/1 and das/2 servers can now > potentially > claim 89% of physical ram on that box, which could become unhealthy. > > Ed notes that we might want to prevent such big queries in the > first place. > There is an error code in the das spec for this (HTTP error 413 > "Request > Entity Too Large"). But how do we determine the what's a reasonable > maximum > allowable query result? It will depend on the feature density on a > particular assembly. This could be a good action item for the code > sprint. > > Steve > > > >> From: "Helt,Gregg" >> Date: Thu, 10 Aug 2006 09:39:12 -0700 >> To: Brian Gilman >> Cc: DAS/2 , "Chervitz, Steve" >> >> Conversation: [DAS2] DAS/2 Code Sprint, August 14-18 >> Subject: RE: [DAS2] DAS/2 Code Sprint, August 14-18 >> >> Apologies, it looks like we're currently having some problem with >> proxy >> redirection on the Affy DAS/2 server. Steve, can you check on this? >> When I request anything but the top level ~/sequence, I'm getting >> back >> HTTP error 502 "Bad Gateway" with the message: >> "The proxy server received an invalid response from an upstream >> server." >> >> However, I just tried the biopackages server and it is working, >> though >> response times are slower than usual (unless the response has already >> been cached). Here's a feature query I recently ran, so it will be >> returned quickly from the server cache: >> >> http://das.biopackages.net/das/genome/human/17/feature? >> overlaps=chr21/26 >> 027736:26068042;type=SO:mRNA >> >> hope that helps, >> gregg >> >>> -----Original Message----- >>> From: Brian Gilman [mailto:gilmanb at pantherinformatics.com] >>> Sent: Thursday, August 10, 2006 8:10 AM >>> To: Helt,Gregg >>> Cc: DAS/2 >>> Subject: Re: [DAS2] DAS/2 Code Sprint, August 14-18 >>> >>> Trying to get a features document? >>> >>> Hello Greg et al. I'm desperately trying to get a features >> document >>> out of one of the DAS 2 servers and have not been able to do it yet. >> Can >>> someone help me out!? >>> >>> Thanks! >>> >>> -B >>> >>> Helt,Gregg wrote: >>> >>>> Affymetrix is hosting a DAS/2 code sprint on August 14-18, to >> coincide >>>> with the CSB conference at Stanford. The sprint will be held at >> Affy's >>>> Santa Clara location, which is about a 20 minute drive from the >> Stanford >>>> campus. For those attending CSB, the proximity should make it easy >> to >>>> join in, even if it's just for a morning or afternoon. We can >> provide >>>> transportation to and from CSB if needed. If you are interested in >>>> attending please email me, and specify whether you'll need a >> workstation >>>> or will be bringing your own laptop. >>>> >>>> This is a code sprint, so the focus will be on DAS/2 client and >> server >>>> implementations. As with previous sprints I'd like to start >>>> each day >>>> with a teleconference at 9 AM Pacific time. If you can't be there >>>> physically but still want to participate, please join in! >>>> >>>> Gregg >>>> >>>> >>>> _______________________________________________ >>>> DAS2 mailing list >>>> DAS2 at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/das2 >>>> >>>> >>>> >>>> >> > > From Gregg_Helt at affymetrix.com Thu Aug 10 18:25:24 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 10 Aug 2006 15:25:24 -0700 Subject: [DAS2] DAS/2 Code Sprint, August 14-18 Message-ID: Hmm... 
those queries really shouldn't stretch memory requirements too much -- the mrna objects are already in memory, so for the most part any extra memory is taken up by the output streaming through the server. Steve, can you send me the log file for the server when it was hitting these out-of-memory errors? Thanks, Gregg > -----Original Message----- > From: Chervitz, Steve > Sent: Thursday, August 10, 2006 1:15 PM > To: Helt,Gregg; Brian Gilman > Cc: DAS/2 > Subject: Re: [DAS2] DAS/2 Code Sprint, August 14-18 > > The netaffxdas das/2 server is back up now. Turned out to be a memory > trouble. The server got some whomping queries thrown at it, such as these: > > M_musculus_Aug_2005/features?overlaps=chr1/0:194923535;type=mrna;format= bp > s > > H_sapiens_Mar_2006/features?overlaps=chr20/0:62435964;type=mrna;format=b ps > > Which it could not complete due to out of memory errors. But it could > handle > this sizeable query even after the above failed: > > H_sapiens_May_2004/features?overlaps=chr20/0:62435964;type=refseq;format =b > rs > > Eventually, Jetty just decided it had enough and shut down it's connection, > shouting: WARN!! Stopping Acceptor ServerSocket > > My fix was to restart the das/2 server giving the java process another > 200M > of maximal heap. However, both das/1 and das/2 servers can now potentially > claim 89% of physical ram on that box, which could become unhealthy. > > Ed notes that we might want to prevent such big queries in the first place. > There is an error code in the das spec for this (HTTP error 413 "Request > Entity Too Large"). But how do we determine the what's a reasonable > maximum > allowable query result? It will depend on the feature density on a > particular assembly. This could be a good action item for the code sprint. > > Steve > > > > > From: "Helt,Gregg" > > Date: Thu, 10 Aug 2006 09:39:12 -0700 > > To: Brian Gilman > > Cc: DAS/2 , "Chervitz, Steve" > > > > Conversation: [DAS2] DAS/2 Code Sprint, August 14-18 > > Subject: RE: [DAS2] DAS/2 Code Sprint, August 14-18 > > > > Apologies, it looks like we're currently having some problem with proxy > > redirection on the Affy DAS/2 server. Steve, can you check on this? > > When I request anything but the top level ~/sequence, I'm getting back > > HTTP error 502 "Bad Gateway" with the message: > > "The proxy server received an invalid response from an upstream server." > > > > However, I just tried the biopackages server and it is working, though > > response times are slower than usual (unless the response has already > > been cached). Here's a feature query I recently ran, so it will be > > returned quickly from the server cache: > > > > http://das.biopackages.net/das/genome/human/17/feature?overlaps=chr21/26 > > 027736:26068042;type=SO:mRNA > > > > hope that helps, > > gregg > > > >> -----Original Message----- > >> From: Brian Gilman [mailto:gilmanb at pantherinformatics.com] > >> Sent: Thursday, August 10, 2006 8:10 AM > >> To: Helt,Gregg > >> Cc: DAS/2 > >> Subject: Re: [DAS2] DAS/2 Code Sprint, August 14-18 > >> > >> Trying to get a features document? > >> > >> Hello Greg et al. I'm desperately trying to get a features > > document > >> out of one of the DAS 2 servers and have not been able to do it yet. > > Can > >> someone help me out!? > >> > >> Thanks! > >> > >> -B > >> > >> Helt,Gregg wrote: > >> > >>> Affymetrix is hosting a DAS/2 code sprint on August 14-18, to > > coincide > >>> with the CSB conference at Stanford. 
The sprint will be held at > > Affy's > >>> Santa Clara location, which is about a 20 minute drive from the > > Stanford > >>> campus. For those attending CSB, the proximity should make it easy > > to > >>> join in, even if it's just for a morning or afternoon. We can > > provide > >>> transportation to and from CSB if needed. If you are interested in > >>> attending please email me, and specify whether you'll need a > > workstation > >>> or will be bringing your own laptop. > >>> > >>> This is a code sprint, so the focus will be on DAS/2 client and > > server > >>> implementations. As with previous sprints I'd like to start each day > >>> with a teleconference at 9 AM Pacific time. If you can't be there > >>> physically but still want to participate, please join in! > >>> > >>> Gregg > >>> > >>> > >>> _______________________________________________ > >>> DAS2 mailing list > >>> DAS2 at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/das2 > >>> > >>> > >>> > >>> > > From aloraine at gmail.com Mon Aug 14 02:13:10 2006 From: aloraine at gmail.com (Ann Loraine) Date: Sun, 13 Aug 2006 23:13:10 -0700 Subject: [DAS2] DAS/2 Code Sprint, August 14-18 In-Reply-To: <6dce9a0b0608132227o12924d90ud8a8cca329b30fb@mail.gmail.com> References: <83722dde0607240911w4d50b9cfo43adff514f6df39c@mail.gmail.com> <6dce9a0b0608132227o12924d90ud8a8cca329b30fb@mail.gmail.com> Message-ID: <83722dde0608132313g4ec0cdf5p990284ad00b0d17@mail.gmail.com> Hi, Last I heard, it's starting Monday (the 14th), beginning with a conference call at 9 am. Directions: http://www.affymetrix.com/site/contact/directions.jsp?loc=sc Best, Ann On 8/13/06, Lincoln Stein wrote: > Hi, > > Is the code sprint starting on the 13th or the 14th? I am here in Palo Alto > and have Monday morning free. > > Can I get driving directions from the Affy web site? > > Lincoln > > On 7/24/06, Ann Loraine wrote: > > Hi Gregg, > > > > I would like to suggest shifting the code spring by a day and have it > > start Monday August 13. > > > > That way it won't perfectly overlap the conference and those us who > > need to be at the conference full-time (such as myself) will be able > > to visit the code spring. > > > > Cheers, > > > > Ann > > > > On 7/24/06, Helt,Gregg wrote: > > > Affymetrix is hosting a DAS/2 code sprint on August 14-18, to coincide > > > with the CSB conference at Stanford. The sprint will be held at Affy's > > > Santa Clara location, which is about a 20 minute drive from the Stanford > > > campus. For those attending CSB, the proximity should make it easy to > > > join in, even if it's just for a morning or afternoon. We can provide > > > transportation to and from CSB if needed. If you are interested in > > > attending please email me, and specify whether you'll need a workstation > > > or will be bringing your own laptop. > > > > > > This is a code sprint, so the focus will be on DAS/2 client and server > > > implementations. As with previous sprints I'd like to start each day > > > with a teleconference at 9 AM Pacific time. If you can't be there > > > physically but still want to participate, please join in! 
> > > > > > Gregg > > > > > > > > > _______________________________________________ > > > DAS2 mailing list > > > DAS2 at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/das2 > > > > > > > > > -- > > Ann Loraine > > Assistant Professor > > Section on Statistical Genetics > > University of Alabama at Birmingham > > http://www.ssg.uab.edu > > http://www.transvar.org > > _______________________________________________ > > DAS2 mailing list > > DAS2 at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/das2 > > > > > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From allenday at ucla.edu Mon Aug 14 03:14:49 2006 From: allenday at ucla.edu (Allen Day) Date: Mon, 14 Aug 2006 00:14:49 -0700 Subject: [DAS2] DAS/2 Code Sprint, August 14-18 In-Reply-To: References: Message-ID: <5c24dcc30608140014u3d9dd1b5w9e487e142d1ca077@mail.gmail.com> Ah, I may implement this. Let's discuss tomorrow morning. Is there an agenda set? Is anyone teleconferencing in? -Allen On 8/10/06, Chervitz, Steve wrote: > > The netaffxdas das/2 server is back up now. Turned out to be a memory > trouble. The server got some whomping queries thrown at it, such as these: > > > M_musculus_Aug_2005/features?overlaps=chr1/0:194923535;type=mrna;format=bps > > H_sapiens_Mar_2006/features?overlaps=chr20/0:62435964;type=mrna;format=bps > > Which it could not complete due to out of memory errors. But it could > handle > this sizeable query even after the above failed: > > > H_sapiens_May_2004/features?overlaps=chr20/0:62435964;type=refseq;format=brs > > Eventually, Jetty just decided it had enough and shut down it's > connection, > shouting: WARN!! Stopping Acceptor ServerSocket > > My fix was to restart the das/2 server giving the java process another > 200M > of maximal heap. However, both das/1 and das/2 servers can now potentially > claim 89% of physical ram on that box, which could become unhealthy. > > Ed notes that we might want to prevent such big queries in the first > place. > There is an error code in the das spec for this (HTTP error 413 "Request > Entity Too Large"). But how do we determine the what's a reasonable > maximum > allowable query result? It will depend on the feature density on a > particular assembly. This could be a good action item for the code sprint. > > Steve > > > > > From: "Helt,Gregg" > > Date: Thu, 10 Aug 2006 09:39:12 -0700 > > To: Brian Gilman > > Cc: DAS/2 , "Chervitz, Steve" > > > > Conversation: [DAS2] DAS/2 Code Sprint, August 14-18 > > Subject: RE: [DAS2] DAS/2 Code Sprint, August 14-18 > > > > Apologies, it looks like we're currently having some problem with proxy > > redirection on the Affy DAS/2 server. Steve, can you check on this? > > When I request anything but the top level ~/sequence, I'm getting back > > HTTP error 502 "Bad Gateway" with the message: > > "The proxy server received an invalid response from an upstream server." > > > > However, I just tried the biopackages server and it is working, though > > response times are slower than usual (unless the response has already > > been cached). 
Here's a feature query I recently ran, so it will be > > returned quickly from the server cache: > > > > http://das.biopackages.net/das/genome/human/17/feature?overlaps=chr21/26 > > 027736:26068042;type=SO:mRNA > > > > hope that helps, > > gregg > > > >> -----Original Message----- > >> From: Brian Gilman [mailto:gilmanb at pantherinformatics.com] > >> Sent: Thursday, August 10, 2006 8:10 AM > >> To: Helt,Gregg > >> Cc: DAS/2 > >> Subject: Re: [DAS2] DAS/2 Code Sprint, August 14-18 > >> > >> Trying to get a features document? > >> > >> Hello Greg et al. I'm desperately trying to get a features > > document > >> out of one of the DAS 2 servers and have not been able to do it yet. > > Can > >> someone help me out!? > >> > >> Thanks! > >> > >> -B > >> > >> Helt,Gregg wrote: > >> > >>> Affymetrix is hosting a DAS/2 code sprint on August 14-18, to > > coincide > >>> with the CSB conference at Stanford. The sprint will be held at > > Affy's > >>> Santa Clara location, which is about a 20 minute drive from the > > Stanford > >>> campus. For those attending CSB, the proximity should make it easy > > to > >>> join in, even if it's just for a morning or afternoon. We can > > provide > >>> transportation to and from CSB if needed. If you are interested in > >>> attending please email me, and specify whether you'll need a > > workstation > >>> or will be bringing your own laptop. > >>> > >>> This is a code sprint, so the focus will be on DAS/2 client and > > server > >>> implementations. As with previous sprints I'd like to start each day > >>> with a teleconference at 9 AM Pacific time. If you can't be there > >>> physically but still want to participate, please join in! > >>> > >>> Gregg > >>> > >>> > >>> _______________________________________________ > >>> DAS2 mailing list > >>> DAS2 at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/das2 > >>> > >>> > >>> > >>> > > > > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > From steve_chervitz at affymetrix.com Mon Aug 14 04:56:31 2006 From: steve_chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 14 Aug 2006 01:56:31 -0700 (PDT) Subject: [DAS2] DAS/2 Code Sprint, August 14-18 In-Reply-To: <5c24dcc30608140014u3d9dd1b5w9e487e142d1ca077@mail.gmail.com> References: <5c24dcc30608140014u3d9dd1b5w9e487e142d1ca077@mail.gmail.com> Message-ID: On Mon, 14 Aug 2006, Allen Day wrote: > Ah, I may implement this. Let's discuss tomorrow morning. Is there an > agenda set? Is anyone teleconferencing in? I haven't seen a specific agenda, but something like this seems reasonable: * status reports, including what you want to focus on for the sprint * establish a prioitized list of goals and deliverables for the current sprint Teleconferencing will start at 9AM PST on the usual number: TEL=800-531-3250 (US) or 303-928-2693 (Int'l) ID=2879055 PIN=1365 Steve > > On 8/10/06, Chervitz, Steve wrote: >> >> The netaffxdas das/2 server is back up now. Turned out to be a memory >> trouble. The server got some whomping queries thrown at it, such as these: >> >> >> M_musculus_Aug_2005/features?overlaps=chr1/0:194923535;type=mrna;format=bps >> >> H_sapiens_Mar_2006/features?overlaps=chr20/0:62435964;type=mrna;format=bps >> >> Which it could not complete due to out of memory errors. 
But it could >> handle >> this sizeable query even after the above failed: >> >> >> H_sapiens_May_2004/features?overlaps=chr20/0:62435964;type=refseq;format=brs >> >> Eventually, Jetty just decided it had enough and shut down it's >> connection, >> shouting: WARN!! Stopping Acceptor ServerSocket >> >> My fix was to restart the das/2 server giving the java process another >> 200M >> of maximal heap. However, both das/1 and das/2 servers can now potentially >> claim 89% of physical ram on that box, which could become unhealthy. >> >> Ed notes that we might want to prevent such big queries in the first >> place. >> There is an error code in the das spec for this (HTTP error 413 "Request >> Entity Too Large"). But how do we determine the what's a reasonable >> maximum >> allowable query result? It will depend on the feature density on a >> particular assembly. This could be a good action item for the code sprint. >> >> Steve >> >> >> >> > From: "Helt,Gregg" >> > Date: Thu, 10 Aug 2006 09:39:12 -0700 >> > To: Brian Gilman >> > Cc: DAS/2 , "Chervitz, Steve" >> > >> > Conversation: [DAS2] DAS/2 Code Sprint, August 14-18 >> > Subject: RE: [DAS2] DAS/2 Code Sprint, August 14-18 >> > >> > Apologies, it looks like we're currently having some problem with proxy >> > redirection on the Affy DAS/2 server. Steve, can you check on this? >> > When I request anything but the top level ~/sequence, I'm getting back >> > HTTP error 502 "Bad Gateway" with the message: >> > "The proxy server received an invalid response from an upstream server." >> > >> > However, I just tried the biopackages server and it is working, though >> > response times are slower than usual (unless the response has already >> > been cached). Here's a feature query I recently ran, so it will be >> > returned quickly from the server cache: >> > >> > http://das.biopackages.net/das/genome/human/17/feature?overlaps=chr21/26 >> > 027736:26068042;type=SO:mRNA >> > >> > hope that helps, >> > gregg >> > >> >> -----Original Message----- >> >> From: Brian Gilman [mailto:gilmanb at pantherinformatics.com] >> >> Sent: Thursday, August 10, 2006 8:10 AM >> >> To: Helt,Gregg >> >> Cc: DAS/2 >> >> Subject: Re: [DAS2] DAS/2 Code Sprint, August 14-18 >> >> >> >> Trying to get a features document? >> >> >> >> Hello Greg et al. I'm desperately trying to get a features >> > document >> >> out of one of the DAS 2 servers and have not been able to do it yet. >> > Can >> >> someone help me out!? >> >> >> >> Thanks! >> >> >> >> -B >> >> >> >> Helt,Gregg wrote: >> >> >> >>> Affymetrix is hosting a DAS/2 code sprint on August 14-18, to >> > coincide >> >>> with the CSB conference at Stanford. The sprint will be held at >> > Affy's >> >>> Santa Clara location, which is about a 20 minute drive from the >> > Stanford >> >>> campus. For those attending CSB, the proximity should make it easy >> > to >> >>> join in, even if it's just for a morning or afternoon. We can >> > provide >> >>> transportation to and from CSB if needed. If you are interested in >> >>> attending please email me, and specify whether you'll need a >> > workstation >> >>> or will be bringing your own laptop. >> >>> >> >>> This is a code sprint, so the focus will be on DAS/2 client and >> > server >> >>> implementations. As with previous sprints I'd like to start each day >> >>> with a teleconference at 9 AM Pacific time. If you can't be there >> >>> physically but still want to participate, please join in! 
>> >>> >> >>> Gregg >> >>> >> >>> >> >>> _______________________________________________ >> >>> DAS2 mailing list >> >>> DAS2 at lists.open-bio.org >> >>> http://lists.open-bio.org/mailman/listinfo/das2 >> >>> >> >>> >> >>> >> >>> >> > >> >> >> _______________________________________________ >> DAS2 mailing list >> DAS2 at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/das2 >> > From Gregg_Helt at affymetrix.com Mon Aug 14 08:39:07 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 14 Aug 2006 05:39:07 -0700 Subject: [DAS2] DAS/2 Code Sprint, August 14-18 Message-ID: Apologies for not posting the details sooner! DAS/2 Code Sprint, August 14 (Monday) through August 18 (Friday) Conference Call, 9 AM PST every morning 800-531-3250 Conference ID: 2879055 Passcode: 1365 We're in the Computer Training room at Affymetrix Santa Clara, Building 3450 Directions to Affymetrix Building 3450 (3450 Kifer Road, Santa Clara, CA): http://www.affymetrix.com/site/contact/directions.jsp?loc=sccentral This is about a 20 minute drive from the Stanford campus. If there is no receptionist at 3420, you may need to check in at the reception area in Building 3420. Please call me on my cell phone if there are any problems finding the room: 510-205-9652 See you all soon! Gregg -----Original Message----- From: Lincoln Stein [mailto:lincoln.stein at gmail.com] Sent: Sunday, August 13, 2006 10:28 PM To: Ann Loraine Cc: Helt,Gregg; DAS/2 Subject: Re: [DAS2] DAS/2 Code Sprint, August 14-18 Hi, Is the code sprint starting on the 13th or the 14th? I am here in Palo Alto and have Monday morning free. Can I get driving directions from the Affy web site? Lincoln On 7/24/06, Ann Loraine wrote: Hi Gregg, I would like to suggest shifting the code spring by a day and have it start Monday August 13. That way it won't perfectly overlap the conference and those us who need to be at the conference full-time (such as myself) will be able to visit the code spring. Cheers, Ann On 7/24/06, Helt,Gregg wrote: > Affymetrix is hosting a DAS/2 code sprint on August 14-18, to coincide > with the CSB conference at Stanford. The sprint will be held at Affy's > Santa Clara location, which is about a 20 minute drive from the Stanford > campus. For those attending CSB, the proximity should make it easy to > join in, even if it's just for a morning or afternoon. We can provide > transportation to and from CSB if needed. If you are interested in > attending please email me, and specify whether you'll need a workstation > or will be bringing your own laptop. > > This is a code sprint, so the focus will be on DAS/2 client and server > implementations. As with previous sprints I'd like to start each day > with a teleconference at 9 AM Pacific time. If you can't be there > physically but still want to participate, please join in! > > Gregg > > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org _______________________________________________ DAS2 mailing list DAS2 at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/das2 -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From Gregg_Helt at affymetrix.com Mon Aug 14 08:48:15 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 14 Aug 2006 05:48:15 -0700 Subject: [DAS2] DAS/2 Code Sprint Details, August 14-18 Message-ID: Apologies for not posting the details sooner! DAS/2 Code Sprint, August 14 (Monday) through August 18 (Friday) Conference Call, 9 AM PST every morning US: 800-531-3250, International: 303-928-2693 Conference ID: 2879055 Passcode: 1365 We're in the Computer Training room at Affymetrix Santa Clara, Building 3450 Directions to Affymetrix Building 3450 (3450 Kifer Road, Santa Clara, CA): http://www.affymetrix.com/site/contact/directions.jsp?loc=sccentral This is about a 20 minute drive from the Stanford campus. If there is no receptionist at 3420, you may need to check in at the reception area in Building 3420. Please call me on my cell phone if there are any problems finding the room: 510-205-9652 See you all soon! Gregg From dalke at dalkescientific.com Mon Aug 14 12:20:03 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 14 Aug 2006 18:20:03 +0200 Subject: [DAS2] Fwd: DAS/2 code sprint next week! Message-ID: <4b1a3bf29da8f435273d2b25013d15cf@dalkescientific.com> Begin forwarded message: > From: Andrew Dalke > Date: August 14, 2006 6:00:30 PM GMT+02:00 > To: "Helt,Gregg" > Subject: Re: DAS/2 code sprint next week! > >> We?re hosting another DAS/2 code sprint next week at Affy Santa >> Clara, to coincide with the CSB meeting at Stanford.? Will you be >> able to join in?? If not in person, then we?re having a daily 9 AM >> PST conference call you could join. > > I'll be there. It starts in a few minutes. I'm in a cybercafe in > Cape Town. > >> ?I?m wondering what the status of the DAS/2 writeback spec is. > > It's unchanged. I'll be working on that over the sprint. > > I've spent most of the last, month+ catching up on the latest in > web development systems for Python, and learning various libraries. > Including giving a 2 week course on it. As my test case I've > been working on a DAS2 server. I have a reference server nearly > finished and a few things came up during it: > > - I'm iffy about the current SEGMENTS document. It lists > "title" and "reference" for each segment but not for the list of > segments as a whole. Does it make sense allowing those to be > specified if they have reasonable names? (I know they don't always.) > > It's part of that separation between the sources document, which > describes these, and the segments document. > > - did we specify that the sequence is in upper-case, lower-case, > etc.? > > - I would like some experience with an agp or other assembly format. > I'm concerned about how a client can piece together segment names > from that document with the URIs we're using. It seems to me that > most places use a local name ("yeast_1" or somesuch) which is not > exposed via the web. If the assembly document, fasta file, etc. > use the local name and not the URL then it's hard to tie them together. > > - There are two different segment titles I've come across. One > is the name you want to see in a pull-down menu, etc. while the > other is the text you want in the FASTA header . These could be the > same but I don't think they are always the same. 
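To make the naming worry concrete, here is a rough client-side sketch of such a lookup -- the SEGMENT element and its uri/title attributes are assumptions based on the segments document as discussed here, and the last-path-component guess for the local name is exactly the fragile part:

    from urllib.parse import urljoin
    import xml.etree.ElementTree as ET

    DAS2_NS = "{http://biodas.org/documents/das2}"  # namespace URI assumed
    XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

    def segment_lookup(segments_xml, document_url):
        """Map guessed local names (e.g. 'chr21') to absolute segment URIs and titles."""
        root = ET.fromstring(segments_xml)
        base = urljoin(document_url, root.get(XML_BASE, ""))
        lookup = {}
        for seg in root.iter(DAS2_NS + "SEGMENT"):
            uri = urljoin(base, seg.get("uri", ""))
            local_name = uri.rstrip("/").split("/")[-1]  # the fragile heuristic
            lookup[local_name] = (uri, seg.get("title"))
        return lookup

If the assembly file (agp, fasta headers, etc.) only carries the local name, something like this guess is the best a client can do, which is why exposing the tie-in explicitly would help.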
> Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Mon Aug 14 14:30:35 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 14 Aug 2006 11:30:35 -0700 Subject: [DAS2] Notes from DAS/2 code sprint #3, day one, 14 Aug 2006 Message-ID: Notes from DAS/2 code sprint #3, day one, 14 Aug 2006 $Id: das2-teleconf-2006-08-14.txt,v 1.2 2006/08/14 18:28:47 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt CSHL: Lincoln Stein Dalke Scientific: Andrew Dalke Panther Informatics: Brian Gilman UAB: All Loraine UCLA: Allen Day, Brian O'Connor Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Agenda: * Status reports, including what you want/need to focus on for this sprint, progress from last sprint. Status Reports --------------- gh: have done writeback work. IGB can create curation, post to biopackages writeback server, das/2 client can see curations. no editing yet. client can edit own data models, can't post those edits. to work on ID mapping stuff: client can't accept newly create ids from server. currently just holds onto temporary id's. IGB client has had one or more release since last. priorities - mainly writeback for client. ls: continue working on perl client interface to das/2, not functional at present. need to backout changes since last sprint. das/2 tracks in gbrowse. About 10hrs needed. sc: have been working on keeping data on Affymetrix public das servers up to date, dealing with memory issues cause by increasing amount of array data to support. Gregg has new efficient format for modeling exon array features with lower memory requirements. Will work on getting the das server to use it. Long-term plan is to remove our das/1 server and just have das/2, easier to use and maintain. Complete transition will take time though. Have continued working to automate the pipeline for updating the affy das servers. Have a new page that lists available data on the servers, currently manually created but plan to automate. ad: web dev in python, taught course on that. plan: getting python server up, to experiment with writeback. updating spec as per a couple of months ago. gh: andrew will make spec a top priority, grant is funding for that. bg: tasked to take das/2 data and produce set of objects to use within caCORE system at NCI. Have objects for das/2 data and service. can retrieve das/2 data from affy server. present in simple web page. Using java and ruby. gh: good week to ask questions as you flesh out the impl. ee: gregg and I will put out new IGB release this week. can work on style sheets (left over from last time). Or can build a gff3 parser into IGB (lots of excitement!). al: two things: demo applications for self and collaborators and das newbies. retrieve genomic locations for targets of affy probe sets and then retrieve promoter regions upstream. gh: promoter data in das2 server? al: can just say 500bp upstream of gene. not identifying control. 
Just retrieve seq to pipe into control analysis. Second one: meta analysis, results from diff groups for associated phenotypes. Input: list of markers, output: annotations associated with these. Statistical analysis. Ultimately obtain candidate genes associated with markers. Some preliminary work on obesity that looks promising. [A] Steve will help Ann convert fly probe set ids into genome locations. Goal is to write something that can do random sampling of gene annotations. ideal world: das server gets region, returns gene ids and go ids. Less ideal: just get genes within the peaks (from association studies). bo: doing rpm packaging for the mac (tgen). so people can set up das2 server on a mac. update rpm packages with results of work this week. clean up bug queue on biopackages server impl, bringing it up to spec. can talk about analysis part of server. internal hirax client for retrieval of assay data. communication with server is out of sync. Spec issues: ------------ gh: want to focus on writeback. wants full xml features rather than mapping document. aday: work on writes as well as deletes. Impl 413 entity request too large adding this for requests that exceed some size threshold (10kb, 100kb) if at or below, OK. gh: need to coord with me on writeback, I focus on client writeback, you on server. Editing is ok. Deletes are harder. Other Issues: ------------- gh: Contact peter good about funding. Extending from 2yr to 3yr. talk with lincoln and suzi about plans for next grant. sc: status of bugzilla open bugs on spec? [A] Someone should go through and update bugzilla list for spec bg: version field. gh: not too understandable. at last sprint, two freezes, the version tells which v of spec freeze the server is using. assumption is that now the servers are using the most recent spec. If they're not compliant, please let us know. affy server: won't give back a list of all features. requires an overlaps and types restrictor. biopackages: should be good with latest spec. bg: sources document, source tag has version. if you do a query like types, also has version? No. ad: sources document: worm 161 (data source). capabilities describe things like writeback support for v161, but not v160. bg: that version seems to have different sematics given query. biggest issue was parsing and populating my object model. gh: coordinate subelement in version elem. has a version attr. my client does not deal with coord stuff. meant to make sure that annots from two servers are refering to same coords, so you can overlay annots from different servers. my client is using version URIs for that instead. bg: other issue: in order to know what server you're hitting, you have to know name space of doc, which has base URI. XML base in segments query. xmlns biodas.org/das2. to have tracability in documents you receive, you as implementer must track urls, converting relative to absolute. can be a problem when hitting 5 different servers. gh: my obj model (client) has model of server with root url of the das server, sources objects which has xml base of each source. bg: you could get back a 404 from xml:base. Perfectly apropriate. server could put whatever it wants in xml:base. currently it's the document. ad: we're using the xml:base spec, so you can put xml:base on any node you want to. construct full url by. gh: in our schema is it clear which attribs are resolved by xml:base? ad: no. bg: would like to see one big document with every element, not several different files. relaxNG isn't best format. 
would like a w3c XSD that defines the elements. from coders standpoint, don't have to go and look at 5 different docs. Have to have multiple windows up, figure out how they are connected to each other. semantics within each query, who is calling what. ad: I gave brian one. using trang to spit it out. bg: trang is not best xml schema writer. I could work on this. why do you use relaxNG? ad: I can read it and understand it. there were good examples. bg: I can autgenerate code that is in XSD, soap and other wservices stuff does that for you. Can generate a parser, point it a uri, get doc, generate a parser and object model. ad: parser would break if server returns extra attributes. In spec there are some extension points. can put any element that is in a separate namespace. I know how to do that in relaxNG, but not in XSD. bg: you just have to add another xmlns. define an extension point with that namespace. ad: should be able to resolve it into one. bg: Three items. 1. will ask w3c people about XSD to relaxNG. 2. semantics confusion. 3. xml:base appropriate to supply a 404 if client was dependent on that attribute. ad: version tag is problem if there are duplicates. should be changed so there are no duplicates. can build parser on rng bg: it's experimental, alpha s'ware. don't want to use for production. bg: when you put a relative url inside a xml:base. ad: resolvable via http, or in abolute url. gh: if you resolve it up to the top level doc, then use the url of the document itself. whether clients actual do this, depends on impl. say to implementers, we could state that the top level document should resolve to absolute url. we wanted to say, "Das/2 uses xml:base spec. period." bg: put this in the spec, how you want it to be used. ad: don't like saying, "we use xml:base with these additional things" bg: can put off for now. ls: In my library when I see a url and can't resolve, I fall back to a hard coded url. From dalke at dalkescientific.com Mon Aug 14 15:29:54 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 14 Aug 2006 21:29:54 +0200 Subject: [DAS2] duplicate use of VERSION Message-ID: <3f8e5c60e918cd90eabe6597403a5448@dalkescientific.com> Brian G. pointed out that "VERSION" is used twice in the spec, with different meanings. I thought we used it twice as an element but that's not the case. It's used once as "versioned source" element and another time as an attribute in the COORDINATES element # This is the version of the build (if a genomic sequence). # However, protein databases don't do versions this way attribute version { text }?, In looking around I don't see duplicate uses of any tag for elements with different meanings. Brian? Is this the one you were talking about? In thinking about it though, I've found it awkward to talk about "versioned source". First off, the Mac's Mail.app gives squiggles under the "versioned" indicating a misspelling. Second, it's hard to say and annoying to write "versioned_source" in my code and in the documentation. I would like to use "release" instead. That is, change das2:VERSION to das2:RELEASE. That's a shorter word, closer to the intended meaning, and generally nicer. Eg, "there are many data sources and each source may have multiple releases." That's a simple change but it's highly non-backwards compatible. 
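A client can cushion that incompatibility fairly cheaply by accepting either spelling while servers migrate. A minimal ElementTree sketch, where the namespace URI and the coexistence of both tags are assumptions rather than anything in the spec:

    import xml.etree.ElementTree as ET

    DAS2_NS = "{http://biodas.org/documents/das2}"  # namespace URI assumed

    def iter_releases(sources_xml):
        """Yield (source uri, release uri) pairs, accepting either element name."""
        root = ET.fromstring(sources_xml)
        for source in root.iter(DAS2_NS + "SOURCE"):
            releases = (source.findall(DAS2_NS + "RELEASE")
                        or source.findall(DAS2_NS + "VERSION"))
            for release in releases:
                yield source.get("uri"), release.get("uri")
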
Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Aug 14 15:46:23 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 14 Aug 2006 21:46:23 +0200 Subject: [DAS2] duplicate use of VERSION In-Reply-To: <3f8e5c60e918cd90eabe6597403a5448@dalkescientific.com> References: <3f8e5c60e918cd90eabe6597403a5448@dalkescientific.com> Message-ID: <7e92910f143448a82fae138d41a7e195@dalkescientific.com> > In looking around I don't see duplicate uses of any tag for > elements with different meanings. I should have added... Even though they are not duplicate element tags, they should not have the same name as it causes confusion. For example, someone seeing "version" may think it is the name/uri/url for a VERSION element when it is absolutely not. > I would like to use "release" instead. That is, change > das2:VERSION to das2:RELEASE. Still would like it. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Aug 14 17:38:27 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 14 Aug 2006 23:38:27 +0200 Subject: [DAS2] mapping document Message-ID: Been thinking about the response to a writeback. The spec said the server responds with a mapping document saying "uploaded id X is now Y". As per discussion this will now return a features document. Each feature element may contain a new attribute "was" if its URI changed. This happens for one of two reasons: - the client created the feature using the private naming scheme - the server supports versioning and each feature version gets its own identifier Perhaps also "the server's ornery and jest feels like it." I had written the spec so a server could optionally implement type writeback. With this change that is not possible. It's possible to have a new return document which combines features and types (which is very similar to the current writeback spec). However, type writeback was not considered a high priority and none of the servers under development will support such thing. (Correct?) If needed we have extension mechanisms by which that can be supported in the future. questions: - I wrote above that the new attribute is named "was", as in The word "was" is wrong. Otherwise the new version should be "is", and not "uri". Other options are "previously", "old_uri", "prev_uri", "previous_uri", "uri_was" I can't find old discussion on this. Anyone one not like "old_uri" and have a better name? - anyone want type writeback in this version of the spec? if not i'll remove all traces of it from the spec. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Aug 14 18:27:49 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Aug 2006 00:27:49 +0200 Subject: [DAS2] relative URLs and xml:base in the writeback document Message-ID: On the topic of relative URLs ... The writeback document contains FEATURE elements. Because we aren't supporting types I want to change the writeback document so it looks like this Reason for the change ... ... Problem #1: if I lift the existing FEATURE element definition then the uri attributes may contain relative URIs and the FEATURE element may contain an xml:base attribute. We can also have that "WRITEBACK" contains an xml:base attribute. What happens if after all of that the writeback URI is still a relative URL? How does the server convert the relative URL into an absolute one? Does it use the writeback URL as the document base? That's the only one which comes close to making sense, but it doesn't make much sense. 
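Mechanically the resolution is just a chain of urljoin calls, walking from some starting base down through any xml:base attributes to each uri. A minimal sketch, reusing the WRITEBACK/FEATURE structure above and treating the request URL as that starting base purely for illustration:

    from urllib.parse import urljoin
    import xml.etree.ElementTree as ET

    XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

    def resolve_feature_uris(writeback_xml, request_url):
        """Return absolute feature URIs, folding in any nested xml:base attributes."""
        root = ET.fromstring(writeback_xml)

        def walk(elem, base):
            base = urljoin(base, elem.get(XML_BASE, ""))
            if elem.tag.endswith("FEATURE") and elem.get("uri"):
                yield urljoin(base, elem.get("uri"))
            for child in elem:
                yield from walk(child, base)

        # Using the request URL as the last-resort base is the questionable step.
        return list(walk(root, request_url))
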
No client in its right mind will deconvolute the feature uris to be relative urls with respect to the writeback URL (which, after all, may be on an entirely different machine). I checked the xml:base spec http://www.w3.org/TR/xmlbase/ and it refers to the URI RFC 2396 http://www.ietf.org/rfc/rfc2396.txt These are both defined in terms of document retrieval. Eg, > If no base URI is embedded, the base URI of a document is > defined by the document's retrieval context. This makes no sense in a POST document. I think in this case it's fine to say "URIs in a writeback spec must be absolute URLs". Either they are written as absolute URLs or they are made absolute in the context of some xml:base defined in the writeback delta. What say you all? A. all URIs in writeback must be absolute - don't support xml:base at all B. URIs may be relative but must be absolute once all enclosing xml:base attributes are included C. URIs may be relative and the writeback URL itself is used as the retrieval context My vote is that the server implements B but that clients will all do A. Speaking of which, digging through the xml:base spec and the history of our discussion I see that we are free to define when xml:base is valid. We could use it only on the root element if we so desire. Right now it can be on any element. The reason we have it on every element is from the influence of this blog post: http://norman.walsh.name/2005/04/01/xinclude > Ugh. In the short term, I think there's only one answer: update your > schemas to allow xml:base either (a) everywhere or (b) everywhere you > want XInclude to be allowed. I urge you to put it everywhere as your > users are likely to want to do things you never imagined. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Aug 14 18:32:43 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Aug 2006 00:32:43 +0200 Subject: [DAS2] element identity Message-ID: <63f85e217e45faf272d769ba9f2fd135@dalkescientific.com> again, working on the writeback spec. The writeback spec will look like Reason for the change ... ... The response document will look like this ... ... This FEATURE element is very similar but different than the normal FEATURE element in that it has a new "old_uri" attribute. Does anyone see that as a problem? I don't, but it breaks the guideline we talked about earlier where two XML elements with the same tag must refer to the same thing. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Aug 14 19:54:39 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Aug 2006 01:54:39 +0200 Subject: [DAS2] updated writeback spec Message-ID: <1182e48978effe1454d7350cb9634283@dalkescientific.com> I've updated the writeback spec. Here's the log message > Respond with a modified features document instead of a mapping > document. > > Removed references to type writeback. > > Writeback URIs must be fully resolvable in the document. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Tue Aug 15 09:46:14 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Aug 2006 15:46:14 +0200 Subject: [DAS2] Fwd: can't view XML from DAS2 server in IE4 or Safari Message-ID: <1f5d9f7aa32f2cdd91e60a3867e58037@dalkescientific.com> Oops! Hit "reply-to" instead of "reply-all". 
Begin forwarded message: > From: Andrew Dalke > Date: August 15, 2006 5:26:03 AM GMT+02:00 > To: "Ann Loraine" > Subject: Re: can't view XML from DAS2 server in IE4 or Safari > >> I'm trying to view the XML delivered from the DAS2 server in Firefox >> or IE4 without having to save it and then load it. >> >> I think this is something to do with the fact that the XML is >> delivered as type application versus XML plain text, which is what the >> DAS1 servers seem to do. > > Yes. It's a 4 year old bug in Mozilla. > https://bugzilla.mozilla.org/show_bug.cgi?id=155730 > > >> Is there a way I can tell Firefox to render the XML directly without >> my having to save it first? > > We've run into this before. I want a way to make this be less > of a problem. > > I propose that if "text/xml" is in the Accept header then the > server should return the das2xml document but with a "text/xml" > content-type. > > I tested that out on my copy of Firefox and it was a happy camper. > It showed the XML tree, though it did complain about the lack > of a stylesheet. Okay, perhaps it was more feeling okay than happy.. > > Of course another possibility is to see the "text/html" there > and show something more presentable to humans, but that makes things > worse for those like Ann who want to see the XML structure. Andrew dalke at dalkescientific.com From boconnor at ucla.edu Tue Aug 15 03:07:03 2006 From: boconnor at ucla.edu (Brian O'Connor) Date: Tue, 15 Aug 2006 00:07:03 -0700 Subject: [DAS2] updated writeback spec In-Reply-To: <1182e48978effe1454d7350cb9634283@dalkescientific.com> References: <1182e48978effe1454d7350cb9634283@dalkescientific.com> Message-ID: <44E17297.7090901@ucla.edu> Hi Andrew, During the last code sprint I used the DAS/2 validation tool you wrote to help debug the das.biopackages.net server. It was very helpful!! Has it been updated to the current spec (v 1.33 2006/04/27 on the website). What is the URL? Thanks --Brian Andrew Dalke wrote: > I've updated the writeback spec. Here's the log message > > >>Respond with a modified features document instead of a mapping >>document. >> >>Removed references to type writeback. >> >>Writeback URIs must be fully resolvable in the document. > > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 From dalke at dalkescientific.com Tue Aug 15 11:16:35 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Aug 2006 17:16:35 +0200 Subject: [DAS2] xlm:base -- fer it or agin' it? Message-ID: <1c606eb000e651a6741eb1b09d30da06@dalkescientific.com> I see three reasonable options (or rather, logically defensible) related to xml:base in DAS2 documents. 1) don't us it at all 2) only have it in the root element of the document 3) have it anywhere in the document (this is the old programming dictum of "the only limits should be 0, 1 and infinity") Pros and cons: #1 is the least confusing. Given relative URL, use the document's url to make it absolute, etc. as per URI spec. #2 This is similar to the restrictions in the BASE element in the HTML header. (Which I've only used once.) It's used most often in saved documents so relative URLs work without needing to rewrite the rest of the document. Take your DOM, and stick the URL in the root node if "xml:base" is not present, otherwise do root.attrib["xml:base"] <-- urljoin(document_url, root.attrib["xml:base"]) #3 This is the most complicated. 
The main use case mentioned was support for xinclude, which is not something anyone here has said they need. For all I know it may be useful XSLT and other languages. I don't know the XML toolchain well enough. Here is another use case. Consider a registration / aggregation service. It could work by fully parsing everything from each client and making absolute URIs for everything. Or it could do ... That is, it reads the sources document and pulls the SOURCE elements out of the XML. It sticks in the right xml:base (perhaps with a set of joins from the parent elements in the document) and serves the result. No need to parse further. Here's another. Consider a meta-feature server which sucked in primary records from multiple other servers (with permission). It might provide better search capabilities, better ranking, whatever. The features are unchanged. The server wants to return the results as it got them from the original server. Without xml:base it needs to convert all relative URLs into absolute ones ... ... ... ... which requires the server know about all field which are URLs. This precludes support for any extensions which include URL fields because the meta-server won't know about them. OTOH, with xml:base ... ... ... ... and any embedded extensions work w/o problems. Hence I'm fer numb'r 3. Andrew dalke at dalkescientific.com From aloraine at gmail.com Tue Aug 15 11:08:43 2006 From: aloraine at gmail.com (Ann Loraine) Date: Tue, 15 Aug 2006 08:08:43 -0700 Subject: [DAS2] Fwd: can't view XML from DAS2 server in IE4 or Safari In-Reply-To: <1f5d9f7aa32f2cdd91e60a3867e58037@dalkescientific.com> References: <1f5d9f7aa32f2cdd91e60a3867e58037@dalkescientific.com> Message-ID: <83722dde0608150808m15cf3b15g894009ab0ac5fde@mail.gmail.com> Hi Andrew, This sounds great to me! Being able to use my Web browser to show people DAS XML after typing in a URL (teaching) and also to see it myself as I familiarize myself with the URL-building conventions (coding) is a huge plus. It really gets the point across in an accesible and dramatic way. A lot of us started to "get" programming after having friends or colleagues show us the HTML coding underlying Web pages using the "view source" function of Netscape Navigator. I think being able to see the XML beautifully rendered in a browser can have the same sort of function for a lot of people and will help them understand the concept of structured data, the meaning of machine-readable, and good stuff like that. Cheers, Ann On 8/15/06, Andrew Dalke wrote: > Oops! Hit "reply-to" instead of "reply-all". > > Begin forwarded message: > > > From: Andrew Dalke > > Date: August 15, 2006 5:26:03 AM GMT+02:00 > > To: "Ann Loraine" > > Subject: Re: can't view XML from DAS2 server in IE4 or Safari > > > >> I'm trying to view the XML delivered from the DAS2 server in Firefox > >> or IE4 without having to save it and then load it. > >> > >> I think this is something to do with the fact that the XML is > >> delivered as type application versus XML plain text, which is what the > >> DAS1 servers seem to do. > > > > Yes. It's a 4 year old bug in Mozilla. > > https://bugzilla.mozilla.org/show_bug.cgi?id=155730 > > > > > >> Is there a way I can tell Firefox to render the XML directly without > >> my having to save it first? > > > > We've run into this before. I want a way to make this be less > > of a problem. > > > > I propose that if "text/xml" is in the Accept header then the > > server should return the das2xml document but with a "text/xml" > > content-type. 
> > > > I tested that out on my copy of Firefox and it was a happy camper. > > It showed the XML tree, though it did complain about the lack > > of a stylesheet. Okay, perhaps it was more feeling okay than happy.. > > > > Of course another possibility is to see the "text/html" there > > and show something more presentable to humans, but that makes things > > worse for those like Ann who want to see the XML structure. > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From dalke at dalkescientific.com Tue Aug 15 12:49:21 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Aug 2006 18:49:21 +0200 Subject: [DAS2] global reference identifiers Message-ID: <61a3a74f23f83cf89a05055e0bc7e0a7@dalkescientific.com> http://open-bio.org/wiki/DAS:GlobalSeqIDs Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Tue Aug 15 14:04:58 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Aug 2006 20:04:58 +0200 Subject: [DAS2] global reference identifiers In-Reply-To: <61a3a74f23f83cf89a05055e0bc7e0a7@dalkescientific.com> References: <61a3a74f23f83cf89a05055e0bc7e0a7@dalkescientific.com> Message-ID: <409ce859211622e5781c58db5b014da9@dalkescientific.com> > D.melanogaster, C.elegans, and C.briggsae are here, but no > S.cerevisiae, > R.norvegicus, M.musculus, or H.sapiens. > -Allen It's a wiki - feel free to add new ones! :) Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Tue Aug 15 14:57:19 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Tue, 15 Aug 2006 11:57:19 -0700 Subject: [DAS2] Fwd: can't view XML from DAS2 server in IE4 or Safari In-Reply-To: <83722dde0608150808m15cf3b15g894009ab0ac5fde@mail.gmail.com> Message-ID: I agree it is very useful to view XML documents, even though das xml is intended for applications. For whatever reason, some humans (myself included) seem to have a fascination with XML and like to view it, so it makes sense to provide for this. As for viewing das2xml data directly by clicking on das2 server links in Firefox, I have no problem. When you first click on a link returning das2xml formatted data (mime type=application/x-das-*+xml), Firefox should provide a dialog box asking what you want to do with it. Click "open with" and select Firefox itself. Do this for each of the types of das documents and you'll be set. Btw, there are a bunch of different das2xml links available here for testing: http://netaffxdas.affymetrix.com/das2/ If you have already specified that Firefox should save the das2xml data to disk, you should be able to change your preference by going to Preferences -> Downloads -> View & Edit Actions... (this is on OS X with Firefox 1.5.0.6. I don't see any entries for application/x-das* entries in mine; not sure why not, but it's working now, so I don't worry). According to the following article, Firefox will use its default xml handler for any mime type matching application/*+xml (see 'Types of XML' on this page): http://www-128.ibm.com/developerworks/xml/library/x-ffox2/index.html While we're on the subject, there's another recent article in this series on manipulating XML with javascript in Firefox. 
Might be interesting to try some of these ideas with das2xml data: http://www-128.ibm.com/developerworks/library/x-ffox3/ Steve > From: Ann Loraine > Date: Tue, 15 Aug 2006 08:08:43 -0700 > To: Andrew Dalke > Cc: DAS/2 > Subject: Re: [DAS2] Fwd: can't view XML from DAS2 server in IE4 or Safari > > Hi Andrew, > > This sounds great to me! > > Being able to use my Web browser to show people DAS XML after typing > in a URL (teaching) and also to see it myself as I familiarize myself with > the URL-building conventions (coding) is a huge plus. It really gets > the point across in an accesible and dramatic way. > > A lot of us started to "get" programming after having friends or > colleagues show us the HTML coding underlying Web pages using the > "view source" function of Netscape Navigator. I think being able to > see the XML beautifully rendered in a browser can have the same sort > of function for a lot of people and will help them understand the > concept of structured data, the meaning of machine-readable, and good > stuff like that. > > Cheers, > > Ann > > On 8/15/06, Andrew Dalke wrote: >> Oops! Hit "reply-to" instead of "reply-all". >> >> Begin forwarded message: >> >>> From: Andrew Dalke >>> Date: August 15, 2006 5:26:03 AM GMT+02:00 >>> To: "Ann Loraine" >>> Subject: Re: can't view XML from DAS2 server in IE4 or Safari >>> >>>> I'm trying to view the XML delivered from the DAS2 server in Firefox >>>> or IE4 without having to save it and then load it. >>>> >>>> I think this is something to do with the fact that the XML is >>>> delivered as type application versus XML plain text, which is what the >>>> DAS1 servers seem to do. >>> >>> Yes. It's a 4 year old bug in Mozilla. >>> https://bugzilla.mozilla.org/show_bug.cgi?id=155730 >>> >>> >>>> Is there a way I can tell Firefox to render the XML directly without >>>> my having to save it first? >>> >>> We've run into this before. I want a way to make this be less >>> of a problem. >>> >>> I propose that if "text/xml" is in the Accept header then the >>> server should return the das2xml document but with a "text/xml" >>> content-type. >>> >>> I tested that out on my copy of Firefox and it was a happy camper. >>> It showed the XML tree, though it did complain about the lack >>> of a stylesheet. Okay, perhaps it was more feeling okay than happy.. >>> >>> Of course another possibility is to see the "text/html" there >>> and show something more presentable to humans, but that makes things >>> worse for those like Ann who want to see the XML structure. 
>> >> Andrew >> dalke at dalkescientific.com >> >> _______________________________________________ >> DAS2 mailing list >> DAS2 at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/das2 >> > > > -- > Ann Loraine > Assistant Professor > Section on Statistical Genetics > University of Alabama at Birmingham > http://www.ssg.uab.edu > http://www.transvar.org > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 From Steve_Chervitz at affymetrix.com Tue Aug 15 15:11:33 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Tue, 15 Aug 2006 12:11:33 -0700 Subject: [DAS2] Notes from DAS/2 code sprint #3, day two, 15 Aug 2006 Message-ID: Notes from DAS/2 code sprint #3, day two, 15 Aug 2006 $Id: das2-teleconf-2006-08-15.txt,v 1.1 2006/08/15 19:10:02 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt CSHL: Lincoln Stein, Scott Cain Dalke Scientific: Andrew Dalke UCLA: Allen Day, Brian O'Connor Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Topic: Spec updates ------------------- ad: made changes to the writeback spec. nothing serious, stuff we talked about. removed possibility of writeback for types, updated docs. returns back a features document. feature element contains old_uri to refer to previous uri if it changed. Not for a response document. gh: can we freeze it at this point? Like the idea of reusing the feature xml. Hoping to call it frozen for rest of code sprint. ad: do we allow relative urls inside the writeback doc. relative to what? gh: xml:base applies ad: if url is still relative once you get to the top of the document, what happens? gh: free to throw an error ad: so 'application defined'. seems ok. gh: can uri's be local when curation is created on client, you're making up your own id. fully resolvable. ad: it is das_private uri, not a relative uri, no resolvability requirements. aday: order of operations issue with insertion and deletions for features with same id. do a delete-insert or insert-delete? does delete get processed before insert? ad: all deletes go first. aday: are all features required to be processed from top to bottom as well? ad: doesn't specify. aday: natural ordering in the document for feature processing. on creation of a new feature. if it has a das_private feature that is declared in the doc which hasn't been seen before. will cause problems. ad: pref aday: require features to be declared in order so that everything declared below refers to things declared above. ad: not possible for new features. aday: where is type writeback going to go? ad: not to be supported. could use a separate document. gh: fine with not dealing with types now. let's get feature writeback going first. aday: would like to make it extensible. to see how you could create a types writeback. gh: separate document. 
aday: so writeback for types is a element enclosed in a writeback element. gh: any other issues with writeback spec yesterday? many conversations here after the teleconf. the order of operations thing, and the need to freeze ASAP. ad: b gilman's use of VERSION in two diff places. see my email from yesterday. I proposed using 'release' than 'versioned_source'. too late now to change the versioned element. gh: change name of att Topic: Versioned source -> release --------------------------------- See andrew's email from yesterday. aday: has a working server. will send out url out today, after incorporating latest developments. returns a mapping document. gh: will clean up curation stuff today. figure out how to swap ids out. this is an igb internal release. Topic: Microdeltas ------------------ ad: microdeltas: take the delta of the document we have now, break it up into lots of parts. no big two-hour curation, but server tracks changes as they occur. this way you can track reasons for each change. gh: so curator should push 'save to server' button each time they make an edit. this is up to client to impose this. you have a comment element in the writeback. ad: there is a distinction between changes that computer made vs. human comment - reason why they did a whole set of changes. not sure the reason the resolution. gh: microdeltas might be getting a little more complicated for what we're trying to do. Topic: Coordinates in read spec -------------------------------- gh: questions regarding read. Is allen serving up coordinate stuff? aday: segment coordinate uri? gh: the thing we're supposed to be using to decide whether annotations from two servers are on the same coord system. if uri's for two different versioned_sources match, assume they're the same coord system. lincoln set up names for genomes. gh: haven't implemented part on client that makes use of it. currently using a hard-wired way. ad: on open-bio.org site. wiki. gh: writable nature of server is supposed to be in capabilities section. OK you've got in right place. my bad. gh: locking, not worked on. aday: exclusive lock on table to be modified. other clients wanting to write cannot get it. so it's under the hood, no special reponse. ad: how do we indicate a server supports writeback? I wanted an extension tag, not attribute. haven't looked at recently. gh: can't remember. can a versioned_source have... If a versioned source is writable, can any data on that be editable? yes. ad: why does it make a difference. gh: concerned whether there are certain types of annots that should not be writable, level of distinction (granularity). either you can edit any annotations on that versioned source, or none of them. gh: eg. blast results vs human-made curations. can't edit blast results. ad: I don't thing a single bit flag is good enough. gh: per type? ad: not sure. gh: ok as is. you can have multiple servers, some holding mutable data some holding immutable data. ad: I support writing for some people, some time. user is in charge of figuring out which types on which servers can be changed. gh: client has to be smart -- ie., try to edit then undo it then tell user they can edit. or allow user to edit stuff and find out at commit time if editing is ok (possibly not). ad: ideally would like a way to figure out from server what you can and cannot do on a given versioned source. gh: let's not get into that now. that is the simplest way to go w/r/t to the spec. 
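A rough sketch of the client-side check implied above -- look for a
writeback capability under the versioned source before letting the user
edit. The CAPABILITY element name, the "writeback" type value, and the
"query_uri" attribute are guesses based on this discussion, not quotes
from the spec:

  import urllib2
  from xml.etree import ElementTree   # Python 2.5; use the elementtree package on 2.4

  def writeback_uri(versioned_source_url):
      """Return the writeback URI advertised by a versioned source, or None.

      Assumes the capability is announced as an element whose tag ends in
      'CAPABILITY', with type='writeback' and a 'query_uri' attribute;
      both names are assumptions, not normative.
      """
      doc = ElementTree.parse(urllib2.urlopen(versioned_source_url))
      for elem in doc.getiterator():
          if elem.tag.endswith("CAPABILITY") and elem.get("type") == "writeback":
              return elem.get("query_uri")
      return None

  # A client could call this when it loads a versioned source, e.g.
  #   wb = writeback_uri("http://das.biopackages.net/das/genome/human/17")
  # and simply disable editing when it returns None.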
Topic: Viewing das2xml responses in web browser ----------------------------------------------- See Ann Loraine's email on list about trouble of looking at das2 responses via IE4 and Safari. ee: needs text/xml in order to see it in browsers. ad: viewing xml documents is an extension of das, which was intended for computer communication. aday: some problems with javascript/AJAX making it unusable. must have content-type as text/xml. ad: javascript talking to server can specify what format it wants it back. there's a firefox bug in the '+xml' specification. gh: we are telling it xml, it's aday: there are real clients out there that cannot deal with the advance http headers we are using. ad: format= in query parameter gh: format=xml then content-type in header should be text/xml? ad: not in the spec now. you specify das2xml and get back application/.... bo: could have proxy code that sits in between client and server and convers to text/xml ad: default for web browsers. server could decide to support ajax by allowing format=json. aday: gh: need to say that servers have the option to provide content-type=text/xml if format=xml. we are compliant to content-header spec, some ajax implementations don't handle it properly. ad: if client makes request and string text/xml appears in the accepts header, then server should be free to give back regular das2xml response document but as content-type text/xml? by 'free', meaning not required. gh: some libraries are not compliant with http header content type spec. if servers supports that, then they can return different content types. ad: what is recommendation for this case? aday: for firefox and javascript clients. sc: I have had no trouble with firefox on os x. I can try to troubleshoot Ann's set up. Topic: Dasypus online validation tool -------------------------------------- bo: dasypus validation tool is it up to date? ad: server is down since it hasn't been used for a while. should be up to date. [A] andrew will bring dasypus online validator online. Status Reports --------------- bo: bugfixes on das.biopackages.net server. gh: write back curations, id resolution on client side, igb release today. aday: update/edit/delete, changing response type today ad: relaxNG, getting dasypus server back up, my own das server. ee: getting igb release out today. gff3 parser. sc: working with gregg's new Bprobe1Parser to create new versions of exon array data files, more memory efficient. Will send to gregg for testing. Also updating list of available data on the affy das servers. From allenday at ucla.edu Tue Aug 15 21:15:07 2006 From: allenday at ucla.edu (Allen Day) Date: Tue, 15 Aug 2006 18:15:07 -0700 Subject: [DAS2] xml:base and XML::DOM::XML_Base Message-ID: <5c24dcc30608151815t17a13144t54ff11407b94f397@mail.gmail.com> Lincoln, I needed an xml:base resolution module for my writeback code, and there wasn't one available for any of the lightweight XML libs on CPAN, so I wrote an XML::DOM extension. Feel free to use it if you have not already finished your implementation, it should be on CPAN within the next day or so, I just uploaded it. -Allen From dalke at dalkescientific.com Tue Aug 15 21:29:40 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 16 Aug 2006 03:29:40 +0200 Subject: [DAS2] Fwd: can't view XML from DAS2 server in IE4 or Safari In-Reply-To: References: Message-ID: Steve: > As for viewing das2xml data directly by clicking on das2 server links > in > Firefox, I have no problem. 
> When you first click on a link returning das2xml formatted data (mime
> type=application/x-das-*+xml), Firefox should provide a dialog box
> asking what you want to do with it. Click "open with" and select
> Firefox itself.

I never would have thought of that. A-ha.

It works, but it works by downloading the file, saving it to a temp.xml
file and then doing the equivalent of "Firefox tmp.xml", which opens a
new window on a Mac. It doesn't open it in the current window as I would
like, and the temp file persists in my download directory.

I experimented with content negotiation, where the client may send an
Accept header to the server with the desired content types. My server
supports "text/plain" (fasta), "text/xml", and
"application/x-das2segments+xml". Examples below.

I did this because I want the documentation to say

   "If the format parameter is not specified in the query string then
   the server may use HTTP content negotiation to determine the most
   appropriate representation. If multiple representations match then
   the das2xml version should be returned, if allowed."

and leave it at that. This includes the "if 'text/xml' exists in the
Accept field ..." solution we talked about earlier. "An Annotated Guide
to the DAS spec" can then explain what that means and why it's done.

The Apache content negotiation strategy is at
   http://httpd.apache.org/docs/1.3/content-negotiation.html

Using that scheme, the following describes possible variants for a DAS
service URI:

   features
      format: das2xml   Content-type: application/x-das2features+xml; qs=1.0
      format: xml       Content-type: text/xml; qs=0.95
      format: fasta     Content-type: text/plain; qs=0.95

where "qs" means "quality of service". Apache ranks solutions so "q*qs"
is largest. Firefox sends

   ACCEPT: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5

This orders the results as:

   xml     = 0.95*0.9 -> 0.855
   fasta   = 0.95*0.8 -> 0.76
   das2xml = 1.0*0.5  -> 0.5

while Python's url fetcher does not send an Accept header and curl sends
"*/*". Both of these would cause "das2xml" to be returned over other
formats.

In other words, the "send 'text/xml' if the client asks for it else send
'application/x-das2*+xml'" approach is an acceptable way to do conneg.

Here's what my test reference server does under different conditions.
## ask for text/plain, which returns FASTA
% curl -H "Accept: text/plain" -i http://localhost:8080/seq/fly_v1
HTTP/1.1 200 OK
Date: Wed, 16 Aug 2006 00:25:40 GMT
Server: CherryPy/2.2.1
Content-Length: 48
Content-Type: text/plain
Connection: close

>Chr1
ABCDEFG
>Chr2
abcdefgh
>Chr3
987654321

## ask for text/xml, which returns the normal XML as "text/xml"
% curl -H "Accept: text/xml" -i http://localhost:8080/seq/fly_v1
HTTP/1.1 200 OK
Date: Wed, 16 Aug 2006 00:26:02 GMT
Server: CherryPy/2.2.1
Content-Length: 435
Content-Type: text/xml
Connection: close

## ask for anything under the "application" namespace, with a needless quality factor
% curl -H "Accept: application/*;q=0.5" -i http://localhost:8080/seq/fly_v1
HTTP/1.1 200 OK
Date: Wed, 16 Aug 2006 00:28:13 GMT
Server: CherryPy/2.2.1
Content-Length: 435
Content-Type: application/x-das2segments+xml
Connection: close

## give an image if it's there, text/plain is next best, then an application
% curl -H "Accept: image/*, application/*;q=0.5, text/plain;q=0.9" -i http://localhost:8080/seq/fly_v1
HTTP/1.1 200 OK
Date: Wed, 16 Aug 2006 00:34:15 GMT
Server: CherryPy/2.2.1
Content-Length: 48
Content-Type: text/plain
Connection: close

>Chr1
ABCDEFG
>Chr2
abcdefgh
>Chr3
987654321

In my case the server has multiple text/plain outputs but FASTA always
wins over raw. I can force any format with the "format=" option, which
ignores the "Accept" header completely.

% curl -H "Accept: text/xml" -i 'http://localhost:8080/seq/fly_v1/1?format=raw'
HTTP/1.1 200 OK
Date: Wed, 16 Aug 2006 00:35:54 GMT
Server: CherryPy/2.2.1
Content-Length: 7
Content-Type: text/plain
Connection: close

ABCDEFG

Andrew
dalke at dalkescientific.com

From Steve_Chervitz at affymetrix.com  Wed Aug 16 13:16:47 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Wed, 16 Aug 2006 10:16:47 -0700
Subject: [DAS2] Notes from DAS/2 code sprint #3, day three, 16 Aug 2006
Message-ID:

Notes from DAS/2 code sprint #3, day three, 16 Aug 2006

$Id: das2-teleconf-2006-08-16.txt,v 1.1 2006/08/16 17:05:24 sac Exp $

Note taker: Steve Chervitz

Attendees:
Affy: Steve Chervitz, Ed E., Gregg Helt
Dalke Scientific: Andrew Dalke
UCLA: Allen Day, Brian O'Connor

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.

Topic: Spec Q&A
---------------

bo: perusing spec, saw mention of XID as a filter. can I get more
explanation?
ad: can't remember without looking at docs, but think I was not sure
what XID was supposed to be, lincoln sent email to clarify.
aday: an external db id trying to resolve into local space, eg., for
gene das.
ad: don't think there was enough info there to be useful.
gh: just uri and nothing else?
ad: looking at steve's notes from 16 march. looks like we deferred it.

gh: input was minimal. I have no particular use for it.
bo: need to know what support to provide for the biopackages server.
in the read spec, says "it's not well thought-out. should have
authority, type, id, description."

bo: type vs.
exact type gh: did we get rid of exact type? ad: see gregg's email from 16 march: http://lists.open-bio.org/pipermail/das2/2006-March/000655.html The assumption was, there's no type inferencing done on the server. it's just done on the client. we were to rename 'exacttype' to 'type' and use exacttype semantics for it. gh: there is no parent-child structure to types. there is to ontology though. ad: type records in das aren't parent-child relations because they combine other info about type, e.g., ways to depict it. bo: looking for places where our server disagreed with spec. segments feature filter is not supported on our end. overlaps segments. but this is just work we need to do, not a spec issue. gh: allen and lincoln were struggling with xml:base resolution yesterday, looking through the xml:base spec, dealing with edges. are you satisfied? aday: yes gh: for implementes that don't already deal with xml:base resoultion, it may take a day or so to deal with it. nomi and I struggled as well. I was suprised it is not so supported in xml libraries. ad: just a matter of walking up the xml tree. gh: recursively had to verify that the resolve stuff in the java networking libraries actually worked according to the xml:base spec. but we've moved through this. bo: url example, uses 'segment' and 'sequence'. not so consistent. gh: pros and cons to this. it shows that das/2 links can be built using different uris. ad: used different url structures to show that this was possible. bo: confusing when you only see a snippet and don't see where the uri was coming from. showing variety is useful though. gh: are both specs frozen now? ad: yes. Topic: Status Reports ----------------------- bo: went through spec. updated our bug queue. added bug re: passing in id filters vs. uris. working on this today. aday: need to resolve type ids, need to deal with relative ids given in the document. now can go back to working on writeback. gh speaking for lincoln: perl stuff for gbrowser to connect to das servers. went through xml:base abyss. updated uris for sequence and genome version ids for human and mouse on the wiki page: http://open-bio.org/wiki/DAS:GlobalSeqIDs sc: should we allow anyone to edit this, of just lincoln? gh: would like to restrict it. worried about wiki graffiti. ad: you have to register. we can always back things out. sc: lincoln will get notification upon any edits. gh: ok. gh: working on igb release. adding parsing abilities. can now focus on das/2, mostly writeback stuff, refining that in igb client. ee: finishing up bugfixes before igb release. will start on gff3 parser today. ad: looked into content negotiation stuff. why validator server on open-bio site isn't working: I updated underlying webserver framework. working on that. sc: worked on creating new data files used by the affy das server for exon arrays using gregg's new parser. gh: this is generating more efficient versions of probe sets for exon arrays. important since the affy das server is in-memory. sc: this will help us support more arrays in the das server and also move away from having to maintain two different das servers, so we can focus on just the das/2 server. sc: also working on final touches on web page describing available data on our das servers. gh: we can modify xml from the server to point at that page as an info url. sources element has info url, and sub elements as well, but we can just put the info page at the top level. 
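Going back to the xml:base exchange above ("just a matter of walking up
the xml tree"), here is a minimal sketch of that resolution step in
Python/ElementTree. The ancestry bookkeeping is the caller's job since
ElementTree keeps no parent pointers, and the behaviour when a URI is
still relative at the root follows the earlier "free to throw an error"
comment; this is illustrative, not part of the spec:

  import urlparse
  from xml.etree import ElementTree   # Python 2.5; use the elementtree package on 2.4

  XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

  def resolve(uri, ancestry, document_url=None):
      """Resolve 'uri' against every xml:base attribute in scope.

      'ancestry' is the chain of elements from the document root down to
      the element that carried 'uri'; the caller tracks that chain while
      walking the parsed tree.
      """
      base = document_url
      for elem in ancestry:
          xb = elem.get(XML_BASE)
          if xb is not None:
              if base is None:
                  base = xb
              else:
                  base = urlparse.urljoin(base, xb)
      if base is None:
          # Still relative at the root and no retrieval URL to fall back
          # on; the application is free to treat this as an error.
          return uri
      return urlparse.urljoin(base, uri)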
sc: also was working on ann's fly data project, where she needs to pull genomic regions relative to probe sets. we need to update our das alignment file (link.psl) to be based on dm2. gh: we don't provide residues. she'll have to do a das/1 query at ucsc to get residues. From dalke at dalkescientific.com Wed Aug 16 15:17:04 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 16 Aug 2006 21:17:04 +0200 Subject: [DAS2] validator working again Message-ID: Silly me, I went and upgraded the TurboGears package used by the validator. From 0.8* to 0.9*. There were differences, and one (a unicode encoding problem) quite subtle. The validator is up and running. http://cgi.biodas.org:8080/ Let me know of any problems. Andrew dalke at dalkescientific.com From allenday at ucla.edu Wed Aug 16 16:35:04 2006 From: allenday at ucla.edu (Allen Day) Date: Wed, 16 Aug 2006 13:35:04 -0700 Subject: [DAS2] new writeback URI Message-ID: <5c24dcc30608161335n267201a7w1ef5221ceb9fcdc5@mail.gmail.com> Hi, You can POST writebacks for the http://das.biopackages.net/das/genome/human/writeback/ vsource here: http://genomics.ctrl.ucla.edu/~allenday/cgi-bin/das2xml-parser/stable2.pl The returned document will either be an element, or a element, depending on what was POSTed. I will update the relevant sections in the main sources/source/vsource docs on the biopackages server. I will send another email when the response document is up-to-date with the latest specification revisions -- I'm under the impression I just have to return das2xml for all updated and created features instead of returning the previously specified element. -Allen From dalke at dalkescientific.com Thu Aug 17 08:14:59 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 17 Aug 2006 14:14:59 +0200 Subject: [DAS2] content-negotiation, conclusion Message-ID: <20680849e0825c09bdd56f215da74f1a@dalkescientific.com> After experimenting with a content-negotation implementation and trying it out under different circumstances I've come to the conclusion that the errors are too subtle and hard to debug in the generic case. Quoting from http://norman.walsh.name/2003/07/02/conneg > At this point, we're about eleven levels farther down in the web > architecture than any mortal should have to tread. On the one hand, > content negotiation offers a transparent solution to a tricky problem. > On the other hand, the very transparency of such solutions makes them > devilishly hard to understand when they stop working. Even for the limited case of DAS2 where we want web browsers to see "text/xml" instead of "application/x-das*+xml" it's just not possible. It turns out Safari only uses "*/*" in the Accept header. I do not want a system which gives different results when viewed in different browsers. Ann? How about this solution to your case - we'll have a "xml" format defined as being the same as "das2xml" but returning a "text/xml" header. Or perhaps a "html" format designed for people. When you are showing people how DAS works, and if the browser doesn't understand the */*+xml content type as being in XML, then you can say "oh, add 'format=html' to the URL to see it in HTML". The spec will look like: If the format is not specified in the query string then the server must return the document in das2xml format (or fasta format for segment records) unless the client sends an Accepts header with a mime-type starting "application/x-das-". In that case the server may implement HTTP content-negotiation. 
HTTP content-negotiation is an experimental feature in DAS2 and is not required in the client nor the server. Structured this way there's no way a generic browser can trigger conneg with a das2 server. Only das-aware clients can do it. This gives room for future experimentation. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Aug 17 08:29:19 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 17 Aug 2006 14:29:19 +0200 Subject: [DAS2] default format for a single segment Message-ID: <593e57347c2de14ed36df1ef69fd9f5c@dalkescientific.com> Two proposals here: 1) change the default format for a single segment request from FASTA -> das2xml 2) add optional elements to each segment == Proposal 1 === Currently every DAS2 service returns an application/x-das-*+xml document by default except for the segment document. A request for on a segment URI returns its FASTA sequence. I would like to change that. I would like the segment document by default to return a das-segment document. For example, if this is the segments document then doing the request for "segment/chrI" should return == Proposal 2 == My server implements a "raw" sequence format which contains only sequence data and does not even contain the FASTA header. The raw format only works for a single segment and not for the list of segments. In the current spec the "FORMAT" entry is somewhat ambiguous. Does it work for the set of segments or for a single given segment? That is, segments?format=das2xml --> the segments document for all of the segments segment/chrI?format=das2xml --> the segments document for a given segments segments?format=fasta --> all sequences, in FASTA format segment/chrI?format=das2xml --> the FASTA sequence for the given segment However, segments?format=raw makes no sense. No one will use that one for real. I propose that the SEGMENT elements also get an optional FORMAT element which looks like this The formats for a given segment are the union of its elements and those in the top-level. That is, each segment here implements "raw", "fasta" and "das2xml" formats. Andrew dalke at dalkescientific.com From lstein at cshl.edu Thu Aug 17 12:01:18 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 17 Aug 2006 12:01:18 -0400 Subject: [DAS2] Notes from DAS/2 code sprint #3, day three, 16 Aug 2006 In-Reply-To: References: Message-ID: <6dce9a0b0608170901t44c6e074q5ca24e5fd2cacc72@mail.gmail.com> What's the conference call number? Lincoln On 8/16/06, Steve Chervitz wrote: > > Notes from DAS/2 code sprint #3, day three, 16 Aug 2006 > > $Id: das2-teleconf-2006-08-16.txt,v 1.1 2006/08/16 17:05:24 sac Exp $ > > Note taker: Steve Chervitz > > Attendees: > Affy: Steve Chervitz, Ed E., Gregg Helt > Dalke Scientific: Andrew Dalke > UCLA: Allen Day, Brian O'Connor > > Action items are flagged with '[A]'. > > These notes are checked into the biodas.org CVS repository at > das/das2/notes/2006. Instructions on how to access this > repository are at http://biodas.org > > DISCLAIMER: > The note taker aims for completeness and accuracy, but these goals are > not always achievable, given the desire to get the notes out with a > rapid turnaround. So don't consider these notes as complete minutes > from the meeting, but rather abbreviated, summarized versions of what > was discussed. There may be errors of commission and omission. > Participants are welcome to post comments and/or corrections to these > as they see fit. 
> > > Topic: Spec Q&A > --------------- > > bo: perusing spec, saw mention of XID as a filter. can I get more > explanation? > ad: can't remember without looking at docs, but think I was not sure > what XID was supposed to be, lincoln sent email to clarify. > aday: an external db id trying to resolve into local space, eg., for gene > das. > ad: don't think there was enough info there to be useful. > gh: just uri and nothing else? > ad: looking at steve's notes from 16 march. looks like we deferred it. > > gh: input was minimal. I have no particular use for it. > bo: need to know what support to provide for the biopackages server. > in the read spec, says "it's not well though-out. should have > authority, type, id, description." > > bo: type vs. exact type > gh: did we get rid of exact type? > ad: see gregg's email from 16 march: > http://lists.open-bio.org/pipermail/das2/2006-March/000655.html > > The assumption was, there's no type inferencing done on the > server. it's just done on the client. we were to rename 'exacttype' to > 'type' and use exacttype semantics for it. > gh: there is no parent-child structure to types. there is to ontology > though. > ad: type records in das aren't parent-child relations because they > combine other info about type, e.g., ways to depict it. > > bo: looking for places where our server disagreed with spec. segments > feature filter is not supported on our end. overlaps segments. but > this is just work we need to do, not a spec issue. > > gh: allen and lincoln were struggling with xml:base resolution yesterday, > looking through the xml:base spec, dealing with edges. are you satisfied? > aday: yes > gh: for implementes that don't already deal with xml:base resoultion, > it may take a day or so to deal with it. nomi and I struggled as > well. I was suprised it is not so supported in xml libraries. > ad: just a matter of walking up the xml tree. > gh: recursively had to verify that the resolve stuff in the java > networking libraries actually worked according to the xml:base spec. > but we've moved through this. > > bo: url example, uses 'segment' and 'sequence'. not so consistent. > gh: pros and cons to this. it shows that das/2 links can be built > using different uris. > ad: used different url structures to show that this was possible. > bo: confusing when you only see a snippet and don't see where the uri > was coming from. showing variety is useful though. > > gh: are both specs frozen now? > ad: yes. > > > Topic: Status Reports > ----------------------- > > bo: went through spec. updated our bug queue. added bug re: passing in > id filters vs. uris. working on this today. > > aday: need to resolve type ids, need to deal with relative ids given > in the document. now can go back to working on writeback. > > gh speaking for lincoln: perl stuff for gbrowser to connect to das > servers. went through xml:base abyss. > updated uris for sequence and genome version ids for human and mouse > on the wiki page: http://open-bio.org/wiki/DAS:GlobalSeqIDs > > sc: should we allow anyone to edit this, of just lincoln? > gh: would like to restrict it. worried about wiki graffiti. > ad: you have to register. we can always back things out. > sc: lincoln will get notification upon any edits. > gh: ok. > > gh: working on igb release. adding parsing abilities. can now focus on > das/2, mostly writeback stuff, refining that in igb client. > > ee: finishing up bugfixes before igb release. will start on gff3 > parser today. > > ad: looked into content negotiation stuff. 
why validator server on > open-bio site isn't working: I updated underlying webserver > framework. working on that. > > sc: worked on creating new data files used by the affy das server for > exon arrays using gregg's new parser. > gh: this is generating more efficient versions of probe sets for exon > arrays. important since the affy das server is in-memory. > sc: this will help us support more arrays in the das server and also > move away from having to maintain two different das servers, so we can > focus on just the das/2 server. > > sc: also working on final touches on web page describing available data on > our das servers. > gh: we can modify xml from the server to point at that page as an info > url. sources element has info url, and sub elements as well, but we > can just put the info page at the top level. > > sc: also was working on ann's fly data project, where she needs to > pull genomic regions relative to probe sets. we need to update our > das alignment file (link.psl) to be based on dm2. > gh: we don't provide residues. she'll have to do a das/1 query at ucsc > to get residues. > > > > > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Thu Aug 17 11:59:47 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 17 Aug 2006 11:59:47 -0400 Subject: [DAS2] xml:base on biopackages still not quite right In-Reply-To: References: Message-ID: <6dce9a0b0608170859o7d22ef3cnc6cacf4579a7e305@mail.gmail.com> Hi, I'm getting an incorrect xml:base on the segments request: % GET http://das.biopackages.net/das/genome/human/17/segment ... The problem is that the xml:base ends with a slash, so the synthesized URIs are http://das.biopackages.net/das/genome/human/17/segment/segment/chr1 Lincoln -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From Gregg_Helt at affymetrix.com Thu Aug 17 14:39:40 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 17 Aug 2006 11:39:40 -0700 Subject: [DAS2] DAS/2 writeback capability vs. writeable attribute Message-ID: In the current writeback spec, the ability of a server to support writeback is indicated by: under the versioned source element. However, the retrieval spec talks about both the writeback capability element and a "writeable" attribute for the versioned source element. I think the "writeable" attribute can be removed, since the capability provides all the needed information. The current writeback spec doesn't mention this "writeable" element at all. gregg From dalke at dalkescientific.com Thu Aug 17 15:26:20 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 17 Aug 2006 21:26:20 +0200 Subject: [DAS2] DAS/2 writeback capability vs. 
writeable attribute In-Reply-To: References: Message-ID: gregg: > However, the retrieval spec talks about both the writeback capability > element and a "writeable" attribute for the versioned source element. > I > think the "writeable" attribute can be removed, since the capability > provides all the needed information. The current writeback spec > doesn't > mention this "writeable" element at all. This was up for debate during the last sprint and we decided to keep things as they were until we got to writeback. Which is now. :) I agree with you. Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Thu Aug 17 18:18:21 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Thu, 17 Aug 2006 15:18:21 -0700 Subject: [DAS2] Notes from DAS/2 code sprint #3, day four, 17 Aug 2006 Message-ID: Notes from DAS/2 code sprint #3, day four, 17 Aug 2006 $Id: das2-teleconf-2006-08-17.txt,v 1.1 2006/08/17 22:15:30 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt CHSL: Lincoln Stein Dalke Scientific: Andrew Dalke UCLA: Allen Day, Brian O'Connor Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Topic: Status Reports ---------------------- ls: Perl interface in good shape. reorg'd to get parser based on content type dynamically. response comes in, figures out what parser to use, returns the objects, should be extensible for other formats. main task todo is to implement the feature object so that I can actually return features. now parser is there, object is not. Not a Bio::SeqFeatureI object, in order to work with gbrowse and other parts of bioperl. some issues with biopackages with xml:base, sometimes slashes there that shouldn't be and vice versa. segments request has extraneous / at end, so it has 'segments' repeated twice. didn't try to fetch to see if would work, but looks like a bug. gh: regarding parent-child relationships between features: if they have parent, need to point to it, if they have children need to point to them. ls: parsing with sax, I'll know when an object is complete. will create a feature stream and start returning features as the parse is coming across. threaded, so you can have multiple streams going simultaneously. gh: more issues with parent child hierarchy. will wait for allen to arrive before discussing. Topic: Spec issues ------------------ ad: working on content negotiation, but now is not right time to do it. in sequence doc, default doc should be das2-segments. sc: xml:base issue -- where do we allow it (0, 1, infinity)? gh: our policy is that we follow the xml:base spec. ad: if you use it, use it everywhere. gh: my parser is looking for it where it everywhere. ad: my email explains why you might want to use it on multiple features. eg., combining data from different servers. sc: what about brian gilman's issue, when you get to root what if xml:base is still relative? 
ad: uri spec defines how to define relative urls, e.g., get it from document. gh: relaxNG says it can be anywhere. I think it should therefore be allowed anywhere. ad: right now all services returns an xml object file except segment request -- fasta file. would like to return xml. sc: this is along the lines of what I proposed a while back. I like it. See discussion under this thread: http://lists.open-bio.org/pipermail/das2/2005-December/000395.html ad: formats per-segment basis. current scheme only defines per-everything basis. propose have each segment also has it's own format. each segment can have alt formats. (see ad's email from today on this topic). gh: like it. it means that a server doesn't have to know about all residues. ad: for case of reference server, we guarantee that it supports fasta sequence. affects other servers, not just reference server. gh: I like that flexibility. any objections? [silence] gh: if you return the segments doc we now have, you are only serving up xml. if you want to return fasta, you need to return a format element. ls: is there a way for client to determine what it will get? gh: in the segments document, returned back from reference server. client can specify format defined there. ls: not impl yet, just a proposal? gh: yes. another plus is the ability to specify more efficient binary formats too. Topic: Ann's issue on content-type ---------------------------------- gh: server has option to specify that you can return things as text/xml, but still send das2xml format. ad: content negotiation doesn't work to allow the browser to view XML. only works for clients that can do content neg, not general clients (e.g., safari). I tried two different browsers, got two different results. [A] Ask Ann Loraine if this solution is sufficient. Topic: Writeback issues ------------------------- aday: problem writeback. creating new feat or update existing feat. if it's a new feature, das_private uri scheme has no info about source or versioned source that the feature is intended to be written to. This is not necessarily a problem, could be a different uri post. But it is a problem when parsing and it's possible for parents or children to be attached to the feat and they are not the source/vsource combination. make sense? ad: every feat has unique id. could do it by saying when you see this id, it corresponds to this segment or this versioned source. ad: feature comes from NCBI but is being posted to affymetrix. gh: I talked about this as a use case for the grant. Example: snps being served by an authority (dbSNP) and people are trying to create their own haplotype blocking structure. you want them to be able to point to the authority for the leaf features (snps, children). so you can have one server serving up haplotype blocks, and points to snps that reside on another server that is the authority. right now in the spec, can't do that because of the bidirectional parent-child stuff. you'd have to point the snps at the authority to the new stuff. ad: could have parent-child relationships that are incorrect. all parents connected together are places you can get to. has to be a single root. gh: due to that and the bidirectional stuff, we can't support my use case, also can't build features from multiple servers to construct curations. ad: can do it in datamodel. I point to features over there. gh: in xml it can't be done. ad: also means that, you have to keep requesting features over and over again. you have to do at least one request for every feat. 
gh: even if we have these restrictions, how can we enforce them with das-private id. aday: the document is not enough to tell you if the parent being associated with a feature is valid. you have to know more. aday: it's only these das-private ids that are a problem, you cannot know where it came from or where it's to be written to. the child-parent pointers are not a problem. gh: post to a writeable das server with das-private id, it means the feature is to be written on that server. aday: new document comes in, you don't know where to write them to. gh: which writeable server are they to be written to. ad: there will be a different distinct url. gh: client is aware of 5 different writeback servers, which one do I write to. this is a client issue. it should present options to the user and let them select. aday: what about creating a hybrid feature? gh: it's a totally new curation. ad: what if you want to have one writeback url for several dbs on the server? gh: i would say no. aday: you need to know what is the context of the write. gh: for server, it knows, for client. aday: so are we saying that the document does not need to be validatable when standalone (ie, outside the context of the server)? there is not enough information to know whether some features being grouped together should be. I upload this document to xxx, is it be loadable? gh: i dont' see that as an issue. we have validation issues with read document as well. the validators don't go into the uris of each feature and see if they come from same server. aday: if absolute, yes, but if all relative. as long as all relative, you can tell if compatible. gh: if you have document element was retrieved from, it's relative to that. if not, it's application-specific, which in our case means punting. validator can't guarantee that certain uri's are compatible. to do that, it would have to know how to resolve every uri, and they don't need to be url's. nobody knows how to resolve every uri. what that means is that the server will have to reject the post if it sees uris that it doesn't recognize them. aday: or, that it sc: how does server know if uri's are compatible? gh: for posts, those features have to be coming from that server aday: adding new exon to transcript that already exists in db, can I give you the new exon and pointer to transcript? get's into uri compatibility issue. I have exon whose parent I don't have access to (on remote server). could I do an external request on the parent, figure out it's location, close it, send xid to parent on remote server. ad: would say it's legal but you have to pass in the complete feature record. gh: the legality is in the document that is being posted. you have parent-child resolvability back up to the root. that's the requirement now. gh: is it worth considering relaxing our bidirectional closure requirement? ad: makes parsing harder. have to wait to very end. takes lots of time, memory. gh: use case you have, you need parent. we could relax it to require parent-child (as needed for my use case). but for Allen's case you need child-to-parent pointers. ad: using xid gh: xid's are free form. how do you know that it means x was derived from y? there's no way to represent that in our xml. it's open to interpretation by client and server. ad: in the xid have one of them be the type, constrained vocab, so you know what kind of link it is. keyword 'rel', this means get css, rss.... also the xml-link stuff steve mentioned a while ago. gh: would require some significant rejiggering to resolve it. 
ad: can we do it by having a new feature type, of it's own vocabulary. gh: if you do this in one client, it does this by cloning, it looks to user you are doing it from different servers. write to client. another one reads it, and it has no way of know that it was derived from the two different sources. gh: for now, you can only point to newly created features or features coming from the server you are posting to, for feature ids. need to know more about evidence trails, to know more about what info they need to preserve. [A] talk to curator pro (nomi) about what evidence to save when creating/modifying feats ad: new type: external-feature-reference, do a new element at end of record. doesn't require a new format. gh: it's outside the spec right now, allen doesn't have to support it. extra xml in the document to describe the relationship. e.g., a derived-from element. it's doable, but I don't think it should be in the current spec. ad: can be done without making backwards incompatible changes to the current spec. aday: now I get free reign to validate the way I want to. I will be liberal in what I reject. gh: end of the spec issues we were looking at yesterday. Topic: Status report -------------------- ee: started working on gff3 parser for IGB. bo: feature filtering. using full uri's not just 'chr2'. going through biopackages.net server checking if it is up to spec. coordinates issues, mapping document, stored in extra file. gh: reference to each segment. aday: writeback server able to do delete and update now. fixed bug reported by andrew. name based query was not returning parents. gh: lincoln mentioned xml:base problem. segment/segment/ bo/aday: fixed this. aday: started impl a new server that takes any arbitrary range request. performs modulus on range request. you know that there is only certain blocks being requested, so you can use a cache. does it satisfy requested range, and return that. I always do children before parent. inserting hints on the thing that does backend parsing. gh: are you supporting multiple parents of children (e.g., multiple transcripts that share an exon)? aday: a good question. I keep track of children and multiple locations of children and then I given parents after that. after the grooming, I can have multiple hints, 'this is the end of this 15mb block'. all parents are presented. then all of my comments would be presented. gh: got out IGB release, but had to recall it, since it broke things. verifying I can write back to new and improved writeback server. if you post to a writeback server, that's also the address you should be using to get the.... a versioned source with a writeable attribute. I should be able to use that same source to both write to and retrieve from. aday: you can't retrieve gh: I have to use two different urls to do retrieve and posts. The way I think it should work: anything you write to you should be able to do retrieval as well. aday: writeable=yes attribute, and go over here and write. should be ok. thinking about using redirection under the covers. gh: resolving new ids mapping to das-private ids, editing is working on client side. sc: worked on info page for affy das servers. Generating new drosophila alignment data for Ann. gh: had trouble hooking up exon chp data with new binary formatted exon data you generated (gregg's new bp2 format for exon data). could be that I have only control probes and they are not in your data. [A] steve will check to see if there are any control probes in the exon array data. 
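To make the id-resolution step in Gregg's status report concrete
(client-assigned das-private URIs being swapped for server-assigned ones
via the new "old_uri" attribute on returned FEATURE elements), here is a
rough sketch of the client side; the content type string and the "uri"
attribute name are illustrative rather than quoted from the spec:

  import urllib2
  from xml.etree import ElementTree

  def post_writeback(writeback_url, das2xml_doc):
      """POST a writeback document, return {old_uri: new_uri}.

      Assumes the response is a features document whose FEATURE elements
      carry 'old_uri' next to the server-assigned 'uri'.
      """
      req = urllib2.Request(writeback_url, das2xml_doc,
                            {"Content-Type": "application/x-das2features+xml"})
      doc = ElementTree.parse(urllib2.urlopen(req))
      mapping = {}
      for elem in doc.getiterator():
          if elem.tag.endswith("FEATURE") and elem.get("old_uri"):
              mapping[elem.get("old_uri")] = elem.get("uri")
      return mapping

  # The client can then swap ids in its own model, e.g.:
  #   for old, new in post_writeback(wb_url, doc).items():
  #       model.rename_feature(old, new)   # hypothetical client call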
ad: I got the validation server back up and running. will work on sequence retrieval spec. question: does spec guarantee that seq will be upper or lowercase? gh: no, fasta can be either. gh: spec docs don't have date stamp, eg, writeback document. this is useful to see if it has been updated. [A] andrew will put date stamp back in spec docs that don't have it. From dalke at dalkescientific.com Thu Aug 17 19:16:49 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 18 Aug 2006 01:16:49 +0200 Subject: [DAS2] SEGMENTS does not have a "uri". Message-ID: <6b674ebdfbd129ae2d20686f9ba174e4@dalkescientific.com> I just noticed that the SEGMENTS element in the segments document does not have a "uri" attribute. That doesn't seem right so I added it to the schema. I committed that and the change for FORMAT elements under each SEGMENT. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Aug 17 19:24:12 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 18 Aug 2006 01:24:12 +0200 Subject: [DAS2] das2 reference server Message-ID: <1d186e51a531d835afb67910a28f5f36@dalkescientific.com> Here's the experimental reference server I've been working on. http://cgi.biodas.org:8081/seq/?format=html The entries for # fly_42 * 2L (22407834 bases): * 2R (20766785 bases): * 3L (23771897 bases): * 3R (27905053 bases): * 4 (1281640 bases): * X (22224390 bases): # fly_43 * 2L (22407834 bases): * 2R (20766785 bases): * 3L (23771897 bases): * 3R (27905053 bases): * 4 (1281640 bases): * X (22224390 bases): # worm_160 * I (15072418 bases): * II (15279314 bases): * III (13783677 bases): * IV (17493785 bases): * V (20919396 bases): * X (17718851 bases): * Mit (13794 bases): # worm_161 * I (15072418 bases): * II (15279314 bases): * III (13783677 bases): * IV (17493785 bases): * V (20919396 bases): * X (17718851 bases): * Mit (13794 bases): # worm_162 * I (15072418 bases): * II (15279314 bases): * III (13783677 bases): * IV (17493785 bases): * V (20919396 bases): * X (17718851 bases): * Mit (13794 bases): should be real. The others are part of my test set. It even validates. Amazing that. One thing - I've created a new document type which lists all of the "segments" documents available from the reference server. (nomenclature: I'm using "assembly" to mean "a collection of segments". I know, it isn't really an assembly. I'm using it for now because I didn't like using "segment", "segments", "segments_list" instead using "segment", "assembly", "assemblies" Ideas on a better name? ) This should be a sources document instead. Haven't gotten there yet. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Aug 17 19:26:43 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 18 Aug 2006 01:26:43 +0200 Subject: [DAS2] SEGMENTS does not have a "uri". In-Reply-To: <6b674ebdfbd129ae2d20686f9ba174e4@dalkescientific.com> References: <6b674ebdfbd129ae2d20686f9ba174e4@dalkescientific.com> Message-ID: <6dde1e7303796ce01fa641b98dd254d4@dalkescientific.com> > I just noticed that the SEGMENTS element in the segments > document does not have a "uri" attribute. That doesn't > seem right so I added it to the schema. Shouldn't the SEGMENTS element also have an optional "reference" attribute? Take a look at http://cgi.biodas.org:8081/seq/fly_42/?format=html to see a real-world record. It feels like there should be a reference="http://www.flybase.org/genome/D_melanogaster/R4.2" in there some place. 
Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Thu Aug 17 19:37:00 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Thu, 17 Aug 2006 16:37:00 -0700 Subject: [DAS2] Notes from DAS/2 code sprint #3, day four, 17 Aug 2006 In-Reply-To: Message-ID: Following up on a side-topic that came up briefly in morning's teleconf, > aday: now I get free reign to validate the way I want to. I will be > liberal in what I reject. here's a post I made to a thread on the bioperl list last regarding aberrant fasta files (another reason why to not standardize das/2 sequence responses on fasta format): http://bioperl.org/pipermail/bioperl-l/2005-July/019407.html Another cited source of this philosophy is from the TCP spec (section 2.10) as the Robustness Principle: Be conservative in what you do, be liberal in what you accept. http://www.faqs.org/rfcs/rfc793.html I actually think it has wider appeal beyond software design or electronic devices, but I'll save that discussion for later... Steve From dalke at dalkescientific.com Thu Aug 17 19:59:32 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 18 Aug 2006 01:59:32 +0200 Subject: [DAS2] Notes from DAS/2 code sprint #3, day four, 17 Aug 2006 In-Reply-To: References: Message-ID: > [A] Ask Ann Loraine if this solution is sufficient. I tried calling her cell number but got a fax machine (or an old modem). Perhaps I have the wrong number ? > [A] andrew will put date stamp back in spec docs that don't have it. Done. Also, for some reason das2_stylesheet was never added to version control so I went and did that too. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Aug 17 20:31:57 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 18 Aug 2006 02:31:57 +0200 Subject: [DAS2] Notes from DAS/2 code sprint #3, day four, 17 Aug 2006 In-Reply-To: References: Message-ID: <0e9f6ceda2ecb02e53d5c23c248fd5cf@dalkescientific.com> > here's a post I made to a thread on the bioperl list last regarding > aberrant > fasta files (another reason why to not standardize das/2 sequence > responses > on fasta format): > > http://bioperl.org/pipermail/bioperl-l/2005-July/019407.html In it you said: > I found a recent presentation on the FCC site showing results of a > survey > about whether part 15 stifles innovation (10/14 respondants said no, > and 9/5 > said more stringent regulations might even permit *more* innovation): Okay, I downloaded the PPT. Those questions are biased. 1. asks "is it too limiting" and doesn't ask "is the current standard okay" or "is it too lenient." Consider the population sample of existing members of the technology advisory committee. What selection bias is present there? 1b. Could more stringent regulations, insuring that there will be no unknown types of interference, permit additional innovation? Note the "could", not "would .. likely increase innovation". This could be answered "yes" if there's only a 5% change of it happening. 2. "Should the FCC deal with interference issues with licensed services in a different way." Okay, I agree with that one. Depends on what "different way" means though. 3a. .. I still don't know what a Part 15 device is. Does that include wireless? does it include interference from when I nuke something? 3b. Can home users be guaranteed that there will be no interference from, or to, users in nearby homes or apartments? Huh? Even with FCC Part 15 or whatever there's no guarantee. There's no guarantee on anything. 
Someone else could pull the cover off an old computer causing extra interference. Of course the answer to this is "no". Even under threat of capital punishment there's no guarantee. 4. In a spectrum with no rules, can individual users be assured of effective communications? What does "no rules" mean? Does existing wireless service count as "no rules"? Yet it "is certainly innovative." BTW, I think the FCC should allow micropower radio stations. Those are not allowed because of the concern that the stations would interfere with larger commercial stations. I don't think those are technically valid. I think they are more to preserve the investment made by commercial station owners. I also don't think the FCC regulates noise from commercial stations well enough, and lets problems persist for years. > Another cited source of this philosophy is from the TCP spec (section > 2.10) > as the Robustness Principle: > Be conservative in what you do, be liberal in what you accept. > > http://www.faqs.org/rfcs/rfc793.html I remember now this came up in bioperl in .. 1999? I was complaining about file formats. Ewan mentioned that principle. My complaint was that bioperl's (and others') parsers are usually quite liberal, but so is the output format generation. Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Fri Aug 18 15:15:33 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Fri, 18 Aug 2006 12:15:33 -0700 Subject: [DAS2] Notes from DAS/2 code sprint #3, day five, 18 Aug 2006 Message-ID: Notes from DAS/2 code sprint #3, day five, 18 Aug 2006 $Id: das2-teleconf-2006-08-18.txt,v 1.2 2006/08/18 19:14:11 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt Dalke Scientific: Andrew Dalke UCLA: Allen Day, Brian O'Connor Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Topic: Spec concerns --------------------- ad: segments doc (not 'segment') top-level element is missing three fields, one is uri (I added). second is reference (a collection corresponding to a dataset). seemed less useful since it's already mentioned in vsource document. I added id to schema, not spec yet. last thing: missing a doc_href, for each segment ok, but we can't say, here's doc for human. gh: optional? ad: yes. gh: if optional doesn't change server impl. uri for segments is specified in segment capability. gh: my only objection is spec churn. gh: question about writeback spec: what you're supposed to do if you remove an exon from a txt, you are supposed to have a delete element in post that deletes that id. ad: yes gh: if you just have that delete, does that force parent to remove it's child, or do you also have to have the parent in there? ad: everything in that relation has to be sent. gh: in that example, if you have a delete for that exon, you have to return the rooted hierarchy as well with txt not having that part element. 
ad: yes gh: what if you create a curation with three exons in it, you then decide to delete the middle exon. server gets post with same annotation, but exon is missing and parent is not pointing to it as a part. is that legal? ad: nothing that says delete? gh: no ad: i think it should be illegal. if you have three generations. grand parent and grand child with no intermediate. also illegal. gh: server will have to catch these things. ad: easy. just check whether all ids involved are representing something on the server, if so, you delete old, update new. gh: allen, will your server catch this? aday: if you modify something, it already has to check before it gets deleted, i can just reject it. now I say, you modified it, here are the things that are modified by your request. gh: [drawing] d:a-----b-----c -> d:a----------c , b read this as: transcript d has exons a,b,c three exons attached to a txt, never indicated that anything was deleted, I just re-wrote the feature as a--------c gh: this should throw an error, since you didn't explicitly delete b. aday: what's wrong with leaving d dangling? ok to not mention the missing exon ad: one is to keep it there, one is to delete automatically, gh: if keep does it have pointer to parent? that's enough to tell db it's not connected? aday: yes, it becomes an orphan. you should get back a message, "hey you affected all of these features." so client can see what your modification affected. you'll know from response what was affected by deletion you performed. gh: if you now submit a new transcript named e containing a and c: e:a-----------c ad: so annotation 'd' will come back as saying, "was deleted" aday: my response tells you everything that needs to be updated. you might see things that need to be cleaned up that weren't expected. ad: python maxim: when in doubt, refuse temptation to guess. you're guessing it makes sense to leave orphans around ad: if it's ambiguous, should be not supported. gh: from allen's side, it might be hard to catch and call error. aday: no I can catch. i track all changes caused by client request. I have to track all changes made, see if it was present in the submitted document, if not, an error. just another level of tracking. can do. gh: if this is what you wanted to do, client would submit, write b (with no txt as parent), write d with txt as parent. and no delete to get this d:a-----b-------c -> a-----------c + b gh: if you really want to get rid of children, you need to specify both parent and child. gh: approach on client. I do on client. curational model is that you are never really editing locations or parent child relations ships, you are just making successors, so I keep this version chain. not deleting old ones on server (that is the plan though). aday: every edit does a delete and create on server. that's very transactional. can you keep track of it in memory. gh: yes. user has to request writeback. any number of edits between one and the next. once you've committed you can rollback on client. aday: everything is pruned off in client? gh: no you need redo. aday: redo is not considered saved unless you save again. gh: if you re-edit after a undo, you can't redo. no branching. aday: just keep track of recent save point. gh: todo: keep modification dates. so if there were no edits since the last save then there's no need to write back to the db again. gh: if you want something deleted, you must explicitly do it. 
if you want to delete it do this:
  * delete b
  * write d:a------------c
if you want to orphan it do this:
  * write b with no parent
  * write d:a------------c

Topic: semantics of insides and overlaps as they relate to parent-child
-------------------------------------------------------------------------

gh: this is a continuation from yesterday's discussion we had offline. bring up spec, feature filters. see the part that says, "any part of a complex feature that is one with parents... then all parts are returned". that's wrong. you do an insides query, you only get back things that are inside. two exons in a txt, one is inside, one is not inside.
ad:
gh: if it has no location, it's never going to be returned by a range query.
ad: by type q
gh: if multiple locations on the feature, if one of those locations is inside the range query it passes.
sc:
gh: not the same as multiple locations -- aligns to multiple places in the genome. the top-level parent of a feat hierarchy must have a location that passes the range query. one of its locations has to pass the range filter, and it is at the top level of the hierarchy.
aday: think of this: locations are cols in a matrix, filters are rows. in order for a column to qualify, the entire row must be true.
ad: different people may have modeled it differently. may get only part of it back.
gh: if two servers model the same data differently you may get different answers back. that's the way it goes.
ad: an annotation contains features. return all annotations that match the query.
gh: don't add the notion of some other object that is sort of a feature, but is really a group of feats.
aday: i call it a feature group. range filters operate on the group.
gh: we don't need to have a special designation. it's just a feature with no parents. what you're calling a feature group.
aday: all things under the parentless feature are the group.
ad: yes
aday: not identical to the root, it's the root plus all attached things.
gh: to clarify things in the spec, maybe call it annotation/feature group, maybe ok.
ad: all things connected by a parent-part relationship. return the entire feature group.
gh: change: the root of the feature hierarchy matches the range filters; the root of the group has to pass all the feature filters in the range query.
ad: you want the root to be guaranteed to have locations if any sub feats have a location. featureless roots.
aday: no way to retrieve based on location. weird. parent with no location.
gh: not weird. the bounds of a gene are fuzzy. they'll spell out the bounds of an exon but not the gene; we can say the highest level with a location. we can say that if a child has a location, then the parent has one.
ad: put all the children ranges in the root.
gh: ok. children should never have locations outside their parent.
ad: old conversation: is this single or multiple rooted. single is easier to understand. but there is a use case for multiple locations. now we say the single root must be the union of all it contains.
gh: inclusive, not necessarily the union.
ad: a software check will be needed
gh: you don't want someone submitting exons that are outside the bounds of a transcript. dangerous to have children outside the location of the parent.
aday: true for bioperl
ad: for only the root, or intermediate?
aday: every intermediate
gh: only acceptable if you want to punt on the location of an upper-level thing whose location isn't well understood (gene).
aday: feature 100-200, locationless thing attached to it...
gh: if you have locationless, they need to be locationless up to the root. maybe we should not allow that for now. if you have a locationless feature, it's locationless all the way down and all the way up. meets the requirement for gene das.
ad: don't understand why this restriction needs to be there.
ee: we want it.
gh: you cannot have children outside the bounds of their parents and their parents recursively. to me, that needs to happen. question: can you have children with a location that have parents that are locationless?
ad: why parents that don't overlap the child location?
gh: throws off our range filter mechanism. no easy answers to that.
ad: if any children meet the criteria, then they all get returned.
gh: then you get back features that don't meet it.
sc: let's say you're editing an exon...
gh: forget editing. just basic reading. there was ambiguity in the old spec here that I want to kill. I've seen the desire to have a locationless thing above, but never the reverse: definitive location above but locationless below.
gh: we hashed this out in the last code sprint. let's complete it!
ad: if any feature matches, then all features match. this includes the situation where the parent has no location, but a child matches; that implicitly matches. my proposal was to return all things in the feat group if any one of the features matches. same as assuming all parents have the locations of their children. this search will get back the parent. returning the feat group is a way to say all parents implicitly include the locations of their children.
aday: not all parents, multiple roots.
gh: they all must go to a single root.
aday: if any location of the root of the group matches, then the whole group matches. boils down to: are descendent feats allowed to be outside the bounds of the parent.
gh: [insides query example on board]
aday: the query is on the feature group root features
ad: I don't remember range queries being allowed only on root elements. two exons that are very far apart. the query hits in between them.
gh: parent meets the overlap, return them all.
ad: the parent has only two small locations, not one large location.
gh: modeled as multiple small locations, not child features.
sc: so it doesn't include the intervening sequence.
aday:
gh: canonical example of the mult location stuff: a 25mer probe that hits 4 diff locations in the genome. multiple alignments, where none of the alignments align to the whole thing.
aday: two probe pair, only some of the children are in the region.
ad: example: protein structure catalytic group, three residues on different chains.
gh: mult locations of a probe set, one location falls inside the query, return the probe set. why can the rule be...
ad: besides range searches: when you find that a feature matches a title or curator name, do you return back just the matching feats or the group?
gh: don't see why we can't add more rules.
aday: name search and an exon is named, return its parents.
ad: so for any searches besides ranges, it returns all features in the feature group.
gh: different behavior for range queries. they already have different behavior than other queries.
ad: my criteria: if any feature matches, then all features in the group are returned, except that in a range query, only those that match the range query are returned.
gh: don't see why you have a problem with that requirement.
ad: do the search on all features, the root is not special, if any feat matches, get all features in the group; if a range filter, then get the features that pass. if a filter, then full hierarchies are not returned, only those that pass the filter.
gh: don't like. do an overlaps, two exons are in, two are not. you send back only the txt and the two that are, you are depriving the user of data, there's no way of knowing that it's missing, how can they get at it?
ad: i'm confused. in the system you want, you return back everything?
gh: yes. everything that has a root with one location that matches all range filters -- if the root of the feat group meets the range criteria for at least one of its locations.
aday: and any name filter
ad: the root has no location info, but one of the exons overlaps, the whole thing is returned.
ee: distinction between overlaps and insides, different if the parent lacks location info.
aday: gregg needs this for range optimizations. the name may match, but the feat location may not, but the root of the group may.
ad: specified in the root node. not convinced we need locationless features that aren't descendants.
gh: we're not talking about locationless nodes now. the parent has a location, that's all you need to search on.
ad: use the pieces, or the whole range?
gh: the whole range, not piece by piece.
ad: why
aday: there can be things...
gh: I argued against having mult locations, caused problems in bioperl, children with locations, and mult locatable features. so I didn't want to have mult locations, but got voted down. the only thing where it makes sense: when you want one feature to represent an alignment to things on the genome. OK to represent with mult locs, but better to not.
aday: offsets relative to the root.
gh: no. will confuse people a lot.
ad: any annotations that will go on mult segments in the dna world?
aday: blast results, very common.
gh: every blast hit is a separate feature, avoids the problem. I use them in transforms, so I can say this feature maps to different genome assemblies. fine in a data model. but it causes problems when it's in a spec, hard to describe when you should use one vs the other.
aday: what rules do you use internally?
gh: i know it when i see it.
ee: in genometry, these are equivalent regions on these genomes.
gh: right. the length of the range can be identical, but the seq is different. genometry doesn't care about sequence identity. "this part of hg17 is equivalent to hg18". but this is getting tangential.
ad: the question is what do you do for things that are on mult segments. example where the parent is wider than the children
aday: you don't know where the 3' end is
gh: haplotype block for a set of snps, you know it extends to the next block, so the block is bigger than the bounds of the snps used to construct it.
ad: curation tool, marked off three regions, one thing can extend over a broader range. the tool automatically inserts. allows the curator to stretch it out as need be.
sc: this is what fuzzy locations are used for at genbank.
gh: we don't have fuzzy locs. no need for these at present.
ad: implicitly the parent is the min-max of its children. a db could optimize that way. the curation tool gets data back from the server. does the curation tool know to change the parent range or not?
gh: it better
ad: if the user changes the min/max exon bounds, will the tool know to adjust the parent transcript? the txt could be left extending past the current location of these.
gh: up to the client app to figure it out. a smart gui should say, you cannot extend the txt past the exons you have, but for a haplotype block, it might allow such a change. in theory, your client would understand for which elements in the sequence ontology you could do it and for which you could not.
ee: this is outside the spec. it should say it's possible for the parent to extend beyond the bounds of the children, and not possible for children to be outside of the parent.
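A minimal sketch of the range-filter rule gregg is arguing for above: a feature group passes if at least one location of the group's root passes every range filter. Half-open interbase ranges and the exact overlaps/inside predicates are assumptions for illustration, not spec text.

    def overlaps(loc, start, end):
        return loc.start < end and loc.end > start

    def inside(loc, start, end):
        return loc.start >= start and loc.end <= end

    def group_matches(root, range_filters):
        """range_filters is a list of (predicate, start, end) tuples;
        the group matches if one root location passes them all."""
        return any(all(pred(loc, s, e) for (pred, s, e) in range_filters)
                   for loc in root.locations)

So, for the 25mer probe set example, a probe-set root carrying four locations passes an overlaps filter as soon as any one of those locations overlaps the query range.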
ad: which of these can be on multiple segments?
gh: if we're going to have mult locs, then everything can.
ee: if a child can, then the parent can.
aday: an argument for doing the relative offsets I suggested. only allow parents to have relative offsets to children. no duplication of data.
gh: duplication of data is a red herring.
ad: more error prone to check a string to see if it matches. hard to extend the parent to be a bit wider than the children.
gh: range queries to apply to the root of feat hierarchies, and at least one of the children to pass all range filters?
ad: why is this diff than the requirement I gave?
gh: yours gives back partial feature groups. it's allowing filters to apply to any of the children, not just the root.
ad: only difference is if you have two widely spaced features, everything has an implicit convex hull. if your query hits the middle.
gh: [whiteboard drawing]

    +-----------+                           exon a in transcript c
                       +----------------+   exon b in transcript c
            |______________|
                   |
              inside query

ee: for overlaps you would include the parent, for an inside query you would not.
ad: how will software guarantee this? min-max or just the union of the children.
ee: min-max of all children.
ad: should be in the spec.
gh: allen: how do you do min and max of mRNA, implicit or explicit? for me, it's explicit.
aday: explicit.
ee: using gff1 where it's implicit, but our parsers force it to be explicit in our data model.
aday: in gff3 it can be implicit (using '.').
gh: gff, bed, psl, xml formats, raw blast output -- all explicit.
ad: does the server verify that it meets these criteria? each feature coming in, if it has a parent it can only have one segment id. for each segment in the parent, find each one that matches the range in the child; if any child has segment x, only one location on segment x
aday: can have mult locs on the same segment.
ad: why not model as one range?
aday: need to create the parent in two locations.
gh: as long as one loc of the parent contains the loc of the child, it's ok.
ad: gregg is saying that...
aday: a location only includes one instance of the children. two locations for exons a, b, c. the first set of locations for these exons is different than the second set of locations for these exons. a logical grouping, not a simple collection of all parts. mult locations on the same segment is harder. check the location of the parents, verify that no two children...
ad: the spec now allows for dumb servers. by putting in these extra requirements, it doesn't make the server easier, it complicates life on clients.
gh: it makes the client's life simpler.
aday: location has two additional attribs: group, rank.
  group - groups things together that are in the same segment
  rank - prioritized location
a conceptual grouping of things, to know which child locs match up with which parent locations, because locations can overlap.
gh: (aside) can you make them multiple feats rather than diff locations? when it comes out as das2xml.
ad: need to mention to lincoln and berkeley folks. specify what the algorithm is to...

Topic: status reports
---------------------

gh: doing writeback to allen's writeback server. create new annot, edit location, add, remove, extend exons, can write them all back. keeps creating new features in the db instead of editing the ones that are there. plan: delete the old annot in the same doc that edits the new one.
aday: so you're leaving lots of old annots around.
aday: finishing touches. old uri - new uri mapping, so gregg knows. fixing bugs on writeback server.
working on new das front end that takes incoming reqest , breaks down with modulus operation with configurable blocks size, filters the results, this is for caching. working well. can convert the typical 40-50s response times down to 7s on a single megabase region. takes a while to get cache populated. todo: automatically populate cache. add code to know when a block became stale, so server can flush cache to get new stuff. bo: refactor domain factor response. found lots of hardcoded logic. went back to refactor. one object that populates hash structure of objects, handles. support for wiki stuff from lincoln, unique coord identifiers. todo: go ahead and update test suite now out of date. coord filter needs to be added in. gh: server now supports full type uris and segment uris? bo: yes, in cvs. todo make rpm package and install on production server. gh: then public release of igb can start using full type uri. bo: can communicate with you on it. gh: congrats -- end of code sprint. good to get the writeback stuff going. spec changes are little, but feels very nailed down. ad: finished off action items from yesterday. timestamp. reference server implementation. ee: still working on gff3 parser. progress nothing to report. sc: updated affy probe set alignments for drosophila arrays to be based on dm2 on our das/1 server (Ann's request). Restarted server. Worked on updating the affy das server info page in progress. todo: update the das2_server with latest improvements committed by gregg, then test the new and improved bp2 format for exon data. will need to deal with array prefix used by netaffx ('1:') rather than as used in CHP files ('HuEx:'). Post-teleconference Discussion ------------------------------- gh: would you be willing to give up multiple locations in the spec? aday: would you be willing to give up bidirectional parent-child pointers? gh: let me think about it... From dalke at dalkescientific.com Fri Aug 18 16:44:07 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 18 Aug 2006 22:44:07 +0200 Subject: [DAS2] Notes from DAS/2 code sprint #3, day five, 18 Aug 2006 In-Reply-To: References: Message-ID: <3e168bac8dabbb6b9ee9cc10137ef368@dalkescientific.com> > Action items are flagged with '[A]'. I see there aren't any. > ad: need to mention to lincoln and berkeley folks. specify what the > algorithm is to I would have added this as an [A]. > Post-teleconference Discussion > ------------------------------- > > gh: would you be willing to give up multiple locations in the spec? > aday: would you be willing to give up bidirectional parent-child > pointers? > gh: let me think about it... Regarding bidi pointers (btw, should we change the tag to ?) As someone parsing GFF3 it's annoying that I have to keep any features with an ID hanging around until the end just in case someone wants to refer to it later as a parent. GFF3 does have a directive to allow flushing but so far I haven't come across a gff3 file which uses it. Having bidi links makes this trivial. As someone concerned about database integrity issue, what happens with a writeback which says "I am a child of X and Y" where X and Y were not previously connected. (Or "I am a parent of...", depending on the link direction.) Does the server allow that? Reject that? Regarding multiple locations, it's a data modeling issue. If a record Q has N multiple locations then it's identical to a record Q' with N children, Q1', Q2', ..., Qn'. The Qi' will have a different type record than Q'. 
As a plus, or minus, each Qi' will be annotatable, have it's own identifier, etc. which is not the case if features can have multiple locations. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Fri Aug 18 17:55:55 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 18 Aug 2006 23:55:55 +0200 Subject: [DAS2] feature locations Message-ID: [ I hope to hear a response before the end of the sprint today. ] For those not in the phone conference call today there were several issues which didn't get resolve regarding feature locations: 1) do we need multiple locations on a feature? (vs 0 or 1 location) (I argue this is mostly a data modeling issue as I can decompose anything to a set of features with at most 1 location.) 2) if a child has a location is its parent required to have locations which includes the child locations? (currently no) 3) if #2, is the parent required to have a single location per each segment? ie, if there are children on a given segment then the parent must have a single location on that segment where start_location <= min(children.start_location) end_location >= max(children.end_location) 4) how is the feature search done? Here's what I think is the problem question. Feature X is the parent of Y and Z with Y.location = (10,20) and Z.location = (50, 60) What do you get from an overlap(30, 40) search? In the way I've been thinking about it, this returns nothing. None of the features have locations which overlap that range. I gather that others want this to return {X,Y,Z} and do so because X should be assigned the location (10, 60). X cannot be location-less. I don't know enough DNA to give an example of something for which a location makes no sense. I think in proteins. Consider X = "catalytic site" with Y and Z denoting regions essential to catalysis. The section between Y and Z has nothing to do with "catalytic site". Automatically including that range in X makes no sense. For that matter, Y and Z may be on different segments. Hence I don't like #3. It doesn't make sense for some data types. (Now it may be that certain data types must work this way. But that's up to users of features of that type. A database could enforce those cases but a dumb database shouldn't be required to know all types.) Without the extra qualification of #3 then here's a dead simple way to implement #2 - parent_locations = { all of its children locations } Hence in my test case: Y has 1 location (10, 20) Z has 1 location (50, 60) ---> X has two locations (10, 20) and (50, 60) That perfectly agrees with #2. But only because we support multiple locations. We need multiple locations because we have features which span multiple segments. Hence the additional restriction required to make #3. If #2 is in place then I'll argue that a client should only put in the union of the regions because unless it knows the type it doesn't know if the min/max single location make sense. Please let me know if I'm on the right track before going onwards with search. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Fri Aug 18 18:03:50 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat, 19 Aug 2006 00:03:50 +0200 Subject: [DAS2] complex feature examples Message-ID: <503eb1ab1ac82327753e9d25c1fe119d@dalkescientific.com> I would like a dozen or two examples of features - especially complex features - in das2xml format. We discussed this last February but it didn't lead to anything. 
I think this would be useful for three reasons: - help make sure our spec works (any last details we forgot?) - provide better examples for the documentation - help new DAS people learn best practices Here are a few ideas: exons, blast results, haplotype block for a set of snps, probe sets, primer locations (including cut point?), predicted gene locations, repressor locations Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Fri Aug 18 19:44:05 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Fri, 18 Aug 2006 16:44:05 -0700 Subject: [DAS2] complex feature examples In-Reply-To: <503eb1ab1ac82327753e9d25c1fe119d@dalkescientific.com> Message-ID: Excellent idea Andrew. > From: Andrew Dalke > Date: Sat, 19 Aug 2006 00:03:50 +0200 > To: DAS/2 > Subject: [DAS2] complex feature examples > > I would like a dozen or two examples of features - especially > complex features - in das2xml format. We discussed this last > February but it didn't lead to anything. > > I think this would be useful for three reasons: > - help make sure our spec works (any last details we forgot?) > - provide better examples for the documentation > - help new DAS people learn best practices > > Here are a few ideas: > exons, blast results, haplotype block for a set of snps, > probe sets, primer locations (including cut point?), > predicted gene locations, repressor locations Adding to this: representing different types of alternative splicing. See this figure: http://genomebiology.com/2002/3/11/REVIEWS/0008/figure/F1 In the context of DAS, we should be able to deal with alt splicing in two contexts: 1) read-only context: representing sets of features belonging to a transcript that exhibits alt splicing, indicating which features belong to which variants and that the variants are related. 2) writeback context: being able to add alt splice information about a transcript which originally lacked any alt splicing information, and to cover this for the various classes of alt splicing. That should give the spec a good workout. Steve From Ed_Erwin at affymetrix.com Fri Aug 18 19:33:44 2006 From: Ed_Erwin at affymetrix.com (Erwin, Ed) Date: Fri, 18 Aug 2006 16:33:44 -0700 Subject: [DAS2] feature locations Message-ID: I think all of us this morning, except you, want 2) Yes, parent region must encompass all child regions 3) Yes, a single segment that encompasses all child regions 4) In your example: overlaps(30,40) returns the whole parent and child inside(30,40) returns neither the parent nor the child The user (client) is responsible for asking for things that make sense. For mRNA transcripts and exons, an overlaps query is sensible. Here is my two cents about the "catalytic site" you talk about.... I agree that a "catalytic site" such as you describe requires some thought. But it requires thought from the curator on how to describe it, not smartness of the DAS server itself. If the catalytic site is composed of parts of exons on a single mRNA, they should be maybe be put into a parent-child relationship. If different components of the catalytic site are on different mRNAs that fold-up and combine into a complex compound (like hemoglobin) then the parts that are on different mRNAs probably should be treated as different features. Or even more simply, there could be a feature type "catalytic site component" that can be a "part of" an exon. Anyway, that is *my* opinion. #2 Yes, #3 Yes, and #4 the annotator is responsible for being smart. 
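As a sketch of what #2 and #3 amount to in code (an illustration, with assumed field names, not spec text): on each segment the parent's location is just the min/max envelope of its children's locations there.

    def parent_envelope(children):
        """Per-segment (start, end) span covering all child locations."""
        spans = {}
        for child in children:
            for loc in child.locations:
                start, end = spans.get(loc.segment, (loc.start, loc.end))
                spans[loc.segment] = (min(start, loc.start), max(end, loc.end))
        return spans   # e.g. {segment_uri: (min_start, max_end)}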
I can at least see now why you think there might be a problem, but I don't agree that it is a problem. -----Original Message----- From: das2-bounces at lists.open-bio.org [mailto:das2-bounces at lists.open-bio.org] On Behalf Of Andrew Dalke Sent: Friday, August 18, 2006 2:56 PM To: DAS/2 Subject: [DAS2] feature locations [ I hope to hear a response before the end of the sprint today. ] For those not in the phone conference call today there were several issues which didn't get resolve regarding feature locations: 1) do we need multiple locations on a feature? (vs 0 or 1 location) (I argue this is mostly a data modeling issue as I can decompose anything to a set of features with at most 1 location.) 2) if a child has a location is its parent required to have locations which includes the child locations? (currently no) 3) if #2, is the parent required to have a single location per each segment? ie, if there are children on a given segment then the parent must have a single location on that segment where start_location <= min(children.start_location) end_location >= max(children.end_location) 4) how is the feature search done? Here's what I think is the problem question. Feature X is the parent of Y and Z with Y.location = (10,20) and Z.location = (50, 60) What do you get from an overlap(30, 40) search? In the way I've been thinking about it, this returns nothing. None of the features have locations which overlap that range. I gather that others want this to return {X,Y,Z} and do so because X should be assigned the location (10, 60). X cannot be location-less. I don't know enough DNA to give an example of something for which a location makes no sense. I think in proteins. Consider X = "catalytic site" with Y and Z denoting regions essential to catalysis. The section between Y and Z has nothing to do with "catalytic site". Automatically including that range in X makes no sense. For that matter, Y and Z may be on different segments. Hence I don't like #3. It doesn't make sense for some data types. (Now it may be that certain data types must work this way. But that's up to users of features of that type. A database could enforce those cases but a dumb database shouldn't be required to know all types.) Without the extra qualification of #3 then here's a dead simple way to implement #2 - parent_locations = { all of its children locations } Hence in my test case: Y has 1 location (10, 20) Z has 1 location (50, 60) ---> X has two locations (10, 20) and (50, 60) That perfectly agrees with #2. But only because we support multiple locations. We need multiple locations because we have features which span multiple segments. Hence the additional restriction required to make #3. If #2 is in place then I'll argue that a client should only put in the union of the regions because unless it knows the type it doesn't know if the min/max single location make sense. Please let me know if I'm on the right track before going onwards with search. Andrew dalke at dalkescientific.com _______________________________________________ DAS2 mailing list DAS2 at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/das2 From Ed_Erwin at affymetrix.com Fri Aug 18 20:16:25 2006 From: Ed_Erwin at affymetrix.com (Erwin, Ed) Date: Fri, 18 Aug 2006 17:16:25 -0700 Subject: [DAS2] complex feature examples Message-ID: Blechhhhhh! Just my 2 cents, but I think the graphics in that figure are more confusing than useful. In all those cases, it would be simpler to just show all the observed transcripts separately. 
In some cases, the number of possible transcripts may be very large, but so be it. There doesn't need to be any pointers relating the different transcripts to one another. They might be given similar, or the same, gene names, but the fact that alternative splicing is going is on clear from the fact that there are overlapping exons and doesn't need to be explicitly mentioned. (Textual annotations can say which type is seen in which tissue, etc.) -----Original Message----- From: das2-bounces at lists.open-bio.org [mailto:das2-bounces at lists.open-bio.org] On Behalf Of Steve Chervitz Sent: Friday, August 18, 2006 4:44 PM To: Andrew Dalke; DAS/2 Subject: Re: [DAS2] complex feature examples Excellent idea Andrew. > From: Andrew Dalke > Date: Sat, 19 Aug 2006 00:03:50 +0200 > To: DAS/2 > Subject: [DAS2] complex feature examples > > I would like a dozen or two examples of features - especially > complex features - in das2xml format. We discussed this last > February but it didn't lead to anything. > Adding to this: representing different types of alternative splicing. See this figure: http://genomebiology.com/2002/3/11/REVIEWS/0008/figure/F1 From Steve_Chervitz at affymetrix.com Fri Aug 18 20:21:10 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Fri, 18 Aug 2006 17:21:10 -0700 Subject: [DAS2] Notes from DAS/2 code sprint #3, day five, 18 Aug 2006 In-Reply-To: <3e168bac8dabbb6b9ee9cc10137ef368@dalkescientific.com> Message-ID: Andrew wrote: >> Action items are flagged with '[A]'. > > I see there aren't any. > >> ad: need to mention to lincoln and berkeley folks. specify what the >> algorithm is to > > > I would have added this as an [A]. Yep. The discussion was going fast and furious. Didn't have time to flag these, as I was trying to follow the discussion and contribute as well. Here's some actions items to add in retrospect: [A] Steve will set up an emacs macro for flagging action items easily [A] Andrew will go through the notes and identify action items Cheers, Steve From dalke at dalkescientific.com Fri Aug 18 22:49:27 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat, 19 Aug 2006 04:49:27 +0200 Subject: [DAS2] feature locations In-Reply-To: References: Message-ID: <9c4ce2413e467975628c5ba12ba64cfa@dalkescientific.com> Ed: > I think all of us this morning, except you, want > > 2) Yes, parent region must encompass all child regions > 3) Yes, a single segment that encompasses all child regions > 4) In your example: > overlaps(30,40) returns the whole parent and child > inside(30,40) returns neither the parent nor the child That's what I figured was the case. > The user (client) is responsible for asking for things that make sense. > For mRNA transcripts and exons, an overlaps query is sensible. Isn't the client also responsible for making sure the features makes sense? (Possibly validated in the server.) In the case which comes up most often - transcripts and exons - it makes sense that the client give locations to both the transcript and the exons. For that feature type doing #3 is right. I'm not convinced that it's correct for the general case. > Here is my two cents about the "catalytic site" you talk about.... I can come up with more examples in the protein world. "Surface residues". "S-S bonded residues". These don't require 3D structure for visualization. Eg, I should be able to see "surface residues" highlighted differently than others even on a 1D display. Useful when homology modeling. 
> I agree that a "catalytic site" such as you describe requires some > thought. But it requires thought from the curator on how to describe > it, not smartness of the DAS server itself. If the catalytic site is > composed of parts of exons on a single mRNA, they should be maybe be > put > into a parent-child relationship. If different components of the > catalytic site are on different mRNAs that fold-up and combine into a > complex compound (like hemoglobin) then the parts that are on different > mRNAs probably should be treated as different features. Or even more > simply, there could be a feature type "catalytic site component" that > can be a "part of" an exon. (Naming ambiguity: "treated as different features" or "treated as different feature groups"? Per today's discussion I would have them be different features in the same feature group.) Well, I was thinking of proteins, and an annotation which is more properly part of a structural assembly. To make my objections less needlessly complex, the site residues can all be on the same chain. For that case it still does not make sense to have a parent feature have a location across all intermediate residues. If a the two cysteines of a S-S bond are at 22 and 98 then an overlaps search of (30,50) should not return the S-S bond information. Arguing proteins is wrong because they are so small. Nearly everyone will download everything and not do range searches on the server. Perhaps that's why my intuition is leading me astray.... I've been trying to come up with some more DNA-centric examples. I really don't know the domain well enough. What about: Some genes have multiple promoters. EPD puts those into a "promoters group". See http://www.epd.isb-sib.ch/current/AP.html for the known cases. Here are three members from one group FP Rn IGF II E1P1 :+R EM:X17012.1 1+ 18227; 28008. 036*1 FP Rn IGF II E2P2 :+S EM:X17012.1 1+ 19978; 25032.137 036*2 FP Rn IGF II E3P3+:+S EM:X17012.1 1+ 21966; 25033.155 036*3 The docs at http://www.epd.isb-sib.ch/current/usrman.html say these have position numbers of 18227, 19978, 21966. Would it be reasonable to want to annotate this as a "promoters group" using a single DAS2 feature group? If so, should the parent include the portions between the three promoters? Genbank is notorious for its complex annotations. I looked for interesting things (non-gene/CDS/exon/intron records). Here are a few The D-loop from a cow's mitochondria http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? db=nucleotide&val=27543905 D-loop join(15791..16337,1..362) D-loops appear to be a feature where it does not makes sense to have the parent join the intermediate sequence. The cat mitochondria record (I"m scanning gbmam hence cow and cat) at http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=1098523 has a feature misc_feature join(16315..17009,1..865) /note="control region; CR" but I can't figure out what that means. Jumping to another file, here's one from Tobacco leaf curl Japan virus http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=8096283 stem_loop join(1..19,2754..2761) That's a nice structural example. Strange that it's in two sections. Perhaps that only works because the first section is terminal? This example points out a class of RNA and ssDNA annotations on shape, like pseudoknots, which are essentially structural. Oh, and then there are functional RNA structures like ribozymes structures where you might annotate the functional regions, but that's back to the realm of the small. 
I have managed to convince myself that the difference in viewpoints is because of a difference in molecular expectations. DNA really doesn't do all that much. It sits there and gets transcribed. There are some structurally interesting regions but nothing like what protein has or does. RNA and ssDNA are more interesting, but they are small. I did come across a paper titled "DNA supercoiling allows enhancer action over a large distance" where it was best to think of the 3D structure of DNA, but that sort of thing is rare. How portable should the FEATURE structure from DAS2 be for 2D protein annotations? In the way I've been thinking of it it's quite portable. With this "parent locations must overlap all children's locations" restriction everything but the leaf locations will likely be useless blobs in protein annotations. > Anyway, that is *my* opinion. #2 Yes, #3 Yes, and #4 the annotator is > responsible for being smart. > > I can at least see now why you think there might be a problem, but I > don't agree that it is a problem. As #3 is trivially computed from the data, the only difference I can see must be in the results from range searches done on the server. I'll write about that some other time. This email is long enough. I'm off to bed. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Sat Aug 19 09:44:47 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat, 19 Aug 2006 15:44:47 +0200 Subject: [DAS2] feature search algorithm (was Re: feature locations) In-Reply-To: References: Message-ID: <7e9ae58f481d989f3873acc9dd7c1159@dalkescientific.com> Given a database: Foo: (10, 60) Bar: (10, 20) Baz: (50, 60) I understand that everyone wants "overlaps(30,40)" to return {Foo, Bar, Baz}. That includes me. I question the need for the requirement that parent locations include all of the children locations. Putting that aside for now I have a question about the above structure, which we all agree is valid. What does the search overlaps(30,40) and title == "Foo" return? I think it should return nothing. There are no features named "Foo" in that range. If I understand you all correctly it should return {Foo, Bar, Baz} because the overlaps search is only done on the root feature, returning all features in the feature group, while the title search is done on on a per-feature basis. How is the server search algorithm supposed to work? Given a range search "in_range(feature)" and non-range search "is_match(feature)" (for things like title, type, etc.) then the current search algorithm can be expressed: find all features X where: feature X is in the same feature group as feature Y where: in_range(Y) and is_match(Y) As I understand from you all, the search algorithm should be find all features X where both: - feature X is in the same feature group as Y where: Y is a root element and in_range(Y) - feature X is in the same feature group as Z where: is_match(Z) Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Sat Aug 19 10:04:31 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat, 19 Aug 2006 16:04:31 +0200 Subject: [DAS2] locations and writeback behavior Message-ID: <473730ce334d4768aeaf34444d7b8d7f@dalkescientific.com> Assume there is the requirement that parent node locations must cover children locations. What does the server do on writeback for the following circumstances: Case #1. the parent feature has no locations Foo: -- no locations -- Bar: (10, 20) Baz: (50, 60) Here are 6 possibilities: 1. reject the writeback 2. 
accept it and use an implicit location of (10, 60) (implicit means the record is not modified and clients downloading feature Foo will not get any locations for it but the server will act as if it was present.) 3. accept it and use an implicit location of [(10, 20), (50, 60)] 4. accept it and insert the location (10, 60) (explict; clients fetching feature Foo will see the server inserted locations) 5. accept it and explicitly insert the locations [(10, 20), (50, 60)] 6. accept it unchanged; range searches will always fail because the root node has no locations Case #2. the parent feature has a location which does not overlap all of the children Foo: (15, 85) Bar: (10, 20) Baz: (50, 60) Case #3: the parent has multiple locations; the parent's locations overlap those of the children Foo: (10, 30), (40, 66), (543, 567) Bar: (10, 20) Baz: (50, 60) Case #4: the parent has a single location which is broader than those of the children Foo: (10, 567) Bar: (10, 20) Baz: (50, 60) Case #5: the children contain multiple locations, the parent covers them all Foo: (10, 100) Bar: (10, 20), (22, 24) Baz: (50, 60), (70, 80) The server already does some validation for cyclic detection. It can easily check for ranges as well. As I understand things the answers should be: Case #1: reject (parent must cover all children locations) Case #2: reject (parent must cover all children locations) Case #3: reject (parent can only have a single location per segment) Case #4: accept, and use the broader range Case #5: accept (leaves and leaves only may have multiple locations on the same segment) and the reasons for these answers are: - it doesn't make sense to have location-less parents when the children don't have locations - it makes the search algorithm work correctly Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Sat Aug 19 10:18:36 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat, 19 Aug 2006 16:18:36 +0200 Subject: [DAS2] feature search algorithm (was Re: feature locations) In-Reply-To: <7e9ae58f481d989f3873acc9dd7c1159@dalkescientific.com> References: <7e9ae58f481d989f3873acc9dd7c1159@dalkescientific.com> Message-ID: > Given a database: > > Foo: (10, 60) > Bar: (10, 20) > Baz: (50, 60) I'll modify it a bit more. Make it be Foo: type=transcript, location=(10, 60) Bar: type=exon, location=(10, 20) Baz: type=exon, location=(50, 60) What does the search for overlaps(30,40), type==exon, title==Foo return and why? I can think of three answers: 1) return everything because in the feature group there is a feature which overlaps(30,40) there is a feature which is of type exon there is a feature with title "Foo" (call this the "each query term must match at least one feature in a feature group" algorithm) 2) return nothing because there is no feature which overlaps(30,40) and has type exon and has title "Foo" (call this the "at least one feature must be matched by all query terms" algorithm. This is the current algorithm) 3) return nothing because while the root feature overlaps(30,40) there is no feature which is both of type exon and with title "Foo". (call this the "range searches are special" algorithm.) Now what does the search for overlaps(30,40), type==exon, title==Bar return and why? Using the same three algorithms: 1) return everything because each of the three criteria are matched by at least one feature in the feature group 2) return nothing because no feature matches all three criteria. 
3) return everything because the root feature overlaps(30,40) and the Bar feature meets the other two criteria. Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Mon Aug 21 11:46:29 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 21 Aug 2006 08:46:29 -0700 Subject: [DAS2] DAS/2 teleconference today, 9:30 AM Message-ID: We're back to our regular Monday DAS/2 teleconference today, at 9:30 AM. Mainly I'd like to summarize progress during the code sprint and discuss the few remaining spec issues. thanks, Gregg From Gregg_Helt at affymetrix.com Mon Aug 21 12:28:28 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 21 Aug 2006 09:28:28 -0700 Subject: [DAS2] feature search algorithm (was Re: feature locations) Message-ID: 4) Return the entire feature group because in the feature group: a) a location of the root of the group overlaps (30,40) b) there is a feature of type exon c) there is a feature with title "Foo" Pushing this farther: Foo: type=transcript, location=(10, 60) Bip: type=polyA-site location = (50,55) Bar: type=exon, location=(10, 20) Baz: type=exon, location=(50, 60) Search: overlaps(30,40), type=polyA-site, title=Baz Also returns the feature group because: Foo meets root overlaps(30,40) Bip meets type=polyA-site Baz meets title=Baz In trying to work backwards from what I feel multi-filter queries should return, here's the rules that seem to give me what I want: a) For range filters, the feature group passes the filter if the root of the feature group meets the range requirement. b) For non-range filters, the feature group passes the filter if any feature in the feature group meets the filter requirement. c) All filters are AND'd together gregg > -----Original Message----- > From: das2-bounces at lists.open-bio.org [mailto:das2-bounces at lists.open- > bio.org] On Behalf Of Andrew Dalke > Sent: Saturday, August 19, 2006 7:19 AM > To: DAS/2 > Subject: Re: [DAS2] feature search algorithm (was Re: feature locations) > > > Given a database: > > > > Foo: (10, 60) > > Bar: (10, 20) > > Baz: (50, 60) > > I'll modify it a bit more. Make it be > > Foo: type=transcript, location=(10, 60) > Bar: type=exon, location=(10, 20) > Baz: type=exon, location=(50, 60)> > What does the search for > > overlaps(30,40), type==exon, title==Foo > > return and why? I can think of three answers: > > 1) return everything because in the feature group > > there is a feature which overlaps(30,40) > there is a feature which is of type exon > there is a feature with title "Foo" > > (call this the "each query term must match at least > one feature in a feature group" algorithm) > > 2) return nothing because there is no feature > which overlaps(30,40) and has type exon and > has title "Foo" > > (call this the "at least one feature must be > matched by all query terms" algorithm. This is > the current algorithm) > > 3) return nothing because while the root feature > overlaps(30,40) there is no feature which is both > of type exon and with title "Foo". > > (call this the "range searches are special" algorithm.) > > > Now what does the search for > > overlaps(30,40), type==exon, title==Bar > > return and why? Using the same three algorithms: > > 1) return everything because each of the three > criteria are matched by at least one feature in > the feature group > > 2) return nothing because no feature matches all > three criteria. > > 3) return everything because the root feature > overlaps(30,40) and the Bar feature meets the > other two criteria. 
> > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 From lstein at cshl.edu Mon Aug 21 14:50:48 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 21 Aug 2006 14:50:48 -0400 Subject: [DAS2] Fwd: locations and writeback behavior In-Reply-To: References: Message-ID: <6dce9a0b0608211150k465399dfx7147d8f90029a807@mail.gmail.com> ---------- Forwarded message ---------- From: das2-owner at lists.open-bio.org Date: Aug 21, 2006 2:49 PM Subject: Re: [DAS2] locations and writeback behavior To: lincoln.stein at gmail.com You are not allowed to post to this mailing list, and your message has been automatically rejected. If you think that your messages are being rejected in error, contact the mailing list owner at das2-owner at lists.open-bio.org. ---------- Forwarded message ---------- From: "Lincoln Stein" To: "Andrew Dalke" Date: Mon, 21 Aug 2006 14:48:55 -0400 Subject: Re: [DAS2] locations and writeback behavior I think that the server should accept the locations given for features without checking that the children are contained within their parents' coordinates. This is because there are genomic features that are discontinuous. Lincoln On 8/19/06, Andrew Dalke wrote: > > Assume there is the requirement that parent node locations must > cover children locations. What does the server do on writeback > for the following circumstances: > > Case #1. the parent feature has no locations > > Foo: -- no locations -- > Bar: (10, 20) > Baz: (50, 60) > > Here are 6 possibilities: > 1. reject the writeback > 2. accept it and use an implicit location of (10, 60) > (implicit means the record is not modified and clients > downloading feature Foo will not get any locations for it > but the server will act as if it was present.) > 3. accept it and use an implicit location of [(10, 20), (50, 60)] > 4. accept it and insert the location (10, 60) (explict; clients > fetching feature Foo will see the server inserted locations) > 5. accept it and explicitly insert the locations [(10, 20), (50, 60)] > 6. accept it unchanged; range searches will always fail because > the root node has no locations > > Case #2. the parent feature has a location which does not overlap > all of the children > > Foo: (15, 85) > Bar: (10, 20) > Baz: (50, 60) > > > Case #3: the parent has multiple locations; the parent's locations > overlap those of the children > > Foo: (10, 30), (40, 66), (543, 567) > Bar: (10, 20) > Baz: (50, 60) > > > Case #4: the parent has a single location which is broader than > those of the children > > Foo: (10, 567) > Bar: (10, 20) > Baz: (50, 60) > > > Case #5: the children contain multiple locations, the parent > covers them all > Foo: (10, 100) > Bar: (10, 20), (22, 24) > Baz: (50, 60), (70, 80) > > > The server already does some validation for cyclic detection. > It can easily check for ranges as well. 
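The range check mentioned just above (alongside the existing cyclic-detection pass) is small enough to sketch. The following is one possible shape for it, assuming each feature's locations are grouped per segment as (start, end) pairs; the function name and data layout are invented for illustration, and this is only the containment rule being debated in this thread, not something the servers are known to implement.

    # Hypothetical containment check: does the parent cover its children per segment?
    # Mirrors the answers below: no parent location on a populated segment -> reject
    # (Case #1), more than one parent location on a segment -> reject (Case #3),
    # a single broader parent location -> accept (Case #4).
    def covers_children(parent_locs, child_locs):
        """parent_locs / child_locs: dicts mapping segment -> list of (start, end)."""
        for segment, kid_ranges in child_locs.items():
            parent_ranges = parent_locs.get(segment, [])
            if not parent_ranges:
                return False, "parent has no location on %s" % segment
            if len(parent_ranges) > 1:
                return False, "parent has multiple locations on %s" % segment
            pstart, pend = parent_ranges[0]
            kmin = min(start for start, _ in kid_ranges)
            kmax = max(end for _, end in kid_ranges)
            if pstart > kmin or pend < kmax:
                return False, "parent (%d, %d) does not cover children (%d, %d)" % (
                    pstart, pend, kmin, kmax)
        return True, "ok"

    # Case #4: parent (10, 567) is broader than the children -> accept
    print(covers_children({"chr1": [(10, 567)]}, {"chr1": [(10, 20), (50, 60)]}))
    # Case #2: parent (15, 85) does not cover child (10, 20) -> reject
    print(covers_children({"chr1": [(15, 85)]}, {"chr1": [(10, 20), (50, 60)]}))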
As I understand things > the answers should be: > Case #1: reject (parent must cover all children locations) > Case #2: reject (parent must cover all children locations) > Case #3: reject (parent can only have a single location per segment) > Case #4: accept, and use the broader range > Case #5: accept (leaves and leaves only may have multiple > locations on the same segment) > > and the reasons for these answers are: > - it doesn't make sense to have location-less parents when > the children don't have locations > - it makes the search algorithm work correctly > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Mon Aug 21 15:17:14 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 21 Aug 2006 15:17:14 -0400 Subject: [DAS2] feature locations Message-ID: <6dce9a0b0608211217s7b67d889q8f0570a378a5b2e5@mail.gmail.com> From: "Lincoln Stein" To: "Andrew Dalke" Date: Mon, 21 Aug 2006 15:04:09 -0400 Subject: Re: [DAS2] feature locations On 8/18/06, Andrew Dalke wrote: > > [ I hope to hear a response before the end of the sprint today. ] > > For those not in the phone conference call today there were several > issues which didn't get resolve regarding feature locations: > > 1) do we need multiple locations on a feature? (vs 0 or 1 location) > (I argue this is mostly a data modeling issue as I can > decompose anything to a set of features with at most 1 > location.) Yes, because a feature may be discontinuous. This feature won't be used very often, however, and simple servers might simply refuse to handle such features. 2) if a child has a location is its parent required to have > locations which includes the child locations? (currently no) No. Parent/child relationships are defined by functional/biological relationships and not by genomic coordinates. For example, a C. elegans transcript is assembled from discontinuous regions of the genome (the mRNA on one chromosome, the spliced leader on the other), and enforcing restriction (2) would make it impossible to represent nematode genomes, the most populous multicellular organism on earth. 3) if #2, is the parent required to have a single location per > each segment? ie, if there are children on a given segment > then the parent must have a single location on that segment where > start_location <= min(children.start_location) > end_location >= max(children.end_location) N/A 4) how is the feature search done? A feature may have multiple locations. If any of its locations matches the range query, then the feature, plus its parents and children, is returned. There is no "transitive" matching. That is, if the query consists of a feature type plus a range, then IT IS NOT appropriate to return a feature if its child matches the range and the feature itself matches the type. The query should only return a feature if both the feature's type and location matches. 
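Lincoln's rule above is easy to state in code: a feature matches only if its own type and one of its own locations satisfy the query, and each match then comes back together with its parent and children, with no transitive matching. The dict-based feature records in the sketch below are invented for the example and are not the reference server's data model.

    # Sketch of the per-feature rule described above; plain dicts stand in for features.
    def feature_search(features, start, end, wanted_type):
        def hits_range(f):
            return any(s < end and start < e for (s, e) in f["locations"])

        matched = [f for f in features if f["type"] == wanted_type and hits_range(f)]

        # expand each match to its parent and children so whole groups are returned
        by_id = {f["id"]: f for f in features}
        result = {}
        for f in matched:
            result[f["id"]] = f
            if f.get("parent"):
                result[f["parent"]] = by_id[f["parent"]]
            for child in f.get("children", []):
                result[child] = by_id[child]
        return list(result.values())

    db = [
        {"id": "Foo", "type": "transcript", "locations": [(10, 60)], "children": ["Bar", "Baz"]},
        {"id": "Bar", "type": "exon", "locations": [(10, 20)], "parent": "Foo"},
        {"id": "Baz", "type": "exon", "locations": [(50, 60)], "parent": "Foo"},
    ]
    # overlaps(30,40) + type=exon: neither exon overlaps the range, so nothing is returned
    print([f["id"] for f in feature_search(db, 30, 40, "exon")])   # []
    # overlaps(15,25) + type=exon: Bar matches and is returned with its parent Foo
    print([f["id"] for f in feature_search(db, 15, 25, "exon")])   # ['Bar', 'Foo']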
Lincoln Here's what I think is the problem question. > > Feature X is the parent of Y and Z with > Y.location = (10,20) and Z.location = (50, 60) > > What do you get from an overlap(30, 40) search? > > In the way I've been thinking about it, this returns nothing. None > of the features have locations which overlap that range. > > I gather that others want this to return {X,Y,Z} and do so > because X should be assigned the location (10, 60). X cannot > be location-less. > > > I don't know enough DNA to give an example of something for > which a location makes no sense. I think in proteins. Consider > X = "catalytic site" with Y and Z denoting regions essential > to catalysis. > > The section between Y and Z has nothing to do with "catalytic > site". Automatically including that range in X makes no sense. > For that matter, Y and Z may be on different segments. > > Hence I don't like #3. It doesn't make sense for some data types. > (Now it may be that certain data types must work this way. But > that's up to users of features of that type. A database could > enforce those cases but a dumb database shouldn't be required to > know all types.) > > > Without the extra qualification of #3 then here's a dead simple > way to implement #2 - > > parent_locations = { all of its children locations } > > Hence in my test case: > Y has 1 location (10, 20) > Z has 1 location (50, 60) > ---> X has two locations (10, 20) and (50, 60) > > That perfectly agrees with #2. But only because we support > multiple locations. We need multiple locations because > we have features which span multiple segments. Hence the > additional restriction required to make #3. > > If #2 is in place then I'll argue that a client should > only put in the union of the regions because unless it > knows the type it doesn't know if the min/max single > location make sense. > > > Please let me know if I'm on the right track before going > onwards with search. > > Andrew > dalke at dalkescientific.com > > ______________________________ _________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From dalke at dalkescientific.com Mon Aug 21 16:24:46 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 21 Aug 2006 22:24:46 +0200 Subject: [DAS2] xml:base subtleties Message-ID: DAS2 support extensions through non-das2: namespaced XML elements. ... DAS2 supports xml:base ... For both IMAGE and MOVIE the link attribute is expanded via the enclosing xml:base, for clients which know about those elements. When a client gets the record it's supposed to treat any extension elements as character blobs and send them opaquely in the writeback. At least that's my thought. I don't know if you all have a different opinion. I believe this approach gives the best chance for compatibility so a server can serve things the client doesn't know about and still let the client do writeback for the things it does know about. (Another approach is to send diff/edit commands to the server, which we decided against. 
For good reasons.) Suppose there's a change to the above record. What does the client send back? The following is obviously valid but it requires the server keep track of the xml:base attributes at different levels ... change to the 'name' field ... The following should also be valid. It collapses the top-level xml:base into the FEATURE-level. ... change to the 'name' field ... It's even possible to get rid of xml:base for all the fields the client knows about. This will be most likely for servers which convert the XML into a class/object model and flatten the incoming URIs while doing so. ... change to the 'name' field ... The question is, what does the client do with the non-DAS elements where it doesn't know what to do. Should it insert xml:base attributes in them? Or is the best practice to persist the xml:base on a per-feature basis and always include the xml:base in the writeback? My feeling is it's the last paragraph. I really want the clients to treat extensions as mostly opaque things. (They can assume it's okay to remove instructions, comments, etc. and collapse whitespace.) Okay, now the other way around. Suppose the server is configured to treat unknown extensions as blobs. What does it do with the xml:base attributes? In the following the DAS2 URIs are all absolute URIs so the DAS2-specific code never even looks at the xml:base attributes. ... change to the 'name' field ... Normally on writeback the server inserts xml:base attribute into the FEATURES (and FEATURE in this case) document. It can't when the client sent in the above structure. At best it can collapse the xml:base attributes inward so they only apply to the non-DAS elements. That is, turn the above into ... change to the 'name' field ... That's not complicated but it is finicky. To summarize: if we have xml:base and support for blob elements then 1) the client must preserve xml:base on writeback 2) the server must fix up the writeback to make sure it does not conflict with the server's use of xml:base Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Mon Aug 21 17:42:30 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 21 Aug 2006 14:42:30 -0700 Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 21 Aug 2006 Message-ID: Notes from the weekly DAS/2 teleconference, 21 Aug 2006 $Id: das2-teleconf-2006-08-21.txt,v 1.1 2006/08/21 21:00:01 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Gregg Helt CSHL: Lincoln Stein Dalke Scientific: Andrew Dalke UCLA: Allen Day, Brian O'Connor Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Agenda: -------- Summarize progress during last week's code sprint and discuss the few remaining spec issues. Topic: Spec Discussion ---------------------- [The note taker apologizes for attending late (~30min)] gh: could a server in the types doc restrict the types. just say 'transcripts'? ls: yes. 
if not going to allow for searching for feature, only via parent, then types doc should only include parent. gh: types doc specifies which types you can query on. ls: ontology gives you access to all types that might come back ad: and how to depict them. gh: yes, but it can be restrictive of the types. ad: what does client do to display it? gh: implies we separate out style into stylesheet info again. no one is serving or using, so we can change w/o major impl changes. ad: type doc ties a feature to ontology, how to display it, and includes this extra source field. gh: types doc has all types server contains but tags as to what the server allows searching on. ad: feels weird. can't see why i'd want to do in my server. bo: better than limiting the types doc, just have a searchable field. ad: easy gh: if you don't say no, then it's searchable. this is backwards compatible. gh: other thing: for my optimization on client to work, need hint about particular type on a server can have children outside bounds of parent. or need the opposite: that all children are guaranteed to be within bounds. ad: can't see why this is needed. gh: can you trust me on it? ad: no. going back to the case where you ask for introns in this range and you want to return back everything. gh: the reason i need it: if children are outside bounds of parents, and i do query on parent, i never know if I'll get children outside the bounds i specified. messes up my optimization. ad: it will give you the children. gh: i want to optimize so that i don't have to get that back. i want assurance that there won't be something hanging off the region in the query. that there won't be anything outside the range that I queried. ls: that's always the case. you can do the query that somethings are outside the region you requested. you can filter things out. gh: i don't want server to send them. ls: semantically correct to always send the complete object. gh: there are optimizations on client that depend on it. ls: this will give you back more than you want. gh: i don't want to have to it filter out (defeats optimization). ad: range search for id=abc. you'll get all features in feat group id=abc. ad: modify servers gh: not so easy for servers I don't control. ls: you won't be able to convince worm or microbial communities which have features with different locations, some that are in trans on different chromosomes. gh: blat, blast, genscan, etc. majority of algorithmic seq will meet that condition. ls: if you feel comfortable going thru SO and flagging all features that meet that requirement, we can add to SO and you can use it in your optimization. gh: not necessary to modify SO. no blat, blast in SO ls: yes there are: computational matches gh: not all comp matches ls: we do have blast, you can add blat. ad: type ontology has extension area, you can add that. gh: no one will live with that. gh: will try on my server, see how it works. [A] Gregg will try flagging types on server, see if works with client optimizations ls: i have to go. gh: this will change all impls, could be trouble. ad: why does it change server impl at all? gh: where filter range applies only to nodes that meet the type filter. ls: that's the way it's in the spec now. ad: for any filter aday: if you match a range it's root feature that matches range, can reduce overhead by factor of 10-20. ls: aday: won't trigger range query because type doesn't match ls: searching over range you'd pick up exon because it's contained in the range. 
ad: if your server or allen's decides to model all structures by your logic, it won't work. there are occasions where you will have non-overlapping impls in the server.
gh: you're right. to allen: does this affect your server impl?
gh: proposal is to clarify spec to say that range queries apply only to the nodes of a feat group that pass the types filter.
ad: range and non-range filters must both be true for a given feature
gh: ok, as long as we can say in types doc that some types are not filtered.
aday:
gh: if searching for types=exon in range that's in the intron,
gh: exon 1 in feature group, if it's outside range.
aday: this is the way I've impl'd: find things in range, see if they match, then check whether the other filters match. all filters operate on feature granularity, except range, which operates on feature group granularity. all parents are located and encompass min/max bounds of encompassed features.
gh: you get more things passing range query, but they get stopped by name, or type, or id query.
ad: i'm happy with it.
gh: i'm not but will go along.
bo: have to leave now.
[A] andrew will clarify range and type filtering logic in the spec
[A] andrew will introduce concept of feature group (currently in spec as 'complex feat with children')
[A] andrew will add searchable flag to type document
[A] andrew will add optional circularity flag to segments document
gh: Something we need to do: come back to stylesheet issue.
ad: we should have impl in place before making spec work.
[A] Discuss stylesheets when we have an impl in place

Topic: Summarize code sprint work
----------------------------------
Focus on what people did last Friday (last day of sprint).

gh: more complete writeback on client. sync data model with how writeback is working: delete feat group, add back with change. then hit wall where that triggers issues in how to deal with undo/redo in client. I then did a massive chart on wall for how to deal with it, now have a clear path forward.
ad: Here's another issue: xml:base in writeback doc and how it interacts with extensions. server may not know extension is to be supported in writeback doc. e.g., link to image url. if xml:base in writeback doc, then you have to make sure the context of the extension that may have relative urls still preserves xml:base. seems ugly. do we say servers are free to ignore xml:base?
gh: they should preserve it.
ad: so if my writeback doc says features is http://biodas.org. features has a different one, individual features have different ones, and extensions has a different one. my impl would ignore xml:base in the data. (too complex to explain...)
[A] Andrew will describe his xml:base issue with writeback and send email
ad: worked on getting search algorithm to work. came up with counterexamples re: parent element containing/not containing children.
sc: mostly worked on notes and catching up with mailing list. Some todo items:
[A] Steve Verify with Ann about new dm2-based affy das server data.
[A] Steve Finish info page for data hosted by affy das servers.
[A] Steve Update affy das/2 server to test new binary exon data (bp2)
[A] Steve Add id to wiki page for new drosophila assembly (R5)
aday: working on getting block translation server up and running. close. code to automatically set up caching and staling out the blocks. getting binaries set up for on-the-fly analysis servers. primer3, ncbi ePCR, blat, blast binaries on server. now need to install blat/blast dbs, can start serving up analyses.
ee: [not present, but heard from Ed after meeting] - continuing work on gff3 parser for IGB client. [A] Next teleconf in two weeks (4 Sep 2006) gh: we had a successful sprint, hashed out critical decisison in the spec, got a lot of work done. [A] Next code sprint in Healdsburg at Helt Retreat Center. Possible date? Not until end of year or begin of next year (lots of construction in town). From Gregg_Helt at affymetrix.com Mon Aug 28 09:24:39 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 28 Aug 2006 06:24:39 -0700 Subject: [DAS2] DAS/2 teleconference now held biweekly Message-ID: As discussed during the code sprint and last week's teleconference, the DAS/2 teleconference is being rescheduled for once every two weeks rather the every week. So no teleconference this week, and since next Monday is a holiday in the US, no teleconference next week either. The next DAS/2 teleconference will be held Monday, September 11 at 9:30 AM Pacific time. Talk to you then! thanks, Gregg From lstein at cshl.edu Mon Aug 28 10:32:03 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 28 Aug 2006 10:32:03 -0400 Subject: [DAS2] Possibly dialing in late today Message-ID: <6dce9a0b0608280732r3363e362ue967b07a74608b02@mail.gmail.com> Hi All, I have a doctor's appointment just beforehand, so I may be a little late calling in today. Lincoln -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From allenday at ucla.edu Thu Aug 31 22:11:59 2006 From: allenday at ucla.edu (Allen Day) Date: Thu, 31 Aug 2006 19:11:59 -0700 Subject: [DAS2] dynamic das2 features Message-ID: <5c24dcc30608311911q38ac2520k24c166bb33c29e75@mail.gmail.com> I have a prototype that will generate primer3 primers. 
Temporarily up here: http://jugular.ctrl.ucla.edu:3000/feature?type=primer3;seq=ATATCCTAAAAGCATAACTGATGCATCTTTAATCTTGTATGTGACACTACTCATACGAAGGGACTATATCTAGTCAAGACGATACTGTGATAGGTACGTTATTTAATAGGATCTATAACGAAATGTCAAATAATTTTACGGTAATATAACTTATCAGCGGCGTATACTAAAACGGACGTTACGATATTGTCTCACTTCATCTTACCACCCTCTATCTTATTGCTGATAGAACACTAACCCCTCAGCTTTATTTCTAGTTACAGTTACACAAAAAACTATGCCAACCCAGAAATCTTGATATTTTACGTGTCAAAAAATGAGGGTCTCTAAATGAGAGTTTGGTACCATGACTTGTAACTCGCACTGCCCTGATCTGCAATCTTGTTCTTAGAAGTGACGCATATTCTATACGGCCCGACGCGACGCGCCAAAAAATGAAAAACGAAGCAGCGACTCATTTTTATTTAAGGACAAAGGTTGCGAAGCCGCACATTTCCAATTTCATTGTTGTTTATTGGACATACACTGTTAGCTTTATTACCGTCCACGTTTTTTCTACAATAGTGTAGAAGTTTCTTTCTTATGTTCATCGTATTCATAAAATGCTTCACGAACACCGTCATTGATCAAATAGGTCTATAATATTAATATACATTTATATAATCTACGGTATTTATATCATCAAAAAAAAGTAGTTTTTTTATTTTATTTTGTTCGTTAATTTTCAATTTCTATGGAAACCCGTTCGTAAAATTGGCGTTTGTCTCTAGTTTGCGATAGTGTAGATACCGTCCTTGGATAGAGCACTGGAGATGGCTGGCTTTAATCTGCTGGAGTACCATGGAACACCGGTGATCATTCTGGTCACTTGGTCTGGAGCAATACCGGTCAACATGGTGGTGAAGTCACCGTAGTTGAAAACGGCTTCAGCAACTTCGACTGGGTAGGTTTCAGTTGGGTGGGCGGCTTGGAACATGTAGTATTGGGCTAAGTGAGCTCTGATATCAGAGACGTAGACACCCAATTCCACCAAGTTGACTCTTTCGTCAGATTGAGCTAGAGTGGTGGTTGCAGAAGCAGTAGCAGCGATGGCAGCGACACCAGCGGCGATTGAAGTTAATTTGACCATTGTATTTGTTTTGTTTGTTAGTGCTGATATAAGCTTAACAGGAAAGGAAAGAATAAAGACATATTCTCAAAGGCATATAGTTGAAGCAGCTCTATTTATACCCATTCCCTCATGGGTTGTTGCTATTTAAACGATCGCTGACTGGCACCAGTTCCTCATCAAATATTCTCTATATCTCATCTTTCACACAATCTCATTATCTCTATGGAGATGCTCTTGTTTCTGAACGAATCATAAATCTTTCATAGGTTTCGTATGTGGAGTACTGTTTTATGGCGCTTATGTGTATTCGTATGCGCAGAATGTGGGAATGCCAATTATAGGGGTGCCGAGGTGCCTTATAAAACCCTTTTCTGTGCCTGTGACATTTCCTTTTTCGGTCAAAAAGAATATCCGAATTTTAGATTTGGACCCTCGTACAGAAGCTTATTGTCTAAGCCTGAATTCAGTCTGCTTTAAACGGCTTCCGCGGAGGAAATATTTCCATCTCTTGAATTCGTACAACATTAAACGTGTGTTGGGAGTCGTATACTGTTAGGGTCTGTAAACTTGTGAACTCTCGGCAAATGCCTTGGTGCAATTACGTAATTTTAGCCGCTGAGAAGCGGATGGTAATGAGACAAGTTGATATCAAACAGATACATATTTAAAAGAGGGTACCGCTAATTTAGCAGGGCAGTATTATTGTAGTTTGATATGTACGGCTAACTGAACCTAAGTAGGGATATGAGAGTAAGAACGTTCGGCTACTCTTCTTTCTAAGTGGGATTTTTCTTAATCCTTGGATTCTTAAAAGGTTATTAAAGTTCCGCACAAAGAACGCTTGGAAATCGCATTCATCAAAGAACAACTCTTCGTTTTCCAAACAATCTTCCCGAAAAAGTAGCCGTTCATTTCCCTTCCGATTTCATTCCTAGACTGCCAAATTTTTCTTGCTCATTTATAATGATTGATAAGAATTGTATTTGTGTCCCATTCTCGTAGATAAAATTCTTGGATGTTAAAAAATTATTATTTTCTTCATAAAGAAG I have all my ducks in a row to implement primer3 (done), blat, blastn, tblastn, tblastx, genscan, and rePCR. I will do some reworking of the GET params to allow specification of parameters (e.g. required primer size range) using the property filter syntax. This server will require both a type= and overlaps= filter for all requests so that it can do a backend GET on the sequence from the main das server. Gregg, please take a look and let me know if this is roughly suitable for Genoviz. -Allen From aloraine at gmail.com Tue Aug 1 17:02:53 2006 From: aloraine at gmail.com (Ann Loraine) Date: Tue, 1 Aug 2006 12:02:53 -0500 Subject: [DAS2] Fwd: [MOBY-dev] Java Web Services: part 2 In-Reply-To: <44CF83BD.90103@ucalgary.ca> References: <44CF83BD.90103@ucalgary.ca> Message-ID: <83722dde0608011002k4d2f67dfmc65e37e97a2bb851@mail.gmail.com> Greetings, If you are interested in keeping up with BioMoby developments, this may be of interest. Cheers, Ann On 8/1/06, Paul Gordon wrote: > Hi all, > > I have just committed some new code for creating MOBY Java servlets. > It's intended for Extremely Lazy Programmers (such as myself), requiring > that you download just a particular WAR. No CVS, Axis, Ant, etc. > required. 
Some of my coworkers who have never deployed an servlet, or > knew anything about MOBY were able to have a registered, tested service > within 30 minutes! Hopefully this will be of use to some of you too... > > http://biomoby.open-bio.org/CVS_CONTENT/moby-live/Java/docs/deployingServices.html > > Regards, > > Paul > > _______________________________________________ > MOBY-dev mailing list > MOBY-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/moby-dev > -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From gilmanb at pantherinformatics.com Thu Aug 10 15:09:43 2006 From: gilmanb at pantherinformatics.com (Brian Gilman) Date: Thu, 10 Aug 2006 11:09:43 -0400 Subject: [DAS2] DAS/2 Code Sprint, August 14-18 In-Reply-To: References: Message-ID: <44DB4C37.6040704@pantherinformatics.com> Trying to get a features document? Hello Greg et al. I'm desperately trying to get a features document out of one of the DAS 2 servers and have not been able to do it yet. Can someone help me out!? Thanks! -B Helt,Gregg wrote: >Affymetrix is hosting a DAS/2 code sprint on August 14-18, to coincide >with the CSB conference at Stanford. The sprint will be held at Affy's >Santa Clara location, which is about a 20 minute drive from the Stanford >campus. For those attending CSB, the proximity should make it easy to >join in, even if it's just for a morning or afternoon. We can provide >transportation to and from CSB if needed. If you are interested in >attending please email me, and specify whether you'll need a workstation >or will be bringing your own laptop. > >This is a code sprint, so the focus will be on DAS/2 client and server >implementations. As with previous sprints I'd like to start each day >with a teleconference at 9 AM Pacific time. If you can't be there >physically but still want to participate, please join in! > > Gregg > > >_______________________________________________ >DAS2 mailing list >DAS2 at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/das2 > > > > From Gregg_Helt at affymetrix.com Thu Aug 10 16:39:12 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 10 Aug 2006 09:39:12 -0700 Subject: [DAS2] DAS/2 Code Sprint, August 14-18 Message-ID: Apologies, it looks like we're currently having some problem with proxy redirection on the Affy DAS/2 server. Steve, can you check on this? When I request anything but the top level ~/sequence, I'm getting back HTTP error 502 "Bad Gateway" with the message: "The proxy server received an invalid response from an upstream server." However, I just tried the biopackages server and it is working, though response times are slower than usual (unless the response has already been cached). Here's a feature query I recently ran, so it will be returned quickly from the server cache: http://das.biopackages.net/das/genome/human/17/feature?overlaps=chr21/26 027736:26068042;type=SO:mRNA hope that helps, gregg > -----Original Message----- > From: Brian Gilman [mailto:gilmanb at pantherinformatics.com] > Sent: Thursday, August 10, 2006 8:10 AM > To: Helt,Gregg > Cc: DAS/2 > Subject: Re: [DAS2] DAS/2 Code Sprint, August 14-18 > > Trying to get a features document? > > Hello Greg et al. I'm desperately trying to get a features document > out of one of the DAS 2 servers and have not been able to do it yet. Can > someone help me out!? > > Thanks! 
> > -B > > Helt,Gregg wrote: > > >Affymetrix is hosting a DAS/2 code sprint on August 14-18, to coincide > >with the CSB conference at Stanford. The sprint will be held at Affy's > >Santa Clara location, which is about a 20 minute drive from the Stanford > >campus. For those attending CSB, the proximity should make it easy to > >join in, even if it's just for a morning or afternoon. We can provide > >transportation to and from CSB if needed. If you are interested in > >attending please email me, and specify whether you'll need a workstation > >or will be bringing your own laptop. > > > >This is a code sprint, so the focus will be on DAS/2 client and server > >implementations. As with previous sprints I'd like to start each day > >with a teleconference at 9 AM Pacific time. If you can't be there > >physically but still want to participate, please join in! > > > > Gregg > > > > > >_______________________________________________ > >DAS2 mailing list > >DAS2 at lists.open-bio.org > >http://lists.open-bio.org/mailman/listinfo/das2 > > > > > > > > From Steve_Chervitz at affymetrix.com Thu Aug 10 20:14:48 2006 From: Steve_Chervitz at affymetrix.com (Chervitz, Steve) Date: Thu, 10 Aug 2006 13:14:48 -0700 Subject: [DAS2] DAS/2 Code Sprint, August 14-18 In-Reply-To: Message-ID: The netaffxdas das/2 server is back up now. Turned out to be a memory trouble. The server got some whomping queries thrown at it, such as these: M_musculus_Aug_2005/features?overlaps=chr1/0:194923535;type=mrna;format=bps H_sapiens_Mar_2006/features?overlaps=chr20/0:62435964;type=mrna;format=bps Which it could not complete due to out of memory errors. But it could handle this sizeable query even after the above failed: H_sapiens_May_2004/features?overlaps=chr20/0:62435964;type=refseq;format=brs Eventually, Jetty just decided it had enough and shut down it's connection, shouting: WARN!! Stopping Acceptor ServerSocket My fix was to restart the das/2 server giving the java process another 200M of maximal heap. However, both das/1 and das/2 servers can now potentially claim 89% of physical ram on that box, which could become unhealthy. Ed notes that we might want to prevent such big queries in the first place. There is an error code in the das spec for this (HTTP error 413 "Request Entity Too Large"). But how do we determine the what's a reasonable maximum allowable query result? It will depend on the feature density on a particular assembly. This could be a good action item for the code sprint. Steve > From: "Helt,Gregg" > Date: Thu, 10 Aug 2006 09:39:12 -0700 > To: Brian Gilman > Cc: DAS/2 , "Chervitz, Steve" > > Conversation: [DAS2] DAS/2 Code Sprint, August 14-18 > Subject: RE: [DAS2] DAS/2 Code Sprint, August 14-18 > > Apologies, it looks like we're currently having some problem with proxy > redirection on the Affy DAS/2 server. Steve, can you check on this? > When I request anything but the top level ~/sequence, I'm getting back > HTTP error 502 "Bad Gateway" with the message: > "The proxy server received an invalid response from an upstream server." > > However, I just tried the biopackages server and it is working, though > response times are slower than usual (unless the response has already > been cached). 
Here's a feature query I recently ran, so it will be > returned quickly from the server cache: > > http://das.biopackages.net/das/genome/human/17/feature?overlaps=chr21/26 > 027736:26068042;type=SO:mRNA > > hope that helps, > gregg > >> -----Original Message----- >> From: Brian Gilman [mailto:gilmanb at pantherinformatics.com] >> Sent: Thursday, August 10, 2006 8:10 AM >> To: Helt,Gregg >> Cc: DAS/2 >> Subject: Re: [DAS2] DAS/2 Code Sprint, August 14-18 >> >> Trying to get a features document? >> >> Hello Greg et al. I'm desperately trying to get a features > document >> out of one of the DAS 2 servers and have not been able to do it yet. > Can >> someone help me out!? >> >> Thanks! >> >> -B >> >> Helt,Gregg wrote: >> >>> Affymetrix is hosting a DAS/2 code sprint on August 14-18, to > coincide >>> with the CSB conference at Stanford. The sprint will be held at > Affy's >>> Santa Clara location, which is about a 20 minute drive from the > Stanford >>> campus. For those attending CSB, the proximity should make it easy > to >>> join in, even if it's just for a morning or afternoon. We can > provide >>> transportation to and from CSB if needed. If you are interested in >>> attending please email me, and specify whether you'll need a > workstation >>> or will be bringing your own laptop. >>> >>> This is a code sprint, so the focus will be on DAS/2 client and > server >>> implementations. As with previous sprints I'd like to start each day >>> with a teleconference at 9 AM Pacific time. If you can't be there >>> physically but still want to participate, please join in! >>> >>> Gregg >>> >>> >>> _______________________________________________ >>> DAS2 mailing list >>> DAS2 at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/das2 >>> >>> >>> >>> > From gilmanb at pantherinformatics.com Thu Aug 10 20:18:20 2006 From: gilmanb at pantherinformatics.com (Brian Gilman) Date: Thu, 10 Aug 2006 16:18:20 -0400 Subject: [DAS2] DAS/2 Code Sprint, August 14-18 In-Reply-To: References: Message-ID: Eh hem, sorry...I was playing around.... -B -- Brian Gilman President Panther Informatics Inc. E-Mail: gilmanb at pantherinformatics.com gilmanb at jforge.net AIM: gilmanb1 01000010 01101001 01101111 01001001 01101110 01100110 01101111 01110010 01101101 01100001 01110100 01101001 01100011 01101001 01100001 01101110 On Aug 10, 2006, at 4:14 PM, Steve Chervitz wrote: > The netaffxdas das/2 server is back up now. Turned out to be a memory > trouble. The server got some whomping queries thrown at it, such as > these: > > M_musculus_Aug_2005/features? > overlaps=chr1/0:194923535;type=mrna;format=bps > > H_sapiens_Mar_2006/features? > overlaps=chr20/0:62435964;type=mrna;format=bps > > Which it could not complete due to out of memory errors. But it > could handle > this sizeable query even after the above failed: > > H_sapiens_May_2004/features? > overlaps=chr20/0:62435964;type=refseq;format=brs > > Eventually, Jetty just decided it had enough and shut down it's > connection, > shouting: WARN!! Stopping Acceptor ServerSocket > > My fix was to restart the das/2 server giving the java process > another 200M > of maximal heap. However, both das/1 and das/2 servers can now > potentially > claim 89% of physical ram on that box, which could become unhealthy. > > Ed notes that we might want to prevent such big queries in the > first place. > There is an error code in the das spec for this (HTTP error 413 > "Request > Entity Too Large"). 
But how do we determine the what's a reasonable > maximum > allowable query result? It will depend on the feature density on a > particular assembly. This could be a good action item for the code > sprint. > > Steve > > > >> From: "Helt,Gregg" >> Date: Thu, 10 Aug 2006 09:39:12 -0700 >> To: Brian Gilman >> Cc: DAS/2 , "Chervitz, Steve" >> >> Conversation: [DAS2] DAS/2 Code Sprint, August 14-18 >> Subject: RE: [DAS2] DAS/2 Code Sprint, August 14-18 >> >> Apologies, it looks like we're currently having some problem with >> proxy >> redirection on the Affy DAS/2 server. Steve, can you check on this? >> When I request anything but the top level ~/sequence, I'm getting >> back >> HTTP error 502 "Bad Gateway" with the message: >> "The proxy server received an invalid response from an upstream >> server." >> >> However, I just tried the biopackages server and it is working, >> though >> response times are slower than usual (unless the response has already >> been cached). Here's a feature query I recently ran, so it will be >> returned quickly from the server cache: >> >> http://das.biopackages.net/das/genome/human/17/feature? >> overlaps=chr21/26 >> 027736:26068042;type=SO:mRNA >> >> hope that helps, >> gregg >> >>> -----Original Message----- >>> From: Brian Gilman [mailto:gilmanb at pantherinformatics.com] >>> Sent: Thursday, August 10, 2006 8:10 AM >>> To: Helt,Gregg >>> Cc: DAS/2 >>> Subject: Re: [DAS2] DAS/2 Code Sprint, August 14-18 >>> >>> Trying to get a features document? >>> >>> Hello Greg et al. I'm desperately trying to get a features >> document >>> out of one of the DAS 2 servers and have not been able to do it yet. >> Can >>> someone help me out!? >>> >>> Thanks! >>> >>> -B >>> >>> Helt,Gregg wrote: >>> >>>> Affymetrix is hosting a DAS/2 code sprint on August 14-18, to >> coincide >>>> with the CSB conference at Stanford. The sprint will be held at >> Affy's >>>> Santa Clara location, which is about a 20 minute drive from the >> Stanford >>>> campus. For those attending CSB, the proximity should make it easy >> to >>>> join in, even if it's just for a morning or afternoon. We can >> provide >>>> transportation to and from CSB if needed. If you are interested in >>>> attending please email me, and specify whether you'll need a >> workstation >>>> or will be bringing your own laptop. >>>> >>>> This is a code sprint, so the focus will be on DAS/2 client and >> server >>>> implementations. As with previous sprints I'd like to start >>>> each day >>>> with a teleconference at 9 AM Pacific time. If you can't be there >>>> physically but still want to participate, please join in! >>>> >>>> Gregg >>>> >>>> >>>> _______________________________________________ >>>> DAS2 mailing list >>>> DAS2 at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/das2 >>>> >>>> >>>> >>>> >> > > From Gregg_Helt at affymetrix.com Thu Aug 10 22:25:24 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 10 Aug 2006 15:25:24 -0700 Subject: [DAS2] DAS/2 Code Sprint, August 14-18 Message-ID: Hmm... those queries really shouldn't stretch memory requirements too much -- the mrna objects are already in memory, so for the most part any extra memory is taken up by the output streaming through the server. Steve, can you send me the log file for the server when it was hitting these out-of-memory errors? 
Thanks, Gregg > -----Original Message----- > From: Chervitz, Steve > Sent: Thursday, August 10, 2006 1:15 PM > To: Helt,Gregg; Brian Gilman > Cc: DAS/2 > Subject: Re: [DAS2] DAS/2 Code Sprint, August 14-18 > > The netaffxdas das/2 server is back up now. Turned out to be a memory > trouble. The server got some whomping queries thrown at it, such as these: > > M_musculus_Aug_2005/features?overlaps=chr1/0:194923535;type=mrna;format= bp > s > > H_sapiens_Mar_2006/features?overlaps=chr20/0:62435964;type=mrna;format=b ps > > Which it could not complete due to out of memory errors. But it could > handle > this sizeable query even after the above failed: > > H_sapiens_May_2004/features?overlaps=chr20/0:62435964;type=refseq;format =b > rs > > Eventually, Jetty just decided it had enough and shut down it's connection, > shouting: WARN!! Stopping Acceptor ServerSocket > > My fix was to restart the das/2 server giving the java process another > 200M > of maximal heap. However, both das/1 and das/2 servers can now potentially > claim 89% of physical ram on that box, which could become unhealthy. > > Ed notes that we might want to prevent such big queries in the first place. > There is an error code in the das spec for this (HTTP error 413 "Request > Entity Too Large"). But how do we determine the what's a reasonable > maximum > allowable query result? It will depend on the feature density on a > particular assembly. This could be a good action item for the code sprint. > > Steve > > > > > From: "Helt,Gregg" > > Date: Thu, 10 Aug 2006 09:39:12 -0700 > > To: Brian Gilman > > Cc: DAS/2 , "Chervitz, Steve" > > > > Conversation: [DAS2] DAS/2 Code Sprint, August 14-18 > > Subject: RE: [DAS2] DAS/2 Code Sprint, August 14-18 > > > > Apologies, it looks like we're currently having some problem with proxy > > redirection on the Affy DAS/2 server. Steve, can you check on this? > > When I request anything but the top level ~/sequence, I'm getting back > > HTTP error 502 "Bad Gateway" with the message: > > "The proxy server received an invalid response from an upstream server." > > > > However, I just tried the biopackages server and it is working, though > > response times are slower than usual (unless the response has already > > been cached). Here's a feature query I recently ran, so it will be > > returned quickly from the server cache: > > > > http://das.biopackages.net/das/genome/human/17/feature?overlaps=chr21/26 > > 027736:26068042;type=SO:mRNA > > > > hope that helps, > > gregg > > > >> -----Original Message----- > >> From: Brian Gilman [mailto:gilmanb at pantherinformatics.com] > >> Sent: Thursday, August 10, 2006 8:10 AM > >> To: Helt,Gregg > >> Cc: DAS/2 > >> Subject: Re: [DAS2] DAS/2 Code Sprint, August 14-18 > >> > >> Trying to get a features document? > >> > >> Hello Greg et al. I'm desperately trying to get a features > > document > >> out of one of the DAS 2 servers and have not been able to do it yet. > > Can > >> someone help me out!? > >> > >> Thanks! > >> > >> -B > >> > >> Helt,Gregg wrote: > >> > >>> Affymetrix is hosting a DAS/2 code sprint on August 14-18, to > > coincide > >>> with the CSB conference at Stanford. The sprint will be held at > > Affy's > >>> Santa Clara location, which is about a 20 minute drive from the > > Stanford > >>> campus. For those attending CSB, the proximity should make it easy > > to > >>> join in, even if it's just for a morning or afternoon. We can > > provide > >>> transportation to and from CSB if needed. 
If you are interested in > >>> attending please email me, and specify whether you'll need a > > workstation > >>> or will be bringing your own laptop. > >>> > >>> This is a code sprint, so the focus will be on DAS/2 client and > > server > >>> implementations. As with previous sprints I'd like to start each day > >>> with a teleconference at 9 AM Pacific time. If you can't be there > >>> physically but still want to participate, please join in! > >>> > >>> Gregg > >>> > >>> > >>> _______________________________________________ > >>> DAS2 mailing list > >>> DAS2 at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/das2 > >>> > >>> > >>> > >>> > > From aloraine at gmail.com Mon Aug 14 06:13:10 2006 From: aloraine at gmail.com (Ann Loraine) Date: Sun, 13 Aug 2006 23:13:10 -0700 Subject: [DAS2] DAS/2 Code Sprint, August 14-18 In-Reply-To: <6dce9a0b0608132227o12924d90ud8a8cca329b30fb@mail.gmail.com> References: <83722dde0607240911w4d50b9cfo43adff514f6df39c@mail.gmail.com> <6dce9a0b0608132227o12924d90ud8a8cca329b30fb@mail.gmail.com> Message-ID: <83722dde0608132313g4ec0cdf5p990284ad00b0d17@mail.gmail.com> Hi, Last I heard, it's starting Monday (the 14th), beginning with a conference call at 9 am. Directions: http://www.affymetrix.com/site/contact/directions.jsp?loc=sc Best, Ann On 8/13/06, Lincoln Stein wrote: > Hi, > > Is the code sprint starting on the 13th or the 14th? I am here in Palo Alto > and have Monday morning free. > > Can I get driving directions from the Affy web site? > > Lincoln > > On 7/24/06, Ann Loraine wrote: > > Hi Gregg, > > > > I would like to suggest shifting the code spring by a day and have it > > start Monday August 13. > > > > That way it won't perfectly overlap the conference and those us who > > need to be at the conference full-time (such as myself) will be able > > to visit the code spring. > > > > Cheers, > > > > Ann > > > > On 7/24/06, Helt,Gregg wrote: > > > Affymetrix is hosting a DAS/2 code sprint on August 14-18, to coincide > > > with the CSB conference at Stanford. The sprint will be held at Affy's > > > Santa Clara location, which is about a 20 minute drive from the Stanford > > > campus. For those attending CSB, the proximity should make it easy to > > > join in, even if it's just for a morning or afternoon. We can provide > > > transportation to and from CSB if needed. If you are interested in > > > attending please email me, and specify whether you'll need a workstation > > > or will be bringing your own laptop. > > > > > > This is a code sprint, so the focus will be on DAS/2 client and server > > > implementations. As with previous sprints I'd like to start each day > > > with a teleconference at 9 AM Pacific time. If you can't be there > > > physically but still want to participate, please join in! > > > > > > Gregg > > > > > > > > > _______________________________________________ > > > DAS2 mailing list > > > DAS2 at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/das2 > > > > > > > > > -- > > Ann Loraine > > Assistant Professor > > Section on Statistical Genetics > > University of Alabama at Birmingham > > http://www.ssg.uab.edu > > http://www.transvar.org > > _______________________________________________ > > DAS2 mailing list > > DAS2 at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/das2 > > > > > > -- > Lincoln D. 
Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From allenday at ucla.edu Mon Aug 14 07:14:49 2006 From: allenday at ucla.edu (Allen Day) Date: Mon, 14 Aug 2006 00:14:49 -0700 Subject: [DAS2] DAS/2 Code Sprint, August 14-18 In-Reply-To: References: Message-ID: <5c24dcc30608140014u3d9dd1b5w9e487e142d1ca077@mail.gmail.com> Ah, I may implement this. Let's discuss tomorrow morning. Is there an agenda set? Is anyone teleconferencing in? -Allen On 8/10/06, Chervitz, Steve wrote: > > The netaffxdas das/2 server is back up now. Turned out to be a memory > trouble. The server got some whomping queries thrown at it, such as these: > > > M_musculus_Aug_2005/features?overlaps=chr1/0:194923535;type=mrna;format=bps > > H_sapiens_Mar_2006/features?overlaps=chr20/0:62435964;type=mrna;format=bps > > Which it could not complete due to out of memory errors. But it could > handle > this sizeable query even after the above failed: > > > H_sapiens_May_2004/features?overlaps=chr20/0:62435964;type=refseq;format=brs > > Eventually, Jetty just decided it had enough and shut down it's > connection, > shouting: WARN!! Stopping Acceptor ServerSocket > > My fix was to restart the das/2 server giving the java process another > 200M > of maximal heap. However, both das/1 and das/2 servers can now potentially > claim 89% of physical ram on that box, which could become unhealthy. > > Ed notes that we might want to prevent such big queries in the first > place. > There is an error code in the das spec for this (HTTP error 413 "Request > Entity Too Large"). But how do we determine the what's a reasonable > maximum > allowable query result? It will depend on the feature density on a > particular assembly. This could be a good action item for the code sprint. > > Steve > > > > > From: "Helt,Gregg" > > Date: Thu, 10 Aug 2006 09:39:12 -0700 > > To: Brian Gilman > > Cc: DAS/2 , "Chervitz, Steve" > > > > Conversation: [DAS2] DAS/2 Code Sprint, August 14-18 > > Subject: RE: [DAS2] DAS/2 Code Sprint, August 14-18 > > > > Apologies, it looks like we're currently having some problem with proxy > > redirection on the Affy DAS/2 server. Steve, can you check on this? > > When I request anything but the top level ~/sequence, I'm getting back > > HTTP error 502 "Bad Gateway" with the message: > > "The proxy server received an invalid response from an upstream server." > > > > However, I just tried the biopackages server and it is working, though > > response times are slower than usual (unless the response has already > > been cached). Here's a feature query I recently ran, so it will be > > returned quickly from the server cache: > > > > http://das.biopackages.net/das/genome/human/17/feature?overlaps=chr21/26 > > 027736:26068042;type=SO:mRNA > > > > hope that helps, > > gregg > > > >> -----Original Message----- > >> From: Brian Gilman [mailto:gilmanb at pantherinformatics.com] > >> Sent: Thursday, August 10, 2006 8:10 AM > >> To: Helt,Gregg > >> Cc: DAS/2 > >> Subject: Re: [DAS2] DAS/2 Code Sprint, August 14-18 > >> > >> Trying to get a features document? > >> > >> Hello Greg et al. 
I'm desperately trying to get a features > > document > >> out of one of the DAS 2 servers and have not been able to do it yet. > > Can > >> someone help me out!? > >> > >> Thanks! > >> > >> -B > >> > >> Helt,Gregg wrote: > >> > >>> Affymetrix is hosting a DAS/2 code sprint on August 14-18, to > > coincide > >>> with the CSB conference at Stanford. The sprint will be held at > > Affy's > >>> Santa Clara location, which is about a 20 minute drive from the > > Stanford > >>> campus. For those attending CSB, the proximity should make it easy > > to > >>> join in, even if it's just for a morning or afternoon. We can > > provide > >>> transportation to and from CSB if needed. If you are interested in > >>> attending please email me, and specify whether you'll need a > > workstation > >>> or will be bringing your own laptop. > >>> > >>> This is a code sprint, so the focus will be on DAS/2 client and > > server > >>> implementations. As with previous sprints I'd like to start each day > >>> with a teleconference at 9 AM Pacific time. If you can't be there > >>> physically but still want to participate, please join in! > >>> > >>> Gregg > >>> > >>> > >>> _______________________________________________ > >>> DAS2 mailing list > >>> DAS2 at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/das2 > >>> > >>> > >>> > >>> > > > > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > From steve_chervitz at affymetrix.com Mon Aug 14 08:56:31 2006 From: steve_chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 14 Aug 2006 01:56:31 -0700 (PDT) Subject: [DAS2] DAS/2 Code Sprint, August 14-18 In-Reply-To: <5c24dcc30608140014u3d9dd1b5w9e487e142d1ca077@mail.gmail.com> References: <5c24dcc30608140014u3d9dd1b5w9e487e142d1ca077@mail.gmail.com> Message-ID: On Mon, 14 Aug 2006, Allen Day wrote: > Ah, I may implement this. Let's discuss tomorrow morning. Is there an > agenda set? Is anyone teleconferencing in? I haven't seen a specific agenda, but something like this seems reasonable: * status reports, including what you want to focus on for the sprint * establish a prioitized list of goals and deliverables for the current sprint Teleconferencing will start at 9AM PST on the usual number: TEL=800-531-3250 (US) or 303-928-2693 (Int'l) ID=2879055 PIN=1365 Steve > > On 8/10/06, Chervitz, Steve wrote: >> >> The netaffxdas das/2 server is back up now. Turned out to be a memory >> trouble. The server got some whomping queries thrown at it, such as these: >> >> >> M_musculus_Aug_2005/features?overlaps=chr1/0:194923535;type=mrna;format=bps >> >> H_sapiens_Mar_2006/features?overlaps=chr20/0:62435964;type=mrna;format=bps >> >> Which it could not complete due to out of memory errors. But it could >> handle >> this sizeable query even after the above failed: >> >> >> H_sapiens_May_2004/features?overlaps=chr20/0:62435964;type=refseq;format=brs >> >> Eventually, Jetty just decided it had enough and shut down it's >> connection, >> shouting: WARN!! Stopping Acceptor ServerSocket >> >> My fix was to restart the das/2 server giving the java process another >> 200M >> of maximal heap. However, both das/1 and das/2 servers can now potentially >> claim 89% of physical ram on that box, which could become unhealthy. >> >> Ed notes that we might want to prevent such big queries in the first >> place. >> There is an error code in the das spec for this (HTTP error 413 "Request >> Entity Too Large"). 
But how do we determine the what's a reasonable >> maximum >> allowable query result? It will depend on the feature density on a >> particular assembly. This could be a good action item for the code sprint. >> >> Steve >> >> >> >> > From: "Helt,Gregg" >> > Date: Thu, 10 Aug 2006 09:39:12 -0700 >> > To: Brian Gilman >> > Cc: DAS/2 , "Chervitz, Steve" >> > >> > Conversation: [DAS2] DAS/2 Code Sprint, August 14-18 >> > Subject: RE: [DAS2] DAS/2 Code Sprint, August 14-18 >> > >> > Apologies, it looks like we're currently having some problem with proxy >> > redirection on the Affy DAS/2 server. Steve, can you check on this? >> > When I request anything but the top level ~/sequence, I'm getting back >> > HTTP error 502 "Bad Gateway" with the message: >> > "The proxy server received an invalid response from an upstream server." >> > >> > However, I just tried the biopackages server and it is working, though >> > response times are slower than usual (unless the response has already >> > been cached). Here's a feature query I recently ran, so it will be >> > returned quickly from the server cache: >> > >> > http://das.biopackages.net/das/genome/human/17/feature?overlaps=chr21/26 >> > 027736:26068042;type=SO:mRNA >> > >> > hope that helps, >> > gregg >> > >> >> -----Original Message----- >> >> From: Brian Gilman [mailto:gilmanb at pantherinformatics.com] >> >> Sent: Thursday, August 10, 2006 8:10 AM >> >> To: Helt,Gregg >> >> Cc: DAS/2 >> >> Subject: Re: [DAS2] DAS/2 Code Sprint, August 14-18 >> >> >> >> Trying to get a features document? >> >> >> >> Hello Greg et al. I'm desperately trying to get a features >> > document >> >> out of one of the DAS 2 servers and have not been able to do it yet. >> > Can >> >> someone help me out!? >> >> >> >> Thanks! >> >> >> >> -B >> >> >> >> Helt,Gregg wrote: >> >> >> >>> Affymetrix is hosting a DAS/2 code sprint on August 14-18, to >> > coincide >> >>> with the CSB conference at Stanford. The sprint will be held at >> > Affy's >> >>> Santa Clara location, which is about a 20 minute drive from the >> > Stanford >> >>> campus. For those attending CSB, the proximity should make it easy >> > to >> >>> join in, even if it's just for a morning or afternoon. We can >> > provide >> >>> transportation to and from CSB if needed. If you are interested in >> >>> attending please email me, and specify whether you'll need a >> > workstation >> >>> or will be bringing your own laptop. >> >>> >> >>> This is a code sprint, so the focus will be on DAS/2 client and >> > server >> >>> implementations. As with previous sprints I'd like to start each day >> >>> with a teleconference at 9 AM Pacific time. If you can't be there >> >>> physically but still want to participate, please join in! >> >>> >> >>> Gregg >> >>> >> >>> >> >>> _______________________________________________ >> >>> DAS2 mailing list >> >>> DAS2 at lists.open-bio.org >> >>> http://lists.open-bio.org/mailman/listinfo/das2 >> >>> >> >>> >> >>> >> >>> >> > >> >> >> _______________________________________________ >> DAS2 mailing list >> DAS2 at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/das2 >> > From Gregg_Helt at affymetrix.com Mon Aug 14 12:39:07 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 14 Aug 2006 05:39:07 -0700 Subject: [DAS2] DAS/2 Code Sprint, August 14-18 Message-ID: Apologies for not posting the details sooner! 
DAS/2 Code Sprint, August 14 (Monday) through August 18 (Friday) Conference Call, 9 AM PST every morning 800-531-3250 Conference ID: 2879055 Passcode: 1365 We're in the Computer Training room at Affymetrix Santa Clara, Building 3450 Directions to Affymetrix Building 3450 (3450 Kifer Road, Santa Clara, CA): http://www.affymetrix.com/site/contact/directions.jsp?loc=sccentral This is about a 20 minute drive from the Stanford campus. If there is no receptionist at 3420, you may need to check in at the reception area in Building 3420. Please call me on my cell phone if there are any problems finding the room: 510-205-9652 See you all soon! Gregg -----Original Message----- From: Lincoln Stein [mailto:lincoln.stein at gmail.com] Sent: Sunday, August 13, 2006 10:28 PM To: Ann Loraine Cc: Helt,Gregg; DAS/2 Subject: Re: [DAS2] DAS/2 Code Sprint, August 14-18 Hi, Is the code sprint starting on the 13th or the 14th? I am here in Palo Alto and have Monday morning free. Can I get driving directions from the Affy web site? Lincoln On 7/24/06, Ann Loraine wrote: Hi Gregg, I would like to suggest shifting the code spring by a day and have it start Monday August 13. That way it won't perfectly overlap the conference and those us who need to be at the conference full-time (such as myself) will be able to visit the code spring. Cheers, Ann On 7/24/06, Helt,Gregg wrote: > Affymetrix is hosting a DAS/2 code sprint on August 14-18, to coincide > with the CSB conference at Stanford. The sprint will be held at Affy's > Santa Clara location, which is about a 20 minute drive from the Stanford > campus. For those attending CSB, the proximity should make it easy to > join in, even if it's just for a morning or afternoon. We can provide > transportation to and from CSB if needed. If you are interested in > attending please email me, and specify whether you'll need a workstation > or will be bringing your own laptop. > > This is a code sprint, so the focus will be on DAS/2 client and server > implementations. As with previous sprints I'd like to start each day > with a teleconference at 9 AM Pacific time. If you can't be there > physically but still want to participate, please join in! > > Gregg > > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org _______________________________________________ DAS2 mailing list DAS2 at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/das2 -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From Gregg_Helt at affymetrix.com Mon Aug 14 12:48:15 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 14 Aug 2006 05:48:15 -0700 Subject: [DAS2] DAS/2 Code Sprint Details, August 14-18 Message-ID: Apologies for not posting the details sooner! 
DAS/2 Code Sprint, August 14 (Monday) through August 18 (Friday) Conference Call, 9 AM PST every morning US: 800-531-3250, International: 303-928-2693 Conference ID: 2879055 Passcode: 1365 We're in the Computer Training room at Affymetrix Santa Clara, Building 3450 Directions to Affymetrix Building 3450 (3450 Kifer Road, Santa Clara, CA): http://www.affymetrix.com/site/contact/directions.jsp?loc=sccentral This is about a 20 minute drive from the Stanford campus. If there is no receptionist at 3420, you may need to check in at the reception area in Building 3420. Please call me on my cell phone if there are any problems finding the room: 510-205-9652 See you all soon! Gregg From dalke at dalkescientific.com Mon Aug 14 16:20:03 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 14 Aug 2006 18:20:03 +0200 Subject: [DAS2] Fwd: DAS/2 code sprint next week! Message-ID: <4b1a3bf29da8f435273d2b25013d15cf@dalkescientific.com> Begin forwarded message: > From: Andrew Dalke > Date: August 14, 2006 6:00:30 PM GMT+02:00 > To: "Helt,Gregg" > Subject: Re: DAS/2 code sprint next week! > >> We?re hosting another DAS/2 code sprint next week at Affy Santa >> Clara, to coincide with the CSB meeting at Stanford.? Will you be >> able to join in?? If not in person, then we?re having a daily 9 AM >> PST conference call you could join. > > I'll be there. It starts in a few minutes. I'm in a cybercafe in > Cape Town. > >> ?I?m wondering what the status of the DAS/2 writeback spec is. > > It's unchanged. I'll be working on that over the sprint. > > I've spent most of the last, month+ catching up on the latest in > web development systems for Python, and learning various libraries. > Including giving a 2 week course on it. As my test case I've > been working on a DAS2 server. I have a reference server nearly > finished and a few things came up during it: > > - I'm iffy about the current SEGMENTS document. It lists > "title" and "reference" for each segment but not for the list of > segments as a whole. Does it make sense allowing those to be > specified if they have reasonable names? (I know they don't always.) > > It's part of that separation between the sources document, which > describes these, and the segments document. > > - did we specify that the sequence is in upper-case, lower-case, > etc.? > > - I would like some experience with an agp or other assembly format. > I'm concerned about how a client can piece together segment names > from that document with the URIs we're using. It seems to me that > most places use a local name ("yeast_1" or somesuch) which is not > exposed via the web. If the assembly document, fasta file, etc. > use the local name and not the URL then it's hard to tie them together. > > - There are two different segment titles I've come across. One > is the name you want to see in a pull-down menu, etc. while the > other is the text you want in the FASTA header . These could be the > same but I don't think they are always the same. 
> Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Mon Aug 14 18:30:35 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 14 Aug 2006 11:30:35 -0700 Subject: [DAS2] Notes from DAS/2 code sprint #3, day one, 14 Aug 2006 Message-ID: Notes from DAS/2 code sprint #3, day one, 14 Aug 2006 $Id: das2-teleconf-2006-08-14.txt,v 1.2 2006/08/14 18:28:47 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt CSHL: Lincoln Stein Dalke Scientific: Andrew Dalke Panther Informatics: Brian Gilman UAB: Ann Loraine UCLA: Allen Day, Brian O'Connor Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Agenda: * Status reports, including what you want/need to focus on for this sprint, progress from last sprint. Status Reports --------------- gh: have done writeback work. IGB can create curation, post to biopackages writeback server, das/2 client can see curations. no editing yet. client can edit own data models, can't post those edits. to work on ID mapping stuff: client can't accept newly created ids from server. currently just holds onto temporary id's. IGB client has had one or more releases since the last sprint. priorities - mainly writeback for client. ls: continue working on perl client interface to das/2, not functional at present. need to back out changes since last sprint. das/2 tracks in gbrowse. About 10hrs needed. sc: have been working on keeping data on Affymetrix public das servers up to date, dealing with memory issues caused by increasing amount of array data to support. Gregg has new efficient format for modeling exon array features with lower memory requirements. Will work on getting the das server to use it. Long-term plan is to remove our das/1 server and just have das/2, easier to use and maintain. Complete transition will take time though. Have continued working to automate the pipeline for updating the affy das servers. Have a new page that lists available data on the servers, currently manually created but plan to automate. ad: web dev in python, taught course on that. plan: getting python server up, to experiment with writeback. updating spec as per a couple of months ago. gh: andrew will make spec a top priority, grant is funding for that. bg: tasked to take das/2 data and produce set of objects to use within caCORE system at NCI. Have objects for das/2 data and service. can retrieve das/2 data from affy server. present in simple web page. Using java and ruby. gh: good week to ask questions as you flesh out the impl. ee: gregg and I will put out new IGB release this week. can work on style sheets (left over from last time). Or can build a gff3 parser into IGB (lots of excitement!). al: two things: demo applications for self and collaborators and das newbies. retrieve genomic locations for targets of affy probe sets and then retrieve promoter regions upstream. gh: promoter data in das2 server? al: can just say 500bp upstream of gene. not identifying control. Just retrieve seq to pipe into control analysis.
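A minimal Python sketch of the "500 bp upstream" promoter window described above; the 0-based, half-open coordinate convention and the example gene are illustrative assumptions, not anything defined by DAS/2:

    def promoter_window(chrom, start, end, strand, flank=500):
        """Return the interval immediately upstream of a gene.

        Coordinates are assumed 0-based, half-open. For a '+' strand gene
        the window ends at the gene start; for a '-' strand gene it begins
        at the gene end.
        """
        if strand == '+':
            return chrom, max(0, start - flank), start
        if strand == '-':
            return chrom, end, end + flank
        raise ValueError("strand must be '+' or '-'")

    # Hypothetical gene on chr2L at 10,000-12,000, minus strand:
    print(promoter_window('chr2L', 10000, 12000, '-'))   # ('chr2L', 12000, 12500)

The sequence for the returned interval could then be fetched from a DAS server and piped into the control analysis.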
Second one: meta analysis, results from diff groups for associated phenotypes. Input: list of markers, output: annotations associated with these. Statistical analysis. Ultimately obtain candidate genes associated with markers. Some preliminary work on obesity that looks promising. [A] Steve will help Ann convert fly probe set ids into genome locations. Goal is to write something that can do random sampling of gene annotations. ideal world: das server gets region, returns gene ids and go ids. Less ideal: just get genes within the peaks (from association studies). bo: doing rpm packaging for the mac (tgen). so people can set up das2 server on a mac. update rpm packages with results of work this week. clean up bug queue on biopackages server impl, bringing it up to spec. can talk about analysis part of server. internal hirax client for retrieval of assay data. communication with server is out of sync. Spec issues: ------------ gh: want to focus on writeback. wants full xml features rather than mapping document. aday: work on writes as well as deletes. Implement 413 "Request Entity Too Large", adding this for requests that exceed some size threshold (10kb, 100kb); if at or below, OK. gh: need to coord with me on writeback, I focus on client writeback, you on server. Editing is ok. Deletes are harder. Other Issues: ------------- gh: Contact peter good about funding. Extending from 2yr to 3yr. talk with lincoln and suzi about plans for next grant. sc: status of bugzilla open bugs on spec? [A] Someone should go through and update bugzilla list for spec. bg: version field. gh: not too understandable. at last sprint, two freezes, the version tells which v of spec freeze the server is using. assumption is that now the servers are using the most recent spec. If they're not compliant, please let us know. affy server: won't give back a list of all features. requires an overlaps and types restrictor. biopackages: should be good with latest spec. bg: sources document, source tag has version. if you do a query like types, also has version? No. ad: sources document: worm 161 (data source). capabilities describe things like writeback support for v161, but not v160. bg: that version seems to have different semantics given query. biggest issue was parsing and populating my object model. gh: coordinate subelement in version elem. has a version attr. my client does not deal with coord stuff. meant to make sure that annots from two servers are referring to same coords, so you can overlay annots from different servers. my client is using version URIs for that instead. bg: other issue: in order to know what server you're hitting, you have to know name space of doc, which has base URI. XML base in segments query. xmlns biodas.org/das2. to have traceability in documents you receive, you as implementer must track urls, converting relative to absolute. can be a problem when hitting 5 different servers. gh: my obj model (client) has model of server with root url of the das server, sources objects which has xml base of each source. bg: you could get back a 404 from xml:base. Perfectly appropriate. server could put whatever it wants in xml:base. currently it's the document. ad: we're using the xml:base spec, so you can put xml:base on any node you want to. construct full url by resolving against the enclosing xml:base values. gh: in our schema is it clear which attribs are resolved by xml:base? ad: no.
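A minimal sketch of the xml:base resolution just discussed, using Python's ElementTree; the file name and document URL are hypothetical. Bases in scope are collected from the element up to the root and applied outermost-first, with the document's retrieval URL as the final fallback:

    from urllib.parse import urljoin
    import xml.etree.ElementTree as ET

    XML_BASE = '{http://www.w3.org/XML/1998/namespace}base'

    def resolve_uri(element, relative_uri, document_url, parent_map):
        """Make relative_uri absolute using the xml:base attributes in
        scope for element, falling back to the document's own URL."""
        bases = []
        node = element
        while node is not None:              # walk up toward the root
            if XML_BASE in node.attrib:
                bases.append(node.attrib[XML_BASE])
            node = parent_map.get(node)
        base = document_url
        for b in reversed(bases):            # apply outermost xml:base first
            base = urljoin(base, b)
        return urljoin(base, relative_uri)

    # ElementTree keeps no parent pointers, so build a child->parent map first:
    root = ET.parse('segments.xml').getroot()   # hypothetical saved SEGMENTS response
    parent_map = {c: p for p in root.iter() for c in p}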
bg: would like to see one big document with every element, not several different files. relaxNG isn't best format. would like a w3c XSD that defines the elements. from coders standpoint, don't have to go and look at 5 different docs. Have to have multiple windows up, figure out how they are connected to each other. semantics within each query, who is calling what. ad: I gave brian one. using trang to spit it out. bg: trang is not best xml schema writer. I could work on this. why do you use relaxNG? ad: I can read it and understand it. there were good examples. bg: I can autogenerate code that is in XSD, soap and other web services stuff does that for you. Can generate a parser, point it at a uri, get doc, generate a parser and object model. ad: parser would break if server returns extra attributes. In spec there are some extension points. can put any element that is in a separate namespace. I know how to do that in relaxNG, but not in XSD. bg: you just have to add another xmlns. define an extension point with that namespace. ad: should be able to resolve it into one. bg: Three items. 1. will ask w3c people about XSD to relaxNG. 2. semantics confusion. 3. xml:base appropriate to supply a 404 if client was dependent on that attribute. ad: version tag is problem if there are duplicates. should be changed so there are no duplicates. can build parser on rng. bg: it's experimental, alpha s'ware. don't want to use for production. bg: when you put a relative url inside a xml:base. ad: resolvable via http, or in absolute url. gh: if you resolve it up to the top level doc, then use the url of the document itself. whether clients actually do this, depends on impl. say to implementers, we could state that the top level document should resolve to absolute url. we wanted to say, "Das/2 uses xml:base spec. period." bg: put this in the spec, how you want it to be used. ad: don't like saying, "we use xml:base with these additional things" bg: can put off for now. ls: In my library when I see a url and can't resolve, I fall back to a hard coded url. From dalke at dalkescientific.com Mon Aug 14 19:29:54 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 14 Aug 2006 21:29:54 +0200 Subject: [DAS2] duplicate use of VERSION Message-ID: <3f8e5c60e918cd90eabe6597403a5448@dalkescientific.com> Brian G. pointed out that "VERSION" is used twice in the spec, with different meanings. I thought we used it twice as an element but that's not the case. It's used once as the "versioned source" element and another time as an attribute in the COORDINATES element # This is the version of the build (if a genomic sequence). # However, protein databases don't do versions this way attribute version { text }?, In looking around I don't see duplicate uses of any tag for elements with different meanings. Brian? Is this the one you were talking about? In thinking about it though, I've found it awkward to talk about "versioned source". First off, the Mac's Mail.app gives squiggles under the "versioned" indicating a misspelling. Second, it's hard to say and annoying to write "versioned_source" in my code and in the documentation. I would like to use "release" instead. That is, change das2:VERSION to das2:RELEASE. That's a shorter word, closer to the intended meaning, and generally nicer. Eg, "there are many data sources and each source may have multiple releases." That's a simple change but it's highly non-backwards compatible.
Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Aug 14 19:46:23 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 14 Aug 2006 21:46:23 +0200 Subject: [DAS2] duplicate use of VERSION In-Reply-To: <3f8e5c60e918cd90eabe6597403a5448@dalkescientific.com> References: <3f8e5c60e918cd90eabe6597403a5448@dalkescientific.com> Message-ID: <7e92910f143448a82fae138d41a7e195@dalkescientific.com> > In looking around I don't see duplicate uses of any tag for > elements with different meanings. I should have added... Even though they are not duplicate element tags, they should not have the same name as it causes confusion. For example, someone seeing "version" may think it is the name/uri/url for a VERSION element when it is absolutely not. > I would like to use "release" instead. That is, change > das2:VERSION to das2:RELEASE. Still would like it. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Aug 14 21:38:27 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 14 Aug 2006 23:38:27 +0200 Subject: [DAS2] mapping document Message-ID: Been thinking about the response to a writeback. The spec said the server responds with a mapping document saying "uploaded id X is now Y". As per discussion this will now return a features document. Each feature element may contain a new attribute "was" if its URI changed. This happens for one of two reasons: - the client created the feature using the private naming scheme - the server supports versioning and each feature version gets its own identifier Perhaps also "the server's ornery and jest feels like it." I had written the spec so a server could optionally implement type writeback. With this change that is not possible. It's possible to have a new return document which combines features and types (which is very similar to the current writeback spec). However, type writeback was not considered a high priority and none of the servers under development will support such thing. (Correct?) If needed we have extension mechanisms by which that can be supported in the future. questions: - I wrote above that the new attribute is named "was", as in The word "was" is wrong. Otherwise the new version should be "is", and not "uri". Other options are "previously", "old_uri", "prev_uri", "previous_uri", "uri_was" I can't find old discussion on this. Anyone one not like "old_uri" and have a better name? - anyone want type writeback in this version of the spec? if not i'll remove all traces of it from the spec. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Aug 14 22:27:49 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Aug 2006 00:27:49 +0200 Subject: [DAS2] relative URLs and xml:base in the writeback document Message-ID: On the topic of relative URLs ... The writeback document contains FEATURE elements. Because we aren't supporting types I want to change the writeback document so it looks like this Reason for the change ... ... Problem #1: if I lift the existing FEATURE element definition then the uri attributes may contain relative URIs and the FEATURE element may contain an xml:base attribute. We can also have that "WRITEBACK" contains an xml:base attribute. What happens if after all of that the writeback URI is still a relative URL? How does the server convert the relative URL into an absolute one? Does it use the writeback URL as the document base? That's the only one which comes close to making sense, but it doesn't make much sense. 
No client in its right mind will deconvolute the feature uris to be relative urls with respect to the writeback URL (which, after all, may be on an entirely different machine). I checked the xml:base spec http://www.w3.org/TR/xmlbase/ and it refers to the URI RFC 2396 http://www.ietf.org/rfc/rfc2396.txt These are both defined in terms of document retrieval. Eg, > If no base URI is embedded, the base URI of a document is > defined by the document's retrieval context. This makes no sense in a POST document. I think in this case it's fine to say "URIs in a writeback spec must be absolute URLs". Either they are written as absolute URLs or they are made absolute in the context of some xml:base defined in the writeback delta. What say you all? A. all URIs in writeback must be absolute - don't support xml:base at all B. URIs may be relative but must be absolute once all enclosing xml:base attributes are included C. URIs may be relative and the writeback URL itself is used as the retrieval context My vote is that the server implements B but that clients will all do A. Speaking of which, digging through the xml:base spec and the history of our discussion I see that we are free to define when xml:base is valid. We could use it only on the root element if we so desire. Right now it can be on any element. The reason we have it on every element is from the influence of this blog post: http://norman.walsh.name/2005/04/01/xinclude > Ugh. In the short term, I think there's only one answer: update your > schemas to allow xml:base either (a) everywhere or (b) everywhere you > want XInclude to be allowed. I urge you to put it everywhere as your > users are likely to want to do things you never imagined. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Aug 14 22:32:43 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Aug 2006 00:32:43 +0200 Subject: [DAS2] element identity Message-ID: <63f85e217e45faf272d769ba9f2fd135@dalkescientific.com> again, working on the writeback spec. The writeback spec will look like Reason for the change ... ... The response document will look like this ... ... This FEATURE element is very similar but different than the normal FEATURE element in that it has a new "old_uri" attribute. Does anyone see that as a problem? I don't, but it breaks the guideline we talked about earlier where two XML elements with the same tag must refer to the same thing. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Aug 14 23:54:39 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Aug 2006 01:54:39 +0200 Subject: [DAS2] updated writeback spec Message-ID: <1182e48978effe1454d7350cb9634283@dalkescientific.com> I've updated the writeback spec. Here's the log message > Respond with a modified features document instead of a mapping > document. > > Removed references to type writeback. > > Writeback URIs must be fully resolvable in the document. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Tue Aug 15 13:46:14 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Aug 2006 15:46:14 +0200 Subject: [DAS2] Fwd: can't view XML from DAS2 server in IE4 or Safari Message-ID: <1f5d9f7aa32f2cdd91e60a3867e58037@dalkescientific.com> Oops! Hit "reply-to" instead of "reply-all". 
Begin forwarded message: > From: Andrew Dalke > Date: August 15, 2006 5:26:03 AM GMT+02:00 > To: "Ann Loraine" > Subject: Re: can't view XML from DAS2 server in IE4 or Safari > >> I'm trying to view the XML delivered from the DAS2 server in Firefox >> or IE4 without having to save it and then load it. >> >> I think this is something to do with the fact that the XML is >> delivered as type application versus XML plain text, which is what the >> DAS1 servers seem to do. > > Yes. It's a 4 year old bug in Mozilla. > https://bugzilla.mozilla.org/show_bug.cgi?id=155730 > > >> Is there a way I can tell Firefox to render the XML directly without >> my having to save it first? > > We've run into this before. I want a way to make this be less > of a problem. > > I propose that if "text/xml" is in the Accept header then the > server should return the das2xml document but with a "text/xml" > content-type. > > I tested that out on my copy of Firefox and it was a happy camper. > It showed the XML tree, though it did complain about the lack > of a stylesheet. Okay, perhaps it was more feeling okay than happy.. > > Of course another possibility is to see the "text/html" there > and show something more presentable to humans, but that makes things > worse for those like Ann who want to see the XML structure. Andrew dalke at dalkescientific.com From boconnor at ucla.edu Tue Aug 15 07:07:03 2006 From: boconnor at ucla.edu (Brian O'Connor) Date: Tue, 15 Aug 2006 00:07:03 -0700 Subject: [DAS2] updated writeback spec In-Reply-To: <1182e48978effe1454d7350cb9634283@dalkescientific.com> References: <1182e48978effe1454d7350cb9634283@dalkescientific.com> Message-ID: <44E17297.7090901@ucla.edu> Hi Andrew, During the last code sprint I used the DAS/2 validation tool you wrote to help debug the das.biopackages.net server. It was very helpful!! Has it been updated to the current spec (v 1.33 2006/04/27 on the website). What is the URL? Thanks --Brian Andrew Dalke wrote: > I've updated the writeback spec. Here's the log message > > >>Respond with a modified features document instead of a mapping >>document. >> >>Removed references to type writeback. >> >>Writeback URIs must be fully resolvable in the document. > > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 From dalke at dalkescientific.com Tue Aug 15 15:16:35 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Aug 2006 17:16:35 +0200 Subject: [DAS2] xlm:base -- fer it or agin' it? Message-ID: <1c606eb000e651a6741eb1b09d30da06@dalkescientific.com> I see three reasonable options (or rather, logically defensible) related to xml:base in DAS2 documents. 1) don't us it at all 2) only have it in the root element of the document 3) have it anywhere in the document (this is the old programming dictum of "the only limits should be 0, 1 and infinity") Pros and cons: #1 is the least confusing. Given relative URL, use the document's url to make it absolute, etc. as per URI spec. #2 This is similar to the restrictions in the BASE element in the HTML header. (Which I've only used once.) It's used most often in saved documents so relative URLs work without needing to rewrite the rest of the document. Take your DOM, and stick the URL in the root node if "xml:base" is not present, otherwise do root.attrib["xml:base"] <-- urljoin(document_url, root.attrib["xml:base"]) #3 This is the most complicated. 
The main use case mentioned was support for xinclude, which is not something anyone here has said they need. For all I know it may be useful XSLT and other languages. I don't know the XML toolchain well enough. Here is another use case. Consider a registration / aggregation service. It could work by fully parsing everything from each client and making absolute URIs for everything. Or it could do ... That is, it reads the sources document and pulls the SOURCE elements out of the XML. It sticks in the right xml:base (perhaps with a set of joins from the parent elements in the document) and serves the result. No need to parse further. Here's another. Consider a meta-feature server which sucked in primary records from multiple other servers (with permission). It might provide better search capabilities, better ranking, whatever. The features are unchanged. The server wants to return the results as it got them from the original server. Without xml:base it needs to convert all relative URLs into absolute ones ... ... ... ... which requires the server know about all field which are URLs. This precludes support for any extensions which include URL fields because the meta-server won't know about them. OTOH, with xml:base ... ... ... ... and any embedded extensions work w/o problems. Hence I'm fer numb'r 3. Andrew dalke at dalkescientific.com From aloraine at gmail.com Tue Aug 15 15:08:43 2006 From: aloraine at gmail.com (Ann Loraine) Date: Tue, 15 Aug 2006 08:08:43 -0700 Subject: [DAS2] Fwd: can't view XML from DAS2 server in IE4 or Safari In-Reply-To: <1f5d9f7aa32f2cdd91e60a3867e58037@dalkescientific.com> References: <1f5d9f7aa32f2cdd91e60a3867e58037@dalkescientific.com> Message-ID: <83722dde0608150808m15cf3b15g894009ab0ac5fde@mail.gmail.com> Hi Andrew, This sounds great to me! Being able to use my Web browser to show people DAS XML after typing in a URL (teaching) and also to see it myself as I familiarize myself with the URL-building conventions (coding) is a huge plus. It really gets the point across in an accesible and dramatic way. A lot of us started to "get" programming after having friends or colleagues show us the HTML coding underlying Web pages using the "view source" function of Netscape Navigator. I think being able to see the XML beautifully rendered in a browser can have the same sort of function for a lot of people and will help them understand the concept of structured data, the meaning of machine-readable, and good stuff like that. Cheers, Ann On 8/15/06, Andrew Dalke wrote: > Oops! Hit "reply-to" instead of "reply-all". > > Begin forwarded message: > > > From: Andrew Dalke > > Date: August 15, 2006 5:26:03 AM GMT+02:00 > > To: "Ann Loraine" > > Subject: Re: can't view XML from DAS2 server in IE4 or Safari > > > >> I'm trying to view the XML delivered from the DAS2 server in Firefox > >> or IE4 without having to save it and then load it. > >> > >> I think this is something to do with the fact that the XML is > >> delivered as type application versus XML plain text, which is what the > >> DAS1 servers seem to do. > > > > Yes. It's a 4 year old bug in Mozilla. > > https://bugzilla.mozilla.org/show_bug.cgi?id=155730 > > > > > >> Is there a way I can tell Firefox to render the XML directly without > >> my having to save it first? > > > > We've run into this before. I want a way to make this be less > > of a problem. > > > > I propose that if "text/xml" is in the Accept header then the > > server should return the das2xml document but with a "text/xml" > > content-type. 
> > > > I tested that out on my copy of Firefox and it was a happy camper. > > It showed the XML tree, though it did complain about the lack > > of a stylesheet. Okay, perhaps it was more feeling okay than happy.. > > > > Of course another possibility is to see the "text/html" there > > and show something more presentable to humans, but that makes things > > worse for those like Ann who want to see the XML structure. > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From dalke at dalkescientific.com Tue Aug 15 16:49:21 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Aug 2006 18:49:21 +0200 Subject: [DAS2] global reference identifiers Message-ID: <61a3a74f23f83cf89a05055e0bc7e0a7@dalkescientific.com> http://open-bio.org/wiki/DAS:GlobalSeqIDs Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Tue Aug 15 18:04:58 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 15 Aug 2006 20:04:58 +0200 Subject: [DAS2] global reference identifiers In-Reply-To: <61a3a74f23f83cf89a05055e0bc7e0a7@dalkescientific.com> References: <61a3a74f23f83cf89a05055e0bc7e0a7@dalkescientific.com> Message-ID: <409ce859211622e5781c58db5b014da9@dalkescientific.com> > D.melanogaster, C.elegans, and C.briggsae are here, but no > S.cerevisiae, > R.norvegicus, M.musculus, or H.sapiens. > -Allen It's a wiki - feel free to add new ones! :) Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Tue Aug 15 18:57:19 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Tue, 15 Aug 2006 11:57:19 -0700 Subject: [DAS2] Fwd: can't view XML from DAS2 server in IE4 or Safari In-Reply-To: <83722dde0608150808m15cf3b15g894009ab0ac5fde@mail.gmail.com> Message-ID: I agree it is very useful to view XML documents, even though das xml is intended for applications. For whatever reason, some humans (myself included) seem to have a fascination with XML and like to view it, so it makes sense to provide for this. As for viewing das2xml data directly by clicking on das2 server links in Firefox, I have no problem. When you first click on a link returning das2xml formatted data (mime type=application/x-das-*+xml), Firefox should provide a dialog box asking what you want to do with it. Click "open with" and select Firefox itself. Do this for each of the types of das documents and you'll be set. Btw, there are a bunch of different das2xml links available here for testing: http://netaffxdas.affymetrix.com/das2/ If you have already specified that Firefox should save the das2xml data to disk, you should be able to change your preference by going to Preferences -> Downloads -> View & Edit Actions... (this is on OS X with Firefox 1.5.0.6. I don't see any entries for application/x-das* entries in mine; not sure why not, but it's working now, so I don't worry). According to the following article, Firefox will use its default xml handler for any mime type matching application/*+xml (see 'Types of XML' on this page): http://www-128.ibm.com/developerworks/xml/library/x-ffox2/index.html While we're on the subject, there's another recent article in this series on manipulating XML with javascript in Firefox. 
Might be interesting to try some of these ideas with das2xml data: http://www-128.ibm.com/developerworks/library/x-ffox3/ Steve > From: Ann Loraine > Date: Tue, 15 Aug 2006 08:08:43 -0700 > To: Andrew Dalke > Cc: DAS/2 > Subject: Re: [DAS2] Fwd: can't view XML from DAS2 server in IE4 or Safari > > Hi Andrew, > > This sounds great to me! > > Being able to use my Web browser to show people DAS XML after typing > in a URL (teaching) and also to see it myself as I familiarize myself with > the URL-building conventions (coding) is a huge plus. It really gets > the point across in an accesible and dramatic way. > > A lot of us started to "get" programming after having friends or > colleagues show us the HTML coding underlying Web pages using the > "view source" function of Netscape Navigator. I think being able to > see the XML beautifully rendered in a browser can have the same sort > of function for a lot of people and will help them understand the > concept of structured data, the meaning of machine-readable, and good > stuff like that. > > Cheers, > > Ann > > On 8/15/06, Andrew Dalke wrote: >> Oops! Hit "reply-to" instead of "reply-all". >> >> Begin forwarded message: >> >>> From: Andrew Dalke >>> Date: August 15, 2006 5:26:03 AM GMT+02:00 >>> To: "Ann Loraine" >>> Subject: Re: can't view XML from DAS2 server in IE4 or Safari >>> >>>> I'm trying to view the XML delivered from the DAS2 server in Firefox >>>> or IE4 without having to save it and then load it. >>>> >>>> I think this is something to do with the fact that the XML is >>>> delivered as type application versus XML plain text, which is what the >>>> DAS1 servers seem to do. >>> >>> Yes. It's a 4 year old bug in Mozilla. >>> https://bugzilla.mozilla.org/show_bug.cgi?id=155730 >>> >>> >>>> Is there a way I can tell Firefox to render the XML directly without >>>> my having to save it first? >>> >>> We've run into this before. I want a way to make this be less >>> of a problem. >>> >>> I propose that if "text/xml" is in the Accept header then the >>> server should return the das2xml document but with a "text/xml" >>> content-type. >>> >>> I tested that out on my copy of Firefox and it was a happy camper. >>> It showed the XML tree, though it did complain about the lack >>> of a stylesheet. Okay, perhaps it was more feeling okay than happy.. >>> >>> Of course another possibility is to see the "text/html" there >>> and show something more presentable to humans, but that makes things >>> worse for those like Ann who want to see the XML structure. 
>> >> Andrew >> dalke at dalkescientific.com >> >> _______________________________________________ >> DAS2 mailing list >> DAS2 at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/das2 >> > > > -- > Ann Loraine > Assistant Professor > Section on Statistical Genetics > University of Alabama at Birmingham > http://www.ssg.uab.edu > http://www.transvar.org > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 From Steve_Chervitz at affymetrix.com Tue Aug 15 19:11:33 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Tue, 15 Aug 2006 12:11:33 -0700 Subject: [DAS2] Notes from DAS/2 code sprint #3, day two, 15 Aug 2006 Message-ID: Notes from DAS/2 code sprint #3, day two, 15 Aug 2006 $Id: das2-teleconf-2006-08-15.txt,v 1.1 2006/08/15 19:10:02 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt CSHL: Lincoln Stein, Scott Cain Dalke Scientific: Andrew Dalke UCLA: Allen Day, Brian O'Connor Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Topic: Spec updates ------------------- ad: made changes to the writeback spec. nothing serious, stuff we talked about. removed possibility of writeback for types, updated docs. returns back a features document. feature element contains old_uri to refer to previous uri if it changed. Not for a response document. gh: can we freeze it at this point? Like the idea of reusing the feature xml. Hoping to call it frozen for rest of code sprint. ad: do we allow relative urls inside the writeback doc. relative to what? gh: xml:base applies ad: if url is still relative once you get to the top of the document, what happens? gh: free to throw an error ad: so 'application defined'. seems ok. gh: can uri's be local when curation is created on client, you're making up your own id. fully resolvable. ad: it is das_private uri, not a relative uri, no resolvability requirements. aday: order of operations issue with insertion and deletions for features with same id. do a delete-insert or insert-delete? does delete get processed before insert? ad: all deletes go first. aday: are all features required to be processed from top to bottom as well? ad: doesn't specify. aday: natural ordering in the document for feature processing. on creation of a new feature. if it has a das_private feature that is declared in the doc which hasn't been seen before. will cause problems. ad: pref aday: require features to be declared in order so that everything declared below refers to things declared above. ad: not possible for new features. aday: where is type writeback going to go? ad: not to be supported. could use a separate document. gh: fine with not dealing with types now. let's get feature writeback going first. aday: would like to make it extensible. to see how you could create a types writeback. gh: separate document. 
aday: so writeback for types is a element enclosed in a writeback element. gh: any other issues with writeback spec yesterday? many conversations here after the teleconf. the order of operations thing, and the need to freeze ASAP. ad: b gilman's use of VERSION in two diff places. see my email from yesterday. I proposed using 'release' than 'versioned_source'. too late now to change the versioned element. gh: change name of att Topic: Versioned source -> release --------------------------------- See andrew's email from yesterday. aday: has a working server. will send out url out today, after incorporating latest developments. returns a mapping document. gh: will clean up curation stuff today. figure out how to swap ids out. this is an igb internal release. Topic: Microdeltas ------------------ ad: microdeltas: take the delta of the document we have now, break it up into lots of parts. no big two-hour curation, but server tracks changes as they occur. this way you can track reasons for each change. gh: so curator should push 'save to server' button each time they make an edit. this is up to client to impose this. you have a comment element in the writeback. ad: there is a distinction between changes that computer made vs. human comment - reason why they did a whole set of changes. not sure the reason the resolution. gh: microdeltas might be getting a little more complicated for what we're trying to do. Topic: Coordinates in read spec -------------------------------- gh: questions regarding read. Is allen serving up coordinate stuff? aday: segment coordinate uri? gh: the thing we're supposed to be using to decide whether annotations from two servers are on the same coord system. if uri's for two different versioned_sources match, assume they're the same coord system. lincoln set up names for genomes. gh: haven't implemented part on client that makes use of it. currently using a hard-wired way. ad: on open-bio.org site. wiki. gh: writable nature of server is supposed to be in capabilities section. OK you've got in right place. my bad. gh: locking, not worked on. aday: exclusive lock on table to be modified. other clients wanting to write cannot get it. so it's under the hood, no special reponse. ad: how do we indicate a server supports writeback? I wanted an extension tag, not attribute. haven't looked at recently. gh: can't remember. can a versioned_source have... If a versioned source is writable, can any data on that be editable? yes. ad: why does it make a difference. gh: concerned whether there are certain types of annots that should not be writable, level of distinction (granularity). either you can edit any annotations on that versioned source, or none of them. gh: eg. blast results vs human-made curations. can't edit blast results. ad: I don't thing a single bit flag is good enough. gh: per type? ad: not sure. gh: ok as is. you can have multiple servers, some holding mutable data some holding immutable data. ad: I support writing for some people, some time. user is in charge of figuring out which types on which servers can be changed. gh: client has to be smart -- ie., try to edit then undo it then tell user they can edit. or allow user to edit stuff and find out at commit time if editing is ok (possibly not). ad: ideally would like a way to figure out from server what you can and cannot do on a given versioned source. gh: let's not get into that now. that is the simplest way to go w/r/t to the spec. 
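A rough client-side sketch of the writable-or-not check being discussed. How a versioned source advertises writeback is still being settled, so the CAPABILITY element with type="writeback" below is an assumption; matching is done on local names so the sketch does not commit to an exact namespace URI:

    import xml.etree.ElementTree as ET

    def supports_writeback(versioned_source_xml):
        """True if the versioned source element advertises a writeback
        capability (assumed markup: a CAPABILITY child with type='writeback')."""
        root = ET.fromstring(versioned_source_xml)
        for elem in root.iter():
            if elem.tag.split('}')[-1].upper() == 'CAPABILITY':
                if elem.attrib.get('type', '').lower() == 'writeback':
                    return True
        return False

Under the all-or-none model above, a client such as IGB could enable or disable its curation tools for the whole versioned source based on this one check, rather than discovering at commit time that an edit cannot be saved.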
Topic: Viewing das2xml responses in web browser ----------------------------------------------- See Ann Loraine's email on list about trouble of looking at das2 responses via IE4 and Safari. ee: needs text/xml in order to see it in browsers. ad: viewing xml documents is an extension of das, which was intended for computer communication. aday: some problems with javascript/AJAX making it unusable. must have content-type as text/xml. ad: javascript talking to server can specify what format it wants it back. there's a firefox bug in the '+xml' specification. gh: we are telling it xml, it's aday: there are real clients out there that cannot deal with the advance http headers we are using. ad: format= in query parameter gh: format=xml then content-type in header should be text/xml? ad: not in the spec now. you specify das2xml and get back application/.... bo: could have proxy code that sits in between client and server and convers to text/xml ad: default for web browsers. server could decide to support ajax by allowing format=json. aday: gh: need to say that servers have the option to provide content-type=text/xml if format=xml. we are compliant to content-header spec, some ajax implementations don't handle it properly. ad: if client makes request and string text/xml appears in the accepts header, then server should be free to give back regular das2xml response document but as content-type text/xml? by 'free', meaning not required. gh: some libraries are not compliant with http header content type spec. if servers supports that, then they can return different content types. ad: what is recommendation for this case? aday: for firefox and javascript clients. sc: I have had no trouble with firefox on os x. I can try to troubleshoot Ann's set up. Topic: Dasypus online validation tool -------------------------------------- bo: dasypus validation tool is it up to date? ad: server is down since it hasn't been used for a while. should be up to date. [A] andrew will bring dasypus online validator online. Status Reports --------------- bo: bugfixes on das.biopackages.net server. gh: write back curations, id resolution on client side, igb release today. aday: update/edit/delete, changing response type today ad: relaxNG, getting dasypus server back up, my own das server. ee: getting igb release out today. gff3 parser. sc: working with gregg's new Bprobe1Parser to create new versions of exon array data files, more memory efficient. Will send to gregg for testing. Also updating list of available data on the affy das servers. From allenday at ucla.edu Wed Aug 16 01:15:07 2006 From: allenday at ucla.edu (Allen Day) Date: Tue, 15 Aug 2006 18:15:07 -0700 Subject: [DAS2] xml:base and XML::DOM::XML_Base Message-ID: <5c24dcc30608151815t17a13144t54ff11407b94f397@mail.gmail.com> Lincoln, I needed an xml:base resolution module for my writeback code, and there wasn't one available for any of the lightweight XML libs on CPAN, so I wrote an XML::DOM extension. Feel free to use it if you have not already finished your implementation, it should be on CPAN within the next day or so, I just uploaded it. -Allen From dalke at dalkescientific.com Wed Aug 16 01:29:40 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 16 Aug 2006 03:29:40 +0200 Subject: [DAS2] Fwd: can't view XML from DAS2 server in IE4 or Safari In-Reply-To: References: Message-ID: Steve: > As for viewing das2xml data directly by clicking on das2 server links > in > Firefox, I have no problem. 
When you first click on a link returning > das2xml > formatted data (mime type=application/x-das-*+xml), Firefox should > provide a > dialog box asking what you want to do with it. Click "open with" and > select > Firefox itself. I never would have thought of that. A-ha. It works but it works by downloading the file, saving it to a temp.xml file then doing the equivalent of "Firefox tmp.xml", which opens a new window on a Mac. It doesn't open it in the current window as I would like. And the temp file persists in my download directory. I experimented with content negotiation, where the client may send an Accept header to the server with the desired content types. My server supports "text/plain" (fasta), "text/xml", and "application/x-das2segments+xml". Examples below. I did this because I want the documentation to say "If the format parameter is not specified in the query string then the server may use HTTP content negotiation to determine the most appropriate representation. If multiple representations match then the das2xml version should be returned, if allowed." and leave it at that. This includes the "if 'text/xml' exists in the Accept field ..." solution we talked about earlier. In the "Annotated Guide to the DAS spec", include what that means and why it's done. The Apache content negotiation strategy is at http://httpd.apache.org/docs/1.3/content-negotiation.html Using that scheme, the following describes possible variants for a DAS service URI:

features
  format: das2xml   Content-type: application/x-das2features+xml; qs=1.0
  format: xml       Content-type: text/xml; qs=0.95
  format: fasta     Content-type: text/plain; qs=0.95

where "qs" means "quality of source". Apache ranks solutions so "q*qs" is largest. Firefox sends

ACCEPT: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5

This orders the results as: xml = 0.95*0.9 -> 0.855, fasta = 0.95*0.8 -> 0.76, das2xml = 1.0*0.5 -> 0.5, while Python's url fetcher does not send an Accept and curl sends "*/*". Both of these would cause "das2xml" to be returned over other formats. In other words, the "send 'text/xml' if the client asks for it else send 'application/x-das2*+xml'" is an acceptable way to do conneg.
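A minimal sketch of the q*qs ranking just described; the variant table mirrors the features formats listed above, and the Accept parsing is deliberately simplified (only the q parameter is honored, no error handling):

    # (content type, qs) for the features query, as above
    VARIANTS = [
        ('application/x-das2features+xml', 1.0),   # format: das2xml
        ('text/xml',                       0.95),  # format: xml
        ('text/plain',                     0.95),  # format: fasta
    ]

    def parse_accept(header):
        """Return (media_range, q) pairs from an Accept header."""
        ranges = []
        for part in header.split(','):
            fields = part.strip().split(';')
            q = 1.0
            for param in fields[1:]:
                param = param.strip()
                if param.startswith('q='):
                    q = float(param[2:])
            ranges.append((fields[0].strip(), q))
        return ranges

    def q_for(media_type, accept_ranges):
        """q the client gives media_type; '*/*' and 'type/*' match too."""
        maintype = media_type.split('/')[0]
        return max([q for rng, q in accept_ranges
                    if rng in (media_type, maintype + '/*', '*/*')] + [0.0])

    def choose_variant(accept_header):
        accept = parse_accept(accept_header)
        score, ctype = max((q_for(ct, accept) * qs, ct) for ct, qs in VARIANTS)
        return ctype if score > 0 else None

    firefox = ('text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,'
               'text/plain;q=0.8,image/png,*/*;q=0.5')
    print(choose_variant(firefox))   # -> 'text/xml'
    print(choose_variant('*/*'))     # -> 'application/x-das2features+xml'

With curl's "*/*" every variant matches at q=1.0, so das2xml wins on its higher qs, which is the behaviour described above.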
Here's what my test reference server does under different conditions.

## ask for text/plain, which returns FASTA
% curl -H "Accept: text/plain" -i http://localhost:8080/seq/fly_v1
HTTP/1.1 200 OK
Date: Wed, 16 Aug 2006 00:25:40 GMT
Server: CherryPy/2.2.1
Content-Length: 48
Content-Type: text/plain
Connection: close

>Chr1
ABCDEFG
>Chr2
abcdefgh
>Chr3
987654321

## ask for text/xml, which returns the normal XML as "text/xml"
% curl -H "Accept: text/xml" -i http://localhost:8080/seq/fly_v1
HTTP/1.1 200 OK
Date: Wed, 16 Aug 2006 00:26:02 GMT
Server: CherryPy/2.2.1
Content-Length: 435
Content-Type: text/xml
Connection: close

## ask for anything under the "application" namespace, with a needless quality factor
% curl -H "Accept: application/*;q=0.5" -i http://localhost:8080/seq/fly_v1
HTTP/1.1 200 OK
Date: Wed, 16 Aug 2006 00:28:13 GMT
Server: CherryPy/2.2.1
Content-Length: 435
Content-Type: application/x-das2segments+xml
Connection: close

## give an image if it's there, text/plain is next best, then an application
% curl -H "Accept: image/*, application/*;q=0.5, text/plain;q=0.9" -i http://localhost:8080/seq/fly_v1
HTTP/1.1 200 OK
Date: Wed, 16 Aug 2006 00:34:15 GMT
Server: CherryPy/2.2.1
Content-Length: 48
Content-Type: text/plain
Connection: close

>Chr1
ABCDEFG
>Chr2
abcdefgh
>Chr3
987654321

In my case the server has multiple text/plain outputs but FASTA always wins over raw. I can force any format with the "format=" option, which ignores the "Accept" header completely.

% curl -H "Accept: text/xml" -i 'http://localhost:8080/seq/fly_v1/1?format=raw'
HTTP/1.1 200 OK
Date: Wed, 16 Aug 2006 00:35:54 GMT
Server: CherryPy/2.2.1
Content-Length: 7
Content-Type: text/plain
Connection: close

ABCDEFG

Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Wed Aug 16 17:16:47 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Wed, 16 Aug 2006 10:16:47 -0700 Subject: [DAS2] Notes from DAS/2 code sprint #3, day three, 16 Aug 2006 Message-ID: Notes from DAS/2 code sprint #3, day three, 16 Aug 2006 $Id: das2-teleconf-2006-08-16.txt,v 1.1 2006/08/16 17:05:24 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt Dalke Scientific: Andrew Dalke UCLA: Allen Day, Brian O'Connor Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Topic: Spec Q&A --------------- bo: perusing spec, saw mention of XID as a filter. can I get more explanation? ad: can't remember without looking at docs, but think I was not sure what XID was supposed to be, lincoln sent email to clarify. aday: an external db id trying to resolve into local space, eg., for gene das. ad: don't think there was enough info there to be useful. gh: just uri and nothing else? ad: looking at steve's notes from 16 march. looks like we deferred it. gh: input was minimal. I have no particular use for it. bo: need to know what support to provide for the biopackages server. in the read spec, says "it's not well thought-out. should have authority, type, id, description." bo: type vs.
exact type gh: did we get rid of exact type? ad: see gregg's email from 16 march: http://lists.open-bio.org/pipermail/das2/2006-March/000655.html The assumption was, there's no type inferencing done on the server. it's just done on the client. we were to rename 'exacttype' to 'type' and use exacttype semantics for it. gh: there is no parent-child structure to types. there is to ontology though. ad: type records in das aren't parent-child relations because they combine other info about type, e.g., ways to depict it. bo: looking for places where our server disagreed with spec. segments feature filter is not supported on our end. overlaps segments. but this is just work we need to do, not a spec issue. gh: allen and lincoln were struggling with xml:base resolution yesterday, looking through the xml:base spec, dealing with edges. are you satisfied? aday: yes gh: for implementers that don't already deal with xml:base resolution, it may take a day or so to deal with it. nomi and I struggled as well. I was surprised it is not so supported in xml libraries. ad: just a matter of walking up the xml tree. gh: recursively had to verify that the resolve stuff in the java networking libraries actually worked according to the xml:base spec. but we've moved through this. bo: url example, uses 'segment' and 'sequence'. not so consistent. gh: pros and cons to this. it shows that das/2 links can be built using different uris. ad: used different url structures to show that this was possible. bo: confusing when you only see a snippet and don't see where the uri was coming from. showing variety is useful though. gh: are both specs frozen now? ad: yes. Topic: Status Reports ----------------------- bo: went through spec. updated our bug queue. added bug re: passing in id filters vs. uris. working on this today. aday: need to resolve type ids, need to deal with relative ids given in the document. now can go back to working on writeback. gh speaking for lincoln: perl stuff for gbrowse to connect to das servers. went through xml:base abyss. updated uris for sequence and genome version ids for human and mouse on the wiki page: http://open-bio.org/wiki/DAS:GlobalSeqIDs sc: should we allow anyone to edit this, or just lincoln? gh: would like to restrict it. worried about wiki graffiti. ad: you have to register. we can always back things out. sc: lincoln will get notification upon any edits. gh: ok. gh: working on igb release. adding parsing abilities. can now focus on das/2, mostly writeback stuff, refining that in igb client. ee: finishing up bugfixes before igb release. will start on gff3 parser today. ad: looked into content negotiation stuff. why validator server on open-bio site isn't working: I updated underlying webserver framework. working on that. sc: worked on creating new data files used by the affy das server for exon arrays using gregg's new parser. gh: this is generating more efficient versions of probe sets for exon arrays. important since the affy das server is in-memory. sc: this will help us support more arrays in the das server and also move away from having to maintain two different das servers, so we can focus on just the das/2 server. sc: also working on final touches on web page describing available data on our das servers. gh: we can modify xml from the server to point at that page as an info url. sources element has info url, and sub elements as well, but we can just put the info page at the top level.
sc: also was working on ann's fly data project, where she needs to pull genomic regions relative to probe sets. we need to update our das alignment file (link.psl) to be based on dm2. gh: we don't provide residues. she'll have to do a das/1 query at ucsc to get residues. From dalke at dalkescientific.com Wed Aug 16 19:17:04 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 16 Aug 2006 21:17:04 +0200 Subject: [DAS2] validator working again Message-ID: Silly me, I went and upgraded the TurboGears package used by the validator. From 0.8* to 0.9*. There were differences, and one (a unicode encoding problem) quite subtle. The validator is up and running. http://cgi.biodas.org:8080/ Let me know of any problems. Andrew dalke at dalkescientific.com From allenday at ucla.edu Wed Aug 16 20:35:04 2006 From: allenday at ucla.edu (Allen Day) Date: Wed, 16 Aug 2006 13:35:04 -0700 Subject: [DAS2] new writeback URI Message-ID: <5c24dcc30608161335n267201a7w1ef5221ceb9fcdc5@mail.gmail.com> Hi, You can POST writebacks for the http://das.biopackages.net/das/genome/human/writeback/ vsource here: http://genomics.ctrl.ucla.edu/~allenday/cgi-bin/das2xml-parser/stable2.pl The returned document will either be an element, or a element, depending on what was POSTed. I will update the relevant sections in the main sources/source/vsource docs on the biopackages server. I will send another email when the response document is up-to-date with the latest specification revisions -- I'm under the impression I just have to return das2xml for all updated and created features instead of returning the previously specified element. -Allen From dalke at dalkescientific.com Thu Aug 17 12:14:59 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 17 Aug 2006 14:14:59 +0200 Subject: [DAS2] content-negotiation, conclusion Message-ID: <20680849e0825c09bdd56f215da74f1a@dalkescientific.com> After experimenting with a content-negotation implementation and trying it out under different circumstances I've come to the conclusion that the errors are too subtle and hard to debug in the generic case. Quoting from http://norman.walsh.name/2003/07/02/conneg > At this point, we're about eleven levels farther down in the web > architecture than any mortal should have to tread. On the one hand, > content negotiation offers a transparent solution to a tricky problem. > On the other hand, the very transparency of such solutions makes them > devilishly hard to understand when they stop working. Even for the limited case of DAS2 where we want web browsers to see "text/xml" instead of "application/x-das*+xml" it's just not possible. It turns out Safari only uses "*/*" in the Accept header. I do not want a system which gives different results when viewed in different browsers. Ann? How about this solution to your case - we'll have a "xml" format defined as being the same as "das2xml" but returning a "text/xml" header. Or perhaps a "html" format designed for people. When you are showing people how DAS works, and if the browser doesn't understand the */*+xml content type as being in XML, then you can say "oh, add 'format=html' to the URL to see it in HTML". The spec will look like: If the format is not specified in the query string then the server must return the document in das2xml format (or fasta format for segment records) unless the client sends an Accepts header with a mime-type starting "application/x-das-". In that case the server may implement HTTP content-negotiation. 
HTTP content-negotiation is an experimental feature in DAS2 and is not required in the client nor the server. Structured this way there's no way a generic browser can trigger conneg with a das2 server. Only das-aware clients can do it. This gives room for future experimentation. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Aug 17 12:29:19 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 17 Aug 2006 14:29:19 +0200 Subject: [DAS2] default format for a single segment Message-ID: <593e57347c2de14ed36df1ef69fd9f5c@dalkescientific.com> Two proposals here: 1) change the default format for a single segment request from FASTA -> das2xml 2) add optional elements to each segment == Proposal 1 === Currently every DAS2 service returns an application/x-das-*+xml document by default except for the segment document. A request for on a segment URI returns its FASTA sequence. I would like to change that. I would like the segment document by default to return a das-segment document. For example, if this is the segments document then doing the request for "segment/chrI" should return == Proposal 2 == My server implements a "raw" sequence format which contains only sequence data and does not even contain the FASTA header. The raw format only works for a single segment and not for the list of segments. In the current spec the "FORMAT" entry is somewhat ambiguous. Does it work for the set of segments or for a single given segment? That is, segments?format=das2xml --> the segments document for all of the segments segment/chrI?format=das2xml --> the segments document for a given segments segments?format=fasta --> all sequences, in FASTA format segment/chrI?format=das2xml --> the FASTA sequence for the given segment However, segments?format=raw makes no sense. No one will use that one for real. I propose that the SEGMENT elements also get an optional FORMAT element which looks like this The formats for a given segment are the union of its elements and those in the top-level. That is, each segment here implements "raw", "fasta" and "das2xml" formats. Andrew dalke at dalkescientific.com From lstein at cshl.edu Thu Aug 17 16:01:18 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 17 Aug 2006 12:01:18 -0400 Subject: [DAS2] Notes from DAS/2 code sprint #3, day three, 16 Aug 2006 In-Reply-To: References: Message-ID: <6dce9a0b0608170901t44c6e074q5ca24e5fd2cacc72@mail.gmail.com> What's the conference call number? Lincoln On 8/16/06, Steve Chervitz wrote: > > Notes from DAS/2 code sprint #3, day three, 16 Aug 2006 > > $Id: das2-teleconf-2006-08-16.txt,v 1.1 2006/08/16 17:05:24 sac Exp $ > > Note taker: Steve Chervitz > > Attendees: > Affy: Steve Chervitz, Ed E., Gregg Helt > Dalke Scientific: Andrew Dalke > UCLA: Allen Day, Brian O'Connor > > Action items are flagged with '[A]'. > > These notes are checked into the biodas.org CVS repository at > das/das2/notes/2006. Instructions on how to access this > repository are at http://biodas.org > > DISCLAIMER: > The note taker aims for completeness and accuracy, but these goals are > not always achievable, given the desire to get the notes out with a > rapid turnaround. So don't consider these notes as complete minutes > from the meeting, but rather abbreviated, summarized versions of what > was discussed. There may be errors of commission and omission. > Participants are welcome to post comments and/or corrections to these > as they see fit. 
> > > Topic: Spec Q&A > --------------- > > bo: perusing spec, saw mention of XID as a filter. can I get more > explanation? > ad: can't remember without looking at docs, but think I was not sure > what XID was supposed to be, lincoln sent email to clarify. > aday: an external db id trying to resolve into local space, eg., for gene > das. > ad: don't think there was enough info there to be useful. > gh: just uri and nothing else? > ad: looking at steve's notes from 16 march. looks like we deferred it. > > gh: input was minimal. I have no particular use for it. > bo: need to know what support to provide for the biopackages server. > in the read spec, says "it's not well though-out. should have > authority, type, id, description." > > bo: type vs. exact type > gh: did we get rid of exact type? > ad: see gregg's email from 16 march: > http://lists.open-bio.org/pipermail/das2/2006-March/000655.html > > The assumption was, there's no type inferencing done on the > server. it's just done on the client. we were to rename 'exacttype' to > 'type' and use exacttype semantics for it. > gh: there is no parent-child structure to types. there is to ontology > though. > ad: type records in das aren't parent-child relations because they > combine other info about type, e.g., ways to depict it. > > bo: looking for places where our server disagreed with spec. segments > feature filter is not supported on our end. overlaps segments. but > this is just work we need to do, not a spec issue. > > gh: allen and lincoln were struggling with xml:base resolution yesterday, > looking through the xml:base spec, dealing with edges. are you satisfied? > aday: yes > gh: for implementes that don't already deal with xml:base resoultion, > it may take a day or so to deal with it. nomi and I struggled as > well. I was suprised it is not so supported in xml libraries. > ad: just a matter of walking up the xml tree. > gh: recursively had to verify that the resolve stuff in the java > networking libraries actually worked according to the xml:base spec. > but we've moved through this. > > bo: url example, uses 'segment' and 'sequence'. not so consistent. > gh: pros and cons to this. it shows that das/2 links can be built > using different uris. > ad: used different url structures to show that this was possible. > bo: confusing when you only see a snippet and don't see where the uri > was coming from. showing variety is useful though. > > gh: are both specs frozen now? > ad: yes. > > > Topic: Status Reports > ----------------------- > > bo: went through spec. updated our bug queue. added bug re: passing in > id filters vs. uris. working on this today. > > aday: need to resolve type ids, need to deal with relative ids given > in the document. now can go back to working on writeback. > > gh speaking for lincoln: perl stuff for gbrowser to connect to das > servers. went through xml:base abyss. > updated uris for sequence and genome version ids for human and mouse > on the wiki page: http://open-bio.org/wiki/DAS:GlobalSeqIDs > > sc: should we allow anyone to edit this, of just lincoln? > gh: would like to restrict it. worried about wiki graffiti. > ad: you have to register. we can always back things out. > sc: lincoln will get notification upon any edits. > gh: ok. > > gh: working on igb release. adding parsing abilities. can now focus on > das/2, mostly writeback stuff, refining that in igb client. > > ee: finishing up bugfixes before igb release. will start on gff3 > parser today. > > ad: looked into content negotiation stuff. 
why validator server on > open-bio site isn't working: I updated underlying webserver > framework. working on that. > > sc: worked on creating new data files used by the affy das server for > exon arrays using gregg's new parser. > gh: this is generating more efficient versions of probe sets for exon > arrays. important since the affy das server is in-memory. > sc: this will help us support more arrays in the das server and also > move away from having to maintain two different das servers, so we can > focus on just the das/2 server. > > sc: also working on final touches on web page describing available data on > our das servers. > gh: we can modify xml from the server to point at that page as an info > url. sources element has info url, and sub elements as well, but we > can just put the info page at the top level. > > sc: also was working on ann's fly data project, where she needs to > pull genomic regions relative to probe sets. we need to update our > das alignment file (link.psl) to be based on dm2. > gh: we don't provide residues. she'll have to do a das/1 query at ucsc > to get residues. > > > > > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Thu Aug 17 15:59:47 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 17 Aug 2006 11:59:47 -0400 Subject: [DAS2] xml:base on biopackages still not quite right In-Reply-To: References: Message-ID: <6dce9a0b0608170859o7d22ef3cnc6cacf4579a7e305@mail.gmail.com> Hi, I'm getting an incorrect xml:base on the segments request: % GET http://das.biopackages.net/das/genome/human/17/segment ... The problem is that the xml:base ends with a slash, so the synthesized URIs are http://das.biopackages.net/das/genome/human/17/segment/segment/chr1 Lincoln -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From Gregg_Helt at affymetrix.com Thu Aug 17 18:39:40 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 17 Aug 2006 11:39:40 -0700 Subject: [DAS2] DAS/2 writeback capability vs. writeable attribute Message-ID: In the current writeback spec, the ability of a server to support writeback is indicated by: under the versioned source element. However, the retrieval spec talks about both the writeback capability element and a "writeable" attribute for the versioned source element. I think the "writeable" attribute can be removed, since the capability provides all the needed information. The current writeback spec doesn't mention this "writeable" element at all. gregg From dalke at dalkescientific.com Thu Aug 17 19:26:20 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 17 Aug 2006 21:26:20 +0200 Subject: [DAS2] DAS/2 writeback capability vs. 
writeable attribute In-Reply-To: References: Message-ID: gregg: > However, the retrieval spec talks about both the writeback capability > element and a "writeable" attribute for the versioned source element. > I > think the "writeable" attribute can be removed, since the capability > provides all the needed information. The current writeback spec > doesn't > mention this "writeable" element at all. This was up for debate during the last sprint and we decided to keep things as they were until we got to writeback. Which is now. :) I agree with you. Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Thu Aug 17 22:18:21 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Thu, 17 Aug 2006 15:18:21 -0700 Subject: [DAS2] Notes from DAS/2 code sprint #3, day four, 17 Aug 2006 Message-ID: Notes from DAS/2 code sprint #3, day four, 17 Aug 2006 $Id: das2-teleconf-2006-08-17.txt,v 1.1 2006/08/17 22:15:30 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt CHSL: Lincoln Stein Dalke Scientific: Andrew Dalke UCLA: Allen Day, Brian O'Connor Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Topic: Status Reports ---------------------- ls: Perl interface in good shape. reorg'd to get parser based on content type dynamically. response comes in, figures out what parser to use, returns the objects, should be extensible for other formats. main task todo is to implement the feature object so that I can actually return features. now parser is there, object is not. Not a Bio::SeqFeatureI object, in order to work with gbrowse and other parts of bioperl. some issues with biopackages with xml:base, sometimes slashes there that shouldn't be and vice versa. segments request has extraneous / at end, so it has 'segments' repeated twice. didn't try to fetch to see if would work, but looks like a bug. gh: regarding parent-child relationships between features: if they have parent, need to point to it, if they have children need to point to them. ls: parsing with sax, I'll know when an object is complete. will create a feature stream and start returning features as the parse is coming across. threaded, so you can have multiple streams going simultaneously. gh: more issues with parent child hierarchy. will wait for allen to arrive before discussing. Topic: Spec issues ------------------ ad: working on content negotiation, but now is not right time to do it. in sequence doc, default doc should be das2-segments. sc: xml:base issue -- where do we allow it (0, 1, infinity)? gh: our policy is that we follow the xml:base spec. ad: if you use it, use it everywhere. gh: my parser is looking for it where it everywhere. ad: my email explains why you might want to use it on multiple features. eg., combining data from different servers. sc: what about brian gilman's issue, when you get to root what if xml:base is still relative? 
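As background for the xml:base questions above: xml:base resolution is ordinary RFC 3986 relative-reference resolution against the nearest enclosing base, with a relative base itself resolved against the next base up (ultimately the URI the document was retrieved from). A minimal sketch, assuming Python's standard urllib.parse and not taken from any DAS/2 implementation; it also reproduces the doubled segment/segment path Lincoln reported when a base ends with a trailing slash:

    from urllib.parse import urljoin

    doc_uri = "http://das.biopackages.net/das/genome/human/17/segment"

    # Base without a trailing slash: the last path component is dropped
    # before the relative reference is appended.
    print(urljoin(doc_uri, "segment/chr1"))
    # http://das.biopackages.net/das/genome/human/17/segment/chr1

    # Base with a trailing slash: nothing is dropped, so a relative
    # reference that repeats the component gives the doubled path.
    print(urljoin(doc_uri + "/", "segment/chr1"))
    # http://das.biopackages.net/das/genome/human/17/segment/segment/chr1

    # A relative xml:base is itself resolved against the enclosing base.
    base = urljoin(doc_uri, "../17/")
    print(urljoin(base, "segment/chr1"))
    # http://das.biopackages.net/das/genome/human/17/segment/chr1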
ad: uri spec defines how to define relative urls, e.g., get it from document. gh: relaxNG says it can be anywhere. I think it should therefore be allowed anywhere. ad: right now all services returns an xml object file except segment request -- fasta file. would like to return xml. sc: this is along the lines of what I proposed a while back. I like it. See discussion under this thread: http://lists.open-bio.org/pipermail/das2/2005-December/000395.html ad: formats per-segment basis. current scheme only defines per-everything basis. propose have each segment also has it's own format. each segment can have alt formats. (see ad's email from today on this topic). gh: like it. it means that a server doesn't have to know about all residues. ad: for case of reference server, we guarantee that it supports fasta sequence. affects other servers, not just reference server. gh: I like that flexibility. any objections? [silence] gh: if you return the segments doc we now have, you are only serving up xml. if you want to return fasta, you need to return a format element. ls: is there a way for client to determine what it will get? gh: in the segments document, returned back from reference server. client can specify format defined there. ls: not impl yet, just a proposal? gh: yes. another plus is the ability to specify more efficient binary formats too. Topic: Ann's issue on content-type ---------------------------------- gh: server has option to specify that you can return things as text/xml, but still send das2xml format. ad: content negotiation doesn't work to allow the browser to view XML. only works for clients that can do content neg, not general clients (e.g., safari). I tried two different browsers, got two different results. [A] Ask Ann Loraine if this solution is sufficient. Topic: Writeback issues ------------------------- aday: problem writeback. creating new feat or update existing feat. if it's a new feature, das_private uri scheme has no info about source or versioned source that the feature is intended to be written to. This is not necessarily a problem, could be a different uri post. But it is a problem when parsing and it's possible for parents or children to be attached to the feat and they are not the source/vsource combination. make sense? ad: every feat has unique id. could do it by saying when you see this id, it corresponds to this segment or this versioned source. ad: feature comes from NCBI but is being posted to affymetrix. gh: I talked about this as a use case for the grant. Example: snps being served by an authority (dbSNP) and people are trying to create their own haplotype blocking structure. you want them to be able to point to the authority for the leaf features (snps, children). so you can have one server serving up haplotype blocks, and points to snps that reside on another server that is the authority. right now in the spec, can't do that because of the bidirectional parent-child stuff. you'd have to point the snps at the authority to the new stuff. ad: could have parent-child relationships that are incorrect. all parents connected together are places you can get to. has to be a single root. gh: due to that and the bidirectional stuff, we can't support my use case, also can't build features from multiple servers to construct curations. ad: can do it in datamodel. I point to features over there. gh: in xml it can't be done. ad: also means that, you have to keep requesting features over and over again. you have to do at least one request for every feat. 
gh: even if we have these restrictions, how can we enforce them with das-private id. aday: the document is not enough to tell you if the parent being associated with a feature is valid. you have to know more. aday: it's only these das-private ids that are a problem, you cannot know where it came from or where it's to be written to. the child-parent pointers are not a problem. gh: post to a writeable das server with das-private id, it means the feature is to be written on that server. aday: new document comes in, you don't know where to write them to. gh: which writeable server are they to be written to. ad: there will be a different distinct url. gh: client is aware of 5 different writeback servers, which one do I write to. this is a client issue. it should present options to the user and let them select. aday: what about creating a hybrid feature? gh: it's a totally new curation. ad: what if you want to have one writeback url for several dbs on the server? gh: i would say no. aday: you need to know what is the context of the write. gh: for server, it knows, for client. aday: so are we saying that the document does not need to be validatable when standalone (ie, outside the context of the server)? there is not enough information to know whether some features being grouped together should be. I upload this document to xxx, is it be loadable? gh: i dont' see that as an issue. we have validation issues with read document as well. the validators don't go into the uris of each feature and see if they come from same server. aday: if absolute, yes, but if all relative. as long as all relative, you can tell if compatible. gh: if you have document element was retrieved from, it's relative to that. if not, it's application-specific, which in our case means punting. validator can't guarantee that certain uri's are compatible. to do that, it would have to know how to resolve every uri, and they don't need to be url's. nobody knows how to resolve every uri. what that means is that the server will have to reject the post if it sees uris that it doesn't recognize them. aday: or, that it sc: how does server know if uri's are compatible? gh: for posts, those features have to be coming from that server aday: adding new exon to transcript that already exists in db, can I give you the new exon and pointer to transcript? get's into uri compatibility issue. I have exon whose parent I don't have access to (on remote server). could I do an external request on the parent, figure out it's location, close it, send xid to parent on remote server. ad: would say it's legal but you have to pass in the complete feature record. gh: the legality is in the document that is being posted. you have parent-child resolvability back up to the root. that's the requirement now. gh: is it worth considering relaxing our bidirectional closure requirement? ad: makes parsing harder. have to wait to very end. takes lots of time, memory. gh: use case you have, you need parent. we could relax it to require parent-child (as needed for my use case). but for Allen's case you need child-to-parent pointers. ad: using xid gh: xid's are free form. how do you know that it means x was derived from y? there's no way to represent that in our xml. it's open to interpretation by client and server. ad: in the xid have one of them be the type, constrained vocab, so you know what kind of link it is. keyword 'rel', this means get css, rss.... also the xml-link stuff steve mentioned a while ago. gh: would require some significant rejiggering to resolve it. 
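The rule being worked out above -- every parent/part reference in a posted document must resolve, and each feature group must hang off a single root -- reduces to a small server-side check. A sketch only, over a toy in-memory representation of the parsed document ({uri: {"parents": [...], "parts": [...]}}), not the das2xml schema or any existing server's code; whether references to URIs the server already knows are acceptable is exactly the point under debate, so that is left as a parameter:

    def validate_feature_groups(features, known_uris=frozenset()):
        """features: {uri: {"parents": [uris], "parts": [uris]}} from one POST.
        Raise if a reference does not resolve in-document (or to a URI the
        server already knows), or if a connected group has != 1 root."""
        ids = set(features)
        for uri, feat in features.items():
            for ref in feat.get("parents", []) + feat.get("parts", []):
                if ref not in ids and ref not in known_uris:
                    raise ValueError("%s points at unknown feature %s" % (uri, ref))

        # union-find over the in-document parent/part edges
        leader = {uri: uri for uri in ids}
        def find(u):
            while leader[u] != u:
                leader[u] = leader[leader[u]]   # path halving
                u = leader[u]
            return u
        for uri, feat in features.items():
            for ref in feat.get("parents", []) + feat.get("parts", []):
                if ref in ids:
                    leader[find(ref)] = find(uri)

        groups = {}
        for uri in ids:
            groups.setdefault(find(uri), []).append(uri)
        for members in groups.values():
            roots = [u for u in members if not features[u].get("parents")]
            if len(roots) != 1:
                raise ValueError("group %s has %d roots" % (sorted(members), len(roots)))

The single-root test is the part that keeps coming up: an exon whose parent is missing from the document (or which resolves to two roots) gets rejected rather than guessed at.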
ad: can we do it by having a new feature type, of it's own vocabulary. gh: if you do this in one client, it does this by cloning, it looks to user you are doing it from different servers. write to client. another one reads it, and it has no way of know that it was derived from the two different sources. gh: for now, you can only point to newly created features or features coming from the server you are posting to, for feature ids. need to know more about evidence trails, to know more about what info they need to preserve. [A] talk to curator pro (nomi) about what evidence to save when creating/modifying feats ad: new type: external-feature-reference, do a new element at end of record. doesn't require a new format. gh: it's outside the spec right now, allen doesn't have to support it. extra xml in the document to describe the relationship. e.g., a derived-from element. it's doable, but I don't think it should be in the current spec. ad: can be done without making backwards incompatible changes to the current spec. aday: now I get free reign to validate the way I want to. I will be liberal in what I reject. gh: end of the spec issues we were looking at yesterday. Topic: Status report -------------------- ee: started working on gff3 parser for IGB. bo: feature filtering. using full uri's not just 'chr2'. going through biopackages.net server checking if it is up to spec. coordinates issues, mapping document, stored in extra file. gh: reference to each segment. aday: writeback server able to do delete and update now. fixed bug reported by andrew. name based query was not returning parents. gh: lincoln mentioned xml:base problem. segment/segment/ bo/aday: fixed this. aday: started impl a new server that takes any arbitrary range request. performs modulus on range request. you know that there is only certain blocks being requested, so you can use a cache. does it satisfy requested range, and return that. I always do children before parent. inserting hints on the thing that does backend parsing. gh: are you supporting multiple parents of children (e.g., multiple transcripts that share an exon)? aday: a good question. I keep track of children and multiple locations of children and then I given parents after that. after the grooming, I can have multiple hints, 'this is the end of this 15mb block'. all parents are presented. then all of my comments would be presented. gh: got out IGB release, but had to recall it, since it broke things. verifying I can write back to new and improved writeback server. if you post to a writeback server, that's also the address you should be using to get the.... a versioned source with a writeable attribute. I should be able to use that same source to both write to and retrieve from. aday: you can't retrieve gh: I have to use two different urls to do retrieve and posts. The way I think it should work: anything you write to you should be able to do retrieval as well. aday: writeable=yes attribute, and go over here and write. should be ok. thinking about using redirection under the covers. gh: resolving new ids mapping to das-private ids, editing is working on client side. sc: worked on info page for affy das servers. Generating new drosophila alignment data for Ann. gh: had trouble hooking up exon chp data with new binary formatted exon data you generated (gregg's new bp2 format for exon data). could be that I have only control probes and they are not in your data. [A] steve will check to see if there are any control probes in the exon array data. 
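Allen's block-based front end mentioned in the status reports above (take an arbitrary range request, snap it to fixed-size blocks with a modulus, and serve repeated blocks from cache) is easy to sketch. The block size and helper below are invented for illustration; the real implementation is his, not this:

    BLOCK = 1_000_000   # hypothetical block size; the real front end makes it configurable

    def block_align(start, end, block=BLOCK):
        """Snap an overlaps=start:end request onto whole cache blocks.
        Returns the aligned range plus the block indices, so identical
        blocks can be fetched once, cached, and trimmed per request."""
        first = start // block
        last = max(start, end - 1) // block
        return (first * block, (last + 1) * block), list(range(first, last + 1))

    aligned, block_ids = block_align(26_350_000, 27_100_000)
    # aligned == (26_000_000, 28_000_000); block_ids == [26, 27]

Because nearby queries map onto the same block indices, repeat requests can be answered from cache no matter what exact coordinates the client asked for.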
ad: I got the validation server back up and running. will work on sequence retrieval spec. question: does spec guarantee that seq will be upper or lowercase? gh: no, fasta can be either. gh: spec docs don't have date stamp, eg, writeback document. this is useful to see if it has been updated. [A] andrew will put date stamp back in spec docs that don't have it. From dalke at dalkescientific.com Thu Aug 17 23:16:49 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 18 Aug 2006 01:16:49 +0200 Subject: [DAS2] SEGMENTS does not have a "uri". Message-ID: <6b674ebdfbd129ae2d20686f9ba174e4@dalkescientific.com> I just noticed that the SEGMENTS element in the segments document does not have a "uri" attribute. That doesn't seem right so I added it to the schema. I committed that and the change for FORMAT elements under each SEGMENT. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Aug 17 23:24:12 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 18 Aug 2006 01:24:12 +0200 Subject: [DAS2] das2 reference server Message-ID: <1d186e51a531d835afb67910a28f5f36@dalkescientific.com> Here's the experimental reference server I've been working on. http://cgi.biodas.org:8081/seq/?format=html The entries for # fly_42 * 2L (22407834 bases): * 2R (20766785 bases): * 3L (23771897 bases): * 3R (27905053 bases): * 4 (1281640 bases): * X (22224390 bases): # fly_43 * 2L (22407834 bases): * 2R (20766785 bases): * 3L (23771897 bases): * 3R (27905053 bases): * 4 (1281640 bases): * X (22224390 bases): # worm_160 * I (15072418 bases): * II (15279314 bases): * III (13783677 bases): * IV (17493785 bases): * V (20919396 bases): * X (17718851 bases): * Mit (13794 bases): # worm_161 * I (15072418 bases): * II (15279314 bases): * III (13783677 bases): * IV (17493785 bases): * V (20919396 bases): * X (17718851 bases): * Mit (13794 bases): # worm_162 * I (15072418 bases): * II (15279314 bases): * III (13783677 bases): * IV (17493785 bases): * V (20919396 bases): * X (17718851 bases): * Mit (13794 bases): should be real. The others are part of my test set. It even validates. Amazing that. One thing - I've created a new document type which lists all of the "segments" documents available from the reference server. (nomenclature: I'm using "assembly" to mean "a collection of segments". I know, it isn't really an assembly. I'm using it for now because I didn't like using "segment", "segments", "segments_list" instead using "segment", "assembly", "assemblies" Ideas on a better name? ) This should be a sources document instead. Haven't gotten there yet. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Aug 17 23:26:43 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 18 Aug 2006 01:26:43 +0200 Subject: [DAS2] SEGMENTS does not have a "uri". In-Reply-To: <6b674ebdfbd129ae2d20686f9ba174e4@dalkescientific.com> References: <6b674ebdfbd129ae2d20686f9ba174e4@dalkescientific.com> Message-ID: <6dde1e7303796ce01fa641b98dd254d4@dalkescientific.com> > I just noticed that the SEGMENTS element in the segments > document does not have a "uri" attribute. That doesn't > seem right so I added it to the schema. Shouldn't the SEGMENTS element also have an optional "reference" attribute? Take a look at http://cgi.biodas.org:8081/seq/fly_42/?format=html to see a real-world record. It feels like there should be a reference="http://www.flybase.org/genome/D_melanogaster/R4.2" in there some place. 
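For anyone poking at the experimental server by hand, the ?format= override discussed earlier is the practical alternative to content negotiation. A throwaway sketch using only the standard library and the query parameter shown in the URLs above (the server is an experiment and may not stay up; nothing here is from a DAS/2 client library):

    from urllib.request import Request, urlopen

    base = "http://cgi.biodas.org:8081/seq/fly_42/"

    def fetch(url, accept=None):
        # conneg is only supposed to kick in for clients that ask for an
        # application/x-das-* type; plain requests get the default format
        headers = {"Accept": accept} if accept else {}
        with urlopen(Request(url, headers=headers)) as resp:
            return resp.headers.get_content_type(), resp.read()

    ctype, body = fetch(base)                       # default response from the server
    html_ctype, html_body = fetch(base + "?format=html")   # explicit override for humans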
Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Thu Aug 17 23:37:00 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Thu, 17 Aug 2006 16:37:00 -0700 Subject: [DAS2] Notes from DAS/2 code sprint #3, day four, 17 Aug 2006 In-Reply-To: Message-ID: Following up on a side-topic that came up briefly in morning's teleconf, > aday: now I get free reign to validate the way I want to. I will be > liberal in what I reject. here's a post I made to a thread on the bioperl list last regarding aberrant fasta files (another reason why to not standardize das/2 sequence responses on fasta format): http://bioperl.org/pipermail/bioperl-l/2005-July/019407.html Another cited source of this philosophy is from the TCP spec (section 2.10) as the Robustness Principle: Be conservative in what you do, be liberal in what you accept. http://www.faqs.org/rfcs/rfc793.html I actually think it has wider appeal beyond software design or electronic devices, but I'll save that discussion for later... Steve From dalke at dalkescientific.com Thu Aug 17 23:59:32 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 18 Aug 2006 01:59:32 +0200 Subject: [DAS2] Notes from DAS/2 code sprint #3, day four, 17 Aug 2006 In-Reply-To: References: Message-ID: > [A] Ask Ann Loraine if this solution is sufficient. I tried calling her cell number but got a fax machine (or an old modem). Perhaps I have the wrong number ? > [A] andrew will put date stamp back in spec docs that don't have it. Done. Also, for some reason das2_stylesheet was never added to version control so I went and did that too. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Fri Aug 18 00:31:57 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 18 Aug 2006 02:31:57 +0200 Subject: [DAS2] Notes from DAS/2 code sprint #3, day four, 17 Aug 2006 In-Reply-To: References: Message-ID: <0e9f6ceda2ecb02e53d5c23c248fd5cf@dalkescientific.com> > here's a post I made to a thread on the bioperl list last regarding > aberrant > fasta files (another reason why to not standardize das/2 sequence > responses > on fasta format): > > http://bioperl.org/pipermail/bioperl-l/2005-July/019407.html In it you said: > I found a recent presentation on the FCC site showing results of a > survey > about whether part 15 stifles innovation (10/14 respondants said no, > and 9/5 > said more stringent regulations might even permit *more* innovation): Okay, I downloaded the PPT. Those questions are biased. 1. asks "is it too limiting" and doesn't ask "is the current standard okay" or "is it too lenient." Consider the population sample of existing members of the technology advisory committee. What selection bias is present there? 1b. Could more stringent regulations, insuring that there will be no unknown types of interference, permit additional innovation? Note the "could", not "would .. likely increase innovation". This could be answered "yes" if there's only a 5% change of it happening. 2. "Should the FCC deal with interference issues with licensed services in a different way." Okay, I agree with that one. Depends on what "different way" means though. 3a. .. I still don't know what a Part 15 device is. Does that include wireless? does it include interference from when I nuke something? 3b. Can home users be guaranteed that there will be no interference from, or to, users in nearby homes or apartments? Huh? Even with FCC Part 15 or whatever there's no guarantee. There's no guarantee on anything. 
Someone else could pull the cover off an old computer causing extra interference. Of course the answer to this is "no". Even under threat of capital punishment there's no guarantee. 4. In a spectrum with no rules, can individual users be assured of effective communications? What does "no rules" mean? Does existing wireless service count as "no rules"? Yet it "is certainly innovative." BTW, I think the FCC should allow micropower radio stations. Those are not allowed because of the concern that the stations would interfere with larger commercial stations. I don't think those are technically valid. I think they are more to preserve the investment made by commercial station owners. I also don't think the FCC regulates noise from commercial stations well enough, and lets problems persist for years. > Another cited source of this philosophy is from the TCP spec (section > 2.10) > as the Robustness Principle: > Be conservative in what you do, be liberal in what you accept. > > http://www.faqs.org/rfcs/rfc793.html I remember now this came up in bioperl in .. 1999? I was complaining about file formats. Ewan mentioned that principle. My complaint was that bioperl's (and others') parsers are usually quite liberal, but so is the output format generation. Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Fri Aug 18 19:15:33 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Fri, 18 Aug 2006 12:15:33 -0700 Subject: [DAS2] Notes from DAS/2 code sprint #3, day five, 18 Aug 2006 Message-ID: Notes from DAS/2 code sprint #3, day five, 18 Aug 2006 $Id: das2-teleconf-2006-08-18.txt,v 1.2 2006/08/18 19:14:11 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt Dalke Scientific: Andrew Dalke UCLA: Allen Day, Brian O'Connor Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Topic: Spec concerns --------------------- ad: segments doc (not 'segment') top-level element is missing three fields, one is uri (I added). second is reference (a collection corresponding to a dataset). seemed less useful since it's already mentioned in vsource document. I added id to schema, not spec yet. last thing: missing a doc_href, for each segment ok, but we can't say, here's doc for human. gh: optional? ad: yes. gh: if optional doesn't change server impl. uri for segments is specified in segment capability. gh: my only objection is spec churn. gh: question about writeback spec: what you're supposed to do if you remove an exon from a txt, you are supposed to have a delete element in post that deletes that id. ad: yes gh: if you just have that delete, does that force parent to remove it's child, or do you also have to have the parent in there? ad: everything in that relation has to be sent. gh: in that example, if you have a delete for that exon, you have to return the rooted hierarchy as well with txt not having that part element. 
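The rule being worked out in this exchange (and settled a few lines further on) comes down to a small bookkeeping check on the server: a part may only disappear from its parent if the same document either deletes it explicitly or resubmits it as a parentless (orphaned) feature. A sketch of that check over plain sets of URIs, not the real writeback document format:

    def check_dropped_parts(stored_parts, posted_parts, deleted, orphaned):
        """stored_parts: part URIs the server currently has for this parent.
        posted_parts: part URIs listed by the replacement in the POST.
        deleted: URIs named by explicit delete records in the same document.
        orphaned: URIs resubmitted in the same document with no parent."""
        dropped = set(stored_parts) - set(posted_parts)
        unexplained = dropped - set(deleted) - set(orphaned)
        if unexplained:
            raise ValueError("parts removed without an explicit delete or "
                             "orphaning: %s" % sorted(unexplained))

    # transcript d currently has exons a, b, c; the POST rewrites it as a, c
    check_dropped_parts(["a", "b", "c"], ["a", "c"], deleted=["b"], orphaned=[])   # ok
    check_dropped_parts(["a", "b", "c"], ["a", "c"], deleted=[], orphaned=["b"])   # ok
    check_dropped_parts(["a", "b", "c"], ["a", "c"], deleted=[], orphaned=[])      # raises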
ad: yes gh: what if you create a curation with three exons in it, you then decide to delete the middle exon. server gets post with same annotation, but exon is missing and parent is not pointing to it as a part. is that legal? ad: nothing that says delete? gh: no ad: i think it should be illegal. if you have three generations. grand parent and grand child with no intermediate. also illegal. gh: server will have to catch these things. ad: easy. just check whether all ids involved are representing something on the server, if so, you delete old, update new. gh: allen, will your server catch this? aday: if you modify something, it already has to check before it gets deleted, i can just reject it. now I say, you modified it, here are the things that are modified by your request. gh: [drawing] d:a-----b-----c -> d:a----------c , b read this as: transcript d has exons a,b,c three exons attached to a txt, never indicated that anything was deleted, I just re-wrote the feature as a--------c gh: this should throw an error, since you didn't explicitly delete b. aday: what's wrong with leaving d dangling? ok to not mention the missing exon ad: one is to keep it there, one is to delete automatically, gh: if keep does it have pointer to parent? that's enough to tell db it's not connected? aday: yes, it becomes an orphan. you should get back a message, "hey you affected all of these features." so client can see what your modification affected. you'll know from response what was affected by deletion you performed. gh: if you now submit a new transcript named e containing a and c: e:a-----------c ad: so annotation 'd' will come back as saying, "was deleted" aday: my response tells you everything that needs to be updated. you might see things that need to be cleaned up that weren't expected. ad: python maxim: when in doubt, refuse temptation to guess. you're guessing it makes sense to leave orphans around ad: if it's ambiguous, should be not supported. gh: from allen's side, it might be hard to catch and call error. aday: no I can catch. i track all changes caused by client request. I have to track all changes made, see if it was present in the submitted document, if not, an error. just another level of tracking. can do. gh: if this is what you wanted to do, client would submit, write b (with no txt as parent), write d with txt as parent. and no delete to get this d:a-----b-------c -> a-----------c + b gh: if you really want to get rid of children, you need to specify both parent and child. gh: approach on client. I do on client. curational model is that you are never really editing locations or parent child relations ships, you are just making successors, so I keep this version chain. not deleting old ones on server (that is the plan though). aday: every edit does a delete and create on server. that's very transactional. can you keep track of it in memory. gh: yes. user has to request writeback. any number of edits between one and the next. once you've committed you can rollback on client. aday: everything is pruned off in client? gh: no you need redo. aday: redo is not considered saved unless you save again. gh: if you re-edit after a undo, you can't redo. no branching. aday: just keep track of recent save point. gh: todo: keep modification dates. so if there were no edits since the last save then there's no need to write back to the db again. gh: if you want something deleted, you must explicitly do it. 
if you want to delete it do this: * delete b * write d:a------------c if you want to orphan it do this: * write b with no parent * write d:a------------c Topic: semantics of insides and overlaps as they relate to parent-child ------------------------------------------------------------------------- gh: this is a continuation from yesterday's discussion we had offline. bring up spec, feature filters. see part that says, "any part of a complex feature that is one with parents... then all parts are returned". that's wrong. you do an insides query, you only get back things that are inside. two exons in a txt, one is inside, one is not inside. ad: gh: if it has no location, it's never going to be returned by a range query. ad: by type q gh: if multiple locations on the feature.if one of those locations is inside the range query it passes. sc: gh: not the same as multiple locatsion -- aligns to multiple places in the genome. top level parent of a feat hierarchy must have a location that passes one of the location in the range query. one of the locations has to pass the range filter. and it is at the top level of the hierarchy. aday: think of this: locations are cols in matrix, filters are rows. in order for column to qualify, the entire row must be true. ad: different people may have modeled it differently. may get only part of it back. gh: if two servers model the same data differently you may get different answers back. that's the way it goes. ad: annotation contains features. returns all annotations that match the query. gh: don't add notion of some other object that is sort of a feature, but is really a group of feats. aday: i call it a feature group. range filters operation on the group. gh: we don't need to have a special designation. it's just a feature with no parents. what your're calling a feature group. aday: all things under the parentless feature is the group. ad: yes aday: not identical to the root, it's the root plus all attached things. gh: to clarify things in the spec, maybe call it annotation/feature group, maybe ok. ad: all things connected by a parent-part relationship. return the entire feature group. gh: change: root of the feature hierarchy matches (range filters) the root of the group has to pass all the feature filters in the range query. ad: you want the root to be guaranteed to have locations if any sub feats have location. featureless roots. aday: no way to retrieve based on location. weird. parent with no location. gh: not weird. bounds of gene are fuzzy. they'll spell out bounds of exon but not the gene we can say the highest level with location. we can say that if children has location, then parent has. ad: put all children ranges in the root. gh: ok. no children should never have locations outside their parent. ad: old conversation: is this single or multiple rooted. single is easier to understand. but there is a use case for multiple locations. now we say the single root must be union of all it contains. gh: inclusive, not necessarily union. ad: software check will be needed gh: you don't want someone submitting exons that are outside bounds of a transcript. dangerous to have children outside location of parent. aday: true for bioperl ad: for only root, or intermediate? aday: every intermediate gh: only acceptible if you want to punt on location of upper level thing whose location isn't well understood (gene). aday: feature 100-200, locationless thing attached to it.. gh: if you have locationless, they need to be locationless up to the root. 
maybe we should not allow that for now. if you have a locationless feature, it's locationless all the way down and all the way up. meets requirement for gene das. ad: don't understand why this restriction needs to be there. ee: we want it. gh: you cannot have children outside bounds of their parents and their parents recursively. to me, that needs to happen. question: can you have children with location that have parents that are locationless? ad: why parents that don't overlap child location? gh: throws off our range filter mechanism. no easy answers to ad: if any children meet criteria, then they all get returned. gh: then you get back features that don't meet sc: let's say you're editing an exon... gh: forget editing. just basic reading. there was ambiguity in the old spec here that I want to kill. I've seen desire to have a locationless thing above, but never the reverse: definitive location above but locationless below. gh: we hashed this out in the last code sprint. let's complete it! ad: if any feature matches, then all features match. includes the situation if the parent has no location, but a child matches; that implicitly matches. my proposal was to return all things in the feat group if any one of the features matches. same as assuming all parents have the locations of their children. this search will get back the parent. returning the feat group is a way to say all parents implicitly include the locations of their children. aday: not all parents, multiple roots. gh: they all must go to a single root. aday: if any location of the root of the group matches, then the whole group matches. boils down to: are descendant feats allowed to be outside the bounds of the parent. gh: [insides query example on board] aday: the query is on the feature group root features ad: I don't remember range queries being allowed only on root elements. two exons that are very far apart. query hits in between them. gh: parent meets overlap, return them all. ad: parent has only two small locations, not one large location. gh: modeled as multiple small locations, not child features. sc: so it doesn't include the intervening sequence. aday: gh: canonical example of mult location stuff: a 25mer probe that hits 4 diff locations in the genome. multiple alignments, where none of the alignments align to the whole thing. aday: two probe pair, only some of the children are in the region. ad: example: protein structure catalytic group, three residues on different chains. gh: mult locations of a probe set, one location falls inside the query, return the probe set why can the rule be ad: besides range searches: when you find that a feature matches title or curator name, do you return back just the matching feats or the group? gh: don't see why we can't add more rules. aday: name search and an exon is named, return its parents. ad: so for any searches besides ranges, it returns all features in the feature group. gh: different behavior for range queries. they already have different behavior than other queries. ad: my criteria: if any feature matches, then all features in the group are returned, except that in a range query, only those that match the range query are returned. gh: don't see why you have a problem with that requirement. ad: do the search on all features, root is not special, if any feat matches, get all features in the group; if a range filter, then get features that pass. if a filter, then full hierarchies are not returned, only those that pass the filter. gh: don't like. do an overlaps, two exons are in, two are not. 
you send back only the txt and the two that are, you are depriving user of data, there's no way of know that it's missing, how can they get at it? ad: i'm confused. in system you want, you return back everything? gh: yes. everything that has a root with one location that matches all range filters. if the root of the feat group meets range criteria for at least one of it's locations. aday: and any name filter ad: root has no location info, but one of exons overlap, whole thing returned. ee: distinction between olap and includes, different if parent lacks location info. aday: gregg needs for range optimizations. name may matches, but feat location may not, but root of group may ad: specified in root node. not convinced we need locationless features that aren't descented. gh: we're not talking about locationless nodes now. parent has location, that's all you need to search on. ad: use pieces, or whole range? gh: the whole range, not piece by piece. ad: why aday: there can be things gh: I argued against having mult locations, caused problems in bioperl, children with locations, and mult locatable features. so I didn't want to have mult locations, but got voted down. only thing it makes sense: when you want one feat to represent one feature to represent an alignment to things on genome. OK to represent with mult locs, but better to not. aday: offsets relative to the root. gh: no. will confuse people a lot. ad: any annotations that will go on mult segments in dna world. aday: blast results, very common. gh: every blast hit is a separate feature, avoids the problem. I use them in transforms, so I can say this feature maps to different genome assemblies. fine in a data model. but causes problems when it's in a spec, hard to describe when you should use one vs the other. aday: what rules do you use internally? gh: i know it when i see it. ee: in genometry, these are equivalent regions on these genomes. gh: right. the length of the range is the same length can be identical, but seq is different. genometry doesn't care about sequence identity. "this part of hg17 is equivalent to hg18". but this is getting tangential. ad: question is what do you do for things that are mult segments. example where parent is wider than children aday: you don't know where 3' end it gh: haplotype block for a set of snps, you know it extends to the next block, so the block is bigger than the bounds of the snps used to construct it. ad: curation tool, marked off three regions, one thing can extend over a broader range. tool automatically inserts. allows curator to stretch it out as need be. sc: this is what fuzzy locations are used for at genbank. gh: we don't have fuzzy locs. no needs for these at present. ad: implicitly the parent is the min-max ov its children. a db could optimize that way. curation tool gets data back from server. does curation tool know to change the parent range or not? gh: it better ad: if user changes the min/max exon bounds, will tool know to adjust parent transcript? the txt could be left extending past the current location of these. gh: up to the client app to figure it out. a smart gui should say, you cannot extend the txt past the exons you have, but for a genotype block, it might allow such a change. in theory, your client would understand what elements in the sequence ontology you could do it and what you could not. ee: this is outside the spec. should say it's possible for parent to extend beyond bounds of children, and not possible for childre to be outside of parent. 
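The position Ed and gregg are arguing for above can be stated as two small predicates plus one structural rule: range filters are evaluated against the root of a feature group, and every child location must fall inside some location of its parent. A sketch with half-open (start, end) ranges on a single segment -- a toy model, not IGB's or the biopackages server's code:

    def overlaps(loc, qstart, qend):
        start, end = loc
        return start < qend and qstart < end

    def inside(loc, qstart, qend):
        start, end = loc
        return qstart <= start and end <= qend

    def group_matches(root_locs, child_locs, qstart, qend, predicate=overlaps):
        # enclosure rule: children may not stick out past their parent
        for cstart, cend in child_locs:
            if not any(rstart <= cstart and cend <= rend
                       for rstart, rend in root_locs):
                raise ValueError("child location outside its parent's bounds")
        # range filters apply to the root; if it passes, return the whole group
        return any(predicate(loc, qstart, qend) for loc in root_locs)

    # transcript spanning (100, 900) with exons at (100, 200) and (800, 900):
    group_matches([(100, 900)], [(100, 200), (800, 900)], 400, 500)                   # True, overlaps
    group_matches([(100, 900)], [(100, 200), (800, 900)], 400, 500, predicate=inside) # False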
ad: which of these can be on multiple segments? gh: if we're going to have mult locs, then everything can. ee: if child can, then parent can. aday: an argument for doing relative offsets I suggested. only allow parents to have relative offsets to children. no duplication of data. gh: duplication of data is a red herring. ad: more error prone to checking a string to see if it matches. hard to extend the parent to be a bit wider than children, gh: range queries to apply to root of featu hierarchies, and at least one of the children to pass all range filters? ad: why is this diff than requirement I gave? gh: your's give back partial feature groups. it's allowing filters to apply to any of the children , not just the root. ad: only difference is if you have two widely spaced features, everything has an implicit convex hull. if your query hits the midddle. gh: [whiteboard drawing] +-----------+ exon a in transcript c +----------------+ exon b in transcript c |______________| | inside query ee: for overlaps you would include the parent, for inside query you would not. ad: how will software guarantee this? min-max or just union of the children. ee: min-max of all children. ad: should be in the spec. gh: allen: how do you do min and max of mRNA, implicit or explicit? for me, it's explicit. aday: explicit. ee: using gff1 where it's implicit, but our parsers force it to be explicit in our data model. aday: in gff3 it can be implicit (using '.'). gh: gff, bed, psl, xml formats, raw blast output -- all explicit. ad: does server verify that it meets this criteria. each feature comming in, if it has parent it can only have one segment id. for eeach segment in the parent, find each one that matches the range in the child, if any child has segment x, only one location on segment x aday: can have mult locs on the same segment. ad: why not model as one range? aday: need to create the parent in two locations. gh: as long as one loc of parent contains the loc of the child, it's ok. ad: gregg saying that aday: location only includes one instance of the children. two locations for exon a, b, c. first set of locations for these exons is different than the second set of locations for these exons. a logical grouping not simple collection of all parts. mult locations on the same segment is harder. check location of parents, rify that no two childs. ad: spec now allows for dumb servers. by putting this extra requirements, it doesn't make server easier, complicates life on clients. gh: it makes clients life simpler. aday: location as two additional attribs: group, rank group - groups things together that are in the same segment rank - prioritized location conceptual grouping of things, to know which child locs match up with which parent locations, because locations can overlap. gh: (aside) can you make them multiple feats rather than diff locations? when it comes out as das2xml. ad: need to mention to lincoln and berkeley folks. specify what the algorithm is to Topic: status reports --------------------- gh: doing writeback to allens writeback server. create new annot, edit location, add, remove, extend exons, can write them all back. keeps creating new features in the db instead of editing the ones that are there. plan: delete the old annot in the same doc that edits the new one. aday: so you're leaving lots of old annots around. aday: finishing touches. old uri - new uri mapping, so gregg knows. fixing bugs on writeback server. 
working on new das front end that takes incoming reqest , breaks down with modulus operation with configurable blocks size, filters the results, this is for caching. working well. can convert the typical 40-50s response times down to 7s on a single megabase region. takes a while to get cache populated. todo: automatically populate cache. add code to know when a block became stale, so server can flush cache to get new stuff. bo: refactor domain factor response. found lots of hardcoded logic. went back to refactor. one object that populates hash structure of objects, handles. support for wiki stuff from lincoln, unique coord identifiers. todo: go ahead and update test suite now out of date. coord filter needs to be added in. gh: server now supports full type uris and segment uris? bo: yes, in cvs. todo make rpm package and install on production server. gh: then public release of igb can start using full type uri. bo: can communicate with you on it. gh: congrats -- end of code sprint. good to get the writeback stuff going. spec changes are little, but feels very nailed down. ad: finished off action items from yesterday. timestamp. reference server implementation. ee: still working on gff3 parser. progress nothing to report. sc: updated affy probe set alignments for drosophila arrays to be based on dm2 on our das/1 server (Ann's request). Restarted server. Worked on updating the affy das server info page in progress. todo: update the das2_server with latest improvements committed by gregg, then test the new and improved bp2 format for exon data. will need to deal with array prefix used by netaffx ('1:') rather than as used in CHP files ('HuEx:'). Post-teleconference Discussion ------------------------------- gh: would you be willing to give up multiple locations in the spec? aday: would you be willing to give up bidirectional parent-child pointers? gh: let me think about it... From dalke at dalkescientific.com Fri Aug 18 20:44:07 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 18 Aug 2006 22:44:07 +0200 Subject: [DAS2] Notes from DAS/2 code sprint #3, day five, 18 Aug 2006 In-Reply-To: References: Message-ID: <3e168bac8dabbb6b9ee9cc10137ef368@dalkescientific.com> > Action items are flagged with '[A]'. I see there aren't any. > ad: need to mention to lincoln and berkeley folks. specify what the > algorithm is to I would have added this as an [A]. > Post-teleconference Discussion > ------------------------------- > > gh: would you be willing to give up multiple locations in the spec? > aday: would you be willing to give up bidirectional parent-child > pointers? > gh: let me think about it... Regarding bidi pointers (btw, should we change the tag to ?) As someone parsing GFF3 it's annoying that I have to keep any features with an ID hanging around until the end just in case someone wants to refer to it later as a parent. GFF3 does have a directive to allow flushing but so far I haven't come across a gff3 file which uses it. Having bidi links makes this trivial. As someone concerned about database integrity issue, what happens with a writeback which says "I am a child of X and Y" where X and Y were not previously connected. (Or "I am a parent of...", depending on the link direction.) Does the server allow that? Reject that? Regarding multiple locations, it's a data modeling issue. If a record Q has N multiple locations then it's identical to a record Q' with N children, Q1', Q2', ..., Qn'. The Qi' will have a different type record than Q'. 
As a plus, or minus, each Qi' will be annotatable, have it's own identifier, etc. which is not the case if features can have multiple locations. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Fri Aug 18 21:55:55 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 18 Aug 2006 23:55:55 +0200 Subject: [DAS2] feature locations Message-ID: [ I hope to hear a response before the end of the sprint today. ] For those not in the phone conference call today there were several issues which didn't get resolve regarding feature locations: 1) do we need multiple locations on a feature? (vs 0 or 1 location) (I argue this is mostly a data modeling issue as I can decompose anything to a set of features with at most 1 location.) 2) if a child has a location is its parent required to have locations which includes the child locations? (currently no) 3) if #2, is the parent required to have a single location per each segment? ie, if there are children on a given segment then the parent must have a single location on that segment where start_location <= min(children.start_location) end_location >= max(children.end_location) 4) how is the feature search done? Here's what I think is the problem question. Feature X is the parent of Y and Z with Y.location = (10,20) and Z.location = (50, 60) What do you get from an overlap(30, 40) search? In the way I've been thinking about it, this returns nothing. None of the features have locations which overlap that range. I gather that others want this to return {X,Y,Z} and do so because X should be assigned the location (10, 60). X cannot be location-less. I don't know enough DNA to give an example of something for which a location makes no sense. I think in proteins. Consider X = "catalytic site" with Y and Z denoting regions essential to catalysis. The section between Y and Z has nothing to do with "catalytic site". Automatically including that range in X makes no sense. For that matter, Y and Z may be on different segments. Hence I don't like #3. It doesn't make sense for some data types. (Now it may be that certain data types must work this way. But that's up to users of features of that type. A database could enforce those cases but a dumb database shouldn't be required to know all types.) Without the extra qualification of #3 then here's a dead simple way to implement #2 - parent_locations = { all of its children locations } Hence in my test case: Y has 1 location (10, 20) Z has 1 location (50, 60) ---> X has two locations (10, 20) and (50, 60) That perfectly agrees with #2. But only because we support multiple locations. We need multiple locations because we have features which span multiple segments. Hence the additional restriction required to make #3. If #2 is in place then I'll argue that a client should only put in the union of the regions because unless it knows the type it doesn't know if the min/max single location make sense. Please let me know if I'm on the right track before going onwards with search. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Fri Aug 18 22:03:50 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat, 19 Aug 2006 00:03:50 +0200 Subject: [DAS2] complex feature examples Message-ID: <503eb1ab1ac82327753e9d25c1fe119d@dalkescientific.com> I would like a dozen or two examples of features - especially complex features - in das2xml format. We discussed this last February but it didn't lead to anything. 
I think this would be useful for three reasons: - help make sure our spec works (any last details we forgot?) - provide better examples for the documentation - help new DAS people learn best practices Here are a few ideas: exons, blast results, haplotype block for a set of snps, probe sets, primer locations (including cut point?), predicted gene locations, repressor locations Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Fri Aug 18 23:44:05 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Fri, 18 Aug 2006 16:44:05 -0700 Subject: [DAS2] complex feature examples In-Reply-To: <503eb1ab1ac82327753e9d25c1fe119d@dalkescientific.com> Message-ID: Excellent idea Andrew. > From: Andrew Dalke > Date: Sat, 19 Aug 2006 00:03:50 +0200 > To: DAS/2 > Subject: [DAS2] complex feature examples > > I would like a dozen or two examples of features - especially > complex features - in das2xml format. We discussed this last > February but it didn't lead to anything. > > I think this would be useful for three reasons: > - help make sure our spec works (any last details we forgot?) > - provide better examples for the documentation > - help new DAS people learn best practices > > Here are a few ideas: > exons, blast results, haplotype block for a set of snps, > probe sets, primer locations (including cut point?), > predicted gene locations, repressor locations Adding to this: representing different types of alternative splicing. See this figure: http://genomebiology.com/2002/3/11/REVIEWS/0008/figure/F1 In the context of DAS, we should be able to deal with alt splicing in two contexts: 1) read-only context: representing sets of features belonging to a transcript that exhibits alt splicing, indicating which features belong to which variants and that the variants are related. 2) writeback context: being able to add alt splice information about a transcript which originally lacked any alt splicing information, and to cover this for the various classes of alt splicing. That should give the spec a good workout. Steve From Ed_Erwin at affymetrix.com Fri Aug 18 23:33:44 2006 From: Ed_Erwin at affymetrix.com (Erwin, Ed) Date: Fri, 18 Aug 2006 16:33:44 -0700 Subject: [DAS2] feature locations Message-ID: I think all of us this morning, except you, want 2) Yes, parent region must encompass all child regions 3) Yes, a single segment that encompasses all child regions 4) In your example: overlaps(30,40) returns the whole parent and child inside(30,40) returns neither the parent nor the child The user (client) is responsible for asking for things that make sense. For mRNA transcripts and exons, an overlaps query is sensible. Here is my two cents about the "catalytic site" you talk about.... I agree that a "catalytic site" such as you describe requires some thought. But it requires thought from the curator on how to describe it, not smartness of the DAS server itself. If the catalytic site is composed of parts of exons on a single mRNA, they should be maybe be put into a parent-child relationship. If different components of the catalytic site are on different mRNAs that fold-up and combine into a complex compound (like hemoglobin) then the parts that are on different mRNAs probably should be treated as different features. Or even more simply, there could be a feature type "catalytic site component" that can be a "part of" an exon. Anyway, that is *my* opinion. #2 Yes, #3 Yes, and #4 the annotator is responsible for being smart. 
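The disagreement is easy to see with Andrew's numbers. Under the answers above (#2/#3: the parent spans min..max of its children), X is effectively (10, 60) and the overlaps(30, 40) query returns the whole group; under Andrew's reading (X merely carries its children's two locations), nothing matches. A few lines of arithmetic, nothing more:

    def overlaps(start, end, qstart, qend):
        return start < qend and qstart < end

    Y, Z = (10, 20), (50, 60)

    # min..max reading: parent X gets the span of its children.
    X_span = (min(Y[0], Z[0]), max(Y[1], Z[1]))           # (10, 60)
    print(overlaps(*X_span, 30, 40))                       # True  -> return X, Y and Z

    # multiple-locations reading: X just carries the two child locations.
    X_locs = [Y, Z]
    print(any(overlaps(s, e, 30, 40) for s, e in X_locs))  # False -> return nothing

    # Either way, inside(30, 40) returns nothing, as stated above.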
I can at least see now why you think there might be a problem, but I don't agree that it is a problem. -----Original Message----- From: das2-bounces at lists.open-bio.org [mailto:das2-bounces at lists.open-bio.org] On Behalf Of Andrew Dalke Sent: Friday, August 18, 2006 2:56 PM To: DAS/2 Subject: [DAS2] feature locations [ I hope to hear a response before the end of the sprint today. ] For those not in the phone conference call today there were several issues which didn't get resolve regarding feature locations: 1) do we need multiple locations on a feature? (vs 0 or 1 location) (I argue this is mostly a data modeling issue as I can decompose anything to a set of features with at most 1 location.) 2) if a child has a location is its parent required to have locations which includes the child locations? (currently no) 3) if #2, is the parent required to have a single location per each segment? ie, if there are children on a given segment then the parent must have a single location on that segment where start_location <= min(children.start_location) end_location >= max(children.end_location) 4) how is the feature search done? Here's what I think is the problem question. Feature X is the parent of Y and Z with Y.location = (10,20) and Z.location = (50, 60) What do you get from an overlap(30, 40) search? In the way I've been thinking about it, this returns nothing. None of the features have locations which overlap that range. I gather that others want this to return {X,Y,Z} and do so because X should be assigned the location (10, 60). X cannot be location-less. I don't know enough DNA to give an example of something for which a location makes no sense. I think in proteins. Consider X = "catalytic site" with Y and Z denoting regions essential to catalysis. The section between Y and Z has nothing to do with "catalytic site". Automatically including that range in X makes no sense. For that matter, Y and Z may be on different segments. Hence I don't like #3. It doesn't make sense for some data types. (Now it may be that certain data types must work this way. But that's up to users of features of that type. A database could enforce those cases but a dumb database shouldn't be required to know all types.) Without the extra qualification of #3 then here's a dead simple way to implement #2 - parent_locations = { all of its children locations } Hence in my test case: Y has 1 location (10, 20) Z has 1 location (50, 60) ---> X has two locations (10, 20) and (50, 60) That perfectly agrees with #2. But only because we support multiple locations. We need multiple locations because we have features which span multiple segments. Hence the additional restriction required to make #3. If #2 is in place then I'll argue that a client should only put in the union of the regions because unless it knows the type it doesn't know if the min/max single location make sense. Please let me know if I'm on the right track before going onwards with search. Andrew dalke at dalkescientific.com _______________________________________________ DAS2 mailing list DAS2 at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/das2 From Ed_Erwin at affymetrix.com Sat Aug 19 00:16:25 2006 From: Ed_Erwin at affymetrix.com (Erwin, Ed) Date: Fri, 18 Aug 2006 17:16:25 -0700 Subject: [DAS2] complex feature examples Message-ID: Blechhhhhh! Just my 2 cents, but I think the graphics in that figure are more confusing than useful. In all those cases, it would be simpler to just show all the observed transcripts separately. 
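[Editorial aside: a minimal sketch (Python, purely illustrative) of the representation Ed is suggesting: each observed transcript is its own feature group, and splice variants are implied only by a shared gene name plus overlapping exons, with no explicit variant-to-variant pointers.]

    # Sketch only: two observed transcripts of a hypothetical gene, stored
    # independently; alternative splicing is implied, not modeled explicitly.
    from collections import defaultdict

    transcripts = [
        {"id": "tx1", "gene": "geneA", "exons": [(100, 200), (300, 400), (500, 600)]},
        {"id": "tx2", "gene": "geneA", "exons": [(100, 200), (500, 600)]},  # skips middle exon
    ]

    by_gene = defaultdict(list)
    for tx in transcripts:
        by_gene[tx["gene"]].append(tx["id"])

    for gene, ids in by_gene.items():
        if len(ids) > 1:
            print(gene, "has", len(ids), "observed transcripts:", ids)
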
In some cases, the number of possible transcripts may be very large, but so be it. There doesn't need to be any pointers relating the different transcripts to one another. They might be given similar, or the same, gene names, but the fact that alternative splicing is going is on clear from the fact that there are overlapping exons and doesn't need to be explicitly mentioned. (Textual annotations can say which type is seen in which tissue, etc.) -----Original Message----- From: das2-bounces at lists.open-bio.org [mailto:das2-bounces at lists.open-bio.org] On Behalf Of Steve Chervitz Sent: Friday, August 18, 2006 4:44 PM To: Andrew Dalke; DAS/2 Subject: Re: [DAS2] complex feature examples Excellent idea Andrew. > From: Andrew Dalke > Date: Sat, 19 Aug 2006 00:03:50 +0200 > To: DAS/2 > Subject: [DAS2] complex feature examples > > I would like a dozen or two examples of features - especially > complex features - in das2xml format. We discussed this last > February but it didn't lead to anything. > Adding to this: representing different types of alternative splicing. See this figure: http://genomebiology.com/2002/3/11/REVIEWS/0008/figure/F1 From Steve_Chervitz at affymetrix.com Sat Aug 19 00:21:10 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Fri, 18 Aug 2006 17:21:10 -0700 Subject: [DAS2] Notes from DAS/2 code sprint #3, day five, 18 Aug 2006 In-Reply-To: <3e168bac8dabbb6b9ee9cc10137ef368@dalkescientific.com> Message-ID: Andrew wrote: >> Action items are flagged with '[A]'. > > I see there aren't any. > >> ad: need to mention to lincoln and berkeley folks. specify what the >> algorithm is to > > > I would have added this as an [A]. Yep. The discussion was going fast and furious. Didn't have time to flag these, as I was trying to follow the discussion and contribute as well. Here's some actions items to add in retrospect: [A] Steve will set up an emacs macro for flagging action items easily [A] Andrew will go through the notes and identify action items Cheers, Steve From dalke at dalkescientific.com Sat Aug 19 02:49:27 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat, 19 Aug 2006 04:49:27 +0200 Subject: [DAS2] feature locations In-Reply-To: References: Message-ID: <9c4ce2413e467975628c5ba12ba64cfa@dalkescientific.com> Ed: > I think all of us this morning, except you, want > > 2) Yes, parent region must encompass all child regions > 3) Yes, a single segment that encompasses all child regions > 4) In your example: > overlaps(30,40) returns the whole parent and child > inside(30,40) returns neither the parent nor the child That's what I figured was the case. > The user (client) is responsible for asking for things that make sense. > For mRNA transcripts and exons, an overlaps query is sensible. Isn't the client also responsible for making sure the features makes sense? (Possibly validated in the server.) In the case which comes up most often - transcripts and exons - it makes sense that the client give locations to both the transcript and the exons. For that feature type doing #3 is right. I'm not convinced that it's correct for the general case. > Here is my two cents about the "catalytic site" you talk about.... I can come up with more examples in the protein world. "Surface residues". "S-S bonded residues". These don't require 3D structure for visualization. Eg, I should be able to see "surface residues" highlighted differently than others even on a 1D display. Useful when homology modeling. 
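[Editorial aside: a minimal sketch (Python; residue positions are made up) of why the union-of-children representation and the single bounding span required by rule #3 answer range queries differently for a scattered protein annotation such as "surface residues".]

    # Sketch only: a hypothetical "surface residues" annotation whose child
    # locations are scattered single residues (invented positions).
    children = [(5, 6), (41, 42), (87, 88), (120, 121)]

    union_parent = list(children)                        # parent = union of child locations
    bounding_parent = [(min(lo for lo, _ in children),
                        max(hi for _, hi in children))]  # parent = one spanning location

    def overlaps_any(locs, start, end):
        return any(lo < end and hi > start for lo, hi in locs)

    # A query over a stretch containing no annotated residue at all:
    print(overlaps_any(union_parent, 50, 70))     # False: no surface residue in range
    print(overlaps_any(bounding_parent, 50, 70))  # True:  the bounding span matches anyway
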
> I agree that a "catalytic site" such as you describe requires some > thought. But it requires thought from the curator on how to describe > it, not smartness of the DAS server itself. If the catalytic site is > composed of parts of exons on a single mRNA, they should be maybe be > put > into a parent-child relationship. If different components of the > catalytic site are on different mRNAs that fold-up and combine into a > complex compound (like hemoglobin) then the parts that are on different > mRNAs probably should be treated as different features. Or even more > simply, there could be a feature type "catalytic site component" that > can be a "part of" an exon. (Naming ambiguity: "treated as different features" or "treated as different feature groups"? Per today's discussion I would have them be different features in the same feature group.) Well, I was thinking of proteins, and an annotation which is more properly part of a structural assembly. To make my objections less needlessly complex, the site residues can all be on the same chain. For that case it still does not make sense to have a parent feature have a location across all intermediate residues. If a the two cysteines of a S-S bond are at 22 and 98 then an overlaps search of (30,50) should not return the S-S bond information. Arguing proteins is wrong because they are so small. Nearly everyone will download everything and not do range searches on the server. Perhaps that's why my intuition is leading me astray.... I've been trying to come up with some more DNA-centric examples. I really don't know the domain well enough. What about: Some genes have multiple promoters. EPD puts those into a "promoters group". See http://www.epd.isb-sib.ch/current/AP.html for the known cases. Here are three members from one group FP Rn IGF II E1P1 :+R EM:X17012.1 1+ 18227; 28008. 036*1 FP Rn IGF II E2P2 :+S EM:X17012.1 1+ 19978; 25032.137 036*2 FP Rn IGF II E3P3+:+S EM:X17012.1 1+ 21966; 25033.155 036*3 The docs at http://www.epd.isb-sib.ch/current/usrman.html say these have position numbers of 18227, 19978, 21966. Would it be reasonable to want to annotate this as a "promoters group" using a single DAS2 feature group? If so, should the parent include the portions between the three promoters? Genbank is notorious for its complex annotations. I looked for interesting things (non-gene/CDS/exon/intron records). Here are a few The D-loop from a cow's mitochondria http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? db=nucleotide&val=27543905 D-loop join(15791..16337,1..362) D-loops appear to be a feature where it does not makes sense to have the parent join the intermediate sequence. The cat mitochondria record (I"m scanning gbmam hence cow and cat) at http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=1098523 has a feature misc_feature join(16315..17009,1..865) /note="control region; CR" but I can't figure out what that means. Jumping to another file, here's one from Tobacco leaf curl Japan virus http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=8096283 stem_loop join(1..19,2754..2761) That's a nice structural example. Strange that it's in two sections. Perhaps that only works because the first section is terminal? This example points out a class of RNA and ssDNA annotations on shape, like pseudoknots, which are essentially structural. Oh, and then there are functional RNA structures like ribozymes structures where you might annotate the functional regions, but that's back to the realm of the small. 
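[Editorial aside: to make the D-loop example above concrete, here is a minimal sketch (Python) of that record's join(15791..16337,1..362) as one feature with two locations; the genome length used for the percentage is an assumption for scale.]

    # Sketch only: the cow mitochondrial D-loop quoted above, modeled as a
    # single feature with two locations because it wraps around the origin
    # of a circular genome.
    d_loop = {
        "type": "D-loop",
        "locations": [(15791, 16337), (1, 362)],
    }

    covered = sum(hi - lo + 1 for lo, hi in d_loop["locations"])
    bounding = (min(lo for lo, _ in d_loop["locations"]),
                max(hi for _, hi in d_loop["locations"]))
    genome_len = 16338  # assumed mitochondrial genome length, for scale only

    print("actual coverage: %d bp" % covered)  # 909 bp
    print("min/max bounding span: %d..%d (%d bp, ~%.0f%% of the genome)"
          % (bounding[0], bounding[1], bounding[1] - bounding[0] + 1,
             100.0 * (bounding[1] - bounding[0] + 1) / genome_len))

A single spanning parent location here would cover essentially the whole genome, which is why a mandatory bounding parent is awkward for wrap-around features.
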
I have managed to convince myself that the difference in viewpoints is because of a difference in molecular expectations. DNA really doesn't do all that much. It sits there and gets transcribed. There are some structurally interesting regions but nothing like what protein has or does. RNA and ssDNA are more interesting, but they are small. I did come across a paper titled "DNA supercoiling allows enhancer action over a large distance" where it was best to think of the 3D structure of DNA, but that sort of thing is rare. How portable should the FEATURE structure from DAS2 be for 2D protein annotations? In the way I've been thinking of it it's quite portable. With this "parent locations must overlap all children's locations" restriction everything but the leaf locations will likely be useless blobs in protein annotations. > Anyway, that is *my* opinion. #2 Yes, #3 Yes, and #4 the annotator is > responsible for being smart. > > I can at least see now why you think there might be a problem, but I > don't agree that it is a problem. As #3 is trivially computed from the data, the only difference I can see must be in the results from range searches done on the server. I'll write about that some other time. This email is long enough. I'm off to bed. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Sat Aug 19 13:44:47 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat, 19 Aug 2006 15:44:47 +0200 Subject: [DAS2] feature search algorithm (was Re: feature locations) In-Reply-To: References: Message-ID: <7e9ae58f481d989f3873acc9dd7c1159@dalkescientific.com> Given a database: Foo: (10, 60) Bar: (10, 20) Baz: (50, 60) I understand that everyone wants "overlaps(30,40)" to return {Foo, Bar, Baz}. That includes me. I question the need for the requirement that parent locations include all of the children locations. Putting that aside for now I have a question about the above structure, which we all agree is valid. What does the search overlaps(30,40) and title == "Foo" return? I think it should return nothing. There are no features named "Foo" in that range. If I understand you all correctly it should return {Foo, Bar, Baz} because the overlaps search is only done on the root feature, returning all features in the feature group, while the title search is done on on a per-feature basis. How is the server search algorithm supposed to work? Given a range search "in_range(feature)" and non-range search "is_match(feature)" (for things like title, type, etc.) then the current search algorithm can be expressed: find all features X where: feature X is in the same feature group as feature Y where: in_range(Y) and is_match(Y) As I understand from you all, the search algorithm should be find all features X where both: - feature X is in the same feature group as Y where: Y is a root element and in_range(Y) - feature X is in the same feature group as Z where: is_match(Z) Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Sat Aug 19 14:04:31 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat, 19 Aug 2006 16:04:31 +0200 Subject: [DAS2] locations and writeback behavior Message-ID: <473730ce334d4768aeaf34444d7b8d7f@dalkescientific.com> Assume there is the requirement that parent node locations must cover children locations. What does the server do on writeback for the following circumstances: Case #1. the parent feature has no locations Foo: -- no locations -- Bar: (10, 20) Baz: (50, 60) Here are 6 possibilities: 1. reject the writeback 2. 
accept it and use an implicit location of (10, 60) (implicit means the record is not modified and clients downloading feature Foo will not get any locations for it but the server will act as if it was present.) 3. accept it and use an implicit location of [(10, 20), (50, 60)] 4. accept it and insert the location (10, 60) (explict; clients fetching feature Foo will see the server inserted locations) 5. accept it and explicitly insert the locations [(10, 20), (50, 60)] 6. accept it unchanged; range searches will always fail because the root node has no locations Case #2. the parent feature has a location which does not overlap all of the children Foo: (15, 85) Bar: (10, 20) Baz: (50, 60) Case #3: the parent has multiple locations; the parent's locations overlap those of the children Foo: (10, 30), (40, 66), (543, 567) Bar: (10, 20) Baz: (50, 60) Case #4: the parent has a single location which is broader than those of the children Foo: (10, 567) Bar: (10, 20) Baz: (50, 60) Case #5: the children contain multiple locations, the parent covers them all Foo: (10, 100) Bar: (10, 20), (22, 24) Baz: (50, 60), (70, 80) The server already does some validation for cyclic detection. It can easily check for ranges as well. As I understand things the answers should be: Case #1: reject (parent must cover all children locations) Case #2: reject (parent must cover all children locations) Case #3: reject (parent can only have a single location per segment) Case #4: accept, and use the broader range Case #5: accept (leaves and leaves only may have multiple locations on the same segment) and the reasons for these answers are: - it doesn't make sense to have location-less parents when the children don't have locations - it makes the search algorithm work correctly Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Sat Aug 19 14:18:36 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat, 19 Aug 2006 16:18:36 +0200 Subject: [DAS2] feature search algorithm (was Re: feature locations) In-Reply-To: <7e9ae58f481d989f3873acc9dd7c1159@dalkescientific.com> References: <7e9ae58f481d989f3873acc9dd7c1159@dalkescientific.com> Message-ID: > Given a database: > > Foo: (10, 60) > Bar: (10, 20) > Baz: (50, 60) I'll modify it a bit more. Make it be Foo: type=transcript, location=(10, 60) Bar: type=exon, location=(10, 20) Baz: type=exon, location=(50, 60) What does the search for overlaps(30,40), type==exon, title==Foo return and why? I can think of three answers: 1) return everything because in the feature group there is a feature which overlaps(30,40) there is a feature which is of type exon there is a feature with title "Foo" (call this the "each query term must match at least one feature in a feature group" algorithm) 2) return nothing because there is no feature which overlaps(30,40) and has type exon and has title "Foo" (call this the "at least one feature must be matched by all query terms" algorithm. This is the current algorithm) 3) return nothing because while the root feature overlaps(30,40) there is no feature which is both of type exon and with title "Foo". (call this the "range searches are special" algorithm.) Now what does the search for overlaps(30,40), type==exon, title==Bar return and why? Using the same three algorithms: 1) return everything because each of the three criteria are matched by at least one feature in the feature group 2) return nothing because no feature matches all three criteria. 
3) return everything because the root feature overlaps(30,40) and the Bar feature meets the other two criteria. Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Mon Aug 21 15:46:29 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 21 Aug 2006 08:46:29 -0700 Subject: [DAS2] DAS/2 teleconference today, 9:30 AM Message-ID: We're back to our regular Monday DAS/2 teleconference today, at 9:30 AM. Mainly I'd like to summarize progress during the code sprint and discuss the few remaining spec issues. thanks, Gregg From Gregg_Helt at affymetrix.com Mon Aug 21 16:28:28 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 21 Aug 2006 09:28:28 -0700 Subject: [DAS2] feature search algorithm (was Re: feature locations) Message-ID: 4) Return the entire feature group because in the feature group: a) a location of the root of the group overlaps (30,40) b) there is a feature of type exon c) there is a feature with title "Foo" Pushing this farther: Foo: type=transcript, location=(10, 60) Bip: type=polyA-site location = (50,55) Bar: type=exon, location=(10, 20) Baz: type=exon, location=(50, 60) Search: overlaps(30,40), type=polyA-site, title=Baz Also returns the feature group because: Foo meets root overlaps(30,40) Bip meets type=polyA-site Baz meets title=Baz In trying to work backwards from what I feel multi-filter queries should return, here's the rules that seem to give me what I want: a) For range filters, the feature group passes the filter if the root of the feature group meets the range requirement. b) For non-range filters, the feature group passes the filter if any feature in the feature group meets the filter requirement. c) All filters are AND'd together gregg > -----Original Message----- > From: das2-bounces at lists.open-bio.org [mailto:das2-bounces at lists.open- > bio.org] On Behalf Of Andrew Dalke > Sent: Saturday, August 19, 2006 7:19 AM > To: DAS/2 > Subject: Re: [DAS2] feature search algorithm (was Re: feature locations) > > > Given a database: > > > > Foo: (10, 60) > > Bar: (10, 20) > > Baz: (50, 60) > > I'll modify it a bit more. Make it be > > Foo: type=transcript, location=(10, 60) > Bar: type=exon, location=(10, 20) > Baz: type=exon, location=(50, 60)> > What does the search for > > overlaps(30,40), type==exon, title==Foo > > return and why? I can think of three answers: > > 1) return everything because in the feature group > > there is a feature which overlaps(30,40) > there is a feature which is of type exon > there is a feature with title "Foo" > > (call this the "each query term must match at least > one feature in a feature group" algorithm) > > 2) return nothing because there is no feature > which overlaps(30,40) and has type exon and > has title "Foo" > > (call this the "at least one feature must be > matched by all query terms" algorithm. This is > the current algorithm) > > 3) return nothing because while the root feature > overlaps(30,40) there is no feature which is both > of type exon and with title "Foo". > > (call this the "range searches are special" algorithm.) > > > Now what does the search for > > overlaps(30,40), type==exon, title==Bar > > return and why? Using the same three algorithms: > > 1) return everything because each of the three > criteria are matched by at least one feature in > the feature group > > 2) return nothing because no feature matches all > three criteria. > > 3) return everything because the root feature > overlaps(30,40) and the Bar feature meets the > other two criteria. 
> > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 From lstein at cshl.edu Mon Aug 21 18:50:48 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 21 Aug 2006 14:50:48 -0400 Subject: [DAS2] Fwd: locations and writeback behavior In-Reply-To: References: Message-ID: <6dce9a0b0608211150k465399dfx7147d8f90029a807@mail.gmail.com> ---------- Forwarded message ---------- From: das2-owner at lists.open-bio.org Date: Aug 21, 2006 2:49 PM Subject: Re: [DAS2] locations and writeback behavior To: lincoln.stein at gmail.com You are not allowed to post to this mailing list, and your message has been automatically rejected. If you think that your messages are being rejected in error, contact the mailing list owner at das2-owner at lists.open-bio.org. ---------- Forwarded message ---------- From: "Lincoln Stein" To: "Andrew Dalke" Date: Mon, 21 Aug 2006 14:48:55 -0400 Subject: Re: [DAS2] locations and writeback behavior I think that the server should accept the locations given for features without checking that the children are contained within their parents' coordinates. This is because there are genomic features that are discontinuous. Lincoln On 8/19/06, Andrew Dalke wrote: > > Assume there is the requirement that parent node locations must > cover children locations. What does the server do on writeback > for the following circumstances: > > Case #1. the parent feature has no locations > > Foo: -- no locations -- > Bar: (10, 20) > Baz: (50, 60) > > Here are 6 possibilities: > 1. reject the writeback > 2. accept it and use an implicit location of (10, 60) > (implicit means the record is not modified and clients > downloading feature Foo will not get any locations for it > but the server will act as if it was present.) > 3. accept it and use an implicit location of [(10, 20), (50, 60)] > 4. accept it and insert the location (10, 60) (explict; clients > fetching feature Foo will see the server inserted locations) > 5. accept it and explicitly insert the locations [(10, 20), (50, 60)] > 6. accept it unchanged; range searches will always fail because > the root node has no locations > > Case #2. the parent feature has a location which does not overlap > all of the children > > Foo: (15, 85) > Bar: (10, 20) > Baz: (50, 60) > > > Case #3: the parent has multiple locations; the parent's locations > overlap those of the children > > Foo: (10, 30), (40, 66), (543, 567) > Bar: (10, 20) > Baz: (50, 60) > > > Case #4: the parent has a single location which is broader than > those of the children > > Foo: (10, 567) > Bar: (10, 20) > Baz: (50, 60) > > > Case #5: the children contain multiple locations, the parent > covers them all > Foo: (10, 100) > Bar: (10, 20), (22, 24) > Baz: (50, 60), (70, 80) > > > The server already does some validation for cyclic detection. > It can easily check for ranges as well. 
As I understand things > the answers should be: > Case #1: reject (parent must cover all children locations) > Case #2: reject (parent must cover all children locations) > Case #3: reject (parent can only have a single location per segment) > Case #4: accept, and use the broader range > Case #5: accept (leaves and leaves only may have multiple > locations on the same segment) > > and the reasons for these answers are: > - it doesn't make sense to have location-less parents when > the children don't have locations > - it makes the search algorithm work correctly > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Mon Aug 21 19:17:14 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 21 Aug 2006 15:17:14 -0400 Subject: [DAS2] feature locations Message-ID: <6dce9a0b0608211217s7b67d889q8f0570a378a5b2e5@mail.gmail.com> From: "Lincoln Stein" To: "Andrew Dalke" Date: Mon, 21 Aug 2006 15:04:09 -0400 Subject: Re: [DAS2] feature locations On 8/18/06, Andrew Dalke wrote: > > [ I hope to hear a response before the end of the sprint today. ] > > For those not in the phone conference call today there were several > issues which didn't get resolve regarding feature locations: > > 1) do we need multiple locations on a feature? (vs 0 or 1 location) > (I argue this is mostly a data modeling issue as I can > decompose anything to a set of features with at most 1 > location.) Yes, because a feature may be discontinuous. This feature won't be used very often, however, and simple servers might simply refuse to handle such features. 2) if a child has a location is its parent required to have > locations which includes the child locations? (currently no) No. Parent/child relationships are defined by functional/biological relationships and not by genomic coordinates. For example, a C. elegans transcript is assembled from discontinuous regions of the genome (the mRNA on one chromosome, the spliced leader on the other), and enforcing restriction (2) would make it impossible to represent nematode genomes, the most populous multicellular organism on earth. 3) if #2, is the parent required to have a single location per > each segment? ie, if there are children on a given segment > then the parent must have a single location on that segment where > start_location <= min(children.start_location) > end_location >= max(children.end_location) N/A 4) how is the feature search done? A feature may have multiple locations. If any of its locations matches the range query, then the feature, plus its parents and children, is returned. There is no "transitive" matching. That is, if the query consists of a feature type plus a range, then IT IS NOT appropriate to return a feature if its child matches the range and the feature itself matches the type. The query should only return a feature if both the feature's type and location matches. 
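[Editorial aside: a minimal sketch (Python, over an invented in-memory feature table) of the matching rule Lincoln describes: every filter must be satisfied by the same feature's own type and locations, and the hits are then expanded to include their parents and children, with no transitive matching.]

    # Sketch only: Lincoln's rule above, applied to an invented feature table.
    features = {
        "X": {"type": "transcript", "parent": None, "locs": [(10, 60)]},
        "Y": {"type": "exon",       "parent": "X",  "locs": [(10, 20)]},
        "Z": {"type": "exon",       "parent": "X",  "locs": [(50, 60)]},
    }

    def overlaps(locs, start, end):
        return any(lo < end and hi > start for lo, hi in locs)

    def search(ftype, start, end):
        # A feature is a hit only if ITS OWN type and locations satisfy
        # every filter; no credit is given for matches by relatives.
        hits = {fid for fid, f in features.items()
                if f["type"] == ftype and overlaps(f["locs"], start, end)}
        # Hits are returned together with their parents and children.
        expanded = set(hits)
        for fid in hits:
            if features[fid]["parent"]:
                expanded.add(features[fid]["parent"])
            expanded.update(cid for cid, f in features.items() if f["parent"] == fid)
        return expanded

    print(sorted(search("exon", 30, 40)))        # []: no exon itself overlaps 30..40
    print(sorted(search("exon", 15, 18)))        # ['X', 'Y']: Y matches; parent X comes along
    print(sorted(search("transcript", 30, 40)))  # ['X', 'Y', 'Z']: X matches; children come along
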
Lincoln Here's what I think is the problem question. > > Feature X is the parent of Y and Z with > Y.location = (10,20) and Z.location = (50, 60) > > What do you get from an overlap(30, 40) search? > > In the way I've been thinking about it, this returns nothing. None > of the features have locations which overlap that range. > > I gather that others want this to return {X,Y,Z} and do so > because X should be assigned the location (10, 60). X cannot > be location-less. > > > I don't know enough DNA to give an example of something for > which a location makes no sense. I think in proteins. Consider > X = "catalytic site" with Y and Z denoting regions essential > to catalysis. > > The section between Y and Z has nothing to do with "catalytic > site". Automatically including that range in X makes no sense. > For that matter, Y and Z may be on different segments. > > Hence I don't like #3. It doesn't make sense for some data types. > (Now it may be that certain data types must work this way. But > that's up to users of features of that type. A database could > enforce those cases but a dumb database shouldn't be required to > know all types.) > > > Without the extra qualification of #3 then here's a dead simple > way to implement #2 - > > parent_locations = { all of its children locations } > > Hence in my test case: > Y has 1 location (10, 20) > Z has 1 location (50, 60) > ---> X has two locations (10, 20) and (50, 60) > > That perfectly agrees with #2. But only because we support > multiple locations. We need multiple locations because > we have features which span multiple segments. Hence the > additional restriction required to make #3. > > If #2 is in place then I'll argue that a client should > only put in the union of the regions because unless it > knows the type it doesn't know if the min/max single > location make sense. > > > Please let me know if I'm on the right track before going > onwards with search. > > Andrew > dalke at dalkescientific.com > > ______________________________ _________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From dalke at dalkescientific.com Mon Aug 21 20:24:46 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 21 Aug 2006 22:24:46 +0200 Subject: [DAS2] xml:base subtleties Message-ID: DAS2 support extensions through non-das2: namespaced XML elements. ... DAS2 supports xml:base ... For both IMAGE and MOVIE the link attribute is expanded via the enclosing xml:base, for clients which know about those elements. When a client gets the record it's supposed to treat any extension elements as character blobs and send them opaquely in the writeback. At least that's my thought. I don't know if you all have a different opinion. I believe this approach gives the best chance for compatibility so a server can serve things the client doesn't know about and still let the client do writeback for the things it does know about. (Another approach is to send diff/edit commands to the server, which we decided against. 
For good reasons.) Suppose there's a change to the above record. What does the client send back? The following is obviously valid but it requires the server keep track of the xml:base attributes at different levels ... change to the 'name' field ... The following should also be valid. It collapses the top-level xml:base into the FEATURE-level. ... change to the 'name' field ... It's even possible to get rid of xml:base for all the fields the client knows about. This will be most likely for servers which convert the XML into a class/object model and flatten the incoming URIs while doing so. ... change to the 'name' field ... The question is, what does the client do with the non-DAS elements where it doesn't know what to do. Should it insert xml:base attributes in them? Or is the best practice to persist the xml:base on a per-feature basis and always include the xml:base in the writeback? My feeling is it's the last paragraph. I really want the clients to treat extensions as mostly opaque things. (They can assume it's okay to remove instructions, comments, etc. and collapse whitespace.) Okay, now the other way around. Suppose the server is configured to treat unknown extensions as blobs. What does it do with the xml:base attributes? In the following the DAS2 URIs are all absolute URIs so the DAS2-specific code never even looks at the xml:base attributes. ... change to the 'name' field ... Normally on writeback the server inserts xml:base attribute into the FEATURES (and FEATURE in this case) document. It can't when the client sent in the above structure. At best it can collapse the xml:base attributes inward so they only apply to the non-DAS elements. That is, turn the above into ... change to the 'name' field ... That's not complicated but it is finicky. To summarize: if we have xml:base and support for blob elements then 1) the client must preserve xml:base on writeback 2) the server must fix up the writeback to make sure it does not conflict with the server's use of xml:base Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Mon Aug 21 21:42:30 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 21 Aug 2006 14:42:30 -0700 Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 21 Aug 2006 Message-ID: Notes from the weekly DAS/2 teleconference, 21 Aug 2006 $Id: das2-teleconf-2006-08-21.txt,v 1.1 2006/08/21 21:00:01 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Gregg Helt CSHL: Lincoln Stein Dalke Scientific: Andrew Dalke UCLA: Allen Day, Brian O'Connor Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Agenda: -------- Summarize progress during last week's code sprint and discuss the few remaining spec issues. Topic: Spec Discussion ---------------------- [The note taker apologizes for attending late (~30min)] gh: could a server in the types doc restrict the types. just say 'transcripts'? ls: yes. 
if not going to allow for searching for feature, only via parent, then types doc should only include parent. gh: types doc specifies which types you can query on. ls: ontology gives you access to all types that might come back ad: and how to depict them. gh: yes, but it can be restrictive of the types. ad: what does client do to display it? gh: implies we separate out style into stylesheet info again. no one is serving or using, so we can change w/o major impl changes. ad: type doc ties a feature to ontology, how to display it, and includes this extra source field. gh: types doc has all types server contains but tags as to what the server allows searching on. ad: feels weird. can't see why i'd want to do in my server. bo: better than limiting the types doc, just have a searchable field. ad: easy gh: if you don't say no, then it's searchable. this is backwards compatible. gh: other thing: for my optimization on client to work, need hint about particular type on a server can have children outside bounds of parent. or need the opposite: that all children are guaranteed to be within bounds. ad: can't see why this is needed. gh: can you trust me on it? ad: no. going back to the case where you ask for introns in this range and you want to return back everything. gh: the reason i need it: if children are outside bounds of parents, and i do query on parent, i never know if I'll get children outside the bounds i specified. messes up my optimization. ad: it will give you the children. gh: i want to optimize so that i don't have to get that back. i want assurance that there won't be something hanging off the region in the query. that there won't be anything outside the range that I queried. ls: that's always the case. you can do the query that somethings are outside the region you requested. you can filter things out. gh: i don't want server to send them. ls: semantically correct to always send the complete object. gh: there are optimizations on client that depend on it. ls: this will give you back more than you want. gh: i don't want to have to it filter out (defeats optimization). ad: range search for id=abc. you'll get all features in feat group id=abc. ad: modify servers gh: not so easy for servers I don't control. ls: you won't be able to convince worm or microbial communities which have features with different locations, some that are in trans on different chromosomes. gh: blat, blast, genscan, etc. majority of algorithmic seq will meet that condition. ls: if you feel comfortable going thru SO and flagging all features that meet that requirement, we can add to SO and you can use it in your optimization. gh: not necessary to modify SO. no blat, blast in SO ls: yes there are: computational matches gh: not all comp matches ls: we do have blast, you can add blat. ad: type ontology has extension area, you can add that. gh: no one will live with that. gh: will try on my server, see how it works. [A] Gregg will try flagging types on server, see if works with client optimizations ls: i have to go. gh: this will change all impls, could be trouble. ad: why does it change server impl at all? gh: where filter range applies only to nodes that meet the type filter. ls: that's the way it's in the spec now. ad: for any filter aday: if you match a range it's root feature that matches range, can reduce overhead by factor of 10-20. ls: aday: won't trigger range query because type doesn't match ls: searching over range you'd pick up exon because it's contained in the range. 
ad: if your server or allen's decides to model all stuctures by your logic, it won't work. there are occasions where you will have non overlapping impls in the server. gh: your right. to allen: does this affect your server impl? gh: proposal is to clarify spec to say that range queries apply only to the nodes of a feat group that pass the types filter. ad: range and non-range filters must both be true for a given feature gh: ok, as long as we can say in types doc that some types are not filtered. aday: gh: if searching for types=exon in range that's in the intron, gh: exon 1 in feature group, if it's outside range. aday: this is the way I've impl'd: find things in range, see if they match, then look for other things in other filters match. all filters operate on feature granularity, except range that operate on feature group granularity. all parents are located and encompass min/max bounds of encompassed features. gh: you get more things passing range query, but they get stopped by name, or type, or id query. ad: i'm happy with it. gh: i'm not but will go along. bo: have to leave now. [A] andrew will clarify range and type filtering logic in the spec [A] andrew will introduce concept of feature group (currently in spec as 'complex feat with children') [A] andrew will add searchable flag to type document [A] andrew will add optional circularity flag to segments document gh: Something we need todo: come back to stylesheet issue. ad: we should have impl in place before making spec work. [A] Discuss stylesheets when we have an impl in place Topic: Summarize code sprint work ---------------------------------- Focus on what people did last Friday (last day of sprint). gh: more complete write back on client. sync data model with how writeback is working: delete feat group, add back with change. then hit wall where that triggers issues in how to deal with undo/redo in client. I then did a massive chart on wall for how to deal with, now have a clear path forward. ad: Here's another issue: xml:base in writeback doc and how it interacts with extensions. server may not know extension is to be supported in writeback doc. e.g., link to image url. if xml:base in writeback doc, then you have to make sure the context of the extension that may have relative urls still preserver xml:base. seems ugly. do we say servers are free to ignore xml:base? gh: they should preserve it. ad: so if my writeback doc says features is http://biodas.org. feautures has a different one, individual features have different ones, and extensions has a different one. my impl would ignore xml:base in the data. (too complex to explain...) [A] Andrew will describe his xml:base issue with writeback and send email ad: worked on getting search algorithm to work. came up with counter examples re: parent element containing/not containing children. sc: mostly worked on notes and catching up with mailing list. Some todo items: [A] Steve Verify with Ann about new dm2-based affy das server data. [A] Steve Finish info page for data hosted by affy das servers. [A] Steve Update affy das/2 server to test new binary exon data (bp2) [A] Steve Add id to wiki page for new drosophila assembly (R5) aday: working on getting block translation server up and running. close. code to automatically set up caching and staling out the blocks. geting binary set up for onlth fly analysis servers. primer3, ncbi ePCR, blat, blast binaries on server. now need to install blat/blast dbs, can start serving up analyses. 
ee: [not present, but heard from Ed after meeting] - continuing work on gff3 parser for IGB client. [A] Next teleconf in two weeks (4 Sep 2006) gh: we had a successful sprint, hashed out critical decisions in the spec, got a lot of work done. [A] Next code sprint in Healdsburg at Helt Retreat Center. Possible date? Not until end of year or beginning of next year (lots of construction in town). From Gregg_Helt at affymetrix.com Mon Aug 28 13:24:39 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 28 Aug 2006 06:24:39 -0700 Subject: [DAS2] DAS/2 teleconference now held biweekly Message-ID: As discussed during the code sprint and last week's teleconference, the DAS/2 teleconference is being rescheduled for once every two weeks rather than every week. So no teleconference this week, and since next Monday is a holiday in the US, no teleconference next week either. The next DAS/2 teleconference will be held Monday, September 11 at 9:30 AM Pacific time. Talk to you then! thanks, Gregg From lstein at cshl.edu Mon Aug 28 14:32:03 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 28 Aug 2006 10:32:03 -0400 Subject: [DAS2] Possibly dialing in late today Message-ID: <6dce9a0b0608280732r3363e362ue967b07a74608b02@mail.gmail.com> Hi All, I have a doctor's appointment just beforehand, so I may be a little late calling in today. Lincoln -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu