From ap3 at sanger.ac.uk Wed Feb 1 07:42:16 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Wed, 1 Feb 2006 12:42:16 +0000
Subject: [DAS2] code sprint final infos
Message-ID: <11a8e78c4d35855959fa41eb23a4039d@sanger.ac.uk>

Hi!

This is to provide the final organizational details for the DAS 2 code
sprint next week.

- We start Monday at 10:00 (Sanger time) in the Morgan building. The
  meeting point is the small meeting room next to the kitchen on the
  1st floor (we get a better room later).

- The Sanger guest wireless network supports Skype, so instant
  messaging and voice-over-IP calls will be possible the whole time.

- Every day at 17:00 (Sanger time = 9:00 Pacific time) there will be a
  conference call on the usual DAS2 line.

Greetings,
Andreas

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                   Hinxton, Cambridge CB10 1SA, UK
                   +44 (0) 1223 49 6891

From allenday at ucla.edu Wed Feb 1 17:42:26 2006
From: allenday at ucla.edu (Allen Day)
Date: Wed, 1 Feb 2006 14:42:26 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To:
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
 <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
 <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
 <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
 <17368.18146.195226.166165@kinked.lbl.gov>
 <17369.24994.880706.685148@kinked.lbl.gov>
 <24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
Message-ID:

I just looked over your changes, and will begin making the changes to the
server repository today.

I'd like to update the server at das.biopackages.net with my changes on
Friday, unless there are objections.

I'll be taking notes along the way and will post to the list if anything
in your document is unclear to me.

At first glance, I agree -- the changes are minor.

-Allen

On Mon, 30 Jan 2006, Andrew Dalke wrote:

> Allen:
> > Is the spec going to be in a stable state for the code sprint? I'd like
> > to use this time to sync the server implementation with a stable version
> > of the spec. It looks like there have been many substantial changes.
>
> I have just (within the last few minutes) completed the first draft
> of the update of the spec.
>
> It's not in HTML - that calls for too much work for this stage.
> It's text, in CVS under das/das2/new_spec.txt
>
> There are many parts which need clarification. These are marked
> with an "XXX" along with my comments.
>
> The RNC files are in
>
>    das/das2/scratch/*.rnc
>
> along with some test XML files. These XML files are not meant
> to be realistic. They are meant more to check edge cases.
>
> I do not think there are major changes to the spec. Most of the
> changes have actually trimmed things down, like getting rid of
> the "properties" subtree and merging the different "sources" requests
> into a single document.
>
>
> Here are the major interfaces
>
> $PREFIX/sequence - a "sources" request
>     This is the top-level entry point to a DAS 2 server. It returns a
>     list of the available genomic sequences and their versions.
>     [sequence-namespace]
>
> $PREFIX/sequence/$SOURCE - a "source" request
>     Returns the available versions of the given genomic sequence.
>
> $PREFIX/sequence/$SOURCE/$VERSION - a "versioned source" request
>     Returns information about a given version of a genomic sequence.
>     Clients may assume that the sequence and assembly are constant for a
>     given version of a source. Note that annotation data on a server
>     with curational write-back support may change without changing the
>     version.
>
>
> For a given version here are the sub-parts. Note that I've gone ahead
> and split the query URLs (segment, features and types each have query
> interfaces) from the base directory used as containers for the segments,
> features and types.
>
>   $VERSION/segments - the segments query URL; summarizes the top-level
>     segments in the data source
>
>   $VERSION/segment/$SEGMENT_ID - a segment query; used to get detailed
>     information about the identified segment
>
>   $VERSION/features - the feature filter query URL. Features are
>     locatable annotations or experimental results. The feature filter
>     URL supports query parameters to select a subset of the features
>     based on position, feature type and other properties.
>
>   $VERSION/feature/$FEATURE_ID - a feature query; used to get detailed
>     information about the identified feature
>
>   $VERSION/types - the types query URL which returns a list of all
>     feature types. Feature types include ontology and depiction
>     details for all features of the given type.
>
>   $VERSION/type/$TYPE_ID - details about the specified feature type
>
> Oh, and there are internal conflicts which will be straightened
> out in the next draft. These shouldn't be big.
>
>     Andrew
>     dalke at dalkescientific.com
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2
>

From Gregg_Helt at affymetrix.com Wed Feb 1 18:14:30 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 1 Feb 2006 15:14:30 -0800
Subject: [DAS2] Re: Apollo and DAS/2 priorities
Message-ID:

That would be great if you could update the biopackages server before
the code sprint starts! Then client implementers will have a server to
test with.

    thanks,
    gregg

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Allen Day
> Sent: Wednesday, February 01, 2006 2:42 PM
> To: Andrew Dalke
> Cc: DAS/2
> Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities
>
> I just looked over your changes, and will begin making the changes to the
> server repository today.
From allenday at ucla.edu Wed Feb 1 18:27:11 2006
From: allenday at ucla.edu (Allen Day)
Date: Wed, 1 Feb 2006 15:27:11 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To:
References:
Message-ID:

That's what I was thinking too, but I was worried about the existing
Genoviz clients "in the wild" having the server suddenly break.

So you're saying it's okay with you if those clients have a service
interruption?

-Allen

On Wed, 1 Feb 2006, Helt,Gregg wrote:

> That would be great if you could update the biopackages server before
> the code sprint starts! Then client implementers will have a server to
> test with.
>
>     thanks,
>     gregg
From allenday at ucla.edu Wed Feb 1 18:30:22 2006
From: allenday at ucla.edu (Allen Day)
Date: Wed, 1 Feb 2006 15:30:22 -0800 (PST)
Subject: [DAS2] code sprint final infos
In-Reply-To: <11a8e78c4d35855959fa41eb23a4039d@sanger.ac.uk>
References: <11a8e78c4d35855959fa41eb23a4039d@sanger.ac.uk>
Message-ID:

What IM service are we using, and where can we collate all user IDs?
Perhaps it would be better to meet up in an IRC channel.

I propose gathering in #codesprint on EFnet.

-Allen

On Wed, 1 Feb 2006, Andreas Prlic wrote:

> - The sanger guest wireless network supports Skype. so instant
> messaging and voice over IP calls
> will be possible during all the time.

From nomi at fruitfly.org Wed Feb 1 19:37:44 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Wed, 1 Feb 2006 16:37:44 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To:
References:
Message-ID: <17377.21592.854840.243376@kinked.lbl.gov>

On 1 February 2006, Helt,Gregg wrote:
> That would be great if you could update the biopackages server before
> the code sprint starts! Then client implementers will have a server to
> test with.

yes!!

On 1 February 2006, Allen Day wrote:
> That's what I was thinking too, but I was worried about the existing
> Genoviz clients "in the wild" having the server suddenly break.

are there really a lot of users (as opposed to das developers) who are
using the biopackages server?

On 1 February 2006, Allen Day wrote:
> What IM service are we using, and where can we collate all user IDs?
> Perhaps it would be better to meet up in an IRC channel.
>
> I propose gathering in #codesprint on EFnet.

i need details on this as well. i've never bothered registering for an
IM service or IRC channel.

Nomi

From ed_erwin at affymetrix.com Wed Feb 1 18:44:35 2006
From: ed_erwin at affymetrix.com (Ed Erwin)
Date: Wed, 01 Feb 2006 15:44:35 -0800
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To:
References:
Message-ID: <43E147E3.1030705@affymetrix.com>

Gregg asked me to say "No". Please do not break the current server that
IGB is using.

Please make your changes on a server at a different URL.
Thanks
Ed

Allen Day wrote:
> That's what I was thinking too, but I was worried about the existing
> Genoviz clients "in the wild" having the server suddenly break.
>
> So you're saying it's okay with you if those clients have a service
> interruption?
>
> -Allen
From Gregg_Helt at affymetrix.com Wed Feb 1 18:51:23 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 1 Feb 2006 15:51:23 -0800
Subject: [DAS2] Re: Apollo and DAS/2 priorities
Message-ID:

Yes, what Ed said, that's what I meant. Updated server, but at a
different address. Otherwise the current release of IGB will break when
trying to use the biopackages server.

Once our IGB code has caught up to the updated server, we can roll out a
new release to point to the new server instead of the old one. But not
yet.

    Thanks,
    Gregg

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Ed Erwin
> Sent: Wednesday, February 01, 2006 3:45 PM
> To: Allen Day
> Cc: DAS/2
> Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities
>
> Gregg asked me to say "No". Please do not break the current server that
> IGB is using.
>
> Please make your changes on a server at a different URL.
>
> Thanks
> Ed
From allenday at ucla.edu Wed Feb 1 19:07:54 2006
From: allenday at ucla.edu (Allen Day)
Date: Wed, 1 Feb 2006 16:07:54 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To:
References:
Message-ID:

Okay, I will tag the current server and leave it at:

  http://das.biopackages.net/das

I saw in the most recent commits by Andrew that the root-level "/das" is
no longer needed, so I propose putting an updated server at:

  http://das.biopackages.net/codesprint

If we're going to keep the current server in a "maintained but deprecated"
mode like this, I'll be making changes to the "new" server before Friday.
When the new version of IGB comes out we can then upgrade the current
server.

Sound good?

-Allen

On Wed, 1 Feb 2006, Helt,Gregg wrote:

> Yes, what Ed said, that's what I meant. Updated server, but at a
> different address. Otherwise the current release of IGB will break when
> trying to use the biopackages server.
>
> Once our IGB code has caught up to the updated server, we can roll out a
> new release to point to the new server instead of the old one. But not
> yet.
>
>     Thanks,
>     Gregg
> > >>>> > > >>>> Andrew > > >>>> dalke at dalkescientific.com > > >>>> > > >>>>_______________________________________________ > > >>>>DAS2 mailing list > > >>>>DAS2 at portal.open-bio.org > > >>>>http://portal.open-bio.org/mailman/listinfo/das2 > > >>>> > > >>> > > >>>_______________________________________________ > > >>>DAS2 mailing list > > >>>DAS2 at portal.open-bio.org > > >>>http://portal.open-bio.org/mailman/listinfo/das2 > > >> > > > _______________________________________________ > > > DAS2 mailing list > > > DAS2 at portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/das2 > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/das2 > From Gregg_Helt at affymetrix.com Wed Feb 1 20:03:47 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Wed, 1 Feb 2006 17:03:47 -0800 Subject: [DAS2] Alternative feature formats in current DAS/2 spec Message-ID: When discussing alternative feature formats, the spec reads: The feature query URL supports the optional "format" parameter used to request that the results be returns in an alternative format. The format names are listed in the versioned source document in the element of the "feature" . I think the second sentence should instead read something like: The possible format names for a particular feature type are listed in the types document in the elements for a given type. Also, the spec says: Some of search results may not be expressible in the specified format. The server should silently skip those feature records and return only those records which can be converted. I would argue that if any of the search results cannot be returned in the specified format, then the server should really just return an error. Silently suppressing information is not good. A generic 400-"Bad Request" would work, although a 415-"Unsupported Media Type" might be more appropriate. 
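A minimal sketch of the stricter behaviour argued for here, in Python. Everything in it is an assumption made for illustration and nothing comes from the DAS/2 spec or an existing server: the names `render_features` and `to_tab_line`, the "tab" format name, and the dict-shaped feature records are all hypothetical. The whole request fails with 415 as soon as one record cannot be expressed in the requested format, instead of that record being skipped silently.

```python
from typing import Callable, Dict, List, Optional, Tuple

Feature = Dict[str, object]


def to_tab_line(feature: Feature, fmt: str) -> Optional[str]:
    """Hypothetical converter: return one tab-separated line, or None if the
    record cannot be expressed in the requested format."""
    if fmt != "tab":
        return None
    if "start" not in feature or "end" not in feature:
        return None  # no location, so this record has no tab representation
    return "%s\t%s\t%s" % (feature["id"], feature["start"], feature["end"])


def render_features(
    features: List[Feature],
    fmt: str,
    convert: Callable[[Feature, str], Optional[str]] = to_tab_line,
) -> Tuple[int, List[str]]:
    """Return (HTTP status, converted records) under the strict policy."""
    converted = []
    for feature in features:
        line = convert(feature, fmt)
        if line is None:
            # Strict policy: report an error rather than drop the record.
            return 415, []
        converted.append(line)
    return 200, converted
```

Swapping 415 for 400 is a one-line change; the point is only that the client learns the result set would have been incomplete.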
gregg From allenday at ucla.edu Wed Feb 1 20:16:04 2006 From: allenday at ucla.edu (Allen Day) Date: Wed, 1 Feb 2006 17:16:04 -0800 (PST) Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: Message-ID: There are still many references to "region" in Andrew's .txt document. Is it safe to assume that anywhere "region" is mentioned, it should really be "segment" now? I believe the answer is yes. I'm asking to see if I need to change the feature filter implementation. -Allen On Wed, 1 Feb 2006, Helt,Gregg wrote: > > That would be great if you could update the biopackages server before > the code sprint starts! Then client implementers will have a server to > test with. > > thanks, > gregg > > > -----Original Message----- > > From: das2-bounces at portal.open-bio.org > [mailto:das2-bounces at portal.open- > > bio.org] On Behalf Of Allen Day > > Sent: Wednesday, February 01, 2006 2:42 PM > > To: Andrew Dalke > > Cc: DAS/2 > > Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities > > > > I just looked over your changes, and will begin making the changes to > the > > server repository today. > > > > I'd like to update the server at das.biopackages.net with my changes > on > > Friday, unless there are objections. > > > > I'll be taking notes along the way and will post to the list if > anything > > in your document is unclear to me. > > > > At first glance, I agree -- the changes are minor. > > > > -Allen > > > > > > On Mon, 30 Jan 2006, Andrew Dalke wrote: > > > > > Allen: > > > > Is the spec going to be in a stable state for the code sprint? > I'd > > > > like > > > > to use this time to sync the server implementation with a stable > > > > version > > > > of the spec. It looks like there have been many substantial > changes. > > > > > > I have just (within the last few minutes) completed the first draft > > > of the update of the spec. > > > > > > It's not in HTML - that calls for too much work for this stage. 
> > > It's text, in CVS under das/das2/new_spec.txt > > > > > > There are many parts which need clarification. These are marked > > > with a "XXX" along with my comments. > > > > > > The RNC files are in > > > > > > das/das2/scratch/*.rnc > > > along with some test XML files. These XML files are not meant > > > to be realistic. They are meant more to check edge cases. > > > > > > I do no think there are major changes to the spec. Most of the > > > changes have actually trimmed things down, like getting rid of > > > the "properties" subtree and merging the different "sources" > requests > > > into a single document. > > > > > > > > > Here are the major interfaces > > > > > > $PREFIX/sequence - a "sources" request > > > This is the top-level entry point to a DAS 2 server. It returns > a > > > list of the available genomic sequence and their versions. > > > [sequence-namespace] > > > > > > $PREFIX/sequence/$SOURCE - a "source" request > > > Returns the available versions of the given genomic sequence. > > > > > > $PREFIX/sequence/$SOURCE/$VERSION - a "versioned source" request > > > Returns information about a given version of a genomic sequence. > > > Clients may assume that the sequence and assembly are constant > for a > > > given version of a source. Note that annotation data on a server > > > with curational write-back support may change without changing > the > > > version. > > > > > > > > > For a given version here are the sub-parts. Note that I've gone > ahead > > > and split the query urls (segment, features and types each have > query > > > interfaces) from the base directory used as containers for the > segments, > > > features and types. > > > > > > $VERSION/segments - the segments query URL; summarizes the > top-level > > > segments in the data source > > > > > > $VERSION/segment/$SEGMENT_ID - a segment query; used to get > detailed > > > information about the identified segment > > > > > > $VERSION/features - the feature filter query URL. 
Features are > > > locatable annotations or experimental results. The feature > filter > > > URL supports query parameters to select a subset of the features > > > based on position, feature type and other properties. > > > > > > $VERSION/feature/$FEATURE_ID - a feature query; used to get > detailed > > > information about the identified feature > > > > > > $VERSION/types - the types query URL which returns a list of all > > > feature types. Feature types include ontology and depiction > > > details for all features of the given type. > > > > > > $VERSION/type/$TYPE_ID - details about the specified feature type > > > > > > Oh, and there are internal conflicts which will be straightened > > > out in the next draft. These shouldn't be big. > > > > > > Andrew > > > dalke at dalkescientific.com > > > > > > _______________________________________________ > > > DAS2 mailing list > > > DAS2 at portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/das2 > > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/das2 > From allenday at ucla.edu Sat Feb 4 05:43:10 2006 From: allenday at ucla.edu (Allen Day) Date: Sat, 4 Feb 2006 02:43:10 -0800 (PST) Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: Message-ID: There is a database server down, which is why I haven't posted the new code to /codesprint yet. Hopefully it will be back online tomorrow. However, on my dev box I was able to make the server code serve up almost all of what is described in Andrew's new_spec.txt file. The large remaining problems are: * Properties ( elements ). I still don't fully understand how these work, if the previous implementation continues to be valid, or if the implementation has been invalidated by the new document. * Alternate default Content-Type header for the same command, e.g. 
/sequence/.../segment # Content-Type: application/x-das-blah+xml /sequence/.../segment/chrM # Content-Type: text/x-fasta This is an artifact of an earlier design decision that assumed Content-Type had a single default and would only be modified if a ?format= parameter was passed. This is difficult to fix properly, so right now the fasta is served up under the XML Content-Type. -Allen On Wed, 1 Feb 2006, Allen Day wrote: > Okay, I will tag the current server and leave it at: > > http://das.biopackages.net/das > > I saw in the most recent commits by Andrew that the root-level "/das" is > no longer needed, so I propose putting an updated server at: > > http://das.biopackages.net/codesprint > > If we're going to keep the current server in a "maintained but deprecated" > mode like this, I'll be making changes to the "new" server before Friday. > > When the new version of IGB comes out we can then upgrade the current > server. > > Sound good? > > -Allen > > > On Wed, 1 Feb 2006, Helt,Gregg wrote: > > > Yes, what Ed said, that's what I meant. Updated server, but at a > > different address. Otherwise the current release of IGB will break when > > trying to use the biopackages server. > > > > Once our IGB code has caught up to the updated server, we can roll out a > > new release to point to the new server instead of the old one. But not > > yet. > > > > Thanks, > > Gregg > > > > > -----Original Message----- > > > From: das2-bounces at portal.open-bio.org > > [mailto:das2-bounces at portal.open- > > > bio.org] On Behalf Of Ed Erwin > > > Sent: Wednesday, February 01, 2006 3:45 PM > > > To: Allen Day > > > Cc: DAS/2 > > > Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities > > > > > > > > > Gregg asked me to say "No". Please do not break the current server > > that > > > IGB is using. > > > > > > Please make your changes on a server at a different URL.
> > > > > > Thanks > > > Ed > > > > > > Allen Day wrote: > > > > That's what I was thinking too, but I was worried about the existing > > > > Genoviz clients "in the wild" having the server suddenly break. > > > > > > > > So you're saying it's okay with you if those clients have a service > > > > interruption? > > > > > > > > -Allen > > > > > > > > > > > > On Wed, 1 Feb 2006, Helt,Gregg wrote: > > > > > > > > > > > >>That would be great if you could update the biopackages server > > before > > > >>the code sprint starts! Then client implementers will have a server > > to > > > >>test with. > > > >> > > > >> thanks, > > > >> gregg > > > >> > > > >> > > > >>>-----Original Message----- > > > >>>From: das2-bounces at portal.open-bio.org > > > >> > > > >>[mailto:das2-bounces at portal.open- > > > >> > > > >>>bio.org] On Behalf Of Allen Day > > > >>>Sent: Wednesday, February 01, 2006 2:42 PM > > > >>>To: Andrew Dalke > > > >>>Cc: DAS/2 > > > >>>Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities > > > >>> > > > >>>I just looked over your changes, and will begin making the changes > > to > > > >> > > > >>the > > > >> > > > >>>server repository today. > > > >>> > > > >>>I'd like to update the server at das.biopackages.net with my > > changes > > > >> > > > >>on > > > >> > > > >>>Friday, unless there are objections. > > > >>> > > > >>>I'll be taking notes along the way and will post to the list if > > > >> > > > >>anything > > > >> > > > >>>in your document is unclear to me. > > > >>> > > > >>>At first glance, I agree -- the changes are minor. > > > >>> > > > >>>-Allen > > > >>> > > > >>> > > > >>>On Mon, 30 Jan 2006, Andrew Dalke wrote: > > > >>> > > > >>> > > > >>>>Allen: > > > >>>> > > > >>>>>Is the spec going to be in a stable state for the code sprint? > > > >> > > > >>I'd > > > >> > > > >>>>>like > > > >>>>>to use this time to sync the server implementation with a stable > > > >>>>>version > > > >>>>>of the spec. 
It looks like there have been many substantial > > > >> > > > >>changes. > > > >> > > > >>>>I have just (within the last few minutes) completed the first > > draft > > > >>>>of the update of the spec. > > > >>>> > > > >>>>It's not in HTML - that calls for too much work for this stage. > > > >>>>It's text, in CVS under das/das2/new_spec.txt > > > >>>> > > > >>>>There are many parts which need clarification. These are marked > > > >>>>with a "XXX" along with my comments. > > > >>>> > > > >>>>The RNC files are in > > > >>>> > > > >>>> das/das2/scratch/*.rnc > > > >>>>along with some test XML files. These XML files are not meant > > > >>>>to be realistic. They are meant more to check edge cases. > > > >>>> > > > >>>>I do no think there are major changes to the spec. Most of the > > > >>>>changes have actually trimmed things down, like getting rid of > > > >>>>the "properties" subtree and merging the different "sources" > > > >> > > > >>requests > > > >> > > > >>>>into a single document. > > > >>>> > > > >>>> > > > >>>>Here are the major interfaces > > > >>>> > > > >>>>$PREFIX/sequence - a "sources" request > > > >>>> This is the top-level entry point to a DAS 2 server. It > > returns > > > >> > > > >>a > > > >> > > > >>>> list of the available genomic sequence and their versions. > > > >>>> [sequence-namespace] > > > >>>> > > > >>>>$PREFIX/sequence/$SOURCE - a "source" request > > > >>>> Returns the available versions of the given genomic sequence. > > > >>>> > > > >>>>$PREFIX/sequence/$SOURCE/$VERSION - a "versioned source" request > > > >>>> Returns information about a given version of a genomic > > sequence. > > > >>>> Clients may assume that the sequence and assembly are constant > > > >> > > > >>for a > > > >> > > > >>>> given version of a source. Note that annotation data on a > > server > > > >>>> with curational write-back support may change without changing > > > >> > > > >>the > > > >> > > > >>>> version. 
> > > >>>> > > > >>>> > > > >>>>For a given version here are the sub-parts. Note that I've gone > > > >> > > > >>ahead > > > >> > > > >>>>and split the query urls (segment, features and types each have > > > >> > > > >>query > > > >> > > > >>>>interfaces) from the base directory used as containers for the > > > >> > > > >>segments, > > > >> > > > >>>>features and types. > > > >>>> > > > >>>> $VERSION/segments - the segments query URL; summarizes the > > > >> > > > >>top-level > > > >> > > > >>>> segments in the data source > > > >>>> > > > >>>> $VERSION/segment/$SEGMENT_ID - a segment query; used to get > > > >> > > > >>detailed > > > >> > > > >>>> information about the identified segment > > > >>>> > > > >>>> $VERSION/features - the feature filter query URL. Features are > > > >>>> locatable annotations or experimental results. The feature > > > >> > > > >>filter > > > >> > > > >>>> URL supports query parameters to select a subset of the > > features > > > >>>> based on position, feature type and other properties. > > > >>>> > > > >>>> $VERSION/feature/$FEATURE_ID - a feature query; used to get > > > >> > > > >>detailed > > > >> > > > >>>> information about the identified feature > > > >>>> > > > >>>> $VERSION/types - the types query URL which returns a list of all > > > >>>> feature types. Feature types include ontology and depiction > > > >>>> details for all features of the given type. > > > >>>> > > > >>>> $VERSION/type/$TYPE_ID - details about the specified feature > > type > > > >>>> > > > >>>>Oh, and there are internal conflicts which will be straightened > > > >>>>out in the next draft. These shouldn't be big. 
> > > >>>> > > > >>>> Andrew > > > >>>> dalke at dalkescientific.com > From allenday at ucla.edu Mon Feb 6 02:13:59 2006 From: allenday at ucla.edu (Allen Day) Date: Sun, 5 Feb 2006 23:13:59 -0800 (PST) Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> Message-ID: Okay folks, an implementation of the document cited below is available here: http://das.biopackages.net/codesprint http://das.biopackages.net/codesprint/sequence http://das.biopackages.net/codesprint/sequence/yeast/S228C/segment etc. After looking closely over this first draft of new_spec.txt, it's apparent that there are still some holes, e.g. what should the response to the following requests look like?
http://das.biopackages.net/codesprint/sequence/yeast http://das.biopackages.net/codesprint/sequence/yeast/S228C For now I have left responses the same as in the old HTML version of the spec. Of course if you find bugs, let me know. The server at: http://das.biopackages.net/das is currently unavailable. This is due to limitations in Apache/mod_perl that won't allow different versions of the same class to coexist in a family of processes. I'd like to discuss how we should handle this in the conference call tomorrow (today, if you're not in GMT+8). -Allen On Mon, 30 Jan 2006, Andrew Dalke wrote: > Allen: > > Is the spec going to be in a stable state for the code sprint? I'd > > like > > to use this time to sync the server implementation with a stable > > version > > of the spec. It looks like there have been many substantial changes. > > I have just (within the last few minutes) completed the first draft > of the update of the spec. > > It's not in HTML - that calls for too much work for this stage. > It's text, in CVS under das/das2/new_spec.txt > > There are many parts which need clarification. These are marked > with a "XXX" along with my comments. > > The RNC files are in > > das/das2/scratch/*.rnc > along with some test XML files. These XML files are not meant > to be realistic. They are meant more to check edge cases. > > I do not think there are major changes to the spec. Most of the > changes have actually trimmed things down, like getting rid of > the "properties" subtree and merging the different "sources" requests > into a single document. > > > Here are the major interfaces > > $PREFIX/sequence - a "sources" request > This is the top-level entry point to a DAS 2 server. It returns a > list of the available genomic sequence and their versions. > [sequence-namespace] > > $PREFIX/sequence/$SOURCE - a "source" request > Returns the available versions of the given genomic sequence.
> > $PREFIX/sequence/$SOURCE/$VERSION - a "versioned source" request > Returns information about a given version of a genomic sequence. > Clients may assume that the sequence and assembly are constant for a > given version of a source. Note that annotation data on a server > with curational write-back support may change without changing the > version. > > > For a given version here are the sub-parts. Note that I've gone ahead > and split the query urls (segment, features and types each have query > interfaces) from the base directory used as containers for the segments, > features and types. > > $VERSION/segments - the segments query URL; summarizes the top-level > segments in the data source > > $VERSION/segment/$SEGMENT_ID - a segment query; used to get detailed > information about the identified segment > > $VERSION/features - the feature filter query URL. Features are > locatable annotations or experimental results. The feature filter > URL supports query parameters to select a subset of the features > based on position, feature type and other properties. > > $VERSION/feature/$FEATURE_ID - a feature query; used to get detailed > information about the identified feature > > $VERSION/types - the types query URL which returns a list of all > feature types. Feature types include ontology and depiction > details for all features of the given type. > > $VERSION/type/$TYPE_ID - details about the specified feature type > > Oh, and there are internal conflicts which will be straightened > out in the next draft. These shouldn't be big. 
> > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From dalke at dalkescientific.com Mon Feb 6 06:33:34 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 6 Feb 2006 11:33:34 +0000 Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> Message-ID: Allen: > After looking closely over this first draft of new_spec.txt, it's > apparent > that there are still some holes, e.g. what should the response to the > following requests look like? > > http://das.biopackages.net/codesprint/sequence/yeast taxon="Yeast"> > http://das.biopackages.net/codesprint/sequence/yeast/S228C The same for this case. There is only one VERSION for "yeast". Your XML, btw, starts The "standalone" means that the DTD may affect the content of the document. http://www.stylusstudio.com/w3c/xml11/sec-rmd.htm > Markup declarations can affect the content of the document, as passed > from an XML Processor to an application; examples are attribute > defaults and entity declarations. The standalone document declaration, > which MAY appear as a component of the XML declaration, signals > whether or not there are such declarations which appear external to > the Document Entity or in parameter entities. An external markup > declaration is defined as a markup declaration occurring in the > external subset or in a parameter entity (external or internal, the > latter being included because non-validating processors are not > required to read them).
For what we're doing, we don't need nor (I think) want that. There's no reason for a client to consult the DTD to figure out the XML. Instead, use and probably have the encoding That also means you can get rid of the statements. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Feb 6 07:02:40 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 6 Feb 2006 12:02:40 +0000 Subject: [DAS2] timezone change Message-ID: <6c3ddd6d7dc01dc99f2e1e932e64e733@dalkescientific.com> To make it easier for Thomas' Java library, the timezone in the datestamps may also be of the form "0500". Here are the valid forms and new examples TZD = time zone designator (optional; one of the formats "Z", +hh:mm, +hhmm, -hh:mm, or -hhmm) 1959-21-52T09:35+0300 2042-03-18T01:19:00-11:15 Andrew dalke at dalkescientific.com From dhoworth at mrc-lmb.cam.ac.uk Mon Feb 6 07:12:52 2006 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Mon, 06 Feb 2006 12:12:52 +0000 Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> Message-ID: <43E73D44.5020107@mrc-lmb.cam.ac.uk> Andrew Dalke wrote: > That also means you can get rid of the > > Doing that automatically invalidates the document does it not? http://www.w3.org/TR/2004/REC-xml-20040204/#NT-prolog "Definition: An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it. The document type declaration MUST appear before the first element in the document." 
Cheers, Dave From dalke at dalkescientific.com Mon Feb 6 08:42:03 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 6 Feb 2006 13:42:03 +0000 Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: <43E73D44.5020107@mrc-lmb.cam.ac.uk> References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> <43E73D44.5020107@mrc-lmb.cam.ac.uk> Message-ID: <57bc93f8acc736e752d048d970b1f332@dalkescientific.com> Dave Howorth: > Doing that automatically invalidates the document does it not? > > http://www.w3.org/TR/2004/REC-xml-20040204/#NT-prolog > > "Definition: An XML document is valid if it has an associated document > type declaration and if the document complies with the constraints > expressed in it. > > The document type declaration MUST appear before the first element in > the document." I think this page summarizes it nicely: http://www.xml.com/lpt/a/2002/09/04/xslt.html "Valid" is a technical term referring to the presence of and conformance to a DOCTYPE declaration. XML documents w/o a DTD are "well-formed". XML documents with a DTD and which agree with the DTD are "valid". In this case not being "valid" does not mean that the document is "invalid XML". As I understand things, it's perfectly fine to pass well-formed but not valid XML documents around. 
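The distinction can be seen with a non-validating parser. The sketch below uses Python's standard xml.etree.ElementTree, which checks well-formedness only and does not validate against a DTD; the helper name `is_well_formed` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """True if the text parses as XML; no DTD validation is attempted."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True
```

A document with no DOCTYPE at all, such as `<SOURCES><SOURCE/></SOURCES>`, passes this check, while one with a mismatched tag does not. Whether it is also "valid" is a separate question that this parser never asks.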
Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Feb 6 08:53:10 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 6 Feb 2006 13:53:10 +0000 Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> Message-ID: <3a3400e925dccf8583a5b47104e43766@dalkescientific.com> Trying out Allen's XML > > > > > xmlns:xlink="http://www.w3.org/1999/xlink" > xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/"> > The xmlns is needed, else "SOURCES" is in the unnamed namespace, rather than the DAS2 namespace. It looks like your XSLT might not declare the namespace? I can't find the document to check, at either of http://das.biopackages.net/xsl/das.xsl http://radius.genomics.ctrl.ucla.edu/xsl/das.xsl The page at http://www.xml.com/pub/a/2001/04/04/trxml/ describes a bit on how to include namespace in your xslt > > > xmlns:xlink="http://www.w3.org/1999/xlink" > version="1.0"> > > > > > > > > > > > > Note the use of the "xlink:" namespace abbreviation. 
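To illustrate why the xmlns declaration matters to a client, here is a small Python sketch using the standard xml.etree.ElementTree parser. The namespace URI is a placeholder, not the real DAS2 namespace, and `qualified_tag` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Placeholder URI for illustration only; not the actual DAS2 namespace.
EXAMPLE_NS = "http://example.org/das2-placeholder"


def qualified_tag(xml_text: str) -> str:
    """Return the root element's tag as ElementTree reports it,
    i.e. '{namespace-uri}localname' when a namespace applies."""
    return ET.fromstring(xml_text).tag
```

With a default namespace declared on the root, the tag comes back as `{http://example.org/das2-placeholder}SOURCES`; without it, the same element is reported as plain `SOURCES`, so a namespace-aware client looking for the qualified name would not find it.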
Andrew dalke at dalkescientific.com From dhoworth at mrc-lmb.cam.ac.uk Mon Feb 6 09:27:34 2006 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Mon, 06 Feb 2006 14:27:34 +0000 Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: <57bc93f8acc736e752d048d970b1f332@dalkescientific.com> References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> <43E73D44.5020107@mrc-lmb.cam.ac.uk> <57bc93f8acc736e752d048d970b1f332@dalkescientific.com> Message-ID: <43E75CD6.7000909@mrc-lmb.cam.ac.uk> Andrew Dalke wrote: > Dave Howorth: >> Doing that automatically invalidates the document does it not? >> >> http://www.w3.org/TR/2004/REC-xml-20040204/#NT-prolog >> >> "Definition: An XML document is valid if it has an associated document >> type declaration and if the document complies with the constraints >> expressed in it. >> >> The document type declaration MUST appear before the first element in >> the document." > > I think this page summarizes it nicely: > http://www.xml.com/lpt/a/2002/09/04/xslt.html > > "Valid" is a technical term referring to the presence > of and conformance to a DOCTYPE declaration. I think that's a paraphrase of the first para I quoted above? > XML documents w/o a DTD are "well-formed". XML documents > with a DTD and which agree with the DTD are "valid". > > In this case not being "valid" does not mean that the > document is "invalid XML". No, I believe you're wrong there; 'not valid' and 'invalid' have the same meaning both colloquially and as used in the spec. It's either valid or it isn't, and if it isn't then it's invalid. > As I understand things, it's perfectly fine to pass well-formed > but not valid XML documents around.
I don't agree. There are occasions when it is acceptable but it's generally bad practice, IMHO. The discussion in sec 5 of the spec gives some motivation, particularly this section: http://www.w3.org/TR/REC-xml/#safe-behavior Or look here, or thousands of other places: http://www.online-learning.com/demos/xml/valid_xml.html http://en.wikipedia.org/wiki/Xml#Correctness_in_an_XML_document In particular for interoperability of an open, distributed system with many writers and readers implemented by different groups (i.e. DAS), I suggest validity is essential. I would have expected your experience of the PDB to make you keen on validation :) Cheers, Dave From dalke at dalkescientific.com Mon Feb 6 10:09:58 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 6 Feb 2006 15:09:58 +0000 Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: <43E75CD6.7000909@mrc-lmb.cam.ac.uk> References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> <43E73D44.5020107@mrc-lmb.cam.ac.uk> <57bc93f8acc736e752d048d970b1f332@dalkescientific.com> <43E75CD6.7000909@mrc-lmb.cam.ac.uk> Message-ID: <0aeda19421fdc7c75e2440ad0acd6391@dalkescientific.com> Dave Howorth wrote: > Andrew Dalke wrote: >> I think this page summarizes it nicely: >> http://www.xml.com/lpt/a/2002/09/04/xslt.html >> "Valid" is a technical term referring to the presence >> of and conformance to a DOCTYPE declaration. > > I think that's a paraphrase of the first para I quoted above? It adds the phrase "technical term", making it (in my interpretation) different from the word "valid" in its normal sense. 
> No, I believe you're wrong there; 'not valid' and 'invalid' have the > same meaning both colloquially and as used in the spec. It's either > valid or it isn't, and if it isn't then its invalid. I now agree that in the spec sense "invalid" and "not valid" are the same. I still think it has a technical difference from its normal use. See for example the thread at http://www.stylusstudio.com/xmldev/200411/post50310.html part of which says > >But does it matter if a document is Not valid? > > Not necessarily. It's up to you. Requiring a document to be valid is > a way of putting some constraints on it. If you don't have any such > constraints (unlikely, unless you are writing some very generic > software like an editor), then there's no need for validity. More > likely, not all your constraints can be expressed by a DTD, and you > will need to express them some other way. > > And of course you can require the document to be valid according to > some other kind of schema, such as XML schemas or RelaxNG or > Schematron. >> As I understand things, it's perfectly fine to pass well-formed >> but not valid XML documents around. > > I don't agree. There are occasions when it is acceptable but it's > generally bad practice, IMHO. The discussion in sec 5 of the spec > gives some motivation, particularly this section: > > http://www.w3.org/TR/REC-xml/#safe-behavior > > Or look here, or thousands of other places: > http://www.online-learning.com/demos/xml/valid_xml.html > http://en.wikipedia.org/wiki/Xml#Correctness_in_an_XML_document > > In particular for interoperability of an open, distributed system with > many writers and readers implemented by different groups (i.e. DAS), I > suggest validity is essential. Quoting the wikipedia reference to DTDs: > The oldest schema format for XML is the Document Type Definition > (DTD), inherited from SGML. 
While DTD support is ubiquitous due to its > inclusion in the XML 1.0 standard, it is seen as limited for the > following reasons: > * It has no support for newer features of XML, most importantly > namespaces. DAS2 uses namespaces. Hence it cannot use DTDs. We are defining Relax-NG schemas for the different formats, which can be used for better validity checking than is supported by DTDs. "valid DAS2 document" ::= "meets the DAS2 spec" "meets the DAS2 spec" is a stricter definition than "well-formed XML" + "meets the RNG spec" which is stricter than "well-formed XML" + "meets the (hypothetical namespace-aware) DTD" > I would have expected your experience of the PDB to make you keen > on validation :) Indeed, I'm working on the validator for DAS2, which uses the Relax-NG schemas. ;) Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Feb 6 11:03:07 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 6 Feb 2006 16:03:07 +0000 Subject: [DAS2] elements Message-ID: <4157a339b02c085fd6df0f4bf1ff1cc0@dalkescientific.com> One discussion point from today is the CATEGORY elements. The current draft of the spec says they look like this Andreas Prlic pointed out that since the document says the "volvox" version "1" url is already known ("volvox/1") and the type="segments" then the query_id can be built from appending "segments" to the "volvox/1" (plus the "/") to get "volvox/1/segments". I originally responded from a ReST purity argument, in that URLs should not be constructed from non-URL data. This lets Thomas, for example, use GUIDs for the objects rather than the hierarchical structure I and others recommend. During discussion a better answer came up, one which I think we talked about earlier but which is worth emphasizing: the "query_id"s don't need to be on the same server. For example, the "regions" URL may and likely will point to a common reference server, and a database may offer only one set of "types" for all of the "features".
That is, something like this DAS server example.com genome A version x segments at "ensembl.org/das2/genome_A/build_1/segments" features at "example.com/A/version_x/features" types at "example.com/A/types" version y segments at "ensembl.org/das2/genome_A/build_1/segments" features at "example.com/A/version_y/features" types at "example.com/A/types" version z segments at "ensembl.org/das2/genome_A/build_2/segments" features at "example.com/A/version_z/features" types at "example.com/A/types" DAS server biodas.org genome A version 1 segments at "ensembl.org/das2/genome_A/build_2/segments" features at "example.com/A/1/features" types at "example.com/A/types" (note: on other server!) Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Mon Feb 6 12:13:18 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 6 Feb 2006 09:13:18 -0800 Subject: [DAS2] Agenda for DAS/2 Code Sprint Teleconference 2005-02-06 Message-ID: Status report DAS/2 XML - valid or not valid? CATEGORY elements -- constructing query URLs MAINTAINER information Use of xml:base update on feature properties - searching, etc. From lstein at cshl.edu Mon Feb 6 13:20:10 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 6 Feb 2006 13:20:10 -0500 Subject: [DAS2] Agenda for DAS/2 Code Sprint Teleconference 2005-02-06 In-Reply-To: References: Message-ID: <200602061320.11360.lstein@cshl.edu> Hi Gregg, I had a conflicting teleconference and wasn't sure whether there was a teleconference scheduled for the code sprint, so I didn't dial in. Just got the agenda now. I am online on both MSN and AOL chats, and will be all week, if anyone wants to IM me. Lincoln On Monday 06 February 2006 12:13, Helt,Gregg wrote: > Status report > DAS/2 XML - valid or not valid? > CATEGORY elements -- constructing query URLs > MAINTAINER information > Use of xml:base > update on feature properties - searching, etc. 
> > > > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From dalke at dalkescientific.com Mon Feb 6 13:42:24 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 6 Feb 2006 18:42:24 +0000 Subject: [DAS2] version= Message-ID: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com> If we add a version= field to the Content-Type, or whatever mechanism is proposed Content-Type: application/x-das2features+xml; version=12345 What will a client do when it gets a version number it has never heard of? Should it use the newest version it supports? The oldest? Abort? Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Mon Feb 6 14:50:14 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 06 Feb 2006 11:50:14 -0800 Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint, 6 Feb 2006 Message-ID: Notes from the DAS/2 teleconference for the code sprint, 6 Feb 2006 $Id: das2-teleconf-2006-02-06.txt,v 1.2 2006/02/06 19:57:05 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt Sanger: Andreas Prlic, Thomas Down, Roy Sweden: Andrew Dalke UC Berkeley: Nomi Harris UCLA: Allen Day Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. 
There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit.

Gregg's topics for discussion:
* Status report
* DAS/2 XML - valid or not valid?
* CATEGORY elements -- constructing query URLs
* MAINTAINER information
* Use of xml:base
* update on feature properties - searching, etc.

Status Reports - what people are working on for the code sprint
------------------------------------------------------------
andrew - getting folks up to speed on the spec changes, what he wrote.
- getting a feel for ensembl schema.
- change today: time zone specification b/c td's java time lib did something different than iso did.
aday: tag & branch?
gh: no branch, maybe tag
ad: tagging probably not necessary
gh: brings up a related issue: what is our mechanism for versioning - client & spec to understand which version of the spec they are/should be implementing - can talk about it later during the xml validation issue discussion
ap: [missed it -- sorry!]
td: java om, feature xml done, can read and write.
roy: zmap das2 client, read/write das2, written in C. working with ed griffith who's not available this week. currently just a reader. from james gilbert, based on fmap from Acedb
gh: updating client and server (mostly client). top down syncing in parallel, one command at a time. sources request is working on both sides. will start w/ allen's server today, doing gh's sources query against allen's server. segments and types today.
nh: apollo das2 client. reads das2 xml from andrew's example, write out features in das2, now working on get, testing with server.
sc: affy das2 server stuff. streamlining updating it with feature data from UCSC. also working on updating exon array data for use in IGB client. working w/ gregg on other server-related work.
gh: graph data as well.
ee: working on igb client. talk w/ gregg later to get specifics.
gh: lots of ui stuff

Topic: xml validation
---------------------
ad: dtd's don't support namespaces, so we can't support dtds
gh: not that simple. where do we add namespaces?
ad: schemas have ns's testing....
gh: concern #1: is one of perception. don't like telling people we don't have valid xml
ad: only means supports the dtd, not in human sense.
gh: it's one of perception
td: self-contained document + validation
gh: getting rid of doctype declaration is issue of versioning. how will client know which version of spec it's supposed to be implementing? need to deal with spec crawl. The only way i'm aware of is via looking at dtd pointer changing.
gh: not worried about new categories, but changing things like optional vs req'd attributes/elements.
ad: content-type contains version
td: or content negotiation
ap: xml schema validator at w3c.org can use that and claim it is valid. can upload your files, push a button.
ad: I have an extension of properties with arbitrary binary data vs text vs href. this is ok with relaxng, not by xsd.
ad: we could say what is valid das2 since we're the arbiters of what is valid das xml document. e.g., well-formed, validates against the rng schemas
gh: the rng we now have allows arbitrary xml?
ad: yes. can say there are arbitrary elements under some node. checked in as file named common.rnc
gh: ok, getting rid of requirement for doctype declaration. any versioning is done via content-type
gh: if we don't do content neg, a sources query goes out, whatever version that the server supports comes back. this will be the latest version of the spec the server supports.
ad: for backwards compatibility that won't be needed. extensibility will be sufficient for a few years.
gh: don't believe it.
td: spec is churning fast now. there'll be less churn once there are impls.
gh: there were impls 3 or 4 mos ago (allen, gregg). so there has been plenty of churn even with impls. so we'll need versioning, ok on content-type.
aday: we definitely need versioning. need it now. also want a tagged version so we can work at same time.
ad: content-type-xdas;version=1.1 in general not the right solution (not general purpose), but for this case, makes sense.
aday: can impl, header says 1.1
gh/ad: contents are a subset of the specification. so it's tied to a version of the rng schema.
ad: the tag will be the cvs revision #
gh: this isn't temporary; there will not be a time when we are not generating churn.
ad: believes this is temporary, won't have to have it long-term
aday: no mechanism for it now.
ad: need a way to turn it into meaning. agreement on what string means which version of a program.
nh: second gregg. will always be an issue. ad says it's not good long-term, maybe we should come up with it.
gh: we have some basis to go forward.

[A] das/2 server will specify spec version via content-type-xdas;version=X.X

Topic: category elements, how to construct a query url
------------------------------------------------------
ad: what is syntax of string used to specify ontology? SO:?
aday: attribute for it
gh: ontol term is a uri
aday: type element has ontology
gh: id of type is not nec an ontol term
ad: the attrib of feat type, ontol=something
gh: that's a uri, abs or rel point to a frag in so/fa ontol
ad: can't find how this should look. said SO:0000001. that should be a uri?
gh: yes. in types xml that's returned, id and ontol are uri's. a server will pick one for its xml base. the other will have to be a full uri.
ad: how do diff clients know a given term corresponds to what term in the ontol?
gh: they will have to understand sofa/so.
ad: do they have persistent ids?
gh: my understanding is that they can use fragment notation for a stable url for the term
aday: ontol docs aren't xml, no anchors for pointing to a fragment. they're their own format. nervous about building dependency on fragment record uris into our system
gh: good point.
would be happier if it was recast as xml
aday: is now pointing to an xml document for ontology nodes
ad: happier if we could use "SO:xxx" i.e., a urn
gh: would like a re-cast as xml document, hosted at so/sofa website. that xml would be like a std ontology representation so you could extend it. so someone could point to an extension of it.

Category elements -- constructing query URLs
--------------------------------------------
gh: andreas' point (email): query id attribute, constructing these out of relative uri, or based on base uri. agree with andreas: we know what those will be. for clarity of spec, we should specify: here's base uri, here's how you construct the segments query, etc.
ad: trouble for segments- could be on ref server
gh: doubt that people will impl this way. will be specific to server and will be related to everyone else's notion of chromosomes and assemblies.
ad: where does the distributed nature of das come from? ref server
gh: das/1: ref server has residues to serve, regions (entry pts) served up by everyone. this was the notion of ref vs non-ref server to carry forward. non-ref server still serves up segments. will have segments in its reference space. reference would be genome assembly version + organism. sufficient to globally identify it.
ap: had discussions about this. query id
td: issue comes from seqs being urls rather than opaque ids in a ns defined by coord system. have a set of servers that share common coord syst. then a seq identified by stringx on one server is same as on the other server. the remaining q: server that doesn't want to serve up seqs, what urls does it use? can it use an opaque seq name that is known by that name of ref server?
gh: restating concerns here: using query string to construct uri's
  1. confusion: arbitrary uri means more confusing spec, and how to implement it (can't just say /segment, but 'whatever is pointed at by such and such uri')
  2. size of documents.
     right now, can use same xml:base for features document, can make feat ids and location id relative to it, nice and short. if seg is on other server, need to expand one of the ids. compresses well, but that will take longer than transmission. this is only for features xml. can use coords or assembly info to determine identity between urls. want a defined ns.
ad: you want a way to say: these are relative urls to a base url for that data type. so that this type url is relative to some base url for types, similar for segments, features.
gh: we have this now, can be relative or absolute
ad: there is a default xml base like thing: one for type, segment, features. so you could have relative ids to those bases.
gh: possibly, but not ideal. It's better to use a std xml base for all of them. each server has its own unique uris for segments. I'm proposing that we decouple segments from residues and having segments doesn't mean we can serve residues.
reasoning:
- this leads to smaller xml docs
- simplifies the spec if we didn't have to construct query ids from category element
would rather specify the string that's appended in the spec.
sc: might could deal with this issue by adding structure to the document in order to add different xml:bases for different data types. e.g., use different parent elements that could define their own xml:bases, one for types, segments, and features. might complicate the spec tho.
ad: single genome have same types across all dbs.
gh: across servers, dangerous.
ad/td: globally unique ids, could have everything in the same directory.
td: can we just use seq/name, type/name. i.e., codifying what the convention now is.
ad: name is put at end of base url a feature document may give types, segments, other features.
td: just use simple strings, not urls.
gh: std uri syntax isn't important, but a std query mechanism to get all of these is. some uri you put a '/types' on or a '/segments'.
ad: you have this right now.
gh: but it's only defined for a server, not the whole spec. there's nowhere in the spec that says this. confusing for people reading/implementing the spec.
ap: If you make it free text, you don't know what to put for a given server?
ad: you get a document
ap: I already know the server, not necessarily a document.
ad: taking out the mention of any hierarchy, just refer to things as feat query url.
[note taker is having trouble following the thread of this discussion.]
gh: let's sleep on it, discuss tomorrow, vote then.

From nomi at fruitfly.org Mon Feb 6 15:49:51 2006 From: nomi at fruitfly.org (Nomi Harris) Date: Mon, 6 Feb 2006 12:49:51 -0800 (PST) Subject: New DAS/2 server for codesprint [was Re: [DAS2] Re: Apollo and DAS/2 priorities] In-Reply-To: References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> Message-ID: <17383.46703.563017.422300@kinked.lbl.gov> thanks for setting up the new das/2 server, allen. i'm having trouble with some of the queries. On 5 February 2006, Allen Day wrote: > Okay folks, an implementation of the document cited below is available > here: > > http://das.biopackages.net/codesprint I get "Internal Server Error" > http://das.biopackages.net/codesprint/sequence > http://das.biopackages.net/codesprint/sequence/yeast/S228C/segment these both work.
> http://das.biopackages.net/codesprint/sequence/yeast > http://das.biopackages.net/codesprint/sequence/yeast/S228C for these i get Error loading stylesheet: A network error occured loading an XSLT stylesheet: http://das.biopackages.net/xsl/das.xsl i'm running firefox on mozilla, so i'm not surprised when it has problems with stylesheets, but i used to be able to get data from the old das/2 server, even though it did have some complaint about not finding the stylesheet. http://das.biopackages.net/codesprint/sequence/human/17/feature churned forever (or, at least, for several minutes--maybe it will eventually return). Nomi From nomi at fruitfly.org Mon Feb 6 17:34:30 2006 From: nomi at fruitfly.org (Nomi Harris) Date: Mon, 6 Feb 2006 14:34:30 -0800 (PST) Subject: [DAS2] Re: New DAS/2 server for codesprint In-Reply-To: <17383.46703.563017.422300@kinked.lbl.gov> References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> <17383.46703.563017.422300@kinked.lbl.gov> Message-ID: <17383.52982.274142.351003@kinked.lbl.gov> On 6 February 2006, Nomi Harris wrote: > thanks for setting up the new das/2 server, allen. i'm having trouble > with some of the queries. 
ok, i realized that some of the queries i was trying were senseless, but here are some that should work that are just hanging: http://das.biopackages.net/codesprint/sequence/yeast/S228C/segments http://das.biopackages.net/codesprint/sequence/yeast/S228C/types Nomi From allenday at ucla.edu Mon Feb 6 16:53:34 2006 From: allenday at ucla.edu (Allen Day) Date: Mon, 6 Feb 2006 13:53:34 -0800 (PST) Subject: New DAS/2 server for codesprint [was Re: [DAS2] Re: Apollo and DAS/2 priorities] In-Reply-To: <17383.46703.563017.422300@kinked.lbl.gov> References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> <17383.46703.563017.422300@kinked.lbl.gov> Message-ID: On Mon, 6 Feb 2006, Nomi Harris wrote: > thanks for setting up the new das/2 server, allen. i'm having trouble > with some of the queries. > > On 5 February 2006, Allen Day wrote: > > Okay folks, an implementation of the document cited below is available > > here: > > > > http://das.biopackages.net/codesprint > I get "Internal Server Error" That's to be expected. The spec does not specify what the response to this request should be, or if it is even valid -- so I didn't implement it. > > http://das.biopackages.net/codesprint/sequence > > http://das.biopackages.net/codesprint/sequence/yeast/S228C/segment > these both work. > > > http://das.biopackages.net/codesprint/sequence/yeast > > http://das.biopackages.net/codesprint/sequence/yeast/S228C > for these i get > Error loading stylesheet: A network error occured loading an XSLT stylesheet: > http://das.biopackages.net/xsl/das.xsl This happens if you're browsing the URLs in a web browser that supports xsl directives. 
Previous versions of the server supported web browsers, but at the cost of using a 'text/xml' Content-Type header. Consensus in the group was that web browsers are not a target platform, so this feature no longer works -- so you won't be able to view the DAS2XML in your browser anymore. I just haven't removed the XSL references yet. > i'm running firefox on mozilla, so i'm not surprised when it has problems > with stylesheets, but i used to be able to get data from the old das/2 > server, even though it did have some complaint about not finding the > stylesheet. > > http://das.biopackages.net/codesprint/sequence/human/17/feature The server is coded to throw an error if you ask for all features, so I'm surprised it didn't just give you a 4xx or 5xx response. I'll look into it. > churned forever (or, at least, for several minutes--maybe it will > eventually return). > > Nomi > From allenday at ucla.edu Mon Feb 6 17:00:50 2006 From: allenday at ucla.edu (Allen Day) Date: Mon, 6 Feb 2006 14:00:50 -0800 (PST) Subject: [DAS2] Re: New DAS/2 server for codesprint In-Reply-To: <17383.52982.274142.351003@kinked.lbl.gov> References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> <17383.46703.563017.422300@kinked.lbl.gov> <17383.52982.274142.351003@kinked.lbl.gov> Message-ID: Hi Nomi, I just restarted the server, the "all features" request used all the memory and hung the webserver. I'll look into why that request wasn't immediately denied as it used to be. As for your .../segments and .../types, they should be .../segment and .../type. I see no reason to pluralize these URLs given that the sources response allows me to provide them at any arbitrary URL: [...] [...] 
-Allen On Mon, 6 Feb 2006, Nomi Harris wrote: > On 6 February 2006, Nomi Harris wrote: > > thanks for setting up the new das/2 server, allen. i'm having trouble > > with some of the queries. > > ok, i realized that some of the queries i was trying were senseless, but > here are some that should work that are just hanging: > http://das.biopackages.net/codesprint/sequence/yeast/S228C/segments > http://das.biopackages.net/codesprint/sequence/yeast/S228C/types > > Nomi > From Steve_Chervitz at affymetrix.com Mon Feb 6 17:27:01 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 06 Feb 2006 14:27:01 -0800 Subject: [DAS2] version= In-Reply-To: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com> Message-ID: Andrew Dalke wrote on Mon, 6 Feb 2006 18:42:24 > > If we add a version= field to the Content-Type, or whatever > mechanism is proposed > > Content-Type: application/x-das2features+xml; version=12345 > > What will a client do when it gets a version number it has > never heard of? Should it use the newest version it supports? > The oldest? Abort? Rather than have version data be something that the client has to discover in the response, and then have to react to in some intelligent way, how about adding an optional dasversion field to all requests, such as: http://www.wormbase.org/das/genome/volvox/1/type?dasversion=1.1 The server would then either: 1) return the appropriate response document if the server supports the requested version or a later version that is backward compatible with it, or 2) return a 505 error 'DAS Version Not Supported', which we already have in the spec. This puts the onus on the server rather than the client, but I think it would be less trouble on the server than the alternative scheme would be for the client. The client can now be fairly dumb about versioning and assume everything is kosher unless it gets an error.
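To make the two schemes concrete, here is a minimal Python sketch, not anything from the spec or an existing server. It shows (a) how a client could pull the version= parameter out of a Content-Type header, and (b) how a server could pick a response for an optional dasversion request field, falling back to the 505 response mentioned above. The function names, the dotted-integer version format, and the compatibility table are all assumptions made up for illustration.

```python
from email.message import Message


def parse_das_content_type(header_value):
    """Split a Content-Type header into (media type, version parameter).

    Uses the standard-library header parser; version is None if absent.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_param("version")


def parse_version(text):
    """Turn a dotted version such as '1.10' into (1, 10) so that
    comparisons are numeric rather than string- or float-based.
    (The dotted format is an assumption, not something the spec defines.)"""
    return tuple(int(part) for part in text.split("."))


def select_version(requested, compat):
    """Return (status, version_to_serve) for a hypothetical dasversion request.

    compat maps each version the server can emit to the set of older
    versions it is declared backward compatible with (hypothetical table,
    e.g. maintained from notes written by the spec revisers).
    """
    if requested is None:
        # No dasversion given: serve the newest version the server supports.
        return 200, max(compat)
    if requested in compat:
        return 200, requested
    # Otherwise look for a later version declared compatible with the request.
    newer = [v for v, older in compat.items() if v > requested and requested in older]
    if newer:
        return 200, min(newer)
    return 505, None  # 'DAS Version Not Supported'


# Hypothetical server that emits 1.2 (compatible with 1.1) and 2.0.
COMPAT = {(1, 2): {(1, 1)}, (2, 0): set()}
media_type, version = parse_das_content_type(
    "application/x-das2features+xml; version=12345")
status, served = select_version(parse_version("1.1"), COMPAT)
```

With a table like this the client only has to check the status code; what it should do with an unrecognized version= value in a response is still an open question.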
We could put some of the onus for DAS version support on the revisers of the spec: When a new version of the spec is released, we'll know right then what parts will be backward compatible and what parts will not be. The reviser could document whether the new version of the spec is backwards compatible with which previous versions, with the appropriate level of granularity (e.g., "all requests are backward compatible except for the types request"). This would serve as a guide for maintainers of das2 servers. Thoughts? Steve From nomi at fruitfly.org Mon Feb 6 18:41:23 2006 From: nomi at fruitfly.org (Nomi Harris) Date: Mon, 6 Feb 2006 15:41:23 -0800 (PST) Subject: [DAS2] version= In-Reply-To: References: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com> Message-ID: <17383.56995.914058.889189@kinked.lbl.gov> i think it would be nice to have it work both ways--the version is reported by the server, but the client can also request a particular version as you suggest. whatever we decide on, can we please make the version IDs numerical so that they can be compared easily (e.g. "if (dasversion > 1.3) ...")? Nomi On 6 February 2006, Steve Chervitz wrote: > Andrew Dalke wrote on Mon, 6 Feb 2006 18:42:24 > > > > If we add a version= field to the Content-Type, or whatever > > mechanism is proposed > > > > Content-Type: application/x-das2features+xml; version=12345 > > > > What will a client do when it gets a version number it has > > never heard of? Should it use the newest version it supports? > > The oldest? Abort? 
> > Rather than have version data be something that the client has to discover > in the response, an then have to react to in some intelligent way, how about > adding an optional dasversion field to all requests, such as: > > http://www.wormbase.org/das/genome/volvox/1/type?dasversion=1.1 > > The server would then either: > > 1) return the appropriate response document if the server supports the > requested version or a later version that is backward compatible with it, > or > 2) return a 505 error 'DAS Version Not Supported', which we already have in > the spec. > > This puts the onus on the server rather than the client, but I think it > would be less trouble on the server than the alternative scheme would be for > the client. The client can now be fairly dumb about versioning and assume > everything is kosher unless it gets an error. > > We could put some of the onus for DAS version support on the revisers of the > spec: When a new version of the spec is released, we'll know right then what > parts will be backward compatible and what parts will not be. The reviser > could document whether the new version of the spec is backwards compatible > with which previous versions, with the appropriate level of granularity > (e.g., "all requests are backward compatible except for the types request"). > This would serve as a guide for maintainers of das2 servers. > > Thoughts? > > Steve From ed_erwin at affymetrix.com Mon Feb 6 17:48:49 2006 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Mon, 06 Feb 2006 14:48:49 -0800 Subject: [DAS2] elements In-Reply-To: <4157a339b02c085fd6df0f4bf1ff1cc0@dalkescientific.com> References: <4157a339b02c085fd6df0f4bf1ff1cc0@dalkescientific.com> Message-ID: <43E7D251.8050703@affymetrix.com> Andrew Dalke wrote: > One discussion point from today is the elements. 
> > The current draft of the spec says they look like this > > > > > > > > > > > > > > > Andreas Prlic pointed out that since the document says > the "volvox" version "1" url is already known ("volvox/1") > and the type="segments" then the query_id can be built > from appending "segments" to the "volvox/1" (plus the "/") > to get "volvox/1/segments". > > I originally responded from a ReST purity argument, in that > URLs should not be constructed from non-URL data. This > lets Thomas, for example, use GUIDs for the objects rather > than the hierarchical structure I and others recommend. > > During discussion a better answer came up, which I think > we talked about earlier but which is worth emphasizing > is that the "query_id"s don't need to be on the same server. > > For example, the "regions" URL may and likely will point > to a common reference server, and a database may offer > only one set of "types" for all of the "features". > > That is, something like this > > DAS server example.com > genome A > version x > segments at "ensembl.org/das2/genome_A/build_1/segments" > features at "example.com/A/version_x/features" > types at "example.com/A/types" None of your examples vary the words "segments", "types" or "features", but it is legal to do so, right?: segments at "ensembl.org/das2/genome_A/build_1/segment" features at "example.com/A/version_x/things/and/more/things" types at "example.com/A/rhinoceros" OK, so no one is likely to go that far, but is it legal for example to use non-plural "segment", "feature" and "type" ? 
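The question above (and the singular/plural mix-up on the codesprint server) comes down to whether a client builds query URLs itself or treats them as opaque values taken from the sources document. Below is a minimal Python sketch of the second approach, assuming the client has already extracted an xml:base value and a mapping of category type to query_id from the document. The function name and the dictionary layout are hypothetical; only the relative-URI resolution done by urllib.parse.urljoin is standard behaviour.

```python
from urllib.parse import urljoin


def resolve_query_urls(document_url, xml_base, categories):
    """Resolve each category's query_id to an absolute URL.

    document_url: URL the sources document was fetched from.
    xml_base:     value of xml:base in that document, or None.
    categories:   dict of category type -> query_id (relative or absolute).
    """
    # xml:base may itself be relative to the document's own URL.
    base = urljoin(document_url, xml_base) if xml_base else document_url
    # Never append "segments"/"types" by hand; just resolve what the server gave.
    return {ctype: urljoin(base, query_id) for ctype, query_id in categories.items()}


# Values modelled on the example.com layout quoted above (http:// added).
urls = resolve_query_urls(
    "http://example.com/A/version_x",
    "http://example.com/A/",
    {
        "segments": "http://ensembl.org/das2/genome_A/build_1/segments",
        "features": "version_x/features",
        "types": "rhinoceros",  # arbitrary names resolve like any other
    },
)
```

Because the client only resolves what it is given, "segment" versus "segments" (or "rhinoceros") makes no difference to it, and a segments URL on a different server needs no special handling. The cost is that a wrong xml:base in the document sends every relative query_id to the wrong host.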
From ed_erwin at affymetrix.com Mon Feb 6 17:51:11 2006 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Mon, 06 Feb 2006 14:51:11 -0800 Subject: [DAS2] version= In-Reply-To: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com> References: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com> Message-ID: <43E7D2DF.7060507@affymetrix.com> Andrew Dalke wrote: > If we add a version= field to the Content-Type, or whatever > mechanism is proposed > > Content-Type: application/x-das2features+xml; version=12345 > > What will a client do when it gets a version number it has > never heard of? Should it use the newest version it supports? > The oldest? Abort? > It is up to the client to decide what to do, and this does not need to be specified here. From Gregg_Helt at affymetrix.com Mon Feb 6 18:16:35 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 6 Feb 2006 15:16:35 -0800 Subject: [DAS2] RE: New DAS/2 server for codesprint Message-ID: Ack, you're right! I didn't expect to get bitten by rogue query_ids so soon... gregg > -----Original Message----- > From: Nomi Harris [mailto:nomi at fruitfly.org] > Sent: Monday, February 06, 2006 3:48 PM > To: Allen Day > Cc: Helt,Gregg > Subject: Re: New DAS/2 server for codesprint > > On 6 February 2006, Allen Day wrote: > > Hi Nomi, > > > > I just restarted the server, the "all features" request used all the > > memory and hung the webserver. I'll look into why that request wasn't > > immediately denied as it used to be. > > > > As for your .../segments and .../types, they should be .../segment and > > .../type. I see no reason to pluralize these URLs given that the > sources > > response allows me to provide them at any arbitrary URL: > > oops, gregg led me astray with that one. right, /segment and /type > work. sorry for hanging your server with my inadvertent "all features" > request. 
> Nomi From Gregg_Helt at affymetrix.com Mon Feb 6 19:14:55 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 6 Feb 2006 16:14:55 -0800 Subject: [DAS2] Re: New DAS/2 server for codesprint Message-ID: Allen, can you recommend a reasonable region on yeast to do a features query that will return features with some hierarchy (like transcript/exons)? Thanks, Gregg From Gregg_Helt at affymetrix.com Mon Feb 6 19:29:12 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 6 Feb 2006 16:29:12 -0800 Subject: [DAS2] Re: New DAS/2 server for codesprint Message-ID: Actually, that "arbitrary URL" thing doesn't quite work with the current biopackages server, which has an xml:base pointing to a server at UCLA for the response to the sequence query: http://das.biopackages.net/codesprint/sequence ... ... ... Which means (I think) that the segments query resolves to http://radius.genomics.ctrl.ucla.edu/das/sequence/human/17/segment which for me returns a 404 Not Found response. gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Allen Day > Sent: Monday, February 06, 2006 2:01 PM > To: Nomi Harris > Cc: DAS/2 > Subject: [DAS2] Re: New DAS/2 server for codesprint ... > As for your .../segments and .../types, they should be .../segment and > .../type. I see no reason to pluralize these URLs given that the sources > response allows me to provide them at any arbitrary URL: > > [...] > > > > [...] > > -Allen > > > > On Mon, 6 Feb 2006, Nomi Harris wrote: > > > On 6 February 2006, Nomi Harris wrote: > > > thanks for setting up the new das/2 server, allen. i'm having > trouble > > > with some of the queries. 
> > > > ok, i realized that some of the queries i was trying were senseless, but > > here are some that should work that are just hanging: > > http://das.biopackages.net/codesprint/sequence/yeast/S228C/segments > > http://das.biopackages.net/codesprint/sequence/yeast/S228C/types > > > > Nomi > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Steve_Chervitz at affymetrix.com Mon Feb 6 20:02:30 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 06 Feb 2006 17:02:30 -0800 Subject: [DAS2] Re: New DAS/2 server for codesprint In-Reply-To: Message-ID: There's a gene (RPL7A) with two introns on chr7 at roughly 366kbp - 364kbp: http://db.yeastgenome.org/cgi-bin/locus.pl?locus=YGL076C Most genes with introns in cerevisiae (which aren't many) have just a single intron that creates a small 5' exon, such as the alpha and beta tubulin genes on chr13. Tub1 is on chr13 at 99Kbp, and tub3 is also on chr13 at 23Kbp. So the first 100Kb of chr13 would be another region to try. http://db.yeastgenome.org/cgi-bin/locus.pl?locus=tub1 Steve > From: "Helt,Gregg" > Date: Mon, 6 Feb 2006 16:14:55 -0800 > To: Allen Day > Cc: DAS/2 > Conversation: [DAS2] Re: New DAS/2 server for codesprint > Subject: RE: [DAS2] Re: New DAS/2 server for codesprint > > > Allen, can you recommend a reasonable region on yeast to do a features > query that will return features with some hierarchy (like > transcript/exons)? 
> > Thanks, > Gregg > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Gregg_Helt at affymetrix.com Mon Feb 6 21:42:18 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 6 Feb 2006 18:42:18 -0800 Subject: [DAS2] Modifying com.affymetrix.igb.das2 classes Message-ID: Brian and Marc, I'm about to start seriously modifying the IGB DAS/2 classes in the com.affymetrix.igb.das2 package. There's code in there you wrote to work with materials, assays, results, and ontology. I think we discussed at some point splitting this stuff out into a separate package(s). Which sounds good, especially since (as I understand it), these domains are separate from the DAS/2 "sequence" domain. The only place there's a lot of mixture of code for these domains with the sequence parts is in Das2VersionedSource. Is it okay if I move this out (or comment it out) of Das2VersionedSource while I renovate other parts of the class? thanks, Gregg From Gregg_Helt at affymetrix.com Mon Feb 6 22:34:48 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 6 Feb 2006 19:34:48 -0800 Subject: [DAS2] RE: Modifying com.affymetrix.igb.das2 classes Message-ID: You're right, it looks like some of this code was already getting moved over to the das2.assay and das2.ontology packages as subclasses of Das2VersionedSource. However it's not clear to me if the equivalent of source and versioned source for assay, ontology, and other domains are going to be similar enough to the DAS/2 sequence domain to justify sharing a base class/interface. What do/will they share? I'll go ahead with changes to the das2 package, and look into moving much of this code into a das2.sequence package. 
Thanks, Gregg > -----Original Message----- > From: Brian O'Connor [mailto:boconnor at ucla.edu] > Sent: Monday, February 06, 2006 7:09 PM > To: Helt,Gregg > Cc: Marc Carlson; Allen Day; DAS/2 > Subject: Re: Modifying com.affymetrix.igb.das2 classes > > Hi Gregg, > > Go for it!! Marc and I can take a look at it again when you're happy > with the changes. The versioned source object really needed an overhaul > anyway to deal with the multiple domains of the DAS/2 server. I think > there should be a VersionedSource parent and then children for each > domain (i.e. VersionedSourceAssay). I think Marc started to do this but > he was afraid to alter the VersionedSource object too much for fear of > breaking the IGB client. > > --Brian > > Helt,Gregg wrote: > > > Brian and Marc, > > > > I'm about to start seriously modifying the IGB DAS/2 classes in the > > com.affymetrix.igb.das2 package. There's code in there you wrote to > > work with materials, assays, results, and ontology. I think we > > discussed at some point splitting this stuff out into a separate > > package(s). Which sounds good, especially since (as I understand it), > > these domains are separate from the DAS/2 "sequence" domain. The only > > place there's a lot of mixture of code for these domains with the > > sequence parts is in Das2VersionedSource. Is it okay if I move this > > out (or comment it out) of Das2VersionedSource while I renovate other > > parts of the class? > > > > thanks, > > > > Gregg > > From boconnor at ucla.edu Mon Feb 6 22:09:22 2006 From: boconnor at ucla.edu (Brian O'Connor) Date: Mon, 06 Feb 2006 19:09:22 -0800 Subject: [DAS2] Re: Modifying com.affymetrix.igb.das2 classes In-Reply-To: References: Message-ID: <43E80F62.4050403@ucla.edu> Hi Gregg, Go for it!! Marc and I can take a look at it again when you're happy with the changes. The versioned source object really needed an overhaul anyway to deal with the multiple domains of the DAS/2 server. 
I think there should be a VersionedSource parent and then children for each domain (i.e. VersionedSourceAssay). I think Marc started to do this but he was afraid to alter the VersionedSource object too much for fear of breaking the IGB client. --Brian Helt,Gregg wrote: > Brian and Marc, > > I'm about to start seriously modifying the IGB DAS/2 classes in the > com.affymetrix.igb.das2 package. There's code in there you wrote to > work with materials, assays, results, and ontology. I think we > discussed at some point splitting this stuff out into a separate > package(s). Which sounds good, especially since (as I understand it), > these domains are separate from the DAS/2 "sequence" domain. The only > place there's a lot of mixture of code for these domains with the > sequence parts is in Das2VersionedSource. Is it okay if I move this > out (or comment it out) of Das2VersionedSource while I renovate other > parts of the class? > > thanks, > > Gregg > From Gregg_Helt at affymetrix.com Tue Feb 7 00:43:07 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 6 Feb 2006 21:43:07 -0800 Subject: [DAS2] RE: Modifying com.affymetrix.igb.das2 classes Message-ID: Okay, I just split the code that was in Das2VersionedSource. Now regions and types (w/o ontology) are handled in Das2VersionedSource, and ontology, materials, results, and assays are handled by a subclass, Das2VersionedSourcePlus. I might do some further refactoring at a later date, but for right now this works (and compiles/runs). I also went ahead and committed almost all my DAS/2 code changes to the genoviz repository.
Gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Helt,Gregg > Sent: Monday, February 06, 2006 7:35 PM > To: Brian O'Connor > Cc: DAS/2; Marc Carlson > Subject: [DAS2] RE: Modifying com.affymetrix.igb.das2 classes > > > You're right, it looks like some of this code was already getting moved > over to the das2.assay and das2.ontology packages as subclasses of > Das2VersionedSource. > > However it's not clear to me if the equivalent of source and versioned > source for assay, ontology, and other domains are going to be similar > enough to the DAS/2 sequence domain to justify sharing a base > class/interface. What do/will they share? > > I'll go ahead with changes to the das2 package, and look into moving > much of this code into a das2.sequence package. > > Thanks, > Gregg > > > -----Original Message----- > > From: Brian O'Connor [mailto:boconnor at ucla.edu] > > Sent: Monday, February 06, 2006 7:09 PM > > To: Helt,Gregg > > Cc: Marc Carlson; Allen Day; DAS/2 > > Subject: Re: Modifying com.affymetrix.igb.das2 classes > > > > Hi Gregg, > > > > Go for it!! Marc and I can take a look at it again when you're happy > > with the changes. The versioned source object really needed an > overhaul > > anyway to deal with the multiple domains of the DAS/2 server. I think > > there should be a VersionedSource parent and then children for each > > domain (i.e. VersionedSourceAssay). I think Marc started to do this > but > > he was afraid to alter the VersionedSource object too much for fear of > > breaking the IGB client. > > > > --Brian > > > > Helt,Gregg wrote: > > > > > Brian and Marc, > > > > > > I'm about to start seriously modifying the IGB DAS/2 classes in the > > > com.affymetrix.igb.das2 package. There's code in there you wrote to > > > work with materials, assays, results, and ontology. 
I think we > > > discussed at some point splitting this stuff out into a separate > > > package(s). Which sounds good, especially since (as I understand > it), > > > these domains are separate from the DAS/2 "sequence" domain. The > only > > > place there's a lot of mixture of code for these domains with the > > > sequence parts is in Das2VersionedSource. Is it okay if I move this > > > out (or comment it out) of Das2VersionedSource while I renovate > other > > > parts of the class? > > > > > > thanks, > > > > > > Gregg > > > > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Gregg_Helt at affymetrix.com Tue Feb 7 00:46:37 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 6 Feb 2006 21:46:37 -0800 Subject: [DAS2] Agenda for DAS/2 Code Sprint Teleconference 2005-02-06 Message-ID: Will you be able to join the teleconference tomorrow (Tuesday?). Suzi is planning to join in, I'm hoping we can spend some time discussing ontologies. Thanks Gregg P.S. 9 AM Pacific time 800-531-3250 id: 2879055 > -----Original Message----- > From: Lincoln Stein [mailto:lstein at cshl.edu] > Sent: Monday, February 06, 2006 10:20 AM > To: das2 at portal.open-bio.org > Cc: Helt,Gregg > Subject: Re: [DAS2] Agenda for DAS/2 Code Sprint Teleconference 2005-02-06 > > Hi Gregg, > > I had a conflicting teleconference and wasn't sure whether there was a > teleconference scheduled for the code sprint, so I didn't dial in. Just > got > the agenda now. > > I am online on both MSN and AOL chats, and will be all week, if anyone > wants > to IM me. > > Lincoln > > On Monday 06 February 2006 12:13, Helt,Gregg wrote: > > Status report > > DAS/2 XML - valid or not valid? > > CATEGORY elements -- constructing query URLs > > MAINTAINER information > > Use of xml:base > > update on feature properties - searching, etc. 
> > > > > > > > > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/das2 > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu From dalke at dalkescientific.com Tue Feb 7 04:22:56 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 7 Feb 2006 09:22:56 +0000 Subject: [DAS2] elements In-Reply-To: <43E7D251.8050703@affymetrix.com> References: <4157a339b02c085fd6df0f4bf1ff1cc0@dalkescientific.com> <43E7D251.8050703@affymetrix.com> Message-ID: <8daf0ba1e5744f8e0b99fc644fb5dd38@dalkescientific.com> Ed Erwin wrote: > None of your examples vary the words "segments", "types" or > "features", but it is legal to do so, right?: > > segments at "ensembl.org/das2/genome_A/build_1/segment" > features at "example.com/A/version_x/things/and/more/things" > types at "example.com/A/rhinoceros" > > OK, so no one is likely to go that far, but is it legal for example to > use non-plural "segment", "feature" and "type" ? Yes. My goal is two-fold. First, make no assertions on the internal organization of the DAS server. Machines can change, directories can move around. The specific advantages are: - annotation servers can all point to the same "segments" server - multiple versions of the same genomic source on the same machine can reuse the same "types" server Another thought, perhaps too old-fashioned for modern web development, is that the query URLs are CGI scripts in a "cgi-bin" directory while the data files are flat files in some other directory. Similarly, the query URL, if it is a CGI script, might end with a ".cgi" or ".pl" extension. My second goal is to develop a recommended layout.
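A hedged client-side sketch of the first goal, in Python: each advertised query URL is treated as a URI reference and resolved against the document's base, whatever it looks like. The xml:base value is the one from the biopackages sequence response discussed in this thread; the relative references and the resolve() helper are hypothetical and only illustrate the idea.

```python
from urllib.parse import urljoin

# xml:base as it appears in the biopackages sources response in this thread.
XML_BASE = "http://radius.genomics.ctrl.ucla.edu/das/sequence/"


def resolve(base, reference):
    """Resolve a URI reference from a sources document against its base.

    Absolute references (for example, a shared reference server) pass
    through unchanged; relative ones are joined to the base.
    """
    return urljoin(base, reference)


# Hypothetical references a server might advertise.
relative = resolve(XML_BASE, "human/17/segment")
script = resolve(XML_BASE, "../cgi-bin/features.pl")
elsewhere = resolve(XML_BASE, "http://ensembl.org/das2/genome_A/build_1/segment")
```

Nothing in the client depends on the path containing "segments", on the host matching the base, or on the absence of a ".cgi" or ".pl" extension.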
Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Tue Feb 7 04:32:11 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 7 Feb 2006 09:32:11 +0000 Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint, 6 Feb 2006 In-Reply-To: References: Message-ID: <97f6d51a2e54031ed49fe7997af383eb@dalkescientific.com> > gh: would like a re-cast as xml document, hosted at so/sofa > website. that xml would be like a std ontology representation so you > could extend it. so someone could point to an extension of it. I asked as an action item if Gregg would look into the solution for this. Do we refer to the ontology by a "GO:0123456" identifier or by some URL scheme? If so, what's the mapping from URL scheme to something that clients and people can understand, eg, to ask for everything which is an exon? Does this mapping need a version number - does it change over time? Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Tue Feb 7 05:38:28 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 7 Feb 2006 10:38:28 +0000 Subject: [DAS2] per-database MAINTAINER Message-ID: <294a2caeb29a823dd93fa1155012c8cb@dalkescientific.com> Based on Andreas Prlic's work with the DAS2 registry I've added a new MAINTAINER element to the SOURCE/VERSION part of the SOURCES document. I've updated das/das2/scratch/sources4.xml to have an example. It looks something like this The idea is that the database maintainer can be different from the server maintainer. On the other hand, if the SOURCES/SOURCE/VERSION/MAINTAINER is not present then clients may assume that the database maintainer is the same as the SOURCES/MAINTAINER. The maintainer elements are both optional.
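The fallback rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the element names SOURCES, SOURCE, VERSION and MAINTAINER come from the message above, but the "email" and "id" attributes, the absence of an XML namespace, and the sample document are hypothetical, since the real example lives in sources4.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; attribute names are assumptions, not the real schema.
SAMPLE = """\
<SOURCES>
  <MAINTAINER email="server-admin@example.org"/>
  <SOURCE id="volvox">
    <VERSION id="volvox/1">
      <MAINTAINER email="volvox-curator@example.org"/>
    </VERSION>
    <VERSION id="volvox/2"/>
  </SOURCE>
</SOURCES>
"""


def maintainer_for(root, version):
    """Return the maintainer of one VERSION, or None if nobody is listed.

    A VERSION-level MAINTAINER wins; otherwise fall back to the
    server-level SOURCES/MAINTAINER. Both elements are optional.
    """
    own = version.find("MAINTAINER")
    if own is not None:
        return own.get("email")
    server = root.find("MAINTAINER")  # direct child of SOURCES only
    return server.get("email") if server is not None else None


root = ET.fromstring(SAMPLE)
maintainers = {v.get("id"): maintainer_for(root, v) for v in root.iter("VERSION")}
```

Here "volvox/1" reports its own curator while "volvox/2" inherits the server maintainer.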
Andrew dalke at dalkescientific.com From allenday at ucla.edu Tue Feb 7 05:52:12 2006 From: allenday at ucla.edu (Allen Day) Date: Tue, 7 Feb 2006 02:52:12 -0800 (PST) Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> Message-ID: The XML is now as you requested, please confirm. After some thought today I realized the new SOURCES response is fully compatible with the existing server. The doc at: http://das.biopackages.net/codesprint/sequence is now simply a static XML doc that points into the stable server (plus the new "segments" response) implementation at: http://das.biopackages.net/das/genome The headers for the static document don't include the correct Content-Type "application/x-das-blah ; version = XxX", it's simply "text/xml". I'll add the headers in the morning GMT+8. There are probably also some other Content-Type headers that need to be changed for the other responses -- let me know if you spot them. -Allen On Mon, 6 Feb 2006, Andrew Dalke wrote: > Allen: > > After looking closely over this first draft of new_spec.txt, it's > > apparent > > that there are still some holes, e.g. what should the response to the > > following requests look like? > > > > http://das.biopackages.net/codesprint/sequence/yeast > > > xmlns:xlink="http://www.w3.org/1999/xlink" > xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/"> > taxon="Yeast"> > > > > > > > > > > > > > > > > > > > > > > > > > > > http://das.biopackages.net/codesprint/sequence/yeast/S228C > > The same for this case. There is only on VERSION for "yeast". 
> > > Your XML, btw, starts > > > > > > xmlns:xlink="http://www.w3.org/1999/xlink" > xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/"> > > The "standalone" means that the DTD may affect the content of the > documentation. > http://www.stylusstudio.com/w3c/xml11/sec-rmd.htm > > > Markup declarations can affect the content of the document, as passed > > from an XML Processor to an application; examples are attribute > > defaults and entity declarations. The standalone document declaration, > > which MAY appear as a component of the XML declaration, signals > > whether or not there are such declarations which appear external to > > the Document Entity or in parameter entities. An external markup > > declaration is defined as a markup declaration occurring in the > > external subset or in a parameter entity (external or internal, the > > latter being included because non-validating processors are not > > required to read them). > > For what we're doing, we don't need nor (I think) want that. There's > no reason for a client to consult the DTD to figure out the XML. > > Instead, use > > > > and probably have the encoding > > > > That also means you can get rid of the > > > > statements. > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From dalke at dalkescientific.com Tue Feb 7 07:19:28 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 7 Feb 2006 12:19:28 +0000 Subject: [DAS2] properties and queries Message-ID: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com> We've had a long discussion here about properties and how to search them. As it stands now the spec has a few holes in it. Here are the properties we've talked about. program_name: the program used to make the annotation, like "BLASTX 1.2.3" notes: There can be 0 or more notes. 
Notes might refer to other notes (eg, "the previous note said XYZ but I think ABC")

phase: (is it 0, 1, 2 or 1, 2, 3?)
(And does anyone use this? People here don't use it; Thomas
"reinfers it by counting along the transcript" "but maybe
that's just me". Others say they don't use the DAS1 phase.)

icon: a hypothetical image used for the feature, perhaps as
a binary png

curation history:
a list of elements, each with
- person
- timestamp
- reason for change

score: a floating point number, which may be in exponential
notation like "1E-3"

Each one needs a different search mechanism. For example,
"annotations done by that buggy version of BLAST 1.2.3"
"scores better than 1E-2"
"changes by Andrew done in August 2004"
"notes with the substring 'helicase'" (case sensitive or not?)
"notes with the phrase 'E. Coli'" (substring might not work
if the note has 'E.\nColi')

The property storage scheme doesn't handle this quite correctly.
Here are the problems:

- how do you store multiple notes?

Answer 1: use structured names, like "note_1", "note_2", "note_3", ..
HACK! Then what if a note is deleted? Bigger problem: how do you
search the "note" field using the existing query language?

Answer 2: allow duplicate note elements, like

Question: so the order must be preserved if two fields have the
same name? Can't implement with a dictionary/hash data type.

Question: what if there are duplicate "score" or "phase" elements?
Which one wins?

Answer 3: Notes are important and we know we need them now.
Let's have a note element and not make it be a property.

This is a note
The previous note is a lie!
Is this an E or a NOT-E?

(perhaps also with timestamp and author name, but that's a different
question.) Then we also define that the "note=" parameter in a
DAS query is a substring search of the note elements of a feature.

I like this one.

- How do you do numeric searches?

This is hypothetical. There hasn't been a requirement for this.
'Course it may be because people haven't had the ability.
In any case, how to search numeric fields like "score" with comparisons? - querying non-queryable fields If there's embedded binary data, like an image, is it searchable? Does a server complain and die? Ignore the request? - more complex text searches "proteinase but not inhibitor" - complex data We have support for non-DAS extensions, which might be Change the this into that because of some reason or other Thomas proposed that we support some sort of complex query language, probably in XML, and get rid of the simple query scheme we have now. I argued against the complexity of that given that nearly all of the queries will be "give me these feature types on this range of that chromosome". I also pointed out that developing a generic query language is hard, and implementing it is harder. Why require all that effort? Roy commented the other way - in a server with only a few hundred features, why require a query language at all? Just return all of the features in the request. Here's what I proposed. We have the "CATEGORY" (but after discussion I now want to take it back to "CAPABILITY" since that's now much closer to what it does - it describes where to go to do something) So I'll use "CAPABILITY" The current scheme has This is an extensibility point. Suppose Thomas has an XML query search interface support on his server, with Sanger clients that handle it. Then there can be A client can see the list of CAPABILITIES and decide to use the feature search mechanism it likes best. In addition, we could say that "this supports the normal DAS query scheme but also supports extension vocabulary. 
For example, With this a client knows that the query_url supports the normal DAS queries and also supports the "annotator", "annotation_before" and "annotation_after" queries, like this .../features?annotator=Andrew;annotation_before=2005 Possible idea: if there is no SUPPORTS tag then the server implements no search syntax and instead returns everything, as in the example Roy mentioned. Okay, we're off to lunch. Andrew dalke at dalkescientific.com From ap3 at sanger.ac.uk Tue Feb 7 07:21:53 2006 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Tue, 7 Feb 2006 12:21:53 +0000 Subject: [DAS2] das-regstry sources response Message-ID: <09c1c46ca239256496b81b2c67638bd5@sanger.ac.uk> Hi! I added a DAS2 sources response to a copy of the DAS registry running on my laptop. The attached file shows how the DAS1 sources are described using the DAS2 spec. It fits together rather well. I did not know what to put under the . The already contain all required info. Therefore I propose to drop Andreas -------------- next part -------------- A non-text attachment was scrubbed... Name: sources_response.xml Type: application/octet-stream Size: 32318 bytes Desc: not available URL: -------------- next part -------------- ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 From dalke at dalkescientific.com Tue Feb 7 08:20:35 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 7 Feb 2006 13:20:35 +0000 Subject: [DAS2] das-regstry sources response In-Reply-To: <09c1c46ca239256496b81b2c67638bd5@sanger.ac.uk> References: <09c1c46ca239256496b81b2c67638bd5@sanger.ac.uk> Message-ID: Andreas: > I did not know what to put under the . The > already contain all required info. > Therefore I propose to drop Removed and committed to CVS.
Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Tue Feb 7 10:34:21 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Tue, 7 Feb 2006 07:34:21 -0800 Subject: [DAS2] Ontologies in DAS/2 Message-ID: I talked to Suzi, she's planning to join our teleconference today to discuss ontologies, wearing her hat as co-PI of the National Center for Biomedical Ontology. Hopefully Lincoln can join too. I took a closer look at the DAS/2 ontology work Allen has done (see http://biodas.org/documents/das2/das2_ontology.html). I urge anyone who wants to contribute to the ontology discussion to read this doc. It specifies a way to retrieve ontologies in OBOXML format. In this format each ontology term gets an absolute URI through the same mechanism that the rest of DAS/2 uses (URIs for ids, which can be either absolute or relative but resolvable). As Allen pointed out yesterday this would solve our problem of how to uniquely specify ontology terms in the DAS/2 TYPES XML. I couldn't find any documentation for the OBOXML format, other than the code that generates it from OBO files. But I'm using OBOXML as an example here because it clearly has resolvable URIs for each ontology term. In Allen's spec, ontologies can also be returned in other formats, but it's unclear to me whether terms in these other formats would resolve to similar URIs. gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Andrew Dalke > Sent: Tuesday, February 07, 2006 1:32 AM > To: DAS/2 > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code > sprint,6 Feb 2006 > > > gh: would like a re-cast as xml document, hosted at so/sofa > > website. that xml would be like a std ontology representation so you > > could extend it. so someone could point to an extension of it. > > I asked as an action item if Gregg would look into the solution > for this. 
Do we refer to the ontology by a "GO:0123456" identifier > or by some URL scheme? If so, what's the mapping from URL scheme > to something that clients and people can understand, eg, to > ask for everything which is an exon? > > Does this mapping need a version number - does it change over time? > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org From dalke at dalkescientific.com Tue Feb 7 10:45:00 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 7 Feb 2006 15:45:00 +0000 Subject: [DAS2] properties and queries In-Reply-To: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com> References: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com> Message-ID: <16111cd36850795dfd46696a63fb1057@dalkescientific.com> To summarize, the current thought here for properties and queries is as follows (it's a long summary. More like an essay. :) Add support for zero or more elements in the feature, of the form This is some arbitrary (but non-markup-ed) text Add a features search keyword "note=" which takes a search string to be found in the note elements. (substring? soundex? regex? the search engine calls up Lincoln and asks?) Add support for zero or more elements in the feature, of the form (I missed this in the redraft. It should have been there. Feature filter "name" already says it searches the "name" and "alias" fields for a feature.) Ignore the "phase" property (contentious, perhaps?) or add it as an attribute of something else in the feature element. Ignore the "score" property. As written in the current spec "score" A floating point number indicating a context-dependent score. This is to be used only when a more specific ontology-driven score cannot be used. (Umm, where do the other scores go?) Unless someone wants to define that score ontology and what it means to search that field, this is a can of worms I don't want to open. Ignore the "editable" property. 
As written (and kibbitzed) "editable" indicates that features may be updateable (this is at the discretion of the server). (But this is potentially per-user data.) This should either be in the feature type or it should be in some write-back specific data structure the client can fetch. (To be discussed) It isn't a feature property. This gets rid of all stated needs for arbitrary key/value data. That doesn't mean there won't be future needs. In that case, here's how to add new pieces of data. 1) use a non-DAS extension element. Clients must ignore elements they don't understand. This is good for storing data, but not for searching. The thing is, the search mechanism (or multiple search mechanisms perhaps) is data field specific. Hence, 2) servers may provide extensions to the basic DAS query mechanism. Currently the mechanism is: and-ed set of zero or more keyword = (set, of, or, terms, for, keyword) where "keyword" is well-defined by DAS except for the "att" property keywords. Query extensions add new keywords in the same syntax, and define somewhere how that syntax works. It must be backwards compatible to the existing syntax and semantics. The problem then is clients don't know that a server supports a given query extension, so 3) add a element to the element. (Also proposed, renaming "CATEGORY" back to "CAPABILITY".) The CAPABILITY may have zero or more of Here are the two defined unique strings, The "all" query says that a client may reasonably fetch all the features in one go. This would occur with a small DAS server containing only a few hundred features. In that case there's no need to even have a CGI script running on the back end - just a set of flat files. The query is done by fetching the URL with no parameters. A rich server with millions of features might decide to not support an "all" query. The "das2" query is the one we've been talking about. If a site develops a query extension it adds so clients know what the server can do. 
(In this case supporting searches for "annotator", "annotation_before" and "annotation_after" fields.) That all said, this doesn't mean that the server shouldn't have a property table. It's a question of what it means to search the property table. People here want the following: multiple properties may have the same key and different value the order of the properties is not important the "att:" search is renamed a "prop:" search, like "prop:author" the search is a substring search. a feature matches a search if any of the properties with that name match the substring search For example, source = BLAST 2.3.4 author = Andrew Dalke author = Thomas Down lets me search for features?prop:author=Andrew all features with "Andrew" as a substring in the "author" property features?prop:author=Andrew;source=BLAST all features with "Andrew" as a substring in the "author" and with "BLAST" in the source name features?prop:author=Andrew,Thomas all features with "Andrew" or "Thomas" as an author Really what I think this essay is doing is saying that storing data and searching data is different. Servers can develop new ways to extend DAS searches and flag that they support new searches. (Eg, the new search may be to support a different way to search a field in the property table.) But there needs to be a really basic substring search, given that there will be simple string key/ string value data for the property table. Oh, and should the key/value table also include my proposed "href" and embedded binary data fields like images? Hmmmmm.... Lots of talk about this here. Time for a tea break. 
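The matching rules proposed above (duplicate keys allowed, order irrelevant, substring match, comma for "or", semicolon for "and") can be sketched in a few lines of Python. This is only an illustration of the semantics, not server code: the function names are made up, the match is case sensitive because the thread leaves that open, and accepting a key with or without the "prop:" prefix is an assumption made so that the "source=BLAST" example can be run against the same table.

```python
def parse_query(query):
    """Split 'k1=a,b;k2=c' into [('k1', ['a', 'b']), ('k2', ['c'])]."""
    terms = []
    for part in query.split(";"):
        if not part:
            continue
        key, _, values = part.partition("=")
        if key.startswith("prop:"):
            key = key[len("prop:"):]  # assumption: prefix is optional here
        terms.append((key, values.split(",")))
    return terms


def matches(properties, query):
    """True if a feature's (key, value) pairs satisfy every query term.

    A term is satisfied when any property with that key contains any of
    the comma-separated alternatives as a substring.
    """
    for key, alternatives in parse_query(query):
        values = [v for k, v in properties if k == key]
        if not any(alt in v for v in values for alt in alternatives):
            return False
    return True


# The property table from the example above; a list of pairs, not a dict,
# because the same key may appear more than once.
feature = [
    ("source", "BLAST 2.3.4"),
    ("author", "Andrew Dalke"),
    ("author", "Thomas Down"),
]

hit_single = matches(feature, "prop:author=Andrew")
hit_and = matches(feature, "prop:author=Andrew;source=BLAST")
hit_or = matches(feature, "prop:author=Andrew,Thomas")
miss = matches(feature, "prop:author=Lincoln")
```

Storing the properties as a list of pairs is what makes the "two authors" case work; a plain dictionary would silently drop one of them.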
Andrew dalke at dalkescientific.com From lstein at cshl.edu Tue Feb 7 11:00:52 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 7 Feb 2006 11:00:52 -0500 Subject: [DAS2] properties and queries In-Reply-To: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com> References: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com> Message-ID: <200602071100.52818.lstein@cshl.edu> Hi, I use the phase information quite a lot and I know that others do as well. The phase is {0,1,2} and the meaning is described here: For features of type "CDS", the phase indicates where the feature begins with reference to the reading frame. The phase is one of the integers 0, 1, or 2, indicating the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon. In other words, a phase of "0" indicates that the next codon begins at the first base of the region described by the current line, a phase of "1" indicates that the next codon begins at the second base of this region, and a phase of "2" indicates that the codon begins at the third base of this region. This is NOT to be confused with the frame, which is simply start modulo 3. Lincoln On Tuesday 07 February 2006 07:19, Andrew Dalke wrote: > We've had a long discussion here about properties and how to > search them. As it stands now the spec has a few holes in it. > > Here are the properties we've talked about. > > program_name: the program used to make the annotation, like > "BLASTX 1.2.3" > > notes: > There can be 0 or more notes. Notes might refer to other > notes (eg, "the previous note said XYZ but I think ABC") > > phase: (is it 0, 1, 2 or 1, 2, 3?) > (And does anyone use this? People here don't use it; Thomas > "reinfers it by counting along the transcript" "but maybe > that's just me". Others say they don't use the DAS1 phase.) 
> > icon: a hypothetical image use for the feature, perhaps as > a binary png; > > curation history: > a list of elements, each with > - person > - timestamp > - reason for change > > score: a floating point number, which may be in exponential > notation like "1E-3" > > Each one needs different search mechanisms. For example, > "annotations done by that buggy version of BLAST 1.2.3" > "scores better than 1E-2" > "changes by Andrew done in August 2004" > "notes with the substring 'helicase'" (case sensitive or not?) > "notes with the phrase 'E. Coli'" (substring might not work > if there's the note has 'E.\nColi') > > The property storage scheme doesn't handle this quite correctly. > Here are problems: > > - how do you store multiple notes? > > Answer 1: use structured named, like "note_1", "note_2", "note_3", .. > HACK! Then what if a note is deleted? Bigger problem; how do you > search the "note" field using the existing query language? > > Answer 2: allow duplicate note elements, like > > > > > Question: so the order must be preserved if two fields have the > same name? Can't implement with a dictionary/hash data type. > > Question: what if there are duplicate "score" or "phase" elements? > Which one wins? > > Answer 3: Notes are important and we know we need them now. > Let's have a element and not make it be a property. > > This is a note > The previous note is a lie! > Is this an E or a NOT-E? > > (perhaps also with timestamp and author name, but that's a different > question.) Then we also define that the "note=" parameter in as > DAS query is a substring search of the elements of a feature. > > I like this one. > > > - How do you do numeric searches? > > This is hypothetical. There hasn't been a requirement for this. > 'Course it may be because people haven't had the ability. In > any case, how to search numeric fields like "score" with comparisons? > > > - querying non-queryable fields > > If there's embedded binary data, like an image, is it searchable? 
> Does a server complain and die? Ignore the request? > > - more complex text searches > > "proteinase but not inhibitor" > > - complex data > > We have support for non-DAS extensions, which might be > > > > Change the this into that because of some reason or other > > > > Thomas proposed that we support some sort of complex query > language, probably in XML, and get rid of the simple query scheme > we have now. > > I argued against the complexity of that given that nearly all > of the queries will be "give me these feature types on this range > of that chromosome". I also pointed out that developing a > generic query language is hard, and implementing it is harder. > Why require all that effort? > > Roy commented the other way - in a server with only a few hundred > features, why require a query language at all? Just return all > of the features in the request. > > Here's what I proposed. > > We have the "CATEGORY" (but after discussion I now want to take > it back to "CAPABILITY" since that's now much closer to what > it does - it describes where to go to do something) > > So I'll use "CAPABILITY" > > The current scheme has > > > > > > This is an extensibility point. Suppose Thomas has an XML > query search interface support on his server, with Sanger > clients that handle it. Then there can be > > query_url="http.../search-features"> > > > > A client can see the list of CAPABILITIES and decide to > use the feature search mechanism it likes best. > > In addition, we could say that "this supports the normal DAS > query scheme but also supports extension vocabulary. 
For example, > > > > > > > With this a client knows that the query_url supports the normal > DAS queries and also supports the "annotator", "annotation_before" > and "annotation_after" queries, like this > > .../features?annotator=Andrew;annotation_before=2005 > > Possible idea: if there is no SUPPORTs tag then the server > implements no search syntax and instead returns everything, > for the example Roy mentioned. > > Okay, we're off to lunch. > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Tue Feb 7 11:46:47 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 7 Feb 2006 11:46:47 -0500 Subject: [DAS2] Ontologies in DAS/2 In-Reply-To: References: Message-ID: <200602071146.48212.lstein@cshl.edu> Hi, I have group meeting from 12-1 every Tuesday, so I can't make this one. I'll be present for the telecon Wednesday at 12. Lincoln On Tuesday 07 February 2006 10:34, Helt,Gregg wrote: > I talked to Suzi, she's planning to join our teleconference today to > discuss ontologies, wearing her hat as co-PI of the National Center for > Biomedical Ontology. Hopefully Lincoln can join too. > > I took a closer look at the DAS/2 ontology work Allen has done (see > http://biodas.org/documents/das2/das2_ontology.html). I urge anyone who > wants to contribute to the ontology discussion to read this doc. It > specifies a way to retrieve ontologies in OBOXML format. In this format > each ontology term gets an absolute URI through the same mechanism that > the rest of DAS/2 uses (URIs for ids, which can be either absolute or > relative but resolvable). 
As Allen pointed out yesterday this would > solve our problem of how to uniquely specify ontology terms in the DAS/2 > TYPES XML. > > I couldn't find any documentation for the OBOXML format, other than the > code that generates it from OBO files. But I'm using OBOXML as an > example here because it clearly has resolvable URIs for each ontology > term. In Allen's spec, ontologies can also be returned in other > formats, but it's unclear to me whether terms in these other formats > would resolve to similar URIs. > > gregg > > > -----Original Message----- > > From: das2-bounces at portal.open-bio.org > > [mailto:das2-bounces at portal.open- > > > bio.org] On Behalf Of Andrew Dalke > > Sent: Tuesday, February 07, 2006 1:32 AM > > To: DAS/2 > > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code > > sprint,6 Feb 2006 > > > > > gh: would like a re-cast as xml document, hosted at so/sofa > > > website. that xml would be like a std ontology representation so you > > > could extend it. so someone could point to an extension of it. > > > > I asked as an action item if Gregg would look into the solution > > for this. Do we refer to the ontology by a "GO:0123456" identifier > > or by some URL scheme? If so, what's the mapping from URL scheme > > to something that clients and people can understand, eg, to > > ask for everything which is an exon? > > > > Does this mapping need a version number - does it change over time? > > > > Andrew > > dalke at dalkescientific.com > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From dalke at dalkescientific.com Tue Feb 7 11:50:56 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 16:50:56 +0000
Subject: [DAS2] query_api and server layout
Message-ID:

Continuing from yesterday's discussion...

There are several things in a DAS server
 - there is the list of all sources and versions
 - there is a list of all versions for a source
 - there is the versioned source information

The versioned source only really provides a bit of overall configuration information and links to three URLs:
 - the query interface for features
 - the query interface for types
 - the query interface for segments

It doesn't say anything about where the actual feature, type and segment data is stored. It doesn't even mean that the query URLs are on the same machine as the versioned source document. Hence Andreas can have his registry server.

DAS defines what those queries do. The segments query URL interface can be a shared reference server. It has a rather simple interface:
 - get URLs and information for each segment
 - given a sequence URL return the sequence data
 - return the assembly data

The segment and sequence data does not need to be on the same machine as the segments query URL. It likely will be but does not need to be.

DAS defines what the types interface does. At present it is also very simple. By default it lists everything, or you can ask it for an "ontology" or (proposed new query) "exact_ontology", and it returns all DAS types which match that request. The actual DAS type data does not need to be on the same server as the DAS query URL, though again it probably will be. The types query URL does not need to be on the same machine as the segments query URL.

Similarly, the features query URL implements the DAS query interface and returns a list of features.
The actual features do not need to be on the same machine or directory location as the feature query, or the types, or the segments.

Here are some possible reasons for the different locations:

Common case:
 - segments query URL and segments data on a reference server
 - versioned source provides its own types and features

New genome / internal project:
 - database implements all three query URLs

Registry server:
 - each versioned source entry points to the original machine's values for the segments, types and features query URLs

Multiple versions database, shared types:
 - segments points to the reference server
 - all versioned sources' "types" query URLs point to the same URL
 - each versioned source gets its own features query

Old-style CGI-based web server:
 - the "segments" query url points to the reference server
 - the individual features, types and sources are ".xml" files in the file system
 - the query URLs end with ".cgi" and start a CGI script

If we say that the URL for doing a types query is composed as: + "/" (if missing) + "types" then at the very least we preclude CGI-based servers. No big deal perhaps? It also makes things slightly more duplicative when several versions of the database share the same DAS "types" (and "segments").

I also think using a server-provided URL is easier than constructing the URL in code. Get the "query_url", perhaps resolved by the xml:base. That's it. No need to add in the "/types".

Gregg worries about the network performance of having because each location has the full URL to another server and the type in this case refers to a types collection shared by all of the versions of the source.

I've thought about that for a while. It's a reasonable and serious architectural concern. I think the right response is that that's an architecture decision we should leave up to the data provider.
If Gregg wants more compact XML and finds that on-the-fly compression slows things down too much, then his DAS server can make the segments, types and features all be not only on the same machine but in the same directory.

The following is valid (omitting some required parts) The features request can return GET /h_sapiens/v1/features

In this architecture, features start with an 'F', like
  /h_sapiens/v1/F12345
types start with a 'T', like
  /h_sapiens/v1/Tabcde
and regions start with an 'S', like
  /h_sapiens/v1/S1

This is about as compact as I think you can make it, yet it still fits into the current DAS spec. (You don't even need the special character - it only makes it easier to see that the names/URLs will never collide.)

Andrew
dalke at dalkescientific.com

From lstein at cshl.edu Tue Feb 7 11:51:55 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 7 Feb 2006 11:51:55 -0500
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To:
References:
Message-ID: <200602071151.56939.lstein@cshl.edu>

Allen's ideas seem very sensible and easy to manage. We can already serve associations between genomic features and GO terms via properties, so the concerns expressed in the discussion section about the big GO API shouldn't apply.

Lincoln

On Tuesday 07 February 2006 10:34, Helt,Gregg wrote:
> I talked to Suzi, she's planning to join our teleconference today to
> discuss ontologies, wearing her hat as co-PI of the National Center for
> Biomedical Ontology. Hopefully Lincoln can join too.
>
> I took a closer look at the DAS/2 ontology work Allen has done (see
> http://biodas.org/documents/das2/das2_ontology.html). I urge anyone who
> wants to contribute to the ontology discussion to read this doc. It
> specifies a way to retrieve ontologies in OBOXML format. In this format
> each ontology term gets an absolute URI through the same mechanism that
> the rest of DAS/2 uses (URIs for ids, which can be either absolute or
> relative but resolvable).
As Allen pointed out yesterday this would > solve our problem of how to uniquely specify ontology terms in the DAS/2 > TYPES XML. > > I couldn't find any documentation for the OBOXML format, other than the > code that generates it from OBO files. But I'm using OBOXML as an > example here because it clearly has resolvable URIs for each ontology > term. In Allen's spec, ontologies can also be returned in other > formats, but it's unclear to me whether terms in these other formats > would resolve to similar URIs. > > gregg > > > -----Original Message----- > > From: das2-bounces at portal.open-bio.org > > [mailto:das2-bounces at portal.open- > > > bio.org] On Behalf Of Andrew Dalke > > Sent: Tuesday, February 07, 2006 1:32 AM > > To: DAS/2 > > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code > > sprint,6 Feb 2006 > > > > > gh: would like a re-cast as xml document, hosted at so/sofa > > > website. that xml would be like a std ontology representation so you > > > could extend it. so someone could point to an extension of it. > > > > I asked as an action item if Gregg would look into the solution > > for this. Do we refer to the ontology by a "GO:0123456" identifier > > or by some URL scheme? If so, what's the mapping from URL scheme > > to something that clients and people can understand, eg, to > > ask for everything which is an exon? > > > > Does this mapping need a version number - does it change over time? > > > > Andrew > > dalke at dalkescientific.com > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From Gregg_Helt at affymetrix.com Tue Feb 7 11:54:39 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Tue, 7 Feb 2006 08:54:39 -0800 Subject: [DAS2] Proposed agenda for DAS/2 code sprint teleconference, Tuesday Feb 7 Message-ID: Vote on how to construct URLs to query for segments, types, features: 1.) specified by query_id 2.) hardwired to ~/segments, ~/types, ~/features 3.) ? Status Report Integrating sequence ontology with DAS/2 (and possibly other ontologies) Feature properties and queries over properties MAINTAINER information Use of xml:base ? From dalke at dalkescientific.com Tue Feb 7 12:01:38 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 7 Feb 2006 17:01:38 +0000 Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> Message-ID: <5d78aba1ae5117624db08b4dd395c985@dalkescientific.com> Allen > The XML is now as you requested, please confirm. Missing the namespace declaration. You have should be The element goes after the CATEGORY. (Which I want to rename back to CAPABILITY.) The ASSEMBLY element no longer exists. 
Fixing those by hand, * file:///Users/dalke/cvses/das/das2/scratch/allen_sources.xml:7:84: error: attribute "writeable" not allowed at this point; ignored * file:///Users/dalke/cvses/das/das2/scratch/allen_sources.xml:7:84: error: attribute "taxon" not allowed at this point; ignored There is no more 'writeable' (that's, IMO) something to be decided as part of the writeback spec. It might be that we have a and the existence of that indicate writeability. It's also "taxid" and not "taxon". I used "taxid" because that's what NCBI uses for their data. > There are probably also some other Content-Type headers that need to be > changed for the other responses -- let me know if you spot them. Haven't gotten that far yet. Andrew dalke at dalkescientific.com From allenday at ucla.edu Tue Feb 7 12:25:03 2006 From: allenday at ucla.edu (Allen Day) Date: Tue, 7 Feb 2006 09:25:03 -0800 (PST) Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: <5d78aba1ae5117624db08b4dd395c985@dalkescientific.com> References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> <5d78aba1ae5117624db08b4dd395c985@dalkescientific.com> Message-ID: On Tue, 7 Feb 2006, Andrew Dalke wrote: > Allen > > The XML is now as you requested, please confirm. > > Missing the namespace declaration. You have > > > xmlns:xlink="http://www.w3.org/1999/xlink" > xml:base="http://das.biopackages.net/das/genome/"> > > should be > > xmlns="http://www.biodas.org/ns/das/genome/2.00" > xmlns:xlink="http://www.w3.org/1999/xlink" > xml:base="http://das.biopackages.net/das/genome/"> done > > The element goes after the CATEGORY. (Which I want to > rename back to CAPABILITY.) 
done > > The ASSEMBLY element no longer exists. done > > Fixing those by hand, > > * file:///Users/dalke/cvses/das/das2/scratch/allen_sources.xml:7:84: > error: attribute "writeable" not allowed at this point; ignored > * file:///Users/dalke/cvses/das/das2/scratch/allen_sources.xml:7:84: > error: attribute "taxon" not allowed at this point; ignored > > There is no more 'writeable' (that's, IMO) something to be decided > as part of the writeback spec. It might be that we have a > > > > and the existence of that indicate writeability. i have not made the change if this is an IMO. > > It's also "taxid" and not "taxon". I used "taxid" because that's > what NCBI uses for their data. done -Allen > > > There are probably also some other Content-Type headers that need to be > > changed for the other responses -- let me know if you spot them. > > Haven't gotten that far yet. > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From ap3 at sanger.ac.uk Tue Feb 7 12:44:41 2006 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Tue, 7 Feb 2006 17:44:41 +0000 Subject: [DAS2] toy - das2 registry Message-ID: Hi! A "toy" das2 registry serving das1 servers, via das2 responses can be accessed at http://www.spice-3d.org/dasregistry/das2/sources/ I will work on adding the first das2 servers tomorrow. Andreas ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 From cjm at fruitfly.org Tue Feb 7 12:29:09 2006 From: cjm at fruitfly.org (Chris Mungall) Date: Tue, 7 Feb 2006 09:29:09 -0800 (PST) Subject: [DAS2] Ontologies in DAS/2 In-Reply-To: References: Message-ID: Hi all I'm concerned that the XML in the URL below isn't quite Obo-XML, it's Allen's modified version of it. 
In particular, the adding of an "id" attribute which is redundant with the id element, and the modification of the ID scheme to use slashes instead of :s. I believe the latter may have been to make the ID scheme more DAS-y? OBO IDs are composed of a prefix and a local ID. These are always joined with a :. The prefix can be specified as shortform (eg GO) or a URI prefix. When the long form is combined with the local ID you get your URI. If DAS wants to use a modified version of Obo-XML, that's fine, but please don't call it Obo-XML, it will cause huge confusion! I would much prefer if you used Obo-XML as it is - if there are things you'd like to see changed about the format we can perhaps work that out. I'm concerned by the changing the ID to use / instead of :. This is wrong, and if it's something that's required for DAS, how will you interoperate with RDF etc? In fact there are other parts where the xml is definitely not Obo-XML - it looks like Allen has coded these by hand rather than taking existing XML. That's fine, but it should be marked as such. For example, there is no develops_from element in Obo-XML; all relations bar is_a are encoded as relationship elements. There is a DTD at the moment http://www.godatabase.org/dev/xml/dtd The docs are minimal as the explanation of all the fields is in the docs for the obo text file format http://www.godatabase.org/dev/doc/obo_format_spec.{html,txt,pdf} We'll be converting to RNG+XSD soon You can get Obo-XML examples from http://www.fruitfly.org/~cjm/obo-download You can see the default rule for creating a URI in the OWL files; these currently all get the geneontology.org URI prefix by default, but this will change (we were going to use LSIDs but the majority of OWL tools don't seem to handle URNs very well) As far as DAS/2 supporting different file formats, Obo-XML and RDFS/OWL would seem to be the natural contenders. 
We currently go from the former to the latter via a simple XSLT, the reverse transformation is a little more difficult. Allen has inlined some comments from an email exchange with me in the document. I agree about keeping the API minimal. On the other hand you will need at least some inferencing machinery - I'd encourage you to reuse existing reasoning services here. Cheers Chris On Tue, 7 Feb 2006, Helt,Gregg wrote: > I talked to Suzi, she's planning to join our teleconference today to > discuss ontologies, wearing her hat as co-PI of the National Center for > Biomedical Ontology. Hopefully Lincoln can join too. > > I took a closer look at the DAS/2 ontology work Allen has done (see > http://biodas.org/documents/das2/das2_ontology.html). I urge anyone who > wants to contribute to the ontology discussion to read this doc. It > specifies a way to retrieve ontologies in OBOXML format. In this format > each ontology term gets an absolute URI through the same mechanism that > the rest of DAS/2 uses (URIs for ids, which can be either absolute or > relative but resolvable). As Allen pointed out yesterday this would > solve our problem of how to uniquely specify ontology terms in the DAS/2 > TYPES XML. > > I couldn't find any documentation for the OBOXML format, other than the > code that generates it from OBO files. But I'm using OBOXML as an > example here because it clearly has resolvable URIs for each ontology > term. In Allen's spec, ontologies can also be returned in other > formats, but it's unclear to me whether terms in these other formats > would resolve to similar URIs. 
> > gregg > > > -----Original Message----- > > From: das2-bounces at portal.open-bio.org > [mailto:das2-bounces at portal.open- > > bio.org] On Behalf Of Andrew Dalke > > Sent: Tuesday, February 07, 2006 1:32 AM > > To: DAS/2 > > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code > > sprint,6 Feb 2006 > > > > > gh: would like a re-cast as xml document, hosted at so/sofa > > > website. that xml would be like a std ontology representation so you > > > could extend it. so someone could point to an extension of it. > > > > I asked as an action item if Gregg would look into the solution > > for this. Do we refer to the ontology by a "GO:0123456" identifier > > or by some URL scheme? If so, what's the mapping from URL scheme > > to something that clients and people can understand, eg, to > > ask for everything which is an exon? > > > > Does this mapping need a version number - does it change over time? > > > > Andrew > > dalke at dalkescientific.com > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From cjm at fruitfly.org Tue Feb 7 12:32:24 2006 From: cjm at fruitfly.org (chris mungall) Date: Tue, 7 Feb 2006 09:32:24 -0800 Subject: [DAS2] Ontologies in DAS/2 In-Reply-To: <200602071151.56939.lstein@cshl.edu> References: <200602071151.56939.lstein@cshl.edu> Message-ID: What inferencing rules do you use for fetching features by their Ontology_terms? On Feb 7, 2006, at 8:51 AM, Lincoln Stein wrote: > Allen's ideas seem very sensible and easy to manage. We can already > serve > associations between genomic features and GO terms via properties, so > the > concerns expressed in the discussion section about the big GO API > shouldn't > apply. 
> > Lincoln > > On Tuesday 07 February 2006 10:34, Helt,Gregg wrote: >> I talked to Suzi, she's planning to join our teleconference today to >> discuss ontologies, wearing her hat as co-PI of the National Center >> for >> Biomedical Ontology. Hopefully Lincoln can join too. >> >> I took a closer look at the DAS/2 ontology work Allen has done (see >> http://biodas.org/documents/das2/das2_ontology.html). I urge anyone >> who >> wants to contribute to the ontology discussion to read this doc. It >> specifies a way to retrieve ontologies in OBOXML format. In this >> format >> each ontology term gets an absolute URI through the same mechanism >> that >> the rest of DAS/2 uses (URIs for ids, which can be either absolute or >> relative but resolvable). As Allen pointed out yesterday this would >> solve our problem of how to uniquely specify ontology terms in the >> DAS/2 >> TYPES XML. >> >> I couldn't find any documentation for the OBOXML format, other than >> the >> code that generates it from OBO files. But I'm using OBOXML as an >> example here because it clearly has resolvable URIs for each ontology >> term. In Allen's spec, ontologies can also be returned in other >> formats, but it's unclear to me whether terms in these other formats >> would resolve to similar URIs. >> >> gregg >> >>> -----Original Message----- >>> From: das2-bounces at portal.open-bio.org >> >> [mailto:das2-bounces at portal.open- >> >>> bio.org] On Behalf Of Andrew Dalke >>> Sent: Tuesday, February 07, 2006 1:32 AM >>> To: DAS/2 >>> Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code >>> sprint,6 Feb 2006 >>> >>>> gh: would like a re-cast as xml document, hosted at so/sofa >>>> website. that xml would be like a std ontology representation so you >>>> could extend it. so someone could point to an extension of it. >>> >>> I asked as an action item if Gregg would look into the solution >>> for this. Do we refer to the ontology by a "GO:0123456" identifier >>> or by some URL scheme? 
If so, what's the mapping from URL scheme >>> to something that clients and people can understand, eg, to >>> ask for everything which is an exon? >>> >>> Does this mapping need a version number - does it change over time? >>> >>> Andrew >>> dalke at dalkescientific.com >>> >>> _______________________________________________ >>> DAS2 mailing list >>> DAS2 at portal.open-bio.org >> >> _______________________________________________ >> DAS2 mailing list >> DAS2 at portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/das2 > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From dalke at dalkescientific.com Tue Feb 7 13:40:56 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 7 Feb 2006 18:40:56 +0000 Subject: [DAS2] category -> capability Message-ID: <98a28be1166142c23be61650f51b66ae@dalkescientific.com> I've made the commit. The element SOURCES/SOURCE/VERSION/CATEGORY is now (in some shallow and some deep sense) back to SOURCES/SOURCE/VERSION/CAPABILITY Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Tue Feb 7 14:00:40 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Tue, 7 Feb 2006 11:00:40 -0800 Subject: [DAS2] Working with xml:base in Java? Message-ID: Thomas, I'm wondering what toolkits you're using for binding XML to Java objects? And particularly how you are dealing with resolving URIs when xml:base is used. So far I've mostly used various implementations of SAX and DOM -- I've found some reports of builtin xml:base support in Xerces SAX/DOM, but it's still unclear. I've been avoiding the issue up till now. 
It won't be too hard to implement URI resolution relative to xml:base, but I thought I'd check around first and see if there's automated support of this in some toolkit.

Thanks,
Gregg

From dalke at dalkescientific.com Tue Feb 7 14:11:09 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 19:11:09 +0000
Subject: [DAS2] toy - das2 registry
In-Reply-To:
References:
Message-ID: <551a60258c89cd953f35c6a4450a444d@dalkescientific.com>

Andreas Prlic wrote:
> A "toy" das2 registry serving das1 servers, via das2 responses can
> be accessed at
>
> http://www.spice-3d.org/dasregistry/das2/sources/
>
> I will work on adding the first das2 servers tomorrow.

There are differences between this and the spec. These are

"CATEGORY" -> "CAPABILITY"
Andreas knew that but didn't get it changed before having to head out for a bit.

"testcode" should be "test_range" - it was added this afternoon but I changed the name on Andreas. (He agreed to the change.)

  # this is range string (eg, "Chr1/1:100" or "CloneABC123/500:599")
  # used in an "inside=" feature query. It is used by the registry
  # server when doing a heartbeat check.
  attribute test_range { text }?,

The underlying problem is that a web server can be up while the back-end database is down. While a server should report that as an error, sadly that's not always the case. This test_range is used by Andreas' registry server in a periodic feature query. It should return a "reasonable" number of features.

I decided to make it part of the spec for two reasons:
 - it simplifies auto-fill-in during registry discovery
 - clients can also use it to query the server and see if it's really alive or if it really means to return an empty list of features all the time.

It is optional.

The MAINTAINER "name" was required. Andreas has examples where there is only an email address and wants the name to be optional. So now "name", "email" and "href" are all optional. I would like to require that at least one of them be provided.
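For anyone wiring up a heartbeat check, here is a rough Python sketch of handling the test_range string described above. It assumes the form "segment/start:stop" seen in the two examples, and makes no claim about whether the coordinates are inclusive or what base they count from. The names SegmentRange, parse_test_range and heartbeat_url are made up for illustration, and example.org stands in for a real features query URL.

```python
from typing import NamedTuple
from urllib.parse import quote


class SegmentRange(NamedTuple):
    segment: str
    start: int
    stop: int


def parse_test_range(text):
    """Parse a string such as 'Chr1/1:100' into its three parts."""
    # Split on the LAST '/', in case a segment name itself contains one.
    segment, slash, span = text.rpartition("/")
    if not slash or not segment:
        raise ValueError("expected 'segment/start:stop', got %r" % text)
    start_text, colon, stop_text = span.partition(":")
    if not colon:
        raise ValueError("expected 'start:stop', got %r" % span)
    start, stop = int(start_text), int(stop_text)
    if start > stop:
        raise ValueError("start is after stop in %r" % text)
    return SegmentRange(segment, start, stop)


def heartbeat_url(features_query_url, test_range):
    """Build the 'inside=' feature query used for the periodic check."""
    parse_test_range(test_range)  # reject malformed ranges early
    return features_query_url + "?inside=" + quote(test_range, safe="/:")


print(parse_test_range("CloneABC123/500:599"))
print(heartbeat_url("http://example.org/das2/features", "Chr1/1:100"))
```

A registry could treat an empty or failing response to that URL as a sign that the back end is down even though the web server answers.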
Finally, the "taxid" in the COORDINATES is optional. The RNG schema
thought it was mandatory.

I've updated the schemas and the spec for the last two. Committed.

Looks like I'll be spending most of tomorrow updating the rest
of the spec document.

I got a copy of Andreas' document and edited it to meet the current
spec and I've checked it in under "scratch/registry_sources.xml"
Feel free to test it out with your parsers.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Tue Feb 7 14:28:49 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 19:28:49 +0000
Subject: [DAS2] format version
Message-ID: <4cd0c60fb7871ad6a70ad2b25cb73406@dalkescientific.com>

Just committed to the spec. If I'm wrong and the version number
proves useful, I'll make it less snarky. :)

  This document defines several new content-types. These are

    application/x-das-sources+xml
    application/x-das-features+xml
    application/x-das-types+xml
    application/x-das-segments+xml

  A server may supply an optional "version" value for the
  Content-Type, to specify which version of the specification it
  provides. This is (at present and unless others can convince me
  otherwise) meant to be used only during this period of
  specification development while things are in flux. A client can
  look at the version string and use an appropriate reader to handle
  it.

  Example:

    Content-Type: application/x-das-types+xml; version=1

  The list of versions is as follows:

    601071920: this version

  The versions will be increasing integers. The format will be
  "YMMDDHHMM" where "Y" is the year - 2005. (This makes it a 32 bit
  integer, in case you were wondering.) There's no way this spec
  will be in flux in 4 years time.
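
[Editorial sketch, not part of the original message.] The version
parameter described above could be read on the client side roughly as
follows. This is a minimal sketch in Python, assuming the server sends
the parameter exactly as in the example header; the function name
parse_das_content_type is hypothetical and is not defined anywhere in
the spec. It only extracts the integer and makes no attempt to decode
the "YMMDDHHMM" layout.

```python
from email.message import Message

# The four content-types listed in the spec excerpt above.
DAS_CONTENT_TYPES = {
    "application/x-das-sources+xml",
    "application/x-das-features+xml",
    "application/x-das-types+xml",
    "application/x-das-segments+xml",
}


def parse_das_content_type(header_value):
    """Return (media_type, version) for a Content-Type header value.

    version is an int, or None when the optional "version" parameter
    is absent. The function name is hypothetical (not from the spec).
    """
    msg = Message()
    msg["Content-Type"] = header_value
    media_type = msg.get_content_type()   # lower-cased type/subtype
    raw = msg.get_param("version")        # None if not supplied
    version = int(raw) if raw is not None else None
    return media_type, version


if __name__ == "__main__":
    mt, ver = parse_das_content_type(
        "application/x-das-types+xml; version=1")
    print(mt, ver, mt in DAS_CONTENT_TYPES)
```

A client could then pick a reader keyed on (media_type, version) and
fall back to its newest reader when version is None.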
Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Tue Feb 7 14:14:15 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 19:14:15 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To:
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> <5d78aba1ae5117624db08b4dd395c985@dalkescientific.com>
Message-ID: <7ca6d9841d7c3c334589da147c38de53@dalkescientific.com>

>> There is no more 'writeable' (that's, IMO) something to be decided
>> as part of the writeback spec. It might be that we have a

> i have not made the change if this is an IMO.

Okay. There is no "writeable". The writeability is determined
by the CAPABILITY element. If there is a CAPABILITY with
a type == "locks" then the server is (potentially) writeable
in the same way that "writeable='yes'" means that it's writeable.

Anyone else have an O?
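
[Editorial sketch, not part of the original message.] Under the
proposal as stated above, a client would decide whether a versioned
source is potentially writeable by looking for a CAPABILITY whose type
is "locks". The following is a minimal sketch in Python. The sample
document is made up for illustration: the id and query_id attribute
values, and the absence of an XML namespace, are assumptions and are
not taken from the schema. Only the SOURCES/SOURCE/VERSION/CAPABILITY
nesting and the type == "locks" test come from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sources document; attribute names other than "type"
# and all attribute values are illustrative assumptions.
SAMPLE = """\
<SOURCES>
  <SOURCE id="human">
    <VERSION id="human/17">
      <CAPABILITY type="features" query_id="human/17/features"/>
      <CAPABILITY type="locks" query_id="human/17/locks"/>
    </VERSION>
    <VERSION id="human/16">
      <CAPABILITY type="features" query_id="human/16/features"/>
    </VERSION>
  </SOURCE>
</SOURCES>
"""


def potentially_writeable(version_elem):
    """True if this VERSION element has a CAPABILITY of type "locks"."""
    return any(cap.get("type") == "locks"
               for cap in version_elem.findall("CAPABILITY"))


def writeable_versions(xml_text):
    """Return the ids of all VERSION elements that advertise "locks"."""
    root = ET.fromstring(xml_text)
    return [v.get("id") for v in root.iter("VERSION")
            if potentially_writeable(v)]


if __name__ == "__main__":
    print(writeable_versions(SAMPLE))
```

If writing and locking end up as separate capabilities, as the reply
that follows suggests, the test in potentially_writeable would check
for a different type value instead.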
Andrew
dalke at dalkescientific.com

From ed_erwin at affymetrix.com Tue Feb 7 15:46:01 2006
From: ed_erwin at affymetrix.com (Ed Erwin)
Date: Tue, 07 Feb 2006 12:46:01 -0800
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <7ca6d9841d7c3c334589da147c38de53@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> <5d78aba1ae5117624db08b4dd395c985@dalkescientific.com> <7ca6d9841d7c3c334589da147c38de53@dalkescientific.com>
Message-ID: <43E90709.6060602@affymetrix.com>

This is something we should discuss when we discuss the 'writeable'
parts of the spec.

But in my opinion, 'writeable' and 'lockable' are two separate
CAPABILITYs. I see no reason not to allow some implementers to
develop simple servers that are writeable but don't implement a
locking mechanism. Large public servers may want locking, but I'd bet
that a non-locking server would very rarely lead to problems,
especially in small projects.

(If the server is non-locking, the client could add a little more
logic to check that nothing has changed since the last retrieval
before doing a commit.)

Andrew Dalke wrote:
>>> There is no more 'writeable' (that's, IMO) something to be decided
>>> as part of the writeback spec. It might be that we have a
>
>
>> i have not made the change if this is an IMO.
>
>
> Okay. There is no "writeable". The writeability is determined
> by the CAPABILITY element. If there is a CAPABILITY with
> a type == "locks" then the server is (potentially) writeable
> in the same way that "writeable='yes'" means that it's writeable.
>
> Anyone else have an O?
> > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From allenday at ucla.edu Tue Feb 7 16:20:53 2006 From: allenday at ucla.edu (Allen Day) Date: Tue, 7 Feb 2006 13:20:53 -0800 (PST) Subject: [DAS2] Ontologies in DAS/2 In-Reply-To: References: Message-ID: Hi Chris, On Tue, 7 Feb 2006, Chris Mungall wrote: > > Hi all > > I'm concerned that the XML in the URL below isn't quite Obo-XML, it's > Allen's modified version of it. In particular, the adding of an "id" > attribute which is redundant with the id element, and the modification of > the ID scheme to use slashes instead of :s. > > I believe the latter may have been to make the ID scheme more DAS-y? The slash was introduced to take advantage of xml:base and the hierarchical relationship between namespaces and terms, e.g. xml:base="/das/ontology/obo/1/ontology" + id="SO/0000001" is equivalent to: /das/ontology/obo/1/ontology/SO/0000001 If we want the identifier to be SO:0000001, it means that we have to make xml:base="/das/ontology/obo/1/ontology/SO. This is problematic for two reasons: 1) multiple xml:base cannot be defined for the entire document, meaning that URIs for other records referenced become very long. 2) different ontologies cannot use the same xml:base The only way I see out of this ATM is to treat : as a / internal to the Ontology-DAS service. > OBO IDs are composed of a prefix and a local ID. These are always joined > with a :. The prefix can be specified as shortform (eg GO) or a URI > prefix. When the long form is combined with the local ID you get your URI. > > If DAS wants to use a modified version of Obo-XML, that's fine, but please > don't call it Obo-XML, it will cause huge confusion! > > I would much prefer if you used Obo-XML as it is - if there are things > you'd like to see changed about the format we can perhaps work that out. 
> I'm concerned by the changing the ID to use / instead of :. This is wrong, > and if it's something that's required for DAS, how will you interoperate > with RDF etc? > > In fact there are other parts where the xml is definitely not Obo-XML - it > looks like Allen has coded these by hand rather than taking existing XML. > That's fine, but it should be marked as such. For example, there is no > develops_from element in Obo-XML; all relations bar is_a are encoded as > relationship elements. The XML provided by the Ontology-DAS server is using templates to mark up ontology records that have been loaded to a chado database using perl-go-perl. The develops_from node, IIRC, was created because there is a section in a perl-go-perl .xslt that creates elements for all relationship types. > > There is a DTD at the moment > http://www.godatabase.org/dev/xml/dtd This didn't exist at the time I wrote my templates ( 4-6 months ago), or I would have validated. -Allen > > The docs are minimal as the explanation of all the fields is in the docs > for the obo text file format > http://www.godatabase.org/dev/doc/obo_format_spec.{html,txt,pdf} > > We'll be converting to RNG+XSD soon > > You can get Obo-XML examples from > http://www.fruitfly.org/~cjm/obo-download > > You can see the default rule for creating a URI in the OWL files; these > currently all get the geneontology.org URI prefix by default, but this > will change (we were going to use LSIDs but the majority of OWL tools > don't seem to handle URNs very well) > > As far as DAS/2 supporting different file formats, Obo-XML and RDFS/OWL > would seem to be the natural contenders. We currently go from the former > to the latter via a simple XSLT, the reverse transformation is a little > more difficult. > > Allen has inlined some comments from an email exchange with me in the > document. I agree about keeping the API minimal. 
On the other hand you > will need at least some inferencing machinery - I'd encourage you to reuse > existing reasoning services here. > > Cheers > Chris > > On Tue, 7 Feb 2006, Helt,Gregg wrote: > > > I talked to Suzi, she's planning to join our teleconference today to > > discuss ontologies, wearing her hat as co-PI of the National Center for > > Biomedical Ontology. Hopefully Lincoln can join too. > > > > I took a closer look at the DAS/2 ontology work Allen has done (see > > http://biodas.org/documents/das2/das2_ontology.html). I urge anyone who > > wants to contribute to the ontology discussion to read this doc. It > > specifies a way to retrieve ontologies in OBOXML format. In this format > > each ontology term gets an absolute URI through the same mechanism that > > the rest of DAS/2 uses (URIs for ids, which can be either absolute or > > relative but resolvable). As Allen pointed out yesterday this would > > solve our problem of how to uniquely specify ontology terms in the DAS/2 > > TYPES XML. > > > > I couldn't find any documentation for the OBOXML format, other than the > > code that generates it from OBO files. But I'm using OBOXML as an > > example here because it clearly has resolvable URIs for each ontology > > term. In Allen's spec, ontologies can also be returned in other > > formats, but it's unclear to me whether terms in these other formats > > would resolve to similar URIs. > > > > gregg > > > > > -----Original Message----- > > > From: das2-bounces at portal.open-bio.org > > [mailto:das2-bounces at portal.open- > > > bio.org] On Behalf Of Andrew Dalke > > > Sent: Tuesday, February 07, 2006 1:32 AM > > > To: DAS/2 > > > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code > > > sprint,6 Feb 2006 > > > > > > > gh: would like a re-cast as xml document, hosted at so/sofa > > > > website. that xml would be like a std ontology representation so you > > > > could extend it. so someone could point to an extension of it. 
> > > > > > I asked as an action item if Gregg would look into the solution > > > for this. Do we refer to the ontology by a "GO:0123456" identifier > > > or by some URL scheme? If so, what's the mapping from URL scheme > > > to something that clients and people can understand, eg, to > > > ask for everything which is an exon? > > > > > > Does this mapping need a version number - does it change over time? > > > > > > Andrew > > > dalke at dalkescientific.com > > > > > > _______________________________________________ > > > DAS2 mailing list > > > DAS2 at portal.open-bio.org > > > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/das2 > > > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From cjm at fruitfly.org Tue Feb 7 16:59:12 2006 From: cjm at fruitfly.org (chris mungall) Date: Tue, 7 Feb 2006 13:59:12 -0800 Subject: [DAS2] Ontologies in DAS/2 In-Reply-To: References: Message-ID: On Feb 7, 2006, at 1:20 PM, Allen Day wrote: > Hi Chris, > > On Tue, 7 Feb 2006, Chris Mungall wrote: > >> >> Hi all >> >> I'm concerned that the XML in the URL below isn't quite Obo-XML, it's >> Allen's modified version of it. In particular, the adding of an "id" >> attribute which is redundant with the id element, and the >> modification of >> the ID scheme to use slashes instead of :s. >> >> I believe the latter may have been to make the ID scheme more DAS-y? > > The slash was introduced to take advantage of xml:base and the > hierarchical relationship between namespaces and terms, e.g. 
> > xml:base="/das/ontology/obo/1/ontology" + id="SO/0000001" > > is equivalent to: > > /das/ontology/obo/1/ontology/SO/0000001 it's actually equivalent to: /das/ontology/obo/1/ontologySO/0000001 > If we want the identifier to be SO:0000001, it means that we have to > make > xml:base="/das/ontology/obo/1/ontology/SO. This is problematic for two > reasons: > > 1) multiple xml:base cannot be defined for the entire document, > meaning > that URIs for other records referenced become very long. Why not just define a qname for every idspace? This is the standard way of doing this in XML Using xml:base is not a big gain for brevity, since fairly soon some obo ontologies will reference other obo ontologies. In fact is this even as issue if you get rid of the id attribute to conform to obo-xml? ids in obo-xml are encoded as elements, so xml:base rules are not applied. Obo has it's own rules for ID generation. This has the arguable disadvantage that we can't directly use xml:base and the whole xml namespace system for OBO IDs, we layer our own system on top. This is actually preferable for us. > 2) different ontologies cannot use the same xml:base > > The only way I see out of this ATM is to treat : as a / internal to the > Ontology-DAS service. I'm still not sure what the problem is, and I think you may be stuck anyway when it comes to RDF/OWL ontologies > >> OBO IDs are composed of a prefix and a local ID. These are always >> joined >> with a :. The prefix can be specified as shortform (eg GO) or a URI >> prefix. When the long form is combined with the local ID you get your >> URI. >> >> If DAS wants to use a modified version of Obo-XML, that's fine, but >> please >> don't call it Obo-XML, it will cause huge confusion! >> >> I would much prefer if you used Obo-XML as it is - if there are things >> you'd like to see changed about the format we can perhaps work that >> out. >> I'm concerned by the changing the ID to use / instead of :. 
This is
>> wrong,
>> and if it's something that's required for DAS, how will you
>> interoperate
>> with RDF etc?
>>
>> In fact there are other parts where the xml is definitely not Obo-XML
>> - it
>> looks like Allen has coded these by hand rather than taking existing
>> XML.
>> That's fine, but it should be marked as such. For example, there is no
>> develops_from element in Obo-XML; all relations bar is_a are encoded
>> as
>> relationship elements.
>
> The XML provided by the Ontology-DAS server is using templates to mark
> up
> ontology records that have been loaded to a chado database using
> perl-go-perl. The develops_from node, IIRC, was created because there
> is
> a section in a perl-go-perl .xslt that creates elements for all
> relationship types.

hmmm, I don't think so, but the point is moot anyway, just so long as
the final version uses xml that validates, either against obo-xml or
your own documented variant

>
>>
>> There is a DTD at the moment
>> http://www.godatabase.org/dev/xml/dtd
>
> This didn't exist at the time I wrote my templates ( 4-6 months ago),
> or I
> would have validated.

it did, it's just not well signposted! sorry about that

look forward to seeing a demo. I do think you have to work out the
semantics of retrieval by ontology term though.
cheers chris > > -Allen > > > >> >> The docs are minimal as the explanation of all the fields is in the >> docs >> for the obo text file format >> http://www.godatabase.org/dev/doc/obo_format_spec.{html,txt,pdf} >> >> We'll be converting to RNG+XSD soon >> >> You can get Obo-XML examples from >> http://www.fruitfly.org/~cjm/obo-download >> >> You can see the default rule for creating a URI in the OWL files; >> these >> currently all get the geneontology.org URI prefix by default, but this >> will change (we were going to use LSIDs but the majority of OWL tools >> don't seem to handle URNs very well) >> >> As far as DAS/2 supporting different file formats, Obo-XML and >> RDFS/OWL >> would seem to be the natural contenders. We currently go from the >> former >> to the latter via a simple XSLT, the reverse transformation is a >> little >> more difficult. >> >> Allen has inlined some comments from an email exchange with me in the >> document. I agree about keeping the API minimal. On the other hand you >> will need at least some inferencing machinery - I'd encourage you to >> reuse >> existing reasoning services here. >> >> Cheers >> Chris >> >> On Tue, 7 Feb 2006, Helt,Gregg wrote: >> >>> I talked to Suzi, she's planning to join our teleconference today to >>> discuss ontologies, wearing her hat as co-PI of the National Center >>> for >>> Biomedical Ontology. Hopefully Lincoln can join too. >>> >>> I took a closer look at the DAS/2 ontology work Allen has done (see >>> http://biodas.org/documents/das2/das2_ontology.html). I urge anyone >>> who >>> wants to contribute to the ontology discussion to read this doc. It >>> specifies a way to retrieve ontologies in OBOXML format. In this >>> format >>> each ontology term gets an absolute URI through the same mechanism >>> that >>> the rest of DAS/2 uses (URIs for ids, which can be either absolute or >>> relative but resolvable). 
As Allen pointed out yesterday this would >>> solve our problem of how to uniquely specify ontology terms in the >>> DAS/2 >>> TYPES XML. >>> >>> I couldn't find any documentation for the OBOXML format, other than >>> the >>> code that generates it from OBO files. But I'm using OBOXML as an >>> example here because it clearly has resolvable URIs for each ontology >>> term. In Allen's spec, ontologies can also be returned in other >>> formats, but it's unclear to me whether terms in these other formats >>> would resolve to similar URIs. >>> >>> gregg >>> >>>> -----Original Message----- >>>> From: das2-bounces at portal.open-bio.org >>> [mailto:das2-bounces at portal.open- >>>> bio.org] On Behalf Of Andrew Dalke >>>> Sent: Tuesday, February 07, 2006 1:32 AM >>>> To: DAS/2 >>>> Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code >>>> sprint,6 Feb 2006 >>>> >>>>> gh: would like a re-cast as xml document, hosted at so/sofa >>>>> website. that xml would be like a std ontology representation so >>>>> you >>>>> could extend it. so someone could point to an extension of it. >>>> >>>> I asked as an action item if Gregg would look into the solution >>>> for this. Do we refer to the ontology by a "GO:0123456" identifier >>>> or by some URL scheme? If so, what's the mapping from URL scheme >>>> to something that clients and people can understand, eg, to >>>> ask for everything which is an exon? >>>> >>>> Does this mapping need a version number - does it change over time? 
>>>> >>>> Andrew >>>> dalke at dalkescientific.com >>>> >>>> _______________________________________________ >>>> DAS2 mailing list >>>> DAS2 at portal.open-bio.org >>> >>> >>> _______________________________________________ >>> DAS2 mailing list >>> DAS2 at portal.open-bio.org >>> http://portal.open-bio.org/mailman/listinfo/das2 >>> >> >> >> _______________________________________________ >> DAS2 mailing list >> DAS2 at portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/das2 >> From Steve_Chervitz at affymetrix.com Tue Feb 7 19:30:52 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Tue, 07 Feb 2006 16:30:52 -0800 Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint, 7 Feb 2006 Message-ID: Notes from the DAS/2 teleconference for the code sprint, 7 Feb 2006 $Id: das2-teleconf-2006-02-07.txt,v 1.1 2006/02/08 00:37:41 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt Sanger: Andreas Prlic, Thomas Down Sweden: Andrew Dalke UC Berkeley: Nomi Harris, Suzi Lewis UCLA: Allen Day Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Agenda: * Vote on constructing URLs/URIs to query segments, types, features * Status report from people * Ontologies * Feat property changes Topic: Constructing URLS/URIs to query segments, types, features ---------------------------------------------------------------- 1.) 
specified by query_id
2.) hardwired to ~/segments, ~/types, ~/features
3.) ?

ad: lots of people have left here so the vote won't include all.
    see email why a query url is useful
    agree w/ gregg: short names could be a nice to have.
    shouldn't have to worry about how you organize your urls
gh: yes it does: this/types this/segments etc.
ad: can take it out if there's confusion
gh: recommended structure is good.
ee/gh: people will look at the examples and do it that way. they won't
    look at .rnc file
gh: make it clearer in the spec that these are merely suggestions of
    the hierarchy, you don't have to do it this way.
ad: roy's view: likes the query id url for doing search for all
    features, or all types. query id is the url used to do search
    against features. uri could be relative or absolute.
gh: category element defines a query id for a subset of das.
    it's the attribute query id in the category
ad: I also want to rename category back to capability.
    how do we arrange urls in a versioned source.
    construction off of strings or via attributes in a url
gh: votes for hardwired, but feels less strongly today about it.
ad: majority vote is for query id, spec czar goes with that.

[A] query id
[A] andrew will update spec to have less mention of hierarchical structure
[A] allen will update server to do it the recommended way

gh: in addition to having an arbitrary query id to get segments, types,
    features, there's a recommended way to do it via the hierarchy.
    server should do it the recommended way (hierarchy)
ee: we should be flexible about it.
gh/ad: ok take out recommendation.

Topic: Status reports
---------------------
ad: see his emails.
gh: we need examples in spec document and scratch to be better
    synchronized.
ad: should be, i've been trying to keep these in sync.
gh: plan to push into html, incorporate scratch into doc?
ad: yes, eventually. will also add andreas' work to scratch too.
td: java xml binding libraries, how to put it into a workable server
ap: das registry, sources command, attribute handling, people can
    connect to a toy server publicly available.
gh: registry will respond?
ap: yes. toy server, toy data like das1, returning sources command.
gh: can you add allen's codesprint server? consider it registered.
ap: is it fully working?
gh: can allen send a command to it to register it?
ap: no.
gh: would like to tell my client to do discovery rather than hard
    wiring.
gh: commits to igb das/2 client to handle seq, segment, types. not
    features query yet. given decision about url construction, can do
    this fast so we can test on codesprint server seq, seg, types to
    bring up something meaningful in gui. not features by today.
    affy das/2 server is running behind. will sync up today as well.
nh: apollo working out sequence, segment, types request. now does
    versioned sources. integrating those into query gui as well.
aday: changes early this am. server running under /codesprint is now a
    static doc pointing back to the old server. adding segment
    command, merging region and seq command. has made everything
    except capabilities writeback stuff.
ad: there's another request recently, see my email.
aday: have gotten 40 emails from you in the last day!
aday: brian oconnor is working on bundling dependencies for an rpm
    based release.
gh: I also did significant refactoring/moving assay/ontology stuff
    into subclasses on client side. haven't seen brian's code, but
    should run fine.

Topic: Integrating Sequence Ontology with DAS/2
-----------------------------------------------
suzi: national center for biomedical ontology, one of 7 natl centers
    for biomedical computing. focus on needs regarding developing and
    using ontologies.
gh: hoping to have a typing system in das/2 via types queries that
    references SO but doesn't require client to fully understand
    ontologies. too much of a burden. that's the challenge.
    this translates into referring to ontology terms as opaque uris
suzi: 'understands' means they're ignoring any relationships between
    types.
gh: yes. currently type has attrib for id, attrib for ontology.
ad: uri or arbitrary string
suzi: can use uri or string, preprocessed
ad: one or the other
gh: prefers uri
suzi: from uri you can get the string
gh: not clear how to construct uri for particular terms in an ontology
    doc
suzi: this will happen in next few months. talking with daniel rubin
    about this.
gh: this is where allen comes in. ontology das.
aday: next step is getting it hosted on NCBO server. currently
    communicating with chris mungall. said they're planning on
    implementing something similar soon, not sure if they'd accept
    allen's solution. unclear. working with gavin sherlock on ontology
    support for microarray samples, tissue type, phenotype. was hoping
    people could pick this up and use it.
suzi: gavin and I could help push this.
gh: chris m posted concerns that the obo xml in allen's scheme isn't
    the same as what he's using. re: how you make absolute uris.
aday: there's not much docs on obo xml format. did the best I could.
suzi: should be able to sort it out. just an inertia problem of
    getting it installed. not a competition issue. fine with me. not
    difficult?
aday: by end of week we'll have an rpm.
suzi: let's keep pushing on this to make it happen. I'll talk to gavin
    tomorrow. can we install on sf site, or do we need to set it up
    elsewhere?
aday: could conceivably set up a cgi on sf. uses custom apache handler
    tho.
gh: more ontology q's can wait till tomorrow w/ lincoln.
    concern: how do we deal w/ types that represent more than one
    ontology term. defer discussion till tomorrow.

Topic: Feature Properties
-------------------------
See andrew's post today.

ad: this ties into ontologies. two ontology related issues: two
    different ways to query. ontology of a feature, and two diff ways
    to search a db for that property: exactly equal, or a subtype.
    this is a property with two diff searches you may want to do on
    it. properties like note, alias, phase have ability to search
    key/val properties, e.g., att:alias=something. score is a floating
    point number you may want to support > or < on it. regular exp
    searches, identical, etc. td says use xml query language, but
    worried about complexity of this. 99% of time this is way more
    than you need.

    scenario: given 4 different notes in a feature, is order
    important?

    extensions: curation point gives curator's name and time stamp.
    e.g., search for all features modified by andrew in 2004.

    discussion: pull this into a note element, perhaps phase and alias
    too. property table only supports a substring search. give me an
    author name, e.g. not saying getting rid of tag values. server
    supporting new data types, extensions, feat search w/ sanger
    curation elements for query. or thomas xml search. this is why I
    want to move categories back to capabilities.
gh: more appropriate as capabilities than header.
ad: someone can get a document. andreas can combine many servers into
    one, say: which one supports which.

    to summarize:
    - properties are simple strings
    - only substring searches
    - change att: to prop:
    - note and alias and phase are elements
    - advertise that a server has extension to das query lang

gh: what about phase? lincoln needs it.
ad: if it's something that people will be editing, make it an element.
gh: phase is inappropriate for certain types. would like formal way
    when it should be there or not.
ad: this is formalizing a way for server to tell client that there are
    more types of searches available. can't see how to do it
    automatically: eg for a given score, knowing what is considered
    significant (low or high, e.g.).
td: if he needs a phase he re-infers it. doesn't work for partial CDS
    tho.
gh: how much spec churn will this generate?
ad: [various things, half a dozen or so, some simplifying]
gh: does a colon in a query string need to be escaped?
    if so, this makes it hard to read.
ad: could use prop_ rather than prop:
    thomas and I had long discussion about this.

[A] andrew will incorporate these changes into feature properties

Topic: Maintainer information
-----------------------------
ad: modified examples under scratch
gh: maintainer at source or version level
ad: one for all sources level
ap: at sanger we have one central server with lots of sources. notes
    who's responsible for which server.
gh: ownership cascades down to sub elements?
ad: yes

Topic: XML Base
---------------
gh: can be in any element. as well as xml:lang, don't really
    understand.
ad: it's what the atom spec does, so we copied. maybe for
    bidirectional languages.
gh: flexible uri resolution scheme w/ xml base. implementation in java
    tools is spotty for xml:base. curious about java obj binding of
    xml what support they have for resolving xml base. at this point
    will have to roll it myself. want to ask thomas about this.
ap: he's using Stacks parser, gets global namespace.
gh: bigger concern for when we have to use sax, need to do xml:base
    resolution, eg. when we need to retrieve lots of features.
ad: it can be done with sax.
gh: not hard, but it is a multistep process.
ad: multiple levels of xml:base

ad: tomorrow's agenda: go through roy's otter stuff, convert into new
    das format. to get a feel for how data will look. see roy's email.
    to use experience gathered from otter to make sure we're
    sufficiently covering features.
gh: talking about writeback?
ad: premature. let's talk style sheets wed, and writeback thursday.
    plus anything else that's come up about the spec. want to know how
    style sheets will look. lincoln should be able to help out there.

From nomi at fruitfly.org Tue Feb 7 22:27:13 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Tue, 7 Feb 2006 19:27:13 -0800 (PST)
Subject: [DAS2] We need DAS/2 progress reports for the grant!
Message-ID: <17385.25873.660275.790249@kinked.lbl.gov>

Dear DAS/2 developers,

I am writing this on behalf of Gregg and the DAS/2 team. This is so
important I'm actually using capital letters.

As you know, we have submitted a request for renewing the DAS/2 grant.
Our chances of having this renewal approved are iffy, especially since
we are asking for more money than in the original grant and NIH's
budget is very tight right now. The reviewers are about to read our
grant proposal and decide whether to fund it, and we need to send them
a supplementary progress report about what we've accomplished since we
submitted the grant in November. Describing how much progress we've
made towards implementing the DAS/2 protocol in both servers and
clients will help make our case that we deserve more funding to
continue this important research.

Gregg has been trying for weeks to find out when this progress report
was due (we had figured we had until the end of February). Today he
*finally* got through to our scientific review administrator, who said
that we have to send it to them no later than THIS THURSDAY.

Obviously, this is very short notice, so we are asking all of you to
very quickly put together a paragraph (no more!) describing your
progress between Nov 1 and the end of this week (i.e., you can project
to what you expect to have completed by Friday). If you need context,
I have attached a copy of the grant; I will also send some of you
individual notes about what we need from you.

Please send us (the DAS2 mailing list, or, if you're feeling shy, just
me and Gregg) your paragraph in PLAIN TEXT so that I can more easily
assimilate them into a single document. We plan to work on
incorporating your reports into our progress report tomorrow (Wed),
send out a draft tomorrow night (our time) for you to review, and
incorporate any suggestions into our final version that we'll send off
on Thursday.

Sorry for the short notice, and thanks in advance for your help.
Nomi and Gregg -------------- next part -------------- A non-text attachment was scrubbed... Name: DAS2_renewal_grant_final2l.doc Type: application/octet-stream Size: 453632 bytes Desc: DAS2 renewal grant proposal URL: From allenday at ucla.edu Tue Feb 7 22:14:49 2006 From: allenday at ucla.edu (Allen Day) Date: Tue, 7 Feb 2006 19:14:49 -0800 (PST) Subject: [DAS2] Ontologies in DAS/2 In-Reply-To: References: Message-ID: Chris, Why have you chosen to make a subelement of ? Is it expected that there will be multiple IDs for a given term, and if so is there not a primary ID? having an id attribute is a defacto standard for DOM libs, so you can call getElementById(). -Allen On Tue, 7 Feb 2006, chris mungall wrote: > > On Feb 7, 2006, at 1:20 PM, Allen Day wrote: > > > Hi Chris, > > > > On Tue, 7 Feb 2006, Chris Mungall wrote: > > > >> > >> Hi all > >> > >> I'm concerned that the XML in the URL below isn't quite Obo-XML, it's > >> Allen's modified version of it. In particular, the adding of an "id" > >> attribute which is redundant with the id element, and the > >> modification of > >> the ID scheme to use slashes instead of :s. > >> > >> I believe the latter may have been to make the ID scheme more DAS-y? > > > > The slash was introduced to take advantage of xml:base and the > > hierarchical relationship between namespaces and terms, e.g. > > > > xml:base="/das/ontology/obo/1/ontology" + id="SO/0000001" > > > > is equivalent to: > > > > /das/ontology/obo/1/ontology/SO/0000001 > > it's actually equivalent to: > /das/ontology/obo/1/ontologySO/0000001 > > > If we want the identifier to be SO:0000001, it means that we have to > > make > > xml:base="/das/ontology/obo/1/ontology/SO. This is problematic for two > > reasons: > > > > 1) multiple xml:base cannot be defined for the entire document, > > meaning > > that URIs for other records referenced become very long. > > Why not just define a qname for every idspace? 
This is the standard way > of doing this in XML > > Using xml:base is not a big gain for brevity, since fairly soon some > obo ontologies will reference other obo ontologies. > > In fact is this even as issue if you get rid of the id attribute to > conform to obo-xml? ids in obo-xml are encoded as elements, so xml:base > rules are not applied. Obo has it's own rules for ID generation. This > has the arguable disadvantage that we can't directly use xml:base and > the whole xml namespace system for OBO IDs, we layer our own system on > top. This is actually preferable for us. > > > 2) different ontologies cannot use the same xml:base > > > > The only way I see out of this ATM is to treat : as a / internal to the > > Ontology-DAS service. > > I'm still not sure what the problem is, and I think you may be stuck > anyway when it comes to RDF/OWL ontologies > > > > >> OBO IDs are composed of a prefix and a local ID. These are always > >> joined > >> with a :. The prefix can be specified as shortform (eg GO) or a URI > >> prefix. When the long form is combined with the local ID you get your > >> URI. > >> > >> If DAS wants to use a modified version of Obo-XML, that's fine, but > >> please > >> don't call it Obo-XML, it will cause huge confusion! > >> > >> I would much prefer if you used Obo-XML as it is - if there are things > >> you'd like to see changed about the format we can perhaps work that > >> out. > >> I'm concerned by the changing the ID to use / instead of :. This is > >> wrong, > >> and if it's something that's required for DAS, how will you > >> interoperate > >> with RDF etc? > >> > >> In fact there are other parts where the xml is definitely not Obo-XML > >> - it > >> looks like Allen has coded these by hand rather than taking existing > >> XML. > >> That's fine, but it should be marked as such. For example, there is no > >> develops_from element in Obo-XML; all relations bar is_a are encoded > >> as > >> relationship elements. 
> > > > The XML provided by the Ontology-DAS server is using templates to mark > > up > > ontology records that have been loaded to a chado database using > > perl-go-perl. The develops_from node, IIRC, was created because there > > is > > a section in a perl-go-perl .xslt that creates elements for all > > relationship types. > > hmmm, I don't think so, but the point is moot anyway, just so long as > the final version uses xml that validates, either against obo-xml or > your own documented variant > > > > >> > >> There is a DTD at the moment > >> http://www.godatabase.org/dev/xml/dtd > > > > This didn't exist at the time I wrote my templates ( 4-6 months ago), > > or I > > would have validated. > > it did, it's just not well signposted! sorry about that > > look forward to seeing a demo. I do this you have to work out the > semantics of retrieval by ontology term though. > > cheers > chris > > > > > -Allen > > > > > > > >> > >> The docs are minimal as the explanation of all the fields is in the > >> docs > >> for the obo text file format > >> http://www.godatabase.org/dev/doc/obo_format_spec.{html,txt,pdf} > >> > >> We'll be converting to RNG+XSD soon > >> > >> You can get Obo-XML examples from > >> http://www.fruitfly.org/~cjm/obo-download > >> > >> You can see the default rule for creating a URI in the OWL files; > >> these > >> currently all get the geneontology.org URI prefix by default, but this > >> will change (we were going to use LSIDs but the majority of OWL tools > >> don't seem to handle URNs very well) > >> > >> As far as DAS/2 supporting different file formats, Obo-XML and > >> RDFS/OWL > >> would seem to be the natural contenders. We currently go from the > >> former > >> to the latter via a simple XSLT, the reverse transformation is a > >> little > >> more difficult. > >> > >> Allen has inlined some comments from an email exchange with me in the > >> document. I agree about keeping the API minimal. 
On the other hand you > >> will need at least some inferencing machinery - I'd encourage you to > >> reuse > >> existing reasoning services here. > >> > >> Cheers > >> Chris > >> > >> On Tue, 7 Feb 2006, Helt,Gregg wrote: > >> > >>> I talked to Suzi, she's planning to join our teleconference today to > >>> discuss ontologies, wearing her hat as co-PI of the National Center > >>> for > >>> Biomedical Ontology. Hopefully Lincoln can join too. > >>> > >>> I took a closer look at the DAS/2 ontology work Allen has done (see > >>> http://biodas.org/documents/das2/das2_ontology.html). I urge anyone > >>> who > >>> wants to contribute to the ontology discussion to read this doc. It > >>> specifies a way to retrieve ontologies in OBOXML format. In this > >>> format > >>> each ontology term gets an absolute URI through the same mechanism > >>> that > >>> the rest of DAS/2 uses (URIs for ids, which can be either absolute or > >>> relative but resolvable). As Allen pointed out yesterday this would > >>> solve our problem of how to uniquely specify ontology terms in the > >>> DAS/2 > >>> TYPES XML. > >>> > >>> I couldn't find any documentation for the OBOXML format, other than > >>> the > >>> code that generates it from OBO files. But I'm using OBOXML as an > >>> example here because it clearly has resolvable URIs for each ontology > >>> term. In Allen's spec, ontologies can also be returned in other > >>> formats, but it's unclear to me whether terms in these other formats > >>> would resolve to similar URIs. > >>> > >>> gregg > >>> > >>>> -----Original Message----- > >>>> From: das2-bounces at portal.open-bio.org > >>> [mailto:das2-bounces at portal.open- > >>>> bio.org] On Behalf Of Andrew Dalke > >>>> Sent: Tuesday, February 07, 2006 1:32 AM > >>>> To: DAS/2 > >>>> Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code > >>>> sprint,6 Feb 2006 > >>>> > >>>>> gh: would like a re-cast as xml document, hosted at so/sofa > >>>>> website. 
that xml would be like a std ontology representation so > >>>>> you > >>>>> could extend it. so someone could point to an extension of it. > >>>> > >>>> I asked as an action item if Gregg would look into the solution > >>>> for this. Do we refer to the ontology by a "GO:0123456" identifier > >>>> or by some URL scheme? If so, what's the mapping from URL scheme > >>>> to something that clients and people can understand, eg, to > >>>> ask for everything which is an exon? > >>>> > >>>> Does this mapping need a version number - does it change over time? > >>>> > >>>> Andrew > >>>> dalke at dalkescientific.com > >>>> > >>>> _______________________________________________ > >>>> DAS2 mailing list > >>>> DAS2 at portal.open-bio.org > >>> > >>> > >>> _______________________________________________ > >>> DAS2 mailing list > >>> DAS2 at portal.open-bio.org > >>> http://portal.open-bio.org/mailman/listinfo/das2 > >>> > >> > >> > >> _______________________________________________ > >> DAS2 mailing list > >> DAS2 at portal.open-bio.org > >> http://portal.open-bio.org/mailman/listinfo/das2 > >> > From allenday at ucla.edu Tue Feb 7 22:57:05 2006 From: allenday at ucla.edu (Allen Day) Date: Tue, 7 Feb 2006 19:57:05 -0800 (PST) Subject: [DAS2] Ontologies in DAS/2 In-Reply-To: References: Message-ID: Hi Chris, > Why not just define a qname for every idspace? This is the standard way > of doing this in XML Can you give a concrete example of this? a search for "qname idspace" returns a single godatabase.org result. Anyway, I have stripped out the id= attributes from the and elements. 
You can see valid (by your DTD) obo xml produced from the das server here: Entire SO: http://das.biopackages.net/das/ontology/obo/1/ontology/SO?format=legacy1 SO "exon" record: http://das.biopackages.net/das/ontology/obo/1/ontology/SO/0000147?format=legacy1 -Allen From Gregg_Helt at affymetrix.com Wed Feb 8 03:36:01 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Wed, 8 Feb 2006 00:36:01 -0800 Subject: [DAS2] Working with xml:base in Java? Message-ID: I've been mucking around trying to find an answer to my own question about ways to easily handle xml:base in Java. And I think the answer if I want to continue to use DOM ends up being "code it yourself". But it took a while to get to that answer. I'm writing down these notes so I can refer back to them next time if the issues I encountered come up again. But I figured I might as well post in case other DAS/2 implementers have similar problems. So the standard Java 1.5 distribution includes the org.w3c.dom.Node interface, which conveniently enough has a getBaseURI() method that should do exactly what I want -- for any node in an XML document, give me the resolved base URI for that node (regardless of how complex a combination of xml:base attributes are used in the path to that node). Which I can then combine with whatever id attribute I'm interested in (via Java networking classes) to get the full URI. But I need to guarantee compatibility with Java 1.4, so I can't rely on 1.5. Java 1.4 has a previous version of org.w3c.dom.Node, but with no getBaseURI() method. Turns out this is because the 1.5 Node interface complies with DOM-level3 spec (includes XML Base support) but the 1.4 Node interface only supports DOM-level2 spec (no XML Base support). Okay, but I can download the Xerces2 distribution, which is a Java library that also has a full implementation of DOM-level3. So I get that set up, add some calls to node.getBaseURI() to my code, and it compiles fine.
But when I run the program I get an ugly java.lang.NoSuchMethodError. I dig around on the web and find the problem is a class/package namespace collision -- both Xerces2 and the builtin java libraries have a class named org.w3c.dom.Node, but of course they're different. And replacing built-in java classes is not normally allowed, so when the program is actually run and classes are loaded the builtin Node class wins (the one w/o the getBaseURI() method). It would have been nice if they mentioned this in the JDK Compatibility section of the Xerces2 FAQ... But there is some discussion of solutions to this problem on the Xerces mailing list. There is actually a way to replace builtin java packages via an "Endorsed Standards Override Mechanism", if they're on the list of endorsed standards, which org.w3c.dom is. This involves putting the replacement package in an endorsed directory and setting a system property to direct the JVM to look there for replacement packages. But... whatever solution I use has to work with Java WebStart. I can't find _any_ info on whether the package override mechanism works with WebStart. And even if it does work for some WebStart implementations, I'd be wary of assuming it works for others -- it seems like one of those things IT folks on the user end might get concerned about. I've also found other solutions to the package name clash, but none that seems compatible with WebStart. So it looks like, considering my other constraints, if I want to stick with DOM I'll need to code xml:base handling myself. Looking at the source code for Xerces2, doesn't look too hard. Except... damn, the getBaseURI() method implementation is actually commented in the Xerces code as "Experimental". Looking closer... um, I think it actually doesn't implement the spec correctly. Grr... To summarize, when it's time for my status report tomorrow, I think it's best if I just remain silent. gregg P.S. I suspect the answer for SAX will be similar. P.P.S.
XOM (http://www.xom.nu/) is starting to look pretty good, but I may just be hallucinating at this point... > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Helt,Gregg > Sent: Tuesday, February 07, 2006 11:01 AM > To: Thomas Down > Cc: DAS/2 > Subject: [DAS2] Working with xml:base in Java? > > > Thomas, I'm wondering what toolkits you're using for binding XML > to Java objects? And particularly how you are dealing with resolving > URIs when xml:base is used. So far I've mostly used various > implementations of SAX and DOM -- I've found some reports of builtin > xml:base support in Xerces SAX/DOM, but it's still unclear. > > I've been avoiding the issue up till now. It won't be too hard > to implement URI resolution relative to xml:base, but I thought I'd > check around first and see if there's automated support of this in some > toolkit. > > Thanks, > Gregg > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From td2 at sanger.ac.uk Wed Feb 8 03:44:38 2006 From: td2 at sanger.ac.uk (Thomas Down) Date: Wed, 8 Feb 2006 08:44:38 +0000 Subject: [DAS2] Re: Working with xml:base in Java? In-Reply-To: References: Message-ID: <70790A43-AA5F-4F4A-8F20-50CDE30C7BB3@sanger.ac.uk> On 7 Feb 2006, at 19:00, Helt,Gregg wrote: > > Thomas, I'm wondering what toolkits you're using for binding XML > to Java objects? And particularly how you are dealing with resolving > URIs when xml:base is used. So far I've mostly used various > implementations of SAX and DOM -- I've found some reports of builtin > xml:base support in Xerces SAX/DOM, but it's still unclear. > > I've been avoiding the issue up till now. It won't be too hard > to implement URI resolution relative to xml:base, but I thought I'd > check around first and see if there's automated support of this in > some > toolkit. 
Hi Greg, I'm actually using Stax (the streaming API for XML). The implementation I use is called Woodstox: http://woodstox.codehaus.org/ (but there are a few others out there). No builtin xml:base support but it's easy to write a little wrapper around XMLStreamReader to spot xml:base attributes and maintain a stack of base URIs. I'm using java.net.URI to do the URI handling/resolution/ relativization. Seems to be working okay... so far... Thomas. From Gregg_Helt at affymetrix.com Wed Feb 8 05:12:22 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Wed, 8 Feb 2006 02:12:22 -0800 Subject: [DAS2] RE: Working with xml:base in Java? Message-ID: > -----Original Message----- > From: Thomas Down [mailto:td2 at sanger.ac.uk] > Sent: Wednesday, February 08, 2006 12:45 AM > To: Helt,Gregg > Cc: DAS/2 > Subject: Re: Working with xml:base in Java? > > > On 7 Feb 2006, at 19:00, Helt,Gregg wrote: > > > > > Thomas, I'm wondering what toolkits you're using for binding XML > > to Java objects? And particularly how you are dealing with resolving > > URIs when xml:base is used. So far I've mostly used various > > implementations of SAX and DOM -- I've found some reports of builtin > > xml:base support in Xerces SAX/DOM, but it's still unclear. > > > > I've been avoiding the issue up till now. It won't be too hard > > to implement URI resolution relative to xml:base, but I thought I'd > > check around first and see if there's automated support of this in > > some > > toolkit. > > Hi Greg, > > I'm actually using Stax (the streaming API for XML). The > implementation I use is called Woodstox: > > http://woodstox.codehaus.org/ I would like to check out Stax, haven't used it before. > (but there are a few others out there). No builtin xml:base support > but it's easy to write a little wrapper around XMLStreamReader to > spot xml:base attributes and maintain a stack of base URIs. > > I'm using java.net.URI to do the URI handling/resolution/ > relativization. 
Seems to be working okay... so far...

That's what I was thinking about when I said it wouldn't be too hard to implement... But that was yesterday. A long time ago. Now I've taken a detour into re-reading the XML Base spec http://www.w3.org/TR/xmlbase/, and things don't seem so easy.

I _think_ if there's at least one xml:base attribute in the element hierarchy above where you're trying to determine a base URI, and resolution of those xml:base attributes yields an absolute URI, it's all good, that's the base URI. But on the other hand if this resolution yields a relative URI instead of an absolute URI I'm not sure what happens -- I would guess it's an error, but I can't see anywhere in the XML Base spec that spells this out.

And if there's no xml:base to use to determine a base URI, things get weird:
   if the document is "encapsulated within another entity", the base URI is the URI of that entity (I have no idea if DAS/2 docs could appear in such a context)
   otherwise the base URI is the URI used to retrieve the document
   oh, except if you burrow down into the spec pointers to RFC 2396 http://www.ietf.org/rfc/rfc2396.txt, if the request gets redirected you need to make sure the base URI is the last URI used in the redirect
   oh yeah, and apparently external entity declarations can affect all of this in ways I don't understand
   and there's probably other gotchas I've missed...

Now from the server side, none of this is really an issue. Just pick from a multitude of variants that XML Base allows when you send responses to the client. From the client side, if we really want DAS/2 to support XML Base (and I think we do), things get tricky. It's definitely pushing me towards using libraries that provide builtin support for XML Base.
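
The "stack of base URIs" approach described above can be sketched in a few lines. This is a minimal illustration in Python rather than Java, only to keep it short; it is not taken from any DAS/2 client. The element names (FEATURES, GROUP, FEATURE, TYPE), the example.org server and the function name resolve_ids are all made up for the example. It assumes the URI used to retrieve the document is known and absolute, and it ignores the redirect and external-entity cases listed above.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree reports the xml:base attribute under the fixed XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical document: element names and URLs are illustrative only.
DOC = """<FEATURES xml:base="http://das.example.org/das/genome/volvox/1/">
  <FEATURE id="feature/F001"/>
  <GROUP xml:base="type/">
    <TYPE id="T001"/>
  </GROUP>
</FEATURES>"""


def resolve_ids(xml_text, document_uri):
    """Return the absolute URI of every 'id' attribute, in document order.

    A stack holds the base URI in effect for each open element. The bottom
    entry is the URI the document was retrieved from (assumed absolute).
    """
    bases = [document_uri]
    resolved = []
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            # An element without xml:base inherits its parent's base,
            # because joining with "" returns the base unchanged.
            base = urljoin(bases[-1], elem.get(XML_BASE, ""))
            bases.append(base)
            if "id" in elem.attrib:
                resolved.append(urljoin(base, elem.get("id")))
        else:
            bases.pop()
    return resolved


for uri in resolve_ids(DOC, "http://das.example.org/request"):
    print(uri)
```

One detail worth keeping in mind when choosing xml:base values: under RFC 2396 resolution the last path segment of a base is dropped unless the base ends in "/", so a base of ".../ontology" combined with "SO/0000001" gives ".../SO/0000001", not ".../ontology/SO/0000001".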
Gregg From dhoworth at mrc-lmb.cam.ac.uk Wed Feb 8 06:54:54 2006 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Wed, 08 Feb 2006 11:54:54 +0000 Subject: [DAS2] Ontologies in DAS/2 In-Reply-To: References: Message-ID: <43E9DC0E.30809@mrc-lmb.cam.ac.uk> Allen Day wrote: > Why have you chosen to make a subelement of ? Is it expected > that there will be multiple IDs for a given term, and if so is there not a > primary ID? having an id attribute is a defacto standard for DOM libs, so > you can call getElementById(). I'm curious about the DAS use of id attributes, especially given an expectation to use getElementById(). DAS has attributes that are URLs - they include the '/' character. But getElementById() is an HTML or XHTML DOM method I believe. Both HTML 4 and XHTML require that id attributes be of type ID, I think, and the ID type does not permit '/' characters (IDs are Names). I find it pretty confusing that DAS uses an attribute that is called id that isn't an ID. And I'm curious to know if getElementById() works with it? Sounds like a sloppy implementation of the DOM. Or did I miss something? Cheers, Dave From dalke at dalkescientific.com Wed Feb 8 11:36:11 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 8 Feb 2006 16:36:11 +0000 Subject: ids and URLs (was Re: [DAS2] Ontologies in DAS/2) In-Reply-To: <43E9DC0E.30809@mrc-lmb.cam.ac.uk> References: <43E9DC0E.30809@mrc-lmb.cam.ac.uk> Message-ID: <701ca3642e6ed0ea3e12fbd26991a3fb@dalkescientific.com> Dave Howorth wrote: > I'm curious about the DAS use of id attributes, especially given an > expectation to use getElementById(). > > DAS has attributes that are URLs - they include the '/' character. > > But getElementById() is an HTML or XHTML DOM method I believe. > > Both HTML 4 and XHTML require that id attributes be of type ID, I > think, and the ID type does not permit '/' characters (IDs are Names). 
> > I find it pretty confusing that DAS uses an attribute that is called > id that isn't an ID. And I'm curious to know if getElementById() works > with it? Sounds like a sloppy implementation of the DOM. Or did I miss > something? We've been talking about this and related matters most of the day. It started with Thomas' question "How do I get all of the exons in the database which are from Vega?" (Vega being some other database.) All of the features which are exons from Vega have the same DAS data type. This means he wants to do a feature query with type = He needs to get the DAS type id. He can get all of the exons using an ontology search. But he wants to search for the string "exon". Given the discussion yesterday, will the type query support "ontology='exon'" or must he use some other service to convert "exon" to "SO:exon" or to "http://some/server.url"? Suppose for now it is "SO:exon". He does http://das.server/../types?ontology=SO:exon That gets all of the exon types, but not the ones from Vega. The Vega types have a source="Vega". DAS type queries do not support searching on that field. PROPOSAL: Add a "source=" (case-insensitive substring search) field to the types query. (I don't think there is any contention here so I'll add it.) http://das.server/../types?ontology=SO:exon;source=Vega That comes back with a single DAS type. He now wants to search for all features with that type. What does he use for the query? Is it (assuming proper escaping) http://das.server/../features?type=http://das.server/../type/T12345 ? That's rather excessive, especially if there are many DAS types derived from the given ontology term. All around people want to use "T12345" for that, and not the full URL. Are there people who do want to use the full URL? The current system comes from saying the URL is the identifier for a DAS object. If as Dave points out we have a "id" which is a simple string (of the format /[A-Za-z0-9_]+/ or so) then there's no problem. 
We can use that for the query, as http://das.server/../features?type=T12345 PROPOSAL: do not use a URL for the identifier for objects That fixes a few problems: - xml:base is no longer an issue; these are ids and not URLs - the names are short and sweet It introduces a few problems. Problem 1: a feature has a type. How can the client get from the type id to the type information if there is no URL to resolve? Solution 1: add a 'id=' term to the types query URL, eg http://das.server/../types?id=T12345 (or possibly call it 'type=') Solution 2: append "/" + type id to the types query URL, eg http://das.server/../types/T1234 Solution 3: have both an 'id' and an 'href' attribute Solution 4: the client downloads all the types and compares the id fields. QUESTION: At Hinxton nearly all the DAS servers have only one or two types. Ensembl has 45 types and Allen's has about 50. Is it reasonable to have clients just go ahead and download everything and not worry about a query language? Is Chado any different? Problem 2: a feature can refer to its parent and part features. It can refer to regions on other features. How does a client get information about the feature given the feature id? Solution 1: add a 'id=' term to the features query URL Solution 2: append "/" + feature id to the feature query URL Solution 3: have both an 'id' and an 'href' attribute We discussed this a lot and decided on PROPOSAL: add an 'id=' query to the types and features query. We decided against solution 2 because of me - I don't like working with URLs that way. Thomas pointed out that an 'id=' query is useful, eg, if a feature has three parts then a client can request http://das.server/../features?id=part1,part2,part3 (NOTE: we're also thinking of proposing this syntax for an 'OR' query over the same term http://das.server/../features?id=part1;id=part2;id=part3 ) I pointed out that having both means there are two ways in the server to look-up by id - extra machinery. 
QUESTION: Who will want to refer to features and types by URL? Possibilities: - hypothetical model where the queries return a list of URLs and the server (through HTTP pipelining) asks only for the ones it doesn't have already; saving bandwidth. THIS IS NOT A USE CASE! - request a feature in a specific format (but that can be done through the query URL) - RDF people who want individually named items (not a use case) We couldn't come up with a case where someone would want to refer to features and types as an individually named URL! For segments there is a use case - you can ask for sequence by range, and that's through the segment URLs. However, that could be done with the segment query URL so it's not a strong use case. In any case, it hasn't been a problem so I'll put that off for now. That being the case, there's no need to consider "Solution 2". Why have URLs if no one wants to use them? What did come up during the discussion here was that we had planned to use URLs for writeback. That model seems rather nice. "DELETE" and "PUT" to the correct URLs, rather than going through a "POST to delete.cgi?type_id=", etc. The model for writeback was something like "ask server to make a copy, with region A:C available for editing. User works with region. User commits region back to server." In that case, the request for region might as easily make a copy of the source, available through a special URL visible only to that one user. In this copy it can expose "url=" attributes for editing, perhaps also with a "writeable=" field because some features will not be editable for that user. I complained yesterday about "writeable" but that was because for the general purpose server the concept of "writeable" was user-specific and not appropriate. In this writeback model it's just fine. Another thing came up during discussion of this. Roy yesterday proposed the idea of a simple server which only supports getting "everything". It doesn't support the DAS query specification.
That is, it only supports
   http://das.server/../types
   http://das.server/../features
and fetching those returns everything. This is useful for small data sets because those could be simple files, like
   http://das.server/../types.xml
   http://das.server/../features.xml
Still, for that case there would need to be "feature/F1", "type/T2", etc. In essence, a duplicate of every record. Last December during discussion people said there was no use case for this sort of flat-file oriented server. This was not a design goal. Thomas mentioned that there is a use case. Uploading of DAS tracks to a server. People complain now that it's hard to do that. With this url-less model people can upload a small number of documents (or a .zip file of a directory) with the versioned source, types, and features data. There is no need to have an "exploded" copy of all of the records in parallel to the types and features xml files.

Big Advantage: Stylesheets are much easier to write. Refer to fields by short id instead of long URL.

Conclusion:
   Proposal 1: "id"s are of the form /[A-Za-z0-9_]+/
   Proposal 2: FEATURE and TYPE elements have an optional "url" (or "href") attribute
   Proposal 3: the feature and type queries support a 'id=' search
   Proposal 4: the type query supports a "source=" search

Churn factor:
   Allen's server doesn't need the 'type/' and 'feature/' fields
   Gregg and others don't need to worry about xml:base any more.
   Type and feature lookups need to track the query URL as well as the type and feature id
   We need a new 'id=' search capability

These don't seem big in a programming sense, more a conceptual one.
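
To make Proposals 1 and 3 concrete, here is a small Python sketch of what a client might do with short ids. It is an illustration only: the function names and the example.org URL are invented, and the comma-separated form of the 'id=' query is the syntax floated in this message, not something any server is known to implement.

```python
import re

# Proposal 1: ids are short strings of the form /[A-Za-z0-9_]+/
ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_id(candidate):
    """True if the whole string matches the proposed id form."""
    return ID_PATTERN.fullmatch(candidate) is not None


def features_by_id_url(features_query_url, ids):
    """Build an 'id=' query (Proposal 3) from a list of short ids.

    Ids that match the proposed pattern contain no characters that need
    URL escaping, so they can be joined with commas as they are.
    """
    bad = [i for i in ids if not is_valid_id(i)]
    if bad:
        raise ValueError("not valid short ids: %r" % (bad,))
    return features_query_url + "?id=" + ",".join(ids)


# Hypothetical server; only the shape of the URL matters here.
print(features_by_id_url(
    "http://das.example.org/das/genome/volvox/1/features",
    ["part1", "part2", "part3"],
))
```

Under this rule "T12345" is a valid id, while "SO/0000147" and "SO:0000147" are not, because "/" and ":" fall outside the proposed character set.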
Andrew dalke at dalkescientific.com From cjm at fruitfly.org Wed Feb 8 13:03:41 2006 From: cjm at fruitfly.org (chris mungall) Date: Wed, 8 Feb 2006 10:03:41 -0800 Subject: ids and URLs (was Re: [DAS2] Ontologies in DAS/2) In-Reply-To: <701ca3642e6ed0ea3e12fbd26991a3fb@dalkescientific.com> References: <43E9DC0E.30809@mrc-lmb.cam.ac.uk> <701ca3642e6ed0ea3e12fbd26991a3fb@dalkescientific.com> Message-ID: <94bafd156da54842f9093244ca6083d1@fruitfly.org> I mostly skim the messages here, so I may be missing something, but I'm a little confused by this: On Feb 8, 2006, at 8:36 AM, Andrew Dalke wrote: > > http://das.server/../types?ontology=SO:exon I don't understand this - SO:exon isn't an ontology > > That gets all of the exon types, but not the ones from Vega. > The Vega types have a source="Vega". DAS type queries do > not support searching on that field. > > PROPOSAL: Add a "source=" (case-insensitive substring search) > field to the types query. (I don't think there is any contention > here so I'll add it.) > > http://das.server/../types?ontology=SO:exon;source=Vega What does 'types' return? A type from an ontology (eg SO:exon) or something else? Why would source be recorded here? Surely source would be a valid constraint on a feature query, but not a type query. Perhaps it's the case that in DAS a 'type' means some kind of arbitrary grouping (eg features of type X and source Y), and 'ontology' means a term/type from an ontology. If it isn't too late I'd suggest changing these conventions. From Gregg_Helt at affymetrix.com Wed Feb 8 13:12:46 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Wed, 8 Feb 2006 10:12:46 -0800 Subject: [DAS2] Why use URIs for feature IDs? Message-ID: Regarding using URIs for DAS features, here's the quote from Paul Prescod that I used in the original DAS/2 grant proposal addressing the question "why use URIs?".
From http://www.prescod.net/rest/rpc_for_get.html : You can give that URI address to anyone, anywhere and they can reuse it. In particular this means that we can compose applications that were not thought of in advance. Google is an example of an application that was composed "after the fact" out of URIs. Yahoo is another...There are a raft of deployed W3C recommendations that work with information related through URIs. Many of these are XML-related specifications that work as well in API-like applications as in user interface-based applications. These include: XPath, XPointer, XSLT, XLink, RDF, XInclude, XQuery, xml-stylesheet. Information published through HTTP URIs can be combined through XInclude, queried and sorted through XQuery and XSLT, visually rendered with xml-stylesheet, related through RDF, linked through XLink, pointed into through XPointer. From dalke at dalkescientific.com Wed Feb 8 14:24:06 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 8 Feb 2006 19:24:06 +0000 Subject: [DAS2] Why use URIs for feature IDs? In-Reply-To: References: Message-ID: <4d48d7749f04bb97f6791a5726230fee@dalkescientific.com> Yes. I like URLs. I've been so in favor of URLs that until this morning I had in the spec that the "id" *is* the URL. There was no short form for the URL. (still /is/ no short form since it hasn't changed ;) That meant several things: - everyone needs to disambiguate through the xml:base to figure out if two features are the same. (Neither Gregg nor Thomas liked that) - queries of the style we are doing become more complex (type=http://www.server/path/to/das/type/000A956826C8 vs. type=000A956826C8 ) - passing URLs about make for bigger XML, hence slower. The first is technical. The second is emotional - that sort of query looks ugly. The last is .. I can't speak for the last. In an earlier email I showed how a different site layout can be as efficient as any id scheme. 
Quickly, use http://www.../volvox/1/S <- versioned source URL http://www.../volvox/1/T?.. <- types query url http://www.../volvox/1/T001 <- type urls http://www.../volvox/1/F?.. <- feature query urls http://www.../volvox/1/F001 <- feature urls and don't worry about any sort of hierarchy in the system. Everything has the xml:base of "http://www.../volvox/1/" so relative URLs are trivial strings. Several said "just chop off the last bit of the URL to get the id" or "combine some base feature URL with the feature id to get the full URL." Why is that useful? Lincoln said on today's phone call that he wants both a URL and an id, and expected that both would be there. I'm now going to be either stubborn or irritating or both. Why have an id at all? That is, why at all have a short string (say of the form /[A-Za-z0-9_]/) when the URL is there and meets all the functional requirements of an identifier? (I'll use 'id' to refer to a short string, 'url' to refer to a URL. Both are identifiers. I should be using 'uri' for the latter, I know. See comment below.) Today I thought I came up with one reason to have ids and to have a non-existent URL for an element. I think now that I was wrong. My use case was for uploading data to the Ensembl viewer to display a new DAS track. Put all of the types into one file, in the types XML format. Put all of the features into another file in a features XML format. Use arbitrary ids for cross referencing, because there is no URL for them - they don't exist in any form outside the document. Upload them to the server. The server reassembles the annotations by cross referencing the ids. I now see that that's a mistake. As Gregg corrected me, they use URIs not just URLs. They could use "das_private:ABC123" or a fully-qualified URL or an xml:base and the partial URL or whatever scheme. All the server needs to know is how to compare the two URI strings. It's free to rename the strings if need be. (Could it keep the original URLs?
Perhaps, but the original data might not be accessible. Consider an exon predictor whose output you want to upload to the Ensembl viewer. There is no URL for that.) Given that this isn't a valid use case for having an 'id' and not having a 'url', I now ask again: what's the point of having *both* a unique URL and a unique 'id' for the elements? Tradition? Elegance? With Dave Howorth's comment about the specialness of 'id' I can see changing the attribute name to 'url'.... or 'uri'. I've got to write a couple paragraphs for Nomi now. I'll leave with the following comment from http://tbray.org/ongoing/When/200x/2006/01/08/No-New-XML-Languages > Designing XML Languages is hard. It's boring, political, > time-consuming, unglamorous, irritating work. It always takes longer > than you think it will, and when you're finished, there's always this > feeling that you could have done more or should have done less or got > some detail essentially wrong. Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Wed Feb 8 16:46:37 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Wed, 8 Feb 2006 13:46:37 -0800 Subject: [DAS2] Re: New DAS/2 server for codesprint Message-ID: Following Steve's suggestion, I'm focusing on the region around YGL076C (also known as RPL7A) on the yeast genome to get a small slice of feature XML back from the codesprint server for a region where I know what the genes should be: http://das.biopackages.net/das/genome/yeast/S228C/feature?overlaps=chrVII/364251:366080;type=SO:gene This returns the YGL076C gene with three CDS and two introns. A nearby snoRNA also gets returned.
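
[Editor's sketch] The query above can be assembled programmatically. What follows is only a minimal illustration, assuming the `overlaps` and `type` filter names, the `;` separator, and the `segment/start:end` range form exactly as they appear in the URL in this message; the helper name `feature_query_url` is hypothetical and not part of any DAS/2 draft.

```python
from urllib.parse import quote


def feature_query_url(feature_url, segment, start, end, type_id):
    """Build a feature query URL (hypothetical helper, not from the spec)."""
    # Range syntax copied from the message: segment/start:end
    region = f"{segment}/{start}:{end}"
    # Keep '/' and ':' unescaped so the result matches the readable
    # form quoted in the message; anything else unsafe is escaped.
    filters = [
        "overlaps=" + quote(region, safe="/:"),
        "type=" + quote(type_id, safe=":"),
    ]
    # The message joins filters with ';' rather than '&'.
    return feature_url + "?" + ";".join(filters)


url = feature_query_url(
    "http://das.biopackages.net/das/genome/yeast/S228C/feature",
    "chrVII", 364251, 366080, "SO:gene",
)
print(url)
```

Whether a given server also accepts the percent-encoded form of the same filters is not stated here, so a client may want to try both.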
Gregg > -----Original Message----- > From: Chervitz, Steve > Sent: Monday, February 06, 2006 5:03 PM > To: Helt,Gregg; Allen Day > Cc: DAS/2 > Subject: Re: [DAS2] Re: New DAS/2 server for codesprint > > > > There's a gene (RPL7A) with two introns on chr7 at roughly > 366kbp - 364kbp: > http://db.yeastgenome.org/cgi-bin/locus.pl?locus=YGL076C > > Most genes with introns in cerevisiae (which aren't many) > have just a single intron that creates a small 5' exon, such > as the alpha and beta tubulin genes on chr13. Tub1 is on > chr13 at 99Kbp, and tub3 is also on chr13 at 23Kbp. So the > first 100Kb of chr13 would be another region to try. > http://db.yeastgenome.org/cgi-bin/locus.pl?locus=tub1 > > Steve > > > > From: "Helt,Gregg" > > Date: Mon, 6 Feb 2006 16:14:55 -0800 > > To: Allen Day > > Cc: DAS/2 > > Conversation: [DAS2] Re: New DAS/2 server for codesprint > > Subject: RE: [DAS2] Re: New DAS/2 server for codesprint > > > > > > Allen, can you recommend a reasonable region on yeast to do > a features > > query that will return features with some hierarchy (like > > transcript/exons)? > > > > Thanks, > > Gregg > > > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/das2 > > From Steve_Chervitz at affymetrix.com Wed Feb 8 16:47:18 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Wed, 08 Feb 2006 13:47:18 -0800 Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint, 8 Feb 2006h Message-ID: Notes from the DAS/2 teleconference for the code sprint, 8 Feb 2006 $Id: das2-teleconf-2006-02-08.txt,v 1.1 2006/02/08 21:51:14 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt CSHL: Lincoln Stein Sanger: Thomas Down Sweden: Andrew Dalke UC Berkeley: Nomi Harris UCLA: Allen Day, Brian O'connor Action items are flagged with '[A]'. 
These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Agenda: * progress report for grant renewal * ontologies * ids and urls * style sheets * status reports Topic: Progress report for grant -------------------------------- gh: needs to be in the mail by 5pm tomorrow, to be included as a hard copy addendum to grant. will improve chances of funding for next cycle. review will be done by end of feb. nh: no later than 4pm pst today. state what you've accomplished between Nov 1 and now, in particular this week. one or two paragraphs. gh: 1. highlight significant enhancements 2. involvement of sanger, ebi 3. registry work from andreas, http spec for that registry 4. writeback ad: andreas worked on registry server, will send write up soon post teleconference. [A] Everyone write up 1-2 paragraphs of progress and send to Nomi ASAP Topic: Ontologies ----------------- gh: concerned about ontol attrib in types doc because, do we want it to be possible for a type to be an instantiation of multiple terms in the ontology. ls: will make it hard to validate. one type = many ontol terms. don't like it. types will be specializations of SO terms and will not have multiple parents. gh: thinking about people doing curation. if a type is anchored to one term in the ontol, and a feat can have only one type, a feat won't be able to refer to >1 term in SO. ls: any use case for this? gh: still exploring this. e.g., both a computed feature and an exon? ls: no.
separate category for predicted genes. gh: is there something for 'computed exon' or 'computed cds'? ls: think so. sc: multiple branches like go? ls: multiple relationship types do exist. something can be is_a or part_of. I wanted das/2 to be limited to what you can say in SO, with notion that you can extend it. e.g., three predicted exons one with genefinder, exonerate, etc. ad: given a string 'exon' how does that get used to query server? ls: find exon SO term, download list of types from das server, find everything that inherits from exon ontology term. clients need to know how to search the SO list. they will have a local copy of SO that they'll refresh from time to time. gh: client isn't required to know the full structure, except maybe to search higher-level terms. but the term in the ontology attribute is sufficient. ls: could just search types and desc to find exons, but that relies on implementer describing their types correctly. gh: if a client wants to understand an ontol, the best way to go is via what allen's proposing, searching via ontology das, preferably via NCBO server. ad: what is the actual string we're searching on? aday: name or definition, or id. ls: client should have a copy of the SO. unambiguous in this opinion. client has SO, looks through types XML to find what the local types are which the server supports which match what it's looking for in the SO. here's a flowchart: - client downloads SO, caches. - client downloads seq types list, caches. - user searches to find exon - client looks to find matches against 'exon', maybe 5 hits. - prompts user to select which he's looking for - client looks thru cached types xml to find server types of SO term that user selected - client does feature query. ad: what is the string that the user is looking for URL or string? ls: in type xml how do we indicate the term? gh: we've been discussing this the past few days ls: why not replace the term with SO accession number? 
then we don't have to figure out the correct representation of ontology in an xml. can finish this by friday. chris mungall has weighed in, and xml version of SO ontology is not completely stable. gh: perfectly ok for client to know nothing about SO and treat these as unique strings. ls: right. names will eventually be things like 'exon'. aday: chris's main complaint is that the doc didn't validate. I didn't have a dtd. got it and now it validates. I thought this was a done deal. there is a document written that describes how to do what we're talking about. ls: the only thing to be resolved, in types xml document, how do we refer to SO terms? aday: an attribute there that allows you to put in uri. it's a relative url that points to ontology das server to get obo xml for that term. ad: how do I go from string 'exon' to find out what that is? aday: ls: let's say administrator of das server has local type called foobar. associated w/ url for SO 'exon' term. andrew's question is, user wants to search for exons, how to go from 'exon' to correct url in SO to find what types correspond to that? what's to go from 'exon' to foobar. aday: search SO for exon, local types. there's a filter on ontology that lets you search all terms and definitions gh: there's a reqt now that server must understand parent child relationships in ontology. aday: server could do xpath query to pull out the terms you're interested in w/o understanding ontology ls: user types 'exon' returns all feats in the genome that are exons. aday: two servers, feat and ontol server gets all types from feat server, each has url to ontology das server, maybe multiple ontology das servers. each must have its ontology searched returns supported or not. client assembles all search results from static obo xml documents, gh: for most clients this will be irrelevant. user will get a list of types - genscan, blat alignment, for things they may be interested in. they don't need to understand ontology nor does client.
there may be a url to look up info about the term. this is the typical case. more sophisticated use cases can be put off till later. ls: in types xml can we have two attributes, url and accession so_accession="SO:12414", other will be url for obo xml. [A] types will have separate attributes for URI and SO accession number Topic: IDs and URLs ------------------- ad: discussion about searching for exon, use case: client goes to server to get list of all types, wants all features of a given type in a given range. may filter based on contains or inside, das-type=xxxxx. talking about that being a URL to get full name for it. what is the thing you send to server to ask for the types? gh: url ad: make this an id so it's not a long complex url. just an id specific to that server. such that you go to feat query url and get it. ls: can just chose the last component of the url, type id. ad: why have ability to get feature type individually? ls: will have to be uniquified, by adding url to types query. ad: feat query = ls: isn't this the way it was? gh: every feat has unique uri. ad: talking about filtering and querying. ls: just give it the id not the whole url. ad: now it is the url ls: should be the id does it make sense to be something that another server has defined? probably not. just a local type. [lots of back and forth here, didn't catch it all...] ad: do we need ability to refer to feature or type by url? gh: yes. for making rdf statements about das2 features. ad: who will do this? gh: I will if no one else does. web technology is moving in this direction. ls: we want every object a das server serves to be referencable as a url/uri. as for filtering mechanism, for type filter we can just use the id of the type, a short string. ad: agree, as of this morning the url and id are same thing. ls: a relative uri, by definition the server should implicitly attach the versioned data source url to it. 
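
[Editor's sketch] The relative-URI behaviour described in the last remark, where a relative reference is resolved against the versioned data source URL, matches ordinary URL joining. The following is a minimal illustration using Python's standard `urljoin`; the base URL and the `type/` path are assumptions borrowed from examples elsewhere in this thread, not a layout fixed by the notes, and `resolve_uri` is a hypothetical name.

```python
from urllib.parse import urljoin

# Assumed versioned source URL; the trailing slash matters, since
# without it urljoin would replace the final path component.
VERSIONED_SOURCE = "http://www.example.com/das2/human/1/"


def resolve_uri(reference, base=VERSIONED_SOURCE):
    """Resolve a possibly relative type or feature reference.

    Relative references are attached to the versioned source URL;
    absolute URIs from other servers pass through unchanged.
    """
    return urljoin(base, reference)


print(resolve_uri("type/T12345"))
print(resolve_uri("http://other.example.org/das2/type/exon"))
```

Note that this is plain relative-reference resolution, the same mechanism xml:base relies on, rather than the "paste a short id onto a fixed path" convention debated in the call.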
ad: xml processors ls: define the way the filter query mechanism, hard code implicit paths into it. ls: featuresquery?type=something if 'something' has no slashes, server implicitly adds http://myserver/das/types/... ad: don't like pasting urls and strings together to get things. don't like queries with implicit logic like that. ls: perfectly happy saying you can use urls in the query strings. I'd go with short ids ad: proposing we have both, id and href. here's the case: people uploading to server want to provide a das track, can provide two documents. works well for < 1000 features gh: we have to have uri for features. ad: why? gh: I will send you the page from the first grant. ls: main reason is: to avoid namespace clashes when integrating data sets. td: what do you mean by integrate? ls: view of features from 4 diff annotation groups, want to search for a particular feature by its id, need to indicate which data source it's coming from. td: won't you be keeping track of which data source anyway? you never get a track that's a mixture of diff sources. gh: dangerous to do this. td: there must be something keeping track of which track is from. gh: my assumption is that this is with uri td: there's nothing that constrains a server to only use uris from itself. gh: we sacrificed this when we went with capabilities. ls: a server can emit a set of features, some use relative uris and some absolute ones. if my server starts emitting features with affymetrix uris, the assumption is these originate from affymetrix. uris indicate that they originate from diff places even though you may physically get them from a das server at a different location. gh: thomas is right. given a feature uri you have no way to tell which das server it came from. clients must keep track of this themselves. ls: we wanted to divorce the origin of the feat from the server that serves it. should be possible to serve features that come from somewhere else. gh: making feature uri opaque was deliberate.
ad: when you do a feat query it could return the whole db. so the server must know how to return a feature document that contains all features. that server must know all the data. gh: don't see problem ad: all features and types have id and url. different. url is optional gh: no, required. also, not url, but uri. ad: ok. why should all records have a uri? gh: compatibility with semantic web/rdf, lsid, future proofing. ad: if they want to they can, if not they shouldn't be required. no one is doing rdf now. ls: what issue are you concerned about with respect to uri? ad: like ontology search. give me all features of this das type, you then have to give the url. this is different than id. ls: completely happy treating id as the last component of uri and doing a paste. why don't you like the paste? ad: you can get features from two diff places, each ending with same last word. ls: what query is it that allows you to filter by feature id? we have positional, type filtering and getting a single feature from server of origin. gh: there shouldn't be an id filter. just resolving uri for that feature. ls: we can't search a feature by regex match on it's id. ad: i'm not saying that. I'm suggesting that the url be optional. ls: I don't understand the point. gh: why can't uri be required? ad: see use case in email today subject="ids and urls". involves uploading das tracks to a server. [some trouble: not everyone has seen it] ls: I say we have a policy that if there is big discussion, the email should come more than 30 minutes before conf call. gh: I've read most of it and am still confused. ls: I still don't understand it after reading. you'll have to rephrase it. ad: all types and features have id and url. ls: no, explain in a follow up email. 
ad: ok [A] Andrew will send follow up email to elaborate on his "ids and urls" use case [A] Everyone will try to absorb andrew's ids and urls use case Topic: Style Sheets ------------------- ad: how do you refer to elements in style sheets, by id or url? gh: no opinion ad: if everything is referred to by id, that makes style sheets easier to write. gh: has anyone gotten to implementation of style sheets for das/2? ad: my proposal was a straw man. Topic: Status reports --------------------- gh: reading lots of specs. after yesterday's rant about xml:base, last night implemented a stack. works fine for our current server. we shouldn't throw out xml:base because of a few edge cases. we might want to specify which subset of xml:base we use. checked in code for igb client, does capabilities, specify feat, types, segments. trouble when modeling sequences. ee: working on das/2 client. building new widget as gregg asked for. ad: working with andreas write up for registry. td: understanding the spec. xml parsing. gh: you are using stacks, have experience with it? td: yes, less painful. streaming api for xml. gh: tried xom. picky about namespaces. difficult to use with spec that's not stable. td: some trouble with dom gh: sources, types, segments I use dom (small document). for features use sax nh: progress with apollo. list of versioned sources, show segments, user picks, gets features. something that the parser doesn't like. not sure where the problem comes from. sc: working on setting up internal das server on 64bit machine here. refining the pipeline for generating files for loading the affy das server with updated data for various public and affy data sources. also writing up and posting meeting notes. aday: message from gavin about ontology responses. caching issue caused trouble with model/controller. chris's obo dtd. dependencies for server rpm were finished. now building the rpm. td: parsing xml from codesprint server.
a few things are matching the spec from a few weeks back. prop, loc elements. will these be changed. aday: feature xml? td: yes. I'm still absorbing the changes, dozens of mails about feat properties. gh: more important is loc element, splitting into id and range. used to be one thing, now is two. one is id, other is start,end,strand. aday: will look into today. nh: I'm also taking charge of getting grant progress report done. especially need allen re: server, andreas via registry. gh: any reports for write back. brian: some work on that. not ready for prime time. gh: roy? ad: some talk about this puts and deletes on the urls. gh: let's talk about it tomorrow. From td2 at sanger.ac.uk Wed Feb 8 18:20:34 2006 From: td2 at sanger.ac.uk (Thomas Down) Date: Wed, 8 Feb 2006 23:20:34 +0000 Subject: [DAS2] Why use URIs for feature IDs? In-Reply-To: References: Message-ID: <7C8A251D-7712-489A-BAEB-4701F46BAF1D@sanger.ac.uk> [I should prefix my comments here by saying that I don't actually have a terribly strong opinion on this matter *except that* I'd really like the spec to be explicit on how feature query language works... Does it go .../features?type=exon, .../features?type=types/ exon, or .../features?type=http://das2.sanger.ac.uk/ensembl35/types/ exon?]. Anyway, I'm still having a bit of trouble seeing why features need individually GETable URIs. The use case I remember from the conference call was that it would be nice to be able to describe DAS/ 2 features in RDF documents. I guess that makes sense to me, but for this purpose is there anything wrong with a URI like: http://das2.sanger.ac.uk/ensembl35/features#id12345 This seems compatible with Andrew's ID proposal. My memory of RDF/DAML/OWL/etc is that most objects which get described in such documents are actually fragment identifiers in larger documents, rather than individually GETable entities. Am I missing something here? 
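
[Editor's sketch] To make the fragment form concrete: a client can split such a URI into the document it would fetch and the identifier it would look up inside that document. This is only an illustration using Python's standard `urllib.parse.urldefrag`; nothing in it is prescribed by the DAS/2 draft.

```python
from urllib.parse import urldefrag

uri = "http://das2.sanger.ac.uk/ensembl35/features#id12345"

# urldefrag separates the fetchable document URL from the fragment.
# An HTTP client requests only the document part; the fragment is
# interpreted locally against the returned document.
doc_url, fragment = urldefrag(uri)

print(doc_url)   # http://das2.sanger.ac.uk/ensembl35/features
print(fragment)  # id12345
```

The whole string still works as an opaque identifier for comparison, which is all an RDF statement about the feature would need.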
Thomas On 8 Feb 2006, at 18:12, Helt,Gregg wrote: > Regarding using URIs for DAS features, here's the quote from > Paul > Prescod that I used in the original DAS/2 grant proposal addressing > the > question "why use URIs?". From > http://www.prescod.net/rest/rpc_for_get.html : > > You can give that URI address to anyone, anywhere and they can > reuse it. > In particular this means that we can compose applications that were > not > thought of in advance. Google is an example of an application that was > composed "after the fact" out of URIs. Yahoo is another...There are a > raft of deployed W3C recommendations that work with information > related > through URIs. Many of these are XML-related specifications that > work as > well in API-like applications as in user interface-based applications. > These include: XPath, XPointer, XSLT, XLink, RDF, XInclude, XQuery, > xml-stylesheet. Information published through HTTP URIs can be > combined > through XInclude, queried and sorted through XQuery and XSLT, visually > rendered with xml-stylesheet, related through RDF, linked through > XLink, > pointed into through XPointer. > > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From dalke at dalkescientific.com Thu Feb 9 04:35:19 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 9 Feb 2006 09:35:19 +0000 Subject: [DAS2] Re: New DAS/2 server for codesprint In-Reply-To: References: Message-ID: In the das2/scratch directory is a program called "verify_examples.py". I ran it against http://das.biopackages.net/das/genome/yeast/S228C/feature?overlaps=chrVII/364251:366080;type=SO:gene as follows [guest276:das/das2/scratch] dalke% python ./verify_examples.py load FEATURES "http://das.biopackages.net/das/genome/yeast/S228C/feature?overlaps=chrVII/364251:366080;type=SO:gene" !
expected root tag '{http://www.biodas.org/ns/das/genome/2.00}FEATURES' got '{http://www.biodas.org/ns/das/2.00}FEATURELIST' ^D [guest276:das/das2/scratch] dalke% That is, it's a simple command language. The command to load a URL of the given type is load FEATURES "url" In this case it warns that the top-level name is "FEATURELIST" instead of "FEATURES", which is something that was changed last summer, I think. Saving locally and editing by hand I then get ! expected root tag '{http://www.biodas.org/ns/das/genome/2.00}FEATURES' got '{http://www.biodas.org/ns/das/2.00}FEATURES' That's because element. I'll explain in the next email. * file:///Users/dalke/cvses/das/das2/scratch/biopackages_features.xml:95: 57: error: element "LOC" from namespace "http://www.biodas.org/ns/das/genome/2.00" not allowed in this context That came from The RNC had a bug - it only allowed a single LOC element. Fixed. I've updated the schema and committed a copy of a features data set from Allen's server to CVS under das/das2/scratch/biopackages_features.xml Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Feb 9 05:00:45 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 9 Feb 2006 10:00:45 +0000 Subject: [DAS2] Why use URIs for feature IDs? In-Reply-To: <7C8A251D-7712-489A-BAEB-4701F46BAF1D@sanger.ac.uk> References: <7C8A251D-7712-489A-BAEB-4701F46BAF1D@sanger.ac.uk> Message-ID: <631777ea4ff08f68b6dde657effc18fe@dalkescientific.com> Thomas Down wrote: > Anyway, I'm still having a bit of trouble seeing why features need > individually GETable URIs. The use case I remember from the > conference call was that it would be nice to be able to describe DAS/2 > features in RDF documents. I guess that makes sense to me, but for > this purpose is there anything wrong with a URI like: > > http://das2.sanger.ac.uk/ensembl35/features#id12345 For that matter, the spec doesn't at present say that the individual URLs need to be fetchable. 
A client could treat them as opaque and unresolvable URLs and still do what it wants. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Feb 9 06:15:18 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 9 Feb 2006 11:15:18 +0000 Subject: [DAS2] Why use URIs for feature IDs? In-Reply-To: <4d48d7749f04bb97f6791a5726230fee@dalkescientific.com> References: <4d48d7749f04bb97f6791a5726230fee@dalkescientific.com> Message-ID: <4baed72f7f93ba4a90e521da4e27cc58@dalkescientific.com> I'm going to incur the possibility of pitchforks here.. :) Me: > Yes. I like URLs. I've been so in favor of URLs that until > this morning I had in the spec that the "id" *is* the URL. > There was no short form for the URL. (still /is/ no short form > since it hasn't changed ;) > > I'm now going to be either stubborn or irritating or both. > Why have an id at all? That is, why at all have a short string > (say of the form /[A-Za-z0-9_]/ when the URL is there and > meets all the functional requirements of an identifier? Here's the change - or not change since it reflects the current spec. Features and types have a single "id". That id is a uri in all its glory. Referring to Dave's email, yes, special characters are included - this is a uri. Looking at http://blog.bitflux.ch/wiki/GetElementById_Pitfalls the getElementById refers to the attribute with type "ID" which happens to be named "id" for XHTML and SVG. Given http://www.w3.org/TR/xml-id/ I have added xml:id as a common attribute for all of the DAS items for independent and optional identification of an element in a document. There is no short-form id for features and types. Queries are done using the full URL. For example, to find all elements of type "http://www.example.com/das2/human/1/type/T12345" the query string (assuming the query url is ".../1/feature_search.cgi") http://www.example.com/das2/human/1/feature_search.cgi? 
type=http%3A%2F%2Fwww.example.com%2Fdas2%2Fhuman%2F1%2Ftype%2FT12345 The single and sole exception is for range queries. Each segment has a URL and a "name" attribute. This name is a unique short-form identifier used for range queries. The name is of the form /[A-Za-z_][A-Za-z_0-9]*/ . To do a range query for all features on a segment with name Chr1 and range 50 to 100 use the format "Chr1/50:100" and the query looks like http://www.example.com/das2/human/1/feature_search.cgi?overlaps=Chr1%2F50%3A100 The reason for this exception is three-fold: - the syntax for merging the URL and two/three fields became ugly - Gregg wants to send multiple ranges at a time, if the client knows enough about what it has already - the client may consult one of several reference servers given the coordinate system for the annotations. These do not hold for feature types (features are independent objects; there will be at most a handful in most servers; the types are specific to the given set of features) Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Feb 9 06:41:35 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 9 Feb 2006 11:41:35 +0000 Subject: [DAS2] Why use URIs for feature IDs? In-Reply-To: <4baed72f7f93ba4a90e521da4e27cc58@dalkescientific.com> References: <4d48d7749f04bb97f6791a5726230fee@dalkescientific.com> <4baed72f7f93ba4a90e521da4e27cc58@dalkescientific.com> Message-ID: <0255ae96de376ffd89e2af0d9766aed6@dalkescientific.com> > I'm going to incur the possibility of pitchforks here.. :) To mollify or intensify the pitchforks ... Several people have said that "the id is the last component of the URL" or "the URL is the base + '/' + the id". That's what DAS1 did. I don't like URL construction like this. It makes the specification impose a URL organization when it doesn't need to do so.
For example, Allen prefers his URLs like this /feature?this=that is the query interface /feature/F00001 is an identifier for the features I might like it like this /feature_search.cgi?.. is the query interface /feature/F00001 is an identifier for the features Still others as /features?this=that is the query interface /feature/exon/A1 is an identifier for the features /feature/contig/A is another identifier for the features ** NOTE: in this case the "last term of the URL" is not sufficient as a unique short-form id ** Or still others as /cgi-bin/fsearch.rb?this=that is the query interface /data/F1 is an identifier for the features /data/F2 is another identifier One advantage to hard-coding the URL organization into the spec is the tradition from DAS1, and the general practice of expecting one-off URL schemes during web scraping. Another is that people understand it more easily. It's a lot easier to write out examples in one naming scheme than it is to say "using the identifier from the record ..." On the other hand, the programming is easier. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Feb 9 06:48:02 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 9 Feb 2006 11:48:02 +0000 Subject: [DAS2] Why use URIs for feature IDs? In-Reply-To: <631777ea4ff08f68b6dde657effc18fe@dalkescientific.com> References: <7C8A251D-7712-489A-BAEB-4701F46BAF1D@sanger.ac.uk> <631777ea4ff08f68b6dde657effc18fe@dalkescientific.com> Message-ID: <2878cecec027ce28826c48d1a3a68e30@dalkescientific.com> Churn factor: The only part of the spec that changes is the query interface for types. The type feature filter must take a full URL and not a partial URL nor a non-existent 'short id'. Allen's server does not support queries given the full URL. Here's what the spec says -- note that it quotes the previous draft and I added some comments.
> Query parameter "type" > > type=type_url > > Example: > $FQ?type=http%3A%2F%2Fwww.biodas.org%2FtypeA > > Match features with the given feature type. > > XXX the previous version of this document says > > Match features of the given type. A type is one of: > 1. a typeid returned by the feature type document described > earlier. Only features exactly matching the type are returned. > > 2. a sequence ontology term, such as "exon". Features matching the > term or *any of its ISA descendents* are returned. > > 3. a sequence ontology accession number, such as SO:12345. Features > matching the accession number or *any of its ISA descendents* are > returned. > > 4. a reserved type beginning with the namespace "das:". The only such > reserved type is currently "das:feature-lock", used for feature > updating. > > XXX I think we should only have it do 1. For 2 and 3 use the query > parameter 'ontology'. For 4, use a different query term, or don't use > locks as features. Based on the discussion yesterday, this changes to: 1. we support this one, with fully resolved URLs 2. the searching is done in the client so this option is removed 3. the searching is done in the client so this option is removed 4. we can always define "http://www.biodas.org/spec/special-type" as a URL to send to the server if we want to define a special query. Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Thu Feb 9 10:27:57 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 9 Feb 2006 07:27:57 -0800 Subject: [DAS2] Why use URIs for feature IDs? Message-ID: I think that as Thomas says, using URI fragment notation, http://das2.sanger.ac.uk/ensembl35/features#id12345 is a perfectly valid URI and thus is acceptable as a feature ID. 
But, if the intent is to construct feature URIs using fragment
identifiers in combination with either ID attributes (as defined in a
DTD) or xml:id attributes, as an alternative approach to URI = ID
attribute with xml:base resolution, I think it would get messy.

As I understand it a fragment identifier approach would mean
URI = (URL of doc feature XML is embedded in) + "#" + value of feature's
ID attribute.

But then if the feature is returned as part of a query, say:
http://das2.sanger.ac.uk/ensembl35/features?overlaps=chr1/10000:20000
and the feature has attribute id="id12345", then the feature URI using
standard fragment notation would be
http://das2.sanger.ac.uk/ensembl35/features?overlaps=chr1/10000:20000#id12345

In other words there would be a very large number of possible feature
URIs, with query string gunk in them, identifying the same feature.
Unless we define a nonstandard way of constructing fragment identifiers
that chops off the query string.

Instead of something nonstandard I'd rather use xml:base, adhere to the
XML Base spec, and allow the feature id attribute to be full or relative
URIs. Then specifying in the top element that
xml:base = http://das2.sanger.ac.uk/ensembl35/features/,
a feature returned by the features query with attribute id="id12345"
resolves the feature URI to:
http://das2.sanger.ac.uk/ensembl35/features/id12345

There might even be a way to fiddle with xml:base and id to use a "#"
instead of the last "/", though I'm not at all sure about that.

    gregg

> From: Thomas Down [mailto:td2 at sanger.ac.uk]
> Sent: Wednesday, February 08, 2006 3:21 PM
> To: Helt,Gregg
> Cc: DAS/2
> Subject: Re: [DAS2] Why use URIs for feature IDs?
>
> [I should prefix my comments here by saying that I don't actually
> have a terribly strong opinion on this matter *except that* I'd
> really like the spec to be explicit on how feature query language
> works...
Does it go .../features?type=exon, .../features?type=types/ > exon, or .../features?type=http://das2.sanger.ac.uk/ensembl35/types/ > exon?]. > > Anyway, I'm still having a bit of trouble seeing why features need > individually GETable URIs. The use case I remember from the > conference call was that it would be nice to be able to describe DAS/ > 2 features in RDF documents. I guess that makes sense to me, but for > this purpose is there anything wrong with a URI like: > > http://das2.sanger.ac.uk/ensembl35/features#id12345 > > This seems compatible with Andrew's ID proposal. > > My memory of RDF/DAML/OWL/etc is that most objects which get > described in such documents are actually fragment identifiers in > larger documents, rather than individually GETable entities. Am I > missing something here? > > Thomas > > > On 8 Feb 2006, at 18:12, Helt,Gregg wrote: > > > Regarding using URIs for DAS features, here's the quote from > > Paul > > Prescod that I used in the original DAS/2 grant proposal addressing > > the > > question "why use URIs?". From > > http://www.prescod.net/rest/rpc_for_get.html : > > > > You can give that URI address to anyone, anywhere and they can > > reuse it. > > In particular this means that we can compose applications that were > > not > > thought of in advance. Google is an example of an application that was > > composed "after the fact" out of URIs. Yahoo is another...There are a > > raft of deployed W3C recommendations that work with information > > related > > through URIs. Many of these are XML-related specifications that > > work as > > well in API-like applications as in user interface-based applications. > > These include: XPath, XPointer, XSLT, XLink, RDF, XInclude, XQuery, > > xml-stylesheet. 
Information published through HTTP URIs can be > > combined > > through XInclude, queried and sorted through XQuery and XSLT, visually > > rendered with xml-stylesheet, related through RDF, linked through > > XLink, > > pointed into through XPointer. > > > > > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/das2 From dalke at dalkescientific.com Thu Feb 9 10:43:27 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 9 Feb 2006 15:43:27 +0000 Subject: [DAS2] Why use URIs for feature IDs? In-Reply-To: References: Message-ID: <5920623233379c4200775188315082bb@dalkescientific.com> Gregg > As I understand it a fragment identifier approach would mean > URI = (URL of doc feature XML is embedded in) + "#" + value of > feature's > ID attribute. As I understand it the part after the '#' is a query language which is document type specific and used by the client. DAS does not define how that query language is used, so it has no meaning in the DAS world. http://www.ietf.org/rfc/rfc2396.txt 4. URI References The term "URI-reference" is used here to denote the common usage of a resource identifier. A URI reference may be absolute or relative, and may have additional information attached in the form of a fragment identifier. However, "the URI" that results from such a reference includes only the absolute URI after the fragment identifier (if any) is removed and after any relative URI is resolved to its absolute form. Although it is possible to limit the discussion of URI syntax and semantics to that of the absolute result, most usage of URI is within general URI references, and it is impossible to obtain the URI from such a reference without also parsing the fragment and resolving the relative form. .... 4.1. 
Fragment Identifier When a URI reference is used to perform a retrieval action on the identified resource, the optional fragment identifier, separated from the URI by a crosshatch ("#") character, consists of additional reference information to be interpreted by the user agent after the retrieval action has been successfully completed. As such, it is not part of a URI, but is often used in conjunction with a URI. fragment = *uric The semantics of a fragment identifier is a property of the data resulting from a retrieval action, regardless of the type of URI used in the reference. Therefore, the format and interpretation of fragment identifiers is dependent on the media type [RFC2046] of the retrieval result. The character restrictions described in Section 2 for URI also apply to the fragment in a URI-reference. Individual media types may define additional restrictions or structure within the fragment for specifying different types of "partial views" that can be identified within that media type. A fragment identifier is only meaningful when a URI reference is intended for retrieval and the result of that retrieval is a document for which the identified fragment is consistently defined. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Feb 9 10:53:38 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 9 Feb 2006 15:53:38 +0000 Subject: [DAS2] writeback via diffs Message-ID: <7a182cd18dacf110341f5cec43436f38@dalkescientific.com> Summary: We've been talking about the "update via a delta" model as an alternative to the "lots of changes to the server" model. Deltas mean the heavy work is done in the client (or middleware), vs. the server. We've been looking at the writeback spec. It doesn't handle the case of a complex feature with a parent/part relationship. 
In the current scheme that's done as a: - get the write lock - POST the new feature (parent) - POST the new feature (child) - commit on the lock What URL does the parent record have to point to the child? Does the database defer referential integrity checks until the commit on the lock? Is this a case where the POST for that feature returns an UPDATELIST document for every unknown/ placeholder identifier in the record? Probably. Another solution is to ask the server "give me two identifiers which can be used for features". (NOTE: must do this for either URLs or 'short ids' because the client might guess and override an existing feature.) Cute. But no real takers here. BTW, does the full DAS query system support searches of the modified version of the server? How does the server know that the search request comes from a client working in an editable view? In talking about it we've been working on an idea we all talked about last year; submitting a delta to the server and moving the heavy work into the client. That is, after the client is done locally it sends a document which looks like ... updated type information here ... ... ... There are several things to note: - the elements, to remove existing types and features - the types and features are in the normal formats. - there is no way to update a part of a record/ the record is sent in full - new identifiers are still a problem The use model for this is as follows, based on Otter. - get the SOURCES document, which will have - get an exclusive write lock on a region - POST to the locks URL (and GET gets a list of the locks?) - only one region locked at a time (current spec allows the full query language; is that needed?) - user is authenticated via HTTP-level authentication (Q: allow https for any of this?) 
- optional timeout time in request; server may give shorter or longer timeout
- user is allowed to edit all features in the given region
- get all the features in that region (because there may have been a
  commit before the write lock)
- work with the data on the local copy of the server data
- push the big red "COMMIT" button
- client POSTs the delta to the server
- user authentication again
- also sends a lock-id or a nonce so the server can double-check
  that there wasn't some other change
- server checks payload for referential integrity

The problem is that new features need a URL. We've come up with two
solutions.

 1. ask the server for things which can be used as identifiers.
    These identifiers live for the life of the lock.
 2. reserve a private URI scheme, like "das-private:" followed by
    a client-defined identifier. On upload the server maps those
    into valid local identifiers. To work correctly for the client
    the response document would need to contain a mapping from
    private identifiers to server identifiers.

The current spec uses the latter mechanism but does not specify how
the placeholder identifier is generated. The mapping is essentially
the "UPDATELIST" from the current spec, though with no need to support
the status field on a per item basis - it should be an all or none
transaction.

Sending a delta gets rid of the DELETE and PUT (and POST update)
methods on the server. Not ReSTful. It places the burden of tracking
the user edits on the client instead of on the server. But we have a
good sense that it will work and is understandable. It maps much more
closely to the current Otter use. We don't know how Apollo/Chado wants
to support writeback.
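
A sketch of how the second option might look on the server side, in
Python. Everything here is an assumption made for illustration: the
record layout (a dict with "id" and "parts"), the function name, and
the server URL pattern. It only shows the idea that every
"das-private:" placeholder is mapped to a server identifier, that the
mapping is returned to the client, and that an unresolved placeholder
aborts the whole upload.

```python
import itertools

PRIVATE_SCHEME = "das-private:"  # placeholder scheme named in the proposal


def assign_server_ids(records, new_id):
    """Map client placeholders to server ids for one uploaded delta.

    records: list of dicts {"id": str, "parts": [str, ...]} (hypothetical layout)
    new_id:  callable returning a fresh server identifier
    Returns (rewritten_records, mapping). Raises ValueError before
    anything is rewritten if a placeholder cannot be resolved, so the
    caller can reject the upload as a whole.
    """
    mapping = {}
    for rec in records:
        ident = rec["id"]
        if ident.startswith(PRIVATE_SCHEME):
            if ident in mapping:
                raise ValueError("duplicate placeholder: " + ident)
            mapping[ident] = new_id()

    def translate(ident):
        if not ident.startswith(PRIVATE_SCHEME):
            return ident  # already a real identifier
        if ident not in mapping:
            raise ValueError("unresolved placeholder: " + ident)
        return mapping[ident]

    rewritten = [
        {"id": translate(rec["id"]),
         "parts": [translate(part) for part in rec["parts"]]}
        for rec in records
    ]
    return rewritten, mapping


# Usage: a new parent feature that points at a new child feature.
counter = itertools.count(1)
make_id = lambda: "http://das.example.org/feature/F%05d" % next(counter)
delta = [
    {"id": "das-private:parent", "parts": ["das-private:child"]},
    {"id": "das-private:child", "parts": []},
]
features, id_map = assign_server_ids(delta, make_id)
```

The returned id_map plays the role of the mapping in the response
document; because it is built before any record is rewritten, the
parent can refer to the child regardless of upload order.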
If we decide to stay with the existing ReSTy spec then our recommendations are: - there's no need to support partial updates; clients send the complete record to the server for update - the query language does not need to support the full DAS query language; only the "region" query (based on Otter experience) - there's no current need to extend the range of a lock nor to extend the time of the lock. And I don't like that "lock=" is a parameter to the feature and types URLs which creates locks for those types rather than performs queries. I would rather these be new URLs. Andrew dalke at dalkescientific.com From lstein at cshl.edu Thu Feb 9 11:12:32 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 9 Feb 2006 11:12:32 -0500 Subject: [DAS2] Why use URIs for feature IDs? In-Reply-To: <5920623233379c4200775188315082bb@dalkescientific.com> References: <5920623233379c4200775188315082bb@dalkescientific.com> Message-ID: <200602091112.33548.lstein@cshl.edu> Hi Folks, I've drunk the W3C Kool-Aid and do feel that a major feature of DAS/2 as it now stands is that all data objects are referenceable as URIs. Furthermore, I think it is a handy-dandy feature for them to be fetchable URLs as well, having, I suppose, drunk the REST Kool-Aid. For this reason, I prefer the / notation to the # notation. Over and above the fact that the #fragment is not a part of the URI at all (according to the part of the spec that Andrew quoted), a practical issue with the # notation is that all browsers (and, I believe, some client-side libraries, although not the Perl LWP) strip out the # and whatever follows it. The server never gets a chance to act on the fragment. Since xml:base is giving us a hard time with respect to the queries, and causing major confusion and dissension in the group, I'd prefer to go with Andrew's strict idea of making all the IDs passed to the queries full URIs. In other words, including the properly escaped http://etc.etc in the query string. 
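
A minimal sketch of what "properly escaped" could look like when
building such a query, in Python. The server prefix and the helper
name are hypothetical; the type URL and its encoded form are the ones
from the spec excerpt quoted earlier in this thread.

```python
from urllib.parse import quote


def type_filter_url(feature_query_url, type_url):
    """Build a feature query that filters on one fully qualified type URL."""
    # safe="" makes quote() escape ':' and '/' as well, so the embedded
    # URL cannot be confused with the structure of the outer query URL.
    return feature_query_url + "?type=" + quote(type_url, safe="")


# Hypothetical server prefix; only the type URL comes from the spec example.
example = type_filter_url("http://das.example.org/das2/features",
                          "http://www.biodas.org/typeA")
print(example)
```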
This is going to make it a bit annoying to debug servers from within browsers, but will clean up the semantics considerably and once and for all remove the confusion about who "owns" a feature versus who "serves" a feature. Lincoln On Thursday 09 February 2006 10:43, Andrew Dalke wrote: > Gregg > > > As I understand it a fragment identifier approach would mean > > URI = (URL of doc feature XML is embedded in) + "#" + value of > > feature's > > ID attribute. > > As I understand it the part after the '#' is a query language > which is document type specific and used by the client. DAS does not > define how that query language is used, so it has no meaning in the > DAS world. > > http://www.ietf.org/rfc/rfc2396.txt > > 4. URI References > > The term "URI-reference" is used here to denote the common usage of a > resource identifier. A URI reference may be absolute or relative, > and may have additional information attached in the form of a > fragment identifier. However, "the URI" that results from such a > reference includes only the absolute URI after the fragment > identifier (if any) is removed and after any relative URI is resolved > to its absolute form. Although it is possible to limit the > discussion of URI syntax and semantics to that of the absolute > result, most usage of URI is within general URI references, and it is > impossible to obtain the URI from such a reference without also > parsing the fragment and resolving the relative form. > .... > 4.1. Fragment Identifier > > When a URI reference is used to perform a retrieval action on the > identified resource, the optional fragment identifier, separated from > the URI by a crosshatch ("#") character, consists of additional > reference information to be interpreted by the user agent after the > retrieval action has been successfully completed. As such, it is not > part of a URI, but is often used in conjunction with a URI. 
> > fragment = *uric > > The semantics of a fragment identifier is a property of the data > resulting from a retrieval action, regardless of the type of URI used > in the reference. Therefore, the format and interpretation of > fragment identifiers is dependent on the media type [RFC2046] of the > retrieval result. The character restrictions described in Section 2 > > for URI also apply to the fragment in a URI-reference. Individual > media types may define additional restrictions or structure within > the fragment for specifying different types of "partial views" that > can be identified within that media type. > > A fragment identifier is only meaningful when a URI reference is > intended for retrieval and the result of that retrieval is a document > for which the identified fragment is consistently defined. > > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Thu Feb 9 11:15:48 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 9 Feb 2006 11:15:48 -0500 Subject: [DAS2] RE: Working with xml:base in Java? In-Reply-To: References: Message-ID: <200602091115.49675.lstein@cshl.edu> The Perl libraries provide a very simple HTTP_Base attribute. As you parse your way through the XML, you can change the HTTP_Base using any of the relative or absolute address resolution modes, so that subsequent URLs are correctly resolved. Unfortunately it is a SAX model, so that you have to push previous bases onto a stack and restore them as needed. 
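
For comparison, the same stack-of-bases bookkeeping can be sketched in
Python with the standard library's event-driven parser. This is only an
illustration under assumptions: the element names FEATURES and FEATURE
and the function name are made up, and it handles only the xml:base
attribute, not redirects or external entities.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree reports the predefined xml: prefix with this namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_ids(xml_text, document_url):
    """Return the absolute URI of every element that has an 'id' attribute."""
    bases = [document_url]          # stack of base URIs, innermost last
    resolved = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # A missing xml:base ("" here) leaves the inherited base unchanged.
            base = urljoin(bases[-1], elem.get(XML_BASE, ""))
            bases.append(base)
            if "id" in elem.attrib:
                resolved.append(urljoin(base, elem.attrib["id"]))
        else:
            bases.pop()             # restore the enclosing element's base
    return resolved


doc = (
    '<FEATURES xml:base="http://das2.sanger.ac.uk/ensembl35/features/">'
    '<FEATURE id="id12345"/>'
    '<FEATURE xml:base="sub/" id="x"/>'
    '</FEATURES>'
)
ids = resolve_ids(
    doc, "http://das2.sanger.ac.uk/ensembl35/features?overlaps=chr1/10000:20000")
```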
Lincoln On Wednesday 08 February 2006 05:12, Helt,Gregg wrote: > > -----Original Message----- > > From: Thomas Down [mailto:td2 at sanger.ac.uk] > > Sent: Wednesday, February 08, 2006 12:45 AM > > To: Helt,Gregg > > Cc: DAS/2 > > Subject: Re: Working with xml:base in Java? > > > > On 7 Feb 2006, at 19:00, Helt,Gregg wrote: > > > Thomas, I'm wondering what toolkits you're using for binding XML > > > to Java objects? And particularly how you are dealing with > > resolving > > > > URIs when xml:base is used. So far I've mostly used various > > > implementations of SAX and DOM -- I've found some reports of builtin > > > xml:base support in Xerces SAX/DOM, but it's still unclear. > > > > > > I've been avoiding the issue up till now. It won't be too hard > > > to implement URI resolution relative to xml:base, but I thought I'd > > > check around first and see if there's automated support of this in > > > some > > > toolkit. > > > > Hi Greg, > > > > I'm actually using Stax (the streaming API for XML). The > > implementation I use is called Woodstox: > > > > http://woodstox.codehaus.org/ > > I would like to check out Stax, haven't used it before. > > > (but there are a few others out there). No builtin xml:base support > > but it's easy to write a little wrapper around XMLStreamReader to > > spot xml:base attributes and maintain a stack of base URIs. > > > > I'm using java.net.URI to do the URI handling/resolution/ > > relativization. Seems to be working okay... so far... > > That's what I was thinking about when I said it wouldn't be too hard to > implement... But that was yesterday. A long time ago. > > Now I've taken a detour into re-reading the XML Base spec > http://www.w3.org/TR/xmlbase/, and things don't seem so easy. > > I _think_ if there's at least one xml:base attribute in the element > hierarchy above where you're trying to determine a base URI, and > resolution of those xml:base attributes yields an absolute URI, it's all > good, that's the base URI. 
But on the other hand if this resolution > yields a relative URI instead of an absolute URI I'm not sure what > happens -- I would guess it's an error, but I can't see anywhere in the > XML Base spec that spells this out. And if there's no xml:base to use > to determine a base URI, things get weird: > if the document is "encapsulated within another entity", the base URI > is the URI of that entity (I have no idea if DAS/2 docs could appear in > such a context) > otherwise the base URI is the URI used to retrieve the document > oh, except if you burrow down into the spec pointers to RFC 2396 > http://www.ietf.org/rfc/rfc2396.txt, if the request gets redirected you > need to make sure the base URI is the last URI used in the redirect > oh yeah, and apparently external entity declarations can affect all > of this in ways I don't understand > and there's probably other gotchas I've missed... > > Now from the server side, none of this is really an issue. Just pick > from a multitude of variants that XML Base allows when you send > responses to the client. From the client side, if we really want DAS/2 > to support XML Base (and I think we do), things get tricky. It's > definitely pushing me towards using libraries that provide builtin > support for XML Base. > > Gregg > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From dalke at dalkescientific.com Thu Feb 9 11:37:12 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 9 Feb 2006 16:37:12 +0000 Subject: ids and URLs (was Re: [DAS2] Ontologies in DAS/2) In-Reply-To: <94bafd156da54842f9093244ca6083d1@fruitfly.org> References: <43E9DC0E.30809@mrc-lmb.cam.ac.uk> <701ca3642e6ed0ea3e12fbd26991a3fb@dalkescientific.com> <94bafd156da54842f9093244ca6083d1@fruitfly.org> Message-ID: [Top-posting summary] I agree with Chris that the DAS "type"s aren't really types. Chris Mungall: > I'm mostly skim the messages here, so I may be missing something, but > I'm a little confused by this: > > On Feb 8, 2006, at 8:36 AM, Andrew Dalke wrote: > >> >> http://das.server/../types?ontology=SO:exon > > I don't understand this - SO:exon isn't an ontology I made it up; I mean "whatever the SO term is for an exon". I think it's SO:0005845 ("single_exon") or SO:0000147 ("exon") >> PROPOSAL: Add a "source=" (case-insensitive substring search) >> field to the types query. (I don't think there is any contention >> here so I'll add it.) >> >> http://das.server/../types?ontology=SO:exon;source=Vega > > What does 'types' return? A type from an ontology (eg SO:exon) or > something else? Why would source be recorded here? Surely source would > be a valid constraint on a feature query, but not a type query. A DAS type is a somewhat strange thing, in the type sense. It stores: - the link to the ontology - a list of the formats available for features of that type - this "source" field - potentially some per-source data used for depiction, or perhaps not Thomas Down here has this use case. He has a program which searches for exons. All of the annotations it makes for a month are from that program. 
He wants them to be the same type - conceptually "the exons predicted by the program". Some of that data could be moved into the feature. The feature can point directly to the ontology, and have a "source". > Perhaps it's the case that in DAS a 'type' means some kind of > arbitrary grouping (eg features of type X and source Y), and > 'ontology' means a > term/type from an ontology. If it isn't too late I'd suggest changing > these conventions. That is more like the case. Got a better name. "class"? ROFL. Or not. It is not a type system. It is closer to a group than anything else. I agree that "type" has connotations which are not true for this case. Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Thu Feb 9 11:40:34 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 9 Feb 2006 08:40:34 -0800 Subject: [DAS2] Why use URIs for feature IDs? Message-ID: Interesting, I hadn't fully absorbed part 4 of the URI spec (rfc2396). So if I understand correctly: If we replace everywhere we've called something a "URI" with "URI reference" we're being correct -- a URI reference can be an absolute or relative URI, and can also include a fragment identifier. And according to the spec saying "the URI" means the absolute URI, not the relative URI. So to restate, I think the ids we use in DAS/2 should be URI references. Maybe instead of "id" or "uri" we should use "uri_ref" for the attribute name? I still see no reason to exclude URI references with fragment identifiers, though I agree with Lincoln that actually resolving a URL with a fragment is problematic. But we're not guaranteeing that these URI references are URLs anyway. The capabilities "query_id" attributes are another story. These need to be not just URI references but also resolve via XML-Base to full URLs. 
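
To make the RFC 2396 wording concrete, here is a small Python sketch
that takes a URI reference and returns "the URI" (absolute, fragment
removed) together with the fragment. The function name is hypothetical;
the URLs are the ones already used in this thread.

```python
from urllib.parse import urldefrag, urljoin


def split_uri_reference(uri_reference, base):
    """Resolve a URI reference against base and separate out the fragment."""
    absolute = urljoin(base, uri_reference)   # relative -> absolute
    uri, fragment = urldefrag(absolute)       # drop "#..." if present
    return uri, fragment


with_fragment = split_uri_reference(
    "features#id12345", "http://das2.sanger.ac.uk/ensembl35/")
plain = split_uri_reference(
    "id12345", "http://das2.sanger.ac.uk/ensembl35/features/")
```

In the first case the fragment is handed back separately and never
reaches a server; in the second the whole reference resolves to a URL
that could be fetched.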
gregg > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Andrew Dalke > Sent: Thursday, February 09, 2006 7:43 AM > To: DAS/2 > Subject: Re: [DAS2] Why use URIs for feature IDs? > > Gregg > > As I understand it a fragment identifier approach would mean > > URI = (URL of doc feature XML is embedded in) + "#" + value of > > feature's > > ID attribute. > > As I understand it the part after the '#' is a query language > which is document type specific and used by the client. DAS does not > define how that query language is used, so it has no meaning in the > DAS world. > > http://www.ietf.org/rfc/rfc2396.txt > > 4. URI References > > The term "URI-reference" is used here to denote the common usage of a > resource identifier. A URI reference may be absolute or relative, > and may have additional information attached in the form of a > fragment identifier. However, "the URI" that results from such a > reference includes only the absolute URI after the fragment > identifier (if any) is removed and after any relative URI is resolved > to its absolute form. Although it is possible to limit the > discussion of URI syntax and semantics to that of the absolute > result, most usage of URI is within general URI references, and it is > impossible to obtain the URI from such a reference without also > parsing the fragment and resolving the relative form. > .... > 4.1. Fragment Identifier > > When a URI reference is used to perform a retrieval action on the > identified resource, the optional fragment identifier, separated from > the URI by a crosshatch ("#") character, consists of additional > reference information to be interpreted by the user agent after the > retrieval action has been successfully completed. As such, it is not > part of a URI, but is often used in conjunction with a URI. 
> > fragment = *uric > > The semantics of a fragment identifier is a property of the data > resulting from a retrieval action, regardless of the type of URI used > in the reference. Therefore, the format and interpretation of > fragment identifiers is dependent on the media type [RFC2046] of the > retrieval result. The character restrictions described in Section 2 > > for URI also apply to the fragment in a URI-reference. Individual > media types may define additional restrictions or structure within > the fragment for specifying different types of "partial views" that > can be identified within that media type. > > A fragment identifier is only meaningful when a URI reference is > intended for retrieval and the result of that retrieval is a document > for which the identified fragment is consistently defined. > > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Gregg_Helt at affymetrix.com Thu Feb 9 11:57:02 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 9 Feb 2006 08:57:02 -0800 Subject: [DAS2] Proposed agenda for DAS/2 code sprint teleconference, Feb 9 Message-ID: ids for features, sequences, types, etc. stylesheets writeback update to NIH grant proposal status report Anything else we should add? From dalke at dalkescientific.com Thu Feb 9 13:28:48 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 9 Feb 2006 18:28:48 +0000 Subject: [DAS2] arbitrary data in writeback Message-ID: The DAS spec for features looks something like this ... iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFAQAAAABUH0DFAAAANUlEQVR4nGIM1AcAAAD// 2LiYgAA AAD//2I6wAAAAAD//2JycAAAAAD//2L6dgAAAAD//wMAEoUDipTjFscAAAAASUVORK5CYII= ... .. There are two points for extension. One is the PROP table which is meant to be simple. 
Clients can do substring searches of PROP elements with "value"s, as in

   prop-name=blah+blah

All clients should be able to understand these data formats, though
there is no constraint for the key names. They are convention only.

Right now a key gets either a string, a URL, or a chunk of binary data
which is uuencoded. (The key can be present many times; is that a
problem with Apollo?)

The latter two (URL and binary data) are *proposals*. They are neat,
but not based on user demand. No one has told me that they will use
them.

Allen wants one more possibility, "existence", with no associated value
at all. Nomi says that Apollo can't round-trip that data except by also
tracking the input XML. I don't want an "it just exists" field and
would prefer those stored with an empty string.

Then there is the support for non-DAS elements as extensions. These can
contain arbitrary XML, so long as they are not in the DAS XML namespace.

A client can ignore elements it doesn't understand. However, if it
does writeback of a feature it *MUST* include all elements it doesn't
understand. I can write that into the spec.

It doesn't need to do anything with that data. It can keep it around
as a chunk of text. It just needs to send it back to the server when
it does the writeback.

For that matter, it doesn't even need to keep it around. It can throw
the unknown data to the wind and work with the stuff it does know.
Just before doing the writeback, go back to the server and get the
features again. From the documents get the unknown extension elements
and insert them into the data - as text! - to be sent back to the
server.

Clients may mess up and commit records without these elements. The
server will treat those as deletions of those records, because it
cannot tell if the client really knows what to do with that data.

This is the easiest solution for me as a spec writer. We have nearly
all of the format for that transaction, excepting a bit about being
able to delete.
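
A rough sketch of the "re-fetch and re-insert" approach described
above, in Python. The namespace URIs, element names and function names
are placeholders invented for the illustration; the only rule it relies
on is that extension elements live outside the DAS XML namespace.

```python
import xml.etree.ElementTree as ET

DAS_NS = "http://example.org/das2"  # placeholder, not the real DAS namespace


def unknown_extensions(feature):
    """Child elements of a feature that are not in the DAS namespace."""
    das_prefix = "{" + DAS_NS + "}"
    return [child for child in feature
            if not child.tag.startswith(das_prefix)]


def merge_for_writeback(edited, fresh_from_server):
    """Copy extension elements the client ignored back into its edited record."""
    for extension in unknown_extensions(fresh_from_server):
        edited.append(extension)
    return edited


# The server's copy carries a curation-history element the client ignores.
fresh = ET.fromstring(
    '<FEATURE xmlns="http://example.org/das2" '
    'xmlns:x="http://example.org/other" id="f1">'
    '<PROP key="a" value="1"/>'
    '<x:curation-history>old</x:curation-history>'
    '</FEATURE>')
# The client's edited copy contains only what it understood and changed.
edited = ET.fromstring(
    '<FEATURE xmlns="http://example.org/das2" id="f1">'
    '<PROP key="a" value="2"/>'
    '</FEATURE>')
merged = merge_for_writeback(edited, fresh)
```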
NOTE: a server may ignore the uploaded data. For example, it may modify the transaction history and throw out whatever the client sent to it -- if that's how the element is specified. The other solution is to be more fine grained, so that clients send deltas, like ... iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFAQAAAABUH0DFAAAANUlEQVR4nGIM1AcAAAD// 2LiYgAA AAD//2I6wAAAAAD//2JycAAAAAD//2L6dgAAAAD//wMAEoUDipTjFscAAAAASUVORK5CYII= .. but that gets complex. You end up with a grammar for the deltas. Eg, "delete the first 'some_non_das_namespace:curation-history' but not the others". It's a harder grammar to write and a harder semantic to implement on client and server. I don't understand the case where complete writeback is a problem. There was the mention of if a client deletes a feature when it shouldn't have because of extra data that it just didn't know about. I didn't follow that at all. Please enlighten me! :) Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Thu Feb 9 14:06:03 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Thu, 09 Feb 2006 11:06:03 -0800 Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint, 9 Feb 2006 Message-ID: Notes from the DAS/2 teleconference for the code sprint, 9 Feb 2006 $Id: das2-teleconf-2006-02-09.txt,v 1.1 2006/02/09 19:13:39 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt CSHL: Lincoln Stein Sanger: Thomas Down, Roy Sweden: Andrew Dalke UC Berkeley: Nomi Harris, Suzi Lewis UCLA: Allen Day, Brian O'connor Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. 
So don't consider these notes as complete minutes from the meeting,
but rather abbreviated, summarized versions of what was discussed.
There may be errors of commission and omission. Participants are
welcome to post comments and/or corrections to these as they see fit.

[note taker missed the first 5-10 minutes]

Topic: encoded URLs
-------------------

ls: apache bug - unescaped //. must be percent encoded or apache can
    run into problems
gh: most people don't bother escaping, we should make this clear in
    the spec. every major library has ways of doing this automatically.

[A] update spec to state: contained urls w/in das query urls should be encoded

Topic: Style sheets
-------------------

ad: see Jan 26/27 email, "style sheet question"
    what i described is not the same as what das/1 style sheets
    supply. we already have a mechanism
gh: embed ss in types element?
ad: or, new capability or link server for a given source.
gh: prefer this
td: easy to have a single style element
gh: would a types elem have ptr to ss or do you query for the
    capability?
ad: if no one's interested we don't have to answer the question.
    sounds like no one's interested in style sheets.
gh: we'll keep what you have in the spec for style sheets and move on.
ls: what is it?
ad: yes. style is embedded in type record. it's now on a per-element basis.
ls: ok with this. attributes of types. is there a need for a separate
    ss? true it mixes presentation with data model. people will look
    for the info they need and can ignore.
ls: transition to separate sheets - visual style id pointing to ss
    url. same as with html. instead of 'i' tag moved to font style info.

Topic: Writeback
----------------

gh: discussion in progress in uk. how big a change from current
    writeback spec?
ad: spec: server does modification to data.
    this proposal: client can now do more stuff with the data.
gh: writeback for client is considerably harder, rarer to impl.
ad: issues: can you still do searches for modified data on server?
ls: building objs from bottom up (children, to parent) so everything has a url.
ad: each feat has parent and a part.
ls: true. temporary id mechanism, response indicates mapping to local id is. what happens is: client locks, uploads parents, children with temp ids, does referential integrity checking, then reports mapping from temp to local id.
gh: doing http DELETE imposes a constraint
ls: how handling id issue?
gh: you need something to create new, real id
ad: b/c they're in one transaction, server can
ls: delete is a problem because http delete only permits one at a time. updates a problem too. post that creates new objs allows you to create multiple new objs at same time, but put and delete only operate one at a time.
ad: at this point don't want to change data model.
ls: so everything will be a post then, under your proposal, for writeback url.
ad: a single post.
gh: moving from http delete to a trying to understand how this is a delta model.
ad: only updates things that changed, and listed deletions
ls: fine. writeback, create update and delete sections
td: granularity. not single characters. one feature.
ls: three transactions we previously had, put, post, and delete, and roll up into a single transaction.
gh: when you send back a feat you've already seen, do you restate all the xml for that feature, since otherwise it is deleted?
ad: yes.
gh: would like the unit of ro
ls: this achieves per transaction integrity, since you don't have to do multiple deletes. the lock idea, had to persist over multiple transactions to allow for that atomicity.
gh: we need to keep lock so curators can guarantee that nothing changes underneath them.
td: lock corresponds to a db transaction as well.
ls: no one's impl this writeback so there's no friction against changing it. i'm fine with it. as long as people don't mind we're losing a cute feature described in a grant.
gh: what does roy or ed g. think?
roy: have been involved in this. this mirrors some features that otter does.
a good idea. deletes and put aren't big winners, if updating multiple feats and they refer to each other.
roy: whole xml doc is the transaction
ls: if anything doesn't make sense, all requests in the writeback doc are rolled back.
roy: yes. some error messages to understand what might be going wrong.
gh: splits and merges work too? merging one feature from two, or splitting one transcript into two.
roy: fits in well. get back two ids of new features. otter gives a lot back in the xml after posting the data.
gh: treats id in feat as a placeholder and it sends a real id back to you.
ls: you're given a temporary placeholder then it gives you real id. might want to put in formal merge and split commands. because in proposed new system (and old) to split one exon to two, you have to either delete the original one, or update it to change one boundary and create a new one. you've lost the ability to keep track of the original and the two new ones.
ad: feats have place for arbitrary annotations. creational history log could be maintained.
ls: how upload this to a server. splitting exon into two daughters is different from deleting and creating two new ones.
ad: no one needs this, for future.
gh: it's needed now.
ls: splitting genes into two pieces is important. people want to keep track of this. formal merges and splits permit this tracking.
gh: my take, prefer as few verbs as possible. if we can formally define splits and merges as combos of deletes and creates, prefer this.
ls: semantically difficult for server to know that a delete followed by two creates is different than a split.
td: ancestor id on the features can solve this.
ad: haven't heard about this use case. features have place where you can stick in new data. database can read it to understand history.
gh: like idea of curational track of ancestors. before, people said we can't require dbs to do this.
td: optional property
ls: could thread it through feature properties.
ad: this version, or for 2.1?
gh: initial write back must support splits and merges. [broad agreement]
ls: make sure it will work. what happens when track of ancestors and the ancestor object disappears.
gh: can't assume a db has identifier for every curation in its past state.
roy: weakness of the current otter schema, james is working on a fix. tag a release and go back to genes as of that release.
ls: acedb had this feature to rollback to older versions of gene model.
aday: the schema we're using has support for previous versions.
roy: tedious. big script, but a good thing to have.
ls: a few hours of more discussion to see what's involved in supporting tracking curational merges, splits, renames, etc. to make sure it's the right decision to put it into a curational property of feature rather than having formal database merge and split operations. i'm ok doing it this way if it seems ok.
gh, aday: me too

Topic: NIH grant proposal
-------------------------
gh: i'm the bottleneck

Status reports:
---------------
gh: igb das client still. checked in code. you can get das2 client in igb pointing to codesprint das2 server. sources, segments, types. no features yet. working on this today. should go faster today.
ad: sent email to allen about some things about server that don't agree with spec. properties
aday: features have no properties associated with them. do we need valtype or href.
nh: a key with no value doesn't make sense. using 'true' if no value.
aday: ok. but need an agreement on what to do for properties with no associated value or type
ad: can make it so.
aday: now put in empty string
ad: use for both value and href
aday: can't have both.
ad: what's interpretation if you have both? can take out href part and have value= empty string
nh: client deals with empty value.
ad: leave it as a string
suzi: uneasy about this.
td: it does have a value, empty string.
suzi: some places where empty string doesn't make sense. data gets dirty.
if you're gonna have a tag-value structure, and there may or may not be a value, it's bad. some things are tag-value, some things just have a value. it seems ambiguous, no guaranteed behavior.
ad: guarantee is for all keys to have a value. can be empty string.
gh: string or empty string is ok
ad: only used for clients who know what it means. may have to update apollo
gh: if we allow arbitrary xml in features, client will have to remember this xml or it will disappear.
ls: a huge issue w/ apollo in past. when communicating w/ db's that have extra stuff, in the xml that isn't on client side data model.
suzi: my take, the client should not have to pass it all through.
nh: it forces client to be a complete database
gh: then the delta writeback
ls: works ok for deletes, updates become an issue
ad: you have to deal with text you don't understand.
ls: you have to keep track of tags you don't understand, otherwise they are deleted.
gh: trade off, simplicity of writeback, and what client has to remember.
ls: client says: i don't understand it, but i can't delete it.
gh: how hard is it to have an arbitrary xml chunk by client?
ls: give it an empty tag to say you want it to go away.
nh: how do you delete things that came in empty and you want to delete them?
ls: can have attribute="delete me". this creates a burden on server side. [client folks like this..] decided to keep everything you know know and send it back. round trip it.
ad: client can throw away what it wants. can go back to server
ls: boomerang.
gh: a variety of ways to make sure the data gets stored.
roy: will be in feature. just hold a pointer to it.
suzi: hard for apollo. passive round tripping is fine.. difficulty is with deletes. ignoring stuff, don't know what it is. delete a transcript or whole gene. some of that stuff you don't know what it is, describes a mutant phenotype. you deleted from genomic record, but there's other data that shouldn't be deleted.
client would have to be fully cognizant of it, beyond genome sequence features. client now needs to model all the other data too.
ls: difficult to understand how a client could deal with it.
ad: xml is just an opaque chunk. why can't client send back full record?
suzi: won't solve the full problem. if annotator said delete it
gh: client says delete that feature. it won't pass back any stuff underneath the feature. some stuff underneath it that shouldn't be deleted.
ad: that's what you have backups for.
suzi: beyond this. to deal with this, we made deletes be more atomic. had to be handled at server side, otherwise, we have to put all that knowledge into client. gets tied to a particular group.
ad: knowledge of what?
suzi: additional information if you delete whole thing at top, any pass through data is also gone.
gh: not hard on client, just what does the server do with that?
suzi: this is why it belongs on server side. knows what matters and what doesn't matter. if you don't want clients tied to a particular db. that solution will be inadequate. we had to put the info on the client and make the operations as fine grained as we could.
ap: writeback issues have been discussed. suggest to take this up tomorrow.
ad: could someone write up why a client couldn't just track the things that it wanted? then we can consider.

Status reports, cont'd
----------------------
roy: zmap client. can get sources and types from server. parsing it creating internal objects. can't draw features yet. long discussion about write back today.
ad: validator stuff
td: talking about writeback.
ap: working on registry. first das/2 server. distinguish between das/1 and das/2 via accession points.
brian: rpm build for allen's server. will post today at biopackages.net
suzi: spoke to chris about web services for ontology. he will talk with allen. thing about ids to deal with. also, if we do a web service that isn't das like, it should be doable. should be able to get the terms.
also, if we want to have stop codon replacement, you also have to say what position, what it's replaced with (uridine). how is this done in das spec? gh: can you post to the list? suzi: yes. aday: will raise writeback issues as well. suzi: small point mutations, indel, substitution (base and position) aday: nearly got apache config file done, impl new std error documents, 300, with error document. nh: more apollo client progress. haven't dealt with types yet. ee: igb improvements. sc: pipeline for populating affy das server with array data. completed pipeline for exon array design data. From nomi at fruitfly.org Thu Feb 9 15:08:33 2006 From: nomi at fruitfly.org (Nomi Harris) Date: Thu, 9 Feb 2006 12:08:33 -0800 (PST) Subject: [DAS2] unary properties In-Reply-To: References: Message-ID: <17387.41281.765157.17683@kinked.lbl.gov> On 9 February 2006, Andrew Dalke wrote: > Allen wants one more possibility, "existence", with no associated > value at all. Nomi says that Apollo can't round-trip that data > except by also tracking the input XML. I don't want a "it just > exists" field and would prefer those stored with an empty string. fwiw, the empty string (rather than no string) doesn't help apollo--the way it stores properties, if you ask for the value of property "foo" and there's no "foo" in the property table, you get back "" (this was to avoid having to put a million null-pointer checks). so apollo would not be able to differentiate--for purposes of writeback OR display without apollo--between and internally, both of these would look like "i don't know anything about property foo," unless i saved them as "foo=true" when they were read in, and then how would it know how to write them out correctly? i would suggest that either 1. we use two different terms to differentiate between key/value properties and properties that are valueless (though really i think they are *keyless* rather than valueless). perhaps the latter could be called "attributes" or something? 
(actually, ATTRIBUTE is probably a bad choice since it has a meaning in xml, but you get the idea.) OR (and i prefer this): 2. every property is required to have a key and either a value or an href. the valueless (or keyless) properties in the yeast data look like i guess these are like the default cases where other features might (although i haven't seen any of these) have properties like but where did "property/molecular_function unknown" come from in the first place? what i think it should look like is and then we avoid the whole keyless-property issue and make the information more accessible to clients (and hence to users). the way it is now, it's an uninterpretable blob of text (really more of a comment than a property), where as separating into key/value suddenly gives it more meaning. Nomi From Gregg_Helt at affymetrix.com Thu Feb 9 15:05:14 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 9 Feb 2006 12:05:14 -0800 Subject: [DAS2] unary properties Message-ID: Looks to me like these might be GO terms, which should probably be represented more like: and possibly include an href to a description of that GO term. Of course one could argue whether the attribute values should be URI references rather than arbitrary strings, but you get the idea. gregg > -----Original Message----- > From: Nomi Harris [mailto:nomi at fruitfly.org] > Sent: Thursday, February 09, 2006 12:56 PM > To: Andrew Dalke; allenday at ucla.edu > Cc: nomi at fruitfly.org; Helt,Gregg > Subject: Re: [DAS2] unary properties > > On 9 February 2006, Nomi Harris wrote: > > the valueless (or keyless) properties in the yeast data look like > > > > i just looked at another region and found some more interesting valuless > (though i think they should be called keyless) properties: > > > > href=""/> > > these really seem to me to be missing important information. "nucleous"? > we're going to randomly mention cell parts? what this really should say > is > > right? 
> > so i think this is buggy data--it is missing the keys, and that should be > fixed. in fact, i think having the spec insist that properties have both > key and value would help to catch errors like this. > > Nomi From Gregg_Helt at affymetrix.com Thu Feb 9 18:18:42 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 9 Feb 2006 15:18:42 -0800 Subject: [DAS2] Refinements to range attribute and query filters in spec Message-ID: In the latest spec, the format for range queries is seqid/min:max:strand and the format for range attributes in feature elements is min:max:strand In the earlier spec (http://biodas.org/documents/das2/das2_get.html#ranges) everything but the seqid component of the range query was optional. Are min and max still optional, as in these examples from the previous version of the spec? Chr1/1000 Chr1 beginning at position 1000 and going to the end. Chr1/:2000 Chr1 from the start to position 2000. I personally find these kind of ranges confusing and not particularly useful, and would rather make min and max required for both the range attribute and range-based query filters. Also, the latest spec states: A region may be on the forward or reverse strand or on both strands. These are respectively denoted 1, -1 and 0. The reverse strand is the reverse complement of the forward strand. Unspecified strand means forward strand. So for a features query, are the four overlap filters below equivalent? Chr1/1000:2000 Chr1/1000:2000:1 Chr1/1000:2000:-1 Chr1/1000:2000:0 Or does the addition of strand information further filter the returned features by strand? But if that's the case, then according to the spec having no strand specified means forward. So that would mean overlaps="Chr1/1000:2000" would only return forward strand annotations, and not any on the reverse strand? To me that's counterintuitive, from a filtering perspective I'd rather no strand info mean "both strands". 
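A minimal sketch of how the range syntax discussed above (seqid/min:max:strand) could be parsed, with the meaning of an unspecified strand kept as an explicit switch because the thread has not settled it. The names `Range`, `parse_range`, and `strand_matches` are hypothetical and not part of the DAS/2 spec; letting a strand-0 feature pass any strand filter is likewise an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    seqid: str
    start: Optional[int]   # None when min is omitted, as in "Chr1/:2000"
    end: Optional[int]     # None when max is omitted, as in "Chr1/1000"
    strand: Optional[int]  # 1, -1, 0, or None when no strand is given


def parse_range(text: str) -> Range:
    """Parse 'seqid/min:max:strand'; every part after seqid is optional."""
    seqid, _, span = text.partition("/")
    if not seqid:
        raise ValueError("range needs a sequence id: %r" % text)
    parts = span.split(":") if span else []
    if len(parts) > 3:
        raise ValueError("too many ':' fields in %r" % text)
    parts += [""] * (3 - len(parts))
    lo, hi, strand = parts
    strand_val = int(strand) if strand else None
    if strand_val not in (None, 1, -1, 0):
        raise ValueError("strand must be 1, -1 or 0 in %r" % text)
    return Range(seqid,
                 int(lo) if lo else None,
                 int(hi) if hi else None,
                 strand_val)


def strand_matches(feature_strand: int,
                   query_strand: Optional[int],
                   unspecified_means_both: bool) -> bool:
    """Apply a strand filter under one of the two readings in the thread."""
    if query_strand is None:
        # Reading 1: no strand means both strands (0).
        # Reading 2: no strand means forward strand (1), as the spec text says.
        query_strand = 0 if unspecified_means_both else 1
    if query_strand == 0:
        return True
    # Assumption: a feature annotated on both strands (0) passes any filter.
    return feature_strand in (0, query_strand)
```

Under the "forward" reading, `strand_matches(-1, None, False)` is false, which is the counterintuitive result described above: a plain overlaps filter would drop reverse-strand annotations.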
My main point though is we need to be explicit about how strand info or lack thereof affects features queries with range-based filters. gregg From suzi at fruitfly.org Thu Feb 9 19:29:57 2006 From: suzi at fruitfly.org (Suzanna Lewis) Date: Thu, 9 Feb 2006 16:29:57 -0800 Subject: [DAS2] question or two In-Reply-To: References: Message-ID: <54bc0e433303827918fe475855669a89@fruitfly.org> if an annotator wants to indicate a stop-codon-readthrough (which may or may not be a seleno-cysteine mechanism). how would DAS send this info through? need SO type (the readthrough), the location (relative to transcript or genome), and the mechanism. tRNA anticodon or AA? alternative translation table? infer this from organism? -S From Gregg_Helt at affymetrix.com Thu Feb 9 20:43:16 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 9 Feb 2006 17:43:16 -0800 Subject: [DAS2] feature NOTE and ALIAS elements? Message-ID: > -----Original Message----- > From: das2-bounces at portal.open-bio.org > [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke > Sent: Tuesday, February 07, 2006 7:45 AM > To: DAS/2 > Subject: Re: [DAS2] properties and queries > > > To summarize, the current thought here for properties and > queries is as follows (it's a long summary. More like an essay. :) > > Add support for zero or more elements in the feature, > of the form > This is some arbitrary (but non-markup-ed) text > > > Add a features search keyword "note=" which takes a search > string to be found in the note elements. (substring? > soundex? regex? the search engine calls up Lincoln and asks?) > > > Add support for zero or more elements in the feature, > of the form > > > (I missed this in the redraft. It should have been there. > Feature filter "name" already says it searches the "name" and > "alias" fields for a feature.) Is the plan still as stated above, to have optional NOTE and ALIAS elements in features? 
I don't see these elements in the feature schema, and the spec doc says they're built-in properties instead (values for PROP key attribute that have defined meaning). Gregg From td2 at sanger.ac.uk Fri Feb 10 03:54:16 2006 From: td2 at sanger.ac.uk (Thomas Down) Date: Fri, 10 Feb 2006 08:54:16 +0000 Subject: [DAS2] Refinements to range attribute and query filters in spec In-Reply-To: References: Message-ID: <4A9E3BE1-9E24-4D25-AAD1-1851F18857D0@sanger.ac.uk> On 9 Feb 2006, at 23:18, Helt,Gregg wrote: > > In the latest spec, the format for range queries is > seqid/min:max:strand > and the format for range attributes in feature elements is > min:max:strand > > In the earlier spec > (http://biodas.org/documents/das2/das2_get.html#ranges) everything but > the seqid component of the range query was optional. Are min and max > still optional, as in these examples from the previous version of the > spec? > Chr1/1000 Chr1 beginning at position 1000 and going to the > end. > Chr1/:2000 Chr1 from the start to position 2000. > I personally find these kind of ranges confusing and not particularly > useful, and would rather make min and max required for both the range > attribute and range-based query filters. I think it's reasonable for a client to want to fetch all features attached to a given sequence ID. This would certainly be sensible behaviour for clients which always work on reasonably short sequences (e.g. protein-specialized clients), but even genome-centric clients might want to do this when they've had a hint that a particular feature type is "low density" (e.g. chromosome banding patterns?). I'm not sure if anyone would want to query a range where only one of min and max are specified. > Also, the latest spec states: > > A region may be on the forward or reverse strand or on both strands. > These are respectively denoted 1, -1 and 0. The reverse strand is the > reverse complement of the forward strand. Unspecified strand means > forward strand. 
> > So for a features query, are the four overlap filters below > equivalent? > Chr1/1000:2000 > Chr1/1000:2000:1 > Chr1/1000:2000:-1 > Chr1/1000:2000:0 > Or does the addition of strand information further filter the returned > features by strand? But if that's the case, then according to the > spec > having no strand specified means forward. So that would mean > overlaps="Chr1/1000:2000" would only return forward strand > annotations, > and not any on the reverse strand? To me that's counterintuitive, > from > a filtering perspective I'd rather no strand info mean "both strands". > My main point though is we need to be explicit about how strand > info or > lack thereof affects features queries with range-based filters. Hmmm, I'd been interpreting Chr1/1000:2000 as "return features on both strands", but from the paragraph you quote I guess this is wrong. I'd be happy to see this changes to "Unspecified strand means both strands". Thomas. From dalke at dalkescientific.com Fri Feb 10 05:47:26 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 10 Feb 2006 10:47:26 +0000 Subject: [DAS2] Refinements to range attribute and query filters in spec In-Reply-To: References: Message-ID: Gregg: > In the latest spec, the format for range queries is > seqid/min:max:strand > and the format for range attributes in feature elements is > min:max:strand > I personally find these kind of ranges confusing and not particularly > useful, and would rather make min and max required for both the range > attribute and range-based query filters. Agreed on this side. All clients can easily get the upper limit, and the lower limit is always 0. > My main point though is we need to be explicit about how strand info or > lack thereof affects features queries with range-based filters. It was a confusion on my part. There are three places which refer to location + strand. 1. specifying a feature location 2. fetching a sequence 3. doing a range search "1. 
specifying a feature location"

We've been talking here about limiting the use of strands for these. Features definitely need a strand. If the strand is not specified then the feature is on both strands, or has no meaning. If needed, resolve the ambiguity by looking at the type (or other property). If you really, really want to specify that it's on both strands then use the 0.

The location element currently looks like this

Given the decision yesterday that segments are special, in terms of identification, I propose using the short id, so these look like, respectively

"2. fetching a sequence"

Why does the server need to support a reverse complement feature? Let's leave it out and make the client do a string reversal if it needs it.

"3. doing a range search"

Is there any reason to specify the strandedness when doing a feature query? Discussion here seems to be "would be nice but that lack is one of the things people have never complained about in DAS1".

I propose removing strandedness from the features query. If others disagree then here are two solutions:

A. have a "strand=" parameter, so that the strandedness is different from the ranges. If you want a query for the union of range Chr1/A:B:-1 and range Chr1/X:Y:1 then tough - make two requests, one for each strand.

B. ranges may specify the strand (as now) but if not specified then it means "of any strand".

We worked on a few cases where it might be useful to make mixed strand queries. There weren't any compelling reasons. Even the worst case without strand support in the features query is that you get on average twice the number of features back, and the worst case for option A is the need to make two queries.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Fri Feb 10 05:48:18 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 10:48:18 +0000
Subject: [DAS2] Re: feature NOTE and ALIAS elements?
In-Reply-To:
References:
Message-ID:

Gregg:
> Is the plan still as stated above, to have optional NOTE and ALIAS
> elements in features? I don't see these elements in the feature
> schema,
> and the spec doc says they're built-in properties instead (values for
> PROP key attribute that have defined meaning).

Yes. I haven't updated the spec other than a few minor points in the last couple of days.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Fri Feb 10 10:04:45 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 15:04:45 +0000
Subject: [DAS2] 'OR' syntax in query language
Message-ID: <8593bb5041e0d054840da98c200d3e03@dalkescientific.com>

We talked a bit about the DAS query language. It is currently of the form (modulo URL escaping)

  name=Andrew,Roy;inside=Chr/100:200

This is the same as

  ( name contains the substring "Andrew" OR name contains the substring "Roy" )
  AND
  ( feature is inside 100:200 on the segment named 'Chr' )

That is, there is an AND of all terms, and a single term may have multiple OR-ed subqueries, merged by commas.

We want to change this to the form

  name=Andrew;name=Roy;inside

That is, the query key can exist more than once. Queries with the same key are 'OR'ed, elsewise they are 'AND'ed.

The advantage is the simplicity of not having to worry about another quoting rule, in this case how to search for terms containing a ",". The only disadvantage is with servers which don't handle multiple keys in a query - but we think those client libraries are long since deceased.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Fri Feb 10 10:15:05 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 15:15:05 +0000
Subject: [DAS2] range searches
Message-ID: <80684d437a99822fd017cceee83b02b4@dalkescientific.com>

I think Gregg has thought the most about this one.
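For reference, the repeated-key rule proposed in the 'OR' syntax message above (same key is OR'ed, different keys are AND'ed) can be sketched as follows. This is only an illustration under stated assumptions: `parse_query`, `matches`, `PREDICATES`, and the dictionary layout of a feature are hypothetical names, the two predicates are simplified stand-ins for the real "name" and "inside" filters, and the full query string in the test uses the same inside term as the comma-form example.

```python
from typing import Any, Callable, Dict, List
from urllib.parse import unquote

Feature = Dict[str, Any]


def parse_query(query: str) -> Dict[str, List[str]]:
    """Group the values of a ';'-separated query string by key."""
    terms: Dict[str, List[str]] = {}
    for pair in query.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        terms.setdefault(unquote(key), []).append(unquote(value))
    return terms


def name_contains(feature: Feature, value: str) -> bool:
    # Simplified "name" filter: plain substring match.
    return value in feature["name"]


def is_inside(feature: Feature, value: str) -> bool:
    # Simplified "inside" filter: value looks like "Chr/100:200".
    segment, _, span = value.partition("/")
    lo, hi = (int(x) for x in span.split(":"))
    return (feature["segment"] == segment
            and lo <= feature["start"] and feature["end"] <= hi)


# Hypothetical lookup table from query key to predicate.
PREDICATES: Dict[str, Callable[[Feature, str], bool]] = {
    "name": name_contains,
    "inside": is_inside,
}


def matches(feature: Feature, terms: Dict[str, List[str]]) -> bool:
    """AND across distinct keys, OR across repeated values of one key."""
    return all(
        any(PREDICATES[key](feature, value) for value in values)
        for key, values in terms.items()
    )
```

Because values are split only on ";" and "=", a search term containing "," needs no extra quoting rule, which is the advantage the proposal names.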
We have 4 classes of range search: 'inside' (feature completely inside request range) 'overlaps' (feature overlaps the request range) 'contains' (feature completely contains request range) 'identical' (feature is exactly the request range) They exist for smart clients which want to limit the region request size based on previously fetched knowledge. Example: client is viewing "500:600" and zooms out to "400:700". In that case the client could ask for features which overlap 400:500 OR overlap 600:700 excluding those which overlap 500:600. If that's the case, the selection language isn't powerful enough. There's no way to choose "excluding". The other option is to issue only the overlap queries. Does the query language need to be more powerful to allow "excluding what I know about these regions" for people like Gregg? Another question came up; are queries like overlap 400:500 OR inside 900:1000 useful? I don't think so. If it is, it is not supported by the current language which only does AND of dissimilar terms. Andrew dalke at dalkescientific.com From ap3 at sanger.ac.uk Fri Feb 10 10:21:25 2006 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Fri, 10 Feb 2006 15:21:25 +0000 Subject: [DAS2] registry status Message-ID: <2fa320fbca91abfa9f175b64d0d8105c@sanger.ac.uk> Hi! the developmental registry has been updated: it now supports 2 requests: http://www.spice-3d.org/dasregistry/das2/sources lists das2 servers http://www.spice-3d.org/dasregistry/das1/sources lists das1 servers. 
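For reference, the four range-search classes from the range searches message above can be written as interval predicates, together with the zoom-out case (viewing 500:600, then 400:700) done as a client-side filter. This is a sketch under assumptions: coordinates are treated as half-open intervals, the function names simply mirror the filter names, and `newly_visible` is a hypothetical helper, not something the query language offers.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), assumed half-open


def inside(f: Interval, q: Interval) -> bool:
    """Feature lies completely inside the request range."""
    return q[0] <= f[0] and f[1] <= q[1]


def overlaps(f: Interval, q: Interval) -> bool:
    """Feature shares at least one position with the request range."""
    return f[0] < q[1] and q[0] < f[1]


def contains(f: Interval, q: Interval) -> bool:
    """Feature completely contains the request range."""
    return f[0] <= q[0] and q[1] <= f[1]


def identical(f: Interval, q: Interval) -> bool:
    """Feature is exactly the request range."""
    return f == q


def newly_visible(features: List[Interval],
                  old_view: Interval,
                  new_view: Interval) -> List[Interval]:
    """Features overlapping the new flanks but not the old view.

    This is the "excluding" case that the query language cannot express,
    so it is applied here after fetching.
    """
    left = (new_view[0], old_view[0])
    right = (old_view[1], new_view[1])
    return [
        f for f in features
        if (overlaps(f, left) or overlaps(f, right))
        and not overlaps(f, old_view)
    ]
```

A client that issues only the two overlap queries for the flanks would receive features spanning the old view a second time and could drop them with the last condition.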
The next step will be to provide user upload of das2 sources

Andreas

-----------------------------------------------------------------------

Andreas Prlic Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891

From dalke at dalkescientific.com Fri Feb 10 10:49:10 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 15:49:10 +0000
Subject: [DAS2] curation history and splits&merges
Message-ID: <4712c640d908a4e9aed099dd1ef1398b@dalkescientific.com>

We talked some on tracking curation history. We decided it was a hard topic and we would defer further discussion to the next sprint. We're getting rather frazzled here after nearly 5 days of hard work.

Here are some things that came up.

The writeback delta needs a field for user comments.

How persistent is an identifier for an object? Is it for the exact version of a feature or is it for the concept of the given feature? That is, if there's a feature change the server could assign it a new id/url. It would need to tell the annotation about the new id, just like it tells the client about the newly created ids.

This makes updates more like a changeset version control system, where there is a version number for each stable data set. Compare to CVS where there is a version number for each file/record but not for the whole system.

But the current Otter database is more the CVS route. While the changeset version seems nicer, there will be some (I assume non-trivial) work to make Otter support it.

There are advantages. You could do searches with timewarps by using a "changeset=" parameter in the query. The DAS mechanism handles that just fine, since interlinks between no-longer current URLs would be correct.

There needs to be a way to get the history of an element. There are two thoughts:
 - put the curation history in the feature document (via some embedded XML)
 - link to a URL which provides the curational history document for the given element

We prefer the latter.
For splits and merges there needs to be support in the delta to say if there is a relationship to existing or about to be deleted features. We did not work on that, other than to get a feel that it works. Again, no server handles this so we decided it table it for the future, and work on it more for the next sprint. Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Fri Feb 10 11:36:49 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Fri, 10 Feb 2006 08:36:49 -0800 Subject: [DAS2] IGB DAS/2 client partially working -- and using registry! Message-ID: Attached is a screenshot of IGB with data from a yeast test region (chrVII, ~364-366kb) loaded from Allen's codesprint server by way of Andreas' DAS/2 registry. Still need to work on synchronizing up source names, etc., but this is looking good. As we had planned, having the registry return a sources document allowed very easy integration! You may notice there is also a branch of the sources tree that is a direct path to the codesprint server. That just means I gave the discovery engine two URLs to start from -- the registry and the codesprint server. This is the same version of IGB as the current head of the CVS repository (as of today 8:30 AM PST). I'm tempted to roll up a jar so people can try it without having to compile the source, but on the other hand it's pretty fragile right now, and the image conveys the gist of it. gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Andreas Prlic > Sent: Friday, February 10, 2006 7:21 AM > To: DAS/2 > Subject: [DAS2] registry status > > Hi! > > the developmental registry has been updated: > it now supports 2 requests: > > http://www.spice-3d.org/dasregistry/das2/sources > lists das2 servers > > http://www.spice-3d.org/dasregistry/das1/sources > lists das1 servers. 
> > The next step will be to provide user upload of das2 sources > > Andreas > > > > > ----------------------------------------------------------------------- > > Andreas Prlic Wellcome Trust Sanger Institute > Hinxton, Cambridge CB10 1SA, UK > +44 (0) 1223 49 6891 > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -------------- next part -------------- A non-text attachment was scrubbed... Name: DAS2_in_IGB.JPG Type: image/jpeg Size: 170143 bytes Desc: DAS2_in_IGB.JPG URL: From Gregg_Helt at affymetrix.com Fri Feb 10 12:01:11 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Fri, 10 Feb 2006 09:01:11 -0800 Subject: [DAS2] Proposed agenda for DAS/2 Code Sprint teleconference, Feb 10 Message-ID: Properties Range-based queries Status reports - summarize overall progress during code sprint Discuss next code sprint - goals, etc. ??? From dalke at dalkescientific.com Fri Feb 10 13:14:47 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 10 Feb 2006 18:14:47 +0000 Subject: [DAS2] changes commited Message-ID: <6425fabe79dc6d27fd3a797b837d32de@dalkescientific.com> removed the href= and type= options in the spec and all examples. changed the url "," syntax for OR'ed terms into multiple "key=value;key=value" terms. changed "att=key:value" into "prop-key=value" Andrew dalke at dalkescientific.com From suzi at fruitfly.org Fri Feb 10 14:48:58 2006 From: suzi at fruitfly.org (Suzanna Lewis) Date: Fri, 10 Feb 2006 11:48:58 -0800 Subject: [DAS2] question on properties In-Reply-To: <4712c640d908a4e9aed099dd1ef1398b@dalkescientific.com> References: <4712c640d908a4e9aed099dd1ef1398b@dalkescientific.com> Message-ID: <26b9a50a02deb2be55150c6a5a47d419@fruitfly.org> You probably know the answer to this Andrew. One of the cases we encountered was unique properties vs cumulative properties. For a simplistic (i.e. 
don't quibble too closely, I'm just trying to explain) example pretend
that "ssn" and "comment" are both properties.

On the client side the appropriate behavior for these is different if
the data coming over from the server contains >1 prop element with
that tag.

If the client sees "ssn" twice it winces and then either ignores or
overwrites with the 2nd value.

If the client sees "comment" twice then it appends the additional
comment.

Question: Is this kind of information included in the spec? Uniqueness
vs. cumulative

From Steve_Chervitz at affymetrix.com Fri Feb 10 17:10:28 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Fri, 10 Feb 2006 14:10:28 -0800
Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint, 10 Feb 2006
Message-ID:

Notes from the DAS/2 teleconference for the code sprint, 10 Feb 2006

$Id: das2-teleconf-2006-02-10.txt,v 1.1 2006/02/10 22:13:17 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Ed E., Gregg Helt
  CSHL: Lincoln Stein
  Sanger: Thomas Down, Andreas Prlic
  Sweden: Andrew Dalke
  UCLA: Allen Day

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.

[note taker missed the first 5 minutes]

Topic: Properties
-----------------
gh: Properties are all tag-value
ad: yes
gh: don't think we need your binary thing.
ad: ok drop it
gh: href is needed. can always point it to a binary something out
    there. can the value just be a url?
ad: can make it relative to xml base
gh: do you need some property with tag value and href at same time?
ls: how would you interpret that? should be either value or href.
ad: there's nothing to say how to interpret the url.
gh: nice to have multiple links out to somewhere else and to have some
    indication what they are w/out traversing the link. e.g., this is
    the genbank ref, ensembl ref, protein, etc. if xid had an extra
    field with label, title e.g. that would suffice.
ad: sounds ok

[A] xids will have title + href, properties will have tag + value

Topic: Exercising the spec
---------------------------
gh: we need the reference server to actually exercise this part of the
    spec. xid. possibly other things like: target overlap, inside,
    cigar strings. encoding, decoding.
aday: oh no.
ls: line element. cigar string is something that no one has tested yet.
gh: if we don't have server doing it by next code sprint
aday: any impls out there we could use?
gh: bioperl has a gff3 parser.
aday: I wrote it, and I didn't impl cigar string parsing.
ls: there's a cigar processor in bioperl AlignIO. in theory not hard
    to do.
gh: lbl folks (Nomi et al) have a java one, too. I think.
gh: other parts of spec that aren't getting exercised? I doubt if
    anyone has used xml lang.
ad: added xml id. just there for other reasons, but not what we need
    it for.
gh: we talked about all ids being xml ids and combining xml id and xml
    base, can't remember why we stopped discussing.
ad: don't think we need to. style sheet has uses for this maybe.
ad: has anyone generated doc href yet?
td: can add this stuff easily now.
gh: for testing purposes, just throw a doc href everywhere it's
    allowed.
ad: are servers supporting retrieval of seq data?
aday: yes
ad: support for alt feature formats?
aday: can do old compact formats, not sure about coverage.
gh: yes, alt feat formats are handled, but server isn't up and running
    yet. igb das/2 client can handle it already.
ad: retrieval of assembly?
aday: no assembly data
ad: i don't touch assembly
gh: may be for next code sprint.

Topic: range based query
------------------------
gh: thomas and i don't like optional mins and maxes.
ls: fine as long as you can always determine the size of the
    reference. provide beginning and end.
gh: exception: if you want the whole sequence, can you just not supply
    range?
ad: yes
gh: :1 and :-1 how to interpret nothing for strand on end and 0 for
    strand at end?
ls: features that have strand +1, -1, features that have no strand or
    on both strands (0), features that may have a strand but you don't
    know (empty)
gh: when you put it in the query there's a difference between i don't
    know and i will accept anything. use case: transfrags from
    transcriptome project. unknown strand, but I know it *is* one or
    the other strand.
ls: how about this arrangement:
      empty = i don't care
      0 = has strand but i dont know
      1 = forward strand
      -1 = reverse strand
      2 = both strands
ad: could be organized by track (everything in a track has same
    strand).
gh: don't think is good to structure a query so it's required that you
    do have strand. you might could have diff strand designation on
    same track.
ls: you want to be able to distinguish things that are on both
    strands, things that are on either strand, but you don't know
    which.
gh: biggest concern: given a range based query to server 1000-2000
    means everything that overlaps, any strandedness within this range.
ad: should support stranded searches. client can filter out opposed to
    do a strand request against seq to get the rev comp. client should
    be able to do this.
gh: in range attrib of features, you can add colon to indicate
    strandedness.
ad: yes
gh: if no :strand does this mean unknown or don't care?
ls: defaults to *, anything. you get fwd, rev, don't know, don't care.
gh: required things on fwd strand to be :1, not make it a default.
ad: ok. if not there, means ambiguous, unknown, or not appropriate.
    see email i sent.
    if you get rid of search for strand in region query, most of this
    issue goes away.
gh: don't think people would use this often (stranded query)
ad: you can make two queries to server instead of one.
gh: this is a resolution for all range-related issues.
ad: check my email to make sure it covers this.

[A] everyone review andrew's email re: range queries and strand issues.

gh: also or-ing of diff range-based queries is not useful for me. I
    mainly need intersects of overlaps and inside. or-ing is equivalent
    to using multiple queries.
td: why do you need and overlaps and inside?
gh: optimization on client side. keeps track of what it has received.
    wants to minimize re-fetching.
td: can you just use overlap and not overlap?
gh: that may be equivalent, but the way I do it, you can guarantee you
    never get the same feat twice with that combo. will require and-ing
    of two range-based queries.
ad: modifying query lang, or-ing together two. include first range and
    include second range should use multiple query keys because of the
    comma. you will have to escape any comma if it's inside of query
    string.
gh: don't like the implicit 'and' if different but 'or' if keys the
    same. it depends on the query.
ad: now all queries are and-ed, but commas mean multiple.
ls: comma syntax seems natural. the occasional query that had to have
    an escaped comma didn't cause any bother.
td: this was as it is in das/1. exons and repeat. type=exon,
    type=repeat. so the suggestion is to use the das/1 behavior.
ad: three independent segments
gh: types as well. can have any number of types= and segment= all
    or-ed together. I still need anding of overlaps and inside.
td: different key are or-ed, same keys are and-ed.
ls: hoisted by my own petard here. works for me.
gh: allen?
aday: what's changed?
ls: the whole query language has changed in a fundamental way.
aday: dealing with multiple attributes with same name. fine.
gh: will server accept full urls for types?
aday: not now but will impl this.
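
The repeated-key query syntax discussed above can be sketched roughly as
follows. This is only an illustration and not taken from the spec: it
assumes that values given for the same key are OR-ed and that different
keys are AND-ed (the notes record this both ways), that terms are joined
with ";" as in the "key=value;key=value" change, and that commas inside a
value get percent-escaped. The key names ("segment", "type"), the helper
names and the example.org URL are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import quote


def build_feature_query(base_url, filters):
    """Join (key, value) pairs with ';', repeating a key for each value.

    Commas inside a value are percent-escaped, so a comma can never be
    mistaken for a separator (assumed behaviour, not from the spec).
    """
    terms = [
        f"{quote(key, safe='')}={quote(value, safe=':/')}"
        for key, value in filters
    ]
    return base_url + "?" + ";".join(terms)


def feature_matches(feature, filters):
    """Assumed semantics: same key is OR-ed, different keys are AND-ed."""
    allowed = defaultdict(set)
    for key, value in filters:
        allowed[key].add(value)
    # Every key must be satisfied (AND); any listed value will do (OR).
    return all(feature.get(key) in values for key, values in allowed.items())


filters = [("segment", "chrVII"), ("type", "exon"), ("type", "repeat")]
print(build_feature_query("http://example.org/das2/features", filters))
print(feature_matches({"segment": "chrVII", "type": "exon"}, filters))
```

With these assumptions a client asking for exons or repeats on one
segment sends a single request, while the and-ing of two different range
keys that gh asks for falls out of the "different keys" rule.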
gh: all types should be full uri's now. my client can't deal but will
    soon.

Topic: status reports
---------------------
gh: state what you hoped to accomplish and what you actually
    accomplished.
gh: hoped to get igb das client up to date with spec, working with one
    das2 server, and get affy das2 server up and going. affy das2
    server will take longer. maybe by next code sprint. igb is now
    using latest das2 spec, calling allen's server, and using registry
    as well. happy with results. not everything done, but some
    unexpected things (registry). wrote up progress report for grant:
    going out 3pm today (we got another day) a 2pg summary. will send
    out to everyone later.
    todo: get das2 server up. client: deal with full uri issue. this is
    a basic functionality of the client. smart handling of uris.
ee: igb client. big thing is make it treat all data sources so they
    all behave in a similar way das1/das2, quick load, separate files,
    regardless of the data format. want to make it all seamless. going
    well.
sc: streamlined pipeline for populating das server with affy exon array
    data. didn't get to pipeline for external data (UCSC tracks), but
    have basic framework in place.
ad: decided to do more writeback at next sprint. when is next sprint?
gh: march 13-17. lincoln will be in UK and can participate from there.
ad: I'm in the states next week. will come to emeryville for next
    sprint.

[A] next code sprint is 13-17 March. Mark your calendars.

ad: hoped to work on spec, resolve detailed questions, make sure it
    works with people's needs. will work on incorporating latest ideas
    into spec. validator: have one but is not fit for public
    consumption. not at where it was last summer on the previous
    version of spec.
ap: das interface for registry, can serve das1 and das2 sources w/ new
    source command. java client - not yet. registry: todo UI so users
    can upload to das registry.
td: hoping to write server. got something up for feat, types,
    segments, need to run through andrew's validator.
    hope to work on writeback, but didn't happen (but good discussion
    on it). want to get more data included, ensembl database. roy has
    been working on zmap client, coming along fine.
aday: primary goals: to support new version of spec -- not fully done
    uri problem in query parsing. apache config integration is done.
    installation and rpm for server - done for FC3 i386, available in
    the next couple of days (brian o'connor). general documentation
    improvement in code for server - not done.
    Next step: post, put, delete, writeback framework (originally
    planned this but may need to rethink), impl transaction logs (maybe
    in flux). adding more unit tests.
ad: writeback spec won't happen for at least 2 weeks. need to write up
    what we've done on current spec first.
ls: will be available from 14th on. at ensembl meeting up to the 13th.
gh: allen come to emeryville?
aday: maybe.
gh: will have to explore how to fund hosting folks here for next
    codesprint.
gh: speaking for nomi - she had apollo working for parsing features and
    displaying them. some issues with higher level integration into
    apollo. making good progress.
gh: time to wrap it up. thanks for your hard work.
[applause]

[A] next teleconf will be on 20 Feb, 9:30 PST 5:30 UK (regular time)
    we're skipping 13 feb (next monday) given all our time this week.

From dalke at dalkescientific.com Fri Feb 10 21:11:05 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat, 11 Feb 2006 02:11:05 +0000
Subject: [DAS2] Re: question on properties
In-Reply-To: <26b9a50a02deb2be55150c6a5a47d419@fruitfly.org>
References: <4712c640d908a4e9aed099dd1ef1398b@dalkescientific.com> <26b9a50a02deb2be55150c6a5a47d419@fruitfly.org>
Message-ID:

Suzi:
> On the client side the appropriate behavior for these is different if
> the data coming over from the server contains >1 prop element with
> that tag.
>
> If the client sees "ssn" twice it winces and then either ignores or
> overwrites with the 2nd value.

Or it says "error, error, cannot compute" and stops. From one of the
guidelines ("the zen") of Python: "when in doubt, refuse the temptation
to guess."

> If the client sees "comment" twice then it appends the additional
> comment.
>
> Question: Is this kind of information included in the spec? Uniqueness
> vs. cumulative

Here's my thoughts. We have several points for client/server extensions.

One is this property table, which is a set of key/value strings.
Because they are strings you can use them for almost anything, with
the correct interpretation by the client and server. That requires
collusion between the two. This is the extension point which is most
familiar to everyone. But it's open to the problem you pointed out.

The other is this non-DAS extension XML, which lets the server add
*anything*. If the client doesn't know what the field does it must
ignore it. If it does writeback with that feature it must include the
ignored element, and not make any changes. That means your server can
add

  123-45-1534

If the client doesn't know what to do, it ignores it. It will never
change the field. If the client knows what that field does it must
follow the constraints set down for it, else the server should stop
with an error and not allow the update to occur.

There are two downsides to this approach. There's no way for a dumb
client to understand that field, so no user will ever see it, and
there's no way to do a search on that field. (A server can extend the
search syntax and tell the client about the new syntax, but a dumb
client doesn't know about that.)

If there is need to support the dumb client then the only way to
support the data type constraints is in the server. It must check a
given field and possibly stop with an error or resolve ambiguities. We
can have that the server reports an error message that the client
and/or user can use to figure out what's wrong.

Thinking about it a bit, it's possible to combine these two.
For example, a server can have

then list as an extension

All this latter XML does is flag sufficiently aware clients that the
server implements the special SSN requirements. A dumb client can
ignore the flag, users add a new SSN, and the server bails out, while
the smart client early on knows that that isn't going to be allowed.

This hybrid solution doesn't seem right to me though. I currently (and
without any experience) prefer putting schema constrained fields in as
extension elements. Think of the property table as something exposed
to the user as a completely editable table, with no ability to limit
what that person does.

For the case of the SSN that might be overkill. For other things, like
the current stage of a feature in the curational process, it's best to
put that data there and not in the generic property table.

There is a long history of using generic key/value tables as an ad-hoc
way to extend a protocol. I'm trying to improve on that by defining a
way for a server to add well-structured, schema-dependent and
searchable data (for smart clients) without needing to piggy back on a
bunch of strings.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Mon Feb 20 10:31:42 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 20 Feb 2006 08:31:42 -0700
Subject: [DAS2] today's conf. call and President's Day
Message-ID:

Today is President's Day in the US.

Are the other US people working today?

Andrew
dalke at dalkescientific.com

From Gregg_Helt at affymetrix.com Mon Feb 20 11:47:13 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 20 Feb 2006 08:47:13 -0800
Subject: [DAS2] today's conf. call and President's Day
Message-ID:

It's a day off for Affymetrix, but I'm working anyway. Unless there are
major objections I'd like to go ahead and do the conference call at the
standard time (9:30 AM Pacific time). There may be a few less people
joining in from the US.

thanks,
gregg

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open-
> bio.org] On Behalf Of Andrew Dalke
> Sent: Monday, February 20, 2006 7:32 AM
> To: DAS/2
> Subject: [DAS2] today's conf. call and President's Day
>
> Today is President's Day in the US.
>
> Are the other US people working today?
>
> Andrew
> dalke at dalkescientific.com
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2

From lstein at cshl.edu Mon Feb 20 12:37:06 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 20 Feb 2006 12:37:06 -0500
Subject: [DAS2] today's conf. call and President's Day
In-Reply-To:
References:
Message-ID: <200602201237.06497.lstein@cshl.edu>

Hi,

I've dialed in and all I'm getting is hold music. Could you confirm
this info?

800 531-3250
287-9055

Thanks!

Lincoln

On Monday 20 February 2006 11:47, Helt,Gregg wrote:
> It's a day off for Affymetrix, but I'm working anyway. Unless there are
> major objections I'd like to go ahead and do the conference call at the
> standard time (9:30 AM Pacific time). There may be a few less people
> joining in from the US.
>
> thanks,
> gregg
>
> > -----Original Message-----
> > From: das2-bounces at portal.open-bio.org
>
> [mailto:das2-bounces at portal.open-
>
> > bio.org] On Behalf Of Andrew Dalke
> > Sent: Monday, February 20, 2006 7:32 AM
> > To: DAS/2
> > Subject: [DAS2] today's conf. call and President's Day
> >
> > Today is President's Day in the US.
> >
> > Are the other US people working today?
> >
> > Andrew
> > dalke at dalkescientific.com
> >
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/das2
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2

--
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From lstein at cshl.edu Mon Feb 20 11:50:38 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 20 Feb 2006 11:50:38 -0500
Subject: [DAS2] today's conf. call and President's Day
In-Reply-To:
References:
Message-ID: <200602201150.38431.lstein@cshl.edu>

I am working today!

Lincoln

On Monday 20 February 2006 10:31, Andrew Dalke wrote:
> Today is President's Day in the US.
>
> Are the other US people working today?
>
> Andrew
> dalke at dalkescientific.com
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2

--
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From dalke at dalkescientific.com Mon Feb 20 12:28:56 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 20 Feb 2006 10:28:56 -0700
Subject: [DAS2] today's conf. call and President's Day
In-Reply-To:
References:
Message-ID:

Thomas Down wrote:
> Well, I can't speak for US people, but I do know that Andreas Prlic is
> on holiday today and I presume won't be joining the conference call.
> I can join if there's anything that needs discussing urgently, but
> otherwise I'd be happy to leave it 'til next week.

Status update for me:

Last week was a break for me from the sprint - I was winded. I worked
a bit here and there on how to do a GUI interface for the validation.
I hope to get a demo page of the results up within a day or so.

This week I'll be working on that and a new draft of the spec.

Also, I'm now back home in Santa Fe, where we haven't had rain nor snow
for 100 days - my cacti are drooping! :(

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Mon Feb 27 09:50:10 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 27 Feb 2006 08:50:10 -0600
Subject: [DAS2] will miss today's conf. call
Message-ID: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>

Hi all,

Not only am I on the road back from the Python conference but my cell
phone battery is nearing dead so I won't be able to make it to today's
phone conference call.

Here's my status. I've been working on the validator, to the detriment
of the next spec rewrite. This validator does single-document checks.
That is, it does not do internal integrity checks to make sure that the
results of, say, a range query only returns features in that range, or
that the features are in the range given by the segments.

I plugged the results into a web server running on my laptop. It's
using some new Python libraries which are not yet installed on the OBF
machine, but which I can install after I get back to Santa Fe.

The GUI is similar to what I threw together at Sanger during the
Sprint - enter a URL and a document type, view the results. What took
long is the code to pin down where the errors happened, for example, to
show which attribute was the extra attribute in an element.

I've attached sample output for your viewing pleasure.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------

There is enough there for a Javascript jockey to make a neat little
interactive viewer, eg, click on the error message to be shown where it
occurs in the document.
Also, the marker I'm using to show where the error occurs in the body
of the text needs work - the method I use isn't that cross platform
portable.

I think the next steps for me are:
 - get the validator working as-is on the OBF web site
   (should be on-line by tomorrow)
 - get back to writing the 3rd draft of the spec.

Andrew
dalke at dalkescientific.com

From ap3 at sanger.ac.uk Mon Feb 27 12:41:08 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Mon, 27 Feb 2006 17:41:08 +0000
Subject: [DAS2] will miss today's conf. call
In-Reply-To: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>
References: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>
Message-ID: <197aeffa03988a8fc098f27926ee511d@sanger.ac.uk>

any conference call today? - listening to the hold music

Andreas

-----------------------------------------------------------------------

Andreas Prlic Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891

From nomi at fruitfly.org Mon Feb 27 12:43:00 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Mon, 27 Feb 2006 09:43:00 -0800
Subject: [DAS2] will miss today's conf. call
In-Reply-To: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>
References: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>
Message-ID: <17411.14884.410370.608675@spongecake.lbl.gov>

are we having a teleconference today? i got bored of waiting on hold
for the moderator. someone email me if it's happening.

the validator sounds useful!

Nomi

From boconnor at ucla.edu Mon Feb 27 19:46:02 2006
From: boconnor at ucla.edu (Brian O'Connor)
Date: Mon, 27 Feb 2006 16:46:02 -0800
Subject: [DAS2] DAS2 Reference Server @ UCLA
Message-ID: <44039D4A.5000503@ucla.edu>

Hi,

If anyone is using the DAS/2 server at UCLA (das.biopackages.net) there
will be some maintenance on the server later today (after 5pm Pacific).
This won't affect the DAS/2 codebase, I'm just moving around some of
our other production websites and there will be some downtime.
The outage should just last a few minutes.

--Brian
Note that I've gone > ahead > > > and split the query urls (segment, features and types each have > query > > > interfaces) from the base directory used as containers for the > segments, > > > features and types. > > > > > > $VERSION/segments - the segments query URL; summarizes the > top-level > > > segments in the data source > > > > > > $VERSION/segment/$SEGMENT_ID - a segment query; used to get > detailed > > > information about the identified segment > > > > > > $VERSION/features - the feature filter query URL. Features are > > > locatable annotations or experimental results. The feature > filter > > > URL supports query parameters to select a subset of the features > > > based on position, feature type and other properties. > > > > > > $VERSION/feature/$FEATURE_ID - a feature query; used to get > detailed > > > information about the identified feature > > > > > > $VERSION/types - the types query URL which returns a list of all > > > feature types. Feature types include ontology and depiction > > > details for all features of the given type. > > > > > > $VERSION/type/$TYPE_ID - details about the specified feature type > > > > > > Oh, and there are internal conflicts which will be straightened > > > out in the next draft. These shouldn't be big. 
> > > > > > Andrew > > > dalke at dalkescientific.com > > > > > > _______________________________________________ > > > DAS2 mailing list > > > DAS2 at portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/das2 > > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/das2 > From allenday at ucla.edu Wed Feb 1 23:30:22 2006 From: allenday at ucla.edu (Allen Day) Date: Wed, 1 Feb 2006 15:30:22 -0800 (PST) Subject: [DAS2] code sprint final infos In-Reply-To: <11a8e78c4d35855959fa41eb23a4039d@sanger.ac.uk> References: <11a8e78c4d35855959fa41eb23a4039d@sanger.ac.uk> Message-ID: What IM service are we using, and where can we collate all user IDs? Perhaps it would be better to meet up in an IRC channel. I propose gathering in #codesprint on EFnet. -Allen On Wed, 1 Feb 2006, Andreas Prlic wrote: > Hi! > > This is to provide final organisatorial infos about the DAS 2 code > sprint next week. > > - We start Monday 10:00 (Sanger time) in the Morgan building - > meeting point is the small meeting room next to the kitchen 1st floor > (we get a better room later). > > - The sanger guest wireless network supports Skype. so instant > messaging and voice over IP calls > will be possible during all the time. 
> > - every day at 17:00 (Sanger time = 9:00 pacific time) there will be a > conference call on the usual DAS2 line > > Greetings, > Andreas > > > > > ----------------------------------------------------------------------- > > Andreas Prlic Wellcome Trust Sanger Institute > Hinxton, Cambridge CB10 1SA, UK > +44 (0) 1223 49 6891 > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From nomi at fruitfly.org Thu Feb 2 00:37:44 2006 From: nomi at fruitfly.org (Nomi Harris) Date: Wed, 1 Feb 2006 16:37:44 -0800 (PST) Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: Message-ID: <17377.21592.854840.243376@kinked.lbl.gov> On 1 February 2006, Helt,Gregg wrote: > That would be great if you could update the biopackages server before > the code sprint starts! Then client implementers will have a server to > test with. yes!! On 1 February 2006, Allen Day wrote: > That's what I was thinking too, but I was worried about the existing > Genoviz clients "in the wild" having the server suddenly break. are there really a lot of users (as opposed to das developers) who are using the biopackages server? On 1 February 2006, Allen Day wrote: > What IM service are we using, and where can we collate all user IDs? > Perhaps it would be better to meet up in an IRC channel. > > I propose gathering in #codesprint on EFnet. i need details on this as well. i've never bothered registering for an IM service or IRC channel. Nomi From ed_erwin at affymetrix.com Wed Feb 1 23:44:35 2006 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Wed, 01 Feb 2006 15:44:35 -0800 Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: Message-ID: <43E147E3.1030705@affymetrix.com> Gregg asked me to say "No". Please do not break the current server that IGB is using. Please make your changes on a server at a different URL. 
Thanks Ed Allen Day wrote: > That's what I was thinking too, but I was worried about the existing > Genoviz clients "in the wild" having the server suddenly break. > > So you're saying it's okay with you if those clients have a service > interruption? > > -Allen > > > On Wed, 1 Feb 2006, Helt,Gregg wrote: > > >>That would be great if you could update the biopackages server before >>the code sprint starts! Then client implementers will have a server to >>test with. >> >> thanks, >> gregg >> >> >>>-----Original Message----- >>>From: das2-bounces at portal.open-bio.org >> >>[mailto:das2-bounces at portal.open- >> >>>bio.org] On Behalf Of Allen Day >>>Sent: Wednesday, February 01, 2006 2:42 PM >>>To: Andrew Dalke >>>Cc: DAS/2 >>>Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities >>> >>>I just looked over your changes, and will begin making the changes to >> >>the >> >>>server repository today. >>> >>>I'd like to update the server at das.biopackages.net with my changes >> >>on >> >>>Friday, unless there are objections. >>> >>>I'll be taking notes along the way and will post to the list if >> >>anything >> >>>in your document is unclear to me. >>> >>>At first glance, I agree -- the changes are minor. >>> >>>-Allen >>> >>> >>>On Mon, 30 Jan 2006, Andrew Dalke wrote: >>> >>> >>>>Allen: >>>> >>>>>Is the spec going to be in a stable state for the code sprint? >> >>I'd >> >>>>>like >>>>>to use this time to sync the server implementation with a stable >>>>>version >>>>>of the spec. It looks like there have been many substantial >> >>changes. >> >>>>I have just (within the last few minutes) completed the first draft >>>>of the update of the spec. >>>> >>>>It's not in HTML - that calls for too much work for this stage. >>>>It's text, in CVS under das/das2/new_spec.txt >>>> >>>>There are many parts which need clarification. These are marked >>>>with a "XXX" along with my comments. >>>> >>>>The RNC files are in >>>> >>>> das/das2/scratch/*.rnc >>>>along with some test XML files. 
These XML files are not meant >>>>to be realistic. They are meant more to check edge cases. >>>> >>>>I do no think there are major changes to the spec. Most of the >>>>changes have actually trimmed things down, like getting rid of >>>>the "properties" subtree and merging the different "sources" >> >>requests >> >>>>into a single document. >>>> >>>> >>>>Here are the major interfaces >>>> >>>>$PREFIX/sequence - a "sources" request >>>> This is the top-level entry point to a DAS 2 server. It returns >> >>a >> >>>> list of the available genomic sequence and their versions. >>>> [sequence-namespace] >>>> >>>>$PREFIX/sequence/$SOURCE - a "source" request >>>> Returns the available versions of the given genomic sequence. >>>> >>>>$PREFIX/sequence/$SOURCE/$VERSION - a "versioned source" request >>>> Returns information about a given version of a genomic sequence. >>>> Clients may assume that the sequence and assembly are constant >> >>for a >> >>>> given version of a source. Note that annotation data on a server >>>> with curational write-back support may change without changing >> >>the >> >>>> version. >>>> >>>> >>>>For a given version here are the sub-parts. Note that I've gone >> >>ahead >> >>>>and split the query urls (segment, features and types each have >> >>query >> >>>>interfaces) from the base directory used as containers for the >> >>segments, >> >>>>features and types. >>>> >>>> $VERSION/segments - the segments query URL; summarizes the >> >>top-level >> >>>> segments in the data source >>>> >>>> $VERSION/segment/$SEGMENT_ID - a segment query; used to get >> >>detailed >> >>>> information about the identified segment >>>> >>>> $VERSION/features - the feature filter query URL. Features are >>>> locatable annotations or experimental results. The feature >> >>filter >> >>>> URL supports query parameters to select a subset of the features >>>> based on position, feature type and other properties. 
>>>> >>>> $VERSION/feature/$FEATURE_ID - a feature query; used to get >> >>detailed >> >>>> information about the identified feature >>>> >>>> $VERSION/types - the types query URL which returns a list of all >>>> feature types. Feature types include ontology and depiction >>>> details for all features of the given type. >>>> >>>> $VERSION/type/$TYPE_ID - details about the specified feature type >>>> >>>>Oh, and there are internal conflicts which will be straightened >>>>out in the next draft. These shouldn't be big. >>>> >>>> Andrew >>>> dalke at dalkescientific.com >>>> >>>>_______________________________________________ >>>>DAS2 mailing list >>>>DAS2 at portal.open-bio.org >>>>http://portal.open-bio.org/mailman/listinfo/das2 >>>> >>> >>>_______________________________________________ >>>DAS2 mailing list >>>DAS2 at portal.open-bio.org >>>http://portal.open-bio.org/mailman/listinfo/das2 >> > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Gregg_Helt at affymetrix.com Wed Feb 1 23:51:23 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Wed, 1 Feb 2006 15:51:23 -0800 Subject: [DAS2] Re: Apollo and DAS/2 priorities Message-ID: Yes, what Ed said, that's what I meant. Updated server, but at a different address. Otherwise the current release of IGB will break when trying to use the biopackages server. Once our IGB code has caught up to the updated server, we can roll out a new release to point to the new server instead of the old one. But not yet. Thanks, Gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Ed Erwin > Sent: Wednesday, February 01, 2006 3:45 PM > To: Allen Day > Cc: DAS/2 > Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities > > > Gregg asked me to say "No". Please do not break the current server that > IGB is using. 
> > Please make your changes on a server at a different URL. > > Thanks > Ed > > Allen Day wrote: > > That's what I was thinking too, but I was worried about the existing > > Genoviz clients "in the wild" having the server suddenly break. > > > > So you're saying it's okay with you if those clients have a service > > interruption? > > > > -Allen > > > > > > On Wed, 1 Feb 2006, Helt,Gregg wrote: > > > > > >>That would be great if you could update the biopackages server before > >>the code sprint starts! Then client implementers will have a server to > >>test with. > >> > >> thanks, > >> gregg > >> > >> > >>>-----Original Message----- > >>>From: das2-bounces at portal.open-bio.org > >> > >>[mailto:das2-bounces at portal.open- > >> > >>>bio.org] On Behalf Of Allen Day > >>>Sent: Wednesday, February 01, 2006 2:42 PM > >>>To: Andrew Dalke > >>>Cc: DAS/2 > >>>Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities > >>> > >>>I just looked over your changes, and will begin making the changes to > >> > >>the > >> > >>>server repository today. > >>> > >>>I'd like to update the server at das.biopackages.net with my changes > >> > >>on > >> > >>>Friday, unless there are objections. > >>> > >>>I'll be taking notes along the way and will post to the list if > >> > >>anything > >> > >>>in your document is unclear to me. > >>> > >>>At first glance, I agree -- the changes are minor. > >>> > >>>-Allen > >>> > >>> > >>>On Mon, 30 Jan 2006, Andrew Dalke wrote: > >>> > >>> > >>>>Allen: > >>>> > >>>>>Is the spec going to be in a stable state for the code sprint? > >> > >>I'd > >> > >>>>>like > >>>>>to use this time to sync the server implementation with a stable > >>>>>version > >>>>>of the spec. It looks like there have been many substantial > >> > >>changes. > >> > >>>>I have just (within the last few minutes) completed the first draft > >>>>of the update of the spec. > >>>> > >>>>It's not in HTML - that calls for too much work for this stage. 
> >>>>It's text, in CVS under das/das2/new_spec.txt > >>>> > >>>>There are many parts which need clarification. These are marked > >>>>with a "XXX" along with my comments. > >>>> > >>>>The RNC files are in > >>>> > >>>> das/das2/scratch/*.rnc > >>>>along with some test XML files. These XML files are not meant > >>>>to be realistic. They are meant more to check edge cases. > >>>> > >>>>I do no think there are major changes to the spec. Most of the > >>>>changes have actually trimmed things down, like getting rid of > >>>>the "properties" subtree and merging the different "sources" > >> > >>requests > >> > >>>>into a single document. > >>>> > >>>> > >>>>Here are the major interfaces > >>>> > >>>>$PREFIX/sequence - a "sources" request > >>>> This is the top-level entry point to a DAS 2 server. It returns > >> > >>a > >> > >>>> list of the available genomic sequence and their versions. > >>>> [sequence-namespace] > >>>> > >>>>$PREFIX/sequence/$SOURCE - a "source" request > >>>> Returns the available versions of the given genomic sequence. > >>>> > >>>>$PREFIX/sequence/$SOURCE/$VERSION - a "versioned source" request > >>>> Returns information about a given version of a genomic sequence. > >>>> Clients may assume that the sequence and assembly are constant > >> > >>for a > >> > >>>> given version of a source. Note that annotation data on a server > >>>> with curational write-back support may change without changing > >> > >>the > >> > >>>> version. > >>>> > >>>> > >>>>For a given version here are the sub-parts. Note that I've gone > >> > >>ahead > >> > >>>>and split the query urls (segment, features and types each have > >> > >>query > >> > >>>>interfaces) from the base directory used as containers for the > >> > >>segments, > >> > >>>>features and types. 
> >>>> > >>>> $VERSION/segments - the segments query URL; summarizes the > >> > >>top-level > >> > >>>> segments in the data source > >>>> > >>>> $VERSION/segment/$SEGMENT_ID - a segment query; used to get > >> > >>detailed > >> > >>>> information about the identified segment > >>>> > >>>> $VERSION/features - the feature filter query URL. Features are > >>>> locatable annotations or experimental results. The feature > >> > >>filter > >> > >>>> URL supports query parameters to select a subset of the features > >>>> based on position, feature type and other properties. > >>>> > >>>> $VERSION/feature/$FEATURE_ID - a feature query; used to get > >> > >>detailed > >> > >>>> information about the identified feature > >>>> > >>>> $VERSION/types - the types query URL which returns a list of all > >>>> feature types. Feature types include ontology and depiction > >>>> details for all features of the given type. > >>>> > >>>> $VERSION/type/$TYPE_ID - details about the specified feature type > >>>> > >>>>Oh, and there are internal conflicts which will be straightened > >>>>out in the next draft. These shouldn't be big. 
> >>>> > >>>> Andrew > >>>> dalke at dalkescientific.com > >>>> > >>>>_______________________________________________ > >>>>DAS2 mailing list > >>>>DAS2 at portal.open-bio.org > >>>>http://portal.open-bio.org/mailman/listinfo/das2 > >>>> > >>> > >>>_______________________________________________ > >>>DAS2 mailing list > >>>DAS2 at portal.open-bio.org > >>>http://portal.open-bio.org/mailman/listinfo/das2 > >> > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/das2 > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From allenday at ucla.edu Thu Feb 2 00:07:54 2006 From: allenday at ucla.edu (Allen Day) Date: Wed, 1 Feb 2006 16:07:54 -0800 (PST) Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: Message-ID: Okay, I will tag the current server and leave it at: http://das.biopackages.net/das I saw in the most recent commits by Andrew that the root-level "/das" is no longer needed, so I propose putting an updated server at: http://das.biopackages.net/codesprint If we're going to keep the current server in a "maintained but deprecated" mode like this, I'll be making changes to the "new" server before Friday. When the new version of IGB comes out we can then upgrade the current server. Sound good? -Allen On Wed, 1 Feb 2006, Helt,Gregg wrote: > Yes, what Ed said, that's what I meant. Updated server, but at a > different address. Otherwise the current release of IGB will break when > trying to use the biopackages server. > > Once our IGB code has caught up to the updated server, we can roll out a > new release to point to the new server instead of the old one. But not > yet. 
> > Thanks, > Gregg > > > -----Original Message----- > > From: das2-bounces at portal.open-bio.org > [mailto:das2-bounces at portal.open- > > bio.org] On Behalf Of Ed Erwin > > Sent: Wednesday, February 01, 2006 3:45 PM > > To: Allen Day > > Cc: DAS/2 > > Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities > > > > > > Gregg asked me to say "No". Please do not break the current server > that > > IGB is using. > > > > Please make your changes on a server at a different URL. > > > > Thanks > > Ed > > > > Allen Day wrote: > > > That's what I was thinking too, but I was worried about the existing > > > Genoviz clients "in the wild" having the server suddenly break. > > > > > > So you're saying it's okay with you if those clients have a service > > > interruption? > > > > > > -Allen > > > > > > > > > On Wed, 1 Feb 2006, Helt,Gregg wrote: > > > > > > > > >>That would be great if you could update the biopackages server > before > > >>the code sprint starts! Then client implementers will have a server > to > > >>test with. > > >> > > >> thanks, > > >> gregg > > >> > > >> > > >>>-----Original Message----- > > >>>From: das2-bounces at portal.open-bio.org > > >> > > >>[mailto:das2-bounces at portal.open- > > >> > > >>>bio.org] On Behalf Of Allen Day > > >>>Sent: Wednesday, February 01, 2006 2:42 PM > > >>>To: Andrew Dalke > > >>>Cc: DAS/2 > > >>>Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities > > >>> > > >>>I just looked over your changes, and will begin making the changes > to > > >> > > >>the > > >> > > >>>server repository today. > > >>> > > >>>I'd like to update the server at das.biopackages.net with my > changes > > >> > > >>on > > >> > > >>>Friday, unless there are objections. > > >>> > > >>>I'll be taking notes along the way and will post to the list if > > >> > > >>anything > > >> > > >>>in your document is unclear to me. > > >>> > > >>>At first glance, I agree -- the changes are minor. 
> > >>> > > >>>-Allen > > >>> > > >>> > > >>>On Mon, 30 Jan 2006, Andrew Dalke wrote: > > >>> > > >>> > > >>>>Allen: > > >>>> > > >>>>>Is the spec going to be in a stable state for the code sprint? > > >> > > >>I'd > > >> > > >>>>>like > > >>>>>to use this time to sync the server implementation with a stable > > >>>>>version > > >>>>>of the spec. It looks like there have been many substantial > > >> > > >>changes. > > >> > > >>>>I have just (within the last few minutes) completed the first > draft > > >>>>of the update of the spec. > > >>>> > > >>>>It's not in HTML - that calls for too much work for this stage. > > >>>>It's text, in CVS under das/das2/new_spec.txt > > >>>> > > >>>>There are many parts which need clarification. These are marked > > >>>>with a "XXX" along with my comments. > > >>>> > > >>>>The RNC files are in > > >>>> > > >>>> das/das2/scratch/*.rnc > > >>>>along with some test XML files. These XML files are not meant > > >>>>to be realistic. They are meant more to check edge cases. > > >>>> > > >>>>I do no think there are major changes to the spec. Most of the > > >>>>changes have actually trimmed things down, like getting rid of > > >>>>the "properties" subtree and merging the different "sources" > > >> > > >>requests > > >> > > >>>>into a single document. > > >>>> > > >>>> > > >>>>Here are the major interfaces > > >>>> > > >>>>$PREFIX/sequence - a "sources" request > > >>>> This is the top-level entry point to a DAS 2 server. It > returns > > >> > > >>a > > >> > > >>>> list of the available genomic sequence and their versions. > > >>>> [sequence-namespace] > > >>>> > > >>>>$PREFIX/sequence/$SOURCE - a "source" request > > >>>> Returns the available versions of the given genomic sequence. > > >>>> > > >>>>$PREFIX/sequence/$SOURCE/$VERSION - a "versioned source" request > > >>>> Returns information about a given version of a genomic > sequence. 
> > >>>> Clients may assume that the sequence and assembly are constant > > >> > > >>for a > > >> > > >>>> given version of a source. Note that annotation data on a > server > > >>>> with curational write-back support may change without changing > > >> > > >>the > > >> > > >>>> version. > > >>>> > > >>>> > > >>>>For a given version here are the sub-parts. Note that I've gone > > >> > > >>ahead > > >> > > >>>>and split the query urls (segment, features and types each have > > >> > > >>query > > >> > > >>>>interfaces) from the base directory used as containers for the > > >> > > >>segments, > > >> > > >>>>features and types. > > >>>> > > >>>> $VERSION/segments - the segments query URL; summarizes the > > >> > > >>top-level > > >> > > >>>> segments in the data source > > >>>> > > >>>> $VERSION/segment/$SEGMENT_ID - a segment query; used to get > > >> > > >>detailed > > >> > > >>>> information about the identified segment > > >>>> > > >>>> $VERSION/features - the feature filter query URL. Features are > > >>>> locatable annotations or experimental results. The feature > > >> > > >>filter > > >> > > >>>> URL supports query parameters to select a subset of the > features > > >>>> based on position, feature type and other properties. > > >>>> > > >>>> $VERSION/feature/$FEATURE_ID - a feature query; used to get > > >> > > >>detailed > > >> > > >>>> information about the identified feature > > >>>> > > >>>> $VERSION/types - the types query URL which returns a list of all > > >>>> feature types. Feature types include ontology and depiction > > >>>> details for all features of the given type. > > >>>> > > >>>> $VERSION/type/$TYPE_ID - details about the specified feature > type > > >>>> > > >>>>Oh, and there are internal conflicts which will be straightened > > >>>>out in the next draft. These shouldn't be big. 
> > >>>>
> > >>>> Andrew
> > >>>> dalke at dalkescientific.com
> > >>>>
> > >>>>_______________________________________________
> > >>>>DAS2 mailing list
> > >>>>DAS2 at portal.open-bio.org
> > >>>>http://portal.open-bio.org/mailman/listinfo/das2
> > >>>>
> > >>>
> > >>>_______________________________________________
> > >>>DAS2 mailing list
> > >>>DAS2 at portal.open-bio.org
> > >>>http://portal.open-bio.org/mailman/listinfo/das2
> > >>
> > > _______________________________________________
> > > DAS2 mailing list
> > > DAS2 at portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/das2
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/das2
>
From Gregg_Helt at affymetrix.com Thu Feb 2 01:03:47 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 1 Feb 2006 17:03:47 -0800
Subject: [DAS2] Alternative feature formats in current DAS/2 spec
Message-ID:

When discussing alternative feature formats, the spec reads:

  The feature query URL supports the optional "format" parameter used
  to request that the results be returns in an alternative format. The
  format names are listed in the versioned source document in the
  element of the "feature" .

I think the second sentence should instead read something like:

  The possible format names for a particular feature type are listed
  in the types document in the elements for a given type.

Also, the spec says:

  Some of search results may not be expressible in the specified
  format. The server should silently skip those feature records and
  return only those records which can be converted.

I would argue that if any of the search results cannot be returned in
the specified format, then the server should really just return an
error. Silently suppressing information is not good. A generic
400-"Bad Request" would work, although a 415-"Unsupported Media Type"
might be more appropriate.
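
To make the proposed behaviour concrete, here is a rough sketch of the "fail instead of silently skipping" rule. It is only an illustration under stated assumptions: the function name, the per-record "formats" list and the format names used are all made up for this example, and none of them come from the spec or from any existing server code.

```python
from http import HTTPStatus


def select_features(features, requested_format):
    """Return (status, records) for a feature query.

    `features` is a list of dicts; each dict carries a hypothetical
    "formats" key naming the formats that record can be expressed in.
    `requested_format` is the value of the optional format parameter,
    or None when the client did not send one.
    """
    # No alternative format requested: nothing needs converting.
    if requested_format is None:
        return HTTPStatus.OK, list(features)

    # Collect every record that cannot be expressed in the requested format.
    unconvertible = [f for f in features if requested_format not in f["formats"]]
    if unconvertible:
        # Proposed rule: reject the whole request rather than dropping
        # records without telling the client.
        return HTTPStatus.UNSUPPORTED_MEDIA_TYPE, []

    return HTTPStatus.OK, list(features)
```

With this rule a client either gets every matching record in the format it asked for or gets an explicit error, so it never has to guess whether the result set was trimmed. Returning 400 instead of 415 would only change the one status constant.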
gregg From allenday at ucla.edu Thu Feb 2 01:16:04 2006 From: allenday at ucla.edu (Allen Day) Date: Wed, 1 Feb 2006 17:16:04 -0800 (PST) Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: Message-ID: There are still many references to "region" in Andrew's .txt document. Is it safe to assume that anywhere "region" is mentioned, it should really be "segment" now? I believe the answer is yes. I'm asking to see if I need to change the feature filter implementation. -Allen On Wed, 1 Feb 2006, Helt,Gregg wrote: > > That would be great if you could update the biopackages server before > the code sprint starts! Then client implementers will have a server to > test with. > > thanks, > gregg > > > -----Original Message----- > > From: das2-bounces at portal.open-bio.org > [mailto:das2-bounces at portal.open- > > bio.org] On Behalf Of Allen Day > > Sent: Wednesday, February 01, 2006 2:42 PM > > To: Andrew Dalke > > Cc: DAS/2 > > Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities > > > > I just looked over your changes, and will begin making the changes to > the > > server repository today. > > > > I'd like to update the server at das.biopackages.net with my changes > on > > Friday, unless there are objections. > > > > I'll be taking notes along the way and will post to the list if > anything > > in your document is unclear to me. > > > > At first glance, I agree -- the changes are minor. > > > > -Allen > > > > > > On Mon, 30 Jan 2006, Andrew Dalke wrote: > > > > > Allen: > > > > Is the spec going to be in a stable state for the code sprint? > I'd > > > > like > > > > to use this time to sync the server implementation with a stable > > > > version > > > > of the spec. It looks like there have been many substantial > changes. > > > > > > I have just (within the last few minutes) completed the first draft > > > of the update of the spec. > > > > > > It's not in HTML - that calls for too much work for this stage. 
> > > It's text, in CVS under das/das2/new_spec.txt > > > > > > There are many parts which need clarification. These are marked > > > with a "XXX" along with my comments. > > > > > > The RNC files are in > > > > > > das/das2/scratch/*.rnc > > > along with some test XML files. These XML files are not meant > > > to be realistic. They are meant more to check edge cases. > > > > > > I do no think there are major changes to the spec. Most of the > > > changes have actually trimmed things down, like getting rid of > > > the "properties" subtree and merging the different "sources" > requests > > > into a single document. > > > > > > > > > Here are the major interfaces > > > > > > $PREFIX/sequence - a "sources" request > > > This is the top-level entry point to a DAS 2 server. It returns > a > > > list of the available genomic sequence and their versions. > > > [sequence-namespace] > > > > > > $PREFIX/sequence/$SOURCE - a "source" request > > > Returns the available versions of the given genomic sequence. > > > > > > $PREFIX/sequence/$SOURCE/$VERSION - a "versioned source" request > > > Returns information about a given version of a genomic sequence. > > > Clients may assume that the sequence and assembly are constant > for a > > > given version of a source. Note that annotation data on a server > > > with curational write-back support may change without changing > the > > > version. > > > > > > > > > For a given version here are the sub-parts. Note that I've gone > ahead > > > and split the query urls (segment, features and types each have > query > > > interfaces) from the base directory used as containers for the > segments, > > > features and types. > > > > > > $VERSION/segments - the segments query URL; summarizes the > top-level > > > segments in the data source > > > > > > $VERSION/segment/$SEGMENT_ID - a segment query; used to get > detailed > > > information about the identified segment > > > > > > $VERSION/features - the feature filter query URL. 
Features are > > > locatable annotations or experimental results. The feature > filter > > > URL supports query parameters to select a subset of the features > > > based on position, feature type and other properties. > > > > > > $VERSION/feature/$FEATURE_ID - a feature query; used to get > detailed > > > information about the identified feature > > > > > > $VERSION/types - the types query URL which returns a list of all > > > feature types. Feature types include ontology and depiction > > > details for all features of the given type. > > > > > > $VERSION/type/$TYPE_ID - details about the specified feature type > > > > > > Oh, and there are internal conflicts which will be straightened > > > out in the next draft. These shouldn't be big. > > > > > > Andrew > > > dalke at dalkescientific.com > > > > > > _______________________________________________ > > > DAS2 mailing list > > > DAS2 at portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/das2 > > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/das2 > From allenday at ucla.edu Sat Feb 4 10:43:10 2006 From: allenday at ucla.edu (Allen Day) Date: Sat, 4 Feb 2006 02:43:10 -0800 (PST) Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: Message-ID: There is a database server down, which is why I haven't posted the new code to /codesprint yet. Hopefully it will be back online tomorrow. However, on my dev box I was able to make the server code serve up almost all of what is described in Andrew's new_spec.txt file. The large remaining problems are: * Properties ( elements ). I still don't fully understand how these work, if the previous implementation continues to be valid, or if the implementation has been invalidated by the new document. * Alternate default Content-Type header for the same command, e.g. 
  /sequence/.../segment       # Content-Type: application/x-das-blah+xml
  /sequence/.../segment/chrM  # Content-Type: text/x-fasta

This is an artifact of an earlier design decision that assumed
Content-Type had a single default and would only be modified if a
?format= parameter was passed. This is difficult to fix properly, so
right now the fasta is served up under the XML Content-Type.

-Allen

On Wed, 1 Feb 2006, Allen Day wrote:

> Okay, I will tag the current server and leave it at:
>
> http://das.biopackages.net/das
>
> I saw in the most recent commits by Andrew that the root-level "/das" is
> no longer needed, so I propose putting an updated server at:
>
> http://das.biopackages.net/codesprint
>
> If we're going to keep the current server in a "maintained but deprecated"
> mode like this, I'll be making changes to the "new" server before Friday.
>
> When the new version of IGB comes out we can then upgrade the current
> server.
>
> Sound good?
>
> -Allen
>
>
> On Wed, 1 Feb 2006, Helt,Gregg wrote:
>
> > Yes, what Ed said, that's what I meant. Updated server, but at a
> > different address. Otherwise the current release of IGB will break when
> > trying to use the biopackages server.
> >
> > Once our IGB code has caught up to the updated server, we can roll out a
> > new release to point to the new server instead of the old one. But not
> > yet.
> >
> > Thanks,
> > Gregg
> >
> > > -----Original Message-----
> > > From: das2-bounces at portal.open-bio.org
> > [mailto:das2-bounces at portal.open-
> > > bio.org] On Behalf Of Ed Erwin
> > > Sent: Wednesday, February 01, 2006 3:45 PM
> > > To: Allen Day
> > > Cc: DAS/2
> > > Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities
> > >
> > >
> > > Gregg asked me to say "No". Please do not break the current server
> > that
> > > IGB is using.
> > >
> > > Please make your changes on a server at a different URL.
> > > > > > Thanks > > > Ed > > > > > > Allen Day wrote: > > > > That's what I was thinking too, but I was worried about the existing > > > > Genoviz clients "in the wild" having the server suddenly break. > > > > > > > > So you're saying it's okay with you if those clients have a service > > > > interruption? > > > > > > > > -Allen > > > > > > > > > > > > On Wed, 1 Feb 2006, Helt,Gregg wrote: > > > > > > > > > > > >>That would be great if you could update the biopackages server > > before > > > >>the code sprint starts! Then client implementers will have a server > > to > > > >>test with. > > > >> > > > >> thanks, > > > >> gregg > > > >> > > > >> > > > >>>-----Original Message----- > > > >>>From: das2-bounces at portal.open-bio.org > > > >> > > > >>[mailto:das2-bounces at portal.open- > > > >> > > > >>>bio.org] On Behalf Of Allen Day > > > >>>Sent: Wednesday, February 01, 2006 2:42 PM > > > >>>To: Andrew Dalke > > > >>>Cc: DAS/2 > > > >>>Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities > > > >>> > > > >>>I just looked over your changes, and will begin making the changes > > to > > > >> > > > >>the > > > >> > > > >>>server repository today. > > > >>> > > > >>>I'd like to update the server at das.biopackages.net with my > > changes > > > >> > > > >>on > > > >> > > > >>>Friday, unless there are objections. > > > >>> > > > >>>I'll be taking notes along the way and will post to the list if > > > >> > > > >>anything > > > >> > > > >>>in your document is unclear to me. > > > >>> > > > >>>At first glance, I agree -- the changes are minor. > > > >>> > > > >>>-Allen > > > >>> > > > >>> > > > >>>On Mon, 30 Jan 2006, Andrew Dalke wrote: > > > >>> > > > >>> > > > >>>>Allen: > > > >>>> > > > >>>>>Is the spec going to be in a stable state for the code sprint? > > > >> > > > >>I'd > > > >> > > > >>>>>like > > > >>>>>to use this time to sync the server implementation with a stable > > > >>>>>version > > > >>>>>of the spec. 
It looks like there have been many substantial > > > >> > > > >>changes. > > > >> > > > >>>>I have just (within the last few minutes) completed the first > > draft > > > >>>>of the update of the spec. > > > >>>> > > > >>>>It's not in HTML - that calls for too much work for this stage. > > > >>>>It's text, in CVS under das/das2/new_spec.txt > > > >>>> > > > >>>>There are many parts which need clarification. These are marked > > > >>>>with a "XXX" along with my comments. > > > >>>> > > > >>>>The RNC files are in > > > >>>> > > > >>>> das/das2/scratch/*.rnc > > > >>>>along with some test XML files. These XML files are not meant > > > >>>>to be realistic. They are meant more to check edge cases. > > > >>>> > > > >>>>I do no think there are major changes to the spec. Most of the > > > >>>>changes have actually trimmed things down, like getting rid of > > > >>>>the "properties" subtree and merging the different "sources" > > > >> > > > >>requests > > > >> > > > >>>>into a single document. > > > >>>> > > > >>>> > > > >>>>Here are the major interfaces > > > >>>> > > > >>>>$PREFIX/sequence - a "sources" request > > > >>>> This is the top-level entry point to a DAS 2 server. It > > returns > > > >> > > > >>a > > > >> > > > >>>> list of the available genomic sequence and their versions. > > > >>>> [sequence-namespace] > > > >>>> > > > >>>>$PREFIX/sequence/$SOURCE - a "source" request > > > >>>> Returns the available versions of the given genomic sequence. > > > >>>> > > > >>>>$PREFIX/sequence/$SOURCE/$VERSION - a "versioned source" request > > > >>>> Returns information about a given version of a genomic > > sequence. > > > >>>> Clients may assume that the sequence and assembly are constant > > > >> > > > >>for a > > > >> > > > >>>> given version of a source. Note that annotation data on a > > server > > > >>>> with curational write-back support may change without changing > > > >> > > > >>the > > > >> > > > >>>> version. 
> > > >>>> > > > >>>> > > > >>>>For a given version here are the sub-parts. Note that I've gone > > > >> > > > >>ahead > > > >> > > > >>>>and split the query urls (segment, features and types each have > > > >> > > > >>query > > > >> > > > >>>>interfaces) from the base directory used as containers for the > > > >> > > > >>segments, > > > >> > > > >>>>features and types. > > > >>>> > > > >>>> $VERSION/segments - the segments query URL; summarizes the > > > >> > > > >>top-level > > > >> > > > >>>> segments in the data source > > > >>>> > > > >>>> $VERSION/segment/$SEGMENT_ID - a segment query; used to get > > > >> > > > >>detailed > > > >> > > > >>>> information about the identified segment > > > >>>> > > > >>>> $VERSION/features - the feature filter query URL. Features are > > > >>>> locatable annotations or experimental results. The feature > > > >> > > > >>filter > > > >> > > > >>>> URL supports query parameters to select a subset of the > > features > > > >>>> based on position, feature type and other properties. > > > >>>> > > > >>>> $VERSION/feature/$FEATURE_ID - a feature query; used to get > > > >> > > > >>detailed > > > >> > > > >>>> information about the identified feature > > > >>>> > > > >>>> $VERSION/types - the types query URL which returns a list of all > > > >>>> feature types. Feature types include ontology and depiction > > > >>>> details for all features of the given type. > > > >>>> > > > >>>> $VERSION/type/$TYPE_ID - details about the specified feature > > type > > > >>>> > > > >>>>Oh, and there are internal conflicts which will be straightened > > > >>>>out in the next draft. These shouldn't be big. 
> > > >>>> > > > >>>> Andrew > > > >>>> dalke at dalkescientific.com > > > >>>> > > > >>>>_______________________________________________ > > > >>>>DAS2 mailing list > > > >>>>DAS2 at portal.open-bio.org > > > >>>>http://portal.open-bio.org/mailman/listinfo/das2 > > > >>>> > > > >>> > > > >>>_______________________________________________ > > > >>>DAS2 mailing list > > > >>>DAS2 at portal.open-bio.org > > > >>>http://portal.open-bio.org/mailman/listinfo/das2 > > > >> > > > > _______________________________________________ > > > > DAS2 mailing list > > > > DAS2 at portal.open-bio.org > > > > http://portal.open-bio.org/mailman/listinfo/das2 > > > _______________________________________________ > > > DAS2 mailing list > > > DAS2 at portal.open-bio.org > > > http://portal.open-bio.org/mailman/listinfo/das2 > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From allenday at ucla.edu Mon Feb 6 07:13:59 2006 From: allenday at ucla.edu (Allen Day) Date: Sun, 5 Feb 2006 23:13:59 -0800 (PST) Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> Message-ID: Okay folks, an implementation of the document cited below is available here: http://das.biopackages.net/codesprint http://das.biopackages.net/codesprint/sequence http://das.biopackages.net/codesprint/sequence/yeast/S228C/segment etc. After looking closely over this first draft of new_spec.txt, it's apparent that there are still some holes, e.g. what should the response to the following requests look like? 
  http://das.biopackages.net/codesprint/sequence/yeast
  http://das.biopackages.net/codesprint/sequence/yeast/S228C

For now I have left responses the same as in the old HTML version of the
spec.

Of course if you find bugs, let me know.

The server at:

  http://das.biopackages.net/das

is currently unavailable. This is due to limitations in Apache/mod_perl
that won't allow different versions of the same class to coexist in a
family of processes. I'd like to discuss how we should handle this in the
conference call tomorrow (today, if you're not in GMT+8).

-Allen

On Mon, 30 Jan 2006, Andrew Dalke wrote:

> Allen:
> > Is the spec going to be in a stable state for the code sprint? I'd
> > like
> > to use this time to sync the server implementation with a stable
> > version
> > of the spec. It looks like there have been many substantial changes.
>
> I have just (within the last few minutes) completed the first draft
> of the update of the spec.
>
> It's not in HTML - that calls for too much work for this stage.
> It's text, in CVS under das/das2/new_spec.txt
>
> There are many parts which need clarification. These are marked
> with a "XXX" along with my comments.
>
> The RNC files are in
>
> das/das2/scratch/*.rnc
> along with some test XML files. These XML files are not meant
> to be realistic. They are meant more to check edge cases.
>
> I do no think there are major changes to the spec. Most of the
> changes have actually trimmed things down, like getting rid of
> the "properties" subtree and merging the different "sources" requests
> into a single document.
>
>
> Here are the major interfaces
>
> $PREFIX/sequence - a "sources" request
> This is the top-level entry point to a DAS 2 server. It returns a
> list of the available genomic sequence and their versions.
> [sequence-namespace]
>
> $PREFIX/sequence/$SOURCE - a "source" request
> Returns the available versions of the given genomic sequence.
> > $PREFIX/sequence/$SOURCE/$VERSION - a "versioned source" request > Returns information about a given version of a genomic sequence. > Clients may assume that the sequence and assembly are constant for a > given version of a source. Note that annotation data on a server > with curational write-back support may change without changing the > version. > > > For a given version here are the sub-parts. Note that I've gone ahead > and split the query urls (segment, features and types each have query > interfaces) from the base directory used as containers for the segments, > features and types. > > $VERSION/segments - the segments query URL; summarizes the top-level > segments in the data source > > $VERSION/segment/$SEGMENT_ID - a segment query; used to get detailed > information about the identified segment > > $VERSION/features - the feature filter query URL. Features are > locatable annotations or experimental results. The feature filter > URL supports query parameters to select a subset of the features > based on position, feature type and other properties. > > $VERSION/feature/$FEATURE_ID - a feature query; used to get detailed > information about the identified feature > > $VERSION/types - the types query URL which returns a list of all > feature types. Feature types include ontology and depiction > details for all features of the given type. > > $VERSION/type/$TYPE_ID - details about the specified feature type > > Oh, and there are internal conflicts which will be straightened > out in the next draft. These shouldn't be big. 
>
> Andrew
> dalke at dalkescientific.com
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2
>

From dalke at dalkescientific.com Mon Feb 6 11:33:34 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Feb 2006 11:33:34 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To:
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
  <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
  <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
  <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
  <17368.18146.195226.166165@kinked.lbl.gov>
  <17369.24994.880706.685148@kinked.lbl.gov>
  <24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
Message-ID:

Allen:
> After looking closely over this first draft of new_spec.txt, it's
> apparent
> that there are still some holes, e.g. what should the response to the
> following requests look like?
>
> http://das.biopackages.net/codesprint/sequence/yeast

 taxon="Yeast">

> http://das.biopackages.net/codesprint/sequence/yeast/S228C

The same for this case. There is only one VERSION for "yeast".

Your XML, btw, starts

The "standalone" means that the DTD may affect the content of the
document.

http://www.stylusstudio.com/w3c/xml11/sec-rmd.htm
> Markup declarations can affect the content of the document, as passed
> from an XML Processor to an application; examples are attribute
> defaults and entity declarations. The standalone document declaration,
> which MAY appear as a component of the XML declaration, signals
> whether or not there are such declarations which appear external to
> the Document Entity or in parameter entities. An external markup
> declaration is defined as a markup declaration occurring in the
> external subset or in a parameter entity (external or internal, the
> latter being included because non-validating processors are not
> required to read them).

For what we're doing, we don't need nor (I think) want that. There's
no reason for a client to consult the DTD to figure out the XML.

Instead, use and probably have the encoding That also means you can
get rid of the statements.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Mon Feb 6 12:02:40 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Feb 2006 12:02:40 +0000
Subject: [DAS2] timezone change
Message-ID: <6c3ddd6d7dc01dc99f2e1e932e64e733@dalkescientific.com>

To make it easier for Thomas' Java library, the timezone in the
datestamps may also be of the form "0500". Here are the valid forms
and new examples

  TZD = time zone designator (optional; one of the formats
        "Z", +hh:mm, +hhmm, -hh:mm, or -hhmm)

  1959-21-52T09:35+0300
  2042-03-18T01:19:00-11:15

Andrew
dalke at dalkescientific.com

From dhoworth at mrc-lmb.cam.ac.uk Mon Feb 6 12:12:52 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Mon, 06 Feb 2006 12:12:52 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To:
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
  <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
  <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
  <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
  <17368.18146.195226.166165@kinked.lbl.gov>
  <17369.24994.880706.685148@kinked.lbl.gov>
  <24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
Message-ID: <43E73D44.5020107@mrc-lmb.cam.ac.uk>

Andrew Dalke wrote:
> That also means you can get rid of the
>

Doing that automatically invalidates the document does it not?

http://www.w3.org/TR/2004/REC-xml-20040204/#NT-prolog

"Definition: An XML document is valid if it has an associated document
type declaration and if the document complies with the constraints
expressed in it.

The document type declaration MUST appear before the first element in
the document."
Cheers, Dave From dalke at dalkescientific.com Mon Feb 6 13:42:03 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 6 Feb 2006 13:42:03 +0000 Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: <43E73D44.5020107@mrc-lmb.cam.ac.uk> References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> <43E73D44.5020107@mrc-lmb.cam.ac.uk> Message-ID: <57bc93f8acc736e752d048d970b1f332@dalkescientific.com> Dave Howorth: > Doing that automatically invalidates the document does it not? > > http://www.w3.org/TR/2004/REC-xml-20040204/#NT-prolog > > "Definition: An XML document is valid if it has an associated document > type declaration and if the document complies with the constraints > expressed in it. > > The document type declaration MUST appear before the first element in > the document." I think this page summarizes it nicely: http://www.xml.com/lpt/a/2002/09/04/xslt.html "Valid" is a technical term referring to the presence of and conformance to a DOCTYPE declaration. XML documents w/o a DTD are "well-formed". XML documents with a DTD and which agree with the DTD are "valid". In this case not being "valid" does not mean that the document is "invalid XML". As I understand things, it's perfectly fine to pass well-formed but not valid XML documents around. 
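
The distinction can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the element names and
both sample documents are made up, not taken from the DAS2 spec, and
the standard-library xml.etree.ElementTree parser is non-validating,
so it checks well-formedness only and never consults a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents (not from the DAS2 spec).
# WELL_FORMED has no DOCTYPE, so it cannot be "valid" in the XML 1.0
# sense, but every tag is properly nested and closed.
WELL_FORMED = b'<?xml version="1.0" encoding="UTF-8"?><SOURCES><SOURCE id="yeast"/></SOURCES>'
# BROKEN leaves <SOURCE> unclosed, so it is not even well-formed.
BROKEN = b'<SOURCES><SOURCE></SOURCES>'


def is_well_formed(data):
    """Return True if the bytes parse as XML; no DTD is consulted."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed(WELL_FORMED))
    print(is_well_formed(BROKEN))
```

A client that only needs to read the data can stop at this check;
anything stricter needs a schema of some kind on top.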
Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Feb 6 13:53:10 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 6 Feb 2006 13:53:10 +0000 Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> Message-ID: <3a3400e925dccf8583a5b47104e43766@dalkescientific.com> Trying out Allen's XML > > > > > xmlns:xlink="http://www.w3.org/1999/xlink" > xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/"> > The xmlns is needed, else "SOURCES" is in the unnamed namespace, rather than the DAS2 namespace. It looks like your XSLT might not declare the namespace? I can't find the document to check, at either of http://das.biopackages.net/xsl/das.xsl http://radius.genomics.ctrl.ucla.edu/xsl/das.xsl The page at http://www.xml.com/pub/a/2001/04/04/trxml/ describes a bit on how to include namespace in your xslt > > > xmlns:xlink="http://www.w3.org/1999/xlink" > version="1.0"> > > > > > > > > > > > > Note the use of the "xlink:" namespace abbreviation. 
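
To see why the default xmlns matters to a client, here is a small
Python sketch. The namespace URI below is a placeholder chosen for
illustration (an assumption, not the real DAS2 namespace), and the
one-element documents are invented; only the standard-library
ElementTree behaviour is relied on.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI; the real DAS2 namespace is not assumed here.
EXAMPLE_NS = "http://example.org/das2"

WITH_NS = '<SOURCES xmlns="http://example.org/das2"/>'
WITHOUT_NS = '<SOURCES/>'


def root_tag(text):
    """Return the root element name as the parser reports it."""
    return ET.fromstring(text).tag


if __name__ == "__main__":
    # With the default namespace declared, the name is qualified:
    print(root_tag(WITH_NS))     # {http://example.org/das2}SOURCES
    # Without it, the element sits in no namespace at all:
    print(root_tag(WITHOUT_NS))  # SOURCES
```

A namespace-aware client looking for the qualified name will not
match the second document, even though the two look almost identical.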

Andrew
dalke at dalkescientific.com

From dhoworth at mrc-lmb.cam.ac.uk Mon Feb 6 14:27:34 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Mon, 06 Feb 2006 14:27:34 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <57bc93f8acc736e752d048d970b1f332@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
  <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
  <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
  <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
  <17368.18146.195226.166165@kinked.lbl.gov>
  <17369.24994.880706.685148@kinked.lbl.gov>
  <24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
  <43E73D44.5020107@mrc-lmb.cam.ac.uk>
  <57bc93f8acc736e752d048d970b1f332@dalkescientific.com>
Message-ID: <43E75CD6.7000909@mrc-lmb.cam.ac.uk>

Andrew Dalke wrote:
> Dave Howorth:
>> Doing that automatically invalidates the document does it not?
>>
>> http://www.w3.org/TR/2004/REC-xml-20040204/#NT-prolog
>>
>> "Definition: An XML document is valid if it has an associated document
>> type declaration and if the document complies with the constraints
>> expressed in it.
>>
>> The document type declaration MUST appear before the first element in
>> the document."
>
> I think this page summarizes it nicely:
> http://www.xml.com/lpt/a/2002/09/04/xslt.html
>
> "Valid" is a technical term referring to the presence
> of and conformance to a DOCTYPE declaration.

I think that's a paraphrase of the first para I quoted above?

> XML documents w/o a DTD are "well-formed". XML documents
> with a DTD and which agree with the DTD are "valid".
>
> In this case not being "valid" does not mean that the
> document is "invalid XML".

No, I believe you're wrong there; 'not valid' and 'invalid' have the
same meaning both colloquially and as used in the spec. It's either
valid or it isn't, and if it isn't then it's invalid.

> As I understand things, it's perfectly fine to pass well-formed
> but not valid XML documents around.
I don't agree. There are occasions when it is acceptable but it's generally bad practice, IMHO. The discussion in sec 5 of the spec gives some motivation, particularly this section: http://www.w3.org/TR/REC-xml/#safe-behavior Or look here, or thousands of other places: http://www.online-learning.com/demos/xml/valid_xml.html http://en.wikipedia.org/wiki/Xml#Correctness_in_an_XML_document In particular for interoperability of an open, distributed system with many writers and readers implemented by different groups (i.e. DAS), I suggest validity is essential. I would have expected your experience of the PDB to make you keen on validation :) Cheers, Dave From dalke at dalkescientific.com Mon Feb 6 15:09:58 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 6 Feb 2006 15:09:58 +0000 Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: <43E75CD6.7000909@mrc-lmb.cam.ac.uk> References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> <43E73D44.5020107@mrc-lmb.cam.ac.uk> <57bc93f8acc736e752d048d970b1f332@dalkescientific.com> <43E75CD6.7000909@mrc-lmb.cam.ac.uk> Message-ID: <0aeda19421fdc7c75e2440ad0acd6391@dalkescientific.com> Dave Howorth wrote: > Andrew Dalke wrote: >> I think this page summarizes it nicely: >> http://www.xml.com/lpt/a/2002/09/04/xslt.html >> "Valid" is a technical term referring to the presence >> of and conformance to a DOCTYPE declaration. > > I think that's a paraphrase of the first para I quoted above? It adds the phrase "technical term", making it (in my interpretation) different from the word "valid" in its normal sense. 
> No, I believe you're wrong there; 'not valid' and 'invalid' have the > same meaning both colloquially and as used in the spec. It's either > valid or it isn't, and if it isn't then its invalid. I now agree that in the spec sense "invalid" and "not valid" are the same. I still think it has a technical difference from its normal use. See for example the thread at http://www.stylusstudio.com/xmldev/200411/post50310.html part of which says > >But does it matter if a document is Not valid? > > Not necessarily. It's up to you. Requiring a document to be valid is > a way of putting some constraints on it. If you don't have any such > constraints (unlikely, unless you are writing some very generic > software like an editor), then there's no need for validity. More > likely, not all your constraints can be expressed by a DTD, and you > will need to express them some other way. > > And of course you can require the document to be valid according to > some other kind of schema, such as XML schemas or RelaxNG or > Schematron. >> As I understand things, it's perfectly fine to pass well-formed >> but not valid XML documents around. > > I don't agree. There are occasions when it is acceptable but it's > generally bad practice, IMHO. The discussion in sec 5 of the spec > gives some motivation, particularly this section: > > http://www.w3.org/TR/REC-xml/#safe-behavior > > Or look here, or thousands of other places: > http://www.online-learning.com/demos/xml/valid_xml.html > http://en.wikipedia.org/wiki/Xml#Correctness_in_an_XML_document > > In particular for interoperability of an open, distributed system with > many writers and readers implemented by different groups (i.e. DAS), I > suggest validity is essential. Quoting the wikipedia reference to DTDs: > The oldest schema format for XML is the Document Type Definition > (DTD), inherited from SGML. 
While DTD support is ubiquitous due to its
> inclusion in the XML 1.0 standard, it is seen as limited for the
> following reasons:
> * It has no support for newer features of XML, most importantly
> namespaces.

DAS2 uses namespaces. Hence it cannot use DTDs.

We are defining Relax-NG schemas for the different formats, which can
be used for better validity checking than is supported by DTDs.

  "valid DAS2 document" ::= "meets the DAS2 spec"

"meets the DAS2 spec" is a stricter definition than
  "well-formed XML" + "meets the RNG spec"
which is stricter than
  "well-formed XML" + "meets the (hypothetical namespace-aware) DTD"

> I would have expected your experience of the PDB to make you keen
> on validation :)

Indeed, I'm working on the validator for DAS2, which uses the
Relax-NG schemas. ;)

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Mon Feb 6 16:03:07 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Feb 2006 16:03:07 +0000
Subject: [DAS2] elements
Message-ID: <4157a339b02c085fd6df0f4bf1ff1cc0@dalkescientific.com>

One discussion point from today is the elements. The current draft of
the spec says they look like this

Andreas Prlic pointed out that since the document says the "volvox"
version "1" url is already known ("volvox/1") and the type="segments"
then the query_id can be built from appending "segments" to the
"volvox/1" (plus the "/") to get "volvox/1/segments".

I originally responded from a ReST purity argument, in that URLs
should not be constructed from non-URL data. This lets Thomas, for
example, use GUIDs for the objects rather than the hierarchical
structure I and others recommend.

During discussion a better answer came up, which I think we talked
about earlier but which is worth emphasizing: the "query_id"s don't
need to be on the same server. For example, the "regions" URL may and
likely will point to a common reference server, and a database may
offer only one set of "types" for all of the "features".

That is, something like this

DAS server example.com
  genome A
    version x
      segments at "ensembl.org/das2/genome_A/build_1/segments"
      features at "example.com/A/version_x/features"
      types at "example.com/A/types"
    version y
      segments at "ensembl.org/das2/genome_A/build_1/segments"
      features at "example.com/A/version_y/features"
      types at "example.com/A/types"
    version z
      segments at "ensembl.org/das2/genome_A/build_2/segments"
      features at "example.com/A/version_z/features"
      types at "example.com/A/types"

DAS server biodas.org
  genome A
    version 1
      segments at "ensembl.org/das2/genome_A/build_2/segments"
      features at "example.com/A/1/features"
      types at "example.com/A/types" (note: on other server!)

Andrew
dalke at dalkescientific.com

From Gregg_Helt at affymetrix.com Mon Feb 6 17:13:18 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 09:13:18 -0800
Subject: [DAS2] Agenda for DAS/2 Code Sprint Teleconference 2005-02-06
Message-ID:

Status report
DAS/2 XML - valid or not valid?
CATEGORY elements -- constructing query URLs
MAINTAINER information
Use of xml:base
update on feature properties - searching, etc.

From lstein at cshl.edu Mon Feb 6 18:20:10 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 6 Feb 2006 13:20:10 -0500
Subject: [DAS2] Agenda for DAS/2 Code Sprint Teleconference 2005-02-06
In-Reply-To:
References:
Message-ID: <200602061320.11360.lstein@cshl.edu>

Hi Gregg,

I had a conflicting teleconference and wasn't sure whether there was a
teleconference scheduled for the code sprint, so I didn't dial in. Just
got the agenda now. I am online on both MSN and AOL chats, and will be
all week, if anyone wants to IM me.

Lincoln

On Monday 06 February 2006 12:13, Helt,Gregg wrote:
> Status report
> DAS/2 XML - valid or not valid?
> CATEGORY elements -- constructing query URLs
> MAINTAINER information
> Use of xml:base
> update on feature properties - searching, etc.
> > > > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From dalke at dalkescientific.com Mon Feb 6 18:42:24 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 6 Feb 2006 18:42:24 +0000 Subject: [DAS2] version= Message-ID: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com> If we add a version= field to the Content-Type, or whatever mechanism is proposed Content-Type: application/x-das2features+xml; version=12345 What will a client do when it gets a version number it has never heard of? Should it use the newest version it supports? The oldest? Abort? Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Mon Feb 6 19:50:14 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 06 Feb 2006 11:50:14 -0800 Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint, 6 Feb 2006 Message-ID: Notes from the DAS/2 teleconference for the code sprint, 6 Feb 2006 $Id: das2-teleconf-2006-02-06.txt,v 1.2 2006/02/06 19:57:05 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt Sanger: Andreas Prlic, Thomas Down, Roy Sweden: Andrew Dalke UC Berkeley: Nomi Harris UCLA: Allen Day Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. 
There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Gregg's topics for discussion: * Status report * DAS/2 XML - valid or not valid? * CATEGORY elements -- constructing query URLs * MAINTAINER information * Use of xml:base * update on feature properties - searching, etc. Status Reports - what people are working on for the code sprint ------------------------------------------------------------ andrew - getting folks up to speed on the spec changes, what he wrote. - getting a feel for ensembl schema. - change today: time zone specification b/c td's java time lib did something different than iso did. aday: tag & branch? gh: no branch, maybe tag ad: tagging probably not necessary gh: brings up a related issue: what is our mechanism for versioning - client & spec to understand which version of the spec they are/should be implementing - can talk about it later during the xml validation issue discussion ap: [missed it -- sorry!] td: java om, feature xml done, can read and write. roy: zmap das2 client, read/write das2, written in C. working with ed griffith who's not available this week. currently just a reader. from james gilbert, based on fmap from Acedb gh: updating client and server (mostly client). top down syncing in parallel, one command at a time. sources request is working on both sides. will start w/ allen's server today, doing gh's sources query against allen's server. segments and types today. nh: apollo das2 client. reads das2 xml from andrew's example, write out features in das2, now working on get, testing with server. sc: affy das2 server stuff. streamlining updating it with feature data from UCSC. also working on updating exon array data for use in IGB client. working w/ gregg on other server-related work. gh: graph data as well. ee: working on igb client. talk w/ gregg later to get specifics. 
gh: lots of ui stuff Topic: xml validation --------------------- ad: dtd's don't support namespaces, so we can't support dtds gh: not that simple. where do we add namespaces? ad: schemas have ns's testing.... gh: concern #1 is one of perception. don't like telling people we don't have valid xml ad: only means supports the dtd, not in human sense. gh: it's one of perception td: self-contained document + validation gh: getting rid of doctype declaration is issue of versioning. how will client know which version of spec it's supposed to be implementing? need to deal with spec crawl. The only way i'm aware of is via looking at dtd pointer changing. gh: not worried about new categories, but changing things like optional vs req'd attributes/elements. ad: content-type contains version td: or content negotiation ap: xml schema validator at w3c.org can use that and claim it is valid. can upload your files, push a button. ad: I have an extension of properties with arbitrary binary data vs text vs href. this is ok with relaxng, not by xsd. ad: we could say what is valid das2 since we're the arbiters of what is valid das xml document. e.g., well-formed, validates against the rng schemas gh: the rng we now have allows arbitrary xml? ad: yes. can say there are arbitrary elements under some node. checked in as file named common.rnc gh: ok, getting rid of requirement for doctype declaration. any versioning is done via content-type gh: if we don't do content neg, a sources query goes out, whatever version that the server supports comes back. this will be the latest version of the spec the server supports. ad: for backwards compatibility that won't be needed. extensibility will be sufficient for a few years. gh: don't believe it. td: spec is churning fast now. there'll be less churn once there are impls. gh: there were impls 3 or 4 mos ago (allen, gregg). so there has been plenty of churn even with impls. so we'll need versioning, ok on content-type.
aday: we definitely need versioning. need it now. also want a tagged version we can work on at the same time. ad: content-type-xdas;version=1.1 in general not the right solution (not general purpose), but for this case, makes sense. aday: can impl, header says 1.1 gh/ad: contents are a subset of the specification. so it's tied to a version of the rng schema. ad: the tag will be the cvs revision # gh: this isn't temporary; there will not be a time when we are not generating churn. ad: believes this is temporary, won't have to have it long-term aday: no mechanism for it now. ad: need a way to turn it into meaning. agreement on what string means which version of a program. nh: second gregg. will always be an issue. ad says it's not good long-term, maybe we should come up with it. gh: we have some basis to go forward. [A] das/2 server will specify spec version via content-type-xdas;version=X.X Topic: category elements, how to construct a query url ------------------------------------------------------ ad: what is syntax of string used to specify ontology? SO:? aday: attribute for it gh: ontol term is a uri aday: type element has ontology gh: id of type is not nec an ontol term ad: the attrib of feat type, ontol=something gh: that's a uri, abs or rel point to a frag in so/fa ontol ad: can't find how this should look. said SO:0000001. that should be a uri? gh: yes. in types xml that's returned, id and ontol are uris. a server will pick one for its xml base. the other will have to be a full uri. ad: how do diff clients know a given term corresponds to what term in the ontol? gh: they will have to understand sofa/so. ad: do they have persistent ids? gh: my understanding is that they can use fragment notation for a stable url for the term aday: ontol docs aren't xml, no anchors for pointing to a fragment. they're their own format. nervous about building dependency on fragment record uris into our system gh: good point.
would be happier if it was recast as xml aday: is now pointing to an xml document for ontology nodes ad: happier if we could use "SO:xxx" i.e., a urn gh: would like a re-cast as xml document, hosted at so/sofa website. that xml would be like a std ontology representation so you could extend it. so someone could point to an extension of it. Category elements -- constructing query URLs -------------------------------------------- gh: andreas' point (email): query id attribute, constructing these out of relative uri, or based on base uri. agree with andreas: we know what those will be. for clarity of spec, we should specify: here's base uri, here's how you construct the segments query, etc. ad: trouble for segments- could be on ref server gh: doubt that people will impl this way. will be specific to server and will be related to everyone else's notion of chromosomes and assemblies. ad: where does the distributed nature of das come from? ref server gh: das/1: ref server has residues to serve, regions (entry pts) served up by everyone. this was the notion of ref vs non-ref server to carry forward. non-ref server still serves up segments. will have segments in it's reference space. reference would be genome assembly version + organism. sufficient to globally identify it. ap: had discussions about this. query id td: issue comes from seqs being urls rather than opaque ids in a ns defined by coord system. have a set of servers that share common coord syst. then a seq identified by stringx on one server is same as on the other server. the remaining q: server that doesn't want to serve up seqs, what urls does it use? can it use an opaque seq name that is known by that name of ref server? gh: restating concerns here: using query string to construct uri's 1. confusion: arbitrary uri means more confusing spec, and how to implement it (can't just say /segment, but 'whatever is pointed at by such and such uri') 2. size of documents. 
right now, can use same xml:base for features document, can make feat ids and location id relative to it, nice and short. if seg is on other server, need to expand one of the ids. compresses well, but that will take longer than transmission. this is only for features xml. can use coords or assembly info to determine identity between urls. want a defined ns. ad: you want a way to say: these are relative urls to a base url for that data type. so that this type url is relative to some base url for types, similar for segments, features. gh: we have this now, can be relative or absolute ad: there is a default xml base like thing: one for type, segment, features. so you could have relative ids to those bases. gh: possibly, but not ideal. It's better to use a std xml base for all of them. each server has its own unique uris for segments. I'm proposing that we decouple segments from residues, so that having segments doesn't mean we can serve residues. reasoning: - this leads to smaller xml docs - simplifies the spec if we didn't have to construct query ids from category element would rather specify the string that's appended in the spec. sc: could deal with this issue by adding structure to the document in order to add different xml:bases for different data types. e.g., use different parent elements that could define their own xml:bases, one for types, segments, and features. might complicate the spec tho. ad: single genome have same types across all dbs. gh: across servers, dangerous. ad/td: globally unique ids, could have everything in the same directory. td: can we just use seq/name, type/name. i.e., codifying what the convention now is. ad: name is put at end of base url a feature document may give types, segments, other features. td: just use simple strings, not urls. gh: std uri syntax isn't important, but a std query mechanism to get all of these is. some uri you put a '/types' on or a '/segments'. ad: you have this right now.
gh: but it's only defined for a server, not the whole spec. there's no where in the spec that says this. confusing for people reading/implementing the spec. ap: If you make it free text, you don't know what to put for a given server? ad: you get a document ap: I already know the server, not necessarily a document. ad: taking out the mention of any hierarchy, just refer to things as feat query url. [note taker is having trouble following the thread of this discussion.] gh: let's sleep on it, discuss tomorrow, vote then. From nomi at fruitfly.org Mon Feb 6 20:49:51 2006 From: nomi at fruitfly.org (Nomi Harris) Date: Mon, 6 Feb 2006 12:49:51 -0800 (PST) Subject: New DAS/2 server for codesprint [was Re: [DAS2] Re: Apollo and DAS/2 priorities] In-Reply-To: References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> Message-ID: <17383.46703.563017.422300@kinked.lbl.gov> thanks for setting up the new das/2 server, allen. i'm having trouble with some of the queries. On 5 February 2006, Allen Day wrote: > Okay folks, an implementation of the document cited below is available > here: > > http://das.biopackages.net/codesprint I get "Internal Server Error" > http://das.biopackages.net/codesprint/sequence > http://das.biopackages.net/codesprint/sequence/yeast/S228C/segment these both work. 
> http://das.biopackages.net/codesprint/sequence/yeast > http://das.biopackages.net/codesprint/sequence/yeast/S228C for these i get Error loading stylesheet: A network error occured loading an XSLT stylesheet: http://das.biopackages.net/xsl/das.xsl i'm running firefox on mozilla, so i'm not surprised when it has problems with stylesheets, but i used to be able to get data from the old das/2 server, even though it did have some complaint about not finding the stylesheet. http://das.biopackages.net/codesprint/sequence/human/17/feature churned forever (or, at least, for several minutes--maybe it will eventually return). Nomi From nomi at fruitfly.org Mon Feb 6 22:34:30 2006 From: nomi at fruitfly.org (Nomi Harris) Date: Mon, 6 Feb 2006 14:34:30 -0800 (PST) Subject: [DAS2] Re: New DAS/2 server for codesprint In-Reply-To: <17383.46703.563017.422300@kinked.lbl.gov> References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> <17383.46703.563017.422300@kinked.lbl.gov> Message-ID: <17383.52982.274142.351003@kinked.lbl.gov> On 6 February 2006, Nomi Harris wrote: > thanks for setting up the new das/2 server, allen. i'm having trouble > with some of the queries. 
ok, i realized that some of the queries i was trying were senseless, but here are some that should work that are just hanging: http://das.biopackages.net/codesprint/sequence/yeast/S228C/segments http://das.biopackages.net/codesprint/sequence/yeast/S228C/types Nomi From allenday at ucla.edu Mon Feb 6 21:53:34 2006 From: allenday at ucla.edu (Allen Day) Date: Mon, 6 Feb 2006 13:53:34 -0800 (PST) Subject: New DAS/2 server for codesprint [was Re: [DAS2] Re: Apollo and DAS/2 priorities] In-Reply-To: <17383.46703.563017.422300@kinked.lbl.gov> References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> <17383.46703.563017.422300@kinked.lbl.gov> Message-ID: On Mon, 6 Feb 2006, Nomi Harris wrote: > thanks for setting up the new das/2 server, allen. i'm having trouble > with some of the queries. > > On 5 February 2006, Allen Day wrote: > > Okay folks, an implementation of the document cited below is available > > here: > > > > http://das.biopackages.net/codesprint > I get "Internal Server Error" That's to be expected. The spec does not specify what the response to this request should be, or if it is even valid -- so I didn't implement it. > > http://das.biopackages.net/codesprint/sequence > > http://das.biopackages.net/codesprint/sequence/yeast/S228C/segment > these both work. > > > http://das.biopackages.net/codesprint/sequence/yeast > > http://das.biopackages.net/codesprint/sequence/yeast/S228C > for these i get > Error loading stylesheet: A network error occured loading an XSLT stylesheet: > http://das.biopackages.net/xsl/das.xsl This happens if you're browsing the URLs in a web browser that supports xsl directives. 
Previous versions of the server supported web browsers, but at the cost of using a 'text/xml' Content-Type header. Consensus in the group was that web browsers are not a target platform, so this feature no longer works -- so you won't be able to view the DAS2XML in your browser anymore. I just haven't removed the XSL references yet. > i'm running firefox on mozilla, so i'm not surprised when it has problems > with stylesheets, but i used to be able to get data from the old das/2 > server, even though it did have some complaint about not finding the > stylesheet. > > http://das.biopackages.net/codesprint/sequence/human/17/feature The server is coded to throw an error if you ask for all features, so I'm surprised it didn't just give you a 4xx or 5xx response. I'll look into it. > churned forever (or, at least, for several minutes--maybe it will > eventually return). > > Nomi > From allenday at ucla.edu Mon Feb 6 22:00:50 2006 From: allenday at ucla.edu (Allen Day) Date: Mon, 6 Feb 2006 14:00:50 -0800 (PST) Subject: [DAS2] Re: New DAS/2 server for codesprint In-Reply-To: <17383.52982.274142.351003@kinked.lbl.gov> References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> <17383.46703.563017.422300@kinked.lbl.gov> <17383.52982.274142.351003@kinked.lbl.gov> Message-ID: Hi Nomi, I just restarted the server, the "all features" request used all the memory and hung the webserver. I'll look into why that request wasn't immediately denied as it used to be. As for your .../segments and .../types, they should be .../segment and .../type. I see no reason to pluralize these URLs given that the sources response allows me to provide them at any arbitrary URL: [...] [...] 
-Allen On Mon, 6 Feb 2006, Nomi Harris wrote: > On 6 February 2006, Nomi Harris wrote: > > thanks for setting up the new das/2 server, allen. i'm having trouble > > with some of the queries. > > ok, i realized that some of the queries i was trying were senseless, but > here are some that should work that are just hanging: > http://das.biopackages.net/codesprint/sequence/yeast/S228C/segments > http://das.biopackages.net/codesprint/sequence/yeast/S228C/types > > Nomi > From Steve_Chervitz at affymetrix.com Mon Feb 6 22:27:01 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 06 Feb 2006 14:27:01 -0800 Subject: [DAS2] version= In-Reply-To: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com> Message-ID: Andrew Dalke wrote on Mon, 6 Feb 2006 18:42:24 > > If we add a version= field to the Content-Type, or whatever > mechanism is proposed > > Content-Type: application/x-das2features+xml; version=12345 > > What will a client do when it gets a version number it has > never heard of? Should it use the newest version it supports? > The oldest? Abort? Rather than have version data be something that the client has to discover in the response, and then have to react to in some intelligent way, how about adding an optional dasversion field to all requests, such as: http://www.wormbase.org/das/genome/volvox/1/type?dasversion=1.1 The server would then either: 1) return the appropriate response document if the server supports the requested version or a later version that is backward compatible with it, or 2) return a 505 error 'DAS Version Not Supported', which we already have in the spec. This puts the onus on the server rather than the client, but I think it would be less trouble on the server than the alternative scheme would be for the client. The client can now be fairly dumb about versioning and assume everything is kosher unless it gets an error.
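The dasversion scheme proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not part of the spec: the names SERVER_VERSION, COMPATIBLE_VERSIONS, parse_version and check_dasversion are hypothetical, the version numbers are made up, and answering a malformed version string with 400 is a guess that the thread does not settle. Versions are parsed into integer tuples so that they compare numerically (1.10 sorts after 1.9, which a float comparison would get wrong).

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical values: the spec version this example server implements, and
# the request versions it considers itself backward compatible with.
SERVER_VERSION = (1, 2)
COMPATIBLE_VERSIONS = {(1, 0), (1, 1), (1, 2)}


def parse_version(text):
    """Turn a string such as '1.1' into a tuple of ints such as (1, 1)."""
    return tuple(int(part) for part in text.split("."))


def check_dasversion(url):
    """Return (status, version_served) for a request URL.

    200 when no dasversion is given or the requested version is supported,
    505 when the requested version is not supported,
    400 when the version string cannot be parsed (an assumption).
    """
    values = parse_qs(urlsplit(url).query).get("dasversion")
    if not values:
        # The field is optional: serve whatever the server implements.
        return 200, SERVER_VERSION
    try:
        requested = parse_version(values[0])
    except ValueError:
        return 400, None
    if requested in COMPATIBLE_VERSIONS:
        return 200, SERVER_VERSION
    return 505, None
```

With these made-up values, the request http://www.wormbase.org/das/genome/volvox/1/type?dasversion=1.1 would be served, while dasversion=2.0 would get the 505 response, so the client only has to handle the error case.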
We could put some of the onus for DAS version support on the revisers of the spec: When a new version of the spec is released, we'll know right then what parts will be backward compatible and what parts will not be. The reviser could document whether the new version of the spec is backwards compatible with which previous versions, with the appropriate level of granularity (e.g., "all requests are backward compatible except for the types request"). This would serve as a guide for maintainers of das2 servers. Thoughts? Steve From nomi at fruitfly.org Mon Feb 6 23:41:23 2006 From: nomi at fruitfly.org (Nomi Harris) Date: Mon, 6 Feb 2006 15:41:23 -0800 (PST) Subject: [DAS2] version= In-Reply-To: References: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com> Message-ID: <17383.56995.914058.889189@kinked.lbl.gov> i think it would be nice to have it work both ways--the version is reported by the server, but the client can also request a particular version as you suggest. whatever we decide on, can we please make the version IDs numerical so that they can be compared easily (e.g. "if (dasversion > 1.3) ...")? Nomi On 6 February 2006, Steve Chervitz wrote: > Andrew Dalke wrote on Mon, 6 Feb 2006 18:42:24 > > > > If we add a version= field to the Content-Type, or whatever > > mechanism is proposed > > > > Content-Type: application/x-das2features+xml; version=12345 > > > > What will a client do when it gets a version number it has > > never heard of? Should it use the newest version it supports? > > The oldest? Abort? 
> > Rather than have version data be something that the client has to discover > in the response, an then have to react to in some intelligent way, how about > adding an optional dasversion field to all requests, such as: > > http://www.wormbase.org/das/genome/volvox/1/type?dasversion=1.1 > > The server would then either: > > 1) return the appropriate response document if the server supports the > requested version or a later version that is backward compatible with it, > or > 2) return a 505 error 'DAS Version Not Supported', which we already have in > the spec. > > This puts the onus on the server rather than the client, but I think it > would be less trouble on the server than the alternative scheme would be for > the client. The client can now be fairly dumb about versioning and assume > everything is kosher unless it gets an error. > > We could put some of the onus for DAS version support on the revisers of the > spec: When a new version of the spec is released, we'll know right then what > parts will be backward compatible and what parts will not be. The reviser > could document whether the new version of the spec is backwards compatible > with which previous versions, with the appropriate level of granularity > (e.g., "all requests are backward compatible except for the types request"). > This would serve as a guide for maintainers of das2 servers. > > Thoughts? > > Steve From ed_erwin at affymetrix.com Mon Feb 6 22:48:49 2006 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Mon, 06 Feb 2006 14:48:49 -0800 Subject: [DAS2] elements In-Reply-To: <4157a339b02c085fd6df0f4bf1ff1cc0@dalkescientific.com> References: <4157a339b02c085fd6df0f4bf1ff1cc0@dalkescientific.com> Message-ID: <43E7D251.8050703@affymetrix.com> Andrew Dalke wrote: > One discussion point from today is the elements. 
> > The current draft of the spec says they look like this > > > > > > > > > > > > > > > Andreas Prlic pointed out that since the document says > the "volvox" version "1" url is already known ("volvox/1") > and the type="segments" then the query_id can be built > from appending "segments" to the "volvox/1" (plus the "/") > to get "volvox/1/segments". > > I originally responded from a ReST purity argument, in that > URLs should not be constructed from non-URL data. This > lets Thomas, for example, use GUIDs for the objects rather > than the hierarchical structure I and others recommend. > > During discussion a better answer came up, which I think > we talked about earlier but which is worth emphasizing > is that the "query_id"s don't need to be on the same server. > > For example, the "regions" URL may and likely will point > to a common reference server, and a database may offer > only one set of "types" for all of the "features". > > That is, something like this > > DAS server example.com > genome A > version x > segments at "ensembl.org/das2/genome_A/build_1/segments" > features at "example.com/A/version_x/features" > types at "example.com/A/types" None of your examples vary the words "segments", "types" or "features", but it is legal to do so, right?: segments at "ensembl.org/das2/genome_A/build_1/segment" features at "example.com/A/version_x/things/and/more/things" types at "example.com/A/rhinoceros" OK, so no one is likely to go that far, but is it legal for example to use non-plural "segment", "feature" and "type" ? 
From ed_erwin at affymetrix.com Mon Feb 6 22:51:11 2006 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Mon, 06 Feb 2006 14:51:11 -0800 Subject: [DAS2] version= In-Reply-To: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com> References: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com> Message-ID: <43E7D2DF.7060507@affymetrix.com> Andrew Dalke wrote: > If we add a version= field to the Content-Type, or whatever > mechanism is proposed > > Content-Type: application/x-das2features+xml; version=12345 > > What will a client do when it gets a version number it has > never heard of? Should it use the newest version it supports? > The oldest? Abort? > It is up to the client to decide what to do, and this does not need to be specified here. From Gregg_Helt at affymetrix.com Mon Feb 6 23:16:35 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 6 Feb 2006 15:16:35 -0800 Subject: [DAS2] RE: New DAS/2 server for codesprint Message-ID: Ack, you're right! I didn't expect to get bitten by rogue query_ids so soon... gregg > -----Original Message----- > From: Nomi Harris [mailto:nomi at fruitfly.org] > Sent: Monday, February 06, 2006 3:48 PM > To: Allen Day > Cc: Helt,Gregg > Subject: Re: New DAS/2 server for codesprint > > On 6 February 2006, Allen Day wrote: > > Hi Nomi, > > > > I just restarted the server, the "all features" request used all the > > memory and hung the webserver. I'll look into why that request wasn't > > immediately denied as it used to be. > > > > As for your .../segments and .../types, they should be .../segment and > > .../type. I see no reason to pluralize these URLs given that the > sources > > response allows me to provide them at any arbitrary URL: > > oops, gregg led me astray with that one. right, /segment and /type > work. sorry for hanging your server with my inadvertent "all features" > request. 
> Nomi From Gregg_Helt at affymetrix.com Tue Feb 7 00:14:55 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 6 Feb 2006 16:14:55 -0800 Subject: [DAS2] Re: New DAS/2 server for codesprint Message-ID: Allen, can you recommend a reasonable region on yeast to do a features query that will return features with some hierarchy (like transcript/exons)? Thanks, Gregg From Gregg_Helt at affymetrix.com Tue Feb 7 00:29:12 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 6 Feb 2006 16:29:12 -0800 Subject: [DAS2] Re: New DAS/2 server for codesprint Message-ID: Actually, that "arbitrary URL" thing doesn't quite work with the current biopackages server, which has an xml:base pointing to a server at UCLA for the response to the sequence query: http://das.biopackages.net/codesprint/sequence ... ... ... Which means (I think) that the segments query resolves to http://radius.genomics.ctrl.ucla.edu/das/sequence/human/17/segment which for me returns a 404 Not Found response. gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Allen Day > Sent: Monday, February 06, 2006 2:01 PM > To: Nomi Harris > Cc: DAS/2 > Subject: [DAS2] Re: New DAS/2 server for codesprint ... > As for your .../segments and .../types, they should be .../segment and > .../type. I see no reason to pluralize these URLs given that the sources > response allows me to provide them at any arbitrary URL: > > [...] > > > > [...] > > -Allen > > > > On Mon, 6 Feb 2006, Nomi Harris wrote: > > > On 6 February 2006, Nomi Harris wrote: > > > thanks for setting up the new das/2 server, allen. i'm having > trouble > > > with some of the queries. 
> > > > ok, i realized that some of the queries i was trying were senseless, but > > here are some that should work that are just hanging: > > http://das.biopackages.net/codesprint/sequence/yeast/S228C/segments > > http://das.biopackages.net/codesprint/sequence/yeast/S228C/types > > > > Nomi > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Steve_Chervitz at affymetrix.com Tue Feb 7 01:02:30 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 06 Feb 2006 17:02:30 -0800 Subject: [DAS2] Re: New DAS/2 server for codesprint In-Reply-To: Message-ID: There's a gene (RPL7A) with two introns on chr7 at roughly 366kbp - 364kbp: http://db.yeastgenome.org/cgi-bin/locus.pl?locus=YGL076C Most genes with introns in cerevisiae (which aren't many) have just a single intron that creates a small 5' exon, such as the alpha and beta tubulin genes on chr13. Tub1 is on chr13 at 99Kbp, and tub3 is also on chr13 at 23Kbp. So the first 100Kb of chr13 would be another region to try. http://db.yeastgenome.org/cgi-bin/locus.pl?locus=tub1 Steve > From: "Helt,Gregg" > Date: Mon, 6 Feb 2006 16:14:55 -0800 > To: Allen Day > Cc: DAS/2 > Conversation: [DAS2] Re: New DAS/2 server for codesprint > Subject: RE: [DAS2] Re: New DAS/2 server for codesprint > > > Allen, can you recommend a reasonable region on yeast to do a features > query that will return features with some hierarchy (like > transcript/exons)? 
> > Thanks, > Gregg > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Gregg_Helt at affymetrix.com Tue Feb 7 02:42:18 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 6 Feb 2006 18:42:18 -0800 Subject: [DAS2] Modifying com.affymetrix.igb.das2 classes Message-ID: Brian and Marc, I'm about to start seriously modifying the IGB DAS/2 classes in the com.affymetrix.igb.das2 package. There's code in there you wrote to work with materials, assays, results, and ontology. I think we discussed at some point splitting this stuff out into a separate package(s). Which sounds good, especially since (as I understand it), these domains are separate from the DAS/2 "sequence" domain. The only place there's a lot of mixture of code for these domains with the sequence parts is in Das2VersionedSource. Is it okay if I move this out (or comment it out) of Das2VersionedSource while I renovate other parts of the class? thanks, Gregg From Gregg_Helt at affymetrix.com Tue Feb 7 03:34:48 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 6 Feb 2006 19:34:48 -0800 Subject: [DAS2] RE: Modifying com.affymetrix.igb.das2 classes Message-ID: You're right, it looks like some of this code was already getting moved over to the das2.assay and das2.ontology packages as subclasses of Das2VersionedSource. However it's not clear to me if the equivalent of source and versioned source for assay, ontology, and other domains are going to be similar enough to the DAS/2 sequence domain to justify sharing a base class/interface. What do/will they share? I'll go ahead with changes to the das2 package, and look into moving much of this code into a das2.sequence package. 
Thanks, Gregg > -----Original Message----- > From: Brian O'Connor [mailto:boconnor at ucla.edu] > Sent: Monday, February 06, 2006 7:09 PM > To: Helt,Gregg > Cc: Marc Carlson; Allen Day; DAS/2 > Subject: Re: Modifying com.affymetrix.igb.das2 classes > > Hi Gregg, > > Go for it!! Marc and I can take a look at it again when you're happy > with the changes. The versioned source object really needed an overhaul > anyway to deal with the multiple domains of the DAS/2 server. I think > there should be a VersionedSource parent and then children for each > domain (i.e. VersionedSourceAssay). I think Marc started to do this but > he was afraid to alter the VersionedSource object too much for fear of > breaking the IGB client. > > --Brian > > Helt,Gregg wrote: > > > Brian and Marc, > > > > I'm about to start seriously modifying the IGB DAS/2 classes in the > > com.affymetrix.igb.das2 package. There's code in there you wrote to > > work with materials, assays, results, and ontology. I think we > > discussed at some point splitting this stuff out into a separate > > package(s). Which sounds good, especially since (as I understand it), > > these domains are separate from the DAS/2 "sequence" domain. The only > > place there's a lot of mixture of code for these domains with the > > sequence parts is in Das2VersionedSource. Is it okay if I move this > > out (or comment it out) of Das2VersionedSource while I renovate other > > parts of the class? > > > > thanks, > > > > Gregg > > From boconnor at ucla.edu Tue Feb 7 03:09:22 2006 From: boconnor at ucla.edu (Brian O'Connor) Date: Mon, 06 Feb 2006 19:09:22 -0800 Subject: [DAS2] Re: Modifying com.affymetrix.igb.das2 classes In-Reply-To: References: Message-ID: <43E80F62.4050403@ucla.edu> Hi Gregg, Go for it!! Marc and I can take a look at it again when you're happy with the changes. The versioned source object really needed an overhaul anyway to deal with the multiple domains of the DAS/2 server. 
I think there should be a VersionedSource parent and then children for each domain (i.e. VersionedSourceAssay). I think Marc started to do this but he was afraid to alter the VersionedSource object too much for fear of breaking the IGB client.

--Brian

Helt,Gregg wrote:

> Brian and Marc,
>
> I'm about to start seriously modifying the IGB DAS/2 classes in the
> com.affymetrix.igb.das2 package. There's code in there you wrote to
> work with materials, assays, results, and ontology. I think we
> discussed at some point splitting this stuff out into a separate
> package(s). Which sounds good, especially since (as I understand it),
> these domains are separate from the DAS/2 "sequence" domain. The only
> place there's a lot of mixture of code for these domains with the
> sequence parts is in Das2VersionedSource. Is it okay if I move this
> out (or comment it out) of Das2VersionedSource while I renovate other
> parts of the class?
>
> thanks,
>
> Gregg
>

From Gregg_Helt at affymetrix.com Tue Feb 7 05:43:07 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 21:43:07 -0800
Subject: [DAS2] RE: Modifying com.affymetrix.igb.das2 classes
Message-ID:

Okay, I just split the code that was in Das2VersionedSource. Now regions and types (w/o ontology) are handled in Das2VersionedSource, and ontology, materials, results, and assays are handled by a subclass, Das2VersionedSourcePlus. I might do some further refactoring at a later date, but for right now this works (and compiles/runs).

I also went ahead and committed almost all my DAS/2 code changes to the genoviz repository.
Gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Helt,Gregg > Sent: Monday, February 06, 2006 7:35 PM > To: Brian O'Connor > Cc: DAS/2; Marc Carlson > Subject: [DAS2] RE: Modifying com.affymetrix.igb.das2 classes > > > You're right, it looks like some of this code was already getting moved > over to the das2.assay and das2.ontology packages as subclasses of > Das2VersionedSource. > > However it's not clear to me if the equivalent of source and versioned > source for assay, ontology, and other domains are going to be similar > enough to the DAS/2 sequence domain to justify sharing a base > class/interface. What do/will they share? > > I'll go ahead with changes to the das2 package, and look into moving > much of this code into a das2.sequence package. > > Thanks, > Gregg > > > -----Original Message----- > > From: Brian O'Connor [mailto:boconnor at ucla.edu] > > Sent: Monday, February 06, 2006 7:09 PM > > To: Helt,Gregg > > Cc: Marc Carlson; Allen Day; DAS/2 > > Subject: Re: Modifying com.affymetrix.igb.das2 classes > > > > Hi Gregg, > > > > Go for it!! Marc and I can take a look at it again when you're happy > > with the changes. The versioned source object really needed an > overhaul > > anyway to deal with the multiple domains of the DAS/2 server. I think > > there should be a VersionedSource parent and then children for each > > domain (i.e. VersionedSourceAssay). I think Marc started to do this > but > > he was afraid to alter the VersionedSource object too much for fear of > > breaking the IGB client. > > > > --Brian > > > > Helt,Gregg wrote: > > > > > Brian and Marc, > > > > > > I'm about to start seriously modifying the IGB DAS/2 classes in the > > > com.affymetrix.igb.das2 package. There's code in there you wrote to > > > work with materials, assays, results, and ontology. 
I think we > > > discussed at some point splitting this stuff out into a separate > > > package(s). Which sounds good, especially since (as I understand > it), > > > these domains are separate from the DAS/2 "sequence" domain. The > only > > > place there's a lot of mixture of code for these domains with the > > > sequence parts is in Das2VersionedSource. Is it okay if I move this > > > out (or comment it out) of Das2VersionedSource while I renovate > other > > > parts of the class? > > > > > > thanks, > > > > > > Gregg > > > > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Gregg_Helt at affymetrix.com Tue Feb 7 05:46:37 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 6 Feb 2006 21:46:37 -0800 Subject: [DAS2] Agenda for DAS/2 Code Sprint Teleconference 2005-02-06 Message-ID: Will you be able to join the teleconference tomorrow (Tuesday?). Suzi is planning to join in, I'm hoping we can spend some time discussing ontologies. Thanks Gregg P.S. 9 AM Pacific time 800-531-3250 id: 2879055 > -----Original Message----- > From: Lincoln Stein [mailto:lstein at cshl.edu] > Sent: Monday, February 06, 2006 10:20 AM > To: das2 at portal.open-bio.org > Cc: Helt,Gregg > Subject: Re: [DAS2] Agenda for DAS/2 Code Sprint Teleconference 2005-02-06 > > Hi Gregg, > > I had a conflicting teleconference and wasn't sure whether there was a > teleconference scheduled for the code sprint, so I didn't dial in. Just > got > the agenda now. > > I am online on both MSN and AOL chats, and will be all week, if anyone > wants > to IM me. > > Lincoln > > On Monday 06 February 2006 12:13, Helt,Gregg wrote: > > Status report > > DAS/2 XML - valid or not valid? > > CATEGORY elements -- constructing query URLs > > MAINTAINER information > > Use of xml:base > > update on feature properties - searching, etc. 
> >
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/das2
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu

From dalke at dalkescientific.com Tue Feb 7 09:22:56 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 09:22:56 +0000
Subject: [DAS2] elements
In-Reply-To: <43E7D251.8050703@affymetrix.com>
References: <4157a339b02c085fd6df0f4bf1ff1cc0@dalkescientific.com> <43E7D251.8050703@affymetrix.com>
Message-ID: <8daf0ba1e5744f8e0b99fc644fb5dd38@dalkescientific.com>

Ed Erwin wrote:
> None of your examples vary the words "segments", "types" or
> "features", but it is legal to do so, right?:
>
> segments at "ensembl.org/das2/genome_A/build_1/segment"
> features at "example.com/A/version_x/things/and/more/things"
> types at "example.com/A/rhinoceros"
>
> OK, so no one is likely to go that far, but is it legal for example to
> use non-plural "segment", "feature" and "type" ?

Yes.

My goal is two-fold. First, make no assertions on the internal organization of the DAS server. Machines can change, directories can move around. The specific advantages are:

 - annotation servers can all point to the same "segments" server
 - multiple versions of the same genomic source on the same machine can reuse the same "types" server

Another thought, perhaps too old-fashioned for modern web development, is that the query URLs are CGI scripts in a "cgi-bin" directory while the data files are flat-files in some other directory. Similarly, the query URL, if it is a CGI script, might end with a ".cgi" or ".pl" extension.

My second goal is to develop a recommended layout.
Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Tue Feb 7 09:32:11 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 09:32:11 +0000
Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint, 6 Feb 2006
In-Reply-To:
References:
Message-ID: <97f6d51a2e54031ed49fe7997af383eb@dalkescientific.com>

> gh: would like a re-cast as xml document, hosted at so/sofa
> website. that xml would be like a std ontology representation so you
> could extend it. so someone could point to an extension of it.

I asked as an action item if Gregg would look into the solution for this. Do we refer to the ontology by a "GO:0123456" identifier or by some URL scheme? If so, what's the mapping from URL scheme to something that clients and people can understand, eg, to ask for everything which is an exon?

Does this mapping need a version number - does it change over time?

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Tue Feb 7 10:38:28 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 10:38:28 +0000
Subject: [DAS2] per-database MAINTAINER
Message-ID: <294a2caeb29a823dd93fa1155012c8cb@dalkescientific.com>

Based on Andreas Prlic's work with the DAS2 registry I've added a new MAINTAINER element to the SOURCE/VERSION part of the SOURCES document. I've updated das/das2/scratch/sources4.xml to have an example.

It looks something like this

The idea is that the database maintainer can be different from the server maintainer. In addition, if the SOURCES/SOURCE/VERSION/MAINTAINER is not present then clients may assume that the database maintainer is the same as the SOURCES/MAINTAINER.

The maintainer elements are both optional.
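
A minimal client-side sketch of the fallback rule just described, in Python. The element names SOURCES, SOURCE, VERSION and MAINTAINER come from the message above; the "id" and "email" attributes, the example document and the function names are assumptions made up for illustration and are not taken from the spec or from sources4.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical SOURCES document. The "id" and "email" attributes are
# assumptions for this sketch, not the spec's actual attribute names.
EXAMPLE = """\
<SOURCES>
  <MAINTAINER email="server-admin@example.org" />
  <SOURCE id="yeast">
    <VERSION id="S228C">
      <MAINTAINER email="yeast-curator@example.org" />
    </VERSION>
    <VERSION id="v2" />
  </SOURCE>
</SOURCES>
"""


def database_maintainer(sources_root, version):
    """Return the maintainer for one VERSION element, or None.

    Prefer SOURCES/SOURCE/VERSION/MAINTAINER; if it is absent, fall back
    to SOURCES/MAINTAINER. Both are optional, so None is possible.
    """
    own = version.find("MAINTAINER")  # direct child of VERSION only
    if own is not None:
        return own.get("email")
    fallback = sources_root.find("MAINTAINER")  # direct child of SOURCES only
    if fallback is not None:
        return fallback.get("email")
    return None


def maintainers_by_version(xml_text):
    """Map (source id, version id) to the resolved maintainer."""
    root = ET.fromstring(xml_text)
    result = {}
    for source in root.findall("SOURCE"):
        for version in source.findall("VERSION"):
            key = (source.get("id"), version.get("id"))
            result[key] = database_maintainer(root, version)
    return result


if __name__ == "__main__":
    for key, who in sorted(maintainers_by_version(EXAMPLE).items()):
        print(key, who)
```

Because both elements are optional, a client written this way has to cope with getting no maintainer at all.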
Andrew dalke at dalkescientific.com From allenday at ucla.edu Tue Feb 7 10:52:12 2006 From: allenday at ucla.edu (Allen Day) Date: Tue, 7 Feb 2006 02:52:12 -0800 (PST) Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> Message-ID: The XML is now as you requested, please confirm. After some thought today I realized the new SOURCES response is fully compatible with the existing server. The doc at: http://das.biopackages.net/codesprint/sequence is now simply a static XML doc that points into the stable server (plus the new "segments" response) implementation at: http://das.biopackages.net/das/genome The headers for the static document don't include the correct Content-Type "application/x-das-blah ; version = XxX", it's simply "text/xml". I'll add the headers in the morning GMT+8. There are probably also some other Content-Type headers that need to be changed for the other responses -- let me know if you spot them. -Allen On Mon, 6 Feb 2006, Andrew Dalke wrote: > Allen: > > After looking closely over this first draft of new_spec.txt, it's > > apparent > > that there are still some holes, e.g. what should the response to the > > following requests look like? > > > > http://das.biopackages.net/codesprint/sequence/yeast > > > xmlns:xlink="http://www.w3.org/1999/xlink" > xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/"> > taxon="Yeast"> > > > > > > > > > > > > > > > > > > > > > > > > > > > http://das.biopackages.net/codesprint/sequence/yeast/S228C > > The same for this case. There is only on VERSION for "yeast". 
> > > Your XML, btw, starts > > > > > > xmlns:xlink="http://www.w3.org/1999/xlink" > xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/"> > > The "standalone" means that the DTD may affect the content of the > documentation. > http://www.stylusstudio.com/w3c/xml11/sec-rmd.htm > > > Markup declarations can affect the content of the document, as passed > > from an XML Processor to an application; examples are attribute > > defaults and entity declarations. The standalone document declaration, > > which MAY appear as a component of the XML declaration, signals > > whether or not there are such declarations which appear external to > > the Document Entity or in parameter entities. An external markup > > declaration is defined as a markup declaration occurring in the > > external subset or in a parameter entity (external or internal, the > > latter being included because non-validating processors are not > > required to read them). > > For what we're doing, we don't need nor (I think) want that. There's > no reason for a client to consult the DTD to figure out the XML. > > Instead, use > > > > and probably have the encoding > > > > That also means you can get rid of the > > > > statements. > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From dalke at dalkescientific.com Tue Feb 7 12:19:28 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 7 Feb 2006 12:19:28 +0000 Subject: [DAS2] properties and queries Message-ID: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com> We've had a long discussion here about properties and how to search them. As it stands now the spec has a few holes in it. Here are the properties we've talked about. program_name: the program used to make the annotation, like "BLASTX 1.2.3" notes: There can be 0 or more notes. 
Notes might refer to other notes (eg, "the previous note said XYZ but I think ABC")

phase: (is it 0, 1, 2 or 1, 2, 3?)
   (And does anyone use this? People here don't use it; Thomas "reinfers it by counting along the transcript" "but maybe that's just me". Others say they don't use the DAS1 phase.)

icon: a hypothetical image used for the feature, perhaps as a binary png;

curation history:
   a list of elements, each with
    - person
    - timestamp
    - reason for change

score: a floating point number, which may be in exponential notation like "1E-3"

Each one needs different search mechanisms. For example,
   "annotations done by that buggy version of BLAST 1.2.3"
   "scores better than 1E-2"
   "changes by Andrew done in August 2004"
   "notes with the substring 'helicase'" (case sensitive or not?)
   "notes with the phrase 'E. Coli'" (substring might not work if the note has 'E.\nColi')

The property storage scheme doesn't handle this quite correctly. Here are the problems:

 - how do you store multiple notes?

Answer 1: use structured names, like "note_1", "note_2", "note_3", .. HACK! Then what if a note is deleted? Bigger problem; how do you search the "note" field using the existing query language?

Answer 2: allow duplicate note elements, like

Question: so the order must be preserved if two fields have the same name? Can't implement with a dictionary/hash data type.

Question: what if there are duplicate "score" or "phase" elements? Which one wins?

Answer 3: Notes are important and we know we need them now. Let's have a dedicated note element and not make it be a property.

   This is a note
   The previous note is a lie!
   Is this an E or a NOT-E?

(perhaps also with timestamp and author name, but that's a different question.) Then we also define that the "note=" parameter in a DAS query is a substring search of the note elements of a feature.

I like this one.

 - How do you do numeric searches?

This is hypothetical. There hasn't been a requirement for this. 'Course it may be because people haven't had the ability.
In any case, how to search numeric fields like "score" with comparisons? - querying non-queryable fields If there's embedded binary data, like an image, is it searchable? Does a server complain and die? Ignore the request? - more complex text searches "proteinase but not inhibitor" - complex data We have support for non-DAS extensions, which might be Change the this into that because of some reason or other Thomas proposed that we support some sort of complex query language, probably in XML, and get rid of the simple query scheme we have now. I argued against the complexity of that given that nearly all of the queries will be "give me these feature types on this range of that chromosome". I also pointed out that developing a generic query language is hard, and implementing it is harder. Why require all that effort? Roy commented the other way - in a server with only a few hundred features, why require a query language at all? Just return all of the features in the request. Here's what I proposed. We have the "CATEGORY" (but after discussion I now want to take it back to "CAPABILITY" since that's now much closer to what it does - it describes where to go to do something) So I'll use "CAPABILITY" The current scheme has This is an extensibility point. Suppose Thomas has an XML query search interface support on his server, with Sanger clients that handle it. Then there can be A client can see the list of CAPABILITIES and decide to use the feature search mechanism it likes best. In addition, we could say that "this supports the normal DAS query scheme but also supports extension vocabulary. 
For example,

With this a client knows that the query_url supports the normal DAS queries and also supports the "annotator", "annotation_before" and "annotation_after" queries, like this

   .../features?annotator=Andrew;annotation_before=2005

Possible idea: if there is no SUPPORTs tag then the server implements no search syntax and instead returns everything, for the example Roy mentioned.

Okay, we're off to lunch.

Andrew
dalke at dalkescientific.com

From ap3 at sanger.ac.uk Tue Feb 7 12:21:53 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Tue, 7 Feb 2006 12:21:53 +0000
Subject: [DAS2] das-regstry sources response
Message-ID: <09c1c46ca239256496b81b2c67638bd5@sanger.ac.uk>

Hi!

I added a DAS2 sources response to a copy of the das registry running on my laptop. The attached file shows how the das1 sources are described using the das2 spec. - it fits together rather well.

I did not know what to put under the . The already contain all required info. Therefore I propose to drop

Andreas

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sources_response.xml
Type: application/octet-stream
Size: 32318 bytes
Desc: not available
URL:
-------------- next part --------------

-----------------------------------------------------------------------
Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891

From dalke at dalkescientific.com Tue Feb 7 13:20:35 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 13:20:35 +0000
Subject: [DAS2] das-regstry sources response
In-Reply-To: <09c1c46ca239256496b81b2c67638bd5@sanger.ac.uk>
References: <09c1c46ca239256496b81b2c67638bd5@sanger.ac.uk>
Message-ID:

Andreas:
> I did not know what to put under the . The
> already contain all required info.
> Therefore I propose to drop

Removed and committed to CVS.
Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Tue Feb 7 15:34:21 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Tue, 7 Feb 2006 07:34:21 -0800 Subject: [DAS2] Ontologies in DAS/2 Message-ID: I talked to Suzi, she's planning to join our teleconference today to discuss ontologies, wearing her hat as co-PI of the National Center for Biomedical Ontology. Hopefully Lincoln can join too. I took a closer look at the DAS/2 ontology work Allen has done (see http://biodas.org/documents/das2/das2_ontology.html). I urge anyone who wants to contribute to the ontology discussion to read this doc. It specifies a way to retrieve ontologies in OBOXML format. In this format each ontology term gets an absolute URI through the same mechanism that the rest of DAS/2 uses (URIs for ids, which can be either absolute or relative but resolvable). As Allen pointed out yesterday this would solve our problem of how to uniquely specify ontology terms in the DAS/2 TYPES XML. I couldn't find any documentation for the OBOXML format, other than the code that generates it from OBO files. But I'm using OBOXML as an example here because it clearly has resolvable URIs for each ontology term. In Allen's spec, ontologies can also be returned in other formats, but it's unclear to me whether terms in these other formats would resolve to similar URIs. gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Andrew Dalke > Sent: Tuesday, February 07, 2006 1:32 AM > To: DAS/2 > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code > sprint,6 Feb 2006 > > > gh: would like a re-cast as xml document, hosted at so/sofa > > website. that xml would be like a std ontology representation so you > > could extend it. so someone could point to an extension of it. > > I asked as an action item if Gregg would look into the solution > for this. 
Do we refer to the ontology by a "GO:0123456" identifier > or by some URL scheme? If so, what's the mapping from URL scheme > to something that clients and people can understand, eg, to > ask for everything which is an exon? > > Does this mapping need a version number - does it change over time? > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org From dalke at dalkescientific.com Tue Feb 7 15:45:00 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 7 Feb 2006 15:45:00 +0000 Subject: [DAS2] properties and queries In-Reply-To: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com> References: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com> Message-ID: <16111cd36850795dfd46696a63fb1057@dalkescientific.com> To summarize, the current thought here for properties and queries is as follows (it's a long summary. More like an essay. :) Add support for zero or more elements in the feature, of the form This is some arbitrary (but non-markup-ed) text Add a features search keyword "note=" which takes a search string to be found in the note elements. (substring? soundex? regex? the search engine calls up Lincoln and asks?) Add support for zero or more elements in the feature, of the form (I missed this in the redraft. It should have been there. Feature filter "name" already says it searches the "name" and "alias" fields for a feature.) Ignore the "phase" property (contentious, perhaps?) or add it as an attribute of something else in the feature element. Ignore the "score" property. As written in the current spec "score" A floating point number indicating a context-dependent score. This is to be used only when a more specific ontology-driven score cannot be used. (Umm, where do the other scores go?) Unless someone wants to define that score ontology and what it means to search that field, this is a can of worms I don't want to open. Ignore the "editable" property. 
As written (and kibitzed)

   "editable" indicates that features may be updateable (this is at the discretion of the server). (But this is potentially per-user data.)

This should either be in the feature type or it should be in some write-back specific data structure the client can fetch. (To be discussed) It isn't a feature property.

This gets rid of all stated needs for arbitrary key/value data.

That doesn't mean there won't be future needs. In that case, here's how to add new pieces of data.

1) use a non-DAS extension element. Clients must ignore elements they don't understand.

This is good for storing data, but not for searching. The thing is, the search mechanism (or multiple search mechanisms perhaps) is data field specific. Hence,

2) servers may provide extensions to the basic DAS query mechanism.

Currently the mechanism is:

   and-ed set of zero or more
      keyword = (set, of, or, terms, for, keyword)

where "keyword" is well-defined by DAS except for the "att" property keywords. Query extensions add new keywords in the same syntax, and define somewhere how that syntax works. It must be backwards compatible with the existing syntax and semantics.

The problem then is clients don't know that a server supports a given query extension, so

3) add a element to the element. (Also proposed, renaming "CATEGORY" back to "CAPABILITY".)

The CAPABILITY may have zero or more of

Here are the two defined unique strings,

The "all" query says that a client may reasonably fetch all the features in one go. This would occur with a small DAS server containing only a few hundred features. In that case there's no need to even have a CGI script running on the back end - just a set of flat files. The query is done by fetching the URL with no parameters. A rich server with millions of features might decide to not support an "all" query.

The "das2" query is the one we've been talking about.

If a site develops a query extension it adds so clients know what the server can do.
(In this case supporting searches for "annotator", "annotation_before" and "annotation_after" fields.) That all said, this doesn't mean that the server shouldn't have a property table. It's a question of what it means to search the property table. People here want the following: multiple properties may have the same key and different value the order of the properties is not important the "att:" search is renamed a "prop:" search, like "prop:author" the search is a substring search. a feature matches a search if any of the properties with that name match the substring search For example, source = BLAST 2.3.4 author = Andrew Dalke author = Thomas Down lets me search for features?prop:author=Andrew all features with "Andrew" as a substring in the "author" property features?prop:author=Andrew;source=BLAST all features with "Andrew" as a substring in the "author" and with "BLAST" in the source name features?prop:author=Andrew,Thomas all features with "Andrew" or "Thomas" as an author Really what I think this essay is doing is saying that storing data and searching data is different. Servers can develop new ways to extend DAS searches and flag that they support new searches. (Eg, the new search may be to support a different way to search a field in the property table.) But there needs to be a really basic substring search, given that there will be simple string key/ string value data for the property table. Oh, and should the key/value table also include my proposed "href" and embedded binary data fields like images? Hmmmmm.... Lots of talk about this here. Time for a tea break. 
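
The matching rules listed above (duplicate keys allowed, order ignored, substring match, "," as OR within a keyword, ";" as AND between keywords) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up, the match is case-sensitive (the thread leaves that open), and only "prop:" keywords are handled, so the "source" term from the example is written here as "prop:source".

```python
from urllib.parse import unquote


def parse_query(query):
    """Split 'k1=a,b;k2=c' into [('k1', ['a', 'b']), ('k2', ['c'])]."""
    terms = []
    for part in query.split(";"):
        if not part:
            continue
        key, _, value = part.partition("=")
        terms.append((unquote(key), [unquote(v) for v in value.split(",")]))
    return terms


def feature_matches(properties, query):
    """Return True if a feature's property table satisfies the query.

    properties is a list of (key, value) string pairs. Duplicate keys
    are allowed and their order does not matter.
    """
    for key, alternatives in parse_query(query):
        if not key.startswith("prop:"):
            raise ValueError("sketch only handles prop: keywords, got %r" % key)
        name = key[len("prop:"):]
        values = [v for k, v in properties if k == name]
        # OR over the comma-separated terms and over duplicate properties;
        # every ';'-separated keyword must match (AND).
        if not any(alt in value for alt in alternatives for value in values):
            return False
    return True


# The property table from the example above.
feature = [
    ("source", "BLAST 2.3.4"),
    ("author", "Andrew Dalke"),
    ("author", "Thomas Down"),
]

if __name__ == "__main__":
    print(feature_matches(feature, "prop:author=Andrew"))
    print(feature_matches(feature, "prop:author=Andrew;prop:source=BLAST"))
    print(feature_matches(feature, "prop:author=Andrew,Thomas"))
```

A list of pairs is used instead of a dictionary precisely because several properties may share one key.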
Andrew dalke at dalkescientific.com From lstein at cshl.edu Tue Feb 7 16:00:52 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 7 Feb 2006 11:00:52 -0500 Subject: [DAS2] properties and queries In-Reply-To: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com> References: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com> Message-ID: <200602071100.52818.lstein@cshl.edu> Hi, I use the phase information quite a lot and I know that others do as well. The phase is {0,1,2} and the meaning is described here: For features of type "CDS", the phase indicates where the feature begins with reference to the reading frame. The phase is one of the integers 0, 1, or 2, indicating the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon. In other words, a phase of "0" indicates that the next codon begins at the first base of the region described by the current line, a phase of "1" indicates that the next codon begins at the second base of this region, and a phase of "2" indicates that the codon begins at the third base of this region. This is NOT to be confused with the frame, which is simply start modulo 3. Lincoln On Tuesday 07 February 2006 07:19, Andrew Dalke wrote: > We've had a long discussion here about properties and how to > search them. As it stands now the spec has a few holes in it. > > Here are the properties we've talked about. > > program_name: the program used to make the annotation, like > "BLASTX 1.2.3" > > notes: > There can be 0 or more notes. Notes might refer to other > notes (eg, "the previous note said XYZ but I think ABC") > > phase: (is it 0, 1, 2 or 1, 2, 3?) > (And does anyone use this? People here don't use it; Thomas > "reinfers it by counting along the transcript" "but maybe > that's just me". Others say they don't use the DAS1 phase.) 
> > icon: a hypothetical image use for the feature, perhaps as > a binary png; > > curation history: > a list of elements, each with > - person > - timestamp > - reason for change > > score: a floating point number, which may be in exponential > notation like "1E-3" > > Each one needs different search mechanisms. For example, > "annotations done by that buggy version of BLAST 1.2.3" > "scores better than 1E-2" > "changes by Andrew done in August 2004" > "notes with the substring 'helicase'" (case sensitive or not?) > "notes with the phrase 'E. Coli'" (substring might not work > if there's the note has 'E.\nColi') > > The property storage scheme doesn't handle this quite correctly. > Here are problems: > > - how do you store multiple notes? > > Answer 1: use structured named, like "note_1", "note_2", "note_3", .. > HACK! Then what if a note is deleted? Bigger problem; how do you > search the "note" field using the existing query language? > > Answer 2: allow duplicate note elements, like > > > > > Question: so the order must be preserved if two fields have the > same name? Can't implement with a dictionary/hash data type. > > Question: what if there are duplicate "score" or "phase" elements? > Which one wins? > > Answer 3: Notes are important and we know we need them now. > Let's have a element and not make it be a property. > > This is a note > The previous note is a lie! > Is this an E or a NOT-E? > > (perhaps also with timestamp and author name, but that's a different > question.) Then we also define that the "note=" parameter in as > DAS query is a substring search of the elements of a feature. > > I like this one. > > > - How do you do numeric searches? > > This is hypothetical. There hasn't been a requirement for this. > 'Course it may be because people haven't had the ability. In > any case, how to search numeric fields like "score" with comparisons? > > > - querying non-queryable fields > > If there's embedded binary data, like an image, is it searchable? 
> Does a server complain and die? Ignore the request? > > - more complex text searches > > "proteinase but not inhibitor" > > - complex data > > We have support for non-DAS extensions, which might be > > > > Change the this into that because of some reason or other > > > > Thomas proposed that we support some sort of complex query > language, probably in XML, and get rid of the simple query scheme > we have now. > > I argued against the complexity of that given that nearly all > of the queries will be "give me these feature types on this range > of that chromosome". I also pointed out that developing a > generic query language is hard, and implementing it is harder. > Why require all that effort? > > Roy commented the other way - in a server with only a few hundred > features, why require a query language at all? Just return all > of the features in the request. > > Here's what I proposed. > > We have the "CATEGORY" (but after discussion I now want to take > it back to "CAPABILITY" since that's now much closer to what > it does - it describes where to go to do something) > > So I'll use "CAPABILITY" > > The current scheme has > > > > > > This is an extensibility point. Suppose Thomas has an XML > query search interface support on his server, with Sanger > clients that handle it. Then there can be > > query_url="http.../search-features"> > > > > A client can see the list of CAPABILITIES and decide to > use the feature search mechanism it likes best. > > In addition, we could say that "this supports the normal DAS > query scheme but also supports extension vocabulary. 
For example, > > > > > > > With this a client knows that the query_url supports the normal > DAS queries and also supports the "annotator", "annotation_before" > and "annotation_after" queries, like this > > .../features?annotator=Andrew;annotation_before=2005 > > Possible idea: if there is no SUPPORTs tag then the server > implements no search syntax and instead returns everything, > for the example Roy mentioned. > > Okay, we're off to lunch. > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Tue Feb 7 16:46:47 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 7 Feb 2006 11:46:47 -0500 Subject: [DAS2] Ontologies in DAS/2 In-Reply-To: References: Message-ID: <200602071146.48212.lstein@cshl.edu> Hi, I have group meeting from 12-1 every Tuesday, so I can't make this one. I'll be present for the telecon Wednesday at 12. Lincoln On Tuesday 07 February 2006 10:34, Helt,Gregg wrote: > I talked to Suzi, she's planning to join our teleconference today to > discuss ontologies, wearing her hat as co-PI of the National Center for > Biomedical Ontology. Hopefully Lincoln can join too. > > I took a closer look at the DAS/2 ontology work Allen has done (see > http://biodas.org/documents/das2/das2_ontology.html). I urge anyone who > wants to contribute to the ontology discussion to read this doc. It > specifies a way to retrieve ontologies in OBOXML format. In this format > each ontology term gets an absolute URI through the same mechanism that > the rest of DAS/2 uses (URIs for ids, which can be either absolute or > relative but resolvable). 
As Allen pointed out yesterday this would > solve our problem of how to uniquely specify ontology terms in the DAS/2 > TYPES XML. > > I couldn't find any documentation for the OBOXML format, other than the > code that generates it from OBO files. But I'm using OBOXML as an > example here because it clearly has resolvable URIs for each ontology > term. In Allen's spec, ontologies can also be returned in other > formats, but it's unclear to me whether terms in these other formats > would resolve to similar URIs. > > gregg > > > -----Original Message----- > > From: das2-bounces at portal.open-bio.org > > [mailto:das2-bounces at portal.open- > > > bio.org] On Behalf Of Andrew Dalke > > Sent: Tuesday, February 07, 2006 1:32 AM > > To: DAS/2 > > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code > > sprint,6 Feb 2006 > > > > > gh: would like a re-cast as xml document, hosted at so/sofa > > > website. that xml would be like a std ontology representation so you > > > could extend it. so someone could point to an extension of it. > > > > I asked as an action item if Gregg would look into the solution > > for this. Do we refer to the ontology by a "GO:0123456" identifier > > or by some URL scheme? If so, what's the mapping from URL scheme > > to something that clients and people can understand, eg, to > > ask for everything which is an exon? > > > > Does this mapping need a version number - does it change over time? > > > > Andrew > > dalke at dalkescientific.com > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From dalke at dalkescientific.com Tue Feb 7 16:50:56 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 7 Feb 2006 16:50:56 +0000 Subject: [DAS2] query_api and server layout Message-ID: Continuing from yesterday's discussion... There are several things in a DAS server - there is the list of all sources and versions - there is a list of all versions for a source - there is the versioned source information The versioned source only really provides a bit of overall configuration information and links to three URLs: - the query interface for features - the query interface for types - the query interface for segments It doesn't say anything about where the actual feature, type and segment data is stored. It doesn't even mean that the query URLs are on the same machine as the versioned source document. Hence Andreas can have his registry server. DAS defines what those queries do. The segments query URL interface can be a shared reference server. It has a rather simple interface: - get URLs and information for each segment - given a sequence URL return the sequence data - return the assembly data The segment and sequence data does not need to be on the same machine as the segments query URL. It likely will be but does not need to be. DAS defines what the types interface does. At present it is also very simple. By default it lists everything, or you can ask it for an "ontology" or (proposed new query) "exact_ontology", and it returns all DAS types which match that request. The actual DAS type data does not need to be on the same server as the DAS query URL, though again it probably will be. The types query URL does not need to be on the same machine as the segments query URL. Similarly, the features query URL implements the DAS query interface and returns a list of features.
The actual features do not need to be on the same machine or directory location as the feature query, or the types, or the segments. Here are some possible reasons for the different locations: Common case: - segments query URL and segments data on a reference server - versioned source provides its own types and features New genome / internal project: - database implements all three query URLs Registry server: - each versioned source entry points to the original machine's values for the segments, types and features query URLs Multiple versions database, shared types: - segments points to the reference server - all versioned sources' "types" query URLs point to the same URL - each versioned source gets its own features query old-style CGI-based web server: - the "segments" query url points to the reference server - the individual features, types and sources are ".xml" files in the file system - the query URLs end with ".cgi" and start a CGI script If we say that the URL for doing a types query is composed as: + "/" (if missing) + "types" then at the very least we preclude CGI-based servers. No big deal perhaps? It also makes things slightly more duplicative when several versions of the database share the same DAS "types" (and "segments"). I also think using a server-provided URL is easier than constructing the URL in code. Get the "query_url", perhaps resolved by the xml:base. That's it. No need to add in the "/types". Gregg worries about the network performance of having because each location has the full URL to another server and the type in this case refers to a types collection shared by all of the versions of the source. I've thought about that for a while. It's a reasonable and serious architectural concern. I think the right response is that that's an architecture decision we should leave up to the data provider.
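[Editor's sketch, not part of the original message.] The "get the query_url, perhaps resolved by the xml:base" step can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions: the relative paths and the reference.example.org host are hypothetical, the base URL is the one quoted elsewhere in this thread, and resolution follows the standard relative-reference rules as implemented by urllib.parse.urljoin.

```python
from urllib.parse import urljoin


def resolve_query_url(xml_base, query_url):
    """Resolve a server-provided query_url against the document's xml:base.

    The client never appends "/types" or similar itself; it only resolves
    whatever reference the server handed out.
    """
    return urljoin(xml_base, query_url)


# xml:base as quoted in this thread; the paths below are hypothetical.
base = "http://das.biopackages.net/das/genome/"

# A relative reference stays on the same server.
same_server_types = resolve_query_url(base, "h_sapiens/v1/types")

# An absolute reference may point at a shared reference server (hypothetical host).
shared_segments = resolve_query_url(base, "http://reference.example.org/segments")

# A CGI-style endpoint needs no special handling on the client side.
cgi_features = resolve_query_url(base, "h_sapiens/v1/features.cgi")

print(same_server_types)
print(shared_segments)
print(cgi_features)
```

One detail worth keeping in mind with this approach: under these resolution rules a base without a trailing slash loses its last path segment, so "http://das.biopackages.net/das/genome" plus "types" resolves to ".../das/types", not ".../das/genome/types".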
If Gregg wants more compact XML and finds that on-the-fly compression slows things down too much then his DAS server can make the segments, types and features all be not only on the same machine but in the same directory. The following is valid (omitting some required parts) The features request can return GET /h_sapiens/v1/features In this architecture, features start with an 'F', like /h_sapiens/v1/F12345 types start with a 'T', like /h_sapiens/v1/Tabcde and regions start with an 'S', like /h_sapiens/v1/S1 This is about as compact as I think you can make it, yet it still fits into the current DAS spec. (You don't even need the special character - it only makes it easier to see that the names/URLs will never collide.) Andrew dalke at dalkescientific.com From lstein at cshl.edu Tue Feb 7 16:51:55 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 7 Feb 2006 11:51:55 -0500 Subject: [DAS2] Ontologies in DAS/2 In-Reply-To: References: Message-ID: <200602071151.56939.lstein@cshl.edu> Allen's ideas seem very sensible and easy to manage. We can already serve associations between genomic features and GO terms via properties, so the concerns expressed in the discussion section about the big GO API shouldn't apply. Lincoln On Tuesday 07 February 2006 10:34, Helt,Gregg wrote: > I talked to Suzi, she's planning to join our teleconference today to > discuss ontologies, wearing her hat as co-PI of the National Center for > Biomedical Ontology. Hopefully Lincoln can join too. > > I took a closer look at the DAS/2 ontology work Allen has done (see > http://biodas.org/documents/das2/das2_ontology.html). I urge anyone who > wants to contribute to the ontology discussion to read this doc. It > specifies a way to retrieve ontologies in OBOXML format. In this format > each ontology term gets an absolute URI through the same mechanism that > the rest of DAS/2 uses (URIs for ids, which can be either absolute or > relative but resolvable).
As Allen pointed out yesterday this would > solve our problem of how to uniquely specify ontology terms in the DAS/2 > TYPES XML. > > I couldn't find any documentation for the OBOXML format, other than the > code that generates it from OBO files. But I'm using OBOXML as an > example here because it clearly has resolvable URIs for each ontology > term. In Allen's spec, ontologies can also be returned in other > formats, but it's unclear to me whether terms in these other formats > would resolve to similar URIs. > > gregg > > > -----Original Message----- > > From: das2-bounces at portal.open-bio.org > > [mailto:das2-bounces at portal.open- > > > bio.org] On Behalf Of Andrew Dalke > > Sent: Tuesday, February 07, 2006 1:32 AM > > To: DAS/2 > > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code > > sprint,6 Feb 2006 > > > > > gh: would like a re-cast as xml document, hosted at so/sofa > > > website. that xml would be like a std ontology representation so you > > > could extend it. so someone could point to an extension of it. > > > > I asked as an action item if Gregg would look into the solution > > for this. Do we refer to the ontology by a "GO:0123456" identifier > > or by some URL scheme? If so, what's the mapping from URL scheme > > to something that clients and people can understand, eg, to > > ask for everything which is an exon? > > > > Does this mapping need a version number - does it change over time? > > > > Andrew > > dalke at dalkescientific.com > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From Gregg_Helt at affymetrix.com Tue Feb 7 16:54:39 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Tue, 7 Feb 2006 08:54:39 -0800 Subject: [DAS2] Proposed agenda for DAS/2 code sprint teleconference, Tuesday Feb 7 Message-ID: Vote on how to construct URLs to query for segments, types, features: 1.) specified by query_id 2.) hardwired to ~/segments, ~/types, ~/features 3.) ? Status Report Integrating sequence ontology with DAS/2 (and possibly other ontologies) Feature properties and queries over properties MAINTAINER information Use of xml:base ? From dalke at dalkescientific.com Tue Feb 7 17:01:38 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 7 Feb 2006 17:01:38 +0000 Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> Message-ID: <5d78aba1ae5117624db08b4dd395c985@dalkescientific.com> Allen > The XML is now as you requested, please confirm. Missing the namespace declaration. You have should be The element goes after the CATEGORY. (Which I want to rename back to CAPABILITY.) The ASSEMBLY element no longer exists. 
Fixing those by hand, * file:///Users/dalke/cvses/das/das2/scratch/allen_sources.xml:7:84: error: attribute "writeable" not allowed at this point; ignored * file:///Users/dalke/cvses/das/das2/scratch/allen_sources.xml:7:84: error: attribute "taxon" not allowed at this point; ignored There is no more 'writeable' (that's, IMO) something to be decided as part of the writeback spec. It might be that we have a and the existence of that indicate writeability. It's also "taxid" and not "taxon". I used "taxid" because that's what NCBI uses for their data. > There are probably also some other Content-Type headers that need to be > changed for the other responses -- let me know if you spot them. Haven't gotten that far yet. Andrew dalke at dalkescientific.com From allenday at ucla.edu Tue Feb 7 17:25:03 2006 From: allenday at ucla.edu (Allen Day) Date: Tue, 7 Feb 2006 09:25:03 -0800 (PST) Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: <5d78aba1ae5117624db08b4dd395c985@dalkescientific.com> References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> <5d78aba1ae5117624db08b4dd395c985@dalkescientific.com> Message-ID: On Tue, 7 Feb 2006, Andrew Dalke wrote: > Allen > > The XML is now as you requested, please confirm. > > Missing the namespace declaration. You have > > > xmlns:xlink="http://www.w3.org/1999/xlink" > xml:base="http://das.biopackages.net/das/genome/"> > > should be > > xmlns="http://www.biodas.org/ns/das/genome/2.00" > xmlns:xlink="http://www.w3.org/1999/xlink" > xml:base="http://das.biopackages.net/das/genome/"> done > > The element goes after the CATEGORY. (Which I want to > rename back to CAPABILITY.) 
done > > The ASSEMBLY element no longer exists. done > > Fixing those by hand, > > * file:///Users/dalke/cvses/das/das2/scratch/allen_sources.xml:7:84: > error: attribute "writeable" not allowed at this point; ignored > * file:///Users/dalke/cvses/das/das2/scratch/allen_sources.xml:7:84: > error: attribute "taxon" not allowed at this point; ignored > > There is no more 'writeable' (that's, IMO) something to be decided > as part of the writeback spec. It might be that we have a > > > > and the existence of that indicate writeability. i have not made the change if this is an IMO. > > It's also "taxid" and not "taxon". I used "taxid" because that's > what NCBI uses for their data. done -Allen > > > There are probably also some other Content-Type headers that need to be > > changed for the other responses -- let me know if you spot them. > > Haven't gotten that far yet. > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From ap3 at sanger.ac.uk Tue Feb 7 17:44:41 2006 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Tue, 7 Feb 2006 17:44:41 +0000 Subject: [DAS2] toy - das2 registry Message-ID: Hi! A "toy" das2 registry serving das1 servers, via das2 responses can be accessed at http://www.spice-3d.org/dasregistry/das2/sources/ I will work on adding the first das2 servers tomorrow. Andreas ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 From cjm at fruitfly.org Tue Feb 7 17:29:09 2006 From: cjm at fruitfly.org (Chris Mungall) Date: Tue, 7 Feb 2006 09:29:09 -0800 (PST) Subject: [DAS2] Ontologies in DAS/2 In-Reply-To: References: Message-ID: Hi all I'm concerned that the XML in the URL below isn't quite Obo-XML, it's Allen's modified version of it. 
In particular, the adding of an "id" attribute which is redundant with the id element, and the modification of the ID scheme to use slashes instead of :s. I believe the latter may have been to make the ID scheme more DAS-y? OBO IDs are composed of a prefix and a local ID. These are always joined with a :. The prefix can be specified as shortform (eg GO) or a URI prefix. When the long form is combined with the local ID you get your URI. If DAS wants to use a modified version of Obo-XML, that's fine, but please don't call it Obo-XML, it will cause huge confusion! I would much prefer if you used Obo-XML as it is - if there are things you'd like to see changed about the format we can perhaps work that out. I'm concerned by the changing the ID to use / instead of :. This is wrong, and if it's something that's required for DAS, how will you interoperate with RDF etc? In fact there are other parts where the xml is definitely not Obo-XML - it looks like Allen has coded these by hand rather than taking existing XML. That's fine, but it should be marked as such. For example, there is no develops_from element in Obo-XML; all relations bar is_a are encoded as relationship elements. There is a DTD at the moment http://www.godatabase.org/dev/xml/dtd The docs are minimal as the explanation of all the fields is in the docs for the obo text file format http://www.godatabase.org/dev/doc/obo_format_spec.{html,txt,pdf} We'll be converting to RNG+XSD soon You can get Obo-XML examples from http://www.fruitfly.org/~cjm/obo-download You can see the default rule for creating a URI in the OWL files; these currently all get the geneontology.org URI prefix by default, but this will change (we were going to use LSIDs but the majority of OWL tools don't seem to handle URNs very well) As far as DAS/2 supporting different file formats, Obo-XML and RDFS/OWL would seem to be the natural contenders. 
We currently go from the former to the latter via a simple XSLT, the reverse transformation is a little more difficult. Allen has inlined some comments from an email exchange with me in the document. I agree about keeping the API minimal. On the other hand you will need at least some inferencing machinery - I'd encourage you to reuse existing reasoning services here. Cheers Chris On Tue, 7 Feb 2006, Helt,Gregg wrote: > I talked to Suzi, she's planning to join our teleconference today to > discuss ontologies, wearing her hat as co-PI of the National Center for > Biomedical Ontology. Hopefully Lincoln can join too. > > I took a closer look at the DAS/2 ontology work Allen has done (see > http://biodas.org/documents/das2/das2_ontology.html). I urge anyone who > wants to contribute to the ontology discussion to read this doc. It > specifies a way to retrieve ontologies in OBOXML format. In this format > each ontology term gets an absolute URI through the same mechanism that > the rest of DAS/2 uses (URIs for ids, which can be either absolute or > relative but resolvable). As Allen pointed out yesterday this would > solve our problem of how to uniquely specify ontology terms in the DAS/2 > TYPES XML. > > I couldn't find any documentation for the OBOXML format, other than the > code that generates it from OBO files. But I'm using OBOXML as an > example here because it clearly has resolvable URIs for each ontology > term. In Allen's spec, ontologies can also be returned in other > formats, but it's unclear to me whether terms in these other formats > would resolve to similar URIs. 
> > gregg > > > -----Original Message----- > > From: das2-bounces at portal.open-bio.org > [mailto:das2-bounces at portal.open- > > bio.org] On Behalf Of Andrew Dalke > > Sent: Tuesday, February 07, 2006 1:32 AM > > To: DAS/2 > > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code > > sprint,6 Feb 2006 > > > > > gh: would like a re-cast as xml document, hosted at so/sofa > > > website. that xml would be like a std ontology representation so you > > > could extend it. so someone could point to an extension of it. > > > > I asked as an action item if Gregg would look into the solution > > for this. Do we refer to the ontology by a "GO:0123456" identifier > > or by some URL scheme? If so, what's the mapping from URL scheme > > to something that clients and people can understand, eg, to > > ask for everything which is an exon? > > > > Does this mapping need a version number - does it change over time? > > > > Andrew > > dalke at dalkescientific.com > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 > From cjm at fruitfly.org Tue Feb 7 17:32:24 2006 From: cjm at fruitfly.org (chris mungall) Date: Tue, 7 Feb 2006 09:32:24 -0800 Subject: [DAS2] Ontologies in DAS/2 In-Reply-To: <200602071151.56939.lstein@cshl.edu> References: <200602071151.56939.lstein@cshl.edu> Message-ID: What inferencing rules do you use for fetching features by their Ontology_terms? On Feb 7, 2006, at 8:51 AM, Lincoln Stein wrote: > Allen's ideas seem very sensible and easy to manage. We can already > serve > associations between genomic features and GO terms via properties, so > the > concerns expressed in the discussion section about the big GO API > shouldn't > apply. 
> > Lincoln > > On Tuesday 07 February 2006 10:34, Helt,Gregg wrote: >> I talked to Suzi, she's planning to join our teleconference today to >> discuss ontologies, wearing her hat as co-PI of the National Center >> for >> Biomedical Ontology. Hopefully Lincoln can join too. >> >> I took a closer look at the DAS/2 ontology work Allen has done (see >> http://biodas.org/documents/das2/das2_ontology.html). I urge anyone >> who >> wants to contribute to the ontology discussion to read this doc. It >> specifies a way to retrieve ontologies in OBOXML format. In this >> format >> each ontology term gets an absolute URI through the same mechanism >> that >> the rest of DAS/2 uses (URIs for ids, which can be either absolute or >> relative but resolvable). As Allen pointed out yesterday this would >> solve our problem of how to uniquely specify ontology terms in the >> DAS/2 >> TYPES XML. >> >> I couldn't find any documentation for the OBOXML format, other than >> the >> code that generates it from OBO files. But I'm using OBOXML as an >> example here because it clearly has resolvable URIs for each ontology >> term. In Allen's spec, ontologies can also be returned in other >> formats, but it's unclear to me whether terms in these other formats >> would resolve to similar URIs. >> >> gregg >> >>> -----Original Message----- >>> From: das2-bounces at portal.open-bio.org >> >> [mailto:das2-bounces at portal.open- >> >>> bio.org] On Behalf Of Andrew Dalke >>> Sent: Tuesday, February 07, 2006 1:32 AM >>> To: DAS/2 >>> Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code >>> sprint,6 Feb 2006 >>> >>>> gh: would like a re-cast as xml document, hosted at so/sofa >>>> website. that xml would be like a std ontology representation so you >>>> could extend it. so someone could point to an extension of it. >>> >>> I asked as an action item if Gregg would look into the solution >>> for this. Do we refer to the ontology by a "GO:0123456" identifier >>> or by some URL scheme? 
If so, what's the mapping from URL scheme >>> to something that clients and people can understand, eg, to >>> ask for everything which is an exon? >>> >>> Does this mapping need a version number - does it change over time? >>> >>> Andrew >>> dalke at dalkescientific.com >>> >>> _______________________________________________ >>> DAS2 mailing list >>> DAS2 at portal.open-bio.org >> >> _______________________________________________ >> DAS2 mailing list >> DAS2 at portal.open-bio.org >> http://portal.open-bio.org/mailman/listinfo/das2 > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From dalke at dalkescientific.com Tue Feb 7 18:40:56 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 7 Feb 2006 18:40:56 +0000 Subject: [DAS2] category -> capability Message-ID: <98a28be1166142c23be61650f51b66ae@dalkescientific.com> I've made the commit. The element SOURCES/SOURCE/VERSION/CATEGORY is now (in some shallow and some deep sense) back to SOURCES/SOURCE/VERSION/CAPABILITY Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Tue Feb 7 19:00:40 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Tue, 7 Feb 2006 11:00:40 -0800 Subject: [DAS2] Working with xml:base in Java? Message-ID: Thomas, I'm wondering what toolkits you're using for binding XML to Java objects? And particularly how you are dealing with resolving URIs when xml:base is used. So far I've mostly used various implementations of SAX and DOM -- I've found some reports of builtin xml:base support in Xerces SAX/DOM, but it's still unclear. I've been avoiding the issue up till now. 
It won't be too hard to implement URI resolution relative to xml:base, but I thought I'd check around first and see if there's automated support of this in some toolkit. Thanks, Gregg From dalke at dalkescientific.com Tue Feb 7 19:11:09 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 7 Feb 2006 19:11:09 +0000 Subject: [DAS2] toy - das2 registry In-Reply-To: References: Message-ID: <551a60258c89cd953f35c6a4450a444d@dalkescientific.com> Andreas Prlic wrote: > A "toy" das2 registry serving das1 servers, via das2 responses can > be accessed at > > http://www.spice-3d.org/dasregistry/das2/sources/ > > I will work on adding the first das2 servers tomorrow. There are differences between this and the spec. These are "CATEGORY" -> "CAPABILITY" Andreas knew that but didn't get it changed before having to head out for a bit. "testcode" should be "test_range" - it was added this afternoon but I changed the name on Andreas. (He agreed to the change.) # this is a range string (eg, "Chr1/1:100" or "CloneABC123/500:599") # used in an "inside=" feature query. It is used by the registry # server when doing a heartbeat check. attribute test_range { text }?, The underlying problem is that a web server can be up while the back-end database is down. While a server should report that as an error, sadly that's not always the case. This test_range is used by Andreas' registry server in a periodic feature query. It should return a "reasonable" number of features. I decided to make it part of the spec for two reasons: - it simplifies auto-fill-in during registry discovery - clients can also use it to query the server and see if it's really alive or if it really means to return an empty list of features all the time. It is optional. The MAINTAINER "name" was required. Andreas has examples where there is only an email address and wants the name to be optional. So now "name", "email" and "href" are all optional. I would like to require that at least one of them be provided.
Finally, the "taxid" in the COORDINATES is optional. The RNG schema thought it was mandatory. I've updated the schemas and the spec for the last two. Committed. Looks like I'll be spending most of tomorrow updating the rest of the spec document. I got a copy of Andreas' document and edited it to meet the current spec and I've checked it in under "scratch/registry_sources.xml" Feel free to test it out with your parsers. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Tue Feb 7 19:28:49 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 7 Feb 2006 19:28:49 +0000 Subject: [DAS2] format version Message-ID: <4cd0c60fb7871ad6a70ad2b25cb73406@dalkescientific.com> Just committed to the spec. If I'm wrong and the version number proves useful, I'll make it less snarky. :) This document defines several new content-types. These are application/x-das-sources+xml application/x-das-features+xml application/x-das-types+xml application/x-das-segments+xml A server may supply an optional "version" value for the Content-Type, to specify which version of the specification it provides. This is (at present and unless others can convince me otherwise) meant to be used only during this period of specification development while things are in flux. A client can look at the version string and use an appropriate reader to handle it. Example: Content-Type: application/x-das-types+xml; version=1 The list of versions is as follows: 601071920: this version The versions will be increasing integers. The format will be "YMMDDHHMM" where "Y" is the year - 2005. (This makes it a 32 bit integer, in case you were wondering.) There's no way this spec will be in flux in 4 years time. 
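[Editor's sketch, not part of the original message.] A client could pick a reader from the Content-Type and its optional "version" parameter along the following lines. This is only an illustration under stated assumptions: the function name and the choice to return the version as an integer are hypothetical, and the parsing relies on Python's standard email.message module, not on anything the spec prescribes.

```python
from email.message import Message

# The four content types listed in the message above.
DAS_CONTENT_TYPES = {
    "application/x-das-sources+xml",
    "application/x-das-features+xml",
    "application/x-das-types+xml",
    "application/x-das-segments+xml",
}


def parse_das_content_type(header_value):
    """Return (media_type, version) for a Content-Type header value.

    version is an int when the optional "version" parameter is present
    and numeric, otherwise None.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    media_type = msg.get_content_type()   # lower-cased type/subtype
    raw_version = msg.get_param("version")  # None if the parameter is absent
    version = int(raw_version) if raw_version and raw_version.isdigit() else None
    return media_type, version


media_type, version = parse_das_content_type(
    "application/x-das-types+xml; version=1"
)
print(media_type in DAS_CONTENT_TYPES, media_type, version)
```

Because the parameter is optional, a reader chosen this way still needs a default for responses that carry no version at all.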
Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Tue Feb 7 19:14:15 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 7 Feb 2006 19:14:15 +0000 Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> <5d78aba1ae5117624db08b4dd395c985@dalkescientific.com> Message-ID: <7ca6d9841d7c3c334589da147c38de53@dalkescientific.com> >> There is no more 'writeable' (that's, IMO) something to be decided >> as part of the writeback spec. It might be that we have a > i have not made the change if this is an IMO. Okay. There is no "writeable". The writeability is determined by the element. If there is a CAPABILITY with a type == "locks" then the server is (potentially) writeable in the same way that "writeable='yes'" means that it's writeable. Anyone else have an O? 
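[Editor's sketch, not part of the original message.] The rule proposed above (a CAPABILITY with type "locks" marks the server as potentially writeable) could be checked by a client roughly as follows. The sample document is hypothetical: the element nesting SOURCES/SOURCE/VERSION/CAPABILITY, the "type" and "query_url" attribute names and the namespace come from this thread, but the attribute values and paths are made up for illustration, and the writeback spec may well settle on something different.

```python
import xml.etree.ElementTree as ET

# Namespace quoted earlier in this thread.
DAS_NS = "http://www.biodas.org/ns/das/genome/2.00"

# Hypothetical sources document; values and paths are illustrative only.
SAMPLE = """<SOURCES xmlns="http://www.biodas.org/ns/das/genome/2.00">
  <SOURCE>
    <VERSION>
      <CAPABILITY type="features" query_url="h_sapiens/v1/features"/>
      <CAPABILITY type="locks" query_url="h_sapiens/v1/locks"/>
    </VERSION>
  </SOURCE>
</SOURCES>"""


def capability_types(version_elem):
    """Collect the "type" attribute of every CAPABILITY under a VERSION."""
    tag = "{%s}CAPABILITY" % DAS_NS
    return {cap.get("type") for cap in version_elem.findall(tag)}


def is_potentially_writeable(version_elem):
    """Apply the proposed rule: a "locks" capability implies writeability."""
    return "locks" in capability_types(version_elem)


root = ET.fromstring(SAMPLE)
version_elem = root.find("{%s}SOURCE/{%s}VERSION" % (DAS_NS, DAS_NS))
print(sorted(capability_types(version_elem)))
print(is_potentially_writeable(version_elem))
```

Keeping the check in one small function makes it easy to swap in a different rule if writeability and locking end up as separate capabilities.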
Andrew dalke at dalkescientific.com From ed_erwin at affymetrix.com Tue Feb 7 20:46:01 2006 From: ed_erwin at affymetrix.com (Ed Erwin) Date: Tue, 07 Feb 2006 12:46:01 -0800 Subject: [DAS2] Re: Apollo and DAS/2 priorities In-Reply-To: <7ca6d9841d7c3c334589da147c38de53@dalkescientific.com> References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com> <08c9b852196e449cba6b16f99c3c3212@dalkescientific.com> <8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk> <8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com> <17368.18146.195226.166165@kinked.lbl.gov> <17369.24994.880706.685148@kinked.lbl.gov> <24afbfd39f79595678721f1ef75a239e@dalkescientific.com> <5d78aba1ae5117624db08b4dd395c985@dalkescientific.com> <7ca6d9841d7c3c334589da147c38de53@dalkescientific.com> Message-ID: <43E90709.6060602@affymetrix.com> This is something we should discuss when we discuss the 'writeable' parts of the spec. But in my opinion, 'writeable' and 'lockable' are two separate 's. I see no reason not to allow some implementers to develop simple servers that are writeable but don't implement a locking mechanism. Large public servers may want locking, but I'd bet that a non-locking server would very rarely lead to problems, especially in small projects. (If the server is non-locking, the client could add a little more logic to check that nothing has changed since the last retrieval before doing a commit.) Andrew Dalke wrote: >>> There is no more 'writeable' (that's, IMO) something to be decided >>> as part of the writeback spec. It might be that we have a > > >> i have not made the change if this is an IMO. > > > Okay. There is no "writeable". The writeability is determined > by the element. If there is a CAPABILITY with > a type == "locks" then the server is (potentially) writeable > in the same way that "writeable='yes'" means that it's writeable. > > Anyone else have an O? 
>
> Andrew
> dalke at dalkescientific.com
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2

From allenday at ucla.edu Tue Feb 7 21:20:53 2006
From: allenday at ucla.edu (Allen Day)
Date: Tue, 7 Feb 2006 13:20:53 -0800 (PST)
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To:
References:
Message-ID:

Hi Chris,

On Tue, 7 Feb 2006, Chris Mungall wrote:

>
> Hi all
>
> I'm concerned that the XML in the URL below isn't quite Obo-XML, it's
> Allen's modified version of it. In particular, the adding of an "id"
> attribute which is redundant with the id element, and the modification of
> the ID scheme to use slashes instead of :s.
>
> I believe the latter may have been to make the ID scheme more DAS-y?

The slash was introduced to take advantage of xml:base and the
hierarchical relationship between namespaces and terms, e.g.

xml:base="/das/ontology/obo/1/ontology" + id="SO/0000001"

is equivalent to:

/das/ontology/obo/1/ontology/SO/0000001

If we want the identifier to be SO:0000001, it means that we have to make
xml:base="/das/ontology/obo/1/ontology/SO". This is problematic for two
reasons:

1) multiple xml:base cannot be defined for the entire document, meaning
   that URIs for other records referenced become very long.

2) different ontologies cannot use the same xml:base

The only way I see out of this ATM is to treat : as a / internal to the
Ontology-DAS service.

> OBO IDs are composed of a prefix and a local ID. These are always joined
> with a :. The prefix can be specified as shortform (eg GO) or a URI
> prefix. When the long form is combined with the local ID you get your URI.
>
> If DAS wants to use a modified version of Obo-XML, that's fine, but please
> don't call it Obo-XML, it will cause huge confusion!
>
> I would much prefer if you used Obo-XML as it is - if there are things
> you'd like to see changed about the format we can perhaps work that out.
> I'm concerned by the changing the ID to use / instead of :. This is wrong,
> and if it's something that's required for DAS, how will you interoperate
> with RDF etc?
>
> In fact there are other parts where the xml is definitely not Obo-XML - it
> looks like Allen has coded these by hand rather than taking existing XML.
> That's fine, but it should be marked as such. For example, there is no
> develops_from element in Obo-XML; all relations bar is_a are encoded as
> relationship elements.

The XML provided by the Ontology-DAS server is using templates to mark up
ontology records that have been loaded to a chado database using
perl-go-perl. The develops_from node, IIRC, was created because there is
a section in a perl-go-perl .xslt that creates elements for all
relationship types.

>
> There is a DTD at the moment
> http://www.godatabase.org/dev/xml/dtd

This didn't exist at the time I wrote my templates ( 4-6 months ago), or I
would have validated.

-Allen

>
> The docs are minimal as the explanation of all the fields is in the docs
> for the obo text file format
> http://www.godatabase.org/dev/doc/obo_format_spec.{html,txt,pdf}
>
> We'll be converting to RNG+XSD soon
>
> You can get Obo-XML examples from
> http://www.fruitfly.org/~cjm/obo-download
>
> You can see the default rule for creating a URI in the OWL files; these
> currently all get the geneontology.org URI prefix by default, but this
> will change (we were going to use LSIDs but the majority of OWL tools
> don't seem to handle URNs very well)
>
> As far as DAS/2 supporting different file formats, Obo-XML and RDFS/OWL
> would seem to be the natural contenders. We currently go from the former
> to the latter via a simple XSLT, the reverse transformation is a little
> more difficult.
>
> Allen has inlined some comments from an email exchange with me in the
> document. I agree about keeping the API minimal. On the other hand you
> will need at least some inferencing machinery - I'd encourage you to reuse
> existing reasoning services here.
>
> Cheers
> Chris
>
> On Tue, 7 Feb 2006, Helt,Gregg wrote:
>
> > I talked to Suzi, she's planning to join our teleconference today to
> > discuss ontologies, wearing her hat as co-PI of the National Center for
> > Biomedical Ontology. Hopefully Lincoln can join too.
> >
> > I took a closer look at the DAS/2 ontology work Allen has done (see
> > http://biodas.org/documents/das2/das2_ontology.html). I urge anyone who
> > wants to contribute to the ontology discussion to read this doc. It
> > specifies a way to retrieve ontologies in OBOXML format. In this format
> > each ontology term gets an absolute URI through the same mechanism that
> > the rest of DAS/2 uses (URIs for ids, which can be either absolute or
> > relative but resolvable). As Allen pointed out yesterday this would
> > solve our problem of how to uniquely specify ontology terms in the DAS/2
> > TYPES XML.
> >
> > I couldn't find any documentation for the OBOXML format, other than the
> > code that generates it from OBO files. But I'm using OBOXML as an
> > example here because it clearly has resolvable URIs for each ontology
> > term. In Allen's spec, ontologies can also be returned in other
> > formats, but it's unclear to me whether terms in these other formats
> > would resolve to similar URIs.
> >
> > gregg
> >
> > > -----Original Message-----
> > > From: das2-bounces at portal.open-bio.org
> > [mailto:das2-bounces at portal.open-
> > > bio.org] On Behalf Of Andrew Dalke
> > > Sent: Tuesday, February 07, 2006 1:32 AM
> > > To: DAS/2
> > > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code
> > > sprint,6 Feb 2006
> > >
> > > > gh: would like a re-cast as xml document, hosted at so/sofa
> > > > website. that xml would be like a std ontology representation so you
> > > > could extend it. so someone could point to an extension of it.
> > >
> > > I asked as an action item if Gregg would look into the solution
> > > for this. Do we refer to the ontology by a "GO:0123456" identifier
> > > or by some URL scheme? If so, what's the mapping from URL scheme
> > > to something that clients and people can understand, eg, to
> > > ask for everything which is an exon?
> > >
> > > Does this mapping need a version number - does it change over time?
> > >
> > > Andrew
> > > dalke at dalkescientific.com
> > >
> > > _______________________________________________
> > > DAS2 mailing list
> > > DAS2 at portal.open-bio.org
> >
> >
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/das2
> >
>
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2
>

From cjm at fruitfly.org Tue Feb 7 21:59:12 2006
From: cjm at fruitfly.org (chris mungall)
Date: Tue, 7 Feb 2006 13:59:12 -0800
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To:
References:
Message-ID:

On Feb 7, 2006, at 1:20 PM, Allen Day wrote:

> Hi Chris,
>
> On Tue, 7 Feb 2006, Chris Mungall wrote:
>
>>
>> Hi all
>>
>> I'm concerned that the XML in the URL below isn't quite Obo-XML, it's
>> Allen's modified version of it. In particular, the adding of an "id"
>> attribute which is redundant with the id element, and the
>> modification of
>> the ID scheme to use slashes instead of :s.
>>
>> I believe the latter may have been to make the ID scheme more DAS-y?
>
> The slash was introduced to take advantage of xml:base and the
> hierarchical relationship between namespaces and terms, e.g.
>
> xml:base="/das/ontology/obo/1/ontology" + id="SO/0000001"
>
> is equivalent to:
>
> /das/ontology/obo/1/ontology/SO/0000001

it's actually equivalent to:
/das/ontology/obo/1/ontologySO/0000001

> If we want the identifier to be SO:0000001, it means that we have to
> make
> xml:base="/das/ontology/obo/1/ontology/SO". This is problematic for two
> reasons:
>
> 1) multiple xml:base cannot be defined for the entire document,
> meaning
> that URIs for other records referenced become very long.

Why not just define a qname for every idspace? This is the standard way
of doing this in XML

Using xml:base is not a big gain for brevity, since fairly soon some
obo ontologies will reference other obo ontologies.

In fact is this even an issue if you get rid of the id attribute to
conform to obo-xml? ids in obo-xml are encoded as elements, so xml:base
rules are not applied. Obo has its own rules for ID generation. This
has the arguable disadvantage that we can't directly use xml:base and
the whole xml namespace system for OBO IDs, we layer our own system on
top. This is actually preferable for us.

> 2) different ontologies cannot use the same xml:base
>
> The only way I see out of this ATM is to treat : as a / internal to the
> Ontology-DAS service.

I'm still not sure what the problem is, and I think you may be stuck
anyway when it comes to RDF/OWL ontologies

>
>> OBO IDs are composed of a prefix and a local ID. These are always
>> joined
>> with a :. The prefix can be specified as shortform (eg GO) or a URI
>> prefix. When the long form is combined with the local ID you get your
>> URI.
>>
>> If DAS wants to use a modified version of Obo-XML, that's fine, but
>> please
>> don't call it Obo-XML, it will cause huge confusion!
>>
>> I would much prefer if you used Obo-XML as it is - if there are things
>> you'd like to see changed about the format we can perhaps work that
>> out.
>> I'm concerned by the changing the ID to use / instead of :. This is
>> wrong,
>> and if it's something that's required for DAS, how will you
>> interoperate
>> with RDF etc?
>>
>> In fact there are other parts where the xml is definitely not Obo-XML
>> - it
>> looks like Allen has coded these by hand rather than taking existing
>> XML.
>> That's fine, but it should be marked as such. For example, there is no
>> develops_from element in Obo-XML; all relations bar is_a are encoded
>> as
>> relationship elements.
>
> The XML provided by the Ontology-DAS server is using templates to mark
> up
> ontology records that have been loaded to a chado database using
> perl-go-perl. The develops_from node, IIRC, was created because there
> is
> a section in a perl-go-perl .xslt that creates elements for all
> relationship types.

hmmm, I don't think so, but the point is moot anyway, just so long as
the final version uses xml that validates, either against obo-xml or
your own documented variant

>
>>
>> There is a DTD at the moment
>> http://www.godatabase.org/dev/xml/dtd
>
> This didn't exist at the time I wrote my templates ( 4-6 months ago),
> or I
> would have validated.

it did, it's just not well signposted! sorry about that

look forward to seeing a demo. I do think you have to work out the
semantics of retrieval by ontology term though.
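
For readers following the xml:base exchange above, here is a minimal sketch of how a relative id resolves against a base URI under RFC 3986 reference resolution, which is what XML Base relies on. It uses Python's standard urljoin, not any DAS/2 or Obo code, and the host example.org is a placeholder added only to make the base absolute. Under these rules the last path segment of the base is replaced unless the base ends in a slash, which is a third reading next to the two given above.

```python
from urllib.parse import urljoin

# Hypothetical host, added only so that the base URI is absolute.
HOST = "http://example.org"


def resolve(base, ref):
    """Resolve ref against base using RFC 3986 rules (as xml:base does)."""
    return urljoin(base, ref)


# Base without a trailing slash: the final segment "ontology" is replaced.
no_slash = resolve(HOST + "/das/ontology/obo/1/ontology", "SO/0000001")

# Base with a trailing slash: the id is appended below "ontology/".
with_slash = resolve(HOST + "/das/ontology/obo/1/ontology/", "SO/0000001")

# A colon in the first segment would read as a URI scheme, so a colon-style
# id needs a leading "./" to be treated as a relative path.
colon_id = resolve(HOST + "/das/ontology/obo/1/ontology/", "./SO:0000001")

print(no_slash)
print(with_slash)
print(colon_id)
```

If this reading is right, a trailing slash on the xml:base value is what yields the /ontology/SO/0000001 form.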

cheers
chris

From Steve_Chervitz at affymetrix.com Wed Feb 8 00:30:52 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Tue, 07 Feb 2006 16:30:52 -0800
Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint, 7 Feb 2006
Message-ID:

Notes from the DAS/2 teleconference for the code sprint, 7 Feb 2006

$Id: das2-teleconf-2006-02-07.txt,v 1.1 2006/02/08 00:37:41 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Ed E., Gregg Helt
  Sanger: Andreas Prlic, Thomas Down
  Sweden: Andrew Dalke
  UC Berkeley: Nomi Harris, Suzi Lewis
  UCLA: Allen Day

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.

Agenda:
  * Vote on constructing URLs/URIs to query segments, types, features
  * Status report from people
  * Ontologies
  * Feat property changes

Topic: Constructing URLS/URIs to query segments, types, features
----------------------------------------------------------------
1.)
specified by query_id
2.) hardwired to ~/segments, ~/types, ~/features
3.) ?

ad: lots of people have left here so the vote won't include all.
    see email why a query url is useful
    agree w/ gregg: short names could be a nice to have.
    shouldn't have to worry about how you organize your urls
gh: yes it does: this/types this/segments etc.
ad: can take it out if there's confusion
gh: recommended structure is good.
ee/gh: people will look at the examples and do it that way. they won't
    look at .rnc file
gh: make it clearer in the spec that these are merely suggestions of
    the hierarchy, you don't have to do it this way.
ad: roy's view: likes the query id url for doing search for all
    features, or all types. query id is the url used to do search
    against features. uri could be relative or absolute.
gh: category element defines a query id for a subset of das. it's the
    attribute query id in the category
ad: I also want to rename category back to capability.
    how do we arrange urls in a versioned source. construction off of
    strings or via attributes in a url
gh: votes for hardwired, but feels less strong today about it.
ad: majority vote is for query id, spec czar goes with that.

[A] query id
[A] andrew will update spec to have less mention of hierarchical structure
[A] allen will update server to do it the recommended way

gh: in addition to have an arbitrary query id to get segments, types,
    features, there's a recommended way to do it via the hierarchy.
    server should do it the recommended way (hierarchy)
ee: we should be flexible about it.
gh/ad: ok take out recommendation.

Topic: Status reports
---------------------
ad: see his emails.
gh: we need examples in spec document and scratch to be better
    synchronized.
ad: should be, i've been trying to keep these in sync.
gh: plan to push into html, incorporate scratch into doc?
ad: yes, eventually. will also add andreas' work to scratch too.

td: java xml binding libraries, how to put it into a workable server

ap: das registry, sources command, attribute handling, people can
    connect to a toy server publicly available.
gh: registry will respond?
ap: yes. toy server, toy data like das1, returning sources command.
gh: can you add allen's codesprint server? consider it registered.
ap: is it fully working?
gh: can allen send a command to it to register it?
ap: no.
gh: would like to tell my client to do discovery rather than hard wiring.

gh: commits to igb das/2 client to handle seq, segment, types. not
    features query yet. given decision about url construction, can do
    this fast so we can test on codesprint server seq, seg, types to
    bring up something meaningful in gui. not features by today.
    affy das/2 server is running behind. will sync up today as well.

nh: apollo working out sequence, segment, types request. now does
    versioned sources. integrating those into query gui as well.

aday: changes early this am. server running under /codesprint is now a
    static doc pointing back to the old server. adding segment
    command, merging region and seq command. has made everything
    except capabilities writeback stuff.
ad: there's another request recently, see my email.
aday: have gotten 40 emails from you in the last day!
aday: brian oconnor is working on bundling dependencies for an rpm
    based release.
gh: I also did significant refactoring/moving assay/ontology stuff
    into subclasses on client side. haven't seen brian's code, but
    should run fine.

Topic: Integrating Sequence Ontology with DAS/2
-----------------------------------------------
suzi: national center for biomedical ontology, one of 7 natl centers
    for biomedical computing. focus on needs regarding developing and
    using ontologies.
gh: hoping to have a typing system in das/2 via types queries that
    references SO but doesn't require client to fully understand
    ontologies. too much of a burden. that's the challenge.
    this translates into referring to ontology terms as opaque uris
suzi: 'understands' means they're ignoring any relationships between
    types.
gh: yes. currently type has attrib for id, attrib for ontology.
ad: uri or arbitrary string
suzi: can use uri or string, preprocessed
ad: one or the other
gh: prefers uri
suzi: from uri you can get the string
gh: not clear how to construct uri for particular terms in an ontology doc
suzi: this will happen in next few months. talking with daniel rubin
    about this.
gh: this is where allen comes in. ontology das.
aday: next step is getting it hosted on NCBO server. currently
    communicating with chris mungall. said they're planning on
    implementing something similar soon, not sure if they'd accept
    allen's solution. unclear. working with gavin sherlock on ontology
    support for microarray samples, tissue type, phenotype. was hoping
    people could pick this up and use it.
suzi: gavin and I could help push this.
gh: chris m posted concerns about obo xml that's in allen's scheme
    isn't same as what he's using. re: how you make absolute uris.
aday: there's not much docs on obo xml format. did the best I could.
suzi: should be able to sort it out. just an inertia problem of getting
    it installed. not a competition issue. fine with me. not difficult?
aday: by end of week we'll have an rpm.
suzi: let's keep pushing on this to make it happen. I'll talk to gavin
    tomorrow. can we install on sf site, or do we need to set it up
    elsewhere?
aday: could conceivably set up a cgi on sf. uses custom apache handler
    tho.
gh: more ontology q's can wait till tomorrow w/ lincoln.
    concern: how do we deal w/ types that represent more than one
    ontology term. defer discussion till tomorrow.

Topic: Feature Properties
-------------------------
See andrew's post today.

ad: this ties into ontologies. two ontology related issues: two
    different ways to query. ontology of a feature, and two diff ways
    to search a db for that property: exactly equal, or a subtype.
    this is a property with two diff searches you may want to do on it.
    properties like note, alias, phase have ability to search key/val
    properties, e.g., att:alias=something.
    score is a floating point number you may want to support > or < on
    it. regular exp searches, identical, etc. td says use xml query
    language, but worried about complexity of this. 99% of time this
    is way more than you need.
    scenario: given 4 different notes in a feature, is order important?
    extensions: curation point gives curator's name and time stamp.
    e.g., search for all features modified by andrew in 2004.
    discussion: pull this into a note element, perhaps phase and alias
    too. property table only supports a substring search. give me an
    author name, e.g.
    not saying getting rid of tag values. server supporting new data
    types, extensions, feat search w/ sanger curation elements for
    query. or thomas xml search. this is why I want to move categories
    back to capabilities.
gh: more appropriate as capabilities than header.
ad: someone can get a document. andreas can combine many servers into
    one, say which one supports which.
    to summarize:
    - properties are simple strings
    - only substring searches
    - change att: to prop:
    - note and alias and phase are elements
    - advertise that a server has extension to das query lang
gh: what about phase? lincoln needs it.
ad: if it's something that people will be editing, make it an element.
gh: phase is inappropriate for certain types. would like a formal way
    to say when it should be there or not.
ad: this is formalizing a way for server to tell client that there are
    more types of searches available. can't see how to do it
    automatically: eg for a given score, knowing what is considered
    significant (low or high, e.g.).
td: if he needs a phase he re-infers it. doesn't work for partial CDS
    tho.
gh: how much spec churn will this generate?
ad: [various things, half a dozen or so, some simplifying]
gh: does a colon in a query string need to be escaped?
    if so, this makes it hard to read.
ad: could use prop_ rather than prop:
    thomas and I had long discussion about this.

[A] andrew will incorporate these changes into feature properties

Topic: Maintainer information
-----------------------------
ad: modified examples under scratch
gh: maintainer at source or version level
ad: one for all sources level
ap: at sanger we have one central server with lots of sources. notes
    who's responsible for which server.
gh: ownership cascades down to sub elements?
ad: yes

Topic: XML Base
---------------
gh: can be in any element. as well as xml:lang, don't really understand.
ad: it's what the atom spec does, so we copied. maybe for
    bidirectional languages.
gh: flexible uri resolution scheme w/ xml base. implementation in java
    tools is spotty for xml:base. curious about java obj binding of
    xml what support they have for resolving xml base. at this point
    will have to roll it myself. want to ask thomas about this.
ap: he's using StAX parser, gets global namespace.
gh: bigger concern for when we have to use sax, need to do xml:base
    resolution, eg. when we need to retrieve lots of features.
ad: it can be done with sax.
gh: not hard, but it is a multistep process.
ad: multiple levels of xml:base

ad: tomorrow's agenda: go through roy's otter stuff, convert into new
    das format. to get a feel for how data will look. see roy's email.
    to use experience gathered from otter to make sure we're
    sufficiently covering features.
gh: talking about writeback?
ad: premature. let's talk style sheets wed, and writeback thursday.
    plus anything else that's come up about the spec. want to know how
    style sheets will look. lincoln should be able to help out there.

From nomi at fruitfly.org Wed Feb 8 03:27:13 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Tue, 7 Feb 2006 19:27:13 -0800 (PST)
Subject: [DAS2] We need DAS/2 progress reports for the grant!
Message-ID: <17385.25873.660275.790249@kinked.lbl.gov>

Dear DAS/2 developers,

I am writing this on behalf of Gregg and the DAS/2 team. This is so
important I'm actually using capital letters.

As you know, we have submitted a request for renewing the DAS/2 grant.
Our chances of having this renewal approved are iffy, especially since
we are asking for more money than in the original grant and NIH's budget
is very tight right now. The reviewers are about to read our grant
proposal and decide whether to fund it, and we need to send them a
supplementary progress report about what we've accomplished since we
submitted the grant in November. Describing how much progress we've
made towards implementing the DAS/2 protocol in both servers and clients
will help make our case that we deserve more funding to continue this
important research.

Gregg has been trying for weeks to find out when this progress report
was due (we had figured we had until the end of February). Today he
*finally* got through to our scientific review administrator, who said
that we have to send it to them no later than THIS THURSDAY.

Obviously, this is very short notice, so we are asking all of you to
very quickly put together a paragraph (no more!) describing your
progress between Nov 1 and the end of this week (i.e., you can
project to what you expect to have completed by Friday). If you need
context, I have attached a copy of the grant; I will also send some of
you individual notes about what we need from you.

Please send us (the DAS2 mailing list, or, if you're feeling shy, just
me and Gregg) your paragraph in PLAIN TEXT so that I can more easily
assimilate them into a single document. We plan to work on
incorporating your reports into our progress report tomorrow (Wed), send
out a draft tomorrow night (our time) for you to review, and incorporate
any suggestions into our final version that we'll send off on Thursday.

Sorry for the short notice, and thanks in advance for your help.

Nomi and Gregg
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DAS2_renewal_grant_final2l.doc
Type: application/octet-stream
Size: 453632 bytes
Desc: DAS2 renewal grant proposal
URL:

From allenday at ucla.edu Wed Feb 8 03:14:49 2006
From: allenday at ucla.edu (Allen Day)
Date: Tue, 7 Feb 2006 19:14:49 -0800 (PST)
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To:
References:
Message-ID:

Chris,

Why have you chosen to make a subelement of ? Is it expected that
there will be multiple IDs for a given term, and if so is there not a
primary ID?

having an id attribute is a defacto standard for DOM libs, so you can call
getElementById().

-Allen

From allenday at ucla.edu Wed Feb 8 03:57:05 2006
From: allenday at ucla.edu (Allen Day)
Date: Tue, 7 Feb 2006 19:57:05 -0800 (PST)
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To:
References:
Message-ID:

Hi Chris,

> Why not just define a qname for every idspace? This is the standard way
> of doing this in XML

Can you give a concrete example of this? a search for "qname idspace"
returns a single godatabase.org result.

Anyway, I have stripped out the id= attributes from the and elements.
You can see valid (by your DTD) obo xml produced from the das server here:

Entire SO:
http://das.biopackages.net/das/ontology/obo/1/ontology/SO?format=legacy1

SO "exon" record:
http://das.biopackages.net/das/ontology/obo/1/ontology/SO/0000147?format=legacy1

-Allen

From Gregg_Helt at affymetrix.com Wed Feb 8 08:36:01 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 8 Feb 2006 00:36:01 -0800
Subject: [DAS2] Working with xml:base in Java?
Message-ID:

I've been mucking around trying to find an answer to my own question about ways to easily handle xml:base in Java. And I think the answer, if I want to continue to use DOM, ends up being "code it yourself". But it took a while to get to that answer. I'm writing down these notes so I can refer back to them next time if the issues I encountered come up again. But I figured I might as well post in case other DAS/2 implementers have similar problems.

So the standard Java 1.5 distribution includes the org.w3c.dom.Node interface, which conveniently enough has a getBaseURI() method that should do exactly what I want -- for any node in an XML document, give me the resolved base URI for that node (regardless of how complex a combination of xml:base attributes are used in the path to that node). Which I can then combine with whatever id attribute I'm interested in (via Java networking classes) to get the full URI.

But I need to guarantee compatibility with Java 1.4, so I can't rely on 1.5. Java 1.4 has a previous version of org.w3c.dom.Node, but with no getBaseURI() method. Turns out this is because the 1.5 Node interface complies with DOM-level3 spec (includes XML Base support) but the 1.4 Node interface only supports DOM-level2 spec (no XML Base support).

Okay, but I can download the Xerces2 distribution, which is a Java library that also has a full implementation of DOM-level3. So I get that set up, add some calls to node.getBaseURI() to my code, and it compiles fine.
But when I run the program I get an ugly java.lang.NoSuchMethodError. I dig around on the web and find the problem is a class/package namespace collision -- both Xerces2 and the builtin java libraries have a class named org.w3c.dom.Node, but of course they're different. And replacing built-in java classes is not normally allowed, so when the program is actually run and classes are loaded the builtin Node class wins (the one w/o the getBaseURI() method). It would have been nice if they mentioned this in the JDK Compatibility section of the Xerces2 FAQ...

But there is some discussion of solutions to this problem on the Xerces mailing list. There is actually a way to replace builtin java packages via an "Endorsed Standards Override Mechanism", if they're on the list of endorsed standards, which org.w3c.dom is. This involves putting the replacement package in an endorsed directory and setting a system property to direct the JVM to look there for replacement packages.

But... whatever solution I use has to work with Java WebStart. I can't find _any_ info on whether the package override mechanism works with WebStart. And even if it does work for some WebStart implementations, I'd be wary of assuming it works for others -- it seems like one of those things IT folks on the user end might get concerned about. I've also found other solutions to the package name clash, but none that seems compatible with WebStart.

So it looks like, considering my other constraints, if I want to stick with DOM I'll need to code xml:base handling myself. Looking at the source code for Xerces2, doesn't look too hard. Except... damn, the getBaseURI() method implementation is actually commented in the Xerces code as "Experimental". Looking closer... um, I think it actually doesn't implement the spec correctly. Grr...

To summarize, when it's time for my status report tomorrow, I think it's best if I just remain silent.

gregg

P.S. I suspect the answer for SAX will be similar.

P.P.S.
XOM (http://www.xom.nu/) is starting to look pretty good, but I may just be hallucinating at this point... > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Helt,Gregg > Sent: Tuesday, February 07, 2006 11:01 AM > To: Thomas Down > Cc: DAS/2 > Subject: [DAS2] Working with xml:base in Java? > > > Thomas, I'm wondering what toolkits you're using for binding XML > to Java objects? And particularly how you are dealing with resolving > URIs when xml:base is used. So far I've mostly used various > implementations of SAX and DOM -- I've found some reports of builtin > xml:base support in Xerces SAX/DOM, but it's still unclear. > > I've been avoiding the issue up till now. It won't be too hard > to implement URI resolution relative to xml:base, but I thought I'd > check around first and see if there's automated support of this in some > toolkit. > > Thanks, > Gregg > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From td2 at sanger.ac.uk Wed Feb 8 08:44:38 2006 From: td2 at sanger.ac.uk (Thomas Down) Date: Wed, 8 Feb 2006 08:44:38 +0000 Subject: [DAS2] Re: Working with xml:base in Java? In-Reply-To: References: Message-ID: <70790A43-AA5F-4F4A-8F20-50CDE30C7BB3@sanger.ac.uk> On 7 Feb 2006, at 19:00, Helt,Gregg wrote: > > Thomas, I'm wondering what toolkits you're using for binding XML > to Java objects? And particularly how you are dealing with resolving > URIs when xml:base is used. So far I've mostly used various > implementations of SAX and DOM -- I've found some reports of builtin > xml:base support in Xerces SAX/DOM, but it's still unclear. > > I've been avoiding the issue up till now. It won't be too hard > to implement URI resolution relative to xml:base, but I thought I'd > check around first and see if there's automated support of this in > some > toolkit. 
Hi Greg, I'm actually using Stax (the streaming API for XML). The implementation I use is called Woodstox: http://woodstox.codehaus.org/ (but there are a few others out there). No builtin xml:base support but it's easy to write a little wrapper around XMLStreamReader to spot xml:base attributes and maintain a stack of base URIs. I'm using java.net.URI to do the URI handling/resolution/ relativization. Seems to be working okay... so far... Thomas. From Gregg_Helt at affymetrix.com Wed Feb 8 10:12:22 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Wed, 8 Feb 2006 02:12:22 -0800 Subject: [DAS2] RE: Working with xml:base in Java? Message-ID: > -----Original Message----- > From: Thomas Down [mailto:td2 at sanger.ac.uk] > Sent: Wednesday, February 08, 2006 12:45 AM > To: Helt,Gregg > Cc: DAS/2 > Subject: Re: Working with xml:base in Java? > > > On 7 Feb 2006, at 19:00, Helt,Gregg wrote: > > > > > Thomas, I'm wondering what toolkits you're using for binding XML > > to Java objects? And particularly how you are dealing with resolving > > URIs when xml:base is used. So far I've mostly used various > > implementations of SAX and DOM -- I've found some reports of builtin > > xml:base support in Xerces SAX/DOM, but it's still unclear. > > > > I've been avoiding the issue up till now. It won't be too hard > > to implement URI resolution relative to xml:base, but I thought I'd > > check around first and see if there's automated support of this in > > some > > toolkit. > > Hi Greg, > > I'm actually using Stax (the streaming API for XML). The > implementation I use is called Woodstox: > > http://woodstox.codehaus.org/ I would like to check out Stax, haven't used it before. > (but there are a few others out there). No builtin xml:base support > but it's easy to write a little wrapper around XMLStreamReader to > spot xml:base attributes and maintain a stack of base URIs. > > I'm using java.net.URI to do the URI handling/resolution/ > relativization. 
Seems to be working okay... so far...

That's what I was thinking about when I said it wouldn't be too hard to implement... But that was yesterday. A long time ago. Now I've taken a detour into re-reading the XML Base spec (http://www.w3.org/TR/xmlbase/), and things don't seem so easy.

I _think_ if there's at least one xml:base attribute in the element hierarchy above where you're trying to determine a base URI, and resolution of those xml:base attributes yields an absolute URI, it's all good, that's the base URI. But on the other hand if this resolution yields a relative URI instead of an absolute URI I'm not sure what happens -- I would guess it's an error, but I can't see anywhere in the XML Base spec that spells this out.

And if there's no xml:base to use to determine a base URI, things get weird:

  - if the document is "encapsulated within another entity", the base URI is the URI of that entity (I have no idea if DAS/2 docs could appear in such a context)
  - otherwise the base URI is the URI used to retrieve the document
  - oh, except if you burrow down into the spec pointers to RFC 2396 (http://www.ietf.org/rfc/rfc2396.txt), if the request gets redirected you need to make sure the base URI is the last URI used in the redirect
  - oh yeah, and apparently external entity declarations can affect all of this in ways I don't understand
  - and there's probably other gotchas I've missed...

Now from the server side, none of this is really an issue. Just pick from a multitude of variants that XML Base allows when you send responses to the client. From the client side, if we really want DAS/2 to support XML Base (and I think we do), things get tricky. It's definitely pushing me towards using libraries that provide builtin support for XML Base.
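
As an illustration only (this is not code from the thread), here is a minimal sketch of the approach Thomas describes: walk the document with a streaming parser, push a resolved base URI for every element, and resolve id attributes against the top of the stack. It is written in Python rather than Java to keep it short; the same shape should carry over to a wrapper around XMLStreamReader and java.net.URI. The element names, the "id" attribute name and the example.org URLs are assumptions made for the example, and the sketch ignores the harder cases listed above (encapsulating entities, external entity declarations). The caller is expected to pass in the final post-redirect retrieval URI.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_ids(xml_text, document_uri, id_attr="id"):
    """Return [(tag, absolute_uri)] for every element that carries id_attr.

    document_uri is the URI the document was retrieved from; it is the
    fallback base for elements that have no xml:base above them.
    """
    base_stack = [document_uri]
    results = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # xml:base may itself be relative, so resolve it against the
            # base in scope for the parent; no attribute means "inherit".
            base = urljoin(base_stack[-1], elem.get(XML_BASE, ""))
            base_stack.append(base)
            if elem.get(id_attr) is not None:
                results.append((elem.tag, urljoin(base, elem.get(id_attr))))
        else:
            # Leaving the element: its base URI goes out of scope.
            base_stack.pop()
    return results
```

A client could call resolve_ids(body, final_url) on a features or types response and compare the resulting absolute URIs directly, without caring which mix of relative ids and xml:base attributes the server chose.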
Gregg From dhoworth at mrc-lmb.cam.ac.uk Wed Feb 8 11:54:54 2006 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Wed, 08 Feb 2006 11:54:54 +0000 Subject: [DAS2] Ontologies in DAS/2 In-Reply-To: References: Message-ID: <43E9DC0E.30809@mrc-lmb.cam.ac.uk> Allen Day wrote: > Why have you chosen to make a subelement of ? Is it expected > that there will be multiple IDs for a given term, and if so is there not a > primary ID? having an id attribute is a defacto standard for DOM libs, so > you can call getElementById(). I'm curious about the DAS use of id attributes, especially given an expectation to use getElementById(). DAS has attributes that are URLs - they include the '/' character. But getElementById() is an HTML or XHTML DOM method I believe. Both HTML 4 and XHTML require that id attributes be of type ID, I think, and the ID type does not permit '/' characters (IDs are Names). I find it pretty confusing that DAS uses an attribute that is called id that isn't an ID. And I'm curious to know if getElementById() works with it? Sounds like a sloppy implementation of the DOM. Or did I miss something? Cheers, Dave From dalke at dalkescientific.com Wed Feb 8 16:36:11 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed, 8 Feb 2006 16:36:11 +0000 Subject: ids and URLs (was Re: [DAS2] Ontologies in DAS/2) In-Reply-To: <43E9DC0E.30809@mrc-lmb.cam.ac.uk> References: <43E9DC0E.30809@mrc-lmb.cam.ac.uk> Message-ID: <701ca3642e6ed0ea3e12fbd26991a3fb@dalkescientific.com> Dave Howorth wrote: > I'm curious about the DAS use of id attributes, especially given an > expectation to use getElementById(). > > DAS has attributes that are URLs - they include the '/' character. > > But getElementById() is an HTML or XHTML DOM method I believe. > > Both HTML 4 and XHTML require that id attributes be of type ID, I > think, and the ID type does not permit '/' characters (IDs are Names). 
> > I find it pretty confusing that DAS uses an attribute that is called > id that isn't an ID. And I'm curious to know if getElementById() works > with it? Sounds like a sloppy implementation of the DOM. Or did I miss > something? We've been talking about this and related matters most of the day. It started with Thomas' question "How do I get all of the exons in the database which are from Vega?" (Vega being some other database.) All of the features which are exons from Vega have the same DAS data type. This means he wants to do a feature query with type = He needs to get the DAS type id. He can get all of the exons using an ontology search. But he wants to search for the string "exon". Given the discussion yesterday, will the type query support "ontology='exon'" or must he use some other service to convert "exon" to "SO:exon" or to "http://some/server.url"? Suppose for now it is "SO:exon". He does http://das.server/../types?ontology=SO:exon That gets all of the exon types, but not the ones from Vega. The Vega types have a source="Vega". DAS type queries do not support searching on that field. PROPOSAL: Add a "source=" (case-insensitive substring search) field to the types query. (I don't think there is any contention here so I'll add it.) http://das.server/../types?ontology=SO:exon;source=Vega That comes back with a single DAS type. He now wants to search for all features with that type. What does he use for the query? Is it (assuming proper escaping) http://das.server/../features?type=http://das.server/../type/T12345 ? That's rather excessive, especially if there are many DAS types derived from the given ontology term. All around people want to use "T12345" for that, and not the full URL. Are there people who do want to use the full URL? The current system comes from saying the URL is the identifier for a DAS object. If as Dave points out we have a "id" which is a simple string (of the format /[A-Za-z0-9_]+/ or so) then there's no problem. 
We can use that for the query, as

   http://das.server/../features?type=T12345

PROPOSAL: do not use a URL for the identifier for objects

That fixes a few problems:
  - xml:base is no longer an issue; these are ids and not URLs
  - the names are short and sweet

It introduces a few problems.

Problem 1: a feature has a type. How can the client get from the type id to the type information if there is no URL to resolve?

  Solution 1: add an 'id=' term to the types query URL, eg
        http://das.server/../types?id=T12345
      (or possibly call it 'type=')
  Solution 2: append "/" + type id to the types query URL, eg
        http://das.server/../types/T1234
  Solution 3: have both an 'id' and an 'href' attribute
  Solution 4: the client downloads all the types and compares the id fields.

QUESTION: At Hinxton nearly all the DAS servers have only one or two types. Ensembl has 45 types and Allen's has about 50. Is it reasonable to have clients just go ahead and download everything and not worry about a query language? Is Chado any different?

Problem 2: a feature can refer to its parent and part features. It can refer to regions on other features. How does a client get information about the feature given the feature id?

  Solution 1: add an 'id=' term to the features query URL
  Solution 2: append "/" + feature id to the feature query URL
  Solution 3: have both an 'id' and an 'href' attribute

We discussed this a lot and decided on

PROPOSAL: add an 'id=' query to the types and features query.

We decided against solution 2 because of me - I don't like working with URLs that way. Thomas pointed out that an 'id=' query is useful, eg, if a feature has three parts then a client can request

   http://das.server/../features?id=part1,part2,part3

(NOTE: we're also thinking of proposing this syntax for an 'OR' query over the same term

   http://das.server/../features?id=part1;id=part2;id=part3
)

I pointed out that having both means there are two ways in the server to look-up by id - extra machinery.
QUESTION: Who will want to refer to features and types by URL?

Possibilities:
  - hypothetical model where the queries return a list of URLs and the server (through HTTP pipelining) asks only for the ones it doesn't have already; saving bandwidth. THIS IS NOT A USE CASE!
  - request a feature in a specific format (but that can be done through the query URL)
  - RDF people who want individually named items (not a use case)

We couldn't come up with a case where someone would want to refer to features and types as an individually named URL!

For segments there is a use case - you can ask for sequence by range, and that's through the segment URLs. However, that could be done with the segment query URL so it's not a strong use case. In any case, it hasn't been a problem so I'll put that off for now.

That being the case, there's no need to consider "Solution 2". Why have URLs if no one wants to use them?

What did come up during the discussion here was that we had planned to use URLs for writeback. That model seems rather nice. "DELETE" and "PUT" to the correct URLs, rather than going through a "POST to delete.cgi?type_id=", etc.

The model for writeback was something like "ask server to make a copy, with region A:C available for editing. User works with region. User commits region back to server."

In that case, the request for region might as easily make a copy of the source, available through a special URL visible only to that one user. In this copy it can expose "url=" attributes for editing, perhaps also with a "writeable=" field because some features will not be editable for that user.

I complained yesterday about "writeable" but that was because for the general purpose server the concept of "writeable" was user-specific and not appropriate. In this writeback model it's just fine.

Another thing came up during discussion of this. Roy yesterday proposed the idea of a simple server which only supports getting "everything". It doesn't support the DAS query specification.
That is, it only supports

   http://das.server/../types
   http://das.server/../features

and fetching those returns everything. This is useful for small data sets because those could be simple files, like

   http://das.server/../types.xml
   http://das.server/../features.xml

Still, for that case there would need to be "feature/F1", "type/T2", etc. In essence, a duplicate of every record.

Last December during discussion people said there was no use case for this sort of flat-file oriented server. This was not a design goal.

Thomas mentioned that there is a use case. Uploading of DAS tracks to a server. People complain now that it's hard to do that. With this url-less model people can upload a small number of documents (or a .zip file of a directory) with the versioned source, types, and features data. There is no need to have an "exploded" copy of all of the records in parallel to the types and features xml files.

Big Advantage: Stylesheets are much easier to write. Refer to fields by short id instead of long URL.

Conclusion:

  Proposal 1: "id"s are of the form /[A-Za-z0-9_]+/
  Proposal 2: FEATURE and TYPE elements have an optional "url" (or "href") attribute
  Proposal 3: the feature and type queries support an 'id=' search
  Proposal 4: the type query supports a "source=" search

Churn factor:
  - Allen's server doesn't need the 'type/' and 'feature/' fields
  - Gregg and others don't need to worry about xml:base any more.
  - Type and feature lookups need to track the query URL as well as the type and feature id
  - We need a new 'id=' search capability

These don't seem big in a programming sense, more a conceptual one.
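
To make Proposals 1 and 3 concrete, here is a small sketch of what a client-side helper might look like. It is only an illustration under stated assumptions: the function names are invented, the das.example.org URL stands in for the elided "http://das.server/../" paths above, and the repeated "id=...;id=..." form is the 'OR' syntax that is only being considered, not something already agreed.

```python
import re

# Proposal 1: ids are short strings of the form /[A-Za-z0-9_]+/.
ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_id(candidate):
    """True if the whole string matches the proposed id pattern."""
    return ID_PATTERN.fullmatch(candidate) is not None


def id_query(query_url, ids, repeated=False):
    """Build an 'id=' query against a types or features query URL.

    query_url is a hypothetical query endpoint (no '?' in it yet).
    repeated=False gives  ?id=a,b,c
    repeated=True  gives  ?id=a;id=b;id=c  (the tentative 'OR' form)
    """
    bad = [i for i in ids if not is_valid_id(i)]
    if bad:
        raise ValueError("not valid ids under Proposal 1: %r" % bad)
    if repeated:
        return query_url + "?" + ";".join("id=" + i for i in ids)
    return query_url + "?id=" + ",".join(ids)
```

Because the ids are restricted to letters, digits and underscore, no URL escaping is needed, which is part of the appeal compared with passing a full type URL as a query value.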
Andrew
dalke at dalkescientific.com

From cjm at fruitfly.org Wed Feb 8 18:03:41 2006
From: cjm at fruitfly.org (chris mungall)
Date: Wed, 8 Feb 2006 10:03:41 -0800
Subject: ids and URLs (was Re: [DAS2] Ontologies in DAS/2)
In-Reply-To: <701ca3642e6ed0ea3e12fbd26991a3fb@dalkescientific.com>
References: <43E9DC0E.30809@mrc-lmb.cam.ac.uk> <701ca3642e6ed0ea3e12fbd26991a3fb@dalkescientific.com>
Message-ID: <94bafd156da54842f9093244ca6083d1@fruitfly.org>

I mostly skim the messages here, so I may be missing something, but I'm a little confused by this:

On Feb 8, 2006, at 8:36 AM, Andrew Dalke wrote:
>
>   http://das.server/../types?ontology=SO:exon

I don't understand this - SO:exon isn't an ontology

>
> That gets all of the exon types, but not the ones from Vega.
> The Vega types have a source="Vega". DAS type queries do
> not support searching on that field.
>
> PROPOSAL: Add a "source=" (case-insensitive substring search)
> field to the types query. (I don't think there is any contention
> here so I'll add it.)
>
>   http://das.server/../types?ontology=SO:exon;source=Vega

What does 'types' return? A type from an ontology (eg SO:exon) or something else? Why would source be recorded here? Surely source would be a valid constraint on a feature query, but not a type query.

Perhaps it's the case that in DAS a 'type' means some kind of arbitrary grouping (eg features of type X and source Y), and 'ontology' means a term/type from an ontology. If it isn't too late I'd suggest changing these conventions.

From Gregg_Helt at affymetrix.com Wed Feb 8 18:12:46 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 8 Feb 2006 10:12:46 -0800
Subject: [DAS2] Why use URIs for feature IDs?
Message-ID:

Regarding using URIs for DAS features, here's the quote from Paul Prescod that I used in the original DAS/2 grant proposal addressing the question "why use URIs?".
From http://www.prescod.net/rest/rpc_for_get.html :

   You can give that URI address to anyone, anywhere and they can reuse it. In particular this means that we can compose applications that were not thought of in advance. Google is an example of an application that was composed "after the fact" out of URIs. Yahoo is another... There are a raft of deployed W3C recommendations that work with information related through URIs. Many of these are XML-related specifications that work as well in API-like applications as in user interface-based applications. These include: XPath, XPointer, XSLT, XLink, RDF, XInclude, XQuery, xml-stylesheet. Information published through HTTP URIs can be combined through XInclude, queried and sorted through XQuery and XSLT, visually rendered with xml-stylesheet, related through RDF, linked through XLink, pointed into through XPointer.

From dalke at dalkescientific.com Wed Feb 8 19:24:06 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Wed, 8 Feb 2006 19:24:06 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To:
References:
Message-ID: <4d48d7749f04bb97f6791a5726230fee@dalkescientific.com>

Yes. I like URLs.

I've been so in favor of URLs that until this morning I had in the spec that the "id" *is* the URL. There was no short form for the URL. (still /is/ no short form since it hasn't changed ;)

That meant several things:
  - everyone needs to disambiguate through the xml:base to figure out if two features are the same. (Neither Gregg nor Thomas liked that)
  - queries of the style we are doing become more complex
      (type=http://www.server/path/to/das/type/000A956826C8 vs. type=000A956826C8 )
  - passing URLs about make for bigger XML, hence slower.

The first is technical. The second is emotional - that sort of query looks ugly. The last is .. I can't speak for the last. In an earlier email I showed how a different site layout can be as efficient as any id scheme.
Quickly, use

   http://www.../volvox/1/S     <- versioned source URL
   http://www.../volvox/1/T?..  <- types query url
   http://www.../volvox/1/T001  <- type urls
   http://www.../volvox/1/F?..  <- feature query urls
   http://www.../volvox/1/F001  <- feature urls

and don't worry about any sort of hierarchy in the system. Everything has the xml:base of "http://www.../volvox/1/" so relative URLs are trivial strings.

Several said "just chop off the last bit of the URL to get the id" or "combine some base feature URL with the feature id to get the full URL." Why is that useful?

Lincoln said on today's phone call that he wants both a URL and an id, and expected that both would be there.

I'm now going to be either stubborn or irritating or both. Why have an id at all? That is, why at all have a short string (say of the form /[A-Za-z0-9_]+/) when the URL is there and meets all the functional requirements of an identifier?

(I'll use 'id' to refer to a short string, 'url' to refer to a URL. Both are identifiers. I should be using 'uri' for the latter, I know. See comment below.)

Today I thought I came up with one reason to have ids and to have a nonexistent URL for a element. I think now that I was wrong.

My use case was for uploading data to the Ensembl viewer to display a new DAS track. Put all of the types into one file, in the types XML format. Put all of the features into another file in a features XML format. Use arbitrary ids for cross referencing, because there is no URL for them - they don't exist in any form outside the document. Upload them to the server. The server reassembles the annotations by cross referencing the ids.

I now see that that's a mistake. As Gregg corrected me, they use URIs not just URLs. They could use "das_private:ABC123" or a fully-qualified URL or a xml:base and the partial URL or whatever scheme. All the server needs to know is how to compare the two URI strings. It's free to rename the strings if need be.

(Could it keep the original URLs?
Perhaps, but the original data might not be accessible. Consider an exon predictor whose output you want to upload to the Ensembl viewer. There is no URL for that.)

Given that this isn't a valid use case for having an 'id' and not having a 'url', now I ask again: what's the point of having *both* a unique URL and a unique 'id' for the elements? Tradition? Elegance?

With Dave Howorth's comment about the specialness of 'id' I can see changing the attribute name to 'url'.... or 'uri'.

I've got to write a couple paragraphs for Nomi now. I'll leave with the following comment from http://tbray.org/ongoing/When/200x/2006/01/08/No-New-XML-Languages

> Designing XML Languages is hard. It's boring, political,
> time-consuming, unglamorous, irritating work. It always takes longer
> than you think it will, and when you're finished, there's always this
> feeling that you could have done more or should have done less or got
> some detail essentially wrong.

Andrew
dalke at dalkescientific.com

From Gregg_Helt at affymetrix.com Wed Feb 8 21:46:37 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 8 Feb 2006 13:46:37 -0800
Subject: [DAS2] Re: New DAS/2 server for codesprint
Message-ID:

Following Steve's suggestion, I'm focusing on the region around YGL076C (also known as RPL7A) on the yeast genome to get a small slice of feature XML back from the codesprint server for a region where I know what the genes should be:

http://das.biopackages.net/das/genome/yeast/S228C/feature?overlaps=chrVII/364251:366080;type=SO:gene

This returns the YGL076C gene with three CDS and two introns. A nearby snoRNA also gets returned.
Gregg > -----Original Message----- > From: Chervitz, Steve > Sent: Monday, February 06, 2006 5:03 PM > To: Helt,Gregg; Allen Day > Cc: DAS/2 > Subject: Re: [DAS2] Re: New DAS/2 server for codesprint > > > > There's a gene (RPL7A) with two introns on chr7 at roughly > 366kbp - 364kbp: > http://db.yeastgenome.org/cgi-bin/locus.pl?locus=YGL076C > > Most genes with introns in cerevisiae (which aren't many) > have just a single intron that creates a small 5' exon, such > as the alpha and beta tubulin genes on chr13. Tub1 is on > chr13 at 99Kbp, and tub3 is also on chr13 at 23Kbp. So the > first 100Kb of chr13 would be another region to try. > http://db.yeastgenome.org/cgi-bin/locus.pl?locus=tub1 > > Steve > > > > From: "Helt,Gregg" > > Date: Mon, 6 Feb 2006 16:14:55 -0800 > > To: Allen Day > > Cc: DAS/2 > > Conversation: [DAS2] Re: New DAS/2 server for codesprint > > Subject: RE: [DAS2] Re: New DAS/2 server for codesprint > > > > > > Allen, can you recommend a reasonable region on yeast to do > a features > > query that will return features with some hierarchy (like > > transcript/exons)? > > > > Thanks, > > Gregg > > > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/das2 > > From Steve_Chervitz at affymetrix.com Wed Feb 8 21:47:18 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Wed, 08 Feb 2006 13:47:18 -0800 Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint, 8 Feb 2006h Message-ID: Notes from the DAS/2 teleconference for the code sprint, 8 Feb 2006 $Id: das2-teleconf-2006-02-08.txt,v 1.1 2006/02/08 21:51:14 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt CSHL: Lincoln Stein Sanger: Thomas Down Sweden: Andrew Dalke UC Berkeley: Nomi Harris UCLA: Allen Day, Brian O'connor Action items are flagged with '[A]'. 
These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org

DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit.

Agenda:
  * progress report for grant renewal
  * ontologies
  * ids and urls
  * style sheets
  * status reports

Topic: Progress report for grant
--------------------------------
gh: needs to be in the mail by 5pm tomorrow, to be included as a hard copy addendum to grant. will improve chances of funding for next cycle. review will be done by end of feb.
nh: no later than 4pm pst today. state what you've accomplished between Nov 1 and now, in particular this week. one or two paragraphs.
gh:
  1. highlight significant enhancements
  2. involvement of sanger, ebi
  3. registry work from andreas, http spec for that registry
  4. writeback
ad: andreas worked on registry server, will send write up soon post teleconference.

[A] Everyone write up 1-2 paragraphs of progress and send to Nomi ASAP

Topic: Ontologies
-----------------
gh: concerned about ontol attrib in types doc because, do we want it to be possible for a type to be an instantiation of multiple terms in the ontology.
ls: will make it hard to validate. one type = many ontol terms. don't like it. types will be specializations of SO terms and will not have multiple parents.
gh: thinking about people doing curation. if a type is anchored to one term in the ontol, and a feat can have only one type, a feat won't be able to refer to >1 term in SO.
ls: any use case for this?
gh: still exploring this. eg., both a computed feature and an exon?
ls: no.
separate category for predicted genes. gh: is there something for 'computed exon' or 'computed cds'? ls: think so. sc: multiple branches like go? ls: multiple relationship types do exist. something can be is_a or part_of. I wanted das/2 to be limited to what you can say in SO, with notion that you can extend it. e.g., three predicted exons one with genefinder, exonerate, etc. ad: given a string 'exon' how does that get used to query server? ls: find exon SO term, download list of types from das server, find everything that inherits from exon ontology term. clients need to know how to search the SO list. they will have a local copy of SO that they'll refresh from time to time. gh: client isn't required to know the full structure, except maybe to search higher-level terms. but the term in the ontology attribute is sufficient. ls: could just search types and desc to find exons, but that relies on implementer describing their types correctly. gh: if a client wants to understand an ontol, the best way to go is via what allen's proposing, searching via ontology das, preferably via NCBO server. ad: what is the actual string we're searching on? aday: name or definition, or id. ls: client should have a copy of the SO. unambiguous in this opinion. client has SO, looks through types XML to find what the local types are which the server supports which match what it's looking for in the SO. here's a flowchart: - client downloads SO, caches. - client downloads seq types list, caches. - user searches to find exon - client looks to find matches against 'exon', maybe 5 hits. - prompts user to select which he's looking for - client looks thru cached types xml to find server types of SO term that user selected - client does feature query. ad: what is the string that the user is looking for URL or string? ls: in type xml how do we indicate the term? gh: we've been discussing this the past few days ls: why not replace the term with SO accession number? 
then we don't have to figure out the correct representation of ontology in an xml. can finish this by friday. chris mungall has weighed in, and xml version of SO ontology is not completely stable. gh: perfectly ok for client to know nothing about SO and treat these as unique strings. ls: right. names will eventually be things like 'exon'. aday: chris's main complaint is that the doc didn't validate. I didn't have a dtd. got it and now it validates. I thought this was a done deal. there is a document written that describes how to do what we're talking about. ls: the only thing to be resolved, in types xml document, how do we refer to SO terms? aday: an attribute there that allows you to put in uri. it's a relative url that points to ontology das server to get obo xml for that term. ad: how do I go from string 'exon' to find out what that is? aday: ls: let's say administrator of das server has local type called foobar. associated w/ url for SO 'exon' term. andrew's question is, user wants to search for exons, how to go from 'exon' to correct url in SO to find what types correspond to that? what's to go from 'exon' to foobar. aday: search SO for exon, local types. there's a filter on ontology that lets you search all terms and definitions. gh: there's a reqt now that server must understand parent child relationships in ontology. aday: server could do xpath query to pull out the terms you're interested in w/o understanding ontology ls: user types 'exon' returns all feats in the genome that are exons. aday: two servers, feat and ontol server gets all types from feat server, each has url to ontology das server, maybe multiple ontology das servers. each must have its ontology searched returns supported or not. client assembles all search results from static obo xml documents. gh: for most clients this will be irrelevant. user will get a list of types - genscan, blat alignment, for things they may be interested in. they don't need to understand ontology nor does client.
there may be a url to look up info about the term. this is the typical case. more sophisticated use cases can be put off till later. ls: in types xml can we have two attributes, url and accession so_accession="SO:12414", other will be url for obo xml. [A] types will have separate attributes for URI and SO accession number Topic: IDs and URLs ------------------- ad: discussion about searching for exon, use case: client goes to server to get list of all types, wants all features of a given type in a given range. may filter based on contains or inside, das-type=xxxxx. talking about that being a URL to get full name for it. what is the thing you send to server to ask for the types? gh: url ad: make this an id so it's not a long complex url. just an id specific to that server. such that you go to feat query url and get it. ls: can just choose the last component of the url, type id. ad: why have ability to get feature type individually? ls: will have to be uniquified, by adding url to types query. ad: feat query = ls: isn't this the way it was? gh: every feat has unique uri. ad: talking about filtering and querying. ls: just give it the id not the whole url. ad: now it is the url ls: should be the id. does it make sense to be something that another server has defined? probably not. just a local type. [lots of back and forth here, didn't catch it all...] ad: do we need ability to refer to feature or type by url? gh: yes. for making rdf statements about das2 features. ad: who will do this? gh: I will if no one else does. web technology is moving in this direction. ls: we want every object a das server serves to be referenceable as a url/uri. as for filtering mechanism, for type filter we can just use the id of the type, a short string. ad: agree, as of this morning the url and id are same thing. ls: a relative uri, by definition the server should implicitly attach the versioned data source url to it.
ad: xml processors ls: define the way the filter query mechanism, hard code implicit paths into it. ls: featuresquery?type=something if 'something' has no slashes, server implicitly adds http://myserver/das/types/... ad: don't like pasting urls and strings together to get things. don't like queries with implicit logic like that. ls: perfectly happy saying you can use urls in the query strings. I'd go with short ids ad: proposing we have both, id and href. here's the case: people uploading to server want to provide a das track, can provide two documents. works well for < 1000 features gh: we have to have uri for features. ad: why? gh: I will send you the page from the first grant. ls: main reason is: to avoid namespace clashes when integrating data sets. td: what do you mean by integrate? ls: view of features from 4 diff annotation groups, want to search for a particular feature by its id, need to indicate which data source it's coming from. td: won't you be keeping track of which data source anyway? you never get a track that's a mixture of diff sources. gh: dangerous to do this. td: there must be something keeping track of which source each track is from. gh: my assumption is that this is with uri td: there's nothing that constrains a server to only use uris from itself. gh: we sacrificed this when we went with capabilities. ls: a server can emit a set of features, some use relative uris and some absolute ones. if my server starts emitting features with affymetrix uris, the assumption is these originate from affymetrix. uris indicate that they originate from diff places even though you may physically get them from a das server at a different location. gh: thomas is right. given a feature uri you have no way to tell which das server it came from. clients must keep track of this themselves. ls: we wanted to divorce the origin of the feat from the server that serves it. should be possible to serve features that come from somewhere else. gh: making feature uri opaque was deliberate.
ad: when you do a feat query it could return the whole db. so the server must know how to return a feature document that contains all features. that server must know all the data. gh: don't see problem ad: all features and types have id and url. different. url is optional gh: no, required. also, not url, but uri. ad: ok. why should all records have a uri? gh: compatibility with semantic web/rdf, lsid, future proofing. ad: if they want to they can, if not they shouldn't be required. no one is doing rdf now. ls: what issue are you concerned about with respect to uri? ad: like ontology search. give me all features of this das type, you then have to give the url. this is different than id. ls: completely happy treating id as the last component of uri and doing a paste. why don't you like the paste? ad: you can get features from two diff places, each ending with same last word. ls: what query is it that allows you to filter by feature id? we have positional, type filtering and getting a single feature from server of origin. gh: there shouldn't be an id filter. just resolving uri for that feature. ls: we can't search a feature by regex match on it's id. ad: i'm not saying that. I'm suggesting that the url be optional. ls: I don't understand the point. gh: why can't uri be required? ad: see use case in email today subject="ids and urls". involves uploading das tracks to a server. [some trouble: not everyone has seen it] ls: I say we have a policy that if there is big discussion, the email should come more than 30 minutes before conf call. gh: I've read most of it and am still confused. ls: I still don't understand it after reading. you'll have to rephrase it. ad: all types and features have id and url. ls: no, explain in a follow up email. 
ad: ok [A] Andrew will send follow up email to elaborate on his "ids and urls" use case [A] Everyone will try to absorb andrew's ids and urls use case Topic: Style Sheets ------------------- ad: how do you refer to elements in style sheets, by id or url? gh: no opinion ad: if everything is referred to by id, that makes style sheets easier to write. gh: has anyone gotten to implementation of style sheets for das/2? ad: my proposal was a straw man. Topic: Status reports --------------------- gh: reading lots of specs. after yesterday's rant about xml:base, implemented a stack last night. works fine for our current server. we shouldn't throw out xml:base because of a few edge cases. we might want to specify which subset of xml:base we use. checked in code for igb client, does capabilities, specify feat, types, segments. trouble when modeling sequences. ee: working on das/2 client. building new widget as gregg asked for. ad: working with andreas on write up for registry. td: understanding the spec. xml parsing. gh: you are using stacks, have experience with it? td: yes, less painful. streaming api for xml. gh: tried xom. picky about namespaces. difficult to use with spec that's not stable. td: some trouble with dom gh: sources, types, segments I use dom (small document). for features use sax nh: progress with apollo. list of versioned sources, show segments, user picks, gets features. something that the parser doesn't like. not sure where the problem comes from. sc: working on setting up internal das server on 64bit machine here. refining the pipeline for generating files for loading the affy das server with updated data for various public and affy data sources. also writing up and posting meeting notes. aday: message from gavin about ontology responses. caching issue caused trouble with model/controller. chris's obo dtd. dependencies for server rpm were finished. now building the rpm. td: parsing xml from codesprint server.
a few things are matching the spec from a few weeks back. prop, loc elements. will these be changed? aday: feature xml? td: yes. I'm still absorbing the changes, dozens of mails about feat properties. gh: more important is loc element, splitting into id and range. used to be one thing, now is two. one is id, other is start,end,strand. aday: will look into today. nh: I'm also taking charge of getting grant progress report done. especially need allen re: server, andreas via registry. gh: any reports for write back? brian: some work on that. not ready for prime time. gh: roy? ad: some talk about this puts and deletes on the urls. gh: let's talk about it tomorrow. From td2 at sanger.ac.uk Wed Feb 8 23:20:34 2006 From: td2 at sanger.ac.uk (Thomas Down) Date: Wed, 8 Feb 2006 23:20:34 +0000 Subject: [DAS2] Why use URIs for feature IDs? In-Reply-To: References: Message-ID: <7C8A251D-7712-489A-BAEB-4701F46BAF1D@sanger.ac.uk> [I should prefix my comments here by saying that I don't actually have a terribly strong opinion on this matter *except that* I'd really like the spec to be explicit on how the feature query language works... Does it go .../features?type=exon, .../features?type=types/exon, or .../features?type=http://das2.sanger.ac.uk/ensembl35/types/exon?]. Anyway, I'm still having a bit of trouble seeing why features need individually GETable URIs. The use case I remember from the conference call was that it would be nice to be able to describe DAS/2 features in RDF documents. I guess that makes sense to me, but for this purpose is there anything wrong with a URI like: http://das2.sanger.ac.uk/ensembl35/features#id12345 This seems compatible with Andrew's ID proposal. My memory of RDF/DAML/OWL/etc is that most objects which get described in such documents are actually fragment identifiers in larger documents, rather than individually GETable entities. Am I missing something here?
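A minimal sketch, assuming Python's standard `urllib.parse`, of how a client might handle a fragment-style feature URI like the one in the message above: the part before the "#" is the document that would actually be requested, and the fragment is interpreted locally. The variable names are illustrative only and are not part of the DAS/2 spec.

```python
from urllib.parse import urldefrag

# Fragment-style feature URI taken from the message above.
feature_uri = "http://das2.sanger.ac.uk/ensembl35/features#id12345"

# urldefrag splits off the fragment; only doc_url would be sent in a GET,
# while the fragment is left for the client to resolve inside the document.
doc_url, fragment = urldefrag(feature_uri)

print(doc_url)
print(fragment)
```

Under this reading the server never sees the fragment, so such a URI names a feature without requiring that feature to be individually GETable.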
Thomas On 8 Feb 2006, at 18:12, Helt,Gregg wrote: > Regarding using URIs for DAS features, here's the quote from > Paul > Prescod that I used in the original DAS/2 grant proposal addressing > the > question "why use URIs?". From > http://www.prescod.net/rest/rpc_for_get.html : > > You can give that URI address to anyone, anywhere and they can > reuse it. > In particular this means that we can compose applications that were > not > thought of in advance. Google is an example of an application that was > composed "after the fact" out of URIs. Yahoo is another...There are a > raft of deployed W3C recommendations that work with information > related > through URIs. Many of these are XML-related specifications that > work as > well in API-like applications as in user interface-based applications. > These include: XPath, XPointer, XSLT, XLink, RDF, XInclude, XQuery, > xml-stylesheet. Information published through HTTP URIs can be > combined > through XInclude, queried and sorted through XQuery and XSLT, visually > rendered with xml-stylesheet, related through RDF, linked through > XLink, > pointed into through XPointer. > > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From dalke at dalkescientific.com Thu Feb 9 09:35:19 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 9 Feb 2006 09:35:19 +0000 Subject: [DAS2] Re: New DAS/2 server for codesprint In-Reply-To: References: Message-ID: In the das2/scratch directory is a program called "verify_examples.py" I ran it against http://das.biopackages.net/das/genome/yeast/S228C/feature?overlaps=chrVII/364251:366080;type=SO:gene as follows [guest276:das/das2/scratch] dalke% python ./verify_examples.py load FEATURES "http://das.biopackages.net/das/genome/yeast/S228C/feature?overlaps=chrVII/364251:366080;type=SO:gene" !
expected root tag '{http://www.biodas.org/ns/das/genome/2.00}FEATURES' got '{http://www.biodas.org/ns/das/2.00}FEATURELIST' ^D [guest276:das/das2/scratch] dalke% That is, it's a simple command language. The command to load a URL of the given type is load FEATURES "url" In this case it warns that the top-level name is "FEATURELIST" instead of "FEATURES", which is something that was changed last summer, I think. Saving locally and editing by hand I then get ! expected root tag '{http://www.biodas.org/ns/das/genome/2.00}FEATURES' got '{http://www.biodas.org/ns/das/2.00}FEATURES' That's because element. I'll explain in the next email. * file:///Users/dalke/cvses/das/das2/scratch/biopackages_features.xml:95: 57: error: element "LOC" from namespace "http://www.biodas.org/ns/das/genome/2.00" not allowed in this context That came from The RNC had a bug - it only allowed a single LOC element. Fixed. I've updated the schema and committed a copy of a features data set from Allen's server to CVS under das/das2/scratch/biopackages_features.xml Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Feb 9 10:00:45 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 9 Feb 2006 10:00:45 +0000 Subject: [DAS2] Why use URIs for feature IDs? In-Reply-To: <7C8A251D-7712-489A-BAEB-4701F46BAF1D@sanger.ac.uk> References: <7C8A251D-7712-489A-BAEB-4701F46BAF1D@sanger.ac.uk> Message-ID: <631777ea4ff08f68b6dde657effc18fe@dalkescientific.com> Thomas Down wrote: > Anyway, I'm still having a bit of trouble seeing why features need > individually GETable URIs. The use case I remember from the > conference call was that it would be nice to be able to describe DAS/2 > features in RDF documents. I guess that makes sense to me, but for > this purpose is there anything wrong with a URI like: > > http://das2.sanger.ac.uk/ensembl35/features#id12345 For that matter, the spec doesn't at present say that the individual URLs need to be fetchable. 
A client could treat them as opaque and unresolvable URLs and still do what it wants.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Thu Feb 9 11:15:18 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 11:15:18 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <4d48d7749f04bb97f6791a5726230fee@dalkescientific.com>
References: <4d48d7749f04bb97f6791a5726230fee@dalkescientific.com>
Message-ID: <4baed72f7f93ba4a90e521da4e27cc58@dalkescientific.com>

I'm going to incur the possibility of pitchforks here.. :)

Me:
> Yes. I like URLs. I've been so in favor of URLs that until
> this morning I had in the spec that the "id" *is* the URL.
> There was no short form for the URL. (still /is/ no short form
> since it hasn't changed ;)
>
> I'm now going to be either stubborn or irritating or both.
> Why have an id at all? That is, why at all have a short string
> (say of the form /[A-Za-z0-9_]/ when the URL is there and
> meets all the functional requirements of an identifier?

Here's the change - or not change since it reflects the current spec.

Features and types have a single "id". That id is a uri in all its glory.

Referring to Dave's email, yes, special characters are included - this is a uri. Looking at http://blog.bitflux.ch/wiki/GetElementById_Pitfalls the getElementById refers to the attribute with type "ID" which happens to be named "id" for XHTML and SVG.

Given http://www.w3.org/TR/xml-id/ I have added xml:id as a common attribute for all of the DAS items for independent and optional identification of an element in a document.

There is no short-form id for features and types. Queries are done using the full URL. For example, to find all elements of type "http://www.example.com/das2/human/1/type/T12345" the query string (assuming the query url is ".../1/feature_search.cgi") is

  http://www.example.com/das2/human/1/feature_search.cgi?type=http%3A%2F%2Fwww.example.com%2Fdas2%2Fhuman%2F1%2Ftype%2FT12345

The single and sole exception is for range queries. Each segment has a URL and a "name" attribute. This name is a unique short-form identifier used for range queries. The name is of the form /[A-Za-z_][A-Za-z_0-9]*/ . To do a range query for all features on a segment with name X and range 50 to 100 use the format "X/50:100" and the query looks like

  http://www.example.com/das2/human/1/feature_search.cgi?overlaps=X%2F50%3A100

The reason for this exception is three-fold:
 - the syntax for merging the URL and two/three fields became ugly
 - Gregg wants to send multiple ranges at a time, if the client knows enough about what it has already
 - the client may consult one of several reference servers given the coordinate system for the annotations.

These do not hold for feature types (features are independent objects; there will be at most a handful in most servers; the types are specific to the given set of features)

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Thu Feb 9 11:41:35 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 11:41:35 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <4baed72f7f93ba4a90e521da4e27cc58@dalkescientific.com>
References: <4d48d7749f04bb97f6791a5726230fee@dalkescientific.com> <4baed72f7f93ba4a90e521da4e27cc58@dalkescientific.com>
Message-ID: <0255ae96de376ffd89e2af0d9766aed6@dalkescientific.com>

> I'm going to incur the possibility of pitchforks here.. :)

To mollify or intensify the pitchforks ...

Several people have said that "the id is the last component of the URL" or "the URL is the base + '/' + the id". That's what DAS1 did. I don't like URL construction like this. It makes the URL organization imposed by the specification when it doesn't need to do so.
For example, Allen prefers his URLs like this

  /feature?this=that is the query interface
  /feature/F00001 is an identifier for the features

I might like it like this

  /feature_search.cgi?.. is the query interface
  /feature/F00001 is an identifier for the features

Still others as

  /features?this=that is the query interface
  /feature/exon/A1 is an identifier for the features
  /feature/contig/A is another identifier for the features
  ** NOTE: in this case the "last term of the URL" is not sufficient as a unique short-form id **

Or still others as

  /cgi-bin/fsearch.rb?this=that is the query interface
  /data/F1 is an identifier for the features
  /data/F2 is another identifier

One advantage to hard-coding the URL organization into the spec is the tradition from DAS1, and the general practice of expecting one-off URL schemes during web scraping. Another is that people understand it more easily. It's a lot easier to write out examples in one naming scheme than it is to say "using the identifier from the record ..."

On the other hand, the programming is easier.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Thu Feb 9 11:48:02 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 11:48:02 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <631777ea4ff08f68b6dde657effc18fe@dalkescientific.com>
References: <7C8A251D-7712-489A-BAEB-4701F46BAF1D@sanger.ac.uk> <631777ea4ff08f68b6dde657effc18fe@dalkescientific.com>
Message-ID: <2878cecec027ce28826c48d1a3a68e30@dalkescientific.com>

Churn factor: The only part of the spec that changes is the query interface for types. The type feature filter must take a full URL and not a partial URL nor a nonexistent 'short id'. Allen's server does not support queries given the full URL.

Here's what the spec says -- note that it quotes the previous draft and I added some comments.
> Query parameter "type" > > type=type_url > > Example: > $FQ?type=http%3A%2F%2Fwww.biodas.org%2FtypeA > > Match features with the given feature type. > > XXX the previous version of this document says > > Match features of the given type. A type is one of: > 1. a typeid returned by the feature type document described > earlier. Only features exactly matching the type are returned. > > 2. a sequence ontology term, such as "exon". Features matching the > term or *any of its ISA descendents* are returned. > > 3. a sequence ontology accession number, such as SO:12345. Features > matching the accession number or *any of its ISA descendents* are > returned. > > 4. a reserved type beginning with the namespace "das:". The only such > reserved type is currently "das:feature-lock", used for feature > updating. > > XXX I think we should only have it do 1. For 2 and 3 use the query > parameter 'ontology'. For 4, use a different query term, or don't use > locks as features. Based on the discussion yesterday, this changes to: 1. we support this one, with fully resolved URLs 2. the searching is done in the client so this option is removed 3. the searching is done in the client so this option is removed 4. we can always define "http://www.biodas.org/spec/special-type" as a URL to send to the server if we want to define a special query. Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Thu Feb 9 15:27:57 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 9 Feb 2006 07:27:57 -0800 Subject: [DAS2] Why use URIs for feature IDs? Message-ID: I think that as Thomas says, using URI fragment notation, http://das2.sanger.ac.uk/ensembl35/features#id12345 is a perfectly valid URI and thus is acceptable as a feature ID. 
But, if the intent is to construct feature URIs using fragment identifiers in combination with either ID attributes (as defined in a DTD) or xml:id attributes, as an alternative approach to URI = ID attribute with xml:base resolution, I think it would get messy. As I understand it a fragment identifier approach would mean URI = (URL of doc feature XML is embedded in) + "#" + value of feature's ID attribute. But then if the feature is returned as part of a query, say: http://das2.sanger.ac.uk/ensembl35/features?overlaps=chr1/10000:20000 and the feature has attribute id="id12345", then the feature URI using standard fragment notation would be http://das2.sanger.ac.uk/ensembl35/features?overlaps=chr1/10000:20000#id12345 In other words there would be a very large number of possible feature URIs, with query string gunk in them, identifying the same feature. Unless we define a nonstandard way of constructing fragment identifiers that chops off the query string. Instead of something nonstandard I'd rather use xml:base, adhere to the XML Base spec, and allow the feature id attribute to be full or relative URIs. Then specifying in the top element that xml:base = http://das2.sanger.ac.uk/ensembl35/features/, a feature returned by the features query with attribute id="id12345" resolves the feature URI to: http://das2.sanger.ac.uk/ensembl35/features/id12345 There might even be a way to fiddle with xml:base and id to use a "#" instead of the last "/", though I'm not at all sure about that. gregg > From: Thomas Down [mailto:td2 at sanger.ac.uk] > Sent: Wednesday, February 08, 2006 3:21 PM > To: Helt,Gregg > Cc: DAS/2 > Subject: Re: [DAS2] Why use URIs for feature IDs? > > [I should prefix my comments here by saying that I don't actually > have a terribly strong opinion on this matter *except that* I'd > really like the spec to be explicit on how feature query language > works...
Does it go .../features?type=exon, .../features?type=types/ > exon, or .../features?type=http://das2.sanger.ac.uk/ensembl35/types/ > exon?]. > > Anyway, I'm still having a bit of trouble seeing why features need > individually GETable URIs. The use case I remember from the > conference call was that it would be nice to be able to describe DAS/ > 2 features in RDF documents. I guess that makes sense to me, but for > this purpose is there anything wrong with a URI like: > > http://das2.sanger.ac.uk/ensembl35/features#id12345 > > This seems compatible with Andrew's ID proposal. > > My memory of RDF/DAML/OWL/etc is that most objects which get > described in such documents are actually fragment identifiers in > larger documents, rather than individually GETable entities. Am I > missing something here? > > Thomas > > > On 8 Feb 2006, at 18:12, Helt,Gregg wrote: > > > Regarding using URIs for DAS features, here's the quote from > > Paul > > Prescod that I used in the original DAS/2 grant proposal addressing > > the > > question "why use URIs?". From > > http://www.prescod.net/rest/rpc_for_get.html : > > > > You can give that URI address to anyone, anywhere and they can > > reuse it. > > In particular this means that we can compose applications that were > > not > > thought of in advance. Google is an example of an application that was > > composed "after the fact" out of URIs. Yahoo is another...There are a > > raft of deployed W3C recommendations that work with information > > related > > through URIs. Many of these are XML-related specifications that > > work as > > well in API-like applications as in user interface-based applications. > > These include: XPath, XPointer, XSLT, XLink, RDF, XInclude, XQuery, > > xml-stylesheet. 
Information published through HTTP URIs can be > > combined > > through XInclude, queried and sorted through XQuery and XSLT, visually > > rendered with xml-stylesheet, related through RDF, linked through > > XLink, > > pointed into through XPointer. > > > > > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/das2 From dalke at dalkescientific.com Thu Feb 9 15:43:27 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 9 Feb 2006 15:43:27 +0000 Subject: [DAS2] Why use URIs for feature IDs? In-Reply-To: References: Message-ID: <5920623233379c4200775188315082bb@dalkescientific.com> Gregg > As I understand it a fragment identifier approach would mean > URI = (URL of doc feature XML is embedded in) + "#" + value of > feature's > ID attribute. As I understand it the part after the '#' is a query language which is document type specific and used by the client. DAS does not define how that query language is used, so it has no meaning in the DAS world. http://www.ietf.org/rfc/rfc2396.txt 4. URI References The term "URI-reference" is used here to denote the common usage of a resource identifier. A URI reference may be absolute or relative, and may have additional information attached in the form of a fragment identifier. However, "the URI" that results from such a reference includes only the absolute URI after the fragment identifier (if any) is removed and after any relative URI is resolved to its absolute form. Although it is possible to limit the discussion of URI syntax and semantics to that of the absolute result, most usage of URI is within general URI references, and it is impossible to obtain the URI from such a reference without also parsing the fragment and resolving the relative form. .... 4.1. 
Fragment Identifier When a URI reference is used to perform a retrieval action on the identified resource, the optional fragment identifier, separated from the URI by a crosshatch ("#") character, consists of additional reference information to be interpreted by the user agent after the retrieval action has been successfully completed. As such, it is not part of a URI, but is often used in conjunction with a URI. fragment = *uric The semantics of a fragment identifier is a property of the data resulting from a retrieval action, regardless of the type of URI used in the reference. Therefore, the format and interpretation of fragment identifiers is dependent on the media type [RFC2046] of the retrieval result. The character restrictions described in Section 2 for URI also apply to the fragment in a URI-reference. Individual media types may define additional restrictions or structure within the fragment for specifying different types of "partial views" that can be identified within that media type. A fragment identifier is only meaningful when a URI reference is intended for retrieval and the result of that retrieval is a document for which the identified fragment is consistently defined. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Feb 9 15:53:38 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 9 Feb 2006 15:53:38 +0000 Subject: [DAS2] writeback via diffs Message-ID: <7a182cd18dacf110341f5cec43436f38@dalkescientific.com> Summary: We've been talking about the "update via a delta" model as an alternative to the "lots of changes to the server" model. Deltas mean the heavy work is done in the client (or middleware), vs. the server. We've been looking at the writeback spec. It doesn't handle the case of a complex feature with a parent/part relationship. 
In the current scheme that's done as a: - get the write lock - POST the new feature (parent) - POST the new feature (child) - commit on the lock What URL does the parent record have to point to the child? Does the database defer referential integrity checks until the commit on the lock? Is this a case where the POST for that feature returns an UPDATELIST document for every unknown/ placeholder identifier in the record? Probably. Another solution is to ask the server "give me two identifiers which can be used for features". (NOTE: must do this for either URLs or 'short ids' because the client might guess and override an existing feature.) Cute. But no real takers here. BTW, does the full DAS query system support searches of the modified version of the server? How does the server know that the search request comes from a client working in an editable view? In talking about it we've been working on an idea we all talked about last year; submitting a delta to the server and moving the heavy work into the client. That is, after the client is done locally it sends a document which looks like ... updated type information here ... ... ... There are several things to note: - the elements, to remove existing types and features - the types and features are in the normal formats. - there is no way to update a part of a record/ the record is sent in full - new identifiers are still a problem The use model for this is as follows, based on Otter. - get the SOURCES document, which will have - get an exclusive write lock on a region - POST to the locks URL (and GET gets a list of the locks?) - only one region locked at a time (current spec allows the full query language; is that needed?) - user is authenticated via HTTP-level authentication (Q: allow https for any of this?) 
- optional timeout time in request; server may give shorter or longer timeout - user is allowed to edit all features in the given region - get all the features in that region (because there may have been a commit before the write lock) - work with the data on the local copy of the server data - push the big red "COMMIT" button - client POSTs the delta to the server - user authentication again - also sends a lock-id or a nonce so the server can double-check that there wasn't some other change - server checks payload for referential integrity The problem is the need for a URL. We've come up with two solutions. 1. ask the server for things which can be used as identifiers. These identifiers live for the life of the lock. 2. reserve a private URI scheme, like "das-private:" followed by a client-defined identifier. On upload the server maps those into valid local identifiers. To work correctly for the client the response document would need to contain a mapping from private identifiers to server identifiers. The current spec uses the latter mechanism but does not specify how the placeholder identifier is generated. The mapping is essentially the "UPDATELIST" from the current spec, though with no need to support the status field on a per item basis - it should be an all or none transaction. Sending a delta gets rid of the DELETE and PUT (and POST update) methods on the server. Not ReSTful. It places the burden on the client for tracking the user edits instead of in the server. But we have a good sense that it will work and is understandable. It maps much more closely to the current Otter use. We don't know how Apollo/Chado wants to support writeback.
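A minimal sketch of the second solution (private placeholder identifiers mapped to server identifiers in one all-or-none transaction). Only the "das-private:" prefix comes from the message above; the function name commit_delta, the dict-based record layout, and the pattern used for minted URLs are hypothetical and are not taken from any DAS/2 spec.

```python
import itertools

PRIVATE_SCHEME = "das-private:"  # private URI scheme proposed in the message


def commit_delta(store, delta, base_url, counter):
    """Apply a delta all-or-none and return {private id: server id}.

    store:   dict mapping feature id -> {"id": ..., "parts": [ids]}
    delta:   {"delete": [ids], "features": [{"id": ..., "parts": [ids]}]}
    counter: iterator yielding fresh numbers for new identifiers (assumption)
    """
    # 1. Mint a server identifier for every placeholder in the upload.
    mapping = {}
    for feat in delta["features"]:
        if feat["id"].startswith(PRIVATE_SCHEME):
            mapping[feat["id"]] = f"{base_url}/feature/{next(counter)}"

    # 2. Work on a staged copy so a failure leaves the store untouched.
    staged = dict(store)
    for fid in delta["delete"]:
        if fid not in staged:
            raise ValueError(f"cannot delete unknown feature: {fid}")
        del staged[fid]
    for feat in delta["features"]:
        new_id = mapping.get(feat["id"], feat["id"])
        parts = [mapping.get(p, p) for p in feat.get("parts", [])]
        staged[new_id] = {"id": new_id, "parts": parts}

    # 3. Referential integrity: every part must exist after the update.
    for rec in staged.values():
        for part in rec["parts"]:
            if part not in staged:
                raise ValueError(f"dangling part reference: {part}")

    # 4. Commit the whole delta at once and report the id mapping.
    store.clear()
    store.update(staged)
    return mapping
```

The returned mapping plays the role of the UPDATELIST response without a per-item status: either every placeholder gets a server identifier, or the call raises and nothing changes.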
If we decide to stay with the existing ReSTy spec then our recommendations are: - there's no need to support partial updates; clients send the complete record to the server for update - the query language does not need to support the full DAS query language; only the "region" query (based on Otter experience) - there's no current need to extend the range of a lock nor to extend the time of the lock. And I don't like that "lock=" is a parameter to the feature and types URLs which creates locks for those types rather than performs queries. I would rather these be new URLs. Andrew dalke at dalkescientific.com From lstein at cshl.edu Thu Feb 9 16:12:32 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 9 Feb 2006 11:12:32 -0500 Subject: [DAS2] Why use URIs for feature IDs? In-Reply-To: <5920623233379c4200775188315082bb@dalkescientific.com> References: <5920623233379c4200775188315082bb@dalkescientific.com> Message-ID: <200602091112.33548.lstein@cshl.edu> Hi Folks, I've drunk the W3C Kool-Aid and do feel that a major feature of DAS/2 as it now stands is that all data objects are referenceable as URIs. Furthermore, I think it is a handy-dandy feature for them to be fetchable URLs as well, having, I suppose, drunk the REST Kool-Aid. For this reason, I prefer the / notation to the # notation. Over and above the fact that the #fragment is not a part of the URI at all (according to the part of the spec that Andrew quoted), a practical issue with the # notation is that all browsers (and, I believe, some client-side libraries, although not the Perl LWP) strip out the # and whatever follows it. The server never gets a chance to act on the fragment. Since xml:base is giving us a hard time with respect to the queries, and causing major confusion and dissension in the group, I'd prefer to go with Andrew's strict idea of making all the IDs passed to the queries full URIs. In other words, including the properly escaped http://etc.etc in the query string. 
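As a rough illustration of "properly escaped" full URIs inside a query string, here is a minimal Python sketch. The parameter name "id" and the example URLs are hypothetical; the thread does not fix the actual query field names.

```python
from urllib.parse import quote, parse_qs, urlsplit


def feature_query_url(query_url, feature_uri):
    """Embed a full feature URI in a query string.

    safe="" makes quote() escape ':' '/' and '#' as well, so the embedded
    URI cannot be confused with the structure of the outer URL.
    """
    return f"{query_url}?id={quote(feature_uri, safe='')}"


def embedded_feature_uri(url):
    """Recover the embedded URI on the receiving side."""
    return parse_qs(urlsplit(url).query)["id"][0]
```

Because the '#' of an embedded URI is escaped as %23, a fragment inside the feature URI would survive the trip to the server, unlike a fragment on the outer URL.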
This is going to make it a bit annoying to debug servers from within browsers, but will clean up the semantics considerably and once and for all remove the confusion about who "owns" a feature versus who "serves" a feature. Lincoln On Thursday 09 February 2006 10:43, Andrew Dalke wrote: > Gregg > > > As I understand it a fragment identifier approach would mean > > URI = (URL of doc feature XML is embedded in) + "#" + value of > > feature's > > ID attribute. > > As I understand it the part after the '#' is a query language > which is document type specific and used by the client. DAS does not > define how that query language is used, so it has no meaning in the > DAS world. > > http://www.ietf.org/rfc/rfc2396.txt > > 4. URI References > > The term "URI-reference" is used here to denote the common usage of a > resource identifier. A URI reference may be absolute or relative, > and may have additional information attached in the form of a > fragment identifier. However, "the URI" that results from such a > reference includes only the absolute URI after the fragment > identifier (if any) is removed and after any relative URI is resolved > to its absolute form. Although it is possible to limit the > discussion of URI syntax and semantics to that of the absolute > result, most usage of URI is within general URI references, and it is > impossible to obtain the URI from such a reference without also > parsing the fragment and resolving the relative form. > .... > 4.1. Fragment Identifier > > When a URI reference is used to perform a retrieval action on the > identified resource, the optional fragment identifier, separated from > the URI by a crosshatch ("#") character, consists of additional > reference information to be interpreted by the user agent after the > retrieval action has been successfully completed. As such, it is not > part of a URI, but is often used in conjunction with a URI. 
> > fragment = *uric > > The semantics of a fragment identifier is a property of the data > resulting from a retrieval action, regardless of the type of URI used > in the reference. Therefore, the format and interpretation of > fragment identifiers is dependent on the media type [RFC2046] of the > retrieval result. The character restrictions described in Section 2 > > for URI also apply to the fragment in a URI-reference. Individual > media types may define additional restrictions or structure within > the fragment for specifying different types of "partial views" that > can be identified within that media type. > > A fragment identifier is only meaningful when a URI reference is > intended for retrieval and the result of that retrieval is a document > for which the identified fragment is consistently defined. > > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From lstein at cshl.edu Thu Feb 9 16:15:48 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 9 Feb 2006 11:15:48 -0500 Subject: [DAS2] RE: Working with xml:base in Java? In-Reply-To: References: Message-ID: <200602091115.49675.lstein@cshl.edu> The Perl libraries provide a very simple HTTP_Base attribute. As you parse your way through the XML, you can change the HTTP_Base using any of the relative or absolute address resolution modes, so that subsequent URLs are correctly resolved. Unfortunately it is a SAX model, so that you have to push previous bases onto a stack and restore them as needed. 
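The push-and-restore bookkeeping described above can be sketched with Python's standard SAX parser (an illustration only, not the Perl library mentioned). The handler class, the "id" attribute, and the element names used with it are hypothetical; the only real pieces are xml.sax and urllib.parse.urljoin.

```python
import xml.sax
from urllib.parse import urljoin

# With namespace processing off (the xml.sax default) the attribute is
# reported under its qualified name.
XML_BASE = "xml:base"


class BaseTrackingHandler(xml.sax.ContentHandler):
    """Resolve an id attribute against the xml:base in scope."""

    def __init__(self, document_url, id_attr="id"):
        super().__init__()
        self.base_stack = [document_url]  # bottom of stack: retrieval URL
        self.id_attr = id_attr
        self.resolved = []

    def startElement(self, name, attrs):
        base = self.base_stack[-1]
        if XML_BASE in attrs:
            # A relative xml:base is resolved against the enclosing base.
            base = urljoin(base, attrs[XML_BASE])
        self.base_stack.append(base)  # push one entry per element
        if self.id_attr in attrs:
            self.resolved.append(urljoin(base, attrs[self.id_attr]))

    def endElement(self, name):
        self.base_stack.pop()  # restore the parent's base


def resolve_ids(xml_text, document_url):
    handler = BaseTrackingHandler(document_url)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.resolved
```

Pushing an entry for every element, even when it has no xml:base of its own, keeps endElement a plain pop.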
Lincoln On Wednesday 08 February 2006 05:12, Helt,Gregg wrote: > > -----Original Message----- > > From: Thomas Down [mailto:td2 at sanger.ac.uk] > > Sent: Wednesday, February 08, 2006 12:45 AM > > To: Helt,Gregg > > Cc: DAS/2 > > Subject: Re: Working with xml:base in Java? > > > > On 7 Feb 2006, at 19:00, Helt,Gregg wrote: > > > Thomas, I'm wondering what toolkits you're using for binding XML > > > to Java objects? And particularly how you are dealing with > > resolving > > > > URIs when xml:base is used. So far I've mostly used various > > > implementations of SAX and DOM -- I've found some reports of builtin > > > xml:base support in Xerces SAX/DOM, but it's still unclear. > > > > > > I've been avoiding the issue up till now. It won't be too hard > > > to implement URI resolution relative to xml:base, but I thought I'd > > > check around first and see if there's automated support of this in > > > some > > > toolkit. > > > > Hi Greg, > > > > I'm actually using Stax (the streaming API for XML). The > > implementation I use is called Woodstox: > > > > http://woodstox.codehaus.org/ > > I would like to check out Stax, haven't used it before. > > > (but there are a few others out there). No builtin xml:base support > > but it's easy to write a little wrapper around XMLStreamReader to > > spot xml:base attributes and maintain a stack of base URIs. > > > > I'm using java.net.URI to do the URI handling/resolution/ > > relativization. Seems to be working okay... so far... > > That's what I was thinking about when I said it wouldn't be too hard to > implement... But that was yesterday. A long time ago. > > Now I've taken a detour into re-reading the XML Base spec > http://www.w3.org/TR/xmlbase/, and things don't seem so easy. > > I _think_ if there's at least one xml:base attribute in the element > hierarchy above where you're trying to determine a base URI, and > resolution of those xml:base attributes yields an absolute URI, it's all > good, that's the base URI. 
But on the other hand if this resolution > yields a relative URI instead of an absolute URI I'm not sure what > happens -- I would guess it's an error, but I can't see anywhere in the > XML Base spec that spells this out. And if there's no xml:base to use > to determine a base URI, things get weird: > if the document is "encapsulated within another entity", the base URI > is the URI of that entity (I have no idea if DAS/2 docs could appear in > such a context) > otherwise the base URI is the URI used to retrieve the document > oh, except if you burrow down into the spec pointers to RFC 2396 > http://www.ietf.org/rfc/rfc2396.txt, if the request gets redirected you > need to make sure the base URI is the last URI used in the redirect > oh yeah, and apparently external entity declarations can affect all > of this in ways I don't understand > and there's probably other gotchas I've missed... > > Now from the server side, none of this is really an issue. Just pick > from a multitude of variants that XML Base allows when you send > responses to the client. From the client side, if we really want DAS/2 > to support XML Base (and I think we do), things get tricky. It's > definitely pushing me towards using libraries that provide builtin > support for XML Base. > > Gregg > > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From dalke at dalkescientific.com Thu Feb 9 16:37:12 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 9 Feb 2006 16:37:12 +0000 Subject: ids and URLs (was Re: [DAS2] Ontologies in DAS/2) In-Reply-To: <94bafd156da54842f9093244ca6083d1@fruitfly.org> References: <43E9DC0E.30809@mrc-lmb.cam.ac.uk> <701ca3642e6ed0ea3e12fbd26991a3fb@dalkescientific.com> <94bafd156da54842f9093244ca6083d1@fruitfly.org> Message-ID: [Top-posting summary] I agree with Chris that the DAS "type"s aren't really types. Chris Mungall: > I'm mostly skim the messages here, so I may be missing something, but > I'm a little confused by this: > > On Feb 8, 2006, at 8:36 AM, Andrew Dalke wrote: > >> >> http://das.server/../types?ontology=SO:exon > > I don't understand this - SO:exon isn't an ontology I made it up; I mean "whatever the SO term is for an exon". I think it's SO:0005845 ("single_exon") or SO:0000147 ("exon") >> PROPOSAL: Add a "source=" (case-insensitive substring search) >> field to the types query. (I don't think there is any contention >> here so I'll add it.) >> >> http://das.server/../types?ontology=SO:exon;source=Vega > > What does 'types' return? A type from an ontology (eg SO:exon) or > something else? Why would source be recorded here? Surely source would > be a valid constraint on a feature query, but not a type query. A DAS type is a somewhat strange thing, in the type sense. It stores: - the link to the ontology - a list of the formats available for features of that type - this "source" field - potentially some per-source data used for depiction, or perhaps not Thomas Down here has this use case. He has a program which searches for exons. All of the annotations it makes for a month are from that program. 
He wants them to be the same type - conceptually "the exons predicted by the program". Some of that data could be moved into the feature. The feature can point directly to the ontology, and have a "source". > Perhaps it's the case that in DAS a 'type' means some kind of > arbitrary grouping (eg features of type X and source Y), and > 'ontology' means a > term/type from an ontology. If it isn't too late I'd suggest changing > these conventions. That is more like the case. Got a better name. "class"? ROFL. Or not. It is not a type system. It is closer to a group than anything else. I agree that "type" has connotations which are not true for this case. Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Thu Feb 9 16:40:34 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 9 Feb 2006 08:40:34 -0800 Subject: [DAS2] Why use URIs for feature IDs? Message-ID: Interesting, I hadn't fully absorbed part 4 of the URI spec (rfc2396). So if I understand correctly: If we replace everywhere we've called something a "URI" with "URI reference" we're being correct -- a URI reference can be an absolute or relative URI, and can also include a fragment identifier. And according to the spec saying "the URI" means the absolute URI, not the relative URI. So to restate, I think the ids we use in DAS/2 should be URI references. Maybe instead of "id" or "uri" we should use "uri_ref" for the attribute name? I still see no reason to exclude URI references with fragment identifiers, though I agree with Lincoln that actually resolving a URL with a fragment is problematic. But we're not guaranteeing that these URI references are URLs anyway. The capabilities "query_id" attributes are another story. These need to be not just URI references but also resolve via XML-Base to full URLs. 
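To make the "URI reference" versus "the URI" distinction concrete, here is a small Python sketch using the standard library. The function name and example URLs are hypothetical; urljoin and urldefrag are the real urllib.parse calls.

```python
from urllib.parse import urljoin, urldefrag


def resolve_reference(base_url, uri_ref):
    """Split a URI reference into (absolute URI, fragment).

    The reference may be relative and may carry a '#fragment'; following
    the RFC text quoted in this thread, "the URI" is what remains after
    resolving against the base and removing the fragment.
    """
    absolute = urljoin(base_url, uri_ref)
    uri, fragment = urldefrag(absolute)
    return uri, fragment
```

A fragment-only reference such as "#exon1" resolves to the base document itself, which is why the server never sees which feature was meant.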
gregg > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Andrew Dalke > Sent: Thursday, February 09, 2006 7:43 AM > To: DAS/2 > Subject: Re: [DAS2] Why use URIs for feature IDs? > > Gregg > > As I understand it a fragment identifier approach would mean > > URI = (URL of doc feature XML is embedded in) + "#" + value of > > feature's > > ID attribute. > > As I understand it the part after the '#' is a query language > which is document type specific and used by the client. DAS does not > define how that query language is used, so it has no meaning in the > DAS world. > > http://www.ietf.org/rfc/rfc2396.txt > > 4. URI References > > The term "URI-reference" is used here to denote the common usage of a > resource identifier. A URI reference may be absolute or relative, > and may have additional information attached in the form of a > fragment identifier. However, "the URI" that results from such a > reference includes only the absolute URI after the fragment > identifier (if any) is removed and after any relative URI is resolved > to its absolute form. Although it is possible to limit the > discussion of URI syntax and semantics to that of the absolute > result, most usage of URI is within general URI references, and it is > impossible to obtain the URI from such a reference without also > parsing the fragment and resolving the relative form. > .... > 4.1. Fragment Identifier > > When a URI reference is used to perform a retrieval action on the > identified resource, the optional fragment identifier, separated from > the URI by a crosshatch ("#") character, consists of additional > reference information to be interpreted by the user agent after the > retrieval action has been successfully completed. As such, it is not > part of a URI, but is often used in conjunction with a URI. 
> > fragment = *uric > > The semantics of a fragment identifier is a property of the data > resulting from a retrieval action, regardless of the type of URI used > in the reference. Therefore, the format and interpretation of > fragment identifiers is dependent on the media type [RFC2046] of the > retrieval result. The character restrictions described in Section 2 > > for URI also apply to the fragment in a URI-reference. Individual > media types may define additional restrictions or structure within > the fragment for specifying different types of "partial views" that > can be identified within that media type. > > A fragment identifier is only meaningful when a URI reference is > intended for retrieval and the result of that retrieval is a document > for which the identified fragment is consistently defined. > > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 From Gregg_Helt at affymetrix.com Thu Feb 9 16:57:02 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 9 Feb 2006 08:57:02 -0800 Subject: [DAS2] Proposed agenda for DAS/2 code sprint teleconference, Feb 9 Message-ID: ids for features, sequences, types, etc. stylesheets writeback update to NIH grant proposal status report Anything else we should add? From dalke at dalkescientific.com Thu Feb 9 18:28:48 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 9 Feb 2006 18:28:48 +0000 Subject: [DAS2] arbitrary data in writeback Message-ID: The DAS spec for features looks something like this ... iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFAQAAAABUH0DFAAAANUlEQVR4nGIM1AcAAAD// 2LiYgAA AAD//2I6wAAAAAD//2JycAAAAAD//2L6dgAAAAD//wMAEoUDipTjFscAAAAASUVORK5CYII= ... .. There are two points for extension. One is the PROP table which is meant to be simple. 
Clients can do substring searches of PROP elements with "value"s, as in prop-name=blah+blah All clients should be able to understand these data formats, though there is no constraint for the key names. They are convention only. Right now a key gets either a string, a URL, or a chunk of binary data which is uuencoded. (The key can be present many times; is that a problem with Apollo?) The latter two (URL and binary data) are *proposals*. They are neat, but not based on user demand. No one has told me that they will use it. Allen wants one more possibility, "existence", with no associated value at all. Nomi says that Apollo can't round-trip that data except by also tracking the input XML. I don't want a "it just exists" field and would prefer those stored with an empty string. Then there is the support for non-DAS elements as extensions. These can contain arbitrary XML, so long as they are not in the DAS XML namespace. A client can ignore elements it doesn't understand. However, if it does writeback of a feature it *MUST* include all elements it doesn't understand. I can write that into the spec. It doesn't need to do anything with that data. It can keep it around as a chunk of text. It just needs to send it back to the server when it does the writeback. For that matter, it doesn't even need to keep it around. It can throw the unknown data to the wind and work with the stuff it does know. Just before doing the writeback, go back to the server and get the features again. From the documents get the unknown extension elements and insert them into the data - as text! - to be sent back to the server. Clients may mess up and commit records without these elements. The server will treat those as a delete of those records, because it cannot tell if the client really knows what to do with that data. This is the easiest solution as a spec writer. We have nearly all of the format for that transaction, excepting a bit about being able to delete.
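A rough sketch of the "keep it around as a chunk of text" approach, using Python's ElementTree. The namespace URI, the element names, and both function names are placeholders chosen for illustration; the real DAS namespace is not assumed here.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, NOT the real DAS/2 namespace.
DAS_NS = "http://example.org/das2-namespace"


def extract_extensions(feature_xml):
    """Return the non-DAS child elements of a feature as opaque text."""
    feature = ET.fromstring(feature_xml)
    das_prefix = "{" + DAS_NS + "}"
    chunks = []
    for child in feature:
        if not child.tag.startswith(das_prefix):
            child.tail = None  # drop trailing text so each chunk stands alone
            chunks.append(ET.tostring(child, encoding="unicode"))
    return chunks


def reattach_extensions(edited_feature_xml, extensions):
    """Append the saved chunks to the edited feature before writeback."""
    feature = ET.fromstring(edited_feature_xml)
    for text in extensions:
        feature.append(ET.fromstring(text))
    return ET.tostring(feature, encoding="unicode")
```

The client never interprets the extension elements; it only carries them from the record it fetched to the record it sends back.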
NOTE: a server may ignore the uploaded data. For example, it may modify the transaction history and throw out whatever the client sent to it -- if that's how the element is specified. The other solution is to be more fine grained, so that clients send deltas, like ... iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFAQAAAABUH0DFAAAANUlEQVR4nGIM1AcAAAD// 2LiYgAA AAD//2I6wAAAAAD//2JycAAAAAD//2L6dgAAAAD//wMAEoUDipTjFscAAAAASUVORK5CYII= .. but that gets complex. You end up with a grammar for the deltas. Eg, "delete the first 'some_non_das_namespace:curation-history' but not the others". It's a harder grammar to write and a harder semantic to implement on client and server. I don't understand the case where complete writeback is a problem. There was the mention of if a client deletes a feature when it shouldn't have because of extra data that it just didn't know about. I didn't follow that at all. Please enlighten me! :) Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Thu Feb 9 19:06:03 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Thu, 09 Feb 2006 11:06:03 -0800 Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint, 9 Feb 2006 Message-ID: Notes from the DAS/2 teleconference for the code sprint, 9 Feb 2006 $Id: das2-teleconf-2006-02-09.txt,v 1.1 2006/02/09 19:13:39 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt CSHL: Lincoln Stein Sanger: Thomas Down, Roy Sweden: Andrew Dalke UC Berkeley: Nomi Harris, Suzi Lewis UCLA: Allen Day, Brian O'connor Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. 
So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. [note taker missed the first 5-10 minutes] Topic: encoded URLs ------------------- ls: apache bug - unescaped //. must be percent encoded or apache can run into problems gh: most people don't bother escaping, we should make this clear in the spec. every major library has ways of doing this automatically. [A] update spec to state: contained urls w/in das query urls should be encoded Topic: Style sheets ------------------- ad: see Jan 26/27 email, "style sheet question" what i described is not the same as what das/1 style sheets supply. we already have a mechanism gh: embed ss in types element? ad: or, new capability or link server for a given source. gh: prefer this td: easy to have a single style element gh: would a types elem have ptr to ss or do you query for the capability? ad: if no one's interested we don't have to answer the question. sounds like no one's interested in style sheets. gh: we'll keep what you have in the spec for style sheets and move on. ls: what is it? ad: yes. style is embedded in type record. it's now on a per-element basis. ls: ok with this. attributes of types. is there a need for a separate ss? true it mixes presentation with data model. people will look for the info they need and can ignore. ls: transition to separate sheets - visual style id pointing to ss url. same as with html. instead of 'i' tag moved to font style info. Topic: Writeback ---------------- gh: discussion in progress in uk. how big a change from current writeback spec? ad: spec: server does modification to data. this proposal: client can now do more stuff with the data. gh: writeback for client is considerably harder, rarer to impl. ad: issues: can you still do searches for modified data on server?
ls: building objs from bottom up (children, to parent) so everything has a url. ad: each feat has parent and a part. ls: true. temporary id mechanism, response indicates mapping to local id is. what happens is: client locks, uploads parents, children with temp ids, does referential integrity checking, then reports mapping from temp to local id. gh: doing http DELETE imposes a constraint ls: how handling id issue? gh: you need something to create new, real id ad: b/c they're in one transaction, server can ls: delete is a problem because http delete only permits one at a time. updates a problem too. post that creates new objs allows you to create multiple new objs at same time, but push and delete only operate one at time. ad: at this point don't want to change data model. ls: so everything will be a post then, under your proposal, for writeback url. ad: a single post. gh: moving from http delete to a trying to understand how this is a delta model. ad: only updates things that changed, and listed deletions ls: fine. writeback, create update and delete sections td: granularity. not single characters. one feature. ls: three transactions we previously had, put, post, and delete, and roll up into a single transaction. gh: when you send back a feat you ve already seen, do you restate all the xml for that feature, since otherwise it is deleted? ad: yes. gh: would like the unit of ro ls: this achieves per transaction integrity, since you don't have to do multiple deletes. the lock idea, had to persist over multiple transactions to allow for that atomicity. gh: we need to keep lock so curators can guarantee that nothing changes underneath them. td: lock corresponds to a db transaction as well. ls: no one's impl this writeback so there's no friction against changing it. i'm fine with it. as long as people don't mind we're losing a cute feature described in a grant. gh: what does roy or ed g. think? roy: have been involved in this. this mirrors some features that otter does. 
a good idea. deletes and put aren't big winners, if updating multiple feats and they refer to each other. roy: whole xml doc is the transaction ls: if anything doesn't make sense, all requests in the writeback doc are rolled back. roy: yes. some error messages to understand what might be going wrong. gh: splits and merges work too? merging one feature from two, or splitting one transcript into two. roy: fits in well. get back two ids of new features. otter gives a lot back in the xml after posting the data. gh: treats id in feat as a placeholder and it sends a real id back to you. ls: you're given a temporary placeholder then it gives you real id. might want to put a formal merge and split commands. because in proposed new system (and old) to split one exon to two, you have to either delete the original one, or update it to change one boundary and create a new one. you've lost the ability to keep track of the original and the two new ones. ad: feats have place for arbitrary annotations. creational history log could be maintained. ls: how upload this to a server. splitting exon into two daughters is different from deleting and creating two new ones. ad: no needs this, for future. gh: it's needed now. ls: splitting genes into two pieces is important. people want to keep track of this. formal merges and splits permits this tracking. gh: my take, prefer fewer verbs as possible. if we can formally define splits and merges as combos of deletes and creates, prefer this. ls: semantically difficult for server to know that a delete followed by two creates is different than a split. td: ancestor id on the features can solve this. ad: haven't heard about this use case. features have place where you can stick in new data. database can read it to understand history. gh: like idea of curational track of ancestors. before, people said we can't require dbs to do this. td: optional property ls: could thread it through feature properties. ad: this version, or for 2.1?
gh: initial write back must support splits and merges. [broad agreement] ls: make sure it will work. what happens when track of ancestors and the ancestor object disappears. gh: can't assume a db has identifier for every curation in its past state. roy: weakness of the current otter schema, james is working on a fix. tag a release and go back to genes as of that release. ls: acedb had this feature to rollback to older versions of gene model. aday: the schema we're using has support to previous version. roy: tedious. big script, but a good thing to have. ls: a few hours of more discussion to see what's involved in supporting tracking curational merges, splits, renames, etc. to make sure it's the right decision to put it into a curational property of feature rather than having a formal database merges and split operations. i'm ok doing it this way if it seems ok. gh, aday: me too Topic: NIH grant proposal ------------------------- gh: i'm the bottleneck Status reports: --------------- gh: igb das client still. checked in code. you can get das2 client in igb pointing to codesprint das2 server. sources, segments, types. no features yet. working on this today. should go faster today. ad: sent email to allen about some things about server that don't agree with spec. properties aday: features have no properties associated with them. do we need valtype or href. nh: a key with no value doesn't make sense. using 'true' if no value. aday: ok. but need an agreement on what to do for properties with no associated value or type ad: can make it so. aday: now put in empty string ad: use for both value and href aday: can't have both. ad: what's interpretation if you have both? can take out href part and have value= empty string nh: client deals with empty value. ad: leave it as a string suzi: uneasy about this. td: it does have a value, empty string. suzi: some places where empty string doesn't make sense. data gets dirty.
if you're gonna have a tag-value structure, and may or may not be a value, it's bad. some things are tag-value, some things just have a value. it seems ambiguous, no guaranteed behavior. ad: guarantee is for all keys to have a value. can be empty string. gh: string or empty string is ok ad: only used for clients who know what it means. may have to update apollo gh: if we allow arbitrary xml in features, client will have to remember this xml or it will disappear. ls: a huge issue w/ apollo in past. when communicating w/ db's that have extra stuff, in the xml that isn't on client side data model. suzi: my take, the client should not have to pass it all through. nh: it forces client to be a complete database gh: then the delta writeback ls: works ok for deletes, updates become an issue ad: you have to deal with text you don't understand. ls: you have to keep track of tags you don't understand, otherwise they are deleted. gh: trade off, simplicity of writeback, and what client has to remember. ls: client says: i don't understand it, but i can't delete it. gh: how hard is it to have an arbitrary xml chunk by client? ls: give it an empty tag to say you want it to go away. nh: how do you delete things that came in empty and you want to delete them? ls: can have attribute="delete me". this creates a burden on server side. [client folks like this..] decided to keep everything you know and send it back. round trip it. ad: client can throw away what it wants. can go back to server ls: boomerang. gh: a variety of ways to make sure the data gets stored. roy: will be in feature. just hold a pointer to it. suzi: hard for apollo. passive round tripping is fine.. difficulty is with deletes. ignoring stuff, don't know what it is. delete a transcript or whole gene. some of that stuff you don't know what it is, describes a mutant phenotype. you deleted from genomic record, but there's other data that shouldn't be deleted.
client would have to be fully cognizant of it, beyond genome sequence features. client now needs to model all the other data too.
ls: difficult to understand how a client could deal with it.
ad: xml is just an opaque chunk. why can't client send back full record?
suzi: won't solve the full problem. if annotator said delete it
gh: client says delete that feature. it won't pass back any stuff underneath the feature. some stuff underneath it that shouldn't be deleted.
ad: that's what you have backups for.
suzi: beyond this. to deal with this, we made deletes be more atomic. had to be handled at server side, otherwise, we have to put all that knowledge into client. gets tied to a particular group.
ad: knowledge of what?
suzi: additional information. if you delete whole thing at top, any pass through data is also gone.
gh: not hard on client, just what does the server do with that?
suzi: this is why it belongs on server side. knows what matters and what doesn't matter. if you don't want clients tied to a particular db, that solution will be inadequate. we had to put the info on the client and make the operations as fine grained as we could.
ap: writeback issues have been discussed. suggest to take this up tomorrow.
ad: could someone write up why a client couldn't just track the things that it wanted? then we can consider.

Status reports, cont'd
----------------------
roy: zmap client. can get sources and types from server. parsing it, creating internal objects. can't draw features yet. long discussion about write back today.
ad: validator stuff
td: talking about writeback.
ap: working on registry. first das/2 server. distinguish between das/1 and das/2 via accession points.
brian: rpm build for allen's server. will post today at biopackages.net
suzi: spoke to chris about web services for ontology. he will talk with allen. thing about ids to deal with. also, if we do a web service that isn't das like, it should be doable. should be able to get the terms.
also, if we want to have stop codon replacement, you also have to say what position, what it's replaced with (uridine). how is this done in das spec? gh: can you post to the list? suzi: yes. aday: will raise writeback issues as well. suzi: small point mutations, indel, substitution (base and position) aday: nearly got apache config file done, impl new std error documents, 300, with error document. nh: more apollo client progress. haven't dealt with types yet. ee: igb improvements. sc: pipeline for populating affy das server with array data. completed pipeline for exon array design data. From nomi at fruitfly.org Thu Feb 9 20:08:33 2006 From: nomi at fruitfly.org (Nomi Harris) Date: Thu, 9 Feb 2006 12:08:33 -0800 (PST) Subject: [DAS2] unary properties In-Reply-To: References: Message-ID: <17387.41281.765157.17683@kinked.lbl.gov> On 9 February 2006, Andrew Dalke wrote: > Allen wants one more possibility, "existence", with no associated > value at all. Nomi says that Apollo can't round-trip that data > except by also tracking the input XML. I don't want a "it just > exists" field and would prefer those stored with an empty string. fwiw, the empty string (rather than no string) doesn't help apollo--the way it stores properties, if you ask for the value of property "foo" and there's no "foo" in the property table, you get back "" (this was to avoid having to put a million null-pointer checks). so apollo would not be able to differentiate--for purposes of writeback OR display without apollo--between and internally, both of these would look like "i don't know anything about property foo," unless i saved them as "foo=true" when they were read in, and then how would it know how to write them out correctly? i would suggest that either 1. we use two different terms to differentiate between key/value properties and properties that are valueless (though really i think they are *keyless* rather than valueless). perhaps the latter could be called "attributes" or something? 
(actually, ATTRIBUTE is probably a bad choice since it has a meaning in xml, but you get the idea.) OR (and i prefer this): 2. every property is required to have a key and either a value or an href. the valueless (or keyless) properties in the yeast data look like i guess these are like the default cases where other features might (although i haven't seen any of these) have properties like but where did "property/molecular_function unknown" come from in the first place? what i think it should look like is and then we avoid the whole keyless-property issue and make the information more accessible to clients (and hence to users). the way it is now, it's an uninterpretable blob of text (really more of a comment than a property), where as separating into key/value suddenly gives it more meaning. Nomi From Gregg_Helt at affymetrix.com Thu Feb 9 20:05:14 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 9 Feb 2006 12:05:14 -0800 Subject: [DAS2] unary properties Message-ID: Looks to me like these might be GO terms, which should probably be represented more like: and possibly include an href to a description of that GO term. Of course one could argue whether the attribute values should be URI references rather than arbitrary strings, but you get the idea. gregg > -----Original Message----- > From: Nomi Harris [mailto:nomi at fruitfly.org] > Sent: Thursday, February 09, 2006 12:56 PM > To: Andrew Dalke; allenday at ucla.edu > Cc: nomi at fruitfly.org; Helt,Gregg > Subject: Re: [DAS2] unary properties > > On 9 February 2006, Nomi Harris wrote: > > the valueless (or keyless) properties in the yeast data look like > > > > i just looked at another region and found some more interesting valuless > (though i think they should be called keyless) properties: > > > > href=""/> > > these really seem to me to be missing important information. "nucleous"? > we're going to randomly mention cell parts? what this really should say > is > > right? 
> > so i think this is buggy data--it is missing the keys, and that should be > fixed. in fact, i think having the spec insist that properties have both > key and value would help to catch errors like this. > > Nomi From Gregg_Helt at affymetrix.com Thu Feb 9 23:18:42 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 9 Feb 2006 15:18:42 -0800 Subject: [DAS2] Refinements to range attribute and query filters in spec Message-ID: In the latest spec, the format for range queries is seqid/min:max:strand and the format for range attributes in feature elements is min:max:strand In the earlier spec (http://biodas.org/documents/das2/das2_get.html#ranges) everything but the seqid component of the range query was optional. Are min and max still optional, as in these examples from the previous version of the spec? Chr1/1000 Chr1 beginning at position 1000 and going to the end. Chr1/:2000 Chr1 from the start to position 2000. I personally find these kind of ranges confusing and not particularly useful, and would rather make min and max required for both the range attribute and range-based query filters. Also, the latest spec states: A region may be on the forward or reverse strand or on both strands. These are respectively denoted 1, -1 and 0. The reverse strand is the reverse complement of the forward strand. Unspecified strand means forward strand. So for a features query, are the four overlap filters below equivalent? Chr1/1000:2000 Chr1/1000:2000:1 Chr1/1000:2000:-1 Chr1/1000:2000:0 Or does the addition of strand information further filter the returned features by strand? But if that's the case, then according to the spec having no strand specified means forward. So that would mean overlaps="Chr1/1000:2000" would only return forward strand annotations, and not any on the reverse strand? To me that's counterintuitive, from a filtering perspective I'd rather no strand info mean "both strands". 
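A minimal sketch of parsing the range syntax under discussion, assuming the proposal that min and max are both required, and recording a missing strand as "not specified" (None) rather than choosing between "forward" and "both". The names Range and parse_range are hypothetical and do not come from the spec; the whole-sequence case (no range at all) is not handled here.

```python
from typing import NamedTuple, Optional


class Range(NamedTuple):
    seqid: str
    min_pos: int
    max_pos: int
    strand: Optional[int]  # 1, -1, 0, or None when no strand was given


def parse_range(text: str) -> Range:
    """Parse 'seqid/min:max' or 'seqid/min:max:strand' (hypothetical helper)."""
    seqid, slash, span = text.partition("/")
    if not seqid or not slash:
        raise ValueError("expected seqid/min:max[:strand], got %r" % text)
    parts = span.split(":")
    # Assumption: both min and max must be present, per the proposal above.
    if len(parts) not in (2, 3) or not parts[0] or not parts[1]:
        raise ValueError("min and max are both required: %r" % text)
    min_pos, max_pos = int(parts[0]), int(parts[1])
    if min_pos > max_pos:
        raise ValueError("min must not exceed max: %r" % text)
    strand = None
    if len(parts) == 3:
        strand = int(parts[2])
        if strand not in (1, -1, 0):
            raise ValueError("strand must be 1, -1 or 0: %r" % text)
    return Range(seqid, min_pos, max_pos, strand)
```

Keeping the missing strand as None leaves the open question (forward, both, or "don't care") to whatever the spec finally decides, instead of baking one reading into the parser.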
My main point though is we need to be explicit about how strand info or lack thereof affects features queries with range-based filters. gregg From suzi at fruitfly.org Fri Feb 10 00:29:57 2006 From: suzi at fruitfly.org (Suzanna Lewis) Date: Thu, 9 Feb 2006 16:29:57 -0800 Subject: [DAS2] question or two In-Reply-To: References: Message-ID: <54bc0e433303827918fe475855669a89@fruitfly.org> if an annotator wants to indicate a stop-codon-readthrough (which may or may not be a seleno-cysteine mechanism). how would DAS send this info through? need SO type (the readthrough), the location (relative to transcript or genome), and the mechanism. tRNA anticodon or AA? alternative translation table? infer this from organism? -S From Gregg_Helt at affymetrix.com Fri Feb 10 01:43:16 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Thu, 9 Feb 2006 17:43:16 -0800 Subject: [DAS2] feature NOTE and ALIAS elements? Message-ID: > -----Original Message----- > From: das2-bounces at portal.open-bio.org > [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke > Sent: Tuesday, February 07, 2006 7:45 AM > To: DAS/2 > Subject: Re: [DAS2] properties and queries > > > To summarize, the current thought here for properties and > queries is as follows (it's a long summary. More like an essay. :) > > Add support for zero or more elements in the feature, > of the form > This is some arbitrary (but non-markup-ed) text > > > Add a features search keyword "note=" which takes a search > string to be found in the note elements. (substring? > soundex? regex? the search engine calls up Lincoln and asks?) > > > Add support for zero or more elements in the feature, > of the form > > > (I missed this in the redraft. It should have been there. > Feature filter "name" already says it searches the "name" and > "alias" fields for a feature.) Is the plan still as stated above, to have optional NOTE and ALIAS elements in features? 
I don't see these elements in the feature schema, and the spec doc says they're built-in properties instead (values for PROP key attribute that have defined meaning). Gregg From td2 at sanger.ac.uk Fri Feb 10 08:54:16 2006 From: td2 at sanger.ac.uk (Thomas Down) Date: Fri, 10 Feb 2006 08:54:16 +0000 Subject: [DAS2] Refinements to range attribute and query filters in spec In-Reply-To: References: Message-ID: <4A9E3BE1-9E24-4D25-AAD1-1851F18857D0@sanger.ac.uk> On 9 Feb 2006, at 23:18, Helt,Gregg wrote: > > In the latest spec, the format for range queries is > seqid/min:max:strand > and the format for range attributes in feature elements is > min:max:strand > > In the earlier spec > (http://biodas.org/documents/das2/das2_get.html#ranges) everything but > the seqid component of the range query was optional. Are min and max > still optional, as in these examples from the previous version of the > spec? > Chr1/1000 Chr1 beginning at position 1000 and going to the > end. > Chr1/:2000 Chr1 from the start to position 2000. > I personally find these kind of ranges confusing and not particularly > useful, and would rather make min and max required for both the range > attribute and range-based query filters. I think it's reasonable for a client to want to fetch all features attached to a given sequence ID. This would certainly be sensible behaviour for clients which always work on reasonably short sequences (e.g. protein-specialized clients), but even genome-centric clients might want to do this when they've had a hint that a particular feature type is "low density" (e.g. chromosome banding patterns?). I'm not sure if anyone would want to query a range where only one of min and max are specified. > Also, the latest spec states: > > A region may be on the forward or reverse strand or on both strands. > These are respectively denoted 1, -1 and 0. The reverse strand is the > reverse complement of the forward strand. Unspecified strand means > forward strand. 
> > So for a features query, are the four overlap filters below > equivalent? > Chr1/1000:2000 > Chr1/1000:2000:1 > Chr1/1000:2000:-1 > Chr1/1000:2000:0 > Or does the addition of strand information further filter the returned > features by strand? But if that's the case, then according to the > spec > having no strand specified means forward. So that would mean > overlaps="Chr1/1000:2000" would only return forward strand > annotations, > and not any on the reverse strand? To me that's counterintuitive, > from > a filtering perspective I'd rather no strand info mean "both strands". > My main point though is we need to be explicit about how strand > info or > lack thereof affects features queries with range-based filters. Hmmm, I'd been interpreting Chr1/1000:2000 as "return features on both strands", but from the paragraph you quote I guess this is wrong. I'd be happy to see this changes to "Unspecified strand means both strands". Thomas. From dalke at dalkescientific.com Fri Feb 10 10:47:26 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 10 Feb 2006 10:47:26 +0000 Subject: [DAS2] Refinements to range attribute and query filters in spec In-Reply-To: References: Message-ID: Gregg: > In the latest spec, the format for range queries is > seqid/min:max:strand > and the format for range attributes in feature elements is > min:max:strand > I personally find these kind of ranges confusing and not particularly > useful, and would rather make min and max required for both the range > attribute and range-based query filters. Agreed on this side. All clients can easily get the upper limit, and the lower limit is always 0. > My main point though is we need to be explicit about how strand info or > lack thereof affects features queries with range-based filters. It was a confusion on my part. There are three places which refer to location + strand. 1. specifying a feature location 2. fetching a sequence 3. doing a range search "1. 
specifying a feature location"

We've been talking here about limiting the use of strands for these. Features definitely need a strand. If the strand is not specified then the feature is on both strands, or strand has no meaning. If needed, resolve the ambiguity by looking at the type (or other property). If you really, really want to specify that it's on both strands then use the 0.

The location element currently looks like this

Given the decision yesterday that segments are special, in terms of identification, I propose using the short id, so these look like, respectively

"2. fetching a sequence"

Why does the server need to support a reverse complement feature? Let's leave it out and make the client do a string reversal if it needs it.

"3. doing a range search"

Is there any reason to specify the strandedness when doing a feature query? Discussion here seems to be "would be nice but that lack is one of the things people have never complained about in DAS1".

I propose removing strandedness from the features query. If others disagree then here are two solutions:

A. have a "strand=" parameter, so that the strandedness is different from the ranges. If you want a query for the union of range Chr1/A:B:-1 and range Chr1/X:Y:1 then tough - make two requests, one for each strand.

B. ranges may specify the strand (as now) but if not specified then it means "of any strand".

We worked on a few cases where it might be useful to make mixed strand queries. There weren't any compelling reasons. The worst case scenario without strand support in the features query is that you get on average twice the number of features back, and the worst case for option A is the need to make two queries.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Fri Feb 10 10:48:18 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 10:48:18 +0000
Subject: [DAS2] Re: feature NOTE and ALIAS elements?
In-Reply-To:
References:
Message-ID:

Gregg:
> Is the plan still as stated above, to have optional NOTE and ALIAS
> elements in features? I don't see these elements in the feature
> schema,
> and the spec doc says they're built-in properties instead (values for
> PROP key attribute that have defined meaning).

Yes. I haven't updated the spec other than a few minor points in the last couple of days.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Fri Feb 10 15:04:45 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 15:04:45 +0000
Subject: [DAS2] 'OR' syntax in query language
Message-ID: <8593bb5041e0d054840da98c200d3e03@dalkescientific.com>

We talked a bit about the DAS query language. It is currently of the form (modulo URL escaping)

  name=Andrew,Roy;inside=Chr/100:200

This is the same as

  ( name contains the substring "Andrew" OR name contains the substring "Roy" )
  AND
  ( feature is inside 100:200 on the segment named 'Chr' )

That is, there is an AND of all terms, and a single term may have multiple OR-ed subqueries, merged by commas.

We want to change this to the form

  name=Andrew;name=Roy;inside

That is, the query key can exist more than once. Queries with the same key are 'OR'ed, otherwise they are 'AND'ed.

The advantage is the simplicity of not having to worry about another quoting rule, in this case how to search for terms containing a ",".

The only disadvantage is with servers which don't handle multiple keys in a query - but we think those client libraries are long since deceased.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Fri Feb 10 15:15:05 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 15:15:05 +0000
Subject: [DAS2] range searches
Message-ID: <80684d437a99822fd017cceee83b02b4@dalkescientific.com>

I think Gregg has thought the most about this one.
We have 4 classes of range search:

  'inside' (feature completely inside request range)
  'overlaps' (feature overlaps the request range)
  'contains' (feature completely contains request range)
  'identical' (feature is exactly the request range)

They exist for smart clients which want to limit the region request size based on previously fetched knowledge.

Example: client is viewing "500:600" and zooms out to "400:700". In that case the client could ask for features which

  overlap 400:500 OR overlap 600:700

excluding those which

  overlap 500:600.

If that's the case, the selection language isn't powerful enough. There's no way to choose "excluding". The other option is to issue only the overlap queries.

Does the query language need to be more powerful to allow "excluding what I know about these regions" for people like Gregg?

Another question came up; are queries like

  overlap 400:500 OR inside 900:1000

useful? I don't think so. If it is, it is not supported by the current language which only does AND of dissimilar terms.

Andrew
dalke at dalkescientific.com

From ap3 at sanger.ac.uk Fri Feb 10 15:21:25 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Fri, 10 Feb 2006 15:21:25 +0000
Subject: [DAS2] registry status
Message-ID: <2fa320fbca91abfa9f175b64d0d8105c@sanger.ac.uk>

Hi!

the developmental registry has been updated: it now supports 2 requests:

http://www.spice-3d.org/dasregistry/das2/sources
lists das2 servers

http://www.spice-3d.org/dasregistry/das1/sources
lists das1 servers.
The next step will be to provide user upload of das2 sources Andreas ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 From dalke at dalkescientific.com Fri Feb 10 15:49:10 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 10 Feb 2006 15:49:10 +0000 Subject: [DAS2] curation history and splits&merges Message-ID: <4712c640d908a4e9aed099dd1ef1398b@dalkescientific.com> We talked some on tracking curation history. We decided it was a hard topic and we would defer further discussion to the next sprint. We're getting rather frazzled here after nearly 5 days of hard work. Here are some things that came up. The writeback delta needs a field for user comments. How persistent is an identifier for an object? Is it for the exact version of a feature or is it for the concept of a the given feature? That is, if there's a feature change the server could assign it a new id/url. It would need to tell the annotation about the new id, just like it tells the client about the newly created ids. This makes updates more like a changeset version control system, where there is a version number for each stable data set. Compare to CVS where there is a version number for each file/record but not for the whole system. But the current Otter database is more the CVS route. While the changeset version seems nicer, there will be some (I assume non-trivial) work to make Otter support it. There are advantages. You could do searches with timewarps by using a "changeset=" parameter in the query. The DAS mechanism handles that just fine, since interlinks between no-longer current URLs would be correct. There needs to be a way to get the history of an element. There are two thoughts: - put the curation history in the feature document (via some embedded XML) - link to a URL which provides the curational history document for the given element We prefer the latter. 
For splits and merges there needs to be support in the delta to say if there is a relationship to existing or about to be deleted features. We did not work on that, other than to get a feel that it works. Again, no server handles this so we decided it table it for the future, and work on it more for the next sprint. Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Fri Feb 10 16:36:49 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Fri, 10 Feb 2006 08:36:49 -0800 Subject: [DAS2] IGB DAS/2 client partially working -- and using registry! Message-ID: Attached is a screenshot of IGB with data from a yeast test region (chrVII, ~364-366kb) loaded from Allen's codesprint server by way of Andreas' DAS/2 registry. Still need to work on synchronizing up source names, etc., but this is looking good. As we had planned, having the registry return a sources document allowed very easy integration! You may notice there is also a branch of the sources tree that is a direct path to the codesprint server. That just means I gave the discovery engine two URLs to start from -- the registry and the codesprint server. This is the same version of IGB as the current head of the CVS repository (as of today 8:30 AM PST). I'm tempted to roll up a jar so people can try it without having to compile the source, but on the other hand it's pretty fragile right now, and the image conveys the gist of it. gregg > -----Original Message----- > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open- > bio.org] On Behalf Of Andreas Prlic > Sent: Friday, February 10, 2006 7:21 AM > To: DAS/2 > Subject: [DAS2] registry status > > Hi! > > the developmental registry has been updated: > it now supports 2 requests: > > http://www.spice-3d.org/dasregistry/das2/sources > lists das2 servers > > http://www.spice-3d.org/dasregistry/das1/sources > lists das1 servers. 
> > The next step will be to provide user upload of das2 sources > > Andreas > > > > > ----------------------------------------------------------------------- > > Andreas Prlic Wellcome Trust Sanger Institute > Hinxton, Cambridge CB10 1SA, UK > +44 (0) 1223 49 6891 > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -------------- next part -------------- A non-text attachment was scrubbed... Name: DAS2_in_IGB.JPG Type: image/jpeg Size: 170143 bytes Desc: DAS2_in_IGB.JPG URL: From Gregg_Helt at affymetrix.com Fri Feb 10 17:01:11 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Fri, 10 Feb 2006 09:01:11 -0800 Subject: [DAS2] Proposed agenda for DAS/2 Code Sprint teleconference, Feb 10 Message-ID: Properties Range-based queries Status reports - summarize overall progress during code sprint Discuss next code sprint - goals, etc. ??? From dalke at dalkescientific.com Fri Feb 10 18:14:47 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 10 Feb 2006 18:14:47 +0000 Subject: [DAS2] changes commited Message-ID: <6425fabe79dc6d27fd3a797b837d32de@dalkescientific.com> removed the href= and type= options in the spec and all examples. changed the url "," syntax for OR'ed terms into multiple "key=value;key=value" terms. changed "att=key:value" into "prop-key=value" Andrew dalke at dalkescientific.com From suzi at fruitfly.org Fri Feb 10 19:48:58 2006 From: suzi at fruitfly.org (Suzanna Lewis) Date: Fri, 10 Feb 2006 11:48:58 -0800 Subject: [DAS2] question on properties In-Reply-To: <4712c640d908a4e9aed099dd1ef1398b@dalkescientific.com> References: <4712c640d908a4e9aed099dd1ef1398b@dalkescientific.com> Message-ID: <26b9a50a02deb2be55150c6a5a47d419@fruitfly.org> You probably know the answer to this Andrew. One of the cases we encountered was unique properties vs cumulative properties. For a simplistic (i.e. 
don't quibble to closely, I'm just trying to explain) example pretend that "ssn" and "comment" are both properties. On the client side the appropriate behavior for these is different if the data coming over from the server contains >1 prop element with that tag. If the client sees "ssn" twice it winces and then either ignores or overwrites with the 2nd value. If the client sees "comment" twice then it appends the additional comment. Question: Is this kind of information included in the spec? Uniqueness vs. cumulative From Steve_Chervitz at affymetrix.com Fri Feb 10 22:10:28 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Fri, 10 Feb 2006 14:10:28 -0800 Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint, 10 Feb 2006 Message-ID: Notes from the DAS/2 teleconference for the code sprint, 10 Feb 2006 $Id: das2-teleconf-2006-02-10.txt,v 1.1 2006/02/10 22:13:17 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed E., Gregg Helt CSHL: Lincoln Stein Sanger: Thomas Down, Andreas Prlic Sweden: Andrew Dalke UCLA: Allen Day Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. [note taker missed the first 5 minutes] Topic: Properties ----------------- gh: Properties are all tag-value ad: yes gh: don't think we need your binary thing. ad: ok drop it gh: href is needed. can always point it to a binary something out there. can the value just be a url? 
ad: can make it relative to xml base
gh: do you need some property with tag value and href at same time?
ls: how would you interpret that? should be either value or href.
ad: there's nothing to say how to interpret the url.
gh: nice to have multiple links out to somewhere else and to have some indication what they are w/out traversing the link. e.g., this is the genbank ref, ensembl ref, protein, etc. if xid had an extra field with label, title e.g. that would suffice.
ad: sounds ok

[A] xids will have title + href, properties will have tag + value

Topic: Exercising the spec
---------------------------
gh: we need the reference server to actually exercise this part of the spec. xid. possibly other things like: target overlap, inside, cigar strings. encoding, decoding.
aday: oh no.
ls: line element. cigar string is something that no one has tested yet.
gh: if we don't have server doing it by next code sprint
aday: any impls out there we could use?
gh: bioperl has a gff3 parser.
aday: I wrote it, and I didn't impl cigar string parsing.
ls: there's a cigar processor in bioperl AlignIO. in theory not hard to do.
gh: lbl folks (Nomi et al) have a java one, too. I think.
gh: other parts of spec that aren't getting exercised? I doubt if anyone has used xml lang.
ad: added xml id. just there for other reasons, but not what we need it for.
gh: we talked about all ids being xml ids and combining xml id and xml base, can't remember why we stopped discussing.
ad: don't think we need to. style sheet has uses for this maybe.
ad: has anyone generated doc href yet?
td: can add this stuff easily now.
gh: for testing purposes, just throw a doc href everywhere it's allowed.
ad: are servers supporting retrieval of seq data?
aday: yes
ad: support for alt feature formats?
aday: can do old compact formats, not sure about coverage.
gh: yes, alt feat formats are handled, but server isn't up and running yet. igb das/2 client can handle it already.
ad: retrieval of assembly?
aday: no assembly data
ad: i don't touch assembly
gh: may be for next code sprint.

Topic: range based query
------------------------
gh: thomas and i don't like optional mins and maxes.
ls: fine as long as you can always determine the size of the reference. provide beginning and end.
gh: exception: if you want the whole sequence, can you just not supply range?
ad: yes
gh: :1 and :-1 how to interpret nothing for strand on end and 0 for strand at end?
ls: features that have strand +1, -1, features that have no strand or are on both strands (0), features that may have a strand but you don't know (empty)
gh: when you put it in the query there's a difference between i don't know and i will accept anything. use case: transfrags from transcriptome project. unknown strand, but I know it *is* one or the other strand.
ls: how about this arrangement:
    empty = i don't care
    0 = has strand but i don't know
    1 = forward strand
    -1 = reverse strand
    2 = both strands
ad: could be organized by track (everything in a track has same strand).
gh: don't think it's good to structure a query so it's required that you do have strand. you could have diff strand designations on same track.
ls: you want to be able to distinguish things that are on both strands, things that are on either strand, but you don't know which.
gh: biggest concern: given a range based query to server 1000-2000 means everything that overlaps, any strandedness within this range.
ad: should support stranded searches. client can filter out opposed to do a strand request against seq to get the rev comp. client should be able to do this.
gh: in range attrib of features, you can add colon to indicate strandedness.
ad: yes
gh: if no :strand does this mean unknown or don't care?
ls: defaults to *, anything. you get fwd, rev, don't know, don't care.
gh: require things on fwd strand to be :1, not make it a default.
ad: ok. if not there, means ambiguous, unknown, or not appropriate. see email i sent.
if you get rid of search for strand in region query, most of this issue goes away.
gh: don't think people would use this often (stranded query)
ad: you can make two queries to server instead of one.
gh: this is a resolution for all range-related issues.
ad: check my email to make sure it covers this.

[A] everyone review andrew's email re: range queries and strand issues.

gh: also or-ing of diff range-based queries is not useful for me. I mainly need intersects of overlaps and inside. or-ing is equivalent to using multiple queries.
td: why do you need and overlaps and inside?
gh: optimization on client side. keeps track of what it has received. wants to minimize re-fetching.
td: can you just use overlap and not overlap?
gh: that may be equivalent, but the way I do it, you can guarantee you never get the same feat twice with that combo. will require and-ing of two range-based queries.
ad: modifying query lang, or-ing together two. include first range and include second range should use multiple query keys because of the comma. you will have to escape any comma if it's inside of query string.
gh: don't like the implicit 'and' if different but 'or' if keys the same. it depends on the query.
ad: now all queries are and-ed, but commas mean multiple.
ls: comma syntax seems natural. the occasional query that had to have an escaped comma didn't cause any bother.
td: this was as it is in das/1. exons and repeat. type=exon, type=repeat. so the suggestion is to use the das/1 behavior.
ad: three independent segments
gh: types as well. can have any number of types= and segment= all or-ed together. I still need anding of overlaps and inside.
td: different keys are and-ed, same keys are or-ed.
ls: hoisted by my own petard here. works for me.
gh: allen?
aday: what's changed?
ls: the whole query language has changed in a fundamental way.
aday: dealing with multiple attributes with same name. fine.
gh: will server accept full urls for types?
aday: not now but will impl this.
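A rough sketch of the query semantics discussed in this topic: repeated keys are OR-ed and distinct keys are AND-ed. It assumes 0-based half-open coordinates, ignores URL escaping and strand, and handles only the overlaps and inside keys. parse_query, matches, select, and the sample features are hypothetical names made up for illustration; none of them come from the DAS/2 spec or from any server.

```python
from collections import defaultdict


def parse_query(query):
    """Group 'key=value;key=value' terms by key (no URL unescaping here)."""
    groups = defaultdict(list)
    for term in query.split(";"):
        if term:
            key, _, value = term.partition("=")
            groups[key].append(value)
    return dict(groups)


def parse_region(value):
    """Split 'seqid/min:max' into (seqid, min, max); strand is not handled."""
    seqid, _, span = value.partition("/")
    lo, hi = (int(x) for x in span.split(":"))
    return seqid, lo, hi


def overlaps(feature, region):
    (fseq, flo, fhi), (rseq, rlo, rhi) = feature, region
    return fseq == rseq and flo < rhi and rlo < fhi


def inside(feature, region):
    (fseq, flo, fhi), (rseq, rlo, rhi) = feature, region
    return fseq == rseq and rlo <= flo and fhi <= rhi


FILTERS = {"overlaps": overlaps, "inside": inside}


def matches(feature, groups):
    """AND across different keys, OR across repeated values of one key."""
    return all(
        any(FILTERS[key](feature, parse_region(v)) for v in values)
        for key, values in groups.items()
    )


# Made-up features as (seqid, min, max), assumed half-open.
FEATURES = {
    "A": ("Chr1", 420, 480),
    "B": ("Chr1", 480, 520),
    "C": ("Chr1", 650, 680),
    "D": ("Chr1", 520, 580),
}


def select(query):
    groups = parse_query(query)
    return sorted(n for n, f in FEATURES.items() if matches(f, groups))


# Same key twice: features overlapping either flank.
print(select("overlaps=Chr1/400:500;overlaps=Chr1/600:700"))
# Different keys: overlaps the left flank AND lies inside 0:500.
print(select("overlaps=Chr1/400:500;inside=Chr1/0:500"))
```

The second query is one possible reading of the overlaps-plus-inside combination mentioned above for a client that already holds 500:600 and zooms out to 400:700: feature B overlaps the left flank but also reaches into the region already fetched, so requiring inside 0:500 leaves it out. This is a guess at the idea, not the actual IGB logic.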
gh: all types should be full uri's now. my client can't deal but will soon.

Topic: status reports
---------------------
gh: state what you hoped to accomplish and what you actually accomplished.
gh: hoped to get igb das client up to date with spec, working with one das2 server, and get affy das2 server up and going. affy das2 server will take longer. maybe by next code sprint. igb is now using latest das2 spec, calling allen's server, and using registry as well. happy with results. not everything done, but some unexpected things (registry). wrote up progress report for grant: going out 3pm today (we got another day), a 2pg summary. will send out to everyone later. todo: get das2 server up. client: deal with full uri issue. this is a basic functionality of the client. smart handling of uris.
ee: igb client. big thing is to make it treat all data sources so they all behave in a similar way (das1/das2, quick load, separate files), regardless of the data format. want to make it all seamless. going well.
sc: streamlined pipeline for populating das server with affy exon array data. didn't get to pipeline for external data (UCSC tracks), but have basic framework in place.
ad: decided to do more writeback at next sprint. when is next sprint?
gh: march 13-17. lincoln will be in UK and can participate from there.
ad: I'm in the states next week. will come to emeryville for next sprint.

[A] next code sprint is 13-17 March. Mark your calendars.

ad: hoped to work on spec, resolve detailed questions, make sure it works with people's needs. will work on incorporating latest ideas into spec. validator: have one but it is not fit for public consumption. not at where it was last summer on the previous version of spec.
ap: das interface for registry, can serve das1 and das2 sources w/ new source command. java client - not yet. registry: todo UI so users can upload to das registry.
td: hoping to write server. got something up for feat, types, segments, need to run through andrew's validator.
hoped to work on writeback, but didn't happen (but good discussion on it). want to get more data included, ensembl database. roy has been working on zmap client, coming along fine.
aday: primary goals: to support new version of spec -- not fully done; uri problem in query parsing. apache config integration is done. installation and rpm for server - done for FC3 i386, available in the next couple of days (brian o'connor). general documentation improvement in code for server - not done. Next step: post, put, delete, writeback framework (originally planned this but may need to rethink), impl transaction logs (maybe in flux). adding more unit tests.
ad: writeback spec won't happen for at least 2 weeks. need to write up what we've done on current spec first.
ls: will be available from 14th on. at ensembl meeting up to the 13th.
gh: allen come to emeryville?
aday: maybe.
gh: will have to explore how to fund hosting folks here for next codesprint.
gh: speaking for nomi - she had apollo working for parsing features and displaying them. some issues with higher level integration into apollo. making good progress.
gh: time to wrap it up. thanks for your hard work. [applause]

[A] next teleconf will be on 20 Feb, 9:30 PST 5:30 UK (regular time). we're skipping 13 feb (next monday) given all our time this week.

From dalke at dalkescientific.com Sat Feb 11 02:11:05 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat, 11 Feb 2006 02:11:05 +0000
Subject: [DAS2] Re: question on properties
In-Reply-To: <26b9a50a02deb2be55150c6a5a47d419@fruitfly.org>
References: <4712c640d908a4e9aed099dd1ef1398b@dalkescientific.com> <26b9a50a02deb2be55150c6a5a47d419@fruitfly.org>
Message-ID:

Suzi:
> On the client side the appropriate behavior for these is different if
> the data coming over from the server contains >1 prop element with
> that tag.
>
> If the client sees "ssn" twice it winces and then either ignores or
> overwrites with the 2nd value.

Or it says "error, error, cannot compute" and stops. From one of the guidelines ("the zen") of Python: "In the face of ambiguity, refuse the temptation to guess."

> If the client sees "comment" twice then it appends the additional
> comment.
>
> Question: Is this kind of information included in the spec? Uniqueness
> vs. cumulative

Here are my thoughts. We have several points for client/server extensions.

One is this property table, which is a set of key/value strings. Because they are strings you can use them for almost anything, with the correct interpretation by the client and server. That requires collusion between the two.

This is the extension point which is most familiar to everyone. But it's open to the problem you pointed out.

The other is this non-DAS extension XML, which lets the server add *anything*. If the client doesn't know what the field does it must ignore it. If it does writeback with that feature it must include the ignored element, and not make any changes.

That means your server can add

    123-45-1534

If the client doesn't know what to do, it ignores it. It will never change the field. If the client knows what that field does it must follow the constraints set down for it, else the server should stop with an error and not allow the update to occur.

There are two downsides to this approach. There's no way for a dumb client to understand that field, so no user will ever see it, and there's no way to do a search on that field. (A server can extend the search syntax and tell the client about the new syntax, but a dumb client doesn't know about that.)

If there is a need to support the dumb client then the only way to support the data type constraints is in the server. It must check a given field and possibly stop with an error or resolve ambiguities. We can have the server report an error message that the client and/or user can use to figure out what's wrong.

Thinking about it a bit, it's possible to combine these two.
For example, a server can have

then list as an extension

All this latter XML does is flag sufficiently aware clients that the server implements the special SSN requirements. A dumb client can ignore the flag, users add a new SSN, and the server bails out, while the smart client knows early on that that isn't going to be allowed.

This hybrid solution doesn't seem right to me though. I currently (and without any experience) prefer putting schema-constrained fields in as extension elements. Think of the property table as something exposed to the user as a completely editable table, with no ability to limit what that person does.

For the case of the SSN that might be overkill. For other things, like the current stage of a feature in the curational process, it's best to put that data there and not in the generic property table.

There is a long history of using generic key/value tables as an ad-hoc way to extend a protocol. I'm trying to improve on that by defining a way for a server to add well-structured, schema-dependent and searchable data (for smart clients) without needing to piggyback on a bunch of strings.

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Mon Feb 20 15:31:42 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 20 Feb 2006 08:31:42 -0700
Subject: [DAS2] today's conf. call and President's Day
Message-ID:

Today is President's Day in the US.

Are the other US people working today?

Andrew
dalke at dalkescientific.com

From Gregg_Helt at affymetrix.com Mon Feb 20 16:47:13 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 20 Feb 2006 08:47:13 -0800
Subject: [DAS2] today's conf. call and President's Day
Message-ID:

It's a day off for Affymetrix, but I'm working anyway. Unless there are major objections I'd like to go ahead and do the conference call at the standard time (9:30 AM Pacific time). There may be a few less people joining in from the US.

thanks,
gregg

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke
> Sent: Monday, February 20, 2006 7:32 AM
> To: DAS/2
> Subject: [DAS2] today's conf. call and President's Day
>
> Today is President's Day in the US.
>
> Are the other US people working today?
>
> Andrew
> dalke at dalkescientific.com
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2

From lstein at cshl.edu Mon Feb 20 17:37:06 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 20 Feb 2006 12:37:06 -0500
Subject: [DAS2] today's conf. call and President's Day
In-Reply-To:
References:
Message-ID: <200602201237.06497.lstein@cshl.edu>

Hi,

I've dialed in and all I'm getting is hold music. Could you confirm this info?

    800 531-3250 287-9055

Thanks!

Lincoln

On Monday 20 February 2006 11:47, Helt,Gregg wrote:
> It's a day off for Affymetrix, but I'm working anyway. Unless there are
> major objections I'd like to go ahead and do the conference call at the
> standard time (9:30 AM Pacific time). There may be a few less people
> joining in from the US.
>
> thanks,
> gregg
>
> > -----Original Message-----
> > From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke
> > Sent: Monday, February 20, 2006 7:32 AM
> > To: DAS/2
> > Subject: [DAS2] today's conf. call and President's Day
> >
> > Today is President's Day in the US.
> >
> > Are the other US people working today?
> > > > Andrew > > dalke at dalkescientific.com > > > > _______________________________________________ > > DAS2 mailing list > > DAS2 at portal.open-bio.org > > http://portal.open-bio.org/mailman/listinfo/das2 > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln Stein lstein at cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) From lstein at cshl.edu Mon Feb 20 16:50:38 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 20 Feb 2006 11:50:38 -0500 Subject: [DAS2] today's conf. call and President's Day In-Reply-To: References: Message-ID: <200602201150.38431.lstein@cshl.edu> I am working today! Lincoln On Monday 20 February 2006 10:31, Andrew Dalke wrote: > Today is President's Day in the US. > > Are the other US people working today? > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at portal.open-bio.org > http://portal.open-bio.org/mailman/listinfo/das2 -- Lincoln Stein lstein at cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) From dalke at dalkescientific.com Mon Feb 20 17:28:56 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 20 Feb 2006 10:28:56 -0700 Subject: [DAS2] today's conf. call and President's Day In-Reply-To: References: Message-ID: Thomas Down wrote: > Well, I can't speak for US people, but I do know that Andreas Prlic is > on holiday today and I presume won't be joining the conference call. > I can join if there's anything that needs discussing urgently, but > otherwise I'd be happy to leave it 'til next week. Status update for me: Last week was a break for me from the sprint - I was winded. I worked a bit here and there on how to do a GUI interface for the validation. 
I hope to get a demo page of the results up within a day or so.

This week I'll be working on that and a new draft of the spec.

Also, I'm now back home in Santa Fe, where we haven't had rain or snow for 100 days - my cacti are drooping! :(

Andrew
dalke at dalkescientific.com

From dalke at dalkescientific.com Mon Feb 27 14:50:10 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 27 Feb 2006 08:50:10 -0600
Subject: [DAS2] will miss today's conf. call
Message-ID: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>

Hi all,

Not only am I on the road back from the Python conference but my cell phone battery is nearly dead, so I won't be able to make it to today's phone conference call.

Here's my status. I've been working on the validator, to the detriment of the next spec rewrite. This validator does single-document checks. That is, it does not do internal integrity checks to make sure that the results of, say, a range query only return features in that range, or that the features are in the range given by the segments.

I plugged the results into a web server running on my laptop. It's using some new Python libraries which are not yet installed on the OBF machine, but which I can install after I get back to Santa Fe.

The GUI is similar to what I threw together at Sanger during the Sprint - enter a URL and a document type, view the results. What took a long time was the code to pin down where the errors happened, for example, to show which attribute was the extra attribute in an element.

I've attached sample output for your viewing pleasure.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------

There is enough there for a Javascript jockey to make a neat little interactive viewer, eg, click on the error message to be shown where it occurs in the document.
Also, the marker I'm using to show where the error occurs in the body of the text needs work - the method I use isn't very portable across platforms.

I think the next steps for me are:
 - get the validator working as-is on the OBF web site (should be on-line by tomorrow)
 - get back to writing the 3rd draft of the spec.

Andrew
dalke at dalkescientific.com

From ap3 at sanger.ac.uk Mon Feb 27 17:41:08 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Mon, 27 Feb 2006 17:41:08 +0000
Subject: [DAS2] will miss today's conf. call
In-Reply-To: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>
References: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>
Message-ID: <197aeffa03988a8fc098f27926ee511d@sanger.ac.uk>

any conference call today? - listening to the hold music

Andreas

-----------------------------------------------------------------------
Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891

From nomi at fruitfly.org Mon Feb 27 17:43:00 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Mon, 27 Feb 2006 09:43:00 -0800
Subject: [DAS2] will miss today's conf. call
In-Reply-To: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>
References: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>
Message-ID: <17411.14884.410370.608675@spongecake.lbl.gov>

are we having a teleconference today? i got bored of waiting on hold for the moderator. someone email me if it's happening.

the validator sounds useful!

Nomi

From boconnor at ucla.edu Tue Feb 28 00:46:02 2006
From: boconnor at ucla.edu (Brian O'Connor)
Date: Mon, 27 Feb 2006 16:46:02 -0800
Subject: [DAS2] DAS2 Reference Server @ UCLA
Message-ID: <44039D4A.5000503@ucla.edu>

Hi,

If anyone is using the DAS/2 server at UCLA (das.biopackages.net) there will be some maintenance on the server later today (after 5pm Pacific). This won't affect the DAS/2 codebase, I'm just moving around some of our other production websites and there will be some downtime.
The outage should just last a few minutes. --Brian