From ap3 at sanger.ac.uk  Wed Feb  1 07:42:16 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Wed, 1 Feb 2006 12:42:16 +0000
Subject: [DAS2] code sprint final infos
Message-ID: <11a8e78c4d35855959fa41eb23a4039d@sanger.ac.uk>

Hi!

Here is the final organisational information for the DAS 2 code
sprint next week.

- We start Monday at 10:00 (Sanger time) in the Morgan building.
  The meeting point is the small meeting room next to the kitchen on
  the 1st floor (we will get a better room later).

- The Sanger guest wireless network supports Skype, so instant
  messaging and voice-over-IP calls will be possible the whole time.

- Every day at 17:00 (Sanger time = 9:00 Pacific time) there will be a
  conference call on the usual DAS2 line.

Greetings,
Andreas


-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                   Hinxton, Cambridge CB10 1SA, UK
                   +44 (0) 1223 49 6891


From allenday at ucla.edu  Wed Feb  1 17:42:26 2006
From: allenday at ucla.edu (Allen Day)
Date: Wed, 1 Feb 2006 14:42:26 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
Message-ID: <Pine.LNX.4.58.0602011439140.1651@sumo.ctrl.ucla.edu>

I just looked over your changes, and will begin making the changes to the
server repository today.

I'd like to update the server at das.biopackages.net with my changes on
Friday, unless there are objections.

I'll be taking notes along the way and will post to the list if anything
in your document is unclear to me.

At first glance, I agree -- the changes are minor.

-Allen


On Mon, 30 Jan 2006, Andrew Dalke wrote:

> Allen:
> > Is the spec going to be in a stable state for the code sprint?  I'd 
> > like
> > to use this time to sync the server implementation with a stable 
> > version
> > of the spec.  It looks like there have been many substantial changes.
> 
> I have just (within the last few minutes) completed the first draft
> of the update of the spec.
> 
> It's not in HTML - that calls for too much work for this stage.
> It's text, in CVS under das/das2/new_spec.txt
> 
> There are many parts which need clarification.  These are marked
> with a "XXX" along with my comments.
> 
> The RNC files are in
> 
>    das/das2/scratch/*.rnc
> along with some test XML files.  These XML files are not meant
> to be realistic.  They are meant more to check edge cases.
> 
> I do not think there are major changes to the spec.  Most of the
> changes have actually trimmed things down, like getting rid of
> the "properties" subtree and merging the different "sources" requests
> into a single document.
> 
> 
> Here are the major interfaces
> 
> $PREFIX/sequence - a "sources" request
>    This is the top-level entry point to a DAS 2 server.  It returns a
>    list of the available genomic sequences and their versions.
>    [sequence-namespace]
> 
> $PREFIX/sequence/$SOURCE - a "source" request
>    Returns the available versions of the given genomic sequence.
> 
> $PREFIX/sequence/$SOURCE/$VERSION - a "versioned source" request
>    Returns information about a given version of a genomic sequence.
>    Clients may assume that the sequence and assembly are constant for a
>    given version of a source. Note that annotation data on a server
>    with curational write-back support may change without changing the
>    version.
> 
> 
> For a given version here are the sub-parts.  Note that I've gone ahead
> and split the query URLs (segment, features and types each have query
> interfaces) from the base directory used as containers for the segments,
> features and types.
> 
>   $VERSION/segments - the segments query URL; summarizes the top-level
>      segments in the data source
> 
>   $VERSION/segment/$SEGMENT_ID - a segment query; used to get detailed
>      information about the identified segment
> 
>   $VERSION/features - the feature filter query URL.  Features are
>     locatable annotations or experimental results.  The feature filter
>     URL supports query parameters to select a subset of the features
>     based on position, feature type and other properties.
> 
>   $VERSION/feature/$FEATURE_ID - a feature query; used to get detailed
>      information about the identified feature
> 
>   $VERSION/types - the types query URL which returns a list of all
>     feature types.  Feature types include ontology and depiction
>     details for all features of the given type.
> 
>   $VERSION/type/$TYPE_ID - details about the specified feature type
> 
> Oh, and there are internal conflicts which will be straightened
> out in the next draft.  These shouldn't be big.
> 
> 					Andrew
> 					dalke at dalkescientific.com
> 
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2
> 
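
The URL layout quoted above can be sketched as a few small helpers.
This is only an illustration of the draft layout as described in the
message, not code from any DAS/2 client or server. The prefix, source,
version and identifier values are hypothetical, and the "type" filter
name passed to features_url() is an assumption, since the message does
not list the actual query parameter names.

```python
from urllib.parse import quote, urlencode

# Hypothetical prefix; a real server defines its own $PREFIX.
PREFIX = "http://example.org/das2"


def _join(base, *parts):
    """Append percent-encoded path components to a base URL."""
    return "/".join([base.rstrip("/")] + [quote(p, safe="") for p in parts])


def sources_url(prefix):
    """$PREFIX/sequence: the top-level "sources" request."""
    return _join(prefix, "sequence")


def source_url(prefix, source):
    """$PREFIX/sequence/$SOURCE: the "source" request."""
    return _join(prefix, "sequence", source)


def versioned_source_url(prefix, source, version):
    """$PREFIX/sequence/$SOURCE/$VERSION: the "versioned source" request."""
    return _join(prefix, "sequence", source, version)


def segments_url(version_url):
    """$VERSION/segments: summary of the top-level segments."""
    return _join(version_url, "segments")


def segment_url(version_url, segment_id):
    """$VERSION/segment/$SEGMENT_ID: details for one segment."""
    return _join(version_url, "segment", segment_id)


def features_url(version_url, **filters):
    """$VERSION/features: feature filter query.

    The filter names are assumptions; the draft only says that position,
    feature type and other properties can be used to select features.
    """
    url = _join(version_url, "features")
    if filters:
        url += "?" + urlencode(sorted(filters.items()))
    return url


def feature_url(version_url, feature_id):
    """$VERSION/feature/$FEATURE_ID: details for one feature."""
    return _join(version_url, "feature", feature_id)


def types_url(version_url):
    """$VERSION/types: list of all feature types."""
    return _join(version_url, "types")


def type_url(version_url, type_id):
    """$VERSION/type/$TYPE_ID: details for one feature type."""
    return _join(version_url, "type", type_id)


if __name__ == "__main__":
    # "human" and "17" are made-up source and version names.
    version = versioned_source_url(PREFIX, "human", "17")
    print(segments_url(version))
    print(features_url(version, type="gene"))
```

Note how the plural paths (segments, features, types) are the query
URLs, while the singular paths (segment, feature, type) act as
containers for individual records, which is the split the message
describes.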


From Gregg_Helt at affymetrix.com  Wed Feb  1 18:14:30 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 1 Feb 2006 15:14:30 -0800
Subject: [DAS2] Re: Apollo and DAS/2 priorities
Message-ID: <C71929195D04BF48BAECD499AF717B480198C993@msex02.affymetrix.com>


That would be great if you could update the biopackages server before
the code sprint starts!  Then client implementers will have a server to
test with.

	thanks,
	gregg



From allenday at ucla.edu  Wed Feb  1 18:27:11 2006
From: allenday at ucla.edu (Allen Day)
Date: Wed, 1 Feb 2006 15:27:11 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C993@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C993@msex02.affymetrix.com>
Message-ID: <Pine.LNX.4.58.0602011526030.1651@sumo.ctrl.ucla.edu>

That's what I was thinking too, but I was worried about the existing 
Genoviz clients "in the wild" having the server suddenly break.

So you're saying it's okay with you if those clients have a service
interruption?

-Allen




From allenday at ucla.edu  Wed Feb  1 18:30:22 2006
From: allenday at ucla.edu (Allen Day)
Date: Wed, 1 Feb 2006 15:30:22 -0800 (PST)
Subject: [DAS2] code sprint final infos
In-Reply-To: <11a8e78c4d35855959fa41eb23a4039d@sanger.ac.uk>
References: <11a8e78c4d35855959fa41eb23a4039d@sanger.ac.uk>
Message-ID: <Pine.LNX.4.58.0602011527310.1651@sumo.ctrl.ucla.edu>

What IM service are we using, and where can we collate all user IDs?  
Perhaps it would be better to meet up in an IRC channel.

I propose gathering in #codesprint on EFnet.

-Allen



From nomi at fruitfly.org  Wed Feb  1 19:37:44 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Wed, 1 Feb 2006 16:37:44 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C993@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C993@msex02.affymetrix.com>
Message-ID: <17377.21592.854840.243376@kinked.lbl.gov>

On 1 February 2006, Helt,Gregg wrote:
 > That would be great if you could update the biopackages server before
 > the code sprint starts!  Then client implementers will have a server to
 > test with.

yes!!

On 1 February 2006, Allen Day wrote:
 > That's what I was thinking too, but I was worried about the existing 
 > Genoviz clients "in the wild" having the server suddenly break.

are there really a lot of users (as opposed to das developers) who are
using the biopackages server?

On 1 February 2006, Allen Day wrote:
 > What IM service are we using, and where can we collate all user IDs?  
 > Perhaps it would be better to meet up in an IRC channel.
 > 
 > I propose gathering in #codesprint on EFnet.

i need details on this as well.  i've never bothered registering for an
IM service or IRC channel.

   Nomi


From ed_erwin at affymetrix.com  Wed Feb  1 18:44:35 2006
From: ed_erwin at affymetrix.com (Ed Erwin)
Date: Wed, 01 Feb 2006 15:44:35 -0800
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <Pine.LNX.4.58.0602011526030.1651@sumo.ctrl.ucla.edu>
References: <C71929195D04BF48BAECD499AF717B480198C993@msex02.affymetrix.com>
	<Pine.LNX.4.58.0602011526030.1651@sumo.ctrl.ucla.edu>
Message-ID: <43E147E3.1030705@affymetrix.com>


Gregg asked me to say "No".  Please do not break the current server that 
IGB is using.

Please make your changes on a server at a different URL.

Thanks
Ed



From Gregg_Helt at affymetrix.com  Wed Feb  1 18:51:23 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 1 Feb 2006 15:51:23 -0800
Subject: [DAS2] Re: Apollo and DAS/2 priorities
Message-ID: <C71929195D04BF48BAECD499AF717B480198C994@msex02.affymetrix.com>

Yes, what Ed said, that's what I meant.  Updated server, but at a
different address.  Otherwise the current release of IGB will break when
trying to use the biopackages server.

Once our IGB code has caught up to the updated server, we can roll out a
new release to point to the new server instead of the old one.  But not
yet.

	Thanks,
	Gregg



From allenday at ucla.edu  Wed Feb  1 19:07:54 2006
From: allenday at ucla.edu (Allen Day)
Date: Wed, 1 Feb 2006 16:07:54 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C994@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C994@msex02.affymetrix.com>
Message-ID: <Pine.LNX.4.58.0602011600150.1651@sumo.ctrl.ucla.edu>

Okay, I will tag the current server and leave it at:

http://das.biopackages.net/das

I saw in the most recent commits by Andrew that the root-level "/das" is
no longer needed, so I propose putting an updated server at:

http://das.biopackages.net/codesprint

If we're going to keep the current server in a "maintained but deprecated"  
mode like this, I'll be making changes to the "new" server before Friday.

When the new version of IGB comes out we can then upgrade the current
server.

Sound good?

-Allen
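
The two-address arrangement proposed above could be handled on the
client side roughly as follows. This is a minimal sketch, assuming a
client knows whether it has been ported to the updated spec. The
function name and the boolean flag are hypothetical; only the two base
URLs come from the message.

```python
# Base URLs taken from the proposal above.
LEGACY_BASE = "http://das.biopackages.net/das"          # tagged current server
SPRINT_BASE = "http://das.biopackages.net/codesprint"   # updated server


def server_base(supports_updated_spec):
    """Pick the server base URL for a client.

    supports_updated_spec is a hypothetical flag: released clients
    (for example the current IGB release) would pass False and keep
    using the tagged server, while code sprint builds would pass True.
    """
    return SPRINT_BASE if supports_updated_spec else LEGACY_BASE


if __name__ == "__main__":
    print(server_base(False))
    print(server_base(True))
```

Keeping the choice in one place means a later release only has to flip
the flag once the client code has caught up with the updated server.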


On Wed, 1 Feb 2006, Helt,Gregg wrote:

> Yes, what Ed said, that's what I meant.  Updated server, but at a
> different address.  Otherwise the current release of IGB will break when
> trying to use the biopackages server.
> 
> Once our IGB code has caught up to the updated server, we can roll out a
> new release to point to the new server instead of the old one.  But not
> yet.
> 
> 	Thanks,
> 	Gregg
> 
> > -----Original Message-----
> > From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-
> > bio.org] On Behalf Of Ed Erwin
> > Sent: Wednesday, February 01, 2006 3:45 PM
> > To: Allen Day
> > Cc: DAS/2
> > Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities
> > 
> > 
> > Gregg asked me to say "No".  Please do not break the current server
> that
> > IGB is using.
> > 
> > Please make your changes on a server at a different URL.
> > 
> > Thanks
> > Ed
> > 
> > Allen Day wrote:
> > > That's what I was thinking too, but I was worried about the existing
> > > Genoviz clients "in the wild" having the server suddenly break.
> > >
> > > So you're saying it's okay with you if those clients have a service
> > > interruption?
> > >
> > > -Allen
> 


From Gregg_Helt at affymetrix.com  Wed Feb  1 20:03:47 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 1 Feb 2006 17:03:47 -0800
Subject: [DAS2] Alternative feature formats in current DAS/2 spec
Message-ID: <C71929195D04BF48BAECD499AF717B480198C995@msex02.affymetrix.com>

When discussing alternative feature formats, the spec reads:
The feature query URL supports the optional "format" parameter used to
request that the results be returned in an alternative format.  The
format names are listed in the versioned source document in the
<FORMAT> element of the "feature" <CATEGORY>.
 
I think the second sentence should instead read something like:
The possible format names for a particular feature type are listed in
the types document in the <FORMAT> elements for a given type. 
 
Also, the spec says:
Some of the search results may not be expressible in the specified format.
The server should silently skip those feature records and return only
those records which can be converted.
 
I would argue that if any of the search results cannot be returned in
the specified format, then the server should really just return an
error.  Silently suppressing information is not good.  A generic
400-"Bad Request" would work, although a 415-"Unsupported Media Type"
might be more appropriate.
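
The difference between the two policies can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions: the record
layout, the toy tab-separated converter, and the names to_tabular and render
are hypothetical and do not come from the spec or from any DAS/2 server.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical record layout: a feature is a flat dict of strings.
Feature = Dict[str, str]
# A converter returns the serialized record, or None when the feature
# cannot be expressed in the requested format.
Converter = Callable[[Feature], Optional[str]]


def to_tabular(feature: Feature) -> Optional[str]:
    """Toy converter (an assumption): needs a segment, a start and an end."""
    if "start" not in feature or "end" not in feature:
        return None
    return "\t".join([feature["segment"], feature["start"], feature["end"]])


def render(features: List[Feature], convert: Converter) -> Tuple[int, str]:
    """Return (HTTP status, body), failing as a whole on the first bad record."""
    lines = []
    for feature in features:
        line = convert(feature)
        if line is None:
            # Strict policy: report an error instead of silently skipping.
            message = "feature %s cannot be expressed in the requested format"
            return 415, message % feature["id"]
        lines.append(line)
    return 200, "\n".join(lines)
```

With this shape a client either gets every matching record or a clear
failure, so it never has to guess whether results were dropped.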
 
        gregg
 

From allenday at ucla.edu  Wed Feb  1 20:16:04 2006
From: allenday at ucla.edu (Allen Day)
Date: Wed, 1 Feb 2006 17:16:04 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C993@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C993@msex02.affymetrix.com>
Message-ID: <Pine.LNX.4.58.0602011714370.1651@sumo.ctrl.ucla.edu>

There are still many references to "region" in Andrew's .txt document.  
Is it safe to assume that anywhere "region" is mentioned, it should really
be "segment" now?  I believe the answer is yes.

I'm asking to see if I need to change the feature filter implementation.

-Allen


On Wed, 1 Feb 2006, Helt,Gregg wrote:

> 
> That would be great if you could update the biopackages server before
> the code sprint starts!  Then client implementers will have a server to
> test with.
> 
> 	thanks,
> 	gregg
> 
> 


From allenday at ucla.edu  Sat Feb  4 05:43:10 2006
From: allenday at ucla.edu (Allen Day)
Date: Sat, 4 Feb 2006 02:43:10 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <Pine.LNX.4.58.0602011600150.1651@sumo.ctrl.ucla.edu>
References: <C71929195D04BF48BAECD499AF717B480198C994@msex02.affymetrix.com>
	<Pine.LNX.4.58.0602011600150.1651@sumo.ctrl.ucla.edu>
Message-ID: <Pine.LNX.4.58.0602040232090.19184@sumo.ctrl.ucla.edu>

There is a database server down, which is why I haven't posted the new
code to /codesprint yet.  Hopefully it will be back online tomorrow.

However, on my dev box I was able to make the server code serve up almost
all of what is described in Andrew's new_spec.txt file.  The large
remaining problems are:

* Properties ( <PROP/> elements ).  I still don't fully understand how
these work, or whether the previous implementation is still valid or has
been invalidated by the new document.

* Alternate default Content-Type header for the same command, e.g.

  /sequence/.../segment       # Content-Type: application/x-das-blah+xml
  /sequence/.../segment/chrM  # Content-Type: text/x-fasta

This is an artifact of an earlier design decision, which assumed that
Content-Type had a single default per command and would only be modified
if a ?format= parameter was passed.  This is difficult to fix properly, so
right now the FASTA is served up under the XML Content-Type.
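
One possible shape for a fix, sketched in Python rather than in the server's
own mod_perl code, is to key the default Content-Type on the request path
instead of on the command.  The table, the FORMAT_TYPES mapping and the
function name are all assumptions made for illustration; the "blah" media
type is left elided exactly as in the example above.

```python
import re
from typing import Optional

# Hypothetical table of (path pattern, default Content-Type).
# More specific patterns are listed first.
DEFAULTS = [
    (re.compile(r"/segment/[^/]+$"), "text/x-fasta"),
    (re.compile(r"/segment$"), "application/x-das-blah+xml"),
]

# Hypothetical mapping from ?format= names to media types.
FORMAT_TYPES = {"fasta": "text/x-fasta"}


def content_type(path: str, fmt: Optional[str] = None) -> Optional[str]:
    """Pick the Content-Type for a request path and optional ?format= value."""
    if fmt is not None:
        # An explicit ?format= always wins over the per-path default.
        return FORMAT_TYPES[fmt]
    for pattern, default in DEFAULTS:
        if pattern.search(path):
            return default
    return None  # no default known for this path
```

Here the collection URL and the single-segment URL get different defaults
without either of them needing a ?format= parameter.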

-Allen




From allenday at ucla.edu  Mon Feb  6 02:13:59 2006
From: allenday at ucla.edu (Allen Day)
Date: Sun, 5 Feb 2006 23:13:59 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
Message-ID: <Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>

Okay folks, an implementation of the document cited below is available 
here:

http://das.biopackages.net/codesprint
http://das.biopackages.net/codesprint/sequence
http://das.biopackages.net/codesprint/sequence/yeast/S228C/segment
etc.

After looking closely over this first draft of new_spec.txt, it's apparent 
that there are still some holes, e.g. what should the response to the 
following requests look like?

http://das.biopackages.net/codesprint/sequence/yeast
http://das.biopackages.net/codesprint/sequence/yeast/S228C

For now I have left the responses the same as in the old HTML version of
the spec.  Of course if you find bugs, let me know.

The server at:

http://das.biopackages.net/das

is currently unavailable.  This is due to limitations in Apache/mod_perl
that won't allow different versions of the same class to coexist in a
family of processes.  I'd like to discuss how we should handle this in the
conference call tomorrow (today, if you're not in GMT+8).

-Allen


On Mon, 30 Jan 2006, Andrew Dalke wrote:

> Allen:
> > Is the spec going to be in a stable state for the code sprint?  I'd 
> > like
> > to use this time to sync the server implementation with a stable 
> > version
> > of the spec.  It looks like there have been many substantial changes.
> 
> I have just (within the last few minutes) completed the first draft
> of the update of the spec.
> 
> It's not in HTML - that calls for too much work for this stage.
> It's text, in CVS under das/das2/new_spec.txt
> 
> There are many parts which need clarification.  These are marked
> with a "XXX" along with my comments.
> 
> The RNC files are in
> 
>    das/das2/scratch/*.rnc
> along with some test XML files.  These XML files are not meant
> to be realistic.  They are meant more to check edge cases.
> 
> I do not think there are major changes to the spec.  Most of the
> changes have actually trimmed things down, like getting rid of
> the "properties" subtree and merging the different "sources" requests
> into a single document.
> 
> 
> Here are the major interfaces
> 
> $PREFIX/sequence - a "sources" request
>    This is the top-level entry point to a DAS 2 server.  It returns a
>    list of the available genomic sequences and their versions.
>    [sequence-namespace]
> 
> $PREFIX/sequence/$SOURCE - a "source" request
>    Returns the available versions of the given genomic sequence.
> 
> $PREFIX/sequence/$SOURCE/$VERSION - a "versioned source" request
>    Returns information about a given version of a genomic sequence.
>    Clients may assume that the sequence and assembly are constant for a
>    given version of a source. Note that annotation data on a server
>    with curational write-back support may change without changing the
>    version.
> 
> 
> For a given version here are the sub-parts.  Note that I've gone ahead
> and split the query urls (segment, features and types each have query
> interfaces) from the base directory used as containers for the segments,
> features and types.
> 
>   $VERSION/segments - the segments query URL; summarizes the top-level
>      segments in the data source
> 
>   $VERSION/segment/$SEGMENT_ID - a segment query; used to get detailed
>      information about the identified segment
> 
>   $VERSION/features - the feature filter query URL.  Features are
>     locatable annotations or experimental results.  The feature filter
>     URL supports query parameters to select a subset of the features
>     based on position, feature type and other properties.
> 
>   $VERSION/feature/$FEATURE_ID - a feature query; used to get detailed
>      information about the identified feature
> 
>   $VERSION/types - the types query URL which returns a list of all
>     feature types.  Feature types include ontology and depiction
>     details for all features of the given type.
> 
>   $VERSION/type/$TYPE_ID - details about the specified feature type
> 
> Oh, and there are internal conflicts which will be straightened
> out in the next draft.  These shouldn't be big.
> 
> 					Andrew
> 					dalke at dalkescientific.com
> 
> 
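
For client implementers, the URL layout quoted above can be sketched as a
small Python helper.  This is a reading of the quoted list and not part of
the spec; the function names das2_urls and item_url are hypothetical.

```python
from urllib.parse import quote


def das2_urls(prefix: str, source: str, version: str) -> dict:
    """Build the fixed URLs for one versioned source under $PREFIX."""
    base = "%s/sequence" % prefix.rstrip("/")
    versioned = "%s/%s/%s" % (base, quote(source), quote(version))
    return {
        "sources": base,                               # "sources" request
        "source": "%s/%s" % (base, quote(source)),     # "source" request
        "versioned_source": versioned,                 # "versioned source"
        "segments": versioned + "/segments",           # segments query URL
        "features": versioned + "/features",           # feature filter URL
        "types": versioned + "/types",                 # types query URL
    }


def item_url(versioned: str, kind: str, item_id: str) -> str:
    """Build $VERSION/segment/$ID, $VERSION/feature/$ID or $VERSION/type/$ID."""
    if kind not in ("segment", "feature", "type"):
        raise ValueError("unknown kind: %r" % kind)
    return "%s/%s/%s" % (versioned, kind, quote(item_id, safe=""))
```

For example, das2_urls("http://das.biopackages.net/codesprint", "yeast",
"S228C") gives the versioned source URL used in the message above.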


From dalke at dalkescientific.com  Mon Feb  6 06:33:34 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Feb 2006 11:33:34 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
Message-ID: <c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>

Allen:
> After looking closely over this first draft of new_spec.txt, it's 
> apparent
> that there are still some holes, e.g. what should the response to the
> following requests look like?
>
> http://das.biopackages.net/codesprint/sequence/yeast

<?xml version="1.0" encoding="UTF-8"?>
<SOURCES
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/">
   <SOURCE taxon="Yeast">
       <VERSION id="yeast/S228C" title="Sce" created="" modified="">

       <COORDINATES taxid="" source="" authority="">
         <VERSION name=""/>
       </COORDINATES>

       <ASSEMBLY>
         <LINK href="" priority=""/>
       </ASSEMBLY>

       <PROP key="" value=""/>

       <CATEGORY type="features" query_id="yeast/S228C/feature">
         <!-- list non-das2xml templates here -->
       </CATEGORY>
       <CATEGORY type="segments" query_id="yeast/S228C/segment"/>
       <CATEGORY type="types"    query_id="yeast/S228C/type"/>
       <CATEGORY type="locks"    query_id="yeast/S228C/lock"/>

     </VERSION>

   </SOURCE>
</SOURCES>
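
As a rough client-side sketch, the query URLs can be recovered by resolving
each query_id against xml:base.  The Python below uses a trimmed copy of the
document above; the trimming and the function name query_urls are mine, and
resolving query_id against xml:base is an assumption about the intent, not
something the draft states here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the predefined XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Trimmed copy of the example response (empty placeholders removed).
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<SOURCES xmlns:xlink="http://www.w3.org/1999/xlink"
         xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/">
  <SOURCE taxon="Yeast">
    <VERSION id="yeast/S228C" title="Sce">
      <CATEGORY type="features" query_id="yeast/S228C/feature"/>
      <CATEGORY type="segments" query_id="yeast/S228C/segment"/>
      <CATEGORY type="types" query_id="yeast/S228C/type"/>
    </VERSION>
  </SOURCE>
</SOURCES>
"""


def query_urls(xml_bytes: bytes) -> dict:
    """Map (version id, category type) to an absolute query URL."""
    root = ET.fromstring(xml_bytes)
    base = root.get(XML_BASE, "")
    urls = {}
    for version in root.iter("VERSION"):
        version_id = version.get("id")
        if version_id is None:
            continue  # e.g. the VERSION element nested inside COORDINATES
        for category in version.findall("CATEGORY"):
            key = (version_id, category.get("type"))
            urls[key] = urljoin(base, category.get("query_id"))
    return urls


URLS = query_urls(DOC.encode("utf-8"))
```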


> http://das.biopackages.net/codesprint/sequence/yeast/S228C

The same for this case.  There is only one VERSION for "yeast".


Your XML, btw, starts

<?xml version="1.0" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="/xsl/das.xsl"?>
<!DOCTYPE DAS2DSN SYSTEM "http://www.biodas.org/dtd/das2dsn.dtd">
<!-- this doesn't work and screws up the xsl     
xmlns="http://www.biodas.org/ns/das/genome/2.00" -->
<SOURCES
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/">

The standalone="no" declaration means that the DTD may affect the content
of the document.
   http://www.stylusstudio.com/w3c/xml11/sec-rmd.htm

> Markup declarations can affect the content of the document, as passed 
> from an XML Processor to an application; examples are attribute 
> defaults and entity declarations. The standalone document declaration, 
> which MAY appear as a component of the XML declaration, signals 
> whether or not there are such declarations which appear external to 
> the Document Entity or in parameter entities. An external markup 
> declaration is defined as a markup declaration occurring in the 
> external subset or in a parameter entity (external or internal, the 
> latter being included because non-validating processors are not 
> required to read them).

For what we're doing, we don't need that, nor (I think) want it.  There's
no reason for a client to consult the DTD to figure out the XML.

Instead, use

<?xml version="1.0"?>

and probably have the encoding

<?xml version="1.0" encoding="UTF-8"?>

That also means you can get rid of the

<!DOCTYPE DAS2DSN SYSTEM "http://www.biodas.org/dtd/das2dsn.dtd">

statements.
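
To illustrate, here is a small Python sketch showing that a non-validating
parser reads the document the same way with or without the DOCTYPE, and how
a client could detect one if it cared.  The two sample documents are cut
down to an empty SOURCES element and the function name doctype_system_id is
hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from xml.parsers import expat

OLD = (b'<?xml version="1.0" standalone="no"?>\n'
       b'<!DOCTYPE DAS2DSN SYSTEM "http://www.biodas.org/dtd/das2dsn.dtd">\n'
       b'<SOURCES/>')
NEW = b'<?xml version="1.0" encoding="UTF-8"?>\n<SOURCES/>'


def doctype_system_id(data: bytes) -> Optional[str]:
    """Return the DOCTYPE system identifier, or None if there is no DOCTYPE."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.append(system_id)

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    # expat is non-validating and does not fetch the external DTD here.
    parser.Parse(data, True)
    return found[0] if found else None


# The simplified prolog is all a non-validating client needs.
ROOT_TAG = ET.fromstring(NEW).tag
```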

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Mon Feb  6 07:02:40 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Feb 2006 12:02:40 +0000
Subject: [DAS2] timezone change
Message-ID: <6c3ddd6d7dc01dc99f2e1e932e64e733@dalkescientific.com>

To make it easier for Thomas' Java library, the timezone
in the datestamps may also be written without the colon, as in "+0500".

Here are the valid forms and new examples

       TZD  = time zone designator (optional; one of the formats
                      "Z", +hh:mm, +hhmm, -hh:mm, or -hhmm)


    1959-21-52T09:35+0300

    2042-03-18T01:19:00-11:15
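
A minimal Python sketch of a parser for just the designator, accepting the
five forms listed above.  The name parse_tzd and the range checks on hours
and minutes are my assumptions; the rest of the datestamp is not handled.

```python
import re
from datetime import timedelta, timezone

# "Z", or a sign followed by hh and mm with an optional colon between them.
TZD = re.compile(r"^(?:Z|([+-])(\d{2}):?(\d{2}))$")


def parse_tzd(text: str) -> timezone:
    """Convert a time zone designator into a datetime.timezone."""
    match = TZD.match(text)
    if match is None:
        raise ValueError("not a valid time zone designator: %r" % text)
    if text == "Z":
        return timezone.utc
    hours, minutes = int(match.group(2)), int(match.group(3))
    if hours > 23 or minutes > 59:
        # Assumed sanity limits; the message does not state any.
        raise ValueError("offset out of range: %r" % text)
    offset = timedelta(hours=hours, minutes=minutes)
    return timezone(-offset if match.group(1) == "-" else offset)
```

With this, "+0300" and "+03:00" yield the same offset, so clients that emit
either form interoperate.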


					Andrew
					dalke at dalkescientific.com


From dhoworth at mrc-lmb.cam.ac.uk  Mon Feb  6 07:12:52 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Mon, 06 Feb 2006 12:12:52 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>	<17368.18146.195226.166165@kinked.lbl.gov>	<17369.24994.880706.685148@kinked.lbl.gov>	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
Message-ID: <43E73D44.5020107@mrc-lmb.cam.ac.uk>

Andrew Dalke wrote:
> That also means you can get rid of the
> 
> <!DOCTYPE DAS2DSN SYSTEM "http://www.biodas.org/dtd/das2dsn.dtd">

Doing that automatically invalidates the document does it not?

http://www.w3.org/TR/2004/REC-xml-20040204/#NT-prolog

"Definition: An XML document is valid if it has an associated document 
type declaration and if the document complies with the constraints 
expressed in it.

The document type declaration MUST appear before the first element in 
the document."

Cheers, Dave


From dalke at dalkescientific.com  Mon Feb  6 08:42:03 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Feb 2006 13:42:03 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <43E73D44.5020107@mrc-lmb.cam.ac.uk>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>	<17368.18146.195226.166165@kinked.lbl.gov>	<17369.24994.880706.685148@kinked.lbl.gov>	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
	<43E73D44.5020107@mrc-lmb.cam.ac.uk>
Message-ID: <57bc93f8acc736e752d048d970b1f332@dalkescientific.com>

Dave Howorth:
> Doing that automatically invalidates the document does it not?
>
> http://www.w3.org/TR/2004/REC-xml-20040204/#NT-prolog
>
> "Definition: An XML document is valid if it has an associated document 
> type declaration and if the document complies with the constraints 
> expressed in it.
>
> The document type declaration MUST appear before the first element in 
> the document."

I think this page summarizes it nicely:
http://www.xml.com/lpt/a/2002/09/04/xslt.html

     "Valid" is a technical term referring to the presence
     of and conformance to a DOCTYPE declaration.

XML documents w/o a DTD are "well-formed".  XML documents
with a DTD and which agree with the DTD are "valid".

In this case not being "valid" does not mean that the
document is "invalid XML".

As I understand things, it's perfectly fine to pass well-formed
but not valid XML documents around.
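
To illustrate the distinction, here is a minimal sketch using Python's
standard ElementTree parser, which checks well-formedness only and
does no DTD validation. The two documents and the function name
is_well_formed are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: no DOCTYPE, so it cannot be "valid" in the
# DTD sense, but it is well-formed and parses without error.
WELL_FORMED = '<?xml version="1.0"?><SOURCES><SOURCE id="volvox"/></SOURCES>'

# Mismatched tags: not well-formed, so an XML parser rejects it.
NOT_WELL_FORMED = "<SOURCES><SOURCE></SOURCES>"


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML; no DTD validation is attempted."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(WELL_FORMED))
print(is_well_formed(NOT_WELL_FORMED))
```

The first document is accepted even though it has no document type
declaration; only the second, with its unclosed element, is rejected.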

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Mon Feb  6 08:53:10 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Feb 2006 13:53:10 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
Message-ID: <3a3400e925dccf8583a5b47104e43766@dalkescientific.com>

Trying out Allen's XML

> <?xml version="1.0" standalone="no"?>
> <?xml-stylesheet type="text/xsl" href="/xsl/das.xsl"?>
> <!DOCTYPE DAS2DSN SYSTEM "http://www.biodas.org/dtd/das2dsn.dtd">
> <!-- this doesn't work and screws up the xsl     
> xmlns="http://www.biodas.org/ns/das/genome/2.00" -->
> <SOURCES
>       xmlns:xlink="http://www.w3.org/1999/xlink"
>       xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/">
>

The xmlns is needed, else "SOURCES" is in the unnamed namespace,
rather than the DAS2 namespace.
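
The effect is easy to see with a namespace-aware parser. A small
sketch with Python's ElementTree, assuming the namespace URI from the
commented-out xmlns above; the variable names are only for
illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the commented-out xmlns attribute quoted above.
DAS2_NS = "http://www.biodas.org/ns/das/genome/2.00"

# Without xmlns the element has a bare local name.
without_ns = ET.fromstring("<SOURCES/>")

# With xmlns the parser reports the name qualified by the namespace URI.
with_ns = ET.fromstring(f'<SOURCES xmlns="{DAS2_NS}"/>')

print(without_ns.tag)
print(with_ns.tag)
```

A client that looks for the namespace-qualified name will not match
the first element, which is why the declaration matters.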

It looks like your XSLT might not declare the namespace?  I
can't find the document to check, at either of

   http://das.biopackages.net/xsl/das.xsl
   http://radius.genomics.ctrl.ucla.edu/xsl/das.xsl

The page at
  http://www.xml.com/pub/a/2001/04/04/trxml/

describes a bit on how to include namespace in your xslt


> <!-- xq242.xsl: converts xq241.html into xq243.xml -->
>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 xmlns:xlink="http://www.w3.org/1999/xlink"
>                 version="1.0">
> <xsl:output method="xml" omit-xml-declaration="yes"/>
>
> <xsl:template match="a">
>   <author xlink:type="simple" xlink:href="{@href}">
>     <xsl:apply-templates/></author>
> </xsl:template>
>
> <xsl:template match="p">
>   <para><xsl:apply-templates/></para>
> </xsl:template>
>
> </xsl:stylesheet>

Note the use of the "xlink:" namespace abbreviation.

					Andrew
					dalke at dalkescientific.com


From dhoworth at mrc-lmb.cam.ac.uk  Mon Feb  6 09:27:34 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Mon, 06 Feb 2006 14:27:34 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <57bc93f8acc736e752d048d970b1f332@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>	<17368.18146.195226.166165@kinked.lbl.gov>	<17369.24994.880706.685148@kinked.lbl.gov>	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>	<43E73D44.5020107@mrc-lmb.cam.ac.uk>
	<57bc93f8acc736e752d048d970b1f332@dalkescientific.com>
Message-ID: <43E75CD6.7000909@mrc-lmb.cam.ac.uk>

Andrew Dalke wrote:
> Dave Howorth:
>> Doing that automatically invalidates the document does it not?
>>
>> http://www.w3.org/TR/2004/REC-xml-20040204/#NT-prolog
>>
>> "Definition: An XML document is valid if it has an associated document 
>> type declaration and if the document complies with the constraints 
>> expressed in it.
>>
>> The document type declaration MUST appear before the first element in 
>> the document."
> 
> I think this page summarizes it nicely:
> http://www.xml.com/lpt/a/2002/09/04/xslt.html
> 
>     "Valid" is a technical term referring to the presence
>     of and conformance to a DOCTYPE declaration.

I think that's a paraphrase of the first para I quoted above?

> XML documents w/o a DTD are "well-formed".  XML documents
> with a DTD and which agree with the DTD are "valid".
> 
> In this case not being "valid" does not mean that the
> document is "invalid XML".

No, I believe you're wrong there; 'not valid' and 'invalid' have the 
same meaning both colloquially and as used in the spec. It's either 
valid or it isn't, and if it isn't then it's invalid.

> As I understand things, it's perfectly fine to pass well-formed
> but not valid XML documents around.

I don't agree. There are occasions when it is acceptable but it's 
generally bad practice, IMHO. The discussion in sec 5 of the spec gives 
some motivation, particularly this section:

http://www.w3.org/TR/REC-xml/#safe-behavior

Or look here, or thousands of other places:
http://www.online-learning.com/demos/xml/valid_xml.html
http://en.wikipedia.org/wiki/Xml#Correctness_in_an_XML_document

In particular for interoperability of an open, distributed system with 
many writers and readers implemented by different groups (i.e. DAS), I 
suggest validity is essential.

I would have expected your experience of the PDB to make you keen on 
validation :)

Cheers, Dave


From dalke at dalkescientific.com  Mon Feb  6 10:09:58 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Feb 2006 15:09:58 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <43E75CD6.7000909@mrc-lmb.cam.ac.uk>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>	<17368.18146.195226.166165@kinked.lbl.gov>	<17369.24994.880706.685148@kinked.lbl.gov>	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>	<43E73D44.5020107@mrc-lmb.cam.ac.uk>
	<57bc93f8acc736e752d048d970b1f332@dalkescientific.com>
	<43E75CD6.7000909@mrc-lmb.cam.ac.uk>
Message-ID: <0aeda19421fdc7c75e2440ad0acd6391@dalkescientific.com>

Dave Howorth wrote:
> Andrew Dalke wrote:
>> I think this page summarizes it nicely:
>> http://www.xml.com/lpt/a/2002/09/04/xslt.html
>>     "Valid" is a technical term referring to the presence
>>     of and conformance to a DOCTYPE declaration.
>
> I think that's a paraphrase of the first para I quoted above?

It adds the phrase "technical term", making it (in my interpretation)
different from the word "valid" in its normal sense.

> No, I believe you're wrong there; 'not valid' and 'invalid' have the 
> same meaning both colloquially and as used in the spec. It's either 
> valid or it isn't, and if it isn't then it's invalid.

I now agree that in the spec sense "invalid" and "not valid" are the
same.

I still think it has a technical difference from its normal use.
See for example the thread at
   http://www.stylusstudio.com/xmldev/200411/post50310.html

part of which says

> >But does it matter if a document is Not valid?
>
> Not necessarily.  It's up to you.  Requiring a document to be valid is
> a way of putting some constraints on it.  If you don't have any such
> constraints (unlikely, unless you are writing some very generic
> software like an editor), then there's no need for validity.  More
> likely, not all your constraints can be expressed by a DTD, and you
> will need to express them some other way.
>
> And of course you can require the document to be valid according to
> some other kind of schema, such as XML schemas or RelaxNG or
> Schematron.


>> As I understand things, it's perfectly fine to pass well-formed
>> but not valid XML documents around.
>
> I don't agree. There are occasions when it is acceptable but it's 
> generally bad practice, IMHO. The discussion in sec 5 of the spec 
> gives some motivation, particularly this section:
>
> http://www.w3.org/TR/REC-xml/#safe-behavior
>
> Or look here, or thousands of other places:
> http://www.online-learning.com/demos/xml/valid_xml.html
> http://en.wikipedia.org/wiki/Xml#Correctness_in_an_XML_document
>
> In particular for interoperability of an open, distributed system with 
> many writers and readers implemented by different groups (i.e. DAS), I 
> suggest validity is essential.

Quoting the wikipedia reference to DTDs:

> The oldest schema format for XML is the Document Type Definition 
> (DTD), inherited from SGML. While DTD support is ubiquitous due to its 
> inclusion in the XML 1.0 standard, it is seen as limited for the 
> following reasons:
>   *  It has no support for newer features of XML, most importantly 
> namespaces.

DAS2 uses namespaces.  Hence it cannot use DTDs.

We are defining Relax-NG schemas for the different formats,
which can be used for better validity checking than is supported
by DTDs.

"valid DAS2 document" ::= "meets the DAS2 spec"

"meets the DAS2 spec" is a stricter definition than
   "well-formed XML" + "meets the RNG spec"
which is stricter than
   "well-formed XML" + "meets the (hypothetical namespace-aware) DTD"


> I would have expected your experience of the PDB to make you keen
> on validation :)

Indeed, I'm working on the validator for DAS2, which uses the Relax-NG
schemas.  ;)

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Mon Feb  6 11:03:07 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Feb 2006 16:03:07 +0000
Subject: [DAS2] <CATEGORY> elements
Message-ID: <4157a339b02c085fd6df0f4bf1ff1cc0@dalkescientific.com>

One discussion point from today is the <CATEGORY> elements.

The current draft of the spec says they look like this

       <CATEGORY type="segments" query_id="volvox/1/segments">
           <FORMAT name="fasta" mimetype="text/x-fasta" />
           <FORMAT name="raw" mimetype="text/plain" />
       </CATEGORY>
       <CATEGORY type="types" query_id="volvox/1/types">
           <FORMAT name="das2xml" mimetype="text/x-das-featuretype+xml" />
       </CATEGORY>
       <CATEGORY type="features" query_id="volvox/1/features">
           <FORMAT name="das2xml" mimetype="text/x-das-type+xml" />
       </CATEGORY>
       <CATEGORY type="locks" query_id="volvox/1/locks" />


Andreas Prlic pointed out that since the document says
the "volvox" version "1" url is already known ("volvox/1")
and the type="segments" then the query_id can be built
from appending "segments" to the "volvox/1" (plus the "/")
to get "volvox/1/segments".

I originally responded from a ReST purity argument, in that
URLs should not be constructed from non-URL data.  This
lets Thomas, for example, use GUIDs for the objects rather
than the hierarchical structure I and others recommend.

During discussion a better answer came up. I think we talked about
it earlier, but it is worth emphasizing: the "query_id"s don't need
to be on the same server.

For example, the "segments" URL may and likely will point
to a common reference server, and a database may offer
only one set of "types" for all of the "features".

That is, something like this

   DAS server example.com
      genome A
         version x
           segments at "ensembl.org/das2/genome_A/build_1/segments"
           features at "example.com/A/version_x/features"
           types at "example.com/A/types"

         version y
           segments at "ensembl.org/das2/genome_A/build_1/segments"
           features at "example.com/A/version_y/features"
           types at "example.com/A/types"

         version z
           segments at "ensembl.org/das2/genome_A/build_2/segments"
           features at "example.com/A/version_z/features"
           types at "example.com/A/types"

   DAS server biodas.org
      genome A
         version 1
           segments at "ensembl.org/das2/genome_A/build_2/segments"
           features at "example.com/A/1/features"
           types at "example.com/A/types"  (note: on other server!)
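
If the query_id is treated as an ordinary URL reference, a client can
handle the same-server and other-server cases with the same code. A
minimal sketch using Python's urllib; the base URL and the function
name resolve_query_id are assumptions for illustration, not something
the spec defines.

```python
from urllib.parse import urljoin

# Hypothetical base URL of the document that carries the CATEGORY elements.
DOCUMENT_BASE = "http://example.com/das2/"


def resolve_query_id(query_id: str, base: str = DOCUMENT_BASE) -> str:
    """Resolve a query_id as a URL reference against the document base."""
    return urljoin(base, query_id)


# Relative reference: stays on the server that issued the document.
print(resolve_query_id("volvox/1/segments"))

# Absolute reference: points at a different server and is left unchanged.
print(resolve_query_id("http://ensembl.org/das2/genome_A/build_1/segments"))
```

The client never builds the URL from the source name, version and
category type, so the server is free to point anywhere.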


					Andrew
					dalke at dalkescientific.com


From Gregg_Helt at affymetrix.com  Mon Feb  6 12:13:18 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 09:13:18 -0800
Subject: [DAS2] Agenda for DAS/2 Code Sprint Teleconference 2005-02-06
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9A8@msex02.affymetrix.com>

Status report
DAS/2 XML - valid or not valid?
CATEGORY elements -- constructing query URLs
MAINTAINER information
Use of xml:base
update on feature properties - searching, etc.
 
 
From lstein at cshl.edu  Mon Feb  6 13:20:10 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 6 Feb 2006 13:20:10 -0500
Subject: [DAS2] Agenda for DAS/2 Code Sprint Teleconference 2005-02-06
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9A8@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9A8@msex02.affymetrix.com>
Message-ID: <200602061320.11360.lstein@cshl.edu>

Hi Gregg,

I had a conflicting teleconference and wasn't sure whether there was a 
teleconference scheduled for the code sprint, so I didn't dial in. Just got 
the agenda now.

I am online on both MSN and AOL chats, and will be all week, if anyone wants 
to IM me.

Lincoln

On Monday 06 February 2006 12:13, Helt,Gregg wrote:
> Status report
> DAS/2 XML - valid or not valid?
> CATEGORY elements -- constructing query URLs
> MAINTAINER information
> Use of xml:base
> update on feature properties - searching, etc.
>
>
>
>
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From dalke at dalkescientific.com  Mon Feb  6 13:42:24 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Feb 2006 18:42:24 +0000
Subject: [DAS2] version=
Message-ID: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com>

If we add a version= field to the Content-Type, or whatever
mechanism is proposed

Content-Type: application/x-das2features+xml; version=12345

What will a client do when it gets a version number it has
never heard of?  Should it use the newest version it supports?
The oldest?  Abort?
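
For concreteness, here is how a client might pull the parameter out of
the header. This is a sketch using Python's standard email package;
SUPPORTED_VERSIONS, parse_content_type and is_known_version are
hypothetical names, and the sketch leaves open what to do with an
unknown version.

```python
from email.message import Message

# Hypothetical set of spec versions this client understands.
SUPPORTED_VERSIONS = {"12345"}


def parse_content_type(header_value: str):
    """Return (mimetype, version); version is None if the parameter is absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_param("version")


def is_known_version(version) -> bool:
    """Report whether the client knows this version; the policy is the caller's."""
    return version in SUPPORTED_VERSIONS


mimetype, version = parse_content_type(
    "application/x-das2features+xml; version=12345"
)
print(mimetype, version, is_known_version(version))
```

Whether an unknown version should mean falling back to the newest
supported one, the oldest, or aborting is exactly the open question
above; the sketch only reports the mismatch.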


					Andrew
					dalke at dalkescientific.com


From Steve_Chervitz at affymetrix.com  Mon Feb  6 14:50:14 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 06 Feb 2006 11:50:14 -0800
Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint,
 6 Feb 2006
Message-ID: <C00CE876.1BB01%Steve_Chervitz@affymetrix.com>

Notes from the DAS/2 teleconference for the code sprint, 6 Feb 2006

$Id: das2-teleconf-2006-02-06.txt,v 1.2 2006/02/06 19:57:05 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed E., Gregg Helt
  Sanger: Andreas Prlic, Thomas Down, Roy
  Sweden: Andrew Dalke
  UC Berkeley: Nomi Harris
  UCLA: Allen Day
       
Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


Gregg's topics for discussion:

* Status report 
* DAS/2 XML - valid or not valid?
* CATEGORY elements -- constructing query URLs
* MAINTAINER information
* Use of xml:base
* update on feature properties - searching, etc.

Status Reports - what people are working on for the code sprint
------------------------------------------------------------

andrew
- getting folks up to speed on the spec changes, what he wrote.
- getting a feel for ensembl schema.
- change today: time zone specification b/c td's java time lib did
  something different than iso did.

aday: tag & branch?
gh: no branch, maybe tag
ad: tagging probably not necessary

gh: brings up a related issue:
 what is our mechanism for versioning - client & spec to understand
 which version of the spec they are/should be implementing
- can talk about it later during the xml validation issue discussion

ap: [missed it -- sorry!]

td: java om, feature xml done, can read and write.

roy: zmap das2 client, read/write das2, written in C. working with
ed griffith who's not available this week.
currently just a reader. from james gilbert, based on fmap from Acedb

gh: updating client and server (mostly client). top down syncing in
parallel, one command at a time. sources request is working on both
sides. will start w/ allen's server today, doing gh's sources query
against allen's server. segments and types today.

nh: apollo das2 client. reads das2 xml from andrew's example, write
out features in das2, now working on get, testing with server.

sc: affy das2 server stuff. streamlining updating it with feature data
from UCSC. also working on updating exon array data for use in IGB
client. working w/ gregg on other server-related work.
gh: graph data as well.

ee: working on igb client. talk w/ gregg later to get specifics.
gh: lots of ui stuff

Topic: xml validation
---------------------

ad: dtd's don't support namespaces, so we can't support dtds
gh: not that simple. where do we add namespaces?
ad: schemas have ns's
    testing....
gh: concern #1: is one of perception. don't like telling people we
don't have valid xml
ad: only means supports the dtd, not in human sense.
gh: it's one of perception
td: self-contained document + validation

gh: getting rid of doctype declaration is issue of versioning. how
will client know which version of spec it's supposed to be implementing?
need to deal with spec crawl. The only way i'm aware of is via looking
at dtd pointer changing.
gh: not worried about new categories, but changing things like
optional vs req'd attributes/elements.
ad: content-type contains version
td: or content negotiation
ap: xml schema validator at w3c.org can use that and claim it is
valid. can upload your files, push a button.
ad: I have an extension of properties with arbitrary binary data vs
text vs href. this is ok with relaxng, not by xsd.
ad: we could say what is valid das2 since we're the arbiters of what
is valid das xml document. e.g., well-formed, validates against the rng
schemas

gh: the rng we now have allows arbitrary xml?
ad: yes. can say there are arbitrary elements under some node. checked
in as file named common.rnc
gh: ok, getting rid of requirement for doctype declaration. any
versioning is done via content-type

gh: if we don't do content neg, a sources query goes out, whatever
version that the server supports comes back. this will be the latest
version of the spec the server supports.
ad: for backwards compatibility that won't be needed. extensibility
will be sufficient for a few years.
gh: don't believe it.
td: spec is churning fast now. there'll be less churn once there are impls.
gh: there were impls 3 or 4 mos ago (allen, gregg). so there has been
plenty of churn even with impls. so we'll need versioning, ok on
content-type.
aday: we definitely need versioning. need it now. also want a tagged
version we can work at same time.
ad: content-type-xdas;version=1.1
in general not the right solution (not general purpose), but for this
case, makes sense. 
aday: can impl, header says 1.1
gh/ad: contents are a subset of the specification. so it's tied to a
version of the rng schema.
ad: the tag will be the cvs revision #

gh: this isn't temporary. there will not be a time when we are
not generating churn.
ad: believes this is temporary, won't have to have it long-term
aday: no mechanism for it now.
ad: need a way to turn it into meaning. agreement on what string means
which version of a program.

nh: second gregg. will always be an issue. ad says it's not good
long-term, maybe we should come up with it.
gh: we have some basis to go forward.

[A] das/2 server will specify spec version via content-type-xdas;version=X.X

Topic: category elements, how to construct a query url
------------------------------------------------------

ad: what is syntax of string used to specify ontology? SO:?
aday: attribute for it
gh: ontol term is a uri
aday: type element has ontology
gh: id of type is not nec an ontol term
ad: the attrib of feat type, ontol=something
gh: that's a uri, abs or rel point to a frag in so/fa ontol
ad: can't find how this should look. said SO:0000001. that should be
a uri?
gh: yes. in types xml that's returned, id and ontol are uri's. a
server will pick one for its xml base. the other will have to be a
full uri.
ad: how do diff clients know a given term corresponds to what term in
the ontol?
gh: they will have to understand sofa/so.
ad: do they have persistent ids?
gh: my understanding is that they can use fragment notation for a
stable url for the term
aday: ontol docs aren't xml, no anchors for pointing to a
fragment. they're their own format. nervous about building dependency
on fragment record uris into our system
gh: good point. would be happier if it was recast as xml
aday: is now pointing to an xml document for ontology nodes
ad: happier if we could use "SO:xxx" i.e., a urn
gh: would like a re-cast as xml document, hosted at so/sofa
website. that xml would be like a std ontology representation so you
could extend it. so someone could point to an extension of it.


Category elements -- constructing query URLs
--------------------------------------------

gh: andreas' point (email): query id attribute, constructing these out
of relative uri, or based on base uri.
agree with andreas: we know what those will be.
for clarity of spec, we should specify: here's base uri, here's how you
construct the segments query, etc.
ad: trouble for segments- could be on ref server
gh: doubt that people will impl this way. will be specific to server
and will be related to everyone else's notion of chromosomes and
assemblies.
ad: where does the distributed nature of das come from? ref server
gh: das/1: ref server has residues to serve, regions (entry pts)
served up by everyone. this was the notion of ref vs non-ref
server to carry forward. non-ref server still serves up segments.
will have segments in its reference space. reference would be genome
assembly version + organism. sufficient to globally identify it.
ap: had discussions about this. query id
td: issue comes from seqs being urls rather than opaque ids in a ns
defined by coord system. have a set of servers that share common coord
syst. then a seq identified by stringx on one server is same as on the
other server.
the remaining q: server that doesn't want to serve up seqs, what urls
does it use? can it use an opaque seq name that is known by that name of
ref server? 

gh: restating concerns here: using query string to construct uri's
1. confusion: arbitrary uri means more confusing spec, and how to
   implement it (can't just say /segment, but 'whatever is pointed at
   by such and such uri')
2. size of documents. right now, can use same xml:base for features
   document, can make feat ids and location id relative to it, nice
   and short. if seg is on other server, need to expand one of the ids

compresses well, but that will take longer than transmission.
this is only for features xml.
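
[illustration of concern 2: a sketch in Python, not from the call. The
FEATURES/FEATURE element names, the id and segment attributes, and all
URLs are hypothetical and are not the DAS/2 schema. It shows ids
relative to xml:base staying short while a segment on another server
has to be written out in full.]

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree reports the predefined xml: prefix with its namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical features document; element and attribute names are invented.
DOC = """<FEATURES xml:base="http://example.com/das2/volvox/1/">
  <FEATURE id="feature/f1" segment="segment/chr1"/>
  <FEATURE id="feature/f2"
           segment="http://ensembl.org/das2/genome_A/build_1/segments/chr1"/>
</FEATURES>"""

root = ET.fromstring(DOC)
base = root.get(XML_BASE)

# Resolve every id and segment reference against the shared xml:base.
resolved = [
    (urljoin(base, feature.get("id")), urljoin(base, feature.get("segment")))
    for feature in root.findall("FEATURE")
]

for feature_url, segment_url in resolved:
    print(feature_url, segment_url)
```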

can use coords or assembly info to determine identity between urls.
want a defined ns.
ad: you want a way to say: these are relative urls to a base url for
that data type. so that this type url is relative to some base url for
types, similar for segments, features.
gh: we have this now, can be relative or absolute
ad: there is a default xml base like thing: one for type, segment,
features. so you could have relative ids to those bases.
gh: possibly, but not ideal. It's better to use a std xml base for all
of them. 
each server has its own unique uris for segments.

I'm proposing that we decouple segments from residues and having
segments doesn't mean we can serve residues. reasoning:
- this leads to smaller xml docs
- simplifies the spec if we didn't have to construct query ids from
  category element

would rather specify the string that's appended in the spec.

sc: might be able to deal with this issue by adding structure to the
document in order to add different xml:bases for different data
types. e.g., use different parent elements that could define their own
xml:bases, one for types, segments, and features. might complicate
the spec tho. 

ad: single genome have same types across all dbs.
gh: across servers, dangerous.
ad/td: globally unique ids, could have everything in the same directory.
td: can we just use seq/name, type/name. i.e., codifying what the
convention now is.
ad: name is put at end of base url
a feature document may give types, segments, other features.
td: just use simple strings, not urls.
gh: std uri syntax isn't important, but a std query mechanism to get
all of these is. some uri you put a '/types' on or a '/segments'.
ad: you have this right now.
gh: but it's only defined for a server, not the whole spec. there's
nowhere in the spec that says this. confusing for people
reading/implementing the spec.
ap: If you make it free text, you don't know what to put for a given server?
ad: you get a document
ap: I already know the server, not necessarily a document.

ad: taking out the mention of any hierarchy, just refer to things as
feat query url.

[note taker is having trouble following the thread of this discussion.]

gh: let's sleep on it, discuss tomorrow, vote then.


From nomi at fruitfly.org  Mon Feb  6 15:49:51 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Mon, 6 Feb 2006 12:49:51 -0800 (PST)
Subject: New DAS/2 server for codesprint [was Re: [DAS2] Re: Apollo and
	DAS/2 priorities]
In-Reply-To: <Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
Message-ID: <17383.46703.563017.422300@kinked.lbl.gov>

thanks for setting up the new das/2 server, allen.  i'm having trouble
with some of the queries.

On 5 February 2006, Allen Day wrote:
 > Okay folks, an implementation of the document cited below is available 
 > here:
 > 
 > http://das.biopackages.net/codesprint
I get "Internal Server Error"

 > http://das.biopackages.net/codesprint/sequence
 > http://das.biopackages.net/codesprint/sequence/yeast/S228C/segment
these both work.

 > http://das.biopackages.net/codesprint/sequence/yeast
 > http://das.biopackages.net/codesprint/sequence/yeast/S228C
for these i get
Error loading stylesheet: A network error occured loading an XSLT stylesheet:
http://das.biopackages.net/xsl/das.xsl

i'm running firefox on mozilla, so i'm not surprised when it has problems
with stylesheets, but i used to be able to get data from the old das/2
server, even though it did have some complaint about not finding the
stylesheet.

http://das.biopackages.net/codesprint/sequence/human/17/feature
churned forever (or, at least, for several minutes--maybe it will
eventually return).

           Nomi


From nomi at fruitfly.org  Mon Feb  6 17:34:30 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Mon, 6 Feb 2006 14:34:30 -0800 (PST)
Subject: [DAS2] Re: New DAS/2 server for codesprint
In-Reply-To: <17383.46703.563017.422300@kinked.lbl.gov>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<17383.46703.563017.422300@kinked.lbl.gov>
Message-ID: <17383.52982.274142.351003@kinked.lbl.gov>

On 6 February 2006, Nomi Harris wrote:
 > thanks for setting up the new das/2 server, allen.  i'm having trouble
 > with some of the queries.

ok, i realized that some of the queries i was trying were senseless, but
here are some that should work that are just hanging:
http://das.biopackages.net/codesprint/sequence/yeast/S228C/segments
http://das.biopackages.net/codesprint/sequence/yeast/S228C/types

        Nomi


From allenday at ucla.edu  Mon Feb  6 16:53:34 2006
From: allenday at ucla.edu (Allen Day)
Date: Mon, 6 Feb 2006 13:53:34 -0800 (PST)
Subject: New DAS/2 server for codesprint [was Re: [DAS2] Re: Apollo and
	DAS/2 priorities]
In-Reply-To: <17383.46703.563017.422300@kinked.lbl.gov>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<17383.46703.563017.422300@kinked.lbl.gov>
Message-ID: <Pine.LNX.4.58.0602061345360.29889@sumo.ctrl.ucla.edu>

On Mon, 6 Feb 2006, Nomi Harris wrote:

> thanks for setting up the new das/2 server, allen.  i'm having trouble
> with some of the queries.
> 
> On 5 February 2006, Allen Day wrote:
>  > Okay folks, an implementation of the document cited below is available 
>  > here:
>  > 
>  > http://das.biopackages.net/codesprint
> I get "Internal Server Error"

That's to be expected.  The spec does not specify what the response to
this request should be, or if it is even valid -- so I didn't implement
it.

>  > http://das.biopackages.net/codesprint/sequence
>  > http://das.biopackages.net/codesprint/sequence/yeast/S228C/segment
> these both work.
> 
>  > http://das.biopackages.net/codesprint/sequence/yeast
>  > http://das.biopackages.net/codesprint/sequence/yeast/S228C
> for these i get
> Error loading stylesheet: A network error occured loading an XSLT stylesheet:
> http://das.biopackages.net/xsl/das.xsl

This happens if you're browsing the URLs in a web browser that supports
xsl directives.  Previous versions of the server supported web browsers,
but at the cost of using a 'text/xml' Content-Type header.  Consensus in
the group was that web browsers are not a target platform, so this feature
no longer works -- so you won't be able to view the DAS2XML in your
browser anymore.  I just haven't removed the XSL references yet.

> i'm running firefox on mozilla, so i'm not surprised when it has problems
> with stylesheets, but i used to be able to get data from the old das/2
> server, even though it did have some complaint about not finding the
> stylesheet.
> 
> http://das.biopackages.net/codesprint/sequence/human/17/feature

The server is coded to throw an error if you ask for all features, so I'm
surprised it didn't just give you a 4xx or 5xx response.  I'll look into
it.

> churned forever (or, at least, for several minutes--maybe it will
> eventually return).
> 
>            Nomi
> 


From allenday at ucla.edu  Mon Feb  6 17:00:50 2006
From: allenday at ucla.edu (Allen Day)
Date: Mon, 6 Feb 2006 14:00:50 -0800 (PST)
Subject: [DAS2] Re: New DAS/2 server for codesprint
In-Reply-To: <17383.52982.274142.351003@kinked.lbl.gov>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<17383.46703.563017.422300@kinked.lbl.gov>
	<17383.52982.274142.351003@kinked.lbl.gov>
Message-ID: <Pine.LNX.4.58.0602061358060.29889@sumo.ctrl.ucla.edu>

Hi Nomi,

I just restarted the server, the "all features" request used all the
memory and hung the webserver.  I'll look into why that request wasn't
immediately denied as it used to be.

As for your .../segments and .../types, they should be .../segment and
.../type.  I see no reason to pluralize these URLs given that the sources
response allows me to provide them at any arbitrary URL:

  [...]
  <CATEGORY type="segments" query_id="yeast/S228C/segment"/>
  <CATEGORY type="types"    query_id="yeast/S228C/type"/>
  <CATEGORY type="locks"    query_id="yeast/S228C/lock"/>
  [...]
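
A client-side sketch of the point above: read each query URL from the
CATEGORY elements instead of appending "segments" or "types" to the
version id.  The VERSION wrapper and the function name are assumptions
made only so that the snippet parses and runs on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the three CATEGORY lines above, wrapped in a
# VERSION element so that the snippet is well-formed by itself.
FRAGMENT = """
<VERSION id="yeast/S228C">
  <CATEGORY type="segments" query_id="yeast/S228C/segment"/>
  <CATEGORY type="types"    query_id="yeast/S228C/type"/>
  <CATEGORY type="locks"    query_id="yeast/S228C/lock"/>
</VERSION>
"""

def category_query_ids(version_xml):
    """Map each CATEGORY type to the query_id the server advertises."""
    version = ET.fromstring(version_xml)
    return {c.get("type"): c.get("query_id") for c in version.iter("CATEGORY")}

query_ids = category_query_ids(FRAGMENT)
print(query_ids["segments"])
```

A client written this way does not care whether the server spells the
path "segment" or "segments".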

-Allen


On Mon, 6 Feb 2006, Nomi Harris wrote:

> On 6 February 2006, Nomi Harris wrote:
>  > thanks for setting up the new das/2 server, allen.  i'm having trouble
>  > with some of the queries.
> 
> ok, i realized that some of the queries i was trying were senseless, but
> here are some that should work that are just hanging:
> http://das.biopackages.net/codesprint/sequence/yeast/S228C/segments
> http://das.biopackages.net/codesprint/sequence/yeast/S228C/types
> 
>         Nomi
> 


From Steve_Chervitz at affymetrix.com  Mon Feb  6 17:27:01 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 06 Feb 2006 14:27:01 -0800
Subject: [DAS2] version=
In-Reply-To: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com>
Message-ID: <C00D0D35.1BB30%Steve_Chervitz@affymetrix.com>


Andrew Dalke wrote on Mon, 6 Feb 2006 18:42:24
> 
> If we add a version= field to the Content-Type, or whatever
> mechanism is proposed
> 
> Content-Type: application/x-das2features+xml; version=12345
> 
> What will a client do when it gets a version number it has
> never heard of?  Should it use the newest version it supports?
> The oldest?  Abort?

Rather than have version data be something that the client has to discover
in the response, and then have to react to in some intelligent way, how about
adding an optional dasversion field to all requests, such as:

http://www.wormbase.org/das/genome/volvox/1/type?dasversion=1.1

The server would then either:

1) return the appropriate response document if the server supports the
requested version or a later version that is backward compatible with it,
or
2) return a 505 error 'DAS Version Not Supported', which we already have in
the spec.

This puts the onus on the server rather than the client, but I think it
would be less trouble on the server than the alternative scheme would be for
the client. The client can now be fairly dumb about versioning and assume
everything is kosher unless it gets an error.
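
A rough sketch of the server-side check, to show how little the server
would need to do.  The set of supported versions and the function name are
assumptions for illustration; only the dasversion parameter and the 505
response come from the proposal above.

```python
from urllib.parse import urlparse, parse_qs

# Assumption: versions this hypothetical server can answer for, including
# older ones that its current version is backward compatible with.
SUPPORTED_VERSIONS = {"1.0", "1.1"}

def check_dasversion(url):
    """Return an (HTTP status, reason) pair for a request URL."""
    params = parse_qs(urlparse(url).query)
    requested = params.get("dasversion", [None])[0]
    # dasversion is optional: a request without it is served as usual.
    if requested is None or requested in SUPPORTED_VERSIONS:
        return 200, "OK"
    return 505, "DAS Version Not Supported"

print(check_dasversion(
    "http://www.wormbase.org/das/genome/volvox/1/type?dasversion=1.1"))
```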

We could put some of the onus for DAS version support on the revisers of the
spec: When a new version of the spec is released, we'll know right then what
parts will be backward compatible and what parts will not be. The reviser
could document whether the new version of the spec is backwards compatible
with which previous versions, with the appropriate level of granularity
(e.g., "all requests are backward compatible except for the types request").
This would serve as a guide for maintainers of das2 servers.

Thoughts?

Steve


From nomi at fruitfly.org  Mon Feb  6 18:41:23 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Mon, 6 Feb 2006 15:41:23 -0800 (PST)
Subject: [DAS2] version=
In-Reply-To: <C00D0D35.1BB30%Steve_Chervitz@affymetrix.com>
References: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com>
	<C00D0D35.1BB30%Steve_Chervitz@affymetrix.com>
Message-ID: <17383.56995.914058.889189@kinked.lbl.gov>

i think it would be nice to have it work both ways--the version is
reported by the server, but the client can also request a particular
version as you suggest.

whatever we decide on, can we please make the version IDs numerical so
that they can be compared easily (e.g. "if (dasversion > 1.3) ...")?
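
One caveat, sketched under the assumption that versions end up as
dot-separated integers (nothing has been decided yet): comparing them as
plain floating-point numbers puts 1.10 below 1.9, so a small parsing step
is safer.  The function name is hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as "1.3" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

# Tuples compare element by element, so the usual operators work.
assert parse_version("1.4") > parse_version("1.3")

# As floats, "1.10" sorts below "1.9"; as tuples it sorts above.
assert float("1.10") < float("1.9")
assert parse_version("1.10") > parse_version("1.9")
```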

     Nomi


On 6 February 2006, Steve Chervitz wrote:
 > Andrew Dalke wrote on Mon, 6 Feb 2006 18:42:24
 > > 
 > > If we add a version= field to the Content-Type, or whatever
 > > mechanism is proposed
 > > 
 > > Content-Type: application/x-das2features+xml; version=12345
 > > 
 > > What will a client do when it gets a version number it has
 > > never heard of?  Should it use the newest version it supports?
 > > The oldest?  Abort?
 > 
 > Rather than have version data be something that the client has to discover
 > in the response, and then have to react to in some intelligent way, how about
 > adding an optional dasversion field to all requests, such as:
 > 
 > http://www.wormbase.org/das/genome/volvox/1/type?dasversion=1.1
 > 
 > The server would then either:
 > 
 > 1) return the appropriate response document if the server supports the
 > requested version or a later version that is backward compatible with it,
 > or
 > 2) return a 505 error 'DAS Version Not Supported', which we already have in
 > the spec.
 > 
 > This puts the onus on the server rather than the client, but I think it
 > would be less trouble on the server than the alternative scheme would be for
 > the client. The client can now be fairly dumb about versioning and assume
 > everything is kosher unless it gets an error.
 > 
 > We could put some of the onus for DAS version support on the revisers of the
 > spec: When a new version of the spec is released, we'll know right then what
 > parts will be backward compatible and what parts will not be. The reviser
 > could document whether the new version of the spec is backwards compatible
 > with which previous versions, with the appropriate level of granularity
 > (e.g., "all requests are backward compatible except for the types request").
 > This would serve as a guide for maintainers of das2 servers.
 > 
 > Thoughts?
 > 
 > Steve


From ed_erwin at affymetrix.com  Mon Feb  6 17:48:49 2006
From: ed_erwin at affymetrix.com (Ed Erwin)
Date: Mon, 06 Feb 2006 14:48:49 -0800
Subject: [DAS2] <CATEGORY> elements
In-Reply-To: <4157a339b02c085fd6df0f4bf1ff1cc0@dalkescientific.com>
References: <4157a339b02c085fd6df0f4bf1ff1cc0@dalkescientific.com>
Message-ID: <43E7D251.8050703@affymetrix.com>

Andrew Dalke wrote:
> One discussion point from today is the <CATEGORY> elements.
> 
> The current draft of the spec says they look like this
> 
>       <CATEGORY type="segments" query_id="volvox/1/segments">
>           <FORMAT name="fasta" mimetype="text/x-fasta" />
>           <FORMAT name="raw" mimetype="text/plain" />
>       </CATEGORY>
>       <CATEGORY type="types" query_id="volvox/1/types">
>           <FORMAT name="das2xml" mimetype="text/x-das-featuretype+xml" />
>       </CATEGORY>
>       <CATEGORY type="features" query_id="volvox/1/features">
>           <FORMAT name="das2xml" mimetype="text/x-das-type+xml" />
>       </CATEGORY>
>       <CATEGORY type="locks" query_id="volvox/1/locks" />
> 
> 
> Andreas Prlic pointed out that since the document says
> the "volvox" version "1" url is already known ("volvox/1")
> and the type="segments" then the query_id can be built
> from appending "segments" to the "volvox/1" (plus the "/")
> to get "volvox/1/segments".
> 
> I originally responded from a ReST purity argument, in that
> URLs should not be constructed from non-URL data.  This
> lets Thomas, for example, use GUIDs for the objects rather
> than the hierarchical structure I and others recommend.
> 
> During discussion a better answer came up, which I think
> we talked about earlier but which is worth emphasizing
> is that the "query_id"s don't need to be on the same server.
> 
> For example, the "regions" URL may and likely will point
> to a common reference server, and a database may offer
> only one set of "types" for all of the "features".
> 
> That is, something like this
> 
>   DAS server example.com
>      genome A
>         version x
>           segments at "ensembl.org/das2/genome_A/build_1/segments"
>           features at "example.com/A/version_x/features"
>           types at "example.com/A/types"


None of your examples vary the words "segments", "types" or "features", 
but it is legal to do so, right?:

            segments at "ensembl.org/das2/genome_A/build_1/segment"
            features at "example.com/A/version_x/things/and/more/things"
            types at "example.com/A/rhinoceros"

OK, so no one is likely to go that far, but is it legal for example to 
use non-plural "segment", "feature" and "type" ?


From ed_erwin at affymetrix.com  Mon Feb  6 17:51:11 2006
From: ed_erwin at affymetrix.com (Ed Erwin)
Date: Mon, 06 Feb 2006 14:51:11 -0800
Subject: [DAS2] version=
In-Reply-To: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com>
References: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com>
Message-ID: <43E7D2DF.7060507@affymetrix.com>

Andrew Dalke wrote:
> If we add a version= field to the Content-Type, or whatever
> mechanism is proposed
> 
> Content-Type: application/x-das2features+xml; version=12345
> 
> What will a client do when it gets a version number it has
> never heard of?  Should it use the newest version it supports?
> The oldest?  Abort?
> 

It is up to the client to decide what to do, and this does not need to 
be specified here.
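
For illustration only, here is one policy a client could choose: accept a
response with no version parameter or with a version it knows, and refuse
anything else.  The set of known versions and the function names are
assumptions; the header value is the one from Andrew's example.

```python
from email.message import Message

# Hypothetical: the versions this client was written against.
KNOWN_VERSIONS = {"12345"}

def media_type_and_version(content_type):
    """Split a Content-Type header into (media type, version or None)."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_type(), msg.get_param("version")

def client_accepts(content_type):
    """One possible client policy: refuse versions it has never heard of."""
    _, version = media_type_and_version(content_type)
    return version is None or version in KNOWN_VERSIONS

print(media_type_and_version("application/x-das2features+xml; version=12345"))
```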


From Gregg_Helt at affymetrix.com  Mon Feb  6 18:16:35 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 15:16:35 -0800
Subject: [DAS2] RE: New DAS/2 server for codesprint
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9A9@msex02.affymetrix.com>

Ack, you're right!  I didn't expect to get bitten by rogue query_ids so
soon...

	gregg

> -----Original Message-----
> From: Nomi Harris [mailto:nomi at fruitfly.org]
> Sent: Monday, February 06, 2006 3:48 PM
> To: Allen Day
> Cc: Helt,Gregg
> Subject: Re: New DAS/2 server for codesprint
> 
> On 6 February 2006, Allen Day wrote:
>  > Hi Nomi,
>  >
>  > I just restarted the server, the "all features" request used all the
>  > memory and hung the webserver.  I'll look into why that request wasn't
>  > immediately denied as it used to be.
>  >
>  > As for your .../segments and .../types, they should be .../segment and
>  > .../type.  I see no reason to pluralize these URLs given that the sources
>  > response allows me to provide them at any arbitrary URL:
> 
> oops, gregg led me astray with that one.  right, /segment and /type
> work.  sorry for hanging your server with my inadvertent "all features"
> request.
>         Nomi


From Gregg_Helt at affymetrix.com  Mon Feb  6 19:14:55 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 16:14:55 -0800
Subject: [DAS2] Re: New DAS/2 server for codesprint
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9AA@msex02.affymetrix.com>


Allen, can you recommend a reasonable region on yeast to do a features
query that will return features with some hierarchy (like
transcript/exons)?

	Thanks,
	Gregg


From Gregg_Helt at affymetrix.com  Mon Feb  6 19:29:12 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 16:29:12 -0800
Subject: [DAS2] Re: New DAS/2 server for codesprint
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9AB@msex02.affymetrix.com>

Actually, that "arbitrary URL" thing doesn't quite work with the current
biopackages server, which has an xml:base pointing to a server at UCLA
for the response to the sequence query:
http://das.biopackages.net/codesprint/sequence

<SOURCES
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/">
  <SOURCE id="human" title="human genome" writeable="no" doc_href=""
taxon="Human">
      <VERSION id="human/17" title="Hsa" created="" modified="">
...
        <CATEGORY type="segments" query_id="human/17/segment"/>
      </VERSION>
...
  </SOURCE>
...
</SOURCES>

Which means (I think) that the segments query resolves to
http://radius.genomics.ctrl.ucla.edu/das/sequence/human/17/segment

which for me returns a 404 Not Found response.
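
The resolution I'm describing can be reproduced with a few lines of
Python.  This is only a sketch of how I read xml:base: it takes the base
from the root element and ignores any nested xml:base attributes.  The
document is a trimmed copy of the response above with the "..." lines
left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the predefined XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

DOC = """
<SOURCES
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/">
  <SOURCE id="human" title="human genome" writeable="no" doc_href="" taxon="Human">
      <VERSION id="human/17" title="Hsa" created="" modified="">
        <CATEGORY type="segments" query_id="human/17/segment"/>
      </VERSION>
  </SOURCE>
</SOURCES>
"""

root = ET.fromstring(DOC)
base = root.get(XML_BASE)
category = root.find("./SOURCE/VERSION/CATEGORY[@type='segments']")

# The relative query_id is resolved against xml:base, not against the
# URL the document was fetched from.
segments_url = urljoin(base, category.get("query_id"))
print(segments_url)
```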

	gregg

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open-bio.org]
> On Behalf Of Allen Day
> Sent: Monday, February 06, 2006 2:01 PM
> To: Nomi Harris
> Cc: DAS/2
> Subject: [DAS2] Re: New DAS/2 server for codesprint
...
> As for your .../segments and .../types, they should be .../segment and
> .../type.  I see no reason to pluralize these URLs given that the sources
> response allows me to provide them at any arbitrary URL:
> 
>   [...]
>   <CATEGORY type="segments" query_id="yeast/S228C/segment"/>
>   <CATEGORY type="types"    query_id="yeast/S228C/type"/>
>   <CATEGORY type="locks"    query_id="yeast/S228C/lock"/>
>   [...]
> 
> -Allen
> 
> 
> 
> On Mon, 6 Feb 2006, Nomi Harris wrote:
> 
> > On 6 February 2006, Nomi Harris wrote:
> >  > thanks for setting up the new das/2 server, allen.  i'm having trouble
> >  > with some of the queries.
> >
> > ok, i realized that some of the queries i was trying were senseless, but
> > here are some that should work that are just hanging:
> > http://das.biopackages.net/codesprint/sequence/yeast/S228C/segments
> > http://das.biopackages.net/codesprint/sequence/yeast/S228C/types
> >
> >         Nomi
> >
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2


From Steve_Chervitz at affymetrix.com  Mon Feb  6 20:02:30 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 06 Feb 2006 17:02:30 -0800
Subject: [DAS2] Re: New DAS/2 server for codesprint
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9AA@msex02.affymetrix.com>
Message-ID: <C00D31A6.1BB4C%Steve_Chervitz@affymetrix.com>


There's a gene (RPL7A) with two introns on chr7 at roughly 366kbp - 364kbp:
http://db.yeastgenome.org/cgi-bin/locus.pl?locus=YGL076C

Most genes with introns in cerevisiae (which aren't many) have just a single
intron that creates a small 5' exon, such as the alpha and beta tubulin
genes on chr13. Tub1 is on chr13 at 99Kbp, and tub3 is also on chr13 at
23Kbp. So the first 100Kb of chr13 would be another region to try.
http://db.yeastgenome.org/cgi-bin/locus.pl?locus=tub1

Steve


> From: "Helt,Gregg" <Gregg_Helt at affymetrix.com>
> Date: Mon, 6 Feb 2006 16:14:55 -0800
> To: Allen Day <allenday at ucla.edu>
> Cc: DAS/2 <das2 at portal.open-bio.org>
> Conversation: [DAS2] Re: New DAS/2 server for codesprint
> Subject: RE: [DAS2] Re: New DAS/2 server for codesprint
> 
> 
> Allen, can you recommend a reasonable region on yeast to do a features
> query that will return features with some hierarchy (like
> transcript/exons)?
> 
> Thanks,
> Gregg
> 
> 


From Gregg_Helt at affymetrix.com  Mon Feb  6 21:42:18 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 18:42:18 -0800
Subject: [DAS2] Modifying com.affymetrix.igb.das2 classes
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9AE@msex02.affymetrix.com>

Brian and Marc, 
 
I'm about to start seriously modifying the IGB DAS/2 classes in the
com.affymetrix.igb.das2 package.  There's code in there you wrote to
work with materials, assays, results, and ontology.  I think we
discussed at some point splitting this stuff out into a separate
package(s).  Which sounds good, especially since (as I understand it),
these domains are separate from the DAS/2 "sequence" domain.  The only
place there's a lot of mixture of code for these domains with the
sequence parts is in Das2VersionedSource.  Is it okay if I move this out
(or comment it out) of Das2VersionedSource while I renovate other parts
of the class?
 
            thanks,
            Gregg
 

From Gregg_Helt at affymetrix.com  Mon Feb  6 22:34:48 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 19:34:48 -0800
Subject: [DAS2] RE: Modifying com.affymetrix.igb.das2 classes
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9B0@msex02.affymetrix.com>


You're right, it looks like some of this code was already getting moved
over to the das2.assay and das2.ontology packages as subclasses of
Das2VersionedSource.  

However it's not clear to me if the equivalent of source and versioned
source for assay, ontology, and other domains are going to be similar
enough to the DAS/2 sequence domain to justify sharing a base
class/interface.  What do/will they share?

I'll go ahead with changes to the das2 package, and look into moving
much of this code into a das2.sequence package.

	Thanks,
	Gregg

> -----Original Message-----
> From: Brian O'Connor [mailto:boconnor at ucla.edu]
> Sent: Monday, February 06, 2006 7:09 PM
> To: Helt,Gregg
> Cc: Marc Carlson; Allen Day; DAS/2
> Subject: Re: Modifying com.affymetrix.igb.das2 classes
> 
> Hi Gregg,
> 
> Go for it!! Marc and I can take a look at it again when you're happy
> with the changes. The versioned source object really needed an overhaul
> anyway to deal with the multiple domains of the DAS/2 server. I think
> there should be a VersionedSource parent and then children for each
> domain (i.e. VersionedSourceAssay). I think Marc started to do this but
> he was afraid to alter the VersionedSource object too much for fear of
> breaking the IGB client.
> 
> --Brian
> 
> Helt,Gregg wrote:
> 
> > Brian and Marc,
> >
> > I'm about to start seriously modifying the IGB DAS/2 classes in the
> > com.affymetrix.igb.das2 package. There's code in there you wrote to
> > work with materials, assays, results, and ontology. I think we
> > discussed at some point splitting this stuff out into a separate
> > package(s). Which sounds good, especially since (as I understand it),
> > these domains are separate from the DAS/2 "sequence" domain. The only
> > place there's a lot of mixture of code for these domains with the
> > sequence parts is in Das2VersionedSource. Is it okay if I move this
> > out (or comment it out) of Das2VersionedSource while I renovate other
> > parts of the class?
> >
> > thanks,
> >
> > Gregg
> >


From boconnor at ucla.edu  Mon Feb  6 22:09:22 2006
From: boconnor at ucla.edu (Brian O'Connor)
Date: Mon, 06 Feb 2006 19:09:22 -0800
Subject: [DAS2] Re: Modifying com.affymetrix.igb.das2 classes
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9AE@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9AE@msex02.affymetrix.com>
Message-ID: <43E80F62.4050403@ucla.edu>

Hi Gregg,

Go for it!! Marc and I can take a look at it again when you're happy 
with the changes. The versioned source object really needed an overhaul 
anyway to deal with the multiple domains of the DAS/2 server. I think 
there should be a VersionedSource parent and then children for each 
domain (i.e. VersionedSourceAssay). I think Marc started to do this but 
he was afraid to alter the VersionedSource object too much for fear of 
breaking the IGB client.

--Brian

Helt,Gregg wrote:

> Brian and Marc,
>
> I'm about to start seriously modifying the IGB DAS/2 classes in the 
> com.affymetrix.igb.das2 package. There's code in there you wrote to 
> work with materials, assays, results, and ontology. I think we 
> discussed at some point splitting this stuff out into a separate 
> package(s). Which sounds good, especially since (as I understand it), 
> these domains are separate from the DAS/2 "sequence" domain. The only 
> place there's a lot of mixture of code for these domains with the 
> sequence parts is in Das2VersionedSource. Is it okay if I move this 
> out (or comment it out) of Das2VersionedSource while I renovate other 
> parts of the class?
>
> thanks,
>
> Gregg
>


From Gregg_Helt at affymetrix.com  Tue Feb  7 00:43:07 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 21:43:07 -0800
Subject: [DAS2] RE: Modifying com.affymetrix.igb.das2 classes
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9B1@msex02.affymetrix.com>


Okay, I just split the code that was in Das2VersionedSource.  Now
regions and types (w/o ontology) are handled in Das2VersionedSource, and
ontology, materials, results, and assays are handled by a subclass,
Das2VersionedSourcePlus.  I might do some further refactoring at a later
date, but for right now this works (and compiles/runs).

I also went ahead and committed almost all my DAS/2 code changes to the
genoviz repository.

	Gregg

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open-bio.org]
> On Behalf Of Helt,Gregg
> Sent: Monday, February 06, 2006 7:35 PM
> To: Brian O'Connor
> Cc: DAS/2; Marc Carlson
> Subject: [DAS2] RE: Modifying com.affymetrix.igb.das2 classes
> 
> 
> You're right, it looks like some of this code was already getting moved
> over to the das2.assay and das2.ontology packages as subclasses of
> Das2VersionedSource.
> 
> However it's not clear to me if the equivalent of source and versioned
> source for assay, ontology, and other domains are going to be similar
> enough to the DAS/2 sequence domain to justify sharing a base
> class/interface.  What do/will they share?
> 
> I'll go ahead with changes to the das2 package, and look into moving
> much of this code into a das2.sequence package.
> 
> 	Thanks,
> 	Gregg
> 
> > -----Original Message-----
> > From: Brian O'Connor [mailto:boconnor at ucla.edu]
> > Sent: Monday, February 06, 2006 7:09 PM
> > To: Helt,Gregg
> > Cc: Marc Carlson; Allen Day; DAS/2
> > Subject: Re: Modifying com.affymetrix.igb.das2 classes
> >
> > Hi Gregg,
> >
> > Go for it!! Marc and I can take a look at it again when you're happy
> > with the changes. The versioned source object really needed an overhaul
> > anyway to deal with the multiple domains of the DAS/2 server. I think
> > there should be a VersionedSource parent and then children for each
> > domain (i.e. VersionedSourceAssay). I think Marc started to do this but
> > he was afraid to alter the VersionedSource object too much for fear of
> > breaking the IGB client.
> >
> > --Brian
> >
> > Helt,Gregg wrote:
> >
> > > Brian and Marc,
> > >
> > > I'm about to start seriously modifying the IGB DAS/2 classes in the
> > > com.affymetrix.igb.das2 package. There's code in there you wrote to
> > > work with materials, assays, results, and ontology. I think we
> > > discussed at some point splitting this stuff out into a separate
> > > package(s). Which sounds good, especially since (as I understand it),
> > > these domains are separate from the DAS/2 "sequence" domain. The only
> > > place there's a lot of mixture of code for these domains with the
> > > sequence parts is in Das2VersionedSource. Is it okay if I move this
> > > out (or comment it out) of Das2VersionedSource while I renovate other
> > > parts of the class?
> > >
> > > thanks,
> > >
> > > Gregg
> > >
> 
> 


From Gregg_Helt at affymetrix.com  Tue Feb  7 00:46:37 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 21:46:37 -0800
Subject: [DAS2] Agenda for DAS/2 Code Sprint Teleconference 2005-02-06
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9B2@msex02.affymetrix.com>

Will you be able to join the teleconference tomorrow (Tuesday)?  Suzi
is planning to join in, and I'm hoping we can spend some time discussing
ontologies.

	Thanks
	Gregg

P.S.  
   9 AM Pacific time
   800-531-3250
   id: 2879055	

> -----Original Message-----
> From: Lincoln Stein [mailto:lstein at cshl.edu]
> Sent: Monday, February 06, 2006 10:20 AM
> To: das2 at portal.open-bio.org
> Cc: Helt,Gregg
> Subject: Re: [DAS2] Agenda for DAS/2 Code Sprint Teleconference 2005-02-06
> 
> Hi Gregg,
> 
> I had a conflicting teleconference and wasn't sure whether there was a
> teleconference scheduled for the code sprint, so I didn't dial in.  Just got
> the agenda now.
> 
> I am online on both MSN and AOL chats, and will be all week, if anyone wants
> to IM me.
> 
> Lincoln
> 
> On Monday 06 February 2006 12:13, Helt,Gregg wrote:
> > Status report
> > DAS/2 XML - valid or not valid?
> > CATEGORY elements -- constructing query URLs
> > MAINTAINER information
> > Use of xml:base
> > update on feature properties - searching, etc.
> >
> >
> >
> >
> >
> 
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From dalke at dalkescientific.com  Tue Feb  7 04:22:56 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 09:22:56 +0000
Subject: [DAS2] <CATEGORY> elements
In-Reply-To: <43E7D251.8050703@affymetrix.com>
References: <4157a339b02c085fd6df0f4bf1ff1cc0@dalkescientific.com>
	<43E7D251.8050703@affymetrix.com>
Message-ID: <8daf0ba1e5744f8e0b99fc644fb5dd38@dalkescientific.com>

Ed Erwin wrote:
> None of your examples vary the words "segments", "types" or 
> "features", but it is legal to do so, right?:
>
>            segments at "ensembl.org/das2/genome_A/build_1/segment"
>            features at "example.com/A/version_x/things/and/more/things"
>            types at "example.com/A/rhinoceros"
>
> OK, so no one is likely to go that far, but is it legal for example to 
> use non-plural "segment", "feature" and "type" ?

Yes.  My goal is two-fold.  First, make no assertions on the internal
organization of the DAS server.  Machines can change, directories
can move around.

The specific advantages are:
   - annotation servers can all point to the same "segments" server
   - multiple versions of the same genomic source and on the same
       machine can reuse the same "types" server

Another thought, perhaps too old-fashioned for modern web development,
is that the query URLs are cgi scripts in a "cgi-bin" directory
while the data files are flat-files in some other directory.

Similarly, the query URL, if it is a CGI script, might end with a ".cgi"
or ".pl" extension.

My second goal is to develop a recommended layout.


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Tue Feb  7 04:32:11 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 09:32:11 +0000
Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint,
	6 Feb 2006
In-Reply-To: <C00CE876.1BB01%Steve_Chervitz@affymetrix.com>
References: <C00CE876.1BB01%Steve_Chervitz@affymetrix.com>
Message-ID: <97f6d51a2e54031ed49fe7997af383eb@dalkescientific.com>

> gh: would like a re-cast as xml document, hosted at so/sofa
> website. that xml would be like a std ontology representation so you
> could extend it. so someone could point to an extension of it.

I asked as an action item if Gregg would look into the solution
for this.  Do we refer to the ontology by a "GO:0123456" identifier
or by some URL scheme?  If so, what's the mapping from URL scheme
to something that clients and people can understand, eg, to
ask for everything which is an exon?

Does this mapping need a version number - does it change over time?

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Tue Feb  7 05:38:28 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 10:38:28 +0000
Subject: [DAS2] per-database MAINTAINER
Message-ID: <294a2caeb29a823dd93fa1155012c8cb@dalkescientific.com>

Based on Andreas Prlic's work with the DAS2 registry I've
added a new MAINTAINER element to the SOURCE/VERSION part
of the SOURCES document.

I've updated das/das2/scratch/sources4.xml to have an
example.  It looks something like this

<?xml version="1.0" encoding="UTF-8"?>
<SOURCES
     xmlns="http://www.biodas.org/ns/das/genome/2.00"
     xml:base="http://dev.wormbase.org/das/genome/">

   <MAINTAINER email="someone at EBI" />

   <SOURCE id="volvox" title="Mr. Volvox" taxid="3066" 
xml:base="/DAS2/GENOME/">

     <VERSION id="volvox/b1" title="Build 1, October 2002"
            created="2002-10-15" modified="2002-10-25T09:56:23">

       <MAINTAINER name="Fred, down the hall" />
    </VERSION>
   </SOURCE>
</SOURCES>


The idea is that the database maintainer can be different
than the server maintainer.

In addition, if the SOURCES/SOURCE/VERSION/MAINTAINER
is not present then clients may assume that the database
maintainer is the same as the SOURCES/MAINTAINER.

The maintainer elements are both optional.
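
The fallback rule can be sketched in Python roughly as follows.  This
is only an illustration, assuming the element names and namespace of
the example document above.  The helper name maintainer_for and the
second VERSION ("volvox/b2") are made up for the example and are not
part of the spec.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example SOURCES document.
NS = "{http://www.biodas.org/ns/das/genome/2.00}"

# Trimmed copy of the example; "volvox/b2" is a hypothetical second
# version added only to show the fallback.
SOURCES_XML = """<?xml version="1.0" encoding="UTF-8"?>
<SOURCES xmlns="http://www.biodas.org/ns/das/genome/2.00">
  <MAINTAINER email="someone at EBI" />
  <SOURCE id="volvox" title="Mr. Volvox">
    <VERSION id="volvox/b1" title="Build 1, October 2002">
      <MAINTAINER name="Fred, down the hall" />
    </VERSION>
    <VERSION id="volvox/b2" title="Hypothetical build 2" />
  </SOURCE>
</SOURCES>
"""


def maintainer_for(root, version_id):
    """Return the maintainer attributes that apply to one VERSION.

    Uses the VERSION's own MAINTAINER when present, otherwise the
    server-wide SOURCES/MAINTAINER.  Returns None when neither element
    exists, since both are optional.
    """
    for version in root.iter(NS + "VERSION"):
        if version.get("id") == version_id:
            own = version.find(NS + "MAINTAINER")
            if own is not None:
                return dict(own.attrib)
            break
    server = root.find(NS + "MAINTAINER")  # direct child of SOURCES only
    return dict(server.attrib) if server is not None else None


root = ET.fromstring(SOURCES_XML.encode("utf-8"))
print(maintainer_for(root, "volvox/b1"))
print(maintainer_for(root, "volvox/b2"))
```

The first call finds the per-database maintainer; the second falls
back to the server maintainer.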

					Andrew
					dalke at dalkescientific.com


From allenday at ucla.edu  Tue Feb  7 05:52:12 2006
From: allenday at ucla.edu (Allen Day)
Date: Tue, 7 Feb 2006 02:52:12 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
Message-ID: <Pine.LNX.4.58.0602070245550.15849@sumo.ctrl.ucla.edu>

The XML is now as you requested, please confirm.

After some thought today I realized the new SOURCES response is fully
compatible with the existing server.  The doc at:

http://das.biopackages.net/codesprint/sequence

is now simply a static XML doc that points into the stable server (plus
the new "segments" response) implementation at:

http://das.biopackages.net/das/genome

The headers for the static document don't include the correct Content-Type
"application/x-das-blah ; version = XxX", it's simply "text/xml".  I'll
add the headers in the morning GMT+8.

There are probably also some other Content-Type headers that need to be
changed for the other responses -- let me know if you spot them.

-Allen


On Mon, 6 Feb 2006, Andrew Dalke wrote:

> Allen:
> > After looking closely over this first draft of new_spec.txt, it's 
> > apparent
> > that there are still some holes, e.g. what should the response to the
> > following requests look like?
> >
> > http://das.biopackages.net/codesprint/sequence/yeast
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <SOURCES
>        xmlns:xlink="http://www.w3.org/1999/xlink"
>        xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/">
> taxon="Yeast">
>        <VERSION id="yeast/S228C" title="Sce" created="" modified="">
> 
>        <COORDINATES taxid="" source="" authority="">
>          <VERSION name=""/>
>        </COORDINATES>
> 
>        <ASSEMBLY>
>          <LINK href="" priority=""/>
>        </ASSEMBLY>
> 
>        <PROP key="" value=""/>
> 
>        <CATEGORY type="features" query_id="yeast/S228C/feature">
>          <!-- list non-das2xml templates here -->
>        </CATEGORY>
>        <CATEGORY type="segments" query_id="yeast/S228C/segment"/>
>        <CATEGORY type="types"    query_id="yeast/S228C/type"/>
>        <CATEGORY type="locks"    query_id="yeast/S228C/lock"/>
> 
>      </VERSION>
> 
>    </SOURCE>
> </SOURCES>
> 
> 
> > http://das.biopackages.net/codesprint/sequence/yeast/S228C
> 
> The same for this case.  There is only one VERSION for "yeast".
> 
> 
> Your XML, btw, starts
> 
> <?xml version="1.0" standalone="no"?>
> <?xml-stylesheet type="text/xsl" href="/xsl/das.xsl"?>
> <!DOCTYPE DAS2DSN SYSTEM "http://www.biodas.org/dtd/das2dsn.dtd">
> <!-- this doesn't work and screws up the xsl     
> xmlns="http://www.biodas.org/ns/das/genome/2.00" -->
> <SOURCES
>        xmlns:xlink="http://www.w3.org/1999/xlink"
>        xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/">
> 
> The standalone="no" means that the DTD may affect the content of the
> document.
>    http://www.stylusstudio.com/w3c/xml11/sec-rmd.htm
> 
> > Markup declarations can affect the content of the document, as passed 
> > from an XML Processor to an application; examples are attribute 
> > defaults and entity declarations. The standalone document declaration, 
> > which MAY appear as a component of the XML declaration, signals 
> > whether or not there are such declarations which appear external to 
> > the Document Entity or in parameter entities. An external markup 
> > declaration is defined as a markup declaration occurring in the 
> > external subset or in a parameter entity (external or internal, the 
> > latter being included because non-validating processors are not 
> > required to read them).
> 
> For what we're doing, we don't need nor (I think) want that.  There's
> no reason for a client to consult the DTD to figure out the XML.
> 
> Instead, use
> 
> <?xml version="1.0"?>
> 
> and probably have the encoding
> 
> <?xml version="1.0" encoding="UTF-8"?>
> 
> That also means you can get rid of the
> 
> <!DOCTYPE DAS2DSN SYSTEM "http://www.biodas.org/dtd/das2dsn.dtd">
> 
> statements.
> 
> 					Andrew
> 					dalke at dalkescientific.com
> 
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2
> 


From dalke at dalkescientific.com  Tue Feb  7 07:19:28 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 12:19:28 +0000
Subject: [DAS2] properties and queries
Message-ID: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com>

We've had a long discussion here about properties and how to
search them.  As it stands now the spec has a few holes in it.

Here are the properties we've talked about.

program_name: the program used to make the annotation, like
   "BLASTX 1.2.3"

notes:
   There can be 0 or more notes.  Notes might refer to other
   notes (eg, "the previous note said XYZ but I think ABC")

phase: (is it 0, 1, 2 or 1, 2, 3?)
   (And does anyone use this? People here don't use it; Thomas
    "reinfers it by counting along the transcript" "but maybe
    that's just me".  Others say they don't use the DAS1 phase.)

icon: a hypothetical image used for the feature, perhaps as
    a binary png;

curation history:
   a list of elements, each with
    - person
    - timestamp
    - reason for change

score: a floating point number, which may be in exponential
    notation like "1E-3"

Each one needs different search mechanisms.  For example,
   "annotations done by that buggy version of BLAST 1.2.3"
   "scores better than 1E-2"
   "changes by Andrew done in August 2004"
   "notes with the substring 'helicase'" (case sensitive or not?)
   "notes with the phrase 'E. Coli'" (substring might not work
       if the note has 'E.\nColi')

The property storage scheme doesn't handle this quite correctly.
Here are problems:

   - how do you store multiple notes?

Answer 1: use structured names, like "note_1", "note_2", "note_3", ..
HACK! Then what if a note is deleted?  Bigger problem; how do you
search the "note" field using the existing query language?

Answer 2: allow duplicate note elements, like
   <prop key="note" value="This is a note" />
   <prop key="note" value="The previous note is a lie!" />
   <prop key="note" value="Ignore the 2nd note - silly Cretan!" />

Question: so the order must be preserved if two fields have the
same name?  Can't implement with a dictionary/hash data type.

Question: what if there are duplicate "score" or "phase" elements?
Which one wins?

Answer 3: Notes are important and we know we need them now.
Let's have a <NOTE> element and not make it be a property.

<NOTE>This is a note</NOTE>
<NOTE>The previous note is a lie!</NOTE>
<NOTE>Is this an E or a NOT-E?</NOTE>

(perhaps also with timestamp and author name, but that's a different
question.)  Then we also define that the "note=" parameter in a
DAS query is a substring search of the <NOTE> elements of a feature.

I like this one.
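
Here is a rough Python sketch of what a "note=" substring search
could look like.  It is not a spec: collapsing whitespace (so that
'E.\nColi' matches 'E. Coli') and defaulting to a case-insensitive
match are assumptions made for the example, since both points are
still open questions.  The function names are made up.

```python
import re


def normalize(text, case_sensitive=False):
    """Collapse runs of whitespace (including newlines) to one space."""
    collapsed = re.sub(r"\s+", " ", text).strip()
    return collapsed if case_sensitive else collapsed.lower()


def feature_matches_note(notes, query, case_sensitive=False):
    """True if any of the feature's NOTE texts contains the query."""
    needle = normalize(query, case_sensitive)
    return any(needle in normalize(note, case_sensitive) for note in notes)


# The text content of the <NOTE> elements of one hypothetical feature.
notes = ["This is a note", "Found in E.\nColi; putative helicase"]

print(feature_matches_note(notes, "E. Coli"))
print(feature_matches_note(notes, "Helicase", case_sensitive=True))
```

A feature with zero notes never matches, which seems like the least
surprising behaviour.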


   - How do you do numeric searches?

This is hypothetical.  There hasn't been a requirement for this.
'Course it may be because people haven't had the ability.  In
any case, how to search numeric fields like "score" with comparisons?


  - querying non-queryable fields

If there's embedded binary data, like an image, is it searchable?
Does a server complain and die? Ignore the request?

  - more complex text searches

"proteinase but not inhibitor"

  - complex data

We have support for non-DAS extensions, which might be

<sanger:curation-history xmlns:sanger="http://www.sanger.ac.uk/das/ext">
  <sanger:curation name="Andrew" date="2005-06-07">
    Change this into that because of some reason or other
  </sanger:curation>
</sanger:curation-history>


Thomas proposed that we support some sort of complex query
language, probably in XML, and get rid of the simple query scheme
we have now.

I argued against the complexity of that given that nearly all
of the queries will be "give me these feature types on this range
of that chromosome".  I also pointed out that developing a
generic query language is hard, and implementing it is harder.
Why require all that effort?

Roy commented the other way - in a server with only a few hundred
features, why require a query language at all?  Just return all
of the features in the request.

Here's what I proposed.

We have the "CATEGORY" (but after discussion I now want to take
it back to "CAPABILITY" since that's now much closer to what
it does - it describes where to go to do something)

So I'll use "CAPABILITY"

The current scheme has

<CAPABILITY type="features" query_url="http://...../features">
   <FORMAT ... />
</CAPABILITY>

This is an extensibility point.  Suppose Thomas supports an XML
query search interface on his server, with Sanger clients that
handle it.  Then there can be

<CAPABILITY type="thomas-xml-search" 
query_url="http.../search-features">
   <FORMAT ... />
</CAPABILITY>

A client can see the list of CAPABILITIES and decide to
use the feature search mechanism it likes best.

In addition, we could say that "this supports the normal DAS
query scheme but also supports extension vocabulary."  For example,

<CAPABILITY type="features" query_url="http://...../features">
   <SUPPORTS name="sanger-curation" />
   <FORMAT ... />
</CAPABILITY>

With this a client knows that the query_url supports the normal
DAS queries and also supports the "annotator", "annotation_before"
and "annotation_after" queries, like this

   .../features?annotator=Andrew;annotation_before=2005

Possible idea: if there is no SUPPORTS tag then the server
implements no search syntax and instead returns everything,
for the example Roy mentioned.
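
For concreteness, a client could build such a request along these
lines.  This is a sketch only: the base URL is a placeholder (the
real one comes from the query_url attribute), the function name is
made up, and percent-encoding the values is my assumption rather than
something the spec states.

```python
from urllib.parse import quote


def build_feature_query(query_url, filters=None):
    """Append ';'-separated key=value filters to a features query URL.

    With no filters the URL is returned unchanged, which is the
    "just return everything" request.
    """
    if not filters:
        return query_url
    pairs = [f"{quote(key, safe='')}={quote(str(value), safe='')}"
             for key, value in filters.items()]
    return query_url + "?" + ";".join(pairs)


# Placeholder; not a real server.
FEATURES_URL = "http://example.org/das/features"

print(build_feature_query(
    FEATURES_URL, {"annotator": "Andrew", "annotation_before": 2005}))
print(build_feature_query(FEATURES_URL))
```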

Okay, we're off to lunch.

					Andrew
					dalke at dalkescientific.com


From ap3 at sanger.ac.uk  Tue Feb  7 07:21:53 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Tue, 7 Feb 2006 12:21:53 +0000
Subject: [DAS2] das-regstry sources response
Message-ID: <09c1c46ca239256496b81b2c67638bd5@sanger.ac.uk>

Hi!

I added a DAS2 sources response to a copy of the DAS registry running
on my laptop.  The attached file shows how the DAS1 sources are
described using the DAS2 spec.  It fits together rather well.

I did not know what to put under the <ASSEMBLY>. The <COORDINATES> 
already contain all required info.
Therefore I propose to drop <ASSEMBLY>

Andreas


-------------- next part --------------
A non-text attachment was scrubbed...
Name: sources_response.xml
Type: application/octet-stream
Size: 32318 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/das2/attachments/20060207/922c594e/attachment.obj>
-------------- next part --------------


-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
			 +44 (0) 1223 49 6891

From dalke at dalkescientific.com  Tue Feb  7 08:20:35 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 13:20:35 +0000
Subject: [DAS2] das-regstry sources response
In-Reply-To: <09c1c46ca239256496b81b2c67638bd5@sanger.ac.uk>
References: <09c1c46ca239256496b81b2c67638bd5@sanger.ac.uk>
Message-ID: <e906fc30e5d1c60aa2abcec7f6a4db56@dalkescientific.com>

Andreas:
> I did not know what to put under the <ASSEMBLY>. The <COORDINATES> 
> already contain all required info.
> Therefore I propose to drop <ASSEMBLY>

Removed and committed to CVS.

					Andrew
					dalke at dalkescientific.com


From Gregg_Helt at affymetrix.com  Tue Feb  7 10:34:21 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Tue, 7 Feb 2006 07:34:21 -0800
Subject: [DAS2] Ontologies in DAS/2
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>

I talked to Suzi, she's planning to join our teleconference today to
discuss ontologies, wearing her hat as co-PI of the National Center for
Biomedical Ontology.  Hopefully Lincoln can join too.

I took a closer look at the DAS/2 ontology work Allen has done (see
http://biodas.org/documents/das2/das2_ontology.html).  I urge anyone who
wants to contribute to the ontology discussion to read this doc.  It
specifies a way to retrieve ontologies in OBOXML format.  In this format
each ontology term gets an absolute URI through the same mechanism that
the rest of DAS/2 uses (URIs for ids, which can be either absolute or
relative but resolvable).  As Allen pointed out yesterday this would
solve our problem of how to uniquely specify ontology terms in the DAS/2
TYPES XML.

I couldn't find any documentation for the OBOXML format, other than the
code that generates it from OBO files.  But I'm using OBOXML as an
example here because it clearly has resolvable URIs for each ontology
term.  In Allen's spec, ontologies can also be returned in other
formats, but it's unclear to me whether terms in these other formats
would resolve to similar URIs.

	gregg

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke
> Sent: Tuesday, February 07, 2006 1:32 AM
> To: DAS/2
> Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code
> sprint,6 Feb 2006
> 
> > gh: would like a re-cast as xml document, hosted at so/sofa
> > website. that xml would be like a std ontology representation so you
> > could extend it. so someone could point to an extension of it.
> 
> I asked as an action item if Gregg would look into the solution
> for this.  Do we refer to the ontology by a "GO:0123456" identifier
> or by some URL scheme?  If so, what's the mapping from URL scheme
> to something that clients and people can understand, eg, to
> ask for everything which is an exon?
> 
> Does this mapping need a version number - does it change over time?
> 
> 					Andrew
> 					dalke at dalkescientific.com
> 
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org


From dalke at dalkescientific.com  Tue Feb  7 10:45:00 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 15:45:00 +0000
Subject: [DAS2] properties and queries
In-Reply-To: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com>
References: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com>
Message-ID: <16111cd36850795dfd46696a63fb1057@dalkescientific.com>

To summarize, the current thought here for properties and queries
is as follows  (it's a long summary.  More like an essay.  :)

Add support for zero or more <NOTE> elements in the feature, of
the form
   <NOTE>This is some arbitrary (but non-markup-ed) text</NOTE>


Add a features search keyword "note=" which takes a search string
to be found in the note elements.  (substring? soundex? regex?
the search engine calls up Lincoln and asks?)


Add support for zero or more <ALIAS> elements in the feature,
of the form
   <ALIAS name="Zorro" />

(I missed this in the redraft.  It should have been there.
Feature filter "name" already says it searches the "name" and
"alias" fields for a feature.)


Ignore the "phase" property (contentious, perhaps?) or add it
as an attribute of something else in the feature element.


Ignore the "score" property.  As written in the current spec
   "score" A floating point number indicating a context-dependent
   score. This is to be used only when a more specific ontology-driven
   score cannot be used.  (Umm, where do the other scores go?)
Unless someone wants to define that score ontology and what it means
to search that field, this is a can of worms I don't want to open.


Ignore the "editable" property.  As written (and kibitzed)
   "editable" indicates that features may be updateable (this is at the
   discretion of the server).  (But this is potentially per-user data.)

This should either be in the feature type or it should be in
some write-back specific data structure the client can fetch.
(To be discussed) It isn't a feature property.

This gets rid of all stated needs for arbitrary key/value data.


That doesn't mean there won't be future needs.

In that case, here's how to add new pieces of data.

1) use a non-DAS extension element.  Clients must ignore elements
they don't understand.

This is good for storing data, but not for searching.  The
thing is, the search mechanism (or multiple search mechanisms
perhaps) is data field specific.  Hence,

2) servers may provide extensions to the basic DAS query mechanism.
Currently the mechanism is:
   and-ed set of zero or more  keyword = (set, of, or, terms, for, keyword)
where "keyword" is well-defined by DAS except for the "att"
property keywords.

Query extensions add new keywords in the same syntax, and define
somewhere how that syntax works.  It must be backwards compatible
to the existing syntax and semantics.

The problem then is clients don't know that a server supports a
given query extension, so

3) add a <SUPPORTS> element to the <CAPABILITY> element.
(Also proposed, renaming "CATEGORY" back to "CAPABILITY".)
The CAPABILITY may have zero or more of

   <SUPPORTS name="some-unique-string" />

Here are the two defined unique strings,

   <SUPPORTS name="all" />
   <SUPPORTS name="das2" />

The "all" query says that a client may reasonably fetch all
the features in one go.  This would occur with a small DAS
server containing only a few hundred features.  In that case
there's no need to even have a CGI script running on the
back end - just a set of flat files.  The query is done by
fetching the URL with no parameters.

A rich server with millions of features might decide to
not support an "all" query.

The "das2" query is the one we've been talking about.

If a site develops a query extension it adds

   <SUPPORTS name="sanger-curation-search" />

so clients know what the server can do.  (In this case supporting
searches for "annotator", "annotation_before" and "annotation_after"
fields.)
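
A client-side check for this might look roughly like the Python
below.  It is a sketch under assumptions: the CAPABILITY fragment is
shown without its namespace and without FORMAT children, the
query_url is a placeholder, and the preference order (extension,
then "das2", then "all") is just one plausible client policy.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; a real one would sit inside a SOURCES document.
CAPABILITY_XML = """
<CAPABILITY type="features" query_url="http://example.org/das/features">
  <SUPPORTS name="das2" />
  <SUPPORTS name="sanger-curation-search" />
</CAPABILITY>
"""


def supported_queries(capability):
    """Return the set of SUPPORTS names advertised by a CAPABILITY."""
    return {s.get("name") for s in capability.findall("SUPPORTS")}


def choose_strategy(capability, wanted_extension=None):
    """Pick a query style the server says it supports, or None."""
    names = supported_queries(capability)
    if wanted_extension and wanted_extension in names:
        return wanted_extension
    if "das2" in names:
        return "das2"   # the normal DAS/2 feature filters
    if "all" in names:
        return "all"    # fetch the URL with no parameters
    return None


capability = ET.fromstring(CAPABILITY_XML.strip())
print(choose_strategy(capability, "sanger-curation-search"))
print(choose_strategy(capability))
```

A client that does not know about the Sanger extension simply never
asks for it and falls through to "das2".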

That all said, this doesn't mean that the server shouldn't
have a property table.  It's a question of what it means
to search the property table.

People here want the following:
   multiple properties may have the same key and different value
   the order of the properties is not important
   the "att:" search is renamed a "prop:" search, like "prop:author"
   the search is a substring search.
   a feature matches a search if any of the properties with that name
      match the substring search

For example,
   source = BLAST 2.3.4
   author = Andrew Dalke
   author = Thomas Down

lets me search for

   features?prop:author=Andrew
all features with "Andrew" as a substring in the "author" property

   features?prop:author=Andrew;source=BLAST
all features with "Andrew" as a substring in the "author"
and with "BLAST" in the source name

   features?prop:author=Andrew,Thomas
all features with "Andrew" or "Thomas" as an author
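
The matching rules can be sketched in Python like this.  It is an
illustration, not the spec: the function names are made up, the
match is case-sensitive, and treating a key without the "prop:"
prefix (like "source" above) as a plain property lookup is an
assumption I made to keep the example small.

```python
def parse_query(query):
    """Split 'k1=a,b;k2=c' into {'k1': ['a', 'b'], 'k2': ['c']}."""
    parsed = {}
    for part in query.split(";"):
        key, _, value = part.partition("=")
        parsed[key] = value.split(",")
    return parsed


def feature_matches(properties, query):
    """properties is a list of (key, value) pairs; keys may repeat.

    Keywords are and-ed together; the comma-separated terms of one
    keyword are or-ed; a term matches if it is a substring of any
    property value stored under that key.
    """
    for key, terms in parse_query(query).items():
        name = key[len("prop:"):] if key.startswith("prop:") else key
        values = [v for k, v in properties if k == name]
        if not any(term in value for value in values for term in terms):
            return False
    return True


props = [("source", "BLAST 2.3.4"),
         ("author", "Andrew Dalke"),
         ("author", "Thomas Down")]

print(feature_matches(props, "prop:author=Andrew"))
print(feature_matches(props, "prop:author=Andrew;source=BLAST"))
print(feature_matches(props, "prop:author=Andrew,Thomas"))
```

Because the properties are kept as a list of pairs, the order of the
two "author" entries does not affect the result.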


Really what I think this essay is doing is saying that
storing data and searching data are different.  Servers can
develop new ways to extend DAS searches and flag that they
support new searches.  (Eg, the new search may be to support
a different way to search a field in the property table.)

But there needs to be a really basic substring search, given
that there will be simple string key/ string value data
for the property table.

Oh, and should the key/value table also include my proposed
"href" and embedded binary data fields like images?  Hmmmmm....

Lots of talk about this here.  Time for a tea break.

					Andrew
					dalke at dalkescientific.com


From lstein at cshl.edu  Tue Feb  7 11:00:52 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 7 Feb 2006 11:00:52 -0500
Subject: [DAS2] properties and queries
In-Reply-To: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com>
References: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com>
Message-ID: <200602071100.52818.lstein@cshl.edu>

Hi,

I use the phase information quite a lot and I know that others do as well. The 
phase is {0,1,2} and the meaning is described here:

	For features of type "CDS", the phase indicates where the feature
	begins with reference to the reading frame.  The phase is one of the
	integers 0, 1, or 2, indicating the number of bases that should be
	removed from the beginning of this feature to reach the first base of
	the next codon. In other words, a phase of "0" indicates that the next
	codon begins at the first base of the region described by the current
	line, a phase of "1" indicates that the next codon begins at the
	second base of this region, and a phase of "2" indicates that the
	codon begins at the third base of this region. This is NOT to be
	confused with the frame, which is simply start modulo 3.
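
As a small illustration of that definition, here is a Python sketch
of how a client might recompute the phase by counting along the
transcript.  It assumes the CDS segments are given in transcript
order with lengths in bases, and that the first segment starts on a
codon boundary unless told otherwise; the function names are made up.

```python
def next_phase(phase, length):
    """Phase of the following CDS segment.

    'phase' bases at the start of this segment finish the previous
    codon; whatever is left over at the end must be completed by the
    start of the next segment.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    return (3 - (length - phase) % 3) % 3


def infer_phases(lengths, first_phase=0):
    """Return the phase of each CDS segment, in transcript order."""
    phases = []
    phase = first_phase
    for length in lengths:
        phases.append(phase)
        phase = next_phase(phase, length)
    return phases


# Three CDS segments of 4, 5 and 6 bases.
print(infer_phases([4, 5, 6]))  # [0, 2, 0]
```

The first segment leaves one base dangling, so the second segment
needs two bases removed to reach its first full codon.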

Lincoln

On Tuesday 07 February 2006 07:19, Andrew Dalke wrote:
> We've had a long discussion here about properties and how to
> search them.  As it stands now the spec has a few holes in it.
>
> Here are the properties we've talked about.
>
> program_name: the program used to make the annotation, like
>    "BLASTX 1.2.3"
>
> notes:
>    There can be 0 or more notes.  Notes might refer to other
>    notes (eg, "the previous note said XYZ but I think ABC")
>
> phase: (is it 0, 1, 2 or 1, 2, 3?)
>    (And does anyone use this? People here don't use it; Thomas
>     "reinfers it by counting along the transcript" "but maybe
>     that's just me".  Others say they don't use the DAS1 phase.)
>
> icon: a hypothetical image used for the feature, perhaps as
>     a binary png;
>
> curation history:
>    a list of elements, each with
>     - person
>     - timestamp
>     - reason for change
>
> score: a floating point number, which may be in exponential
>     notation like "1E-3"
>
> Each one needs different search mechanisms.  For example,
>    "annotations done by that buggy version of BLAST 1.2.3"
>    "scores better than 1E-2"
>    "changes by Andrew done in August 2004"
>    "notes with the substring 'helicase'" (case sensitive or not?)
>    "notes with the phrase 'E. Coli'" (substring might not work
>        if the note has 'E.\nColi')
>
> The property storage scheme doesn't handle this quite correctly.
> Here are problems:
>
>    - how do you store multiple notes?
>
> Answer 1: use structured names, like "note_1", "note_2", "note_3", ..
> HACK! Then what if a note is deleted?  Bigger problem; how do you
> search the "note" field using the existing query language?
>
> Answer 2: allow duplicate note elements, like
>    <prop key="note" value="This is a note" />
>    <prop key="note" value="The previous note is a lie!" />
>    <prop key="note" value="Ignore the 2nd note - silly Cretan!" />
>
> Question: so the order must be preserved if two fields have the
> same name?  Can't implement with a dictionary/hash data type.
>
> Question: what if there are duplicate "score" or "phase" elements?
> Which one wins?
>
> Answer 3: Notes are important and we know we need them now.
> Let's have a <NOTE> element and not make it be a property.
>
> <NOTE>This is a note</NOTE>
> <NOTE>The previous note is a lie!</NOTE>
> <NOTE>Is this an E or a NOT-E?</NOTE>
>
> (perhaps also with timestamp and author name, but that's a different
> question.)  Then we also define that the "note=" parameter in a
> DAS query is a substring search of the <NOTE> elements of a feature.
>
> I like this one.
>
>
>    - How do you do numeric searches?
>
> This is hypothetical.  There hasn't been a requirement for this.
> 'Course it may be because people haven't had the ability.  In
> any case, how to search numeric fields like "score" with comparisons?
>
>
>   - querying non-queryable fields
>
> If there's embedded binary data, like an image, is it searchable?
> Does a server complain and die? Ignore the request?
>
>   - more complex text searches
>
> "proteinase but not inhibitor"
>
>   - complex data
>
> We have support for non-DAS extensions, which might be
>
> <sanger:curation-history xmlns:sanger="http://www.sanger.ac.uk/das/ext">
>   <sanger:curation name="Andrew" date="2005-06-07">
>     Change this into that because of some reason or other
>   </sanger:curation>
> </sanger:curation-history>
>
>
> Thomas proposed that we support some sort of complex query
> language, probably in XML, and get rid of the simple query scheme
> we have now.
>
> I argued against the complexity of that given that nearly all
> of the queries will be "give me these feature types on this range
> of that chromosome".  I also pointed out that developing a
> generic query language is hard, and implementing it is harder.
> Why require all that effort?
>
> Roy commented the other way - in a server with only a few hundred
> features, why require a query language at all?  Just return all
> of the features in the request.
>
> Here's what I proposed.
>
> We have the "CATEGORY" (but after discussion I now want to take
> it back to "CAPABILITY" since that's now much closer to what
> it does - it describes where to go to do something)
>
> So I'll use "CAPABILITY"
>
> The current scheme has
>
> <CAPABILITY type="features" query_url="http://...../features">
>    <FORMAT ... />
> </CAPABILITY>
>
> This is an extensibility point.  Suppose Thomas supports an XML
> query search interface on his server, with Sanger clients that
> handle it.  Then there can be
>
> <CAPABILITY type="thomas-xml-search"
> query_url="http.../search-features">
>    <FORMAT ... />
> </CAPABILITY>
>
> A client can see the list of CAPABILITIES and decide to
> use the feature search mechanism it likes best.
>
> In addition, we could say that "this supports the normal DAS
> query scheme but also supports extension vocabulary."  For example,
>
> <CAPABILITY type="features" query_url="http://...../features">
>    <SUPPORTS name="sanger-curation" />
>    <FORMAT ... />
> </CAPABILITY>
>
> With this a client knows that the query_url supports the normal
> DAS queries and also supports the "annotator", "annotation_before"
> and "annotation_after" queries, like this
>
>    .../features?annotator=Andrew;annotation_before=2005
>
> Possible idea: if there is no SUPPORTS tag then the server
> implements no search syntax and instead returns everything,
> for the example Roy mentioned.
>
> Okay, we're off to lunch.
>
> 					Andrew
> 					dalke at dalkescientific.com
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From lstein at cshl.edu  Tue Feb  7 11:46:47 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 7 Feb 2006 11:46:47 -0500
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
Message-ID: <200602071146.48212.lstein@cshl.edu>

Hi,

I have group meeting from 12-1 every Tuesday, so I can't make this one. I'll 
be present for the telecon Wednesday at 12.

Lincoln


On Tuesday 07 February 2006 10:34, Helt,Gregg wrote:
> I talked to Suzi, she's planning to join our teleconference today to
> discuss ontologies, wearing her hat as co-PI of the National Center for
> Biomedical Ontology.  Hopefully Lincoln can join too.
>
> I took a closer look at the DAS/2 ontology work Allen has done (see
> http://biodas.org/documents/das2/das2_ontology.html).  I urge anyone who
> wants to contribute to the ontology discussion to read this doc.  It
> specifies a way to retrieve ontologies in OBOXML format.  In this format
> each ontology term gets an absolute URI through the same mechanism that
> the rest of DAS/2 uses (URIs for ids, which can be either absolute or
> relative but resolvable).  As Allen pointed out yesterday this would
> solve our problem of how to uniquely specify ontology terms in the DAS/2
> TYPES XML.
>
> I couldn't find any documentation for the OBOXML format, other than the
> code that generates it from OBO files.  But I'm using OBOXML as an
> example here because it clearly has resolvable URIs for each ontology
> term.  In Allen's spec, ontologies can also be returned in other
> formats, but it's unclear to me whether terms in these other formats
> would resolve to similar URIs.
>
> 	gregg
>
> > -----Original Message-----
> > From: das2-bounces at portal.open-bio.org
> > [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke
> > Sent: Tuesday, February 07, 2006 1:32 AM
> > To: DAS/2
> > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code
> > sprint,6 Feb 2006
> >
> > > gh: would like a re-cast as xml document, hosted at so/sofa
> > > website. that xml would be like a std ontology representation so you
> > > could extend it. so someone could point to an extension of it.
> >
> > I asked as an action item if Gregg would look into the solution
> > for this.  Do we refer to the ontology by a "GO:0123456" identifier
> > or by some URL scheme?  If so, what's the mapping from URL scheme
> > to something that clients and people can understand, eg, to
> > ask for everything which is an exon?
> >
> > Does this mapping need a version number - does it change over time?
> >
> > 					Andrew
> > 					dalke at dalkescientific.com
> >
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at portal.open-bio.org
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From dalke at dalkescientific.com  Tue Feb  7 11:50:56 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 16:50:56 +0000
Subject: [DAS2] query_api and server layout
Message-ID: <a66a09d2312ce8288b3e55fcd2c22d28@dalkescientific.com>

Continuing from yesterday's discussion...

There are several things in a DAS server

- there is the list of all sources and versions
- there is a list of all versions for a source
- there is the versioned source information

The versioned source only really provides a bit of
overall configuration information and links to three URLs:

   - the query interface for features
   - the query interface for types
   - the query interface for segments

It doesn't say anything about where the actual feature,
type and segment data is stored.  It doesn't even mean
that the query URLs are on the same machine as the versioned
source document.  Hence Andreas can have his registry server.

DAS defines what those queries do.  The segments query URL
interface can be a shared reference server.  It has a
rather simple interface:
   - get URLs and information for each segment
       - given a sequence URL return the sequence data
   - return the assembly data

The segment and sequence data does not need to be on the
same machine as the segments query URL.  It likely will
be but does not need to be.


DAS defines what the types interface does.  At present it
is also very simple.  By default it lists everything, or
you can ask it for an "ontology" or (proposed new query)
"exact_ontology", and it returns all DAS types which match
that request.

The actual DAS type data does not need to be on the same
server as the DAS query URL, though again it probably will
be.  The types query URL does not need to be on the same machine
as the segments query URL.

Similarly, the features query URL implements the DAS query
interface and returns a list of features.  The actual features
do not need to be on the same machine or directory location
as the feature query, or the types, or the segments.

Here are some possible reasons for the different locations:

Common case:
   - segments query URL and segments data on a reference server
   - versioned source provides its own types and features

New genome / internal project:
   - database implements all three query URLs

Registry server:
   - each versioned source entry points to the original machine's
       values for the segments, types and features query URLs

Multiple versions database, shared types:
   - segments points to the reference server
   - all versioned sources "types" query url point to the same URL
   - each versioned source gets its own features query

old-style CGI-based web server:
   - the "segments" query url points to the reference server
   - the individual features, types and sources are ".xml" files
       in the file system
   - the query URLs end with ".cgi" and start a CGI script


If we say that the URL for doing a types query is composed as:
   <the versioned source URL> + "/" (if missing) + "types"

then at the very least we preclude CGI-based servers.  No big
deal perhaps?  It also makes things slightly more duplicative
when several versions of the database share the same DAS "types"
(and "segments").

I also think using a server-provided URL is easier than constructing
the URL in code.  Get the "query_url", perhaps resolved by the
xml:base.  That's it.  No need to add in the "/types".
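
To make the comparison concrete, here is a minimal Python sketch of the
two options.  The host names and paths are made up for illustration, and
it assumes ordinary relative-URL resolution (urllib.parse.urljoin) is
what "resolved by the xml:base" means.

```python
from urllib.parse import urljoin

# Hypothetical values for illustration; none of these URLs come from the spec.
XML_BASE = "http://example.org/das/genome/"
VERSIONED_SOURCE = "h_sapiens/v1"


def constructed_types_url(versioned_source_url):
    """Build the types URL in code: source URL + "/" (if missing) + "types"."""
    if not versioned_source_url.endswith("/"):
        versioned_source_url += "/"
    return versioned_source_url + "types"


def provided_types_url(xml_base, query_url):
    """Use the URL the server advertises, resolved against xml:base."""
    return urljoin(xml_base, query_url)


source_url = urljoin(XML_BASE, VERSIONED_SOURCE)

# Constructed: always lives directly under the versioned source.
print(constructed_types_url(source_url))

# Provided: may be relative to xml:base ...
print(provided_types_url(XML_BASE, "h_sapiens/v1/types"))

# ... or may point at a CGI script on a completely different machine.
print(provided_types_url(XML_BASE, "http://types.example.org/cgi-bin/types.cgi"))
```

The client-side code for the provided URL is a single urljoin call, and
it does not care where the types query actually lives.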

Gregg worries about the network performance of having
   <FEATURE type="../../type/AB123">
    <LOC id="http://some.other.server" range="300:400"/>
    <REGION id="feature/QW41414" />
   </FEATURE>

because each location has the full URL to another server and
the type in this case refers to a types collection shared
by all of the versions of the source.

I've thought about that for a while.  It's a reasonable and
serious architectural concern.  I think the right response
is that that's an architecture decision we should leave up to
the data provider.  If Gregg wants more compact XML and finds that
on-the-fly compression slows things down too much, then his
DAS server can make the segments, types and features all be
not only on the same machine but in the same directory.

The following is valid (omitting some required parts)

<SOURCE>
   <VERSION id="/h_sapiens/v1/">
    <CAPABILITY type="features" query_id="/h_sapiens/v1/features" />
    <CAPABILITY type="types" query_id="/h_sapiens/v1/types" />
    <CAPABILITY type="segments" query_id="/h_sapiens/v1/segments" />
   </VERSION>
</SOURCE>
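
As a rough sketch of how a client might read those query URLs out of the
document above, using Python's standard ElementTree.  The function name
is made up, and the required parts omitted above (including the
namespace) are omitted here too.

```python
import xml.etree.ElementTree as ET

# The example document from above, without the omitted required parts.
SOURCE_XML = """
<SOURCE>
   <VERSION id="/h_sapiens/v1/">
    <CAPABILITY type="features" query_id="/h_sapiens/v1/features" />
    <CAPABILITY type="types" query_id="/h_sapiens/v1/types" />
    <CAPABILITY type="segments" query_id="/h_sapiens/v1/segments" />
   </VERSION>
</SOURCE>
"""


def query_urls(source_xml):
    """Map each VERSION id to a {capability type: query_id} dictionary."""
    root = ET.fromstring(source_xml.strip())
    result = {}
    for version in root.iter("VERSION"):
        result[version.get("id")] = {
            cap.get("type"): cap.get("query_id")
            for cap in version.iter("CAPABILITY")
        }
    return result


for version_id, capabilities in query_urls(SOURCE_XML).items():
    for cap_type, query_id in sorted(capabilities.items()):
        print(version_id, cap_type, query_id)
```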

The features request can return

GET /h_sapiens/v1/features
<FEATURES xmlns:das="...">
  <FEATURE id="F12345" type="Tabcde">
    <LOC id="C1" range="32:34"/>
    <REGION id="F789" />
  </FEATURE>
</FEATURES>

In this architecture, features start with an 'F', like
   /h_sapiens/v1/F12345
types start with a 'T', like
   /h_sapiens/v1/Tabcde
and segments start with a 'C', like
   /h_sapiens/v1/C1

This is about as compact as I think you can make it, yet it
still fits into the current DAS spec.  (You don't even need
the special character - it only makes it easier to see that
the names/URLs will never collide.)
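
A small Python sketch of how the short ids expand, assuming the features
document was fetched from a made-up host (example.org) and that the ids
are resolved as ordinary relative URLs against the document's own URL.
The elided xmlns:das declaration is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical host; the path matches the example request above.
FEATURES_URL = "http://example.org/h_sapiens/v1/features"

FEATURES_XML = """<FEATURES>
  <FEATURE id="F12345" type="Tabcde">
    <LOC id="C1" range="32:34"/>
    <REGION id="F789" />
  </FEATURE>
</FEATURES>"""


def absolute_ids(features_xml, document_url):
    """Resolve every id in the features document against its URL."""
    root = ET.fromstring(features_xml)
    resolved = []
    for feature in root.iter("FEATURE"):
        resolved.append({
            "feature": urljoin(document_url, feature.get("id")),
            "type": urljoin(document_url, feature.get("type")),
            "locations": [urljoin(document_url, loc.get("id"))
                          for loc in feature.iter("LOC")],
            "regions": [urljoin(document_url, region.get("id"))
                        for region in feature.iter("REGION")],
        })
    return resolved


for entry in absolute_ids(FEATURES_XML, FEATURES_URL):
    print(entry)
```

Because "features" is the last path segment of the document URL, each
short id replaces it, which gives the /h_sapiens/v1/... URLs listed above.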

					Andrew
					dalke at dalkescientific.com


From lstein at cshl.edu  Tue Feb  7 11:51:55 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 7 Feb 2006 11:51:55 -0500
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
Message-ID: <200602071151.56939.lstein@cshl.edu>

Allen's ideas seem very sensible and easy to manage. We can already serve 
associations between genomic features and GO terms via properties, so the 
concerns expressed in the discussion section about the big GO API shouldn't 
apply.

Lincoln

On Tuesday 07 February 2006 10:34, Helt,Gregg wrote:
> I talked to Suzi, she's planning to join our teleconference today to
> discuss ontologies, wearing her hat as co-PI of the National Center for
> Biomedical Ontology.  Hopefully Lincoln can join too.
>
> I took a closer look at the DAS/2 ontology work Allen has done (see
> http://biodas.org/documents/das2/das2_ontology.html).  I urge anyone who
> wants to contribute to the ontology discussion to read this doc.  It
> specifies a way to retrieve ontologies in OBOXML format.  In this format
> each ontology term gets an absolute URI through the same mechanism that
> the rest of DAS/2 uses (URIs for ids, which can be either absolute or
> relative but resolvable).  As Allen pointed out yesterday this would
> solve our problem of how to uniquely specify ontology terms in the DAS/2
> TYPES XML.
>
> I couldn't find any documentation for the OBOXML format, other than the
> code that generates it from OBO files.  But I'm using OBOXML as an
> example here because it clearly has resolvable URIs for each ontology
> term.  In Allen's spec, ontologies can also be returned in other
> formats, but it's unclear to me whether terms in these other formats
> would resolve to similar URIs.
>
> 	gregg
>
> > -----Original Message-----
> > From: das2-bounces at portal.open-bio.org
>
> [mailto:das2-bounces at portal.open-
>
> > bio.org] On Behalf Of Andrew Dalke
> > Sent: Tuesday, February 07, 2006 1:32 AM
> > To: DAS/2
> > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code
> > sprint,6 Feb 2006
> >
> > > gh: would like a re-cast as xml document, hosted at so/sofa
> > > website. that xml would be like a std ontology representation so you
> > > could extend it. so someone could point to an extension of it.
> >
> > I asked as an action item if Gregg would look into the solution
> > for this.  Do we refer to the ontology by a "GO:0123456" identifier
> > or by some URL scheme?  If so, what's the mapping from URL scheme
> > to something that clients and people can understand, eg, to
> > ask for everything which is an exon?
> >
> > Does this mapping need a version number - does it change over time?
> >
> > 					Andrew
> > 					dalke at dalkescientific.com
> >
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at portal.open-bio.org
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From Gregg_Helt at affymetrix.com  Tue Feb  7 11:54:39 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Tue, 7 Feb 2006 08:54:39 -0800
Subject: [DAS2] Proposed agenda for DAS/2 code sprint teleconference,
	Tuesday Feb 7
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9B8@msex02.affymetrix.com>

Vote on how to construct URLs to query for segments, types, features: 
   1.) specified by query_id
   2.) hardwired to ~/segments, ~/types, ~/features
   3.) ?

Status Report

Integrating sequence ontology with DAS/2 (and possibly other ontologies)

Feature properties and queries over properties

MAINTAINER information

Use of xml:base

?


From dalke at dalkescientific.com  Tue Feb  7 12:01:38 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 17:01:38 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <Pine.LNX.4.58.0602070245550.15849@sumo.ctrl.ucla.edu>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
	<Pine.LNX.4.58.0602070245550.15849@sumo.ctrl.ucla.edu>
Message-ID: <5d78aba1ae5117624db08b4dd395c985@dalkescientific.com>

Allen
> The XML is now as you requested, please confirm.

Missing the namespace declaration.  You have


<SOURCES
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xml:base="http://das.biopackages.net/das/genome/">

should be

<SOURCES
       xmlns="http://www.biodas.org/ns/das/genome/2.00"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xml:base="http://das.biopackages.net/das/genome/">
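
For anyone wondering why the missing declaration matters to a client: a
namespace-aware parser reports different element names for the two
forms.  A quick Python check with the standard ElementTree (the helper
name is made up):

```python
import xml.etree.ElementTree as ET

DAS2_NS = "http://www.biodas.org/ns/das/genome/2.00"

WITHOUT_NS = """<SOURCES
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xml:base="http://das.biopackages.net/das/genome/"/>"""

WITH_NS = """<SOURCES
       xmlns="http://www.biodas.org/ns/das/genome/2.00"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xml:base="http://das.biopackages.net/das/genome/"/>"""


def in_das2_namespace(xml_text):
    """True if the root element is SOURCES in the DAS/2 namespace."""
    root = ET.fromstring(xml_text)
    return root.tag == "{%s}SOURCES" % DAS2_NS


print(ET.fromstring(WITHOUT_NS).tag)  # plain "SOURCES", no namespace
print(ET.fromstring(WITH_NS).tag)     # "{namespace}SOURCES"
print(in_das2_namespace(WITHOUT_NS), in_das2_namespace(WITH_NS))
```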

The <PROP> element goes after the CATEGORY.  (Which I want to
rename back to CAPABILITY.)

The ASSEMBLY element no longer exists.

Fixing those by hand,

* file:///Users/dalke/cvses/das/das2/scratch/allen_sources.xml:7:84: 
error: attribute "writeable" not allowed at this point; ignored
* file:///Users/dalke/cvses/das/das2/scratch/allen_sources.xml:7:84: 
error: attribute "taxon" not allowed at this point; ignored

There is no more 'writeable'; that is, IMO, something to be decided
as part of the writeback spec.  It might be that we have a

<CAPABILITY type="writeback" />

and the existence of that indicates writeability.

It's also "taxid" and not "taxon".  I used "taxid" because that's
what NCBI uses for their data.

> There are probably also some other Content-Type headers that need to be
> changed for the other responses -- let me know if you spot them.

Haven't gotten that far yet.

					Andrew
					dalke at dalkescientific.com


From allenday at ucla.edu  Tue Feb  7 12:25:03 2006
From: allenday at ucla.edu (Allen Day)
Date: Tue, 7 Feb 2006 09:25:03 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <5d78aba1ae5117624db08b4dd395c985@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
	<Pine.LNX.4.58.0602070245550.15849@sumo.ctrl.ucla.edu>
	<5d78aba1ae5117624db08b4dd395c985@dalkescientific.com>
Message-ID: <Pine.LNX.4.58.0602070923370.29889@sumo.ctrl.ucla.edu>

On Tue, 7 Feb 2006, Andrew Dalke wrote:

> Allen
> > The XML is now as you requested, please confirm.
> 
> Missing the namespace declaration.  You have
> 
> 
> <SOURCES
>        xmlns:xlink="http://www.w3.org/1999/xlink"
>        xml:base="http://das.biopackages.net/das/genome/">
> 
> should be
> 
> <SOURCES
>        xmlns="http://www.biodas.org/ns/das/genome/2.00"
>        xmlns:xlink="http://www.w3.org/1999/xlink"
>        xml:base="http://das.biopackages.net/das/genome/">

done

> 
> The <PROP> element goes after the CATEGORY.  (Which I want to
> rename back to CAPABILITY.)

done

> 
> The ASSEMBLY element no longer exists.

done

> 
> Fixing those by hand,
> 
> * file:///Users/dalke/cvses/das/das2/scratch/allen_sources.xml:7:84: 
> error: attribute "writeable" not allowed at this point; ignored
> * file:///Users/dalke/cvses/das/das2/scratch/allen_sources.xml:7:84: 
> error: attribute "taxon" not allowed at this point; ignored
> 
> There is no more 'writeable' (that's, IMO) something to be decided
> as part of the writeback spec.  It might be that we have a
> 
> <CAPABILITY type="writeback" />
> 
> and the existence of that indicate writeability.

i have not made the change if this is an IMO.

> 
> It's also "taxid" and not "taxon".  I used "taxid" because that's
> what NCBI uses for their data.

done

-Allen

> 
> > There are probably also some other Content-Type headers that need to be
> > changed for the other responses -- let me know if you spot them.
> 
> Haven't gotten that far yet.
> 
> 					Andrew
> 					dalke at dalkescientific.com
> 
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2
> 


From ap3 at sanger.ac.uk  Tue Feb  7 12:44:41 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Tue, 7 Feb 2006 17:44:41 +0000
Subject: [DAS2] toy - das2 registry
Message-ID: <d04da1c45044d91fdbe7842f3e23f63c@sanger.ac.uk>

Hi!

A "toy" das2 registry, serving das1 servers via das2 responses, can be
accessed at

http://www.spice-3d.org/dasregistry/das2/sources/

I will work on adding the first das2 servers tomorrow.

Andreas

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
			 +44 (0) 1223 49 6891


From cjm at fruitfly.org  Tue Feb  7 12:29:09 2006
From: cjm at fruitfly.org (Chris Mungall)
Date: Tue, 7 Feb 2006 09:29:09 -0800 (PST)
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
Message-ID: <Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>


Hi all

I'm concerned that the XML in the URL below isn't quite Obo-XML, it's
Allen's modified version of it. In particular, the adding of an "id"
attribute which is redundant with the id element, and the modification of
the ID scheme to use slashes instead of :s.

I believe the latter may have been to make the ID scheme more DAS-y?

OBO IDs are composed of a prefix and a local ID. These are always joined
with a :. The prefix can be specified as shortform (eg GO) or a URI
prefix. When the long form is combined with the local ID you get your URI.
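
A minimal Python sketch of that rule.  The prefix table and its URI are
placeholders, not official values, and it assumes "combined" means plain
concatenation of the long-form prefix and the local ID.

```python
# Hypothetical prefix table; the URI prefix is a placeholder, not an
# official OBO or GO value.
URI_PREFIXES = {
    "GO": "http://example.org/ontology/GO#",
}


def split_obo_id(obo_id):
    """Split "GO:0123456" into its prefix and local ID at the first ':'."""
    prefix, separator, local_id = obo_id.partition(":")
    if not separator or not prefix or not local_id:
        raise ValueError("not a prefix:local_id identifier: %r" % obo_id)
    return prefix, local_id


def to_uri(obo_id, prefixes=URI_PREFIXES):
    """Swap the short prefix for its long form and append the local ID."""
    prefix, local_id = split_obo_id(obo_id)
    return prefixes[prefix] + local_id


print(split_obo_id("GO:0123456"))
print(to_uri("GO:0123456"))
```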

If DAS wants to use a modified version of Obo-XML, that's fine, but please
don't call it Obo-XML, it will cause huge confusion!

I would much prefer if you used Obo-XML as it is - if there are things
you'd like to see changed about the format we can perhaps work that out.
I'm concerned by the change of the ID to use / instead of :. This is wrong,
and if it's something that's required for DAS, how will you interoperate
with RDF etc?

In fact there are other parts where the xml is definitely not Obo-XML - it
looks like Allen has coded these by hand rather than taking existing XML.
That's fine, but it should be marked as such. For example, there is no
develops_from element in Obo-XML; all relations bar is_a are encoded as
relationship elements.

There is a DTD at the moment
http://www.godatabase.org/dev/xml/dtd

The docs are minimal as the explanation of all the fields is in the docs
for the obo text file format
http://www.godatabase.org/dev/doc/obo_format_spec.{html,txt,pdf}

We'll be converting to RNG+XSD soon

You can get Obo-XML examples from
http://www.fruitfly.org/~cjm/obo-download

You can see the default rule for creating a URI in the OWL files; these
currently all get the geneontology.org URI prefix by default, but this
will change (we were going to use LSIDs but the majority of OWL tools
don't seem to handle URNs very well)

As far as DAS/2 supporting different file formats, Obo-XML and RDFS/OWL
would seem to be the natural contenders. We currently go from the former
to the latter via a simple XSLT, the reverse transformation is a little
more difficult.

Allen has inlined some comments from an email exchange with me in the
document. I agree about keeping the API minimal. On the other hand you
will need at least some inferencing machinery - I'd encourage you to reuse
existing reasoning services here.

Cheers
Chris

On Tue, 7 Feb 2006, Helt,Gregg wrote:

> I talked to Suzi, she's planning to join our teleconference today to
> discuss ontologies, wearing her hat as co-PI of the National Center for
> Biomedical Ontology.  Hopefully Lincoln can join too.
>
> I took a closer look at the DAS/2 ontology work Allen has done (see
> http://biodas.org/documents/das2/das2_ontology.html).  I urge anyone who
> wants to contribute to the ontology discussion to read this doc.  It
> specifies a way to retrieve ontologies in OBOXML format.  In this format
> each ontology term gets an absolute URI through the same mechanism that
> the rest of DAS/2 uses (URIs for ids, which can be either absolute or
> relative but resolvable).  As Allen pointed out yesterday this would
> solve our problem of how to uniquely specify ontology terms in the DAS/2
> TYPES XML.
>
> I couldn't find any documentation for the OBOXML format, other than the
> code that generates it from OBO files.  But I'm using OBOXML as an
> example here because it clearly has resolvable URIs for each ontology
> term.  In Allen's spec, ontologies can also be returned in other
> formats, but it's unclear to me whether terms in these other formats
> would resolve to similar URIs.
>
> 	gregg
>
> > -----Original Message-----
> > From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-
> > bio.org] On Behalf Of Andrew Dalke
> > Sent: Tuesday, February 07, 2006 1:32 AM
> > To: DAS/2
> > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code
> > sprint,6 Feb 2006
> >
> > > gh: would like a re-cast as xml document, hosted at so/sofa
> > > website. that xml would be like a std ontology representation so you
> > > could extend it. so someone could point to an extension of it.
> >
> > I asked as an action item if Gregg would look into the solution
> > for this.  Do we refer to the ontology by a "GO:0123456" identifier
> > or by some URL scheme?  If so, what's the mapping from URL scheme
> > to something that clients and people can understand, eg, to
> > ask for everything which is an exon?
> >
> > Does this mapping need a version number - does it change over time?
> >
> > 					Andrew
> > 					dalke at dalkescientific.com
> >
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at portal.open-bio.org
>
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2
>


From cjm at fruitfly.org  Tue Feb  7 12:32:24 2006
From: cjm at fruitfly.org (chris mungall)
Date: Tue, 7 Feb 2006 09:32:24 -0800
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <200602071151.56939.lstein@cshl.edu>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
	<200602071151.56939.lstein@cshl.edu>
Message-ID: <afbcf2eb5f143730a0d0a64c70595d44@fruitfly.org>


What inferencing rules do you use for fetching features by their 
Ontology_terms?

On Feb 7, 2006, at 8:51 AM, Lincoln Stein wrote:

> Allen's ideas seem very sensible and easy to manage. We can already 
> serve
> associations between genomic features and GO terms via properties, so 
> the
> concerns expressed in the discussion section about the big GO API 
> shouldn't
> apply.
>
> Lincoln
>
> On Tuesday 07 February 2006 10:34, Helt,Gregg wrote:
>> I talked to Suzi, she's planning to join our teleconference today to
>> discuss ontologies, wearing her hat as co-PI of the National Center 
>> for
>> Biomedical Ontology.  Hopefully Lincoln can join too.
>>
>> I took a closer look at the DAS/2 ontology work Allen has done (see
>> http://biodas.org/documents/das2/das2_ontology.html).  I urge anyone 
>> who
>> wants to contribute to the ontology discussion to read this doc.  It
>> specifies a way to retrieve ontologies in OBOXML format.  In this 
>> format
>> each ontology term gets an absolute URI through the same mechanism 
>> that
>> the rest of DAS/2 uses (URIs for ids, which can be either absolute or
>> relative but resolvable).  As Allen pointed out yesterday this would
>> solve our problem of how to uniquely specify ontology terms in the 
>> DAS/2
>> TYPES XML.
>>
>> I couldn't find any documentation for the OBOXML format, other than 
>> the
>> code that generates it from OBO files.  But I'm using OBOXML as an
>> example here because it clearly has resolvable URIs for each ontology
>> term.  In Allen's spec, ontologies can also be returned in other
>> formats, but it's unclear to me whether terms in these other formats
>> would resolve to similar URIs.
>>
>> 	gregg
>>
>>> -----Original Message-----
>>> From: das2-bounces at portal.open-bio.org
>>
>> [mailto:das2-bounces at portal.open-
>>
>>> bio.org] On Behalf Of Andrew Dalke
>>> Sent: Tuesday, February 07, 2006 1:32 AM
>>> To: DAS/2
>>> Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code
>>> sprint,6 Feb 2006
>>>
>>>> gh: would like a re-cast as xml document, hosted at so/sofa
>>>> website. that xml would be like a std ontology representation so you
>>>> could extend it. so someone could point to an extension of it.
>>>
>>> I asked as an action item if Gregg would look into the solution
>>> for this.  Do we refer to the ontology by a "GO:0123456" identifier
>>> or by some URL scheme?  If so, what's the mapping from URL scheme
>>> to something that clients and people can understand, eg, to
>>> ask for everything which is an exon?
>>>
>>> Does this mapping need a version number - does it change over time?
>>>
>>> 					Andrew
>>> 					dalke at dalkescientific.com
>>>
>>> _______________________________________________
>>> DAS2 mailing list
>>> DAS2 at portal.open-bio.org
>>
>> _______________________________________________
>> DAS2 mailing list
>> DAS2 at portal.open-bio.org
>> http://portal.open-bio.org/mailman/listinfo/das2
>
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2


From dalke at dalkescientific.com  Tue Feb  7 13:40:56 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 18:40:56 +0000
Subject: [DAS2] category -> capability
Message-ID: <98a28be1166142c23be61650f51b66ae@dalkescientific.com>

I've made the commit.  The element

SOURCES/SOURCE/VERSION/CATEGORY

  is now (in some shallow and some deep sense) back to

SOURCES/SOURCE/VERSION/CAPABILITY


					Andrew
					dalke at dalkescientific.com


From Gregg_Helt at affymetrix.com  Tue Feb  7 14:00:40 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Tue, 7 Feb 2006 11:00:40 -0800
Subject: [DAS2] Working with xml:base in Java?
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9B9@msex02.affymetrix.com>


	Thomas, I'm wondering what toolkits you're using for binding XML
to Java objects?  And particularly how you are dealing with resolving
URIs when xml:base is used.  So far I've mostly used various
implementations of SAX and DOM -- I've found some reports of builtin
xml:base support in Xerces SAX/DOM, but it's still unclear.

	I've been avoiding the issue up till now.  It won't be too hard
to implement URI resolution relative to xml:base, but I thought I'd
check around first and see if there's automated support of this in some
toolkit.
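
	For reference, the resolution logic itself is small.  Here is a
rough sketch in Python (not Java, and not tied to any toolkit) that
walks a tree, tracks the inherited xml:base, and resolves an "id"
attribute against it.  The sample document and the host are made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the reserved XML namespace.
XML_BASE_ATTR = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical document with nested xml:base attributes.
DOC = """<SOURCES xml:base="http://example.org/das/genome/">
  <SOURCE xml:base="h_sapiens/">
    <VERSION id="v1"/>
  </SOURCE>
</SOURCES>"""


def iter_with_base(element, inherited_base=""):
    """Yield (element, effective base URI) for the element and descendants."""
    # An element's own xml:base is itself resolved against the inherited one.
    base = urljoin(inherited_base, element.get(XML_BASE_ATTR, ""))
    yield element, base
    for child in element:
        yield from iter_with_base(child, base)


def resolved_ids(xml_text):
    """Return {tag: absolute id} for every element carrying an id."""
    root = ET.fromstring(xml_text)
    return {
        element.tag: urljoin(base, element.get("id"))
        for element, base in iter_with_base(root)
        if element.get("id") is not None
    }


print(resolved_ids(DOC))
```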

	Thanks,
	Gregg


From dalke at dalkescientific.com  Tue Feb  7 14:11:09 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 19:11:09 +0000
Subject: [DAS2] toy - das2 registry
In-Reply-To: <d04da1c45044d91fdbe7842f3e23f63c@sanger.ac.uk>
References: <d04da1c45044d91fdbe7842f3e23f63c@sanger.ac.uk>
Message-ID: <551a60258c89cd953f35c6a4450a444d@dalkescientific.com>

Andreas Prlic wrote:
> A  "toy" das2 registry serving das1 servers,  via das2 responses can 
> be accessed at
>
> http://www.spice-3d.org/dasregistry/das2/sources/
>
> I will work on adding the first das2 servers tomorrow.

There are differences between this and the spec.  These are

"CATEGORY" -> "CAPABILITY"

Andreas knew that but didn't get it changed before having
to head out for a bit.

"testcode" should be "test_range" - it was added this afternoon
but I changed the name on Andreas.  (He agreed to the change.)

   # this is a range string (eg, "Chr1/1:100" or "CloneABC123/500:599")
   # used in an "inside=" feature query.  It is used by the registry
   # server when doing a heartbeat check.
   attribute test_range { text }?,

The underlying problem is that a web server can be up while
the back-end database is down.  While a server should report
that as an error, sadly that's not always the case.  This
test_range is used by Andreas' registry server in a periodic
feature query.  It should return a "reasonable" number of
features.

I decided to make it part of the spec for two reasons:
  - it simplifies auto-fill-in during registry discovery
  - clients can also use it to query the server and see if
      it's really alive or if it really means to return
      an empty list of features all the time.

It is optional.
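
A sketch of what such a heartbeat check could look like in Python.  This
is not Andreas' actual registry code: the function names and host are
made up, "at least one feature" stands in for a "reasonable" number, and
the network fetch is left out so that only the URL building and the
response check are shown.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def heartbeat_url(features_query_url, test_range):
    """Build the "inside=" feature query for the advertised test_range."""
    # Keep "/" and ":" readable instead of percent-encoding them.
    return features_query_url + "?" + urlencode({"inside": test_range}, safe="/:")


def looks_alive(features_xml):
    """True if the response parses and holds at least one FEATURE element."""
    try:
        root = ET.fromstring(features_xml)
    except ET.ParseError:
        return False
    # Compare local names so the check works with or without a namespace.
    count = sum(1 for element in root.iter()
                if element.tag.rsplit("}", 1)[-1] == "FEATURE")
    return count > 0


print(heartbeat_url("http://example.org/h_sapiens/v1/features", "Chr1/1:100"))
print(looks_alive('<FEATURES><FEATURE id="F1"/></FEATURES>'))
print(looks_alive("<FEATURES/>"))
```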


The MAINTAINER "name" was required.  Andreas has examples where
there is only an email address and wants the name to be optional.
So now "name", "email" and "href" are all optional.  I would
like to require that at least one be provided.

Finally, the "taxid" in the COORDINATES is optional.  The
RNG schema thought it was mandatory.

I've updated the schemas and the spec for the last two.  Committed.

Looks like I'll be spending most of tomorrow updating the
rest of the spec document.

I got a copy of Andreas' document and edited it to meet the
current spec and I've checked it in under
   "scratch/registry_sources.xml"
Feel free to test it out with your parsers.

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Tue Feb  7 14:28:49 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 19:28:49 +0000
Subject: [DAS2] format version
Message-ID: <4cd0c60fb7871ad6a70ad2b25cb73406@dalkescientific.com>

Just committed to the spec.  If I'm wrong and the version number
proves useful, I'll make it less snarky.  :)


This document defines several new content-types.  These are

application/x-das-sources+xml
application/x-das-features+xml
application/x-das-types+xml
application/x-das-segments+xml

A server may supply an optional "version" value for the Content-Type,
to specify which version of the specification it provides.  This is
(at present and unless others can convince me otherwise) meant to be
used only during this period of specification development while things
are in flux.  A client can look at the version string and use an
appropriate reader to handle it.

Example:

   Content-Type: application/x-das-types+xml; version=1

The list of versions is as follows:

   601071920:  this version

The versions will be increasing integers.  The format will be
"YMMDDHHMM" where "Y" is the year - 2005.  (This makes it a 32 bit
integer, in case you were wondering.)  There's no way this spec will
be in flux in 4 years time.
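
A client-side sketch in Python of reading that optional parameter.  The
function name is made up, and the parsing is deliberately simple
(split on ";"), not a full HTTP header parser.

```python
def parse_content_type(header_value):
    """Split a Content-Type value into (media type, {parameter: value})."""
    media_type, *raw_params = [part.strip() for part in header_value.split(";")]
    params = {}
    for raw in raw_params:
        name, separator, value = raw.partition("=")
        if separator:
            params[name.strip().lower()] = value.strip().strip('"')
    return media_type.lower(), params


media_type, params = parse_content_type(
    "application/x-das-types+xml; version=1")
print(media_type)
print(params.get("version"))  # None when the server sends no version

# A client could then pick a reader based on the pair, for example:
if media_type == "application/x-das-types+xml":
    print("types document, spec version:", params.get("version", "unspecified"))
```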

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Tue Feb  7 14:14:15 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 19:14:15 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <Pine.LNX.4.58.0602070923370.29889@sumo.ctrl.ucla.edu>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
	<Pine.LNX.4.58.0602070245550.15849@sumo.ctrl.ucla.edu>
	<5d78aba1ae5117624db08b4dd395c985@dalkescientific.com>
	<Pine.LNX.4.58.0602070923370.29889@sumo.ctrl.ucla.edu>
Message-ID: <7ca6d9841d7c3c334589da147c38de53@dalkescientific.com>

>> There is no more 'writeable' (that's, IMO) something to be decided
>> as part of the writeback spec.  It might be that we have a

> i have not made the change if this is an IMO.

Okay.  There is no "writeable".  The writeability is determined
by the <CAPABILITY> element.  If there is a CAPABILITY with
a type == "locks" then the server is (potentially) writeable
in the same way that "writeable='yes'" means that it's writeable.

Anyone else have an O?

					Andrew
					dalke at dalkescientific.com


From ed_erwin at affymetrix.com  Tue Feb  7 15:46:01 2006
From: ed_erwin at affymetrix.com (Ed Erwin)
Date: Tue, 07 Feb 2006 12:46:01 -0800
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <7ca6d9841d7c3c334589da147c38de53@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
	<Pine.LNX.4.58.0602070245550.15849@sumo.ctrl.ucla.edu>
	<5d78aba1ae5117624db08b4dd395c985@dalkescientific.com>
	<Pine.LNX.4.58.0602070923370.29889@sumo.ctrl.ucla.edu>
	<7ca6d9841d7c3c334589da147c38de53@dalkescientific.com>
Message-ID: <43E90709.6060602@affymetrix.com>

This is something we should discuss when we discuss the 'writeable' 
parts of the spec.  But in my opinion, 'writeable' and 'lockable' are 
two separate <CAPABILITY>'s.  I see no reason not to allow some 
implementers to develop simple servers that are writeable but don't 
implement a locking mechanism.  Large public servers may want locking, 
but I'd bet that a non-locking server would very rarely lead to 
problems, especially in small projects.

(If the server is non-locking, the client could add a little more logic 
to check that nothing has changed since the last retrieval before doing 
a commit.)
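
The client-side check suggested above could look roughly like the sketch below. This is only an illustration, not anything from the DAS/2 spec: `ToyServer`, its per-feature version counter, and `commit_if_unchanged` are hypothetical names. A small window remains between the check and the write, which only a real lock or a server-side conditional write would close.

```python
class ToyServer:
    """Hypothetical in-memory stand-in for a writeable, non-locking server."""

    def __init__(self):
        self._store = {}  # feature id -> (value, version)

    def get(self, feature_id):
        # Returns (value, version); unknown features report version 0.
        return self._store.get(feature_id, (None, 0))

    def put(self, feature_id, value):
        # Every write bumps the version counter for that feature.
        _, version = self.get(feature_id)
        self._store[feature_id] = (value, version + 1)


def commit_if_unchanged(server, feature_id, seen_version, new_value):
    """Write only if the feature still has the version the client last saw."""
    _, current_version = server.get(feature_id)
    if current_version != seen_version:
        return False  # changed since last retrieval; client should re-fetch
    server.put(feature_id, new_value)
    return True
```

A client would remember the version it got at retrieval time and pass it back as `seen_version` when committing.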

Andrew Dalke wrote:
>>> There is no more 'writeable' (that's, IMO) something to be decided
>>> as part of the writeback spec.  It might be that we have a
> 
> 
>> i have not made the change if this is an IMO.
> 
> 
> Okay.  There is no "writeable".  The writeability is determined
> by the <CAPABILITY> element.  If there is a CAPABILITY with
> a type == "locks" then the server is (potentially) writeable
> in the same way that "writeable='yes'" means that it's writeable.
> 
> Anyone else have an O?
> 
>                     Andrew
>                     dalke at dalkescientific.com
> 
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2


From allenday at ucla.edu  Tue Feb  7 16:20:53 2006
From: allenday at ucla.edu (Allen Day)
Date: Tue, 7 Feb 2006 13:20:53 -0800 (PST)
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
	<Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>
Message-ID: <Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>

Hi Chris,

On Tue, 7 Feb 2006, Chris Mungall wrote:

> 
> Hi all
> 
> I'm concerned that the XML in the URL below isn't quite Obo-XML, it's
> Allen's modified version of it. In particular, the adding of an "id"
> attribute which is redundant with the id element, and the modification of
> the ID scheme to use slashes instead of :s.
> 
> I believe the latter may have been to make the ID scheme more DAS-y?

The slash was introduced to take advantage of xml:base and the
hierarchical relationship between namespaces and terms, e.g.

  xml:base="/das/ontology/obo/1/ontology" + id="SO/0000001"

is equivalent to:

  /das/ontology/obo/1/ontology/SO/0000001

If we want the identifier to be SO:0000001, it means that we have to make
xml:base="/das/ontology/obo/1/ontology/SO".  This is problematic for two 
reasons:

  1) multiple xml:base cannot be defined for the entire document, meaning
     that URIs for other records referenced become very long.

  2) different ontologies cannot use the same xml:base

The only way I see out of this ATM is to treat : as a / internal to the
Ontology-DAS service.
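
For reference, the sketch below shows how a generic RFC 3986 resolver combines a base with a relative id, and how a colon-style id might be mapped to the slash style. It uses Python's `urllib.parse.urljoin`. The host `example.org` and the helper names `resolve` and `obo_id_to_path` are assumptions made for illustration only. Note that the result depends on whether the base ends in a slash: without one, the last path segment of the base is replaced rather than extended.

```python
from urllib.parse import urljoin

HOST = "http://example.org"  # hypothetical host, only to make the URIs absolute


def resolve(base_path, relative_id):
    """Resolve a relative id against a base, as a generic URI resolver would."""
    return urljoin(HOST + base_path, relative_id)


def obo_id_to_path(obo_id):
    """Map 'SO:0000001' to 'SO/0000001' (treating ':' as '/' internally)."""
    return obo_id.replace(":", "/", 1)


# Base without a trailing slash: the final segment "ontology" is dropped.
no_slash = resolve("/das/ontology/obo/1/ontology", obo_id_to_path("SO:0000001"))

# Base with a trailing slash: the id is appended below "ontology/".
with_slash = resolve("/das/ontology/obo/1/ontology/", obo_id_to_path("SO:0000001"))

print(no_slash)
print(with_slash)
```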

> OBO IDs are composed of a prefix and a local ID. These are always joined
> with a :. The prefix can be specified as shortform (eg GO) or a URI
> prefix. When the long form is combined with the local ID you get your URI.
> 
> If DAS wants to use a modified version of Obo-XML, that's fine, but please
> don't call it Obo-XML, it will cause huge confusion!
> 
> I would much prefer if you used Obo-XML as it is - if there are things
> you'd like to see changed about the format we can perhaps work that out.
> I'm concerned by the changing the ID to use / instead of :. This is wrong,
> and if it's something that's required for DAS, how will you interoperate
> with RDF etc?
> 
> In fact there are other parts where the xml is definitely not Obo-XML - it
> looks like Allen has coded these by hand rather than taking existing XML.
> That's fine, but it should be marked as such. For example, there is no
> develops_from element in Obo-XML; all relations bar is_a are encoded as
> relationship elements.

The XML provided by the Ontology-DAS server is using templates to mark up
ontology records that have been loaded to a chado database using
perl-go-perl.  The develops_from node, IIRC, was created because there is
a section in a perl-go-perl .xslt that creates elements for all
relationship types.

> 
> There is a DTD at the moment
> http://www.godatabase.org/dev/xml/dtd

This didn't exist at the time I wrote my templates ( 4-6 months ago), or I
would have validated.

-Allen


> 
> The docs are minimal as the explanation of all the fields is in the docs
> for the obo text file format
> http://www.godatabase.org/dev/doc/obo_format_spec.{html,txt,pdf}
> 
> We'll be converting to RNG+XSD soon
> 
> You can get Obo-XML examples from
> http://www.fruitfly.org/~cjm/obo-download
> 
> You can see the default rule for creating a URI in the OWL files; these
> currently all get the geneontology.org URI prefix by default, but this
> will change (we were going to use LSIDs but the majority of OWL tools
> don't seem to handle URNs very well)
> 
> As far as DAS/2 supporting different file formats, Obo-XML and RDFS/OWL
> would seem to be the natural contenders. We currently go from the former
> to the latter via a simple XSLT, the reverse transformation is a little
> more difficult.
> 
> Allen has inlined some comments from an email exchange with me in the
> document. I agree about keeping the API minimal. On the other hand you
> will need at least some inferencing machinery - I'd encourage you to reuse
> existing reasoning services here.
> 
> Cheers
> Chris
> 
> On Tue, 7 Feb 2006, Helt,Gregg wrote:
> 
> > I talked to Suzi, she's planning to join our teleconference today to
> > discuss ontologies, wearing her hat as co-PI of the National Center for
> > Biomedical Ontology.  Hopefully Lincoln can join too.
> >
> > I took a closer look at the DAS/2 ontology work Allen has done (see
> > http://biodas.org/documents/das2/das2_ontology.html).  I urge anyone who
> > wants to contribute to the ontology discussion to read this doc.  It
> > specifies a way to retrieve ontologies in OBOXML format.  In this format
> > each ontology term gets an absolute URI through the same mechanism that
> > the rest of DAS/2 uses (URIs for ids, which can be either absolute or
> > relative but resolvable).  As Allen pointed out yesterday this would
> > solve our problem of how to uniquely specify ontology terms in the DAS/2
> > TYPES XML.
> >
> > I couldn't find any documentation for the OBOXML format, other than the
> > code that generates it from OBO files.  But I'm using OBOXML as an
> > example here because it clearly has resolvable URIs for each ontology
> > term.  In Allen's spec, ontologies can also be returned in other
> > formats, but it's unclear to me whether terms in these other formats
> > would resolve to similar URIs.
> >
> > 	gregg
> >
> > > -----Original Message-----
> > > From: das2-bounces at portal.open-bio.org
> > [mailto:das2-bounces at portal.open-
> > > bio.org] On Behalf Of Andrew Dalke
> > > Sent: Tuesday, February 07, 2006 1:32 AM
> > > To: DAS/2
> > > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code
> > > sprint,6 Feb 2006
> > >
> > > > gh: would like a re-cast as xml document, hosted at so/sofa
> > > > website. that xml would be like a std ontology representation so you
> > > > could extend it. so someone could point to an extension of it.
> > >
> > > I asked as an action item if Gregg would look into the solution
> > > for this.  Do we refer to the ontology by a "GO:0123456" identifier
> > > or by some URL scheme?  If so, what's the mapping from URL scheme
> > > to something that clients and people can understand, eg, to
> > > ask for everything which is an exon?
> > >
> > > Does this mapping need a version number - does it change over time?
> > >
> > > 					Andrew
> > > 					dalke at dalkescientific.com
> > >
> > > _______________________________________________
> > > DAS2 mailing list
> > > DAS2 at portal.open-bio.org
> >
> >
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/das2
> >
> 
> 
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2
> 


From cjm at fruitfly.org  Tue Feb  7 16:59:12 2006
From: cjm at fruitfly.org (chris mungall)
Date: Tue, 7 Feb 2006 13:59:12 -0800
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
	<Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>
	<Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>
Message-ID: <e00e781866375762b29061d2b510a10e@fruitfly.org>


On Feb 7, 2006, at 1:20 PM, Allen Day wrote:

> Hi Chris,
>
> On Tue, 7 Feb 2006, Chris Mungall wrote:
>
>>
>> Hi all
>>
>> I'm concerned that the XML in the URL below isn't quite Obo-XML, it's
>> Allen's modified version of it. In particular, the adding of an "id"
>> attribute which is redundant with the id element, and the 
>> modification of
>> the ID scheme to use slashes instead of :s.
>>
>> I believe the latter may have been to make the ID scheme more DAS-y?
>
> The slash was introduced to take advantage of xml:base and the
> hierarchical relationship between namespaces and terms, e.g.
>
>   xml:base="/das/ontology/obo/1/ontology" + id="SO/0000001"
>
> is equivalent to:
>
>   /das/ontology/obo/1/ontology/SO/0000001

it's actually equivalent to:
/das/ontology/obo/1/ontologySO/0000001

> If we want the identifier to be SO:0000001, it means that we have to 
> make
> xml:base="/das/ontology/obo/1/ontology/SO.  This is problematic for two
> reasons:
>
>   1) multiple xml:base cannot be defined for the entire document, 
> meaning
>      that URIs for other records referenced become very long.

Why not just define a qname for every idspace? This is the standard way 
of doing this in XML

Using xml:base is not a big gain for brevity, since fairly soon some 
obo ontologies will reference other obo ontologies.

In fact, is this even an issue if you get rid of the id attribute to 
conform to obo-xml? ids in obo-xml are encoded as elements, so xml:base 
rules are not applied. Obo has its own rules for ID generation. This 
has the arguable disadvantage that we can't directly use xml:base and 
the whole xml namespace system for OBO IDs, we layer our own system on 
top. This is actually preferable for us.
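
The "prefix per idspace" idea can be sketched as a simple lookup table. This is a minimal sketch under stated assumptions: the URI prefixes in `IDSPACE_PREFIXES` are placeholders, not the real OBO or DAS/2 prefixes, and `expand_obo_id` is a hypothetical helper.

```python
# Hypothetical mapping from short idspace to long-form URI prefix.
IDSPACE_PREFIXES = {
    "SO": "http://example.org/obo/SO#",
    "GO": "http://example.org/obo/GO#",
}


def expand_obo_id(obo_id, prefixes=IDSPACE_PREFIXES):
    """Turn a short-form id such as 'SO:0000001' into a full URI."""
    idspace, sep, local_id = obo_id.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a prefix:local id: {obo_id!r}")
    if idspace not in prefixes:
        raise ValueError(f"unknown idspace: {idspace!r}")
    # Long-form prefix plus local id gives the URI.
    return prefixes[idspace] + local_id
```

Because each idspace carries its own prefix, ontologies that reference each other do not need to share a single base.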

>   2) different ontologies cannot use the same xml:base
>
> The only way I see out of this ATM is to treat : as a / internal to the
> Ontology-DAS service.

I'm still not sure what the problem is, and I think you may be stuck 
anyway when it comes to RDF/OWL ontologies

>
>> OBO IDs are composed of a prefix and a local ID. These are always 
>> joined
>> with a :. The prefix can be specified as shortform (eg GO) or a URI
>> prefix. When the long form is combined with the local ID you get your 
>> URI.
>>
>> If DAS wants to use a modified version of Obo-XML, that's fine, but 
>> please
>> don't call it Obo-XML, it will cause huge confusion!
>>
>> I would much prefer if you used Obo-XML as it is - if there are things
>> you'd like to see changed about the format we can perhaps work that 
>> out.
>> I'm concerned by the changing the ID to use / instead of :. This is 
>> wrong,
>> and if it's something that's required for DAS, how will you 
>> interoperate
>> with RDF etc?
>>
>> In fact there are other parts where the xml is definitely not Obo-XML 
>> - it
>> looks like Allen has coded these by hand rather than taking existing 
>> XML.
>> That's fine, but it should be marked as such. For example, there is no
>> develops_from element in Obo-XML; all relations bar is_a are encoded 
>> as
>> relationship elements.
>
> The XML provided by the Ontology-DAS server is using templates to mark 
> up
> ontology records that have been loaded to a chado database using
> perl-go-perl.  The develops_from node, IIRC, was created because there 
> is
> a section in a perl-go-perl .xslt that creates elements for all
> relationship types.

hmmm, I don't think so, but the point is moot anyway, just so long as 
the final version uses xml that validates, either against obo-xml or 
your own documented variant

>
>>
>> There is a DTD at the moment
>> http://www.godatabase.org/dev/xml/dtd
>
> This didn't exist at the time I wrote my templates ( 4-6 months ago), 
> or I
> would have validated.

it did, it's just not well signposted! sorry about that

look forward to seeing a demo. I do think you have to work out the 
semantics of retrieval by ontology term though.

cheers
chris


From Steve_Chervitz at affymetrix.com  Tue Feb  7 19:30:52 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Tue, 07 Feb 2006 16:30:52 -0800
Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint,
 7 Feb 2006
Message-ID: <C00E7BBC.1BC8A%Steve_Chervitz@affymetrix.com>

Notes from the DAS/2 teleconference for the code sprint, 7 Feb 2006

$Id: das2-teleconf-2006-02-07.txt,v 1.1 2006/02/08 00:37:41 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed E., Gregg Helt
  Sanger: Andreas Prlic, Thomas Down
  Sweden: Andrew Dalke
  UC Berkeley: Nomi Harris, Suzi Lewis
  UCLA: Allen Day
        
Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


Agenda:
* Vote on constructing URLs/URIs to query segments, types, features
* Status report from people
* Ontologies
* Feat property changes

Topic: Constructing URLs/URIs to query segments, types, features
----------------------------------------------------------------
1.) specified by query_id
2.) hardwired to ~/segments, ~/types, ~/features
3.) ?

ad: lots of people have left here so the vote won't include all.
see email why a query url is useful
agree w/ gregg: short names could be a nice to have.
shouldn't have to worry about how you organize your urls
gh: yes it does: this/types this/segments etc.
ad: can take it out if there's confusion
gh: recommended structure is good.
ee/gh: people will look at the examples and do it that way. they won't
look at .rnc file
gh: make it clearer in the spec that these are merely suggestions of the
hierarchy, you don't have to do it this way.

ad: roy's view: likes the query id url for doing search for all
features, or all types.
query id is the url used to do search against features.
uri could be relative or absolute.
gh: category element defines a query id for a subset of das.
it's the attribute query id in the category

ad: I also want to rename category back to capability.
how do we arrange urls in a versioned source.
construction off of strings or via attributes in a url
gh: votes for hardwired, but feels less strong today about it.
ad: majority vote is for query id, spec czar goes with that.

[A] query id
[A] andrew will update spec to have less mention of hierarchical structure
[A] allen will update server to do it the recommended way

gh: in addition to have an arbitrary query id to get segments, types,
features, there's a recommended way to do it via the hierarchy. server
should do it the recommended way (hierarchy)

ee: we should be flexible about it.
gh/ad: ok take out recommendation.
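
Sketch of the "query id" approach from a client's point of view: look up the query URL in the capability elements instead of building it from a hardwired path. The element and attribute names (`CAPABILITY`, `type`, `query_id`), the sample document and the `example.org` URLs are assumptions based on these notes, not the final spec.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical versioned-source document; real DAS/2 markup may differ.
SAMPLE = """\
<VERSION>
  <CAPABILITY type="segments" query_id="segments"/>
  <CAPABILITY type="types" query_id="http://other.example.org/types"/>
  <CAPABILITY type="features" query_id="feature-search"/>
</VERSION>
"""


def query_uris(xml_text, document_uri):
    """Map each capability type to its absolute query URL.

    A query id may be relative or absolute; relative ones are resolved
    against the URI the document was fetched from.
    """
    root = ET.fromstring(xml_text)
    return {
        cap.get("type"): urljoin(document_uri, cap.get("query_id"))
        for cap in root.iter("CAPABILITY")
    }


uris = query_uris(SAMPLE, "http://example.org/das/genome/volvox/1/")
```

With this, the server is free to organize its URLs however it likes; the client never assumes a `this/types` or `this/segments` layout.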

Topic: Status reports
---------------------

ad: see his emails.
gh: we need examples in spec document and scratch to be better
synchronized.
ad: should be, i've been trying to keep these in sync.
gh: plan to push into html, incorporate scratch into doc?
ad: yes, eventually.
will also add andreas' work to scratch too.

td: java xml binding libraries, how to put it into a workable server
ap: das registry, sources command, attribute handling, people can
connect to a publicly available toy server.
gh: registry will respond?
ap: yes. toy server, toy data like das1, returning sources command.
gh: can you add allen's codesprint server? consider it registered.
ap: is fully working?
gh: can allen send a command to it to register it?
ap: no.
gh: would like to tell my client to do discovery rather than hard
wiring.

gh: commits to igb das/2 client to handle seq, segment, types. not
features query yet. given decision about url construction, can do this
fast so we can test on codesprint server seq, seg, types to bring up
something meaningful in gui. not features by today. affy das/2 server
is running behind. will sync up today as well.

nh: apollo working out sequence, segment, types request. now does
versioned sources. integrating those into query gui as well.

aday: changes early this am. server running under /codesprint is now a
static doc pointing back to the old server. adding segment command,
merging region and seq command. has made everything except
capabilities writeback stuff.
ad: there's another request recently, see my email.
aday: have gotten 40 emails from you in the last day!

aday: brian oconnor is working on bundling dependencies for an rpm
based release.
gh: I also did significant refactoring/moving assay/ontology stuff
into subclasses on client side. haven't seen brian's code, but should
run fine. 

Topic: Integrating Sequence Ontology with DAS/2
-----------------------------------------------

suzi: national center for biomedical ontology, one of 7
natl centers for biomedical computing. focus on needs regarding
developing and using ontologies.

gh: hoping to have a typing system in das/2 via types queries that
references SO but doesn't require client to fully understand
ontologies. too much of a burden. that's the challenge. this
translates into referring to ontology terms as opaque uris
suzi: 'understands' means they're ignoring any relationships between
types. 
gh: yes.
currently type has attrib for id, attrib for ontology.
ad: uri or arbitrary string
suzi: can use uri or string, preprocessed
ad: one or the other
gh: prefers uri
suzi: from uri you can get the string
gh: not clear how to construct uri for particular terms in an ontology
doc
suzi: this will happen in next few months. talking with daniel rubin
about this.
gh: this is where allen comes in. ontology das.
aday: next step is getting it hosted on NCBO server.
currently communicating with chris mungall. said they're planning on
implementing something similar soon, not sure if they'd accept allen's
solution. unclear.
working with gavin sherlock on ontology support for microarray samples,
tissue type, phenotype. was hoping people could pick this up and use
it. 
suzi: gavin and I could help push this.
gh: chris m posted concerns that the obo xml in allen's scheme isn't
the same as what he's using. re: how you make absolute uris.
aday: there's not much docs on obo xml format. did the best I could.
suzi: should be able to sort it out. just an inertia problem of
getting it installed. not a competition issue. fine with me. not
difficult?
aday: by end of week we'll have an rpm.
suzi: let's keep pushing on this to make it happen. I'll talk to gavin
tomorrow. can we install on sf site, or do we need to set it up
elsewhere?
aday: could conceivably set up a cgi on sf. uses custom apache
handler tho.

gh: more ontology q's can wait till tomorrow w/ lincoln.
concern: how do we deal w/ types that represent more
than one ontology term. defer discussion till tomorrow.

Topic: Feature Properties
-------------------------

See andrew's post today.

ad: this ties into ontologies. two ontology related issues: two different
ways to query. ontology of a feature, and two diff ways to search a db
for that property: exactly equal, or a subtype.
this is a property with two diff searches you may want to do on it.
properties like note, alias, phase have ability to search key/val
properties, e.g., att:alias=something.
score is a floating point number you may want to support > or < on it.
regular exp searches, identical, etc.
td says use xml query language, but worried about complexity of this.
99% of the time this is way more than you need.
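
The two ontology searches mentioned above (exactly equal vs. subtype) can be illustrated with a toy is_a table. This is a sketch only: the term names and the single-parent `IS_A` table are made up for the example and are not taken from SO.

```python
# Toy hierarchy, child -> parent. Not real SO data.
IS_A = {
    "coding_exon": "exon",
    "exon": "region",
    "intron": "region",
}


def is_subtype(term, ancestor):
    """True if term equals ancestor or reaches it by following is_a links."""
    while term is not None:
        if term == ancestor:
            return True
        term = IS_A.get(term)
    return False


def search(features, wanted, include_subtypes):
    """features is a list of (name, type) pairs; returns matching names."""
    if include_subtypes:
        return [name for name, ftype in features if is_subtype(ftype, wanted)]
    return [name for name, ftype in features if ftype == wanted]


FEATURES = [("f1", "exon"), ("f2", "coding_exon"), ("f3", "intron")]
```

An exact search for "exon" finds only f1; a subtype search also finds f2. Something has to hold the is_a table, either the server or the client.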

scenario: given 4 different notes in a feature, is order important?
extensions: curation point gives curator's name and time stamp.
e.g., search for all features modified by andrew in 2004.
discussion: pull this into a note element, perhaps phase and alias
too.
property table only supports a substring search. give me an author
name, e.g.
not saying getting rid of tag values.
server supporting new data types, extensions, feat search w/ sanger
curation elements for query. or thomas xml search.
this is why I want to move categories back to capabilities.
gh: more appropriate as capabilities than header.
ad: someone can get a document. andreas can combine many servers into
one, say: which one supports which.

to summarize: 
- properties are simple strings
- only substring searches
- change att: to prop:
- note and alias and phase are elements
- advertise that a server has extension to das query lang

gh: what about phase? lincoln needs it.
ad: if it's something that people will be editing, make it a element.
gh: phase is inappropriate for certain types. would like formal way
when it should be there or not.
ad: this is formalizing a way for server to tell client that there are
more types of searches available.
can't see how to do it automatically: eg for a given score, knowing
what is considered significant (low or high, e.g.).
td: if he needs a phase he re-infers it. doesn't work for partial CDS
tho.
gh: how much spec churn will this generate?
ad: [various things, half a dozen or so, some simplifying]
gh: does a colon in a query string need to be escaped? if so, this
makes it hard to read.
ad: could use prop_ rather than prop:
thomas and I had long discussion about this.

[A] andrew will incorporate these changes into feature properties
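
To make the escaping question concrete: a generic form encoder such as Python's `urllib.parse.urlencode` percent-encodes the colon in a key like `prop:alias`, while `prop_alias` passes through unchanged. The sketch below also shows a substring-only match on string properties. `feature_query` and `substring_match` are hypothetical helpers, and the key names are examples only.

```python
from urllib.parse import urlencode


def feature_query(prefix, **props):
    """Build a query string with each property key prefixed, e.g. 'prop:'."""
    return urlencode({prefix + key: value for key, value in props.items()})


def substring_match(properties, key, needle):
    """Substring-only search: properties maps key -> list of string values."""
    return any(needle in value for value in properties.get(key, []))


with_colon = feature_query("prop:", alias="abc")
with_underscore = feature_query("prop_", alias="abc")
print(with_colon)
print(with_underscore)
```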

Topic: Maintainer information
-----------------------------

ad: modified examples under scratch
gh: maintainer at source or version level
ad: one for all sources level
ap: at sanger we have one central server with lots of sources. notes
who's responsible for which server.
gh: ownership cascades down to sub elements?
ad: yes


Topic: XML Base
---------------

gh: can be in any element. as well as xml:lang, don't really
understand.
ad: it's what the atom spec does, so we copied. maybe for
bidirectional languages.
gh: flexible uri resolution scheme w/ xml base. implementation in java
tools is spotty for xml:base. curious about java obj binding of xml
what support they have for resolving xml base. at this point will have
to roll it myself. want to ask thomas about this.
ap: he's using Stacks parser, gets global namespace.
gh: bigger concern for when we have to use sax, need to do xml:base
resolution, eg. when we need to retrieve lots of features.
ad: it can be done with sax.
gh: not hard, but it is a multistep process.
ad: multiple levels of xml:base
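
A sketch of the multistep xml:base handling with SAX: keep a stack of effective bases, push on each start tag, pop on each end tag, and resolve ids against the top of the stack. It uses Python's `xml.sax` for brevity rather than a Java parser. The `id` attribute, the element names in the usage note and the `example.org` URIs are assumptions for illustration.

```python
import xml.sax
from urllib.parse import urljoin


class BaseTrackingHandler(xml.sax.ContentHandler):
    """Tracks nested xml:base values and resolves id attributes."""

    def __init__(self, document_uri):
        super().__init__()
        self.base_stack = [document_uri]  # innermost base is last
        self.resolved = []

    def startElement(self, name, attrs):
        base = self.base_stack[-1]
        if "xml:base" in attrs:
            # An xml:base may itself be relative to the enclosing base.
            base = urljoin(base, attrs["xml:base"])
        self.base_stack.append(base)
        if "id" in attrs:
            self.resolved.append(urljoin(base, attrs["id"]))

    def endElement(self, name):
        self.base_stack.pop()


def resolve_ids(xml_text, document_uri):
    handler = BaseTrackingHandler(document_uri)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.resolved
```

For example, with an outer element carrying `xml:base="http://example.org/das/genome/"` and an inner one carrying `xml:base="volvox/1/"`, an id of `feature/f1` inside the inner element resolves against both levels.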

ad: tomorrow's agenda: go through roy's otter stuff, convert into new
das format. to get a feel for how data will look. see roy's email. to
use experience gathered from otter to make sure we're sufficiently
covering features.

gh: talking about writeback?
ad: premature. let's talk style sheets wed, and writeback
thursday. plus anything else that's come up about the spec.
want to know how style sheets will look. lincoln should be able to
help out there.


From nomi at fruitfly.org  Tue Feb  7 22:27:13 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Tue, 7 Feb 2006 19:27:13 -0800 (PST)
Subject: [DAS2] We need DAS/2 progress reports for the grant!
Message-ID: <17385.25873.660275.790249@kinked.lbl.gov>

Dear DAS/2 developers,

I am writing this on behalf of Gregg and the DAS/2 team.  This is so
important I'm actually using capital letters.

As you know, we have submitted a request for renewing the DAS/2 grant.
Our chances of having this renewal approved are iffy, especially since we
are asking for more money than in the original grant and NIH's budget is
very tight right now.

The reviewers are about to read our grant proposal and decide whether to
fund it, and we need to send them a supplementary progress report about
what we've accomplished since we submitted the grant in November.
Describing how much progress we've made towards implementing the DAS/2
protocol in both servers and clients will help make our case that we
deserve more funding to continue this important research.

Gregg has been trying for weeks to find out when this progress report was
due (we had figured we had until the end of February).  Today he
*finally* got through to our scientific review administrator, who said
that we have to send it to them no later than THIS THURSDAY.

Obviously, this is very short notice, so we are asking all of you to very
quickly put together a paragraph (no more!) describing your progress
between Nov 1 and the end of this week (i.e., you can project to
what you expect to have completed by Friday).  If you need context, I
have attached a copy of the grant; I will also send some of you
individual notes about what we need from you.

Please send us (the DAS2 mailing list, or, if you're feeling shy, just me
and Gregg) your paragraph in PLAIN TEXT so that I can more easily
assimilate them into a single document.  We plan to work on incorporating
your reports into our progress report tomorrow (Wed), send out a draft
tomorrow night (our time) for you to review, and incorporate any
suggestions into our final version that we'll send off on Thursday.

Sorry for the short notice, and thanks in advance for your help.

      Nomi and Gregg

-------------- next part --------------
A non-text attachment was scrubbed...
Name: DAS2_renewal_grant_final2l.doc
Type: application/octet-stream
Size: 453632 bytes
Desc: DAS2 renewal grant proposal
URL: <http://lists.open-bio.org/pipermail/das2/attachments/20060207/29609114/attachment.obj>

From allenday at ucla.edu  Tue Feb  7 22:14:49 2006
From: allenday at ucla.edu (Allen Day)
Date: Tue, 7 Feb 2006 19:14:49 -0800 (PST)
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <e00e781866375762b29061d2b510a10e@fruitfly.org>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
	<Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>
	<Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>
	<e00e781866375762b29061d2b510a10e@fruitfly.org>
Message-ID: <Pine.LNX.4.58.0602071913010.29889@sumo.ctrl.ucla.edu>

Chris,

Why have you chosen to make <id/> a subelement of <term/>?  Is it expected
that there will be multiple IDs for a given term, and if so is there not a
primary ID?  Having an id attribute is a de facto standard for DOM libs, so
you can call getElementById().

-Allen

On Tue, 7 Feb 2006, chris mungall wrote:

> 
> On Feb 7, 2006, at 1:20 PM, Allen Day wrote:
> 
> > Hi Chris,
> >
> > On Tue, 7 Feb 2006, Chris Mungall wrote:
> >
> >>
> >> Hi all
> >>
> >> I'm concerned that the XML in the URL below isn't quite Obo-XML, it's
> >> Allen's modified version of it. In particular, the adding of an "id"
> >> attribute which is redundant with the id element, and the 
> >> modification of
> >> the ID scheme to use slashes instead of :s.
> >>
> >> I believe the latter may have been to make the ID scheme more DAS-y?
> >
> > The slash was introduced to take advantage of xml:base and the
> > hierarchical relationship between namespaces and terms, e.g.
> >
> >   xml:base="/das/ontology/obo/1/ontology" + id="SO/0000001"
> >
> > is equivalent to:
> >
> >   /das/ontology/obo/1/ontology/SO/0000001
> 
> it's actually equivalent to:
> /das/ontology/obo/1/ontologySO/0000001
> 
> > If we want the identifier to be SO:0000001, it means that we have to 
> > make
> > xml:base="/das/ontology/obo/1/ontology/SO.  This is problematic for two
> > reasons:
> >
> >   1) multiple xml:base cannot be defined for the entire document, 
> > meaning
> >      that URIs for other records referenced become very long.
> 
> Why not just define a qname for every idspace? This is the standard way 
> of doing this in XML
> 
> Using xml:base is not a big gain for brevity, since fairly soon some 
> obo ontologies will reference other obo ontologies.
> 
> In fact is this even an issue if you get rid of the id attribute to 
> conform to obo-xml? ids in obo-xml are encoded as elements, so xml:base 
> rules are not applied. Obo has its own rules for ID generation. This 
> has the arguable disadvantage that we can't directly use xml:base and 
> the whole xml namespace system for OBO IDs, we layer our own system on 
> top. This is actually preferable for us.
> 
> >   2) different ontologies cannot use the same xml:base
> >
> > The only way I see out of this ATM is to treat : as a / internal to the
> > Ontology-DAS service.
> 
> I'm still not sure what the problem is, and I think you may be stuck 
> anyway when it comes to RDF/OWL ontologies
> 
> >
> >> OBO IDs are composed of a prefix and a local ID. These are always 
> >> joined
> >> with a :. The prefix can be specified as shortform (eg GO) or a URI
> >> prefix. When the long form is combined with the local ID you get your 
> >> URI.
> >>
> >> If DAS wants to use a modified version of Obo-XML, that's fine, but 
> >> please
> >> don't call it Obo-XML, it will cause huge confusion!
> >>
> >> I would much prefer if you used Obo-XML as it is - if there are things
> >> you'd like to see changed about the format we can perhaps work that 
> >> out.
> >> I'm concerned by the changing the ID to use / instead of :. This is 
> >> wrong,
> >> and if it's something that's required for DAS, how will you 
> >> interoperate
> >> with RDF etc?
> >>
> >> In fact there are other parts where the xml is definitely not Obo-XML 
> >> - it
> >> looks like Allen has coded these by hand rather than taking existing 
> >> XML.
> >> That's fine, but it should be marked as such. For example, there is no
> >> develops_from element in Obo-XML; all relations bar is_a are encoded 
> >> as
> >> relationship elements.
> >
> > The XML provided by the Ontology-DAS server is using templates to mark 
> > up
> > ontology records that have been loaded to a chado database using
> > perl-go-perl.  The develops_from node, IIRC, was created because there 
> > is
> > a section in a perl-go-perl .xslt that creates elements for all
> > relationship types.
> 
> hmmm, I don't think so, but the point is moot anyway, just so long as 
> the final version uses xml that validates, either against obo-xml or 
> your own documented variant
> 
> >
> >>
> >> There is a DTD at the moment
> >> http://www.godatabase.org/dev/xml/dtd
> >
> > This didn't exist at the time I wrote my templates ( 4-6 months ago), 
> > or I
> > would have validated.
> 
> it did, it's just not well signposted! sorry about that
> 
> look forward to seeing a demo. I do think you have to work out the 
> semantics of retrieval by ontology term though.
> 
> cheers
> chris
> 
> >
> > -Allen
> >
> >
> >
> >>
> >> The docs are minimal as the explanation of all the fields is in the 
> >> docs
> >> for the obo text file format
> >> http://www.godatabase.org/dev/doc/obo_format_spec.{html,txt,pdf}
> >>
> >> We'll be converting to RNG+XSD soon
> >>
> >> You can get Obo-XML examples from
> >> http://www.fruitfly.org/~cjm/obo-download
> >>
> >> You can see the default rule for creating a URI in the OWL files; 
> >> these
> >> currently all get the geneontology.org URI prefix by default, but this
> >> will change (we were going to use LSIDs but the majority of OWL tools
> >> don't seem to handle URNs very well)
> >>
> >> As far as DAS/2 supporting different file formats, Obo-XML and 
> >> RDFS/OWL
> >> would seem to be the natural contenders. We currently go from the 
> >> former
> >> to the latter via a simple XSLT, the reverse transformation is a 
> >> little
> >> more difficult.
> >>
> >> Allen has inlined some comments from an email exchange with me in the
> >> document. I agree about keeping the API minimal. On the other hand you
> >> will need at least some inferencing machinery - I'd encourage you to 
> >> reuse
> >> existing reasoning services here.
> >>
> >> Cheers
> >> Chris
> >>
> >> On Tue, 7 Feb 2006, Helt,Gregg wrote:
> >>
> >>> I talked to Suzi, she's planning to join our teleconference today to
> >>> discuss ontologies, wearing her hat as co-PI of the National Center 
> >>> for
> >>> Biomedical Ontology.  Hopefully Lincoln can join too.
> >>>
> >>> I took a closer look at the DAS/2 ontology work Allen has done (see
> >>> http://biodas.org/documents/das2/das2_ontology.html).  I urge anyone 
> >>> who
> >>> wants to contribute to the ontology discussion to read this doc.  It
> >>> specifies a way to retrieve ontologies in OBOXML format.  In this 
> >>> format
> >>> each ontology term gets an absolute URI through the same mechanism 
> >>> that
> >>> the rest of DAS/2 uses (URIs for ids, which can be either absolute or
> >>> relative but resolvable).  As Allen pointed out yesterday this would
> >>> solve our problem of how to uniquely specify ontology terms in the 
> >>> DAS/2
> >>> TYPES XML.
> >>>
> >>> I couldn't find any documentation for the OBOXML format, other than 
> >>> the
> >>> code that generates it from OBO files.  But I'm using OBOXML as an
> >>> example here because it clearly has resolvable URIs for each ontology
> >>> term.  In Allen's spec, ontologies can also be returned in other
> >>> formats, but it's unclear to me whether terms in these other formats
> >>> would resolve to similar URIs.
> >>>
> >>> 	gregg
> >>>
> >>>> -----Original Message-----
> >>>> From: das2-bounces at portal.open-bio.org
> >>>> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke
> >>>> Sent: Tuesday, February 07, 2006 1:32 AM
> >>>> To: DAS/2
> >>>> Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code
> >>>> sprint,6 Feb 2006
> >>>>
> >>>>> gh: would like a re-cast as xml document, hosted at so/sofa
> >>>>> website. that xml would be like a std ontology representation so 
> >>>>> you
> >>>>> could extend it. so someone could point to an extension of it.
> >>>>
> >>>> I asked as an action item if Gregg would look into the solution
> >>>> for this.  Do we refer to the ontology by a "GO:0123456" identifier
> >>>> or by some URL scheme?  If so, what's the mapping from URL scheme
> >>>> to something that clients and people can understand, eg, to
> >>>> ask for everything which is an exon?
> >>>>
> >>>> Does this mapping need a version number - does it change over time?
> >>>>
> >>>> 					Andrew
> >>>> 					dalke at dalkescientific.com
> >>>>
> >>>> _______________________________________________
> >>>> DAS2 mailing list
> >>>> DAS2 at portal.open-bio.org
> 
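
A note on the xml:base arithmetic quoted above: under the RFC 2396
reference-resolution rules that XML Base relies on, a relative reference
replaces the last path segment of the base unless the base ends in a slash.
A conforming resolver would therefore give neither plain concatenation nor
an implicit "/". The following is a minimal Python sketch using the
standard library's urljoin; the host name is borrowed from the server URLs
elsewhere in this thread, and nothing here is DAS/2 code.

```python
from urllib.parse import urljoin

HOST = "http://das.biopackages.net"

# Base without a trailing slash: the final segment "ontology" is dropped.
no_slash = urljoin(HOST + "/das/ontology/obo/1/ontology", "SO/0000001")

# Base with a trailing slash: the relative id is appended beneath it.
with_slash = urljoin(HOST + "/das/ontology/obo/1/ontology/", "SO/0000001")

print(no_slash)    # http://das.biopackages.net/das/ontology/obo/1/SO/0000001
print(with_slash)  # http://das.biopackages.net/das/ontology/obo/1/ontology/SO/0000001
```

So the xml:base would need to end in "/" for the SO/0000001 form to land
under .../ontology/. An id such as SO:0000001 is awkward as a relative
reference for a separate reason: RFC 3986 (section 4.2) notes that a first
path segment containing a colon would be mistaken for a scheme name.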


From allenday at ucla.edu  Tue Feb  7 22:57:05 2006
From: allenday at ucla.edu (Allen Day)
Date: Tue, 7 Feb 2006 19:57:05 -0800 (PST)
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <e00e781866375762b29061d2b510a10e@fruitfly.org>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
	<Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>
	<Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>
	<e00e781866375762b29061d2b510a10e@fruitfly.org>
Message-ID: <Pine.LNX.4.58.0602071950350.29889@sumo.ctrl.ucla.edu>

Hi Chris,

> Why not just define a qname for every idspace? This is the standard way 
> of doing this in XML

Can you give a concrete example of this?  a search for "qname idspace"
returns a single godatabase.org result.


Anyway, I have stripped out the id= attributes from the <term/> and
<typedef/> elements.  You can see valid (by your DTD) obo xml produced
from the das server here:

Entire SO:
http://das.biopackages.net/das/ontology/obo/1/ontology/SO?format=legacy1

SO "exon" record:
http://das.biopackages.net/das/ontology/obo/1/ontology/SO/0000147?format=legacy1

-Allen


From Gregg_Helt at affymetrix.com  Wed Feb  8 03:36:01 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 8 Feb 2006 00:36:01 -0800
Subject: [DAS2] Working with xml:base in Java?
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9BF@msex02.affymetrix.com>


	I've been mucking around trying to find an answer to my own
question about ways to easily handle xml:base in Java.  And I think the
answer if I want to continue to use DOM ends up being "code it
yourself".  But it took a while to get to that answer.  I'm writing down
these notes so I can refer back to them next time if the issues I
encountered come up again.  But I figured I might as well post in case
other DAS/2 implementers have similar problems.

	So the standard Java 1.5 distribution includes the
org.w3c.dom.Node interface, which conveniently enough has a getBaseURI()
method that should do exactly what I want -- for any node in an XML
document, give me the resolved base URI for that node (regardless of how
complex a combination of xml:base attributes are used in the path to
that node).  Which I can then combine with whatever id attribute I'm
interested in (via Java networking classes) to get the full URI.
	But I need to guarantee compatibility with Java 1.4, so I can't
rely on 1.5.  Java 1.4 has a previous version of org.w3c.dom.Node, but
with no getBaseURI() method.  Turns out this is because the 1.5 Node
interface complies with DOM-level3 spec (includes XML Base support) but
the 1.4 Node interface only supports DOM-level2 spec (no XML Base
support).  Okay, but I can download the Xerces2 distribution, which is a
Java library that also has a full implementation of DOM-level3.  So I
get that set up, add some calls to node.getBaseURI() to my code, and it
compiles fine.  But when I run the program I get an ugly
java.lang.NoSuchMethodError.  I dig around on the web and find the
problem is a class/package namespace collision -- both Xerces2 and the
builtin java libraries have a class named org.w3c.dom.Node, but of
course they're different.  And replacing built-in java classes is not
normally allowed, so when the program is actually run and classes are
loaded the builtin Node class wins (the one w/o the getBaseURI()
method).  It would have been nice if they mentioned this in the JDK
Compatibility section of the Xerces2 FAQ...
	But there is some discussion of solutions to this problem on the
Xerces mailing list. There is actually a way to replace builtin java
packages via an "Endorsed Standards Override Mechanism", if they're on
the list of endorsed standards, which org.w3c.dom is.  This involves
putting the replacement package in an endorsed directory and setting a
system property to direct the JVM to look there for replacement
packages. But... whatever solution I use has to work with Java WebStart.
I can't find _any_ info on whether the package override mechanism works
with WebStart.  And even if it does work for some WebStart
implementations, I'd be wary of assuming it works for others -- it seems
like one of those things IT folks on the user end might get concerned
about.  I've also found other solutions to the package name clash, but
none that seems compatible with WebStart.

	So it looks like, considering my other constraints, if I want to
stick with DOM I'll need to code xml:base handling myself.  Looking at
the source code for Xerces2, doesn't look too hard.  Except... damn, the
getBaseURI() method implementation is actually commented in the Xerces
code as "Experimental".  Looking closer... um, I think it actually
doesn't implement the spec correctly.  Grr... 

To summarize, when it's time for my status report tomorrow, I think it's
best if I just remain silent.

	gregg

P.S. I suspect the answer for SAX will be similar.
P.P.S. XOM (http://www.xom.nu/) is starting to look pretty good, but I
may just be hallucinating at this point...
	
 
> From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Helt,Gregg
> Sent: Tuesday, February 07, 2006 11:01 AM
> To: Thomas Down
> Cc: DAS/2
> Subject: [DAS2] Working with xml:base in Java?
> 
> 
> 	Thomas, I'm wondering what toolkits you're using for binding XML
> to Java objects?  And particularly how you are dealing with resolving
> URIs when xml:base is used.  So far I've mostly used various
> implementations of SAX and DOM -- I've found some reports of builtin
> xml:base support in Xerces SAX/DOM, but it's still unclear.
> 
> 	I've been avoiding the issue up till now.  It won't be too hard
> to implement URI resolution relative to xml:base, but I thought I'd
> check around first and see if there's automated support of this in
> some toolkit.
> 
> 	Thanks,
> 	Gregg


From td2 at sanger.ac.uk  Wed Feb  8 03:44:38 2006
From: td2 at sanger.ac.uk (Thomas Down)
Date: Wed, 8 Feb 2006 08:44:38 +0000
Subject: [DAS2] Re: Working with xml:base in Java?
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9B9@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9B9@msex02.affymetrix.com>
Message-ID: <70790A43-AA5F-4F4A-8F20-50CDE30C7BB3@sanger.ac.uk>


On 7 Feb 2006, at 19:00, Helt,Gregg wrote:

>
> 	Thomas, I'm wondering what toolkits you're using for binding XML
> to Java objects?  And particularly how you are dealing with resolving
> URIs when xml:base is used.  So far I've mostly used various
> implementations of SAX and DOM -- I've found some reports of builtin
> xml:base support in Xerces SAX/DOM, but it's still unclear.
>
> 	I've been avoiding the issue up till now.  It won't be too hard
> to implement URI resolution relative to xml:base, but I thought I'd
> check around first and see if there's automated support of this in  
> some
> toolkit.

Hi Greg,

I'm actually using Stax (the streaming API for XML).  The  
implementation I use is called Woodstox:

          http://woodstox.codehaus.org/

(but there are a few others out there).  No builtin xml:base support  
but it's easy to write a little wrapper around XMLStreamReader to  
spot xml:base attributes and maintain a stack of base URIs.

I'm using java.net.URI to do the URI handling/resolution/ 
relativization.  Seems to be working okay... so far...
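
The wrapper idea (spot xml:base attributes, keep a stack of base URIs) can
be sketched in a few lines. This is a Python analogue, not Thomas's Java
code: it uses ElementTree's streaming iterparse in place of
XMLStreamReader and urljoin in place of java.net.URI. The GROUP element,
the example.org host and the sample document are all hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the fixed XML namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def resolved_ids(source, document_uri):
    """Yield (tag, absolute id URI) for every element carrying an id attribute."""
    bases = [document_uri]                  # stack of in-scope base URIs
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            base = bases[-1]
            if XML_BASE in elem.attrib:
                # An xml:base may itself be relative to the enclosing base.
                base = urljoin(base, elem.attrib[XML_BASE])
            bases.append(base)              # one entry per open element
            if "id" in elem.attrib:
                yield elem.tag, urljoin(base, elem.attrib["id"])
        else:
            bases.pop()                     # element closed, restore outer base

doc = b"""<FEATURES xml:base="http://das.example.org/volvox/1/">
  <FEATURE id="feature/F1" />
  <GROUP xml:base="type/">
    <TYPE id="T2" />
  </GROUP>
  <FEATURE id="feature/F3" />
</FEATURES>"""

for tag, uri in resolved_ids(io.BytesIO(doc), "http://das.example.org/volvox/1/features"):
    print(tag, uri)
```

Pushing one stack entry per element, whether or not it carries xml:base,
keeps the pop on the "end" event unconditional.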

         Thomas.


From Gregg_Helt at affymetrix.com  Wed Feb  8 05:12:22 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 8 Feb 2006 02:12:22 -0800
Subject: [DAS2] RE: Working with xml:base in Java?
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9C0@msex02.affymetrix.com>


> -----Original Message-----
> From: Thomas Down [mailto:td2 at sanger.ac.uk]
> Sent: Wednesday, February 08, 2006 12:45 AM
> To: Helt,Gregg
> Cc: DAS/2
> Subject: Re: Working with xml:base in Java?
> 
> 
> On 7 Feb 2006, at 19:00, Helt,Gregg wrote:
> 
> >
> > 	Thomas, I'm wondering what toolkits you're using for binding XML
> > to Java objects?  And particularly how you are dealing with resolving
> > URIs when xml:base is used.  So far I've mostly used various
> > implementations of SAX and DOM -- I've found some reports of builtin
> > xml:base support in Xerces SAX/DOM, but it's still unclear.
> >
> > 	I've been avoiding the issue up till now.  It won't be too hard
> > to implement URI resolution relative to xml:base, but I thought I'd
> > check around first and see if there's automated support of this in
> > some
> > toolkit.
> 
> Hi Greg,
> 
> I'm actually using Stax (the streaming API for XML).  The
> implementation I use is called Woodstox:
> 
>           http://woodstox.codehaus.org/

I would like to check out Stax, haven't used it before.
 
> (but there are a few others out there).  No builtin xml:base support
> but it's easy to write a little wrapper around XMLStreamReader to
> spot xml:base attributes and maintain a stack of base URIs.
> 
> I'm using java.net.URI to do the URI handling/resolution/
> relativization.  Seems to be working okay... so far...

That's what I was thinking about when I said it wouldn't be too hard to
implement... But that was yesterday.  A long time ago.

Now I've taken a detour into re-reading the XML Base spec
http://www.w3.org/TR/xmlbase/, and things don't seem so easy.

I _think_ if there's at least one xml:base attribute in the element
hierarchy above where you're trying to determine a base URI, and
resolution of those xml:base attributes yields an absolute URI, it's all
good, that's the  base URI.  But on the other hand if this resolution
yields a relative URI instead of an absolute URI I'm not sure what
happens -- I would guess it's an error, but I can't see anywhere in the
XML Base spec that spells this out.  And if there's no xml:base to use
to determine a base URI, things get weird:
   if the document is "encapsulated within another entity", the base URI
is the URI of that entity (I have no idea if DAS/2 docs could appear in
such a context)
   otherwise the base URI is the URI used to retrieve the document
   oh, except if you burrow down into the spec pointers to RFC 2396
http://www.ietf.org/rfc/rfc2396.txt, if the request gets redirected you
need to make sure the base URI is the last URI used in the redirect
   oh yeah, and apparently external entity declarations can affect all
of this in ways I don't understand
   and there's probably other gotchas I've missed...
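
The fallback order listed above can be written down as a small function.
This is only a sketch of one reading of the spec, in Python rather than
Java. It starts from the URI the document was finally retrieved from
(after any redirects), folds in each xml:base from the outermost element
inward, and treats a result that is still relative as an error. That last
policy is an assumption, since the XML Base spec does not seem to spell it
out. The embedded-entity and external-entity cases are not handled, and
the function name and example.org URIs are hypothetical.

```python
from urllib.parse import urljoin, urlparse

def effective_base(xml_bases, retrieval_uri=None):
    """Fold xml:base values (outermost first) into a single base URI.

    retrieval_uri is the last URI used to fetch the document, if known.
    """
    base = retrieval_uri or ""
    for value in xml_bases:
        # Each xml:base is resolved against the base in scope so far.
        base = urljoin(base, value)
    if not urlparse(base).scheme:
        # Assumed policy: a base that never becomes absolute is an error.
        raise ValueError("no absolute base URI could be determined: %r" % base)
    return base

# Absolute xml:base at the top, relative one below it.
print(effective_base(["http://das.example.org/volvox/1/", "feature/"]))

# Only a relative xml:base: it resolves against the retrieval URI.
print(effective_base(["feature/"], "http://das.example.org/volvox/1/features"))

# No xml:base at all: the retrieval URI is the base.
print(effective_base([], "http://das.example.org/volvox/1/features"))
```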

Now from the server side, none of this is really an issue.  Just pick
from a multitude of variants that XML Base allows when you send
responses to the client.  From the client side, if we really want DAS/2
to support XML Base (and I think we do), things get tricky.  It's
definitely pushing me towards using libraries that provide builtin
support for XML Base.

	Gregg


From dhoworth at mrc-lmb.cam.ac.uk  Wed Feb  8 06:54:54 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Wed, 08 Feb 2006 11:54:54 +0000
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <Pine.LNX.4.58.0602071913010.29889@sumo.ctrl.ucla.edu>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>	<Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>	<Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>	<e00e781866375762b29061d2b510a10e@fruitfly.org>
	<Pine.LNX.4.58.0602071913010.29889@sumo.ctrl.ucla.edu>
Message-ID: <43E9DC0E.30809@mrc-lmb.cam.ac.uk>

Allen Day wrote:
> Why have you chosen to make <id/> a subelement of <term/>?  Is it expected
> that there will be multiple IDs for a given term, and if so is there not a
> primary ID?  having an id attribute is a defacto standard for DOM libs, so
> you can call getElementById().

I'm curious about the DAS use of id attributes, especially given an 
expectation to use getElementById().

DAS has attributes that are URLs - they include the '/' character.

But getElementById() is an HTML or XHTML DOM method I believe.

Both HTML 4 and XHTML require that id attributes be of type ID, I think, 
and the ID type does not permit '/' characters (IDs are Names).

I find it pretty confusing that DAS uses an attribute that is called id 
that isn't an ID. And I'm curious to know if getElementById() works with 
it? Sounds like a sloppy implementation of the DOM. Or did I miss something?

Cheers, Dave


From dalke at dalkescientific.com  Wed Feb  8 11:36:11 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Wed, 8 Feb 2006 16:36:11 +0000
Subject: ids and URLs (was Re: [DAS2] Ontologies in DAS/2)
In-Reply-To: <43E9DC0E.30809@mrc-lmb.cam.ac.uk>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>	<Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>	<Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>	<e00e781866375762b29061d2b510a10e@fruitfly.org>
	<Pine.LNX.4.58.0602071913010.29889@sumo.ctrl.ucla.edu>
	<43E9DC0E.30809@mrc-lmb.cam.ac.uk>
Message-ID: <701ca3642e6ed0ea3e12fbd26991a3fb@dalkescientific.com>

Dave Howorth wrote:
> I'm curious about the DAS use of id attributes, especially given an 
> expectation to use getElementById().
>
> DAS has attributes that are URLs - they include the '/' character.
>
> But getElementById() is an HTML or XHTML DOM method I believe.
>
> Both HTML 4 and XHTML require that id attributes be of type ID, I 
> think, and the ID type does not permit '/' characters (IDs are Names).
>
> I find it pretty confusing that DAS uses an attribute that is called 
> id that isn't an ID. And I'm curious to know if getElementById() works 
> with it? Sounds like a sloppy implementation of the DOM. Or did I miss 
> something?

We've been talking about this and related matters most of the
day.  It started with Thomas' question "How do I get all of the
exons in the database which are from Vega?"  (Vega being some
other database.)

All of the features which are exons from Vega have the same DAS
data type.  This means he wants to do a feature query with
type = <the DAS type id>

He needs to get the DAS type id.  He can get all of the exons
using an ontology search.  But he wants to search for the string
"exon".  Given the discussion yesterday, will the type query
support "ontology='exon'" or must he use some other service to
convert "exon" to "SO:exon" or to "http://some/server.url"?

Suppose for now it is "SO:exon".  He does

     http://das.server/../types?ontology=SO:exon

That gets all of the exon types, but not the ones from Vega.
The Vega types have a source="Vega".  DAS type queries do
not support searching on that field.

PROPOSAL:  Add a "source=" (case-insensitive substring search)
field to the types query.  (I don't think there is any contention
here so I'll add it.)

     http://das.server/../types?ontology=SO:exon;source=Vega

That comes back with a single DAS type.

He now wants to search for all features with that type.  What
does he use for the query?  Is it (assuming proper escaping)

    http://das.server/../features?type=http://das.server/../type/T12345

?  That's rather excessive, especially if there are many
DAS types derived from the given ontology term.

All around people want to use "T12345" for that, and not the full
URL.  Are there people who do want to use the full URL?

The current system comes from saying the URL is the identifier
for a DAS object.

If as Dave points out we have a "id" which is a simple string
(of the format /[A-Za-z0-9_]+/ or so) then there's no problem.
We can use that for the query, as

    http://das.server/../features?type=T12345
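
To make the size difference concrete, here is a small Python sketch that
builds both forms of the query. The das_query helper, the example.org
query URL and the choice to percent-escape every reserved character are
assumptions for illustration; the spec may well allow some characters to
stay unescaped.

```python
from urllib.parse import quote

def das_query(query_url, *params):
    """Build a query URL from (name, value) pairs joined by ';'."""
    pairs = ["%s=%s" % (quote(name, safe=""), quote(value, safe=""))
             for name, value in params]
    return query_url + "?" + ";".join(pairs)

FEATURES = "http://das.example.org/volvox/1/features"   # hypothetical query URL

# Type given as a full URL: every ':' and '/' has to be escaped.
long_form = das_query(FEATURES, ("type", "http://das.example.org/volvox/1/type/T12345"))

# Type given as a short id: nothing to escape.
short_form = das_query(FEATURES, ("type", "T12345"))

print(long_form)
print(short_form)
```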

PROPOSAL: do not use a URL for the identifier for objects

That fixes a few problems:
   - xml:base is no longer an issue; these are ids and not URLs
   - the names are short and sweet

It introduces a few problems.

Problem 1: a feature has a type.  How can the client get from the
type id to the type information if there is no URL to resolve?

   Solution 1: add a 'id=' term to the types query URL, eg
      http://das.server/../types?id=T12345
   (or possibly call it 'type=')

   Solution 2: append "/" + type id to the types query URL, eg
     http://das.server/../types/T1234

   Solution 3: have both an 'id' and an 'href' attribute

   Solution 4: the client downloads all the types and compares
    the id fields.

QUESTION:
   At Hinxton nearly all the DAS servers have only one or two types.
Ensembl has 45 types and Allen's has about 50.  Is it reasonable
to have clients just go ahead and download everything and not
worry about a query language?  Is Chado any different?

Problem 2: a feature can refer to its parent and part features.
It can refer to regions on other features.  How does a client get
information about the feature given the feature id?

   Solution 1: add a 'id=' term to the features query URL
   Solution 2: append "/" + feature id to the feature query URL
   Solution 3: have both an 'id' and an 'href' attribute


We discussed this a lot and decided on

PROPOSAL: add an 'id=' query to the types and features query.

We decided against solution 2 because of me - I don't like
working with URLs that way.  Thomas pointed out that an 'id='
query is useful, eg, if a feature has three parts then a client
can request

    http://das.server/../features?id=part1,part2,part3
(NOTE: we're also thinking of proposing this syntax for an 'OR'
query over the same term
    http://das.server/../features?id=part1;id=part2;id=part3
)

I pointed out that having both means there are two ways in the
server to look-up by id - extra machinery.

QUESTION: Who will want to refer to features and types by URL?

Possibilities:
   - hypothetical model where the queries return a list of URLs and
the client (through HTTP pipelining) asks only for the ones it
doesn't have already; saving bandwidth.  THIS IS NOT A USE CASE!

   - request a feature in a specific format (but that can be done
       through the query URL)

   - RDF people who want individually named items (not a use case)

We couldn't come up with a case where someone would want to
refer to features and types as an individually named URL!

For segments there is a use case - you can ask for sequence by
range, and that's through the segment URLs.  However, that could
be done with the segment query URL so it's not a strong use case.
In any case, it hasn't been a problem so I'll put that off for now.

That being the case, there's no need to consider "Solution 2".
Why have URLs if no one wants to use them?

What did come up during the discussion here was that we had
planned to use URLs for writeback.  That model seems rather
nice.  "DELETE" and "PUT" to the correct URLs, rather than
going through a "POST to delete.cgi?type_id=", etc.

The model for writeback was something like "ask server to make
a copy, with region A:C available for editing.  User works
with region.  User commits region back to server."

In that case, the request for region might as easily make a
copy of the source, available through a special URL visible
only to that one user.  In this copy it can expose "url="
attributes for editing, perhaps also with a "writeable=" field
because some features will not be editable for that user.

I complained yesterday about "writeable" but that was because
for the general purpose server the concept of "writeable" was
user-specific and not appropriate.  In this writeback model
it's just fine.

Another thing came up during discussion of this.  Roy yesterday
proposed the idea of a simple server which only supports getting
"everything".  It doesn't support the DAS query specification.
That is, it only supports

   http://das.server/../types
   http://das.server/../features

and fetching those returns everything.  This is useful for small
data sets because those could be simple files, like

   http://das.server/../types.xml
   http://das.server/../features.xml

Still, for that case there would need to be "feature/F1", "type/T2",
etc.  In essence, a duplicate of every record.

Last December during discussion people said there was no use
case for this sort of flat-file oriented server.  This was not
a design goal.

Thomas mentioned that there is a use case.  Uploading of DAS
tracks to a server.  People complain now that it's hard to
do that.  With this url-less model people can upload a small
number of documents (or a .zip file of a directory) with
the versioned source, types, and features data.

<!-- this is "sources.xml" -->
<VERSION>
   <COORDINATES ... />
   <CAPABILITY type="types" query_url="types.xml">
     <FORMAT name="das2xml" />
     <SUPPORTS name="all" />
   </CAPABILITY>
   <CAPABILITY type="features" query_url="features.xml">
     <FORMAT name="das2xml" />
     <SUPPORTS name="all" />
   </CAPABILITY>
</VERSION>

<!-- this is features.xml -->
<FEATURES>
</FEATURES>

<!-- this is types.xml  -->
<TYPES>
</TYPES>

There is no need to have an "exploded" copy of all of the
records in parallel to the types and features xml files.
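
With a flat types.xml like this, the client-side lookup ("download
everything and compare the id fields") is short. Here is a Python sketch
under stated assumptions: the TYPE attributes shown (id, ontology, source)
and their values are invented for illustration, since the example above
leaves the element contents empty.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents for types.xml; attribute names are illustrative.
TYPES_XML = """<TYPES>
  <TYPE id="T1" ontology="SO:exon" source="Vega" />
  <TYPE id="T2" ontology="SO:exon" source="Ensembl" />
  <TYPE id="T3" ontology="SO:intron" source="vega-havana" />
</TYPES>"""

def index_types(xml_text):
    """Map each short type id to its TYPE element."""
    return {t.get("id"): t for t in ET.fromstring(xml_text).iter("TYPE")}

def find_types(types, ontology=None, source=None):
    """Filter client-side; source is a case-insensitive substring match."""
    hits = []
    for type_id, t in types.items():
        if ontology is not None and t.get("ontology") != ontology:
            continue
        if source is not None and source.lower() not in t.get("source", "").lower():
            continue
        hits.append(type_id)
    return hits

types = index_types(TYPES_XML)
print(find_types(types, ontology="SO:exon", source="vega"))  # ['T1']
print(types["T3"].get("source"))                             # vega-havana
```

For servers with a few dozen types, one download plus a dictionary covers
both the id lookup and the source filter without any query support.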

Big Advantage:

Stylesheets are much easier to write.  Refer to fields by
short id instead of long URL.

Conclusion:
   Proposal 1: "id"s are of the form /[A-Za-z0-9_]+/
   Proposal 2: FEATURE and TYPE elements have an optional "url"
             (or "href") attribute
   Proposal 3: the feature and type queries support a 'id=' search
   Proposal 4: the type query supports a "source=" search

Churn factor:
   Allen's server doesn't need the 'type/' and 'feature/' fields
   Gregg and others don't need to worry about xml:base any more.
   Type and feature lookups need to track the query URL as well
     as the type and feature id
   We need a new 'id=' search capability

These don't seem big in a programming sense, more in a conceptual one.

					Andrew
					dalke at dalkescientific.com


From cjm at fruitfly.org  Wed Feb  8 13:03:41 2006
From: cjm at fruitfly.org (chris mungall)
Date: Wed, 8 Feb 2006 10:03:41 -0800
Subject: ids and URLs (was Re: [DAS2] Ontologies in DAS/2)
In-Reply-To: <701ca3642e6ed0ea3e12fbd26991a3fb@dalkescientific.com>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>	<Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>	<Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>	<e00e781866375762b29061d2b510a10e@fruitfly.org>
	<Pine.LNX.4.58.0602071913010.29889@sumo.ctrl.ucla.edu>
	<43E9DC0E.30809@mrc-lmb.cam.ac.uk>
	<701ca3642e6ed0ea3e12fbd26991a3fb@dalkescientific.com>
Message-ID: <94bafd156da54842f9093244ca6083d1@fruitfly.org>


I mostly skim the messages here, so I may be missing something, but 
I'm a little confused by this:

On Feb 8, 2006, at 8:36 AM, Andrew Dalke wrote:

>
>     http://das.server/../types?ontology=SO:exon

I don't understand this - SO:exon isn't an ontology

>
> That gets all of the exon types, but not the ones from Vega.
> The Vega types have a source="Vega".  DAS type queries do
> not support searching on that field.
>
> PROPOSAL:  Add a "source=" (case-insensitive substring search)
> field to the types query.  (I don't think there is any contention
> here so I'll add it.)
>
>     http://das.server/../types?ontology=SO:exon;source=Vega

What does 'types' return? A type from an ontology (eg SO:exon) or 
something else? Why would source be recorded here? Surely source would 
be a valid constraint on a feature query, but not a type query.

Perhaps it's the case that in DAS a 'type' means some kind of arbitrary 
grouping (eg features of type X and source Y), and 'ontology' means a 
term/type from an ontology. If it isn't too late I'd suggest changing 
these conventions.


From Gregg_Helt at affymetrix.com  Wed Feb  8 13:12:46 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 8 Feb 2006 10:12:46 -0800
Subject: [DAS2] Why use URIs for feature IDs?
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>

      Regarding using URIs for DAS features, here's the quote from Paul
Prescod that I used in the original DAS/2 grant proposal addressing the
question "why use URIs?".  From
http://www.prescod.net/rest/rpc_for_get.html : 

You can give that URI address to anyone, anywhere and they can reuse it.
In particular this means that we can compose applications that were not
thought of in advance. Google is an example of an application that was
composed "after the fact" out of URIs. Yahoo is another...There are a
raft of deployed W3C recommendations that work with information related
through URIs. Many of these are XML-related specifications that work as
well in API-like applications as in user interface-based applications.
These include: XPath, XPointer, XSLT, XLink, RDF, XInclude, XQuery,
xml-stylesheet.  Information published through HTTP URIs can be combined
through XInclude, queried and sorted through XQuery and XSLT, visually
rendered with xml-stylesheet, related through RDF, linked through XLink,
pointed into through XPointer.


From dalke at dalkescientific.com  Wed Feb  8 14:24:06 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Wed, 8 Feb 2006 19:24:06 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>
Message-ID: <4d48d7749f04bb97f6791a5726230fee@dalkescientific.com>

Yes.  I like URLs.  I've been so in favor of URLs that until
this morning I had in the spec that the "id" *is* the URL.
There was no short form for the URL.  (still /is/ no short form
since it hasn't changed ;)

That meant several things:
   - everyone needs to disambiguate through the xml:base to
      figure out if two features are the same. (Neither Gregg nor
      Thomas liked that)

   - queries of the style we are doing become more complex
      (type=http://www.server/path/to/das/type/000A956826C8  vs.
       type=000A956826C8 )

   - passing URLs about makes for bigger XML, hence slower.

The first is technical.  The second is emotional - that sort of
query looks ugly.  The last is .. I can't speak for the last.
In an earlier email I showed how a different site layout can
be as efficient as any id scheme.  Quickly, use
    http://www.../volvox/1/S      <- versioned source URL
    http://www.../volvox/1/T?..   <- types query url
    http://www.../volvox/1/T001   <- type urls
    http://www.../volvox/1/F?..   <- feature query urls
    http://www.../volvox/1/F001   <- feature urls
and don't worry about any sort of hierarchy in the system.
Everything has the xml:base of "http://www.../volvox/1/"
so relative URLs are trivial strings.
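
A minimal Python sketch of how a client might resolve such relative
URLs against a shared xml:base.  The host www.example.org and the
"das2" path segment are placeholders for the part of the URLs that is
elided above; only urljoin from the standard library is used.

```python
from urllib.parse import urljoin

# Placeholder base: the real host and path are elided in the message above.
XML_BASE = "http://www.example.org/das2/volvox/1/"


def resolve(relative_id, base=XML_BASE):
    """Turn a relative id found in a document into an absolute URL."""
    return urljoin(base, relative_id)


if __name__ == "__main__":
    # Versioned source, one type, one feature, as in the layout above.
    for rel in ("S", "T001", "F001"):
        print(rel, "->", resolve(rel))
```

An id that is already an absolute URL passes through urljoin unchanged,
so the same call works for documents that mix both forms.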


Several said "just chop off the last bit of the URL to get
the id" or "combine some base feature URL with the feature
id to get the full URL."

Why is that useful?  Lincoln said on today's phone call that
he wants both a URL and an id, and expected that both would
be there.

I'm now going to be either stubborn or irritating or both.
Why have an id at all?  That is, why at all have a short string
(say of the form /[A-Za-z0-9_]/) when the URL is there and
meets all the functional requirements of an identifier?

(I'll use 'id' to refer to a short string, 'url' to refer to
a URL.  Both are identifiers.  I should be using 'uri' for
the latter, I know.  See comment below.)

Today I thought I came up with one reason to have ids and
to have a non-existent URL for a <FEATURE> element.  I
think now that I was wrong.

My use case was for uploading data to the Ensembl viewer
to display a new DAS track.  Put all of the types into one
file, in the types XML format.  Put all of the features into
another file in a features XML format.  Use arbitrary ids for
cross referencing, because there is no URL for them - they
don't exist in any form outside the document.

Upload them to the server.  The server reassembles the
annotations by cross referencing the ids.

I now see that that's a mistake.  As Gregg corrected me,
they use URIs not just URLs.  They could use
"das_private:ABC123" or a fully-qualified URL or a
xml:base and the partial URL or whatever scheme.  All
the server needs to know is how to compare the two URI
strings.  It's free to rename the strings if need be.

(Could it keep the original URLs?  Perhaps, but the
original data might not be accessible.  Consider an
exon predictor whose output you want to upload to the
Ensembl viewer.  There is no URL for that.)


Given that this isn't a valid use case for having an 'id'
and not having a 'url' now I ask again, what's the point of
having *both* a unique URL and a unique 'id' for the elements?

Tradition?  Elegance?

With Dave Howorth's comment about the specialness of 'id'
I can see changing the attribute name to 'url'.... or 'uri'.

I've got to write a couple paragraphs for Nomi now.
I'll leave with the following comment from

http://tbray.org/ongoing/When/200x/2006/01/08/No-New-XML-Languages

> Designing XML Languages is hard. It's boring, political, 
> time-consuming, unglamorous, irritating work. It always takes longer 
> than you think it will, and when you're finished, there's always this 
> feeling that you could have done more or should have done less or got 
> some detail essentially wrong.


					Andrew
					dalke at dalkescientific.com


From Gregg_Helt at affymetrix.com  Wed Feb  8 16:46:37 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 8 Feb 2006 13:46:37 -0800
Subject: [DAS2] Re: New DAS/2 server for codesprint
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9C5@msex02.affymetrix.com>

Following Steve's suggestion, I'm focusing on the region around YGL076C
(also known as RPL7A) on the yeast genome to get a small slice of
feature XML back from the codesprint server for a region where I know
what the genes  should be:

http://das.biopackages.net/das/genome/yeast/S228C/feature?overlaps=chrVII/364251:366080;type=SO:gene

This returns the YGL076C gene with three CDS and two introns.  A nearby
snoRNA also gets returned. 

	Gregg

> -----Original Message-----
> From: Chervitz, Steve 
> Sent: Monday, February 06, 2006 5:03 PM
> To: Helt,Gregg; Allen Day
> Cc: DAS/2
> Subject: Re: [DAS2] Re: New DAS/2 server for codesprint
> 
> 
> 
> There's a gene (RPL7A) with two introns on chr7 at roughly 
> 366kbp - 364kbp: 
> http://db.yeastgenome.org/cgi-bin/locus.pl?locus=YGL076C
> 
> Most genes with introns in cerevisiae (which aren't many) 
> have just a single intron that creates a small 5' exon, such 
> as the alpha and beta tubulin genes on chr13. Tub1 is on 
> chr13 at 99Kbp, and tub3 is also on chr13 at 23Kbp. So the 
> first 100Kb of chr13 would be another region to try. 
> http://db.yeastgenome.org/cgi-bin/locus.pl?locus=tub1
> 
> Steve
> 
> 
> > From: "Helt,Gregg" <Gregg_Helt at affymetrix.com>
> > Date: Mon, 6 Feb 2006 16:14:55 -0800
> > To: Allen Day <allenday at ucla.edu>
> > Cc: DAS/2 <das2 at portal.open-bio.org>
> > Conversation: [DAS2] Re: New DAS/2 server for codesprint
> > Subject: RE: [DAS2] Re: New DAS/2 server for codesprint
> > 
> > 
> > Allen, can you recommend a reasonable region on yeast to do 
> a features 
> > query that will return features with some hierarchy (like 
> > transcript/exons)?
> > 
> > Thanks,
> > Gregg
> > 
> > 
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at portal.open-bio.org 
> > http://portal.open-bio.org/mailman/listinfo/das2
> 
> 


From Steve_Chervitz at affymetrix.com  Wed Feb  8 16:47:18 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Wed, 08 Feb 2006 13:47:18 -0800
Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint,
 8 Feb 2006h
Message-ID: <C00FA6E6.1BD57%Steve_Chervitz@affymetrix.com>

Notes from the DAS/2 teleconference for the code sprint, 8 Feb 2006

$Id: das2-teleconf-2006-02-08.txt,v 1.1 2006/02/08 21:51:14 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed E., Gregg Helt
  CSHL: Lincoln Stein
  Sanger: Thomas Down
  Sweden: Andrew Dalke
  UC Berkeley: Nomi Harris
  UCLA: Allen Day, Brian O'connor
        
Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


Agenda:

* progress report for grant renewal
* ontologies
* ids and urls
* style sheets
* status reports

Topic: Progress report for grant
--------------------------------

gh: needs to be in the mail by 5pm tomorrow, to be included as a hard
copy addendum to grant. will improve chances of funding for next cycle.
review will be done by end of feb.
nh: no later than 4pm pst today. state what you've accomplished since
Nov 1 and now, in particular this week. one or two paragraphs.
gh: 
1. highlight significant enhancements
2. involvement of sanger, ebi
3. registry work from andreas, http spec for that registry
4. writeback 

ad: andreas worked on registry server, will send write up soon after
the teleconference.

[A] Everyone write up 1-2 paragraphs of progress and send to Nomi ASAP


Topic: Ontologies
-----------------

gh: concerned about ontol attrib in types doc because, do we want it
to be possible for a type to be an instantiation of multiple terms in
the ontology.
ls: will make it hard to validate. one type = many ontol terms. don't
like it. types will be specializations of SO terms and will not have
multiple parents.
gh: thinking about people doing curation. if a type is anchored to one
term in the ontol, and a feat can have only one type, a feat won't be
able to refer to >1 term in SO.
ls: any use case for this?
gh: still exploring this. eg., both a computed feature and an exon?
ls: no. separate category for predicted genes.
gh: is there something for 'computed exon' or 'computed cds'?
ls: think so.
sc: multiple branches like go?
ls: multiple relationship types do exist. something can be is_a or
part_of.
I wanted das/2 to be limited to what you can say in SO, with notion
that you can extend it. e.g., three predicted exons one with genefinder,
exonerate, etc.

ad: given a string 'exon' how does that get used to query server?
ls: find exon SO term, download list of types from das server, find
everything that inherits from exon ontology term.
clients need to know how to search the SO list.
they will have a local copy of SO that they'll refresh from time to
time.
gh: client isn't required to know the full structure, except maybe to
search higher-level terms. but the term in the ontology attribute is
sufficient. 
ls: could just search types and desc to find exons, but that relies on
implementer describing their types correctly.
gh: if a client wants to understand an ontol, the best way to go is
via what allen's proposing, searching via ontology das, preferably via
NCBO server.
ad: what is the actual string we're searching on?
aday: name or definition, or id.
ls: client should have a copy of the SO. unambiguous in this opinion.
client has SO, looks through types XML to find what the local types
are which the server supports which match what it's looking for in the
SO.
here's a flowchart:

- client downloads SO, caches.
- client downloads seq types list, caches.
- user searches to find exon
- client looks to find matches against 'exon', maybe 5 hits.
- prompts user to select which he's looking for
- client looks thru cached types xml to find server types of SO term
  that user selected
- client does feature query.
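
The flowchart above can be sketched in Python roughly as follows.  All
of the data here is made up for illustration (the term ids, the type
ids and the single-parent hierarchy are assumptions, not real SO
content or a real server's types document); a real client would load
its cached copy of SO and the server's types XML instead.

```python
# Toy ontology: term id -> (name, parent term id or None).
TOY_ONTOLOGY = {
    "TERM:1": ("region", None),
    "TERM:2": ("exon", "TERM:1"),
    "TERM:3": ("coding_exon", "TERM:2"),
    "TERM:4": ("intron", "TERM:1"),
}

# Toy types list: server type id -> ontology term it is anchored to.
TOY_SERVER_TYPES = {
    "type/genefinder_exon": "TERM:2",
    "type/curated_coding_exon": "TERM:3",
    "type/intron": "TERM:4",
}


def search_terms(ontology, text):
    """Find candidate terms whose name contains the user's search text."""
    return sorted(t for t, (name, _) in ontology.items() if text in name)


def inherits_from(ontology, term, ancestor):
    """True if term is the ancestor itself or reaches it via parent links."""
    while term is not None:
        if term == ancestor:
            return True
        term = ontology[term][1]
    return False


def types_for_term(ontology, server_types, chosen):
    """Server types anchored to the chosen term or to any term below it."""
    return sorted(
        type_id
        for type_id, term in server_types.items()
        if inherits_from(ontology, term, chosen)
    )


if __name__ == "__main__":
    hits = search_terms(TOY_ONTOLOGY, "exon")  # user searches for 'exon'
    chosen = hits[0]                           # user picks one of the hits
    print(types_for_term(TOY_ONTOLOGY, TOY_SERVER_TYPES, chosen))
```

The resulting type ids are what the client would then pass to the
feature query.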

ad: what is the string that the user is looking for URL or string?
ls: in type xml how do we indicate the term?
gh: we've been discussing this the past few days
ls: why not replace the term with SO accession number? then we don't
have to figure out the correct representation of ontology in an
xml. can finish this by friday. chris mungall has weighed in, and xml
version of SO ontology is not completely stable.

gh: perfectly ok for client to know nothing about SO and treat these
as unique string.
ls: right. names will eventually be things like 'exon'.
aday: chris's main complaint is that the doc didn't validate. I didn't
have a dtd. got it and now it validates. I thought this was a done
deal. there is a document written that describes how to do what we're
talking about.
ls: the only thing to be resolved, in types xml document, how do we
refer to SO terms?
aday: an attribute there that allows you to put in uri. it's a
relative url that points to ontology das server to get obo xml for
that term.
ad: how do I go from string 'exon' to find out what that is?
aday: 
ls: lets say administrator of das server has local type called
foobar. associated w/ url for SO 'exon' term. andrew's question is,
user wants to search for exons, how to go from 'exon' to correct url
in SO to find what types correspond to that? what's to go from 'exon'
to foobar. 
aday: search SO for exon, local types.
there's a filter on ontology that lets you search all terms and
definitions
gh: there's a reqt now that server must understand parent child
relationships in ontology.
aday: server could do xpath query to pull out the terms you're
interested in w/o understanding ontology
ls: user types 'exon' returns all feats in the genome that are exons.
aday: two servers, feat and ontol server
gets all types from feat server, each has url to ontology das server,
maybe multiple ontology das servers. each must have its ontology
searched returns supported or not. client assembles all search results
from static obo xml documents,
gh: for most clients this will be irrelevant. user will get a list of
types - genscan, blat alignment, for things they may be interested
in. they don't need to understand ontology nor does client. there may
be a url to look up info about the term. this is the typical
case. more sophisticated use cases can be put off till later.
ls: in types xml can we have two attributes, url and accession
so_accession="SO:12414", other will be url for obo xml.

[A] types will have separate attributes for URI and SO accession number
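
A small Python sketch of what reading such a types document could look
like.  The element names TYPES and TYPE and the attribute name
"ontology_uri" are assumptions made for this example; only the
so_accession attribute, its sample value and the local type name
"foobar" come from the discussion above.

```python
import xml.etree.ElementTree as ET

# Hypothetical types document with one attribute per kind of reference.
TYPES_XML = """
<TYPES>
  <TYPE id="type/foobar"
        so_accession="SO:12414"
        ontology_uri="ontology/SO/12414"/>
</TYPES>
"""


def read_types(xml_text):
    """Map each type id to its (SO accession, ontology URI) pair."""
    root = ET.fromstring(xml_text)
    return {
        t.get("id"): (t.get("so_accession"), t.get("ontology_uri"))
        for t in root.findall("TYPE")
    }


if __name__ == "__main__":
    for type_id, (accession, uri) in read_types(TYPES_XML).items():
        print(type_id, accession, uri)
```

A client that knows nothing about SO can treat the accession as an
opaque string, while a more ambitious one can follow the URI.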

Topic: IDs and URLs
-------------------

ad: discussion about searching for exon, use case: client goes to
server to get list of all types, wants all features of a
given type in a given range. may filter based on contains or inside,
das-type=xxxxx. 
talking about that being a URL to get full name for it.
what is the thing you send to server to ask for the types?
gh: url
ad: make this an id so it's not a long complex url. just an id
specific to that server. such that you go to feat query url and get
it.
ls: can just choose the last component of the url, type id.
ad: why have ability to get feature type individually?
ls: will have to be uniquified, by adding url to types query.
ad: feat query =
ls: isn't this the way it was?
gh: every feat has unique uri.
ad: talking about filtering and querying.
ls: just give it the id not the whole url.
ad: now it is the url
ls: should be the id
does it make sense to be something that another server has defined?
probably not. just a local type.

[lots of back and forth here, didn't catch it all...]

ad: do we need ability to refer to feature or type by url?
gh: yes. for making rdf statements about das2 features.
ad: who will do this?
gh: I will if no one else does. web technology is moving in this direction.
ls: we want every object a das server serves to be referencable as a
url/uri. as for filtering mechanism, for type filter we can just use
the id of the type, a short string.
ad: agree, as of this morning the url and id are same thing.
ls: a relative uri, by definition the server should implicitly attach
the versioned data source url to it.
ad: xml processors
ls: define the way the filter query mechanism, hard code implicit
paths into it.
ls: featuresquery?type=something if 'something' has no slashes, server
implicitly adds http://myserver/das/types/...
ad: don't like pasting urls and strings together to get things.
don't like queries with implicit logic like that.
ls: perfectly happy saying you can use urls in the query strings. I'd
go with short ids
ad: proposing we have both, id and href. here's the case: people
uploading to server want to provide a das track, can provide two
documents. works well for < 1000 features
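
Lincoln's implicit-prefix rule for the type filter, as described above,
might be sketched like this in Python.  The function name is invented
for the example, and the prefix is the one quoted in the notes; this is
an illustration of the proposal under discussion, not agreed behaviour.

```python
# Prefix quoted in the notes; a real server would use its own.
TYPES_BASE = "http://myserver/das/types/"


def expand_type_filter(value, base=TYPES_BASE):
    """Apply the 'no slashes means short id' rule to a type filter value."""
    if "/" in value:
        # Anything containing a slash is taken to be a URL already.
        return value
    return base + value


if __name__ == "__main__":
    print(expand_type_filter("exon"))
    print(expand_type_filter("http://other.example.org/das/types/exon"))
```

Under this rule a relative value such as "types/exon" contains a slash
and so passes through unchanged, which is one way implicit logic of
this kind can surprise.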

gh: we have to have uri for features.
ad: why?
gh: I will send you the page from the first grant.
ls: main reason is: to avoid namespace clashes when integrating data sets.
td: what do you mean by integrate?
ls: view of features from 4 diff annotation groups, want to search for
a particular feature by its id, need to indicate which data source
it's coming from.
td: won't you be keeping track of which data source anyway?
you never get a track that's a mixture of diff sources.
gh: dangerous to do this.
td: there must be something keeping track of which track is from.
gh: my assumption is that this is with uri
td: there's nothing that constrains a server to only use uris from itself.
gh: we sacrificed this when we went with capabilities.
ls: a server can emit a set of features, some use relative uris and
some absolute ones. if my server starts emitting features with
affymetrix uris, the assumption is these originate from affymetrix.
uris indicate that they originate from diff places even though you may
physically get them from a das server at a different location.
gh: thomas is right. given a feature uri you have no way to tell which
das server it came from. clients must keep track of this themselves.
ls: we wanted to divorce the origin of the feat from the server that
serves it. should be possible to serve features that come from
somewhere else.
gh: making feature uri opaque was deliberate.
ad: when you do a feat query it could return the whole db. so the
server must know how to return a feature document that contains all
features. that server must know all the data.
gh: don't see problem
ad: all features and types have id and url. different. url is optional
gh: no, required. also, not url, but uri.
ad: ok. why should all records have a uri?
gh: compatibility with semantic web/rdf, lsid, future proofing.
ad: if they want to they can, if not they shouldn't be required. no
one is doing rdf now.

ls: what issue are you concerned about with respect to uri?
ad: like ontology search. give me all features of this das type, you
then have to give the url. this is different than id.
ls: completely happy treating id as the last component of uri and
doing a paste. why don't you like the paste?
ad: you can get features from two diff places, each ending with same
last word.
ls: what query is it that allows you to filter by feature id? we have
positional, type filtering and getting a single feature from server of
origin.
gh: there shouldn't be an id filter. just resolving uri for that
feature.
ls: we can't search a feature by regex match on its id.
ad: i'm not saying that. I'm suggesting that the url be optional.
ls: I don't understand the point.
gh: why can't uri be required?
ad: see use case in email today subject="ids and urls". involves
uploading das tracks to a server.

[some trouble: not everyone has seen it]

ls: I say we have a policy that if there is big discussion, the email
should come more than 30 minutes before conf call.
gh: I've read most of it and am still confused.
ls: I still don't understand it after reading. you'll have to rephrase
it.
ad: all types and features have id and url.
ls: no, explain in a follow up email.
ad: ok

[A] Andrew will send follow up email to elaborate on his "ids and urls" use
case

[A] Everyone will try to absorb andrew's ids and urls use case

Topic: Style Sheets
-------------------

ad: how do you refer to elements in style sheets, by id or url?
gh: no opinion
ad: if everything is referred to by id, that makes style sheets easier to
write.
gh: has anyone gotten to implementation of style sheets for das/2?
ad: my proposal was a straw man.

Topic: Status reports
---------------------

gh: reading lots of specs. after yesterday's rant about xml:base last
night, implemented a stack. works fine for our current server.
we shouldn't throw out xml:base because of a few edge cases. we might
want to specify which subset of xml:base we use.
checked in code for igb client, does capabilities, specify feat,
types, segments. trouble when modeling sequences.
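
For readers unfamiliar with the stack approach to xml:base mentioned in
Gregg's report, here is a minimal Python sketch.  It is not the IGB
code; the function name, the GROUP element and the sample URLs are
assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree reports xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_ids(xml_text, document_url):
    """Return every 'id' attribute resolved against the xml:base in scope."""
    stack = [document_url]  # innermost base URL is always stack[-1]
    resolved = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # An element without xml:base inherits its parent's base.
            base = urljoin(stack[-1], elem.get(XML_BASE, ""))
            stack.append(base)
            if "id" in elem.attrib:
                resolved.append(urljoin(base, elem.get("id")))
        else:
            stack.pop()
    return resolved


SAMPLE = """
<FEATURES xml:base="volvox/1/">
  <FEATURE id="feature/F001"/>
  <GROUP xml:base="other/">
    <FEATURE id="F002"/>
  </GROUP>
  <FEATURE id="feature/F003"/>
</FEATURES>
"""

if __name__ == "__main__":
    for url in resolve_ids(SAMPLE, "http://www.example.org/das2/"):
        print(url)
```

Each start event pushes the base in effect for that element and each
end event pops it, so nested xml:base attributes stop applying as soon
as their element closes.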

ee: working on das/2 client. building new widget as gregg asked for.

ad: working with andreas write up for registry.

td: understanding the spec. xml parsing.
gh: you are using stacks, have experience with it?
td: yes, less painful. streaming api for xml.
gh: tried xom. picky about namespaces. difficult to use with spec
that's not stable.
td: some trouble with dom
gh: sources, types, segments I use dom (small document). for features
use sax

nh: progress with apollo. list of versioned sources, show segments,
user picks, gets features. something that the parser doesn't like.
not sure where the problem comes from.

sc: working on setting up internal das server on 64bit machine
here. refining the pipeline for generating files for loading the affy
das server with updated data for various public and affy data
sources. also writing up and posting meeting notes.

aday: message from gavin about ontology responses. caching issue caused
trouble with model/controller. chris's obo dtd.
dependencies for server rpm were finished. now building the rpm.

td: parsing xml from codesprint server. a few things are matching the
spec from a few weeks back. prop, loc elements. will these be changed?
aday: feature xml?
td: yes. I'm still absorbing the changes, dozens of mails about feat
properties.
gh: more important is loc element, splitting into id and range. used
to be one thing, now is two. one is id, other is start,end,strand.
aday: will look into today.

nh: I'm also taking charge of getting grant progress report
done. especially need allen re: server, andreas via registry.

gh: any reports for write back?
brian: some work on that. not ready for prime time.
gh: roy?
ad: some talk about this puts and deletes on the urls.
gh: let's talk about it tomorrow.


From td2 at sanger.ac.uk  Wed Feb  8 18:20:34 2006
From: td2 at sanger.ac.uk (Thomas Down)
Date: Wed, 8 Feb 2006 23:20:34 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>
Message-ID: <7C8A251D-7712-489A-BAEB-4701F46BAF1D@sanger.ac.uk>

[I should prefix my comments here by saying that I don't actually
have a terribly strong opinion on this matter *except that* I'd
really like the spec to be explicit on how feature query language
works...  Does it go .../features?type=exon,
.../features?type=types/exon, or
.../features?type=http://das2.sanger.ac.uk/ensembl35/types/exon?].

Anyway, I'm still having a bit of trouble seeing why features need
individually GETable URIs.  The use case I remember from the
conference call was that it would be nice to be able to describe
DAS/2 features in RDF documents.  I guess that makes sense to me, but
for this purpose is there anything wrong with a URI like:

            http://das2.sanger.ac.uk/ensembl35/features#id12345

This seems compatible with Andrew's ID proposal.

My memory of RDF/DAML/OWL/etc is that most objects which get  
described in such documents are actually fragment identifiers in  
larger documents, rather than individually GETable entities.  Am I  
missing something here?
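
To make the distinction concrete, here is a short Python sketch of how
a client could split a fragment-style feature URI into the document it
would actually GET and the identifier it would look up inside that
document.  The function name is invented for the example.

```python
from urllib.parse import urldefrag


def split_feature_uri(uri):
    """Split a URI into (URL of the document to fetch, fragment id)."""
    document_url, fragment = urldefrag(uri)
    return document_url, fragment


if __name__ == "__main__":
    uri = "http://das2.sanger.ac.uk/ensembl35/features#id12345"
    print(split_feature_uri(uri))
```

Only the part before the '#' is sent to the server in an HTTP request;
the fragment is resolved by the client.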

                Thomas


On 8 Feb 2006, at 18:12, Helt,Gregg wrote:

>       Regarding using URIs for DAS features, here's the quote from  
> Paul
> Prescod that I used in the original DAS/2 grant proposal addressing  
> the
> question "why use URIs?".  From
> http://www.prescod.net/rest/rpc_for_get.html :
>
> You can give that URI address to anyone, anywhere and they can  
> reuse it.
> In particular this means that we can compose applications that were  
> not
> thought of in advance. Google is an example of an application that was
> composed "after the fact" out of URIs. Yahoo is another...There are a
> raft of deployed W3C recommendations that work with information  
> related
> through URIs. Many of these are XML-related specifications that  
> work as
> well in API-like applications as in user interface-based applications.
> These include: XPath, XPointer, XSLT, XLink, RDF, XInclude, XQuery,
> xml-stylesheet.  Information published through HTTP URIs can be  
> combined
> through XInclude, queried and sorted through XQuery and XSLT, visually
> rendered with xml-stylesheet, related through RDF, linked through  
> XLink,
> pointed into through XPointer.
>


From dalke at dalkescientific.com  Thu Feb  9 04:35:19 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 09:35:19 +0000
Subject: [DAS2] Re: New DAS/2 server for codesprint
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9C5@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C5@msex02.affymetrix.com>
Message-ID: <f9b035387c49e21c30707eb2df61c3b2@dalkescientific.com>

In the das2/scratch directory is a program called "verify_examples.py"
I ran it against

http://das.biopackages.net/das/genome/yeast/S228C/feature?overlaps=chrVII/364251:366080;type=SO:gene

as follows

[guest276:das/das2/scratch] dalke% python ./verify_examples.py
load FEATURES "http://das.biopackages.net/das/genome/yeast/S228C/feature?overlaps=chrVII/364251:366080;type=SO:gene"
! expected root tag '{http://www.biodas.org/ns/das/genome/2.00}FEATURES' got '{http://www.biodas.org/ns/das/2.00}FEATURELIST'
^D
[guest276:das/das2/scratch] dalke%

That is, it's a simple command language.  The command to
load a URL of the given type is

   load FEATURES "url"

In this case it warns that the top-level name is "FEATURELIST"
instead of "FEATURES", which is something that was changed
last summer, I think.

Saving locally and editing by hand I then get

! expected root tag '{http://www.biodas.org/ns/das/genome/2.00}FEATURES' got '{http://www.biodas.org/ns/das/2.00}FEATURES'

That's because

<FEATURES
   xmlns="http://www.biodas.org/ns/das/2.00"

should be

<FEATURES
   xmlns="http://www.biodas.org/ns/das/genome/2.00"

according to the spec.  I don't like the namespace though.
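
The root-tag comparison can be reproduced with a few lines of Python.
This is a sketch of the idea only, not the code in verify_examples.py;
the function name and message wording are modelled on the output shown
above.

```python
import xml.etree.ElementTree as ET

# ElementTree writes qualified names as "{namespace}localname".
EXPECTED_ROOT = "{http://www.biodas.org/ns/das/genome/2.00}FEATURES"


def check_root(xml_text, expected=EXPECTED_ROOT):
    """Return None if the root tag matches, else a warning string."""
    root = ET.fromstring(xml_text)
    if root.tag == expected:
        return None
    return "expected root tag %r got %r" % (expected, root.tag)


if __name__ == "__main__":
    print(check_root('<FEATURELIST xmlns="http://www.biodas.org/ns/das/2.00"/>'))
    print(check_root('<FEATURES xmlns="http://www.biodas.org/ns/das/2.00"/>'))
    print(check_root('<FEATURES xmlns="http://www.biodas.org/ns/das/genome/2.00"/>'))
```

Because the namespace is part of the qualified name, fixing the element
name alone is not enough; the xmlns value has to match as well.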

*** Does anyone mind if we change the namespace URL?  ***

Next is
* fatal: file not found: http://www.biodas.org/dtd/das2feature.dtd

That occurs because the XML says that it requires the DTD
to be understood (with the 'standalone="no"' at the top)

Taking that out and the DTD link,

* file:///Users/dalke/cvses/das/das2/scratch/biopackages_features.xml:10:4: error: attribute "type" not allowed at this point; ignored

That should be "type_id" instead of "type".  I've used "id"
as a convention to indicate that something is a URL inside of
DAS.  Change it to "url" or "uri" instead?

The PARENT should be after the LOC.  However, I think that the
ordering requirement is too fragile so I'll change the schema
so the elements can go in more arbitrary order.

There was an issue with the <PROP> element.  I'll explain
in the next email.

* file:///Users/dalke/cvses/das/das2/scratch/biopackages_features.xml:95:57: error: element "LOC" from namespace "http://www.biodas.org/ns/das/genome/2.00" not allowed in this context

That came from

   <FEATURE
     id="feature/Affymetrix_YG-S98:3128_at"
     type_id="type/SO:PCR_product"
     name="Affymetrix_YG-S98:3128_at"
   >

       <LOC id="segment/chrVII" range="319671:774428:-1"/>
       <LOC id="segment/chrVII" range="735985:736081:1"/>

   </FEATURE>


The RNC had a bug - it only allowed a single LOC element.  Fixed.
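
As an illustration of a feature carrying more than one LOC element,
here is a small Python sketch that reads the example above.  It assumes
the range attribute has the form start:end:strand, and the xmlns
declaration is added to the snippet so that it parses on its own; the
function name is invented for the example.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.biodas.org/ns/das/genome/2.00}"

# The FEATURE from the message above, made standalone with an xmlns.
FEATURE_XML = """
<FEATURE xmlns="http://www.biodas.org/ns/das/genome/2.00"
     id="feature/Affymetrix_YG-S98:3128_at"
     type_id="type/SO:PCR_product"
     name="Affymetrix_YG-S98:3128_at">
  <LOC id="segment/chrVII" range="319671:774428:-1"/>
  <LOC id="segment/chrVII" range="735985:736081:1"/>
</FEATURE>
"""


def parse_locations(feature_xml):
    """Return (segment id, start, end, strand) for every LOC child."""
    feature = ET.fromstring(feature_xml)
    locations = []
    for loc in feature.findall(NS + "LOC"):
        # Assumed layout of the range attribute: start:end:strand
        start, end, strand = loc.get("range").split(":")
        locations.append((loc.get("id"), int(start), int(end), int(strand)))
    return locations


if __name__ == "__main__":
    for location in parse_locations(FEATURE_XML):
        print(location)
```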


I've updated the schema and committed a copy of a features
data set from Allen's server to CVS under
    das/das2/scratch/biopackages_features.xml


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Thu Feb  9 05:00:45 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 10:00:45 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <7C8A251D-7712-489A-BAEB-4701F46BAF1D@sanger.ac.uk>
References: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>
	<7C8A251D-7712-489A-BAEB-4701F46BAF1D@sanger.ac.uk>
Message-ID: <631777ea4ff08f68b6dde657effc18fe@dalkescientific.com>

Thomas Down wrote:
> Anyway, I'm still having a bit of trouble seeing why features need 
> individually GETable URIs.  The use case I remember from the 
> conference call was that it would be nice to be able to describe DAS/2 
> features in RDF documents.  I guess that makes sense to me, but for 
> this purpose is there anything wrong with a URI like:
>
>            http://das2.sanger.ac.uk/ensembl35/features#id12345

For that matter, the spec doesn't at present say that the
individual URLs need to be fetchable.  A client could treat them
as opaque and unresolvable URLs and still do what it wants.

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Thu Feb  9 06:15:18 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 11:15:18 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <4d48d7749f04bb97f6791a5726230fee@dalkescientific.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>
	<4d48d7749f04bb97f6791a5726230fee@dalkescientific.com>
Message-ID: <4baed72f7f93ba4a90e521da4e27cc58@dalkescientific.com>

I'm going to incur the possibility of pitchforks here.. :)

Me:
> Yes.  I like URLs.  I've been so in favor of URLs that until
> this morning I had in the spec that the "id" *is* the URL.
> There was no short form for the URL.  (still /is/ no short form
> since it hasn't changed ;)
>

> I'm now going to be either stubborn or irritating or both.
> Why have an id at all?  That is, why at all have a short string
> (say of the form /[A-Za-z0-9_]/ when the URL is there and
> meets all the functional requirements of an identifier?

Here's the change - or not change since it reflects the
current spec.

Features and types have a single "id".  That id is a uri
in all its glory.

Referring to Dave's email, yes, special characters are
included - this is a uri.  Looking at
   http://blog.bitflux.ch/wiki/GetElementById_Pitfalls
the getElementById refers to the attribute with type "ID"
which happens to be named "id" for XHTML and SVG.  Given
   http://www.w3.org/TR/xml-id/
I have added xml:id as a common attribute for all of the
DAS items for independent and optional identification of
an element in a document.

There is no short-form id for features and types.  Queries
are done using the full URL.  For example, to find all elements
of type "http://www.example.com/das2/human/1/type/T12345" the
query string (assuming the query url is ".../1/feature_search.cgi") is

   http://www.example.com/das2/human/1/feature_search.cgi?
type=http%3A%2F%2Fwww.example.com%2Fdas2%2Fhuman%2F1%2Ftype%2FT12345
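
A minimal Python sketch of building that query, using quote from the
standard library with no characters left unescaped.  The function name
is invented for the example; the URLs are the ones used above.

```python
from urllib.parse import quote


def type_query(query_url, type_url):
    """Build a feature query that filters on a full type URL."""
    # safe="" makes quote escape ':' and '/' as well.
    return query_url + "?type=" + quote(type_url, safe="")


if __name__ == "__main__":
    print(type_query(
        "http://www.example.com/das2/human/1/feature_search.cgi",
        "http://www.example.com/das2/human/1/type/T12345",
    ))
```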

The single and sole exception is for range queries.  Each
segment has a URL and a "name" attribute.  This name is a
unique short-form identifier used for range queries.  The name is
of the form /[A-Za-z_][A-Za-z_0-9]*/ .  To do a range query
for all features on a segment with name X and range 50 to 100 use
the format "X/50:100" and the query looks like

   http://www.example.com/das2/human/1/feature_search.cgi?
overlaps=X%2F50%3A100
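
The range filter can be sketched the same way in Python.  The function
name and the choice to raise ValueError on a bad name are assumptions
made for this example; the name pattern and the name/start:end format
are the ones given above.

```python
import re
from urllib.parse import quote

# Segment names must match the pattern given in the text above.
SEGMENT_NAME = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def overlaps_filter(name, start, end):
    """Build the 'overlaps' part of a range query for one segment."""
    if not SEGMENT_NAME.fullmatch(name):
        raise ValueError("not a valid segment name: %r" % name)
    return "overlaps=" + quote("%s/%d:%d" % (name, start, end), safe="")


if __name__ == "__main__":
    print(overlaps_filter("X", 50, 100))
```

Restricting names to this pattern means a name can never itself contain
the '/' or ':' separators used in the range syntax.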

The reason for this exception is three-fold:
   - the syntax for merging the URL and two/three fields became ugly
   - Gregg wants to send multiple ranges at a time, if the client
       knows enough about what it has already
   - the client may consult one of several reference servers given
      the coordinate system for the annotations.

These do not hold for feature types (features are independent
objects; there will be at most a handful in most servers; the
types are specific to the given set of features)

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Thu Feb  9 06:41:35 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 11:41:35 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <4baed72f7f93ba4a90e521da4e27cc58@dalkescientific.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>
	<4d48d7749f04bb97f6791a5726230fee@dalkescientific.com>
	<4baed72f7f93ba4a90e521da4e27cc58@dalkescientific.com>
Message-ID: <0255ae96de376ffd89e2af0d9766aed6@dalkescientific.com>

> I'm going to incur the possibility of pitchforks here.. :)

To mollify or intensify the pitchforks ...

Several people have said that "the id is the last component
of the URL" or "the URL is the base + '/' + the id".

That's what DAS1 did.  I don't like URL construction like
this. It makes the specification impose a URL organization
when it doesn't need to do so.  For example,

Allen prefers his URLs like this
    /feature?this=that    is the query interface
    /feature/F00001       is an identifier for the features

I might like it like this
    /feature_search.cgi?..   is the query interface
    /feature/F00001       is an identifier for the features

Still others as
    /features?this=that   is the query interface
    /feature/exon/A1      is an identifier for the features
    /feature/contig/A     is another identifier for the features

** NOTE: in this case the "last term of the URL" is not
sufficient as a unique short-form id  **

Or still others as
    /cgi-bin/fsearch.rb?this=that   is the query interface
    /data/F1                   is an identifier for the features
    /data/F2                   is another identifier

One advantage to hard-coding the URL organization into the
spec is the tradition from DAS1, and the general practice of
expecting one-off URL schemes during web scraping.

Another is that people understand it more easily.  It's
a lot easier to write out examples in one naming scheme than
it is to say "using the identifier from the record ..."

On the other hand, the programming is easier.

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Thu Feb  9 06:48:02 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 11:48:02 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <631777ea4ff08f68b6dde657effc18fe@dalkescientific.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>
	<7C8A251D-7712-489A-BAEB-4701F46BAF1D@sanger.ac.uk>
	<631777ea4ff08f68b6dde657effc18fe@dalkescientific.com>
Message-ID: <2878cecec027ce28826c48d1a3a68e30@dalkescientific.com>

Churn factor:

The only part of the spec that changes is the query interface
for types.  The type feature filter must take a full URL and
not a partial URL nor a non-existent 'short id'.

Allen's server does not support queries given the full URL.

Here's what the spec says -- note that it quotes the previous
draft and I added some comments.

> Query parameter "type"
>
>   type=type_url
>
> Example:
>   $FQ?type=http%3A%2F%2Fwww.biodas.org%2FtypeA
>
> Match features with the given feature type.
>
> XXX the previous version of this document says
>
> Match features of the given type. A type is one of:
>   1. a typeid returned by the feature type document described
>   earlier. Only features exactly matching the type are returned.
>
>   2. a sequence ontology term, such as "exon". Features matching the
>   term or *any of its ISA descendents* are returned.
>
>   3. a sequence ontology accession number, such as SO:12345. Features
>   matching the accession number or *any of its ISA descendents* are
>   returned.
>
>   4. a reserved type beginning with the namespace "das:". The only such
>   reserved type is currently "das:feature-lock", used for feature
>   updating.
>
> XXX I think we should only have it do 1.  For 2 and 3 use the query
> parameter 'ontology'.  For 4, use a different query term, or don't use
> locks as features.

Based on the discussion yesterday, this changes to:
   1. we support this one, with fully resolved URLs
   2. the searching is done in the client so this option is removed
   3. the searching is done in the client so this option is removed
   4. we can always define "http://www.biodas.org/spec/special-type"
as a URL to send to the server if we want to define a special query.
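
For option 1, a server could check that each "type" value really is a
fully resolved URL before searching.  This is only a sketch in Python
(standard library); the function name full_type_urls is hypothetical
and the spec does not say how a server should reject bad values.

```python
from urllib.parse import parse_qs, urlsplit


def full_type_urls(query_string):
    """Return the decoded type= values; reject anything that is not a full URL."""
    values = parse_qs(query_string).get("type", [])
    for value in values:
        parts = urlsplit(value)
        # A full URL has both a scheme and a host; "exon" or "types/exon" do not.
        if not (parts.scheme and parts.netloc):
            raise ValueError("type must be a full URL, got %r" % (value,))
    return values


print(full_type_urls("type=http%3A%2F%2Fwww.biodas.org%2FtypeA"))
```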


					Andrew
					dalke at dalkescientific.com


From Gregg_Helt at affymetrix.com  Thu Feb  9 10:27:57 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Thu, 9 Feb 2006 07:27:57 -0800
Subject: [DAS2] Why use URIs for feature IDs?
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9C9@msex02.affymetrix.com>

I think that as Thomas says, using URI fragment notation, 
http://das2.sanger.ac.uk/ensembl35/features#id12345
is a perfectly valid URI and thus is acceptable as a feature ID.

But, if the intent is to construct feature URIs using fragment
identifiers in combination with either ID attributes (as defined in a
DTD) or xml:id attributes, as an alternative approach to URI = ID
attribute with xml:base resolution, I think it would get messy.

As I understand it a fragment identifier approach would mean
URI = (URL of doc feature XML is embedded in) + "#" + value of feature's
ID attribute.  But then if the feature is returned as part of a query,
say:
http://das2.sanger.ac.uk/ensembl35/features?overlaps=chr1/10000:20000
and the feature has attribute id="id12345", then the feature URI using
standard fragment notation would be 
http://das2.sanger.ac.uk/ensembl35/features?overlaps=chr1/10000:20000#id12345
In other words there would be a very large number of possible feature
URIs, with query string gunk in them, identifying the same feature.
Unless we define a nonstandard way of constructing fragment identifiers
that chops off the query string.

Instead of something nonstandard I'd rather use xml:base, adhere to the
XML Base spec, and allow the feature id attribute to be full or relative
URIs.  Then specifying in the top element that 
xml:base = http://das2.sanger.ac.uk/ensembl35/features/, a feature
returned by the features query with attribute id="id12345"
resolves to the feature URI:
http://das2.sanger.ac.uk/ensembl35/features/id12345

There might even be a way to fiddle with xml:base and id to use a "#"
instead of the last "/", though I'm not at all sure about that.
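
The difference between the two approaches can be seen with ordinary
relative-reference resolution.  A small Python sketch (standard library
urljoin; the two function names are made up for illustration):

```python
from urllib.parse import urljoin


def resolve_with_base(xml_base, feature_id):
    """xml:base approach: the id attribute is resolved against the declared base."""
    return urljoin(xml_base, feature_id)


def resolve_as_fragment(document_url, feature_id):
    """Fragment approach: the id is attached to the URL of the fetched document."""
    return urljoin(document_url, "#" + feature_id)


base = "http://das2.sanger.ac.uk/ensembl35/features/"
query = "http://das2.sanger.ac.uk/ensembl35/features?overlaps=chr1/10000:20000"

print(resolve_with_base(base, "id12345"))
print(resolve_as_fragment(query, "id12345"))
```

The first call gives the same URI no matter which query returned the
feature; the second carries the query string along with it.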

	gregg

> From: Thomas Down [mailto:td2 at sanger.ac.uk]
> Sent: Wednesday, February 08, 2006 3:21 PM
> To: Helt,Gregg
> Cc: DAS/2
> Subject: Re: [DAS2] Why use URIs for feature IDs?
> 
> [I should prefix my comments here by saying that I don't actually
> have a terribly strong opinion on this matter *except that* I'd
> really like the spec to be explicit on how feature query language
> works...  Does it go .../features?type=exon, .../features?type=types/
> exon, or .../features?type=http://das2.sanger.ac.uk/ensembl35/types/
> exon?].
> 
> Anyway, I'm still having a bit of trouble seeing why features need
> individually GETable URIs.  The use case I remember from the
> conference call was that it would be nice to be able to describe DAS/
> 2 features in RDF documents.  I guess that makes sense to me, but for
> this purpose is there anything wrong with a URI like:
> 
>             http://das2.sanger.ac.uk/ensembl35/features#id12345
> 
> This seems compatible with Andrew's ID proposal.
> 
> My memory of RDF/DAML/OWL/etc is that most objects which get
> described in such documents are actually fragment identifiers in
> larger documents, rather than individually GETable entities.  Am I
> missing something here?
> 
>                 Thomas
> 
> 
> On 8 Feb 2006, at 18:12, Helt,Gregg wrote:
> 
> >       Regarding using URIs for DAS features, here's the quote from
> > Paul Prescod that I used in the original DAS/2 grant proposal
> > addressing the question "why use URIs?".  From
> > http://www.prescod.net/rest/rpc_for_get.html :
> >
> > You can give that URI address to anyone, anywhere and they can
> > reuse it.  In particular this means that we can compose applications
> > that were not thought of in advance. Google is an example of an
> > application that was composed "after the fact" out of URIs. Yahoo is
> > another...There are a raft of deployed W3C recommendations that work
> > with information related through URIs. Many of these are XML-related
> > specifications that work as well in API-like applications as in user
> > interface-based applications.  These include: XPath, XPointer, XSLT,
> > XLink, RDF, XInclude, XQuery, xml-stylesheet.  Information published
> > through HTTP URIs can be combined through XInclude, queried and
> > sorted through XQuery and XSLT, visually rendered with
> > xml-stylesheet, related through RDF, linked through XLink, pointed
> > into through XPointer.
> >
> >
> >
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/das2


From dalke at dalkescientific.com  Thu Feb  9 10:43:27 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 15:43:27 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9C9@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C9@msex02.affymetrix.com>
Message-ID: <5920623233379c4200775188315082bb@dalkescientific.com>

Gregg
> As I understand it a fragment identifier approach would mean
> URI = (URL of doc feature XML is embedded in) + "#" + value of 
> feature's
> ID attribute.

As I understand it the part after the '#' is a query language
which is document type specific and used by the client.  DAS does not
define how that query language is used, so it has no meaning in the
DAS world.

http://www.ietf.org/rfc/rfc2396.txt

4. URI References

    The term "URI-reference" is used here to denote the common usage of a
    resource identifier.  A URI reference may be absolute or relative,
    and may have additional information attached in the form of a
    fragment identifier.  However, "the URI" that results from such a
    reference includes only the absolute URI after the fragment
    identifier (if any) is removed and after any relative URI is resolved
    to its absolute form.  Although it is possible to limit the
    discussion of URI syntax and semantics to that of the absolute
    result, most usage of URI is within general URI references, and it is
    impossible to obtain the URI from such a reference without also
    parsing the fragment and resolving the relative form.
  ....
4.1. Fragment Identifier

    When a URI reference is used to perform a retrieval action on the
    identified resource, the optional fragment identifier, separated from
    the URI by a crosshatch ("#") character, consists of additional
    reference information to be interpreted by the user agent after the
    retrieval action has been successfully completed.  As such, it is not
    part of a URI, but is often used in conjunction with a URI.

       fragment      = *uric

    The semantics of a fragment identifier is a property of the data
    resulting from a retrieval action, regardless of the type of URI used
    in the reference.  Therefore, the format and interpretation of
    fragment identifiers is dependent on the media type [RFC2046] of the
    retrieval result.  The character restrictions described in Section 2
    for URI also apply to the fragment in a URI-reference.  Individual
    media types may define additional restrictions or structure within
    the fragment for specifying different types of "partial views" that
    can be identified within that media type.

    A fragment identifier is only meaningful when a URI reference is
    intended for retrieval and the result of that retrieval is a document
    for which the identified fragment is consistently defined.
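
In practice this means a client splits the fragment off before it makes
the request.  A small Python sketch of that split (standard library;
request_target is a made-up helper, not anything from a DAS library):

```python
from urllib.parse import urldefrag, urlsplit


def request_target(uri_reference):
    """Return (host, path-and-query) for the request; the fragment is dropped."""
    url, _fragment = urldefrag(uri_reference)
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.netloc, target


print(request_target("http://das2.sanger.ac.uk/ensembl35/features#id12345"))
```

Whatever follows the '#' stays in the client, to be interpreted against
the document that comes back.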


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Thu Feb  9 10:53:38 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 15:53:38 +0000
Subject: [DAS2] writeback via diffs
Message-ID: <7a182cd18dacf110341f5cec43436f38@dalkescientific.com>

Summary: We've been talking about the "update via a delta" model
as an alternative to the "lots of changes to the server" model.
Deltas mean the heavy work is done in the client (or middleware),
vs. the server.


We've been looking at the writeback spec.  It doesn't handle
the case of a complex feature with a parent/part relationship.

In the current scheme that's done as a:
   - get the write lock
   - POST the new feature (parent)
   - POST the new feature (child)
   - commit on the lock

What URL does the parent record have to point to the child?
Does the database defer referential integrity checks until
the commit on the lock?  Is this a case where the POST for
that feature returns an UPDATELIST document for every unknown/
placeholder identifier in the record?  Probably.

Another solution is to ask the server "give me two identifiers
which can be used for features".  (NOTE: must do this for
either URLs or 'short ids' because the client might guess
and override an existing feature.)  Cute. But no real takers here.


BTW, does the full DAS query system support searches of the
modified version of the server?  How does the server know that
the search request comes from a client working in an editable
view?

In talking about it we've been working on an idea we all
talked about last year; submitting a delta to the server
and moving the heavy work into the client.

That is, after the client is done locally it sends a
document which looks like

<WRITEBACK>
   <DELETE id="http://www/das/type/T12345" />
   <DELETE id="http://www/das/feature/exon/1" />
   <DELETE id="http://www/das/feature/exon/2" />
   <DELETE id="http://www/das/feature/contig/Ctg9" />
   <TYPES>
     <!-- this modifies an existing type -->
     <TYPE id="http://www/das/type/DEADBEEF">
       ... updated type information here ...
       <PROP key="name" value="Pa Cartwright" />
     </TYPE>
     <!-- this creates a new type -->
     <TYPE id="XXXXXXXXXX" >  <!-- see below for id discussion -->
     </TYPE>
  </TYPES>
  <FEATURES>
     <!-- this updates an existing feature -->
     <FEATURE id="http://www/das/feature/F9415"
          type_id="http://www/das/type/T12345">
       ...
     </FEATURE>
     <!-- this creates a new feature -->
     <FEATURE id="YYYYYY" type_id="http://www/das/type/T12345">
       ...
     </FEATURE>
   </FEATURES>
</WRITEBACK>

There are several things to note:
   - the <DELETE> elements, to remove existing types and features
   - the types and features are in the normal formats.
   - there is no way to update a part of a record; the record
       is sent in full
   - new identifiers are still a problem
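
As a rough sketch of how a client might assemble such a delta, here is
a Python version using xml.etree.ElementTree.  The element and
attribute names follow the example document above; the function
build_writeback and its arguments are assumptions made for
illustration, and the TYPES section is left out to keep it short.

```python
import xml.etree.ElementTree as ET


def build_writeback(deleted_ids, features):
    """Build a WRITEBACK delta.

    deleted_ids: identifiers of existing types/features to remove.
    features: (feature_id, type_id) pairs; each record is sent in full.
    """
    root = ET.Element("WRITEBACK")
    for ident in deleted_ids:
        ET.SubElement(root, "DELETE", {"id": ident})
    features_el = ET.SubElement(root, "FEATURES")
    for feature_id, type_id in features:
        ET.SubElement(features_el, "FEATURE",
                      {"id": feature_id, "type_id": type_id})
    return root


doc = build_writeback(
    ["http://www/das/feature/exon/1", "http://www/das/feature/exon/2"],
    [("http://www/das/feature/F9415", "http://www/das/type/T12345")],
)
print(ET.tostring(doc, encoding="unicode"))
```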


The use model for this is as follows, based on Otter.

   - get the SOURCES document, which will have

<CAPABILITY type="locks" url="http://www/../get_lock_info.pl" />

<CAPABILITY type="writeback" url="http://../post_updated_delta.py" />

   - get an exclusive write lock on a region
       - POST to the locks URL (and GET gets a list of the locks?)

       - only one region locked at a time (current spec allows the
          full query language; is that needed?)

       - user is authenticated via HTTP-level authentication
           (Q: allow https for any of this?)

       - optional timeout time in request; server may give shorter
           or longer timeout

       - user is allowed to edit all features in the given region

   - get all the features in that region  (because there may have
       been a commit before the write lock)

   - work with the data on the local copy of the server data

   - push the big red "COMMIT" button

   - client POSTs the delta to the server
       - user authentication again
       - also sends a lock-id or a nonce so the server can
           double-check that there wasn't some other change

   - server checks payload for referential integrity

The problem is the need for a URL.  We've come up with two
solutions.

   1. ask the server for things which can be used as identifiers.
These identifiers live for the life of the lock.

   2. reserve a private URI scheme, like "das-private:" followed
by a client-defined identifier.  On upload the server maps those
into valid local identifiers.  To work correctly for the client
the response document would need to contain mapping from private
identifiers to server identifiers.

The current spec uses the latter mechanism but does not specify
how the placeholder identifier is generated.  The mapping is
essentially the "UPDATELIST" from the current spec, though with
no need to support the status field on a per item basis - it
should be an all or none transaction.
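
A sketch of solution 2 on the server side, in Python.  The
"das-private:" prefix is the one proposed above; everything else (the
function name assign_ids, the F%05d numbering, the base URL) is
invented for the example.

```python
import itertools

PRIVATE_SCHEME = "das-private:"


def assign_ids(identifiers, base_url, counter):
    """Map each distinct client placeholder to a new server identifier.

    Identifiers that are already server URLs are left alone.  The whole
    mapping is built before anything is returned, so the caller can
    treat the upload as all or none.
    """
    mapping = {}
    for ident in identifiers:
        if ident.startswith(PRIVATE_SCHEME) and ident not in mapping:
            mapping[ident] = "%sF%05d" % (base_url, next(counter))
    return mapping


uploaded = [
    "das-private:new-exon",
    "http://www/das/feature/F9415",   # existing feature, not remapped
    "das-private:new-gene",
    "das-private:new-exon",           # a second reference to the same placeholder
]
mapping = assign_ids(uploaded, "http://www/das/feature/", itertools.count(1))
for private_id, server_id in mapping.items():
    print(private_id, "->", server_id)
```

The returned dictionary is what the response document would have to
carry back so the client can swap its placeholders for real URLs.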


Sending a delta gets rid of the DELETE and PUT (and POST update)
methods on the server.  Not ReSTful.  It places the burden on the
client for tracking the user edits instead of in the server.
But we have a good sense that it will work and is understandable.

It maps much more closely to the current Otter use.  We don't
know how Apollo/Chado wants to support writeback.

If we decide to stay with the existing ReSTy spec then our
recommendations are:

   - there's no need to support partial updates; clients send
the complete record to the server for update

   - the query language does not need to support the full
      DAS query language; only the "region" query (based on
      Otter experience)

   - there's no current need to extend the range of a lock
       nor to extend the time of the lock.

And I don't like that "lock=" is a parameter to the feature
and types URLs which creates locks for those types rather than
performs queries.  I would rather these be new URLs.

					Andrew
					dalke at dalkescientific.com


From lstein at cshl.edu  Thu Feb  9 11:12:32 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 9 Feb 2006 11:12:32 -0500
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <5920623233379c4200775188315082bb@dalkescientific.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C9@msex02.affymetrix.com>
	<5920623233379c4200775188315082bb@dalkescientific.com>
Message-ID: <200602091112.33548.lstein@cshl.edu>

Hi Folks,

I've drunk the W3C Kool-Aid and do feel that a major feature of DAS/2 as it 
now stands is that all data objects are referenceable as URIs. Furthermore, I 
think it is a handy-dandy feature for them to be fetchable URLs as well, 
having, I suppose, drunk the REST Kool-Aid. For this reason, I prefer the / 
notation to the # notation. Over and above the fact that the #fragment is not 
a part of the URI at all (according to the part of the spec that Andrew 
quoted), a practical issue with the # notation is that all browsers (and, I 
believe, some client-side libraries, although not the Perl LWP) strip out the 
# and whatever follows it. The server never gets a chance to act on the 
fragment.

Since xml:base is giving us a hard time with respect to the queries, and 
causing major confusion and dissension in the group, I'd prefer to go with 
Andrew's strict idea of making all the IDs passed to the queries full URIs. 
In other words, including the properly escaped http://etc.etc in the query 
string. This is going to make it a bit annoying to debug servers from within 
browsers, but will clean up the semantics considerably and once and for all 
remove the confusion about who "owns" a feature versus who "serves" a 
feature.

Lincoln


On Thursday 09 February 2006 10:43, Andrew Dalke wrote:
> Gregg
>
> > As I understand it a fragment identifier approach would mean
> > URI = (URL of doc feature XML is embedded in) + "#" + value of
> > feature's
> > ID attribute.
>
> As I understand it the part after the '#' is a query language
> which is document type specific and used by the client.  DAS does not
> define how that query language is used, so it has no meaning in the
> DAS world.
>
> http://www.ietf.org/rfc/rfc2396.txt
>
> 4. URI References
>
>     The term "URI-reference" is used here to denote the common usage of a
>     resource identifier.  A URI reference may be absolute or relative,
>     and may have additional information attached in the form of a
>     fragment identifier.  However, "the URI" that results from such a
>     reference includes only the absolute URI after the fragment
>     identifier (if any) is removed and after any relative URI is resolved
>     to its absolute form.  Although it is possible to limit the
>     discussion of URI syntax and semantics to that of the absolute
>     result, most usage of URI is within general URI references, and it is
>     impossible to obtain the URI from such a reference without also
>     parsing the fragment and resolving the relative form.
>   ....
> 4.1. Fragment Identifier
>
>     When a URI reference is used to perform a retrieval action on the
>     identified resource, the optional fragment identifier, separated from
>     the URI by a crosshatch ("#") character, consists of additional
>     reference information to be interpreted by the user agent after the
>     retrieval action has been successfully completed.  As such, it is not
>     part of a URI, but is often used in conjunction with a URI.
>
>        fragment      = *uric
>
>     The semantics of a fragment identifier is a property of the data
>     resulting from a retrieval action, regardless of the type of URI used
>     in the reference.  Therefore, the format and interpretation of
>     fragment identifiers is dependent on the media type [RFC2046] of the
>     retrieval result.  The character restrictions described in Section 2
>     for URI also apply to the fragment in a URI-reference.  Individual
>     media types may define additional restrictions or structure within
>     the fragment for specifying different types of "partial views" that
>     can be identified within that media type.
>
>     A fragment identifier is only meaningful when a URI reference is
>     intended for retrieval and the result of that retrieval is a document
>     for which the identified fragment is consistently defined.
>
>
>
> 					Andrew
> 					dalke at dalkescientific.com
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From lstein at cshl.edu  Thu Feb  9 11:15:48 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 9 Feb 2006 11:15:48 -0500
Subject: [DAS2] RE: Working with xml:base in Java?
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9C0@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C0@msex02.affymetrix.com>
Message-ID: <200602091115.49675.lstein@cshl.edu>

The Perl libraries provide a very simple HTTP_Base attribute. As you parse 
your way through the XML, you can change the HTTP_Base using any of the 
relative or absolute address resolution modes, so that subsequent URLs are 
correctly resolved. Unfortunately it is a SAX model, so that you have to push 
previous bases onto a stack and restore them as needed.
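
The same push/restore pattern looks like this in Python's xml.sax,
offered as a sketch rather than a description of the Perl code.  The
handler class, the GROUP element and the sample document are made up
for illustration; only FEATURE, id and xml:base come from the
discussion.

```python
import xml.sax
from urllib.parse import urljoin


class BaseTrackingHandler(xml.sax.ContentHandler):
    """Keep a stack of base URIs while parsing and resolve FEATURE ids."""

    def __init__(self, document_url):
        super().__init__()
        self.bases = [document_url]   # bottom of the stack: the retrieval URL
        self.feature_uris = []

    def startElement(self, name, attrs):
        base = self.bases[-1]
        if "xml:base" in attrs:
            # xml:base may itself be relative to the enclosing base.
            base = urljoin(base, attrs["xml:base"])
        self.bases.append(base)
        if name == "FEATURE" and "id" in attrs:
            self.feature_uris.append(urljoin(base, attrs["id"]))

    def endElement(self, name):
        self.bases.pop()              # restore the enclosing element's base


SAMPLE = b"""<FEATURES xml:base="http://das2.sanger.ac.uk/ensembl35/features/">
  <FEATURE id="id12345"/>
  <GROUP xml:base="exon/">
    <FEATURE id="A1"/>
  </GROUP>
  <FEATURE id="id67890"/>
</FEATURES>"""

handler = BaseTrackingHandler("http://example.org/doc")
xml.sax.parseString(SAMPLE, handler)
for uri in handler.feature_uris:
    print(uri)
```

Pushing one entry per element, even when the base does not change,
keeps endElement trivial: it always pops exactly one.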

Lincoln


On Wednesday 08 February 2006 05:12, Helt,Gregg wrote:
> > -----Original Message-----
> > From: Thomas Down [mailto:td2 at sanger.ac.uk]
> > Sent: Wednesday, February 08, 2006 12:45 AM
> > To: Helt,Gregg
> > Cc: DAS/2
> > Subject: Re: Working with xml:base in Java?
> >
> > On 7 Feb 2006, at 19:00, Helt,Gregg wrote:
> > > 	Thomas, I'm wondering what toolkits you're using for binding XML
> > > to Java objects?  And particularly how you are dealing with
> > > resolving URIs when xml:base is used.  So far I've mostly used various
> > > implementations of SAX and DOM -- I've found some reports of builtin
> > > xml:base support in Xerces SAX/DOM, but it's still unclear.
> > >
> > > 	I've been avoiding the issue up till now.  It won't be too hard
> > > to implement URI resolution relative to xml:base, but I thought I'd
> > > check around first and see if there's automated support of this in
> > > some
> > > toolkit.
> >
> > Hi Greg,
> >
> > I'm actually using Stax (the streaming API for XML).  The
> > implementation I use is called Woodstox:
> >
> >           http://woodstox.codehaus.org/
>
> I would like to check out Stax, haven't used it before.
>
> > (but there are a few others out there).  No builtin xml:base support
> > but it's easy to write a little wrapper around XMLStreamReader to
> > spot xml:base attributes and maintain a stack of base URIs.
> >
> > I'm using java.net.URI to do the URI handling/resolution/
> > relativization.  Seems to be working okay... so far...
>
> That's what I was thinking about when I said it wouldn't be too hard to
> implement... But that was yesterday.  A long time ago.
>
> Now I've taken a detour into re-reading the XML Base spec
> http://www.w3.org/TR/xmlbase/, and things don't seem so easy.
>
> I _think_ if there's at least one xml:base attribute in the element
> hierarchy above where you're trying to determine a base URI, and
> resolution of those xml:base attributes yields an absolute URI, it's all
> good, that's the  base URI.  But on the other hand if this resolution
> yields a relative URI instead of an absolute URI I'm not sure what
> happens -- I would guess it's an error, but I can't see anywhere in the
> XML Base spec that spells this out.  And if there's no xml:base to use
> to determine a base URI, things get weird:
>    if the document is "encapsulated within another entity", the base URI
> is the URI of that entity (I have no idea if DAS/2 docs could appear in
> such a context)
>    otherwise the base URI is the URI used to retrieve the document
>    oh, except if you burrow down into the spec pointers to RFC 2396
> http://www.ietf.org/rfc/rfc2396.txt, if the request gets redirected you
> need to make sure the base URI is the last URI used in the redirect
>    oh yeah, and apparently external entity declarations can affect all
> of this in ways I don't understand
>    and there's probably other gotchas I've missed...
>
> Now from the server side, none of this is really an issue.  Just pick
> from a multitude of variants that XML Base allows when you send
> responses to the client.  From the client side, if we really want DAS/2
> to support XML Base (and I think we do), things get tricky.  It's
> definitely pushing me towards using libraries that provide builtin
> support for XML Base.
>
> 	Gregg
>
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From dalke at dalkescientific.com  Thu Feb  9 11:37:12 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 16:37:12 +0000
Subject: ids and URLs (was Re: [DAS2] Ontologies in DAS/2)
In-Reply-To: <94bafd156da54842f9093244ca6083d1@fruitfly.org>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>	<Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>	<Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>	<e00e781866375762b29061d2b510a10e@fruitfly.org>
	<Pine.LNX.4.58.0602071913010.29889@sumo.ctrl.ucla.edu>
	<43E9DC0E.30809@mrc-lmb.cam.ac.uk>
	<701ca3642e6ed0ea3e12fbd26991a3fb@dalkescientific.com>
	<94bafd156da54842f9093244ca6083d1@fruitfly.org>
Message-ID: <c538ce3b541fc8430dee213bf9f6b45f@dalkescientific.com>

[Top-posting summary]

I agree with Chris that the DAS "type"s aren't really types.

Chris Mungall:
> I mostly skim the messages here, so I may be missing something, but 
> I'm a little confused by this:
>
> On Feb 8, 2006, at 8:36 AM, Andrew Dalke wrote:
>
>>
>>     http://das.server/../types?ontology=SO:exon
>
> I don't understand this - SO:exon isn't an ontology

I made it up; I mean "whatever the SO term is for an exon".
I think it's SO:0005845 ("single_exon") or SO:0000147 ("exon")


>> PROPOSAL:  Add a "source=" (case-insensitive substring search)
>> field to the types query.  (I don't think there is any contention
>> here so I'll add it.)
>>
>>     http://das.server/../types?ontology=SO:exon;source=Vega
>
> What does 'types' return? A type from an ontology (eg SO:exon) or 
> something else? Why would source be recorded here? Surely source would 
> be a valid constraint on a feature query, but not a type query.

A DAS type is a somewhat strange thing, in the type sense.  It
stores:
   - the link to the ontology
   - a list of the formats available for features of that type
   - this "source" field
   - potentially some per-source data used for depiction, or
      perhaps not

Thomas Down here has this use case.

He has a program which searches for exons.  All of the annotations
it makes for a month are from that program.  He wants them to be
the same type - conceptually "the exons predicted by the program".

Some of that data could be moved into the feature. The feature
can point directly to the ontology, and have a "source".

> Perhaps it's the case that in DAS a 'type' means some kind of 
> arbitrary grouping (eg features of type X and source Y), and 
> 'ontology' means a
> term/type from an ontology. If it isn't too late I'd suggest changing
> these conventions.

That is more like the case.  Got a better name?  "class"?  ROFL.  Or 
not.

It is not a type system.  It is closer to a group than
anything else.  I agree that "type" has connotations which are
not true for this case.

					Andrew
					dalke at dalkescientific.com


From Gregg_Helt at affymetrix.com  Thu Feb  9 11:40:34 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Thu, 9 Feb 2006 08:40:34 -0800
Subject: [DAS2] Why use URIs for feature IDs?
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9CB@msex02.affymetrix.com>

Interesting, I hadn't fully absorbed part 4 of the URI spec (rfc2396).
So if I understand correctly:

If we replace everywhere we've called something a "URI" with "URI
reference" we're being correct -- a URI reference can be an absolute or
relative URI, and can also include a fragment identifier.  And according
to the spec saying "the URI" means the absolute URI, not the relative
URI.  So to restate, I think the ids we use in DAS/2 should be URI
references.  Maybe instead of "id" or "uri" we should use "uri_ref" for
the attribute name?

I still see no reason to exclude URI references with fragment
identifiers, though I agree with Lincoln that actually resolving a URL
with a fragment is problematic.  But we're not guaranteeing that these
URI references are URLs anyway.

The capabilities "query_id" attributes are another story.  These need to
be not just URI references but also resolve via XML-Base to full URLs.

	gregg  

> From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke
> Sent: Thursday, February 09, 2006 7:43 AM
> To: DAS/2
> Subject: Re: [DAS2] Why use URIs for feature IDs?
> 
> Gregg
> > As I understand it a fragment identifier approach would mean
> > URI = (URL of doc feature XML is embedded in) + "#" + value of
> > feature's
> > ID attribute.
> 
> As I understand it the part after the '#' is a query language
> which is document type specific and used by the client.  DAS does not
> define how that query language is used, so it has no meaning in the
> DAS world.
> 
> http://www.ietf.org/rfc/rfc2396.txt
> 
> 4. URI References
> 
>     The term "URI-reference" is used here to denote the common usage of a
>     resource identifier.  A URI reference may be absolute or relative,
>     and may have additional information attached in the form of a
>     fragment identifier.  However, "the URI" that results from such a
>     reference includes only the absolute URI after the fragment
>     identifier (if any) is removed and after any relative URI is resolved
>     to its absolute form.  Although it is possible to limit the
>     discussion of URI syntax and semantics to that of the absolute
>     result, most usage of URI is within general URI references, and it is
>     impossible to obtain the URI from such a reference without also
>     parsing the fragment and resolving the relative form.
>   ....
> 4.1. Fragment Identifier
>
>     When a URI reference is used to perform a retrieval action on the
>     identified resource, the optional fragment identifier, separated from
>     the URI by a crosshatch ("#") character, consists of additional
>     reference information to be interpreted by the user agent after the
>     retrieval action has been successfully completed.  As such, it is not
>     part of a URI, but is often used in conjunction with a URI.
>
>        fragment      = *uric
>
>     The semantics of a fragment identifier is a property of the data
>     resulting from a retrieval action, regardless of the type of URI used
>     in the reference.  Therefore, the format and interpretation of
>     fragment identifiers is dependent on the media type [RFC2046] of the
>     retrieval result.  The character restrictions described in Section 2
>     for URI also apply to the fragment in a URI-reference.  Individual
>     media types may define additional restrictions or structure within
>     the fragment for specifying different types of "partial views" that
>     can be identified within that media type.
>
>     A fragment identifier is only meaningful when a URI reference is
>     intended for retrieval and the result of that retrieval is a document
>     for which the identified fragment is consistently defined.
> 
> 
> 
> 					Andrew
> 					dalke at dalkescientific.com
> 
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2


From Gregg_Helt at affymetrix.com  Thu Feb  9 11:57:02 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Thu, 9 Feb 2006 08:57:02 -0800
Subject: [DAS2] Proposed agenda for DAS/2 code sprint teleconference, Feb 9
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9CC@msex02.affymetrix.com>

ids for features, sequences, types, etc.
stylesheets
writeback
update to NIH grant proposal
status report
 
Anything else we should add?
 

From dalke at dalkescientific.com  Thu Feb  9 13:28:48 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 18:28:48 +0000
Subject: [DAS2] arbitrary data in writeback
Message-ID: <a64069f19dbc770901366184200854e2@dalkescientific.com>

The DAS spec for features looks something like this

<FEATURES>
  <FEATURE>
   ...
   <PROP key="name" value="some data goes here" />
   <PROP key="homepage" href="http://blah/" />
   <PROP key="icon" mimetype="image/png">
iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFAQAAAABUH0DFAAAANUlEQVR4nGIM1AcAAAD// 
2LiYgAA
AAD//2I6wAAAAAD//2JycAAAAAD//2L6dgAAAAD//wMAEoUDipTjFscAAAAASUVORK5CYII=
   </PROP>

   <some_non_das_namespace:curation-history>
     ...
   </some_non_das_namespace:curation-history>

   <flybase:substitution>
      ..
   </flybase:substitution>
  </FEATURE>
</FEATURES>

There are two points for extension.  One is the PROP table
which is meant to be simple.  Clients can do substring searches
of PROP elements with "value"s, as in

    prop-name=blah+blah

All clients should be able to understand these data formats, though
there is no constraint for the key names.  They are convention only.

Right now a key gets either a string, a URL, or a chunk of binary data
which is uuencoded.  (The key can be present many times; is that
a problem with Apollo?)  The latter two (URL and binary data)
are *proposals*.  They are neat, but not based on user demand.  No
one has told me that they will use it.

Allen wants one more possibility, "existence", with no associated
value at all.  Nomi says that Apollo can't round-trip that data
except by also tracking the input XML.  I don't want a "it just
exists" field and would prefer those stored with an empty string.


Then there is the support for non-DAS elements as extensions.
These can contain arbitrary XML, so long as they are not in the
DAS XML namespace.

A client can ignore elements it doesn't understand.  However,
if it does writeback of a feature it *MUST* include all elements
it doesn't understand.  I can write that into the spec.

It doesn't need to do anything with that data.  It can keep it
around as a chunk of text.  It just needs to send it back to
the server when it does the writeback.

For that matter, it doesn't even need to keep it around.  It
can throw the unknown data to the wind and work with the stuff
it does know.  Just before doing the writeback, go back to the
server and get the features again.  From the documents get the
unknown extension elements and insert them into the data - as
text! - to be sent back to the server.
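
A rough sketch of that "keep it as text" approach, using Python's
ElementTree.  The namespace URIs and the helper names (split_feature,
build_writeback) are invented for illustration; the real DAS namespace
and writeback document format may well differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; this thread does not define them.
DAS_NS = "http://example.org/das2"
FEATURE_XML = """
<FEATURE xmlns="http://example.org/das2"
         xmlns:flybase="http://example.org/flybase" id="f1">
  <PROP key="name" value="some data goes here" />
  <flybase:substitution>opaque to this client</flybase:substitution>
</FEATURE>
"""

def split_feature(feature):
    """Return (DAS children, non-DAS children serialized as text)."""
    known, unknown = [], []
    for child in feature:
        if child.tag.startswith("{" + DAS_NS + "}"):
            known.append(child)
        else:
            # Not understood: keep it only as a chunk of text.
            unknown.append(ET.tostring(child, encoding="unicode"))
    return known, unknown

def build_writeback(feature_id, known, unknown_text):
    """Rebuild a FEATURE, re-attaching the opaque chunks unmodified."""
    out = ET.Element("{" + DAS_NS + "}FEATURE", {"id": feature_id})
    out.extend(known)
    for chunk in unknown_text:
        out.append(ET.fromstring(chunk))
    return out

feature = ET.fromstring(FEATURE_XML)
known, unknown = split_feature(feature)
known[0].set("value", "edited name")   # edit only what the client understands
result = build_writeback(feature.get("id"), known, unknown)
```

ElementTree may rewrite namespace prefixes when it serializes, so a
client that must preserve the extension elements byte for byte would
keep the raw text from the server response instead.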

Clients may mess up and commit records without these elements.
The server will treat those as deletes of those records, because
it cannot tell if the client really knows what to do with that
data.

This is the easiest solution as a spec writer.  We have nearly
all of the format for that transaction, excepting a bit about
being able to delete.

NOTE: a server may ignore the uploaded data.  For example, it
may modify the transaction history and throw out whatever the
client sent to it -- if that's how the <transaction-history>
element is specified.

The other solution is to be more fine grained, so that clients
send deltas, like

<FEATURES>
  <FEATURE>
   ...
   <PROP key="name" value="some data goes here" />
   <PROP key="homepage" href="http://blah/" />
   <PROP key="icon" mimetype="image/png">
iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFAQAAAABUH0DFAAAANUlEQVR4nGIM1AcAAAD// 
2LiYgAA
AAD//2I6wAAAAAD//2JycAAAAAD//2L6dgAAAAD//wMAEoUDipTjFscAAAAASUVORK5CYII=
   </PROP>

   <delete>
     <some_non_das_namespace:curation-history />
   </delete>

   <replace>
     <flybase:substitution>
        ..
     </flybase:substitution>
   </replace>
  </FEATURE>
</FEATURES>

but that gets complex.  You end up with a grammar for the
deltas.  Eg, "delete the first 'some_non_das_namespace:curation-history'
but not the others".  It's a harder grammar to write and a
harder semantic to implement on client and server.


I don't understand the case where complete writeback is a problem.
There was mention of a client deleting a feature when it
shouldn't have, because of extra data that it just didn't know about.

I didn't follow that at all.

Please enlighten me!  :)

					Andrew
					dalke at dalkescientific.com


From Steve_Chervitz at affymetrix.com  Thu Feb  9 14:06:03 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Thu, 09 Feb 2006 11:06:03 -0800
Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint,
 9 Feb 2006
Message-ID: <C010D29B.1BDDB%Steve_Chervitz@affymetrix.com>

Notes from the DAS/2 teleconference for the code sprint, 9 Feb 2006

$Id: das2-teleconf-2006-02-09.txt,v 1.1 2006/02/09 19:13:39 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed E., Gregg Helt
  CSHL: Lincoln Stein
  Sanger: Thomas Down, Roy
  Sweden: Andrew Dalke
  UC Berkeley: Nomi Harris, Suzi Lewis
  UCLA: Allen Day, Brian O'connor
        
Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


[note taker missed the first 5-10 minutes]

Topic: encoded URLs
-------------------

ls: apache bug - unescaped //. must be percent encoded or apache can
run into problems
gh: most people don't bother escaping, we should make this clear in
the spec. every major library has ways of doing this automatically.

[A] update spec to state: contained urls w/in das query urls should be
encoded
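
A small illustration of the action item, assuming a Python client.
The server URL and the "type" parameter name are hypothetical; the
point is only that a URL embedded in a query string has its ":" and
"/" characters percent-encoded.

```python
from urllib.parse import quote, unquote

# Hypothetical server and type URLs, for illustration only.
server = "http://example.org/das2/genome/hg17/feature"
type_uri = "http://example.org/das2/genome/hg17/type/exon"

# safe="" makes quote() escape "/" and ":" as well.
encoded = quote(type_uri, safe="")
query_url = server + "?type=" + encoded
print(query_url)

# The server side recovers the original URL by decoding.
decoded = unquote(encoded)
```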

Topic: Style sheets
-------------------

ad: see Jan 26/27 email, "style sheet question"
what i described is not the same as what das/1 style sheets supply.
we already have a mechanism
gh: embed ss in types element?
ad: or, new capability or link server for a given source.
gh: prefer this
td: easy to have a single style element
gh: would a types elem have ptr to ss or do you query for the
capability?
ad: if no one's interested we don't have to answer the
question. sounds like no one's interested in style sheets.
gh: we'll keep what you have in the spec for style sheets and move on.
ls: what is it? 
ad: yes. style is embedded in type record. it's now on a per-element
basis. 
ls: ok with this. attributes of types. is there a need for a separate
ss? true it mixes presentation with data model. people will look for the
info they need and can ignore.
ls: transition to separate sheets - visual style id pointing to ss
url. same as with html. instead of 'i' tag moved to font style info.

Topic: Writeback
----------------

gh: discussion in progress in uk. how big a change from current
writeback spec?
ad: spec: server does modification to data. this proposal: client can
now do more stuff with the data.
gh: writeback for client is considerably harder, rarer to impl.
ad: issues: can you still do searches for modified data on server?
ls: building objs from bottom up (children, to parent) so everything
has a url.
ad: each feat has parent and a part.
ls: true. temporary id mechanism, response indicates what the mapping
to local id is.
what happens is: client locks, uploads parents, children with temp
ids, does referential integrity checking, then reports mapping from
temp to local id.
gh: doing http DELETE imposes a constraint
ls: how handling id issue?
gh: you need something to create new, real id
ad: b/c they're in one transaction, server can
ls: delete is a problem because http delete only permits one at a
time. updates a problem too. post that creates new objs allows you to
create multiple new objs at same time, but push and delete only
operate one at time.
ad: at this point don't want to change data model.
ls: so everything will be a post then, under your proposal, for
writeback url.
ad: a single post.
gh: moving from http delete to a
trying to understand how this is a delta model.
ad: only updates things that changed, and listed deletions
ls: fine. writeback, create update and delete sections
td: granularity. not single characters. one feature.
ls: three transactions we previously had, put, post, and delete, and
roll up into a single transaction.
gh: when you send back a feat you've already seen, do you restate all
the xml for that feature, since otherwise it is deleted?
ad: yes.
gh: would like the unit of ro
ls: this achieves per transaction integrity, since you don't have to
do multiple deletes. the lock idea, had to persist over multiple
transactions to allow for that atomicity.
gh: we need to keep lock so curators can guarantee that nothing
changes underneath them.
td: lock corresponds to a db transaction as well.
ls: no one's impl this writeback so there's no friction against
changing it. i'm fine with it. as long as people don't mind we're
losing a cute feature described in a grant.
gh: what does roy or ed g. think?
roy: have been involved in this. this mirrors some features that otter
does. a good idea. deletes and put aren't big winners, if updating
multiple feats and they refer to each other.
roy: whole xml doc is the transaction
ls: if anything doesn't make sense, all requests in the writeback doc
are rolled back.
roy: yes. some error messages to understand what might be going wrong.
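
A toy model of the single-document writeback discussed above: one
request carrying create, update and delete sections, temporary ids
mapped to real ones, and everything rolled back if any part fails.
The dict layout, the "feature/N" id format and the names
apply_writeback and WritebackError are all invented for this sketch;
they are not part of any DAS/2 draft.

```python
import copy

class WritebackError(Exception):
    """Raised when any part of a writeback document is invalid."""

def apply_writeback(store, doc, next_id):
    """Apply one writeback document to `store` (dict: id -> feature dict).

    Returns (new_store, temp_to_real).  `store` itself is never modified,
    so a failure anywhere leaves the original data intact.
    """
    work = copy.deepcopy(store)
    mapping = {}
    for temp_id, feature in doc.get("create", {}).items():
        real_id = "feature/%d" % next_id
        next_id += 1
        mapping[temp_id] = real_id
        work[real_id] = dict(feature)
    for fid, feature in doc.get("update", {}).items():
        if fid not in work:
            raise WritebackError("cannot update unknown feature " + fid)
        work[fid] = dict(feature)
    for fid in doc.get("delete", []):
        if fid not in work:
            raise WritebackError("cannot delete unknown feature " + fid)
        del work[fid]
    # Referential integrity: rewrite temporary parent ids, then check them.
    for feature in work.values():
        parent = feature.get("parent")
        parent = mapping.get(parent, parent)
        if parent is not None and parent not in work:
            raise WritebackError("dangling parent " + parent)
        feature["parent"] = parent
    return work, mapping

store = {"feature/1": {"name": "exon A", "parent": None}}
doc = {
    "create": {"tmp1": {"name": "gene X", "parent": None},
               "tmp2": {"name": "exon B", "parent": "tmp1"}},
    "delete": ["feature/1"],
}
new_store, mapping = apply_writeback(store, doc, next_id=2)
print(mapping)   # {'tmp1': 'feature/2', 'tmp2': 'feature/3'}
```

Working on a copy and returning it only on success is one simple way
to get the all-or-nothing behaviour; a real server would more likely
lean on a database transaction, as td notes.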

gh: splits and merges work too? merging one feature from two, or
splitting one transcript into two.
roy: fits in well. get back two ids of new features. otter gives a lot
back in the xml after posting the data.
gh: treats id in feat as a placeholder and it sends a real id back to
you. 
ls: you're given a temporary placeholder then it gives you real id.
might want to put a formal merge and split commands. because in
proposed new system (and old) to split one exon to two, you have to
either delete the original one, or update it to change one boundary
and create a new one. you've lost the ability to keep track of the
original and the two new ones.
ad: feats have place for arbitrary annotations. creational history log
could be maintained.
ls: how upload this to a server. splitting exon into two daughters is
different from deleting and creating two new ones.
ad: no one needs this, for future.
gh: it's needed now.
ls: splitting genes into two pieces is important. people want to keep
track of this. formal merges and splits permits this tracking.
gh: my take, prefer as few verbs as possible. if we can formally define
splits and merges as combos of deletes and creates, prefer this.
ls: semantically difficult for server to know that a delete followed
by two creates is different than a split.
td: ancestor id on the features can solve this.
ad: haven't heard about this use case. features have place where you
can stick in new data. database can read it to understand history.
gh: like idea of curational track of ancestors. before, people said
we can't require dbs to do this.
td: optional property
ls: could thread it through feature properties.
ad: this version, or for 2.1?
gh: initial write back must support splits and merges.
[broad agreement]
ls: make sure it will work.
what happens when you keep track of ancestors and the ancestor object
disappears?
gh: can't assume a db has identifier for every curation in its past
state.
roy: weakness of the current otter schema, james is working on a
fix. tag a release and go back to genes as of that release.
ls: acedb had this feature to rollback to older versions of gene
model.
aday: the schema we're using has support for previous versions.
roy: tedious. big script, but a good thing to have.
ls: a few hours of more discussion to see what's involved in
supporting tracking curational merges, splits, renames, etc. to make
sure it's the right decision to put it into a curational property of
feature rather than having formal database merge and split
operations. i'm ok doing it this way if it seems ok.
gh, aday: me too

Topic: NIH grant proposal
-------------------------

gh: i'm the bottleneck

Status reports:
---------------

gh: igb das client still. checked in code. you can get das2 client in
igb pointing to codesprint das2 server. sources, segments, types. no
features yet. working on this today. should go faster today.
ad: sent email to allen about some things about server that don't
agree with spec. properties
aday: features have no properties associated with them. do we need
valtype or href.
nh: a key with no value doesn't make sense. using 'true' if no value.
aday: ok. but need an agreement on what to do for properties with no
associated value or type
ad: can make it so.
aday: now put in empty string
ad: use for both value and href
aday: can't have both.
ad: what's interpretation if you have both?
can take out href part and have value= empty string
nh: client deals with empty value.
ad: leave it as a string
suzi: uneasy about this.
td: it does have a value, empty string.
suzi: some places where empty string doesn't make sense. data gets
dirty. if you're gonna have a tag-value structure, and may or may not
be a value, it's bad. some things are tag-value, some things just have
a value. it seems ambiguous, no guaranteed behavior.
ad: guarantee is for all keys to have a value. can be empty string.
gh: string or empty string is ok
ad: only used for clients who know what it means.
may have to update apollo
gh: if we allow arbitrary xml in features, client will have to
remember this xml or it will disappear.
ls: a huge issue w/ apollo in past. when communicating w/ db's that
have extra stuff, in the xml that isn't on client side data model.
suzi: my take, the client should not have to pass it all through.
nh: it forces client to be a complete database
gh: then the delta writeback
ls: works ok for deletes, updates become an issue
ad: you have to deal with text you don't understand.
ls: you have to keep track of tags you don't understand, otherwise
they are deleted.
gh: trade off, simplicity of writeback, and what client has to
remember.
ls: client says: i don't understand it, but i can't delete it.
gh: how hard is it to have an arbitrary xml chunk by client?
ls: give it an empty tag to say you want it to go away.
nh: how do you delete things that came in empty and you want to delete
them?
ls: can have attribute="delete me". this creates a burden on server
side. 
[client folks like this..]
decided to keep everything you know and send it back. round trip
it.
ad: client can throw away what it wants. can go back to server
ls: boomerang.
gh: a variety of ways to make sure the data gets stored.
roy: will be in feature. just hold a pointer to it.
suzi: hard for apollo. passive round tripping is fine.. difficulty is
with deletes. ignoring stuff, don't know what it is. delete a
transcript or whole gene. some of that stuff you don't know what it
is, describes a mutant phenotype. you deleted from genomic record, but
there's other data that shouldn't be deleted. client would have to be
fully cognizant of it, beyond genome sequence features. client now
needs to model all the other data too.
ls: difficult to understand how a client could deal with it.
ad: the xml is just an opaque chunk.
why can't client send back full record?
suzi: won't solve the full problem. if annotator said delete it
gh: client says delete that feature. it won't pass back any stuff
underneath the feature. some stuff underneath it that shouldn't be
deleted.
ad: that's what you have back ups for.
suzi: beyond this.
to deal with this, we made deletes be more atomic. had to be handled
at server side, otherwise, we have to put all that knowledge into
client. gets tied to a particular group.
ad: knowledge of what?
suzi: additional information
if you delete whole thing at top, any pass through data is also gone.
gh: not hard on client, just what does the server do with that?
suzi: this is why it belongs on server side. knows what matters and
what doesn't matter. if you don't want clients tied to a particular
db. that solution will be inadequate. we had to put the info on the
client and make the operations as fine grained as we could.

ap: writeback issues have been discussed. suggest to take this up
tomorrow. 
ad: could someone write up why a client couldn't just track the things
that it wanted? then we can consider.


Status reports, cont'd
----------------------

roy: zmap client. can get sources and types from server. parsing it
creating internal objects. can't draw features yet. long discussion
about write back today.
ad: validator stuff
td: talking about writeback.
ap: working on registry. first das/2 server. distinguish between das/1
and das/2 via accession points.


brian: rpm build for allen's server. will post today at
biopackages.net
suzi: spoke to chris about web services for ontology. he will talk
with allen. thing about ids to deal with. also, if we do a web service
that isn't das like, it should be doable. should be able to get the
terms. also, if we want to have stop codon replacement, you also have
to say what position, what it's replaced with (uridine). how is this
done in das spec?
gh: can you post to the list?
suzi: yes. 
aday: will raise writeback issues as well.
suzi: small point mutations, indel, substitution (base and position)
aday: nearly got apache config file done, impl new std error
documents, 300, with error document.
nh: more apollo client progress. haven't dealt with types yet.
ee: igb improvements.
sc: pipeline for populating affy das server with array data. completed
pipeline for exon array design data.


From nomi at fruitfly.org  Thu Feb  9 15:08:33 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Thu, 9 Feb 2006 12:08:33 -0800 (PST)
Subject: [DAS2] unary properties
In-Reply-To: <a64069f19dbc770901366184200854e2@dalkescientific.com>
References: <a64069f19dbc770901366184200854e2@dalkescientific.com>
Message-ID: <17387.41281.765157.17683@kinked.lbl.gov>

On 9 February 2006, Andrew Dalke wrote:
 > Allen wants one more possibility, "existence", with no associated
 > value at all.  Nomi says that Apollo can't round-trip that data
 > except by also tracking the input XML.  I don't want a "it just
 > exists" field and would prefer those stored with an empty string.

fwiw, the empty string (rather than no string) doesn't help apollo--the
way it stores properties, if you ask for the value of property "foo" and
there's no "foo" in the property table, you get back "" (this was to
avoid having to put a million null-pointer checks).  so apollo would not
be able to differentiate--for purposes of writeback OR display without
apollo--between
    <PROP key="foo" value="">
and
    <PROP key="foo">
internally, both of these would look like "i don't know anything about
property foo," unless i saved them as "foo=true" when they were read in,
and then how would it know how to write them out correctly?
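
a toy illustration of the ambiguity (this is not apollo's actual code;
the PropertyTable class is made up to mimic the behaviour described
above, where a missing key reads back as ""):

```python
class PropertyTable:
    """Toy property table that returns "" for any missing key."""

    def __init__(self):
        self._props = {}

    def set(self, key, value):
        self._props[key] = value

    def get(self, key):
        # Missing keys come back as "" to avoid null checks everywhere.
        return self._props.get(key, "")

empty_value = PropertyTable()
empty_value.set("foo", "")   # models <PROP key="foo" value="">

absent = PropertyTable()     # models a feature with no "foo" property

# Both cases read back identically, so they cannot be told apart.
print(empty_value.get("foo") == absent.get("foo"))   # True
```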

i would suggest that either
1. we use two different terms to differentiate between key/value
properties and properties that are valueless (though really i think they
are *keyless* rather than valueless).  perhaps the latter could be called
"attributes" or something?
   <PROP key="foo" value="true">
   <ATTRIBUTE value="foo">
(actually, ATTRIBUTE is probably a bad choice since it has a meaning in
xml, but you get the idea.)

OR (and i prefer this):
2. every property is required to have a key and either a value or an
href.
the valueless (or keyless) properties in the yeast data look like
      <PROP ptype="property/molecular_function unknown"/>

i guess these are like the default cases where other features might
(although i haven't seen any of these) have properties like
    <PROP key="molecular_function" value="transcription regulator activity">

but where did "property/molecular_function unknown" come from in the
first place?  what i think it should look like is
    <PROP key="molecular_function" value="unknown">

and then we avoid the whole keyless-property issue and make the
information more accessible to clients (and hence to users).  the way it
is now, it's an uninterpretable blob of text (really more of a comment
than a property), whereas separating into key/value suddenly gives it
more meaning.
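
a sketch of what a check for option 2 could look like.  it assumes an
empty string counts as "no value", which is my preference but not
something the spec says; the invalid_props helper and the sample
document are made up for illustration.

```python
import xml.etree.ElementTree as ET

def invalid_props(feature_xml):
    """Return PROP elements lacking a key, or lacking both value and href.

    Assumption: empty strings are treated the same as missing attributes.
    """
    bad = []
    for prop in ET.fromstring(feature_xml).iter("PROP"):
        has_key = bool(prop.get("key"))
        has_payload = bool(prop.get("value")) or bool(prop.get("href"))
        if not (has_key and has_payload):
            bad.append(prop)
    return bad

sample = """
<FEATURE>
  <PROP ptype="property/molecular_function unknown"/>
  <PROP key="nucleolus" value="" href=""/>
  <PROP key="molecular_function" value="unknown"/>
</FEATURE>
"""

for prop in invalid_props(sample):
    print(prop.attrib)
```

the first two PROPs are flagged and the third passes.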

     Nomi


From Gregg_Helt at affymetrix.com  Thu Feb  9 15:05:14 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Thu, 9 Feb 2006 12:05:14 -0800
Subject: [DAS2] unary properties
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9CD@msex02.affymetrix.com>

Looks to me like these might be GO terms, which should probably be
represented more like:

<PROP key="gene_ontology" value="rRNA modification" />

and possibly include an href to a description of that GO term.

Of course one could argue whether the attribute values should be URI
references rather than arbitrary strings, but you get the idea.

	gregg

> -----Original Message-----
> From: Nomi Harris [mailto:nomi at fruitfly.org]
> Sent: Thursday, February 09, 2006 12:56 PM
> To: Andrew Dalke; allenday at ucla.edu
> Cc: nomi at fruitfly.org; Helt,Gregg
> Subject: Re: [DAS2] unary properties
> 
> On 9 February 2006, Nomi Harris wrote:
>  > the valueless (or keyless) properties in the yeast data look like
>  >       <PROP ptype="property/molecular_function unknown"/>
> 
> i just looked at another region and found some more interesting
> valueless (though i think they should be called keyless) properties:
> 
>       <PROP key="rRNA modification" value="" href=""/>
>       <PROP key="nucleolus" value="" href=""/>
>       <PROP key="snRNA 2'-O-ribose methylation guide activity" value="" href=""/>
> 
> these really seem to me to be missing important information.
> "nucleolus"?  we're going to randomly mention cell parts?  what this
> really should say is
>       <PROP key="cellular_component" value="nucleolus"/>
> right?
> 
> so i think this is buggy data--it is missing the keys, and that should
> be fixed.  in fact, i think having the spec insist that properties
> have both key and value would help to catch errors like this.
> 
>         Nomi


From Gregg_Helt at affymetrix.com  Thu Feb  9 18:18:42 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Thu, 9 Feb 2006 15:18:42 -0800
Subject: [DAS2] Refinements to range attribute and query filters in spec
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9D3@msex02.affymetrix.com>

 
In the latest spec, the format for range queries is 
      seqid/min:max:strand
and the format for range attributes in feature elements is 
      min:max:strand
 
In the earlier spec
(http://biodas.org/documents/das2/das2_get.html#ranges) everything but
the seqid component of the range query was optional.  Are min and max
still optional, as in these examples from the previous version of the
spec?
    Chr1/1000     Chr1 beginning at position 1000 and going to the end.
    Chr1/:2000    Chr1 from the start to position 2000.
I personally find these kind of ranges confusing and not particularly
useful, and would rather make min and max required for both the range
attribute and range-based query filters. 
 
Also, the latest spec states: 
 
A region may be on the forward or reverse strand or on both strands.
These are respectively denoted 1, -1 and 0.  The reverse strand is the
reverse complement of the forward strand.  Unspecified strand means
forward strand.
 
So for a features query, are the four overlap filters below equivalent?
Chr1/1000:2000
Chr1/1000:2000:1
Chr1/1000:2000:-1
Chr1/1000:2000:0
Or does the addition of strand information further filter the returned
features by strand?  But if that's the case, then according to the spec
having no strand specified means forward.  So that would mean
overlaps="Chr1/1000:2000" would only return forward strand annotations,
and not any on the reverse strand?  To me that's counterintuitive, from
a filtering perspective I'd rather no strand info mean "both strands".
My main point though is we need to be explicit about how strand info or
lack thereof affects features queries with range-based filters.
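
To make the question concrete, here is a sketch of a parser for the
seqid/min:max:strand form that requires both min and max and keeps
"no strand given" distinct from 1, -1 and 0.  The Range type and
parse_range name are made up, and how a server should interpret a
missing strand is exactly the open question above.

```python
from typing import NamedTuple, Optional

class Range(NamedTuple):
    seqid: str
    start: int
    end: int
    strand: Optional[int]   # 1, -1, 0, or None when not specified

def parse_range(text):
    """Parse 'seqid/min:max[:strand]'; min and max are both required."""
    seqid, sep, rest = text.rpartition("/")
    if not sep or not seqid:
        raise ValueError("expected seqid/min:max[:strand]: %r" % text)
    parts = rest.split(":")
    if len(parts) not in (2, 3) or not parts[0] or not parts[1]:
        raise ValueError("min and max are both required: %r" % text)
    start, end = int(parts[0]), int(parts[1])
    strand = None
    if len(parts) == 3:
        strand = int(parts[2])
        if strand not in (1, -1, 0):
            raise ValueError("strand must be 1, -1 or 0: %r" % text)
    return Range(seqid, start, end, strand)

print(parse_range("Chr1/1000:2000"))
print(parse_range("Chr1/1000:2000:-1"))
```

Under this sketch the older open-ended forms such as Chr1/1000 and
Chr1/:2000 are rejected with a ValueError.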
 
      gregg
 

From suzi at fruitfly.org  Thu Feb  9 19:29:57 2006
From: suzi at fruitfly.org (Suzanna Lewis)
Date: Thu, 9 Feb 2006 16:29:57 -0800
Subject: [DAS2] question or two
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9D3@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9D3@msex02.affymetrix.com>
Message-ID: <54bc0e433303827918fe475855669a89@fruitfly.org>

if an annotator wants to indicate a stop-codon-readthrough (which may 
or may not be a seleno-cysteine mechanism). how would DAS send this 
info through? need SO type (the readthrough), the location (relative to 
transcript or genome), and the mechanism.

tRNA anticodon or AA?

alternative translation table? infer this from organism?

-S


From Gregg_Helt at affymetrix.com  Thu Feb  9 20:43:16 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Thu, 9 Feb 2006 17:43:16 -0800
Subject: [DAS2] feature NOTE and ALIAS elements?
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9D5@msex02.affymetrix.com>


> -----Original Message-----
> From: das2-bounces at portal.open-bio.org 
> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke
> Sent: Tuesday, February 07, 2006 7:45 AM
> To: DAS/2
> Subject: Re: [DAS2] properties and queries
> 
> 
> To summarize, the current thought here for properties and 
> queries is as follows  (it's a long summary.  More like an essay.  :)
> 
> Add support for zero or more <NOTE> elements in the feature, 
> of the form
>    <NOTE>This is some arbitrary (but non-markup-ed) text</NOTE>
> 
> 
> Add a features search keyword "note=" which takes a search 
> string to be found in the note elements.  (substring? 
> soundex? regex? the search engine calls up Lincoln and asks?)
> 
> 
> Add support for zero or more <ALIAS> elements in the feature, 
> of the form
>    <ALIAS name="Zorro">
> 
> (I missed this in the redraft.  It should have been there. 
> Feature filter "name" already says it searches the "name" and 
> "alias" fields for a feature.)

Is the plan still as stated above, to have optional NOTE and ALIAS
elements in features?  I don't see these elements in the feature schema,
and the spec doc says they're built-in properties instead (values for
PROP key attribute that have defined meaning).

	Gregg
  

From td2 at sanger.ac.uk  Fri Feb 10 03:54:16 2006
From: td2 at sanger.ac.uk (Thomas Down)
Date: Fri, 10 Feb 2006 08:54:16 +0000
Subject: [DAS2] Refinements to range attribute and query filters in spec
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9D3@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9D3@msex02.affymetrix.com>
Message-ID: <4A9E3BE1-9E24-4D25-AAD1-1851F18857D0@sanger.ac.uk>


On 9 Feb 2006, at 23:18, Helt,Gregg wrote:

>
> In the latest spec, the format for range queries is
>       seqid/min:max:strand
> and the format for range attributes in feature elements is
>       min:max:strand
>
> In the earlier spec
> (http://biodas.org/documents/das2/das2_get.html#ranges) everything but
> the seqid component of the range query was optional.  Are min and max
> still optional, as in these examples from the previous version of the
> spec?
>     Chr1/1000     Chr1 beginning at position 1000 and going to the  
> end.
>     Chr1/:2000    Chr1 from the start to position 2000.
> I personally find these kind of ranges confusing and not particularly
> useful, and would rather make min and max required for both the range
> attribute and range-based query filters.

I think it's reasonable for a client to want to fetch all features  
attached to a given sequence ID.  This would certainly be sensible  
behaviour for clients which always work on reasonably short sequences  
(e.g. protein-specialized clients), but even genome-centric clients  
might want to do this when they've had a hint that a particular  
feature type is "low density" (e.g. chromosome banding patterns?).

I'm not sure if anyone would want to query a range where only one of  
min and max are specified.

> Also, the latest spec states:
>
> A region may be on the forward or reverse strand or on both strands.
> These are respectively denoted 1, -1 and 0.  The reverse strand is the
> reverse complement of the forward strand.  Unspecified strand means
> forward strand.
>
> So for a features query, are the four overlap filters below  
> equivalent?
> Chr1/1000:2000
> Chr1/1000:2000:1
> Chr1/1000:2000:-1
> Chr1/1000:2000:0
> Or does the addition of strand information further filter the returned
> features by strand?  But if that's the case, then according to the  
> spec
> having no strand specified means forward.  So that would mean
> overlaps="Chr1/1000:2000" would only return forward strand  
> annotations,
> and not any on the reverse strand?  To me that's counterintuitive,  
> from
> a filtering perspective I'd rather no strand info mean "both strands".
> My main point though is we need to be explicit about how strand  
> info or
> lack thereof affects features queries with range-based filters.

Hmmm, I'd been interpreting Chr1/1000:2000 as "return features on  
both strands", but from the paragraph you quote I guess this is  
wrong.  I'd be happy to see this changed to "Unspecified strand means  
both strands".

              Thomas.


From dalke at dalkescientific.com  Fri Feb 10 05:47:26 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 10:47:26 +0000
Subject: [DAS2] Refinements to range attribute and query filters in spec
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9D3@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9D3@msex02.affymetrix.com>
Message-ID: <c6739648961f07e66a22c6471b78211e@dalkescientific.com>

Gregg:
> In the latest spec, the format for range queries is
>       seqid/min:max:strand
> and the format for range attributes in feature elements is
>       min:max:strand


> I personally find these kind of ranges confusing and not particularly
> useful, and would rather make min and max required for both the range
> attribute and range-based query filters.

Agreed on this side.  All clients can easily get the upper limit,
and the lower limit is always 0.

> My main point though is we need to be explicit about how strand info or
> lack thereof affects features queries with range-based filters.

It was a confusion on my part.  There are three places which
refer to location + strand.

   1. specifying a feature location
   2. fetching a sequence
   3. doing a range search

"1. specifying a feature location"

We've been talking here about limiting the use of strands
for these.  Features definitely need a strand.  If the
strand is not specified then the feature is on both strands,
or strand has no meaning for it.  If needed, resolve the
ambiguity by looking at the type (or other property).  If you
really, really want to specify that it's on both strands then
use the 0.

The location element currently looks like this
   <LOC id="some_url_for_sequence"/>  <!-- on whole sequence -->
   <LOC id="some_url_for_sequence" range="300:500" />
   <LOC id="some_url_for_sequence" range="300:500:-1" />  <!-- on strand -->

Given the decision yesterday that segments are special,
in terms of identification, I propose using the short id,
so these look like, respectively

   <LOC segment="Chr1"/>
   <LOC segment="Chr1/300:500"/>
   <LOC segment="Chr1/300:500:-1"/>
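
A minimal sketch of how a client might pull apart the proposed short
form, assuming the layout is "segment/min:max[:strand]" with min and max
both required whenever a range is given.  The names SegmentRange and
parse_segment_range are hypothetical, and the sketch assumes segment
names do not themselves contain "/".

```python
from typing import NamedTuple, Optional


class SegmentRange(NamedTuple):
    segment: str
    start: Optional[int]   # None when the whole segment is meant
    end: Optional[int]
    strand: Optional[int]  # 1, -1, 0, or None when no strand was given


def parse_segment_range(text: str) -> SegmentRange:
    """Parse 'Chr1', 'Chr1/300:500' or 'Chr1/300:500:-1' (assumed layout)."""
    segment, sep, rng = text.partition("/")
    if not segment:
        raise ValueError("missing segment name in %r" % text)
    if not sep:
        # No range at all: the location is the whole segment.
        return SegmentRange(segment, None, None, None)
    parts = rng.split(":")
    if len(parts) not in (2, 3):
        raise ValueError("expected min:max or min:max:strand in %r" % text)
    start, end = int(parts[0]), int(parts[1])  # int('') raises ValueError
    if start > end:
        raise ValueError("min is greater than max in %r" % text)
    strand = None
    if len(parts) == 3:
        strand = int(parts[2])
        if strand not in (1, -1, 0):
            raise ValueError("strand must be 1, -1 or 0 in %r" % text)
    return SegmentRange(segment, start, end, strand)
```

Keeping "no strand given" (None) distinct from an explicit 0 leaves the
interpretation of a missing strand to the caller.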

"2. fetching a sequence"

Why does the server need to support a reverse complement feature?
Let's leave it out and make the client compute the reverse
complement itself if it needs it.
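
Doing this on the client costs only a few lines.  Note that a plain
string reversal is not enough: each base also has to be complemented.
A minimal sketch, assuming plain DNA letters plus N (the function name
is hypothetical and IUPAC ambiguity codes are not handled):

```python
# Translation table for the four bases and N, in both cases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a forward-strand DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]
```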

"3. doing a range search"

Is there any reason to specify the strandedness when doing
a feature query?

Discussion here seems to be "would be nice but that lack
is one of the things people have never complained about
in DAS1".

I propose removing strandedness from the features query.

If others disagree then here are two solutions:
   A. have a "strand=" parameter, so that the strandedness
is different from the ranges.  If you want a query for
  the union of range Chr1/A:B:-1 and range Chr1/X:Y:1
then tough - make two requests, one for each strand.

   B. ranges may specify the strand (as now) but if not
specified then it means "of any strand".

We worked on a few cases where it might be useful to
make mixed strand queries.  There weren't any compelling
reasons.  The worst case without strand support in the
features query is that you get on average twice the number
of features back, and the worst case for option A is the
need to make two queries.
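
If strand is dropped from the features query, a client that only wants
one strand can filter the response itself.  A rough sketch, assuming
features arrive as dicts with a "strand" entry of 1, -1 or 0 (that
representation and the function name are assumptions).  A query strand
of None is treated as "of any strand", as in option B:

```python
from typing import Iterable, List, Optional


def filter_by_strand(features: Iterable[dict],
                     query_strand: Optional[int] = None) -> List[dict]:
    """Keep features on query_strand; None means accept any strand."""
    if query_strand is None:
        return list(features)
    return [f for f in features if f["strand"] == query_strand]
```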


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Fri Feb 10 05:48:18 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 10:48:18 +0000
Subject: [DAS2] Re: feature NOTE and ALIAS elements?
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9D5@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9D5@msex02.affymetrix.com>
Message-ID: <bd2c309106b184d5a31d540afa353abc@dalkescientific.com>

Gregg:
> Is the plan still as stated above, to have optional NOTE and ALIAS
> elements in features?  I don't see these elements in the feature 
> schema,
> and the spec doc says they're built-in properties instead (values for
> PROP key attribute that have defined meaning).

Yes.  I haven't updated the spec other than a few minor
points in the last couple of days.


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Fri Feb 10 10:04:45 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 15:04:45 +0000
Subject: [DAS2] 'OR' syntax in query language
Message-ID: <8593bb5041e0d054840da98c200d3e03@dalkescientific.com>

We talked a bit about the DAS query language.

It is currently of the form (modulo URL escaping)

   name=Andrew,Roy;inside=Chr/100:200

This is the same as

(    name contains the substring "Andrew"
   OR name contains the substring "Roy"
) AND (
      feature is inside 100:200 on the segment named 'Chr'
)

That is, there is an AND of all terms, and a single term
may have multiple OR-ed subqueries, merged by commas.

We want to change this to the form

   name=Andrew;name=Roy;inside

That is, the query key can exist more than once.  Queries
with the same key are 'OR'ed, otherwise they are 'AND'ed.


The advantage is the simplicity of not having to worry
about another quoting rule, in this case how to search
for terms containing a ",".

The only disadvantage is with server libraries which don't
handle multiple keys in a query - but we think those
libraries are long since deceased.
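
A sketch of the proposed semantics, assuming terms are separated by ";"
and that a full query repeats a key for each alternative, e.g.
"name=Andrew;name=Roy;inside=Chr/100:200".  The helper names and the
predicate table are hypothetical; how each key is matched is up to the
server.

```python
from typing import Callable, Dict, List
from urllib.parse import unquote


def parse_query(query: str) -> Dict[str, List[str]]:
    """Group the values of a ';'-separated query string by key."""
    terms: Dict[str, List[str]] = {}
    for part in query.split(";"):
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError("term without '=': %r" % part)
        # Values are URL-unescaped, so a literal comma needs no extra rule.
        terms.setdefault(unquote(key), []).append(unquote(value))
    return terms


def matches(feature: dict,
            terms: Dict[str, List[str]],
            predicates: Dict[str, Callable[[dict, str], bool]]) -> bool:
    """AND across different keys, OR across values of the same key."""
    return all(
        any(predicates[key](feature, value) for value in values)
        for key, values in terms.items()
    )
```

For example, with a "name" predicate that tests for a substring, the
query "name=Andrew;name=Roy" accepts any feature whose name contains
either string.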

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Fri Feb 10 10:15:05 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 15:15:05 +0000
Subject: [DAS2] range searches
Message-ID: <80684d437a99822fd017cceee83b02b4@dalkescientific.com>

I think Gregg has thought the most about this one.

We have 4 classes of range search:

'inside' (feature completely inside request range)
'overlaps' (feature overlaps the request range)
'contains' (feature completely contains request range)
'identical' (feature is exactly the request range)

They exist for smart clients which want to limit the
region request size based on previously fetched knowledge.

Example: client is viewing "500:600" and zooms out to
"400:700".  In that case the client could ask for
features which
   overlap 400:500 OR overlap 600:700
   excluding those which overlap 500:600.

If that's the case, the selection language isn't powerful
enough.  There's no way to choose "excluding".

The other option is to issue only the overlap queries.

Does the query language need to be more powerful to
allow "excluding what I know about these regions" for
people like Gregg?

Another question came up; are queries like

   overlap 400:500 OR inside 900:1000

useful?  I don't think so.  If it is, it is not supported
by the current language which only does AND of dissimilar
terms.
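
For reference, the four range tests and the zoom-out example can be
sketched as below.  This assumes half-open (start, end) integer ranges,
which is my assumption here and not something the spec text above
states.  Since the query language has no "excluding", the exclusion is
done on the client after fetching.

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # (start, end), assumed half-open: start <= pos < end


def inside(feature: Range, query: Range) -> bool:
    """Feature lies completely inside the request range."""
    return query[0] <= feature[0] and feature[1] <= query[1]


def overlaps(feature: Range, query: Range) -> bool:
    """Feature shares at least one position with the request range."""
    return feature[0] < query[1] and query[0] < feature[1]


def contains(feature: Range, query: Range) -> bool:
    """Feature completely contains the request range."""
    return feature[0] <= query[0] and query[1] <= feature[1]


def identical(feature: Range, query: Range) -> bool:
    """Feature is exactly the request range."""
    return feature == query


def newly_visible(features: Iterable[Range],
                  old_view: Range, new_view: Range) -> List[Range]:
    """Features that overlap the new view but were not in the old one."""
    return [f for f in features
            if overlaps(f, new_view) and not overlaps(f, old_view)]
```

For the zoom from 500:600 to 400:700, newly_visible() keeps what the two
overlap queries return and drops anything the client already holds.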


					Andrew
					dalke at dalkescientific.com


From ap3 at sanger.ac.uk  Fri Feb 10 10:21:25 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Fri, 10 Feb 2006 15:21:25 +0000
Subject: [DAS2] registry status
Message-ID: <2fa320fbca91abfa9f175b64d0d8105c@sanger.ac.uk>

Hi!

the developmental registry has been updated:
it now supports 2 requests:

http://www.spice-3d.org/dasregistry/das2/sources
lists das2 servers

http://www.spice-3d.org/dasregistry/das1/sources
lists das1 servers.

The next step will be to provide user upload of das2 sources

Andreas


-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
			 +44 (0) 1223 49 6891


From dalke at dalkescientific.com  Fri Feb 10 10:49:10 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 15:49:10 +0000
Subject: [DAS2] curation history and splits&merges
Message-ID: <4712c640d908a4e9aed099dd1ef1398b@dalkescientific.com>

We talked some on tracking curation history.

We decided it was a hard topic and we would defer further
discussion to the next sprint.  We're getting rather
frazzled here after nearly 5 days of hard work.

Here are some things that came up.

The writeback delta needs a field for user comments.

How persistent is an identifier for an object?  Is
it for the exact version of a feature or is it for
the concept of the given feature?

That is, if there's a feature change the server could
assign it a new id/url.  It would need to tell the
annotation about the new id, just like it tells the
client about the newly created ids.

This makes updates more like a changeset version control
system, where there is a version number for each stable
data set.  Compare to CVS where there is a version number
for each file/record but not for the whole system.

But the current Otter database is more the CVS route.
While the changeset version seems nicer, there will
be some (I assume non-trivial) work to make Otter support
it.

There are advantages.  You could do searches with
timewarps by using a "changeset=" parameter in the
query.  The DAS mechanism handles that just fine,
since interlinks between no-longer current URLs would
be correct.

There needs to be a way to get the history of an
element. There are two thoughts:

   - put the curation history in the feature document (via
some embedded XML)

   - link to a URL which provides the curational history
document for the given element

We prefer the latter.


For splits and merges there needs to be support in
the delta to say if there is a relationship to existing
or about to be deleted features.  We did not work on
that, other than to get a feel that it works.

Again, no server handles this so we decided to table it
for the future, and work on it more for the next sprint.

					Andrew
					dalke at dalkescientific.com


From Gregg_Helt at affymetrix.com  Fri Feb 10 11:36:49 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Fri, 10 Feb 2006 08:36:49 -0800
Subject: [DAS2] IGB DAS/2 client partially working -- and using registry!
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9D7@msex02.affymetrix.com>


Attached is a screenshot of IGB with data from a yeast test region
(chrVII, ~364-366kb) loaded from Allen's codesprint server by way of
Andreas' DAS/2 registry.  Still need to work on synchronizing up source
names, etc., but this is looking good.  As we had planned, having the
registry return a sources document allowed very easy integration!  

You may notice there is also a branch of the sources tree that is a
direct path to the codesprint server.  That just means I gave the
discovery engine two URLs to start from -- the registry and the
codesprint server.

This is the same version of IGB as the current head of the CVS
repository (as of today 8:30 AM PST).  I'm tempted to roll up a jar so
people can try it without having to compile the source, but on the other
hand it's pretty fragile right now, and the image conveys the gist of
it.

	gregg

-------------- next part --------------
A non-text attachment was scrubbed...
Name: DAS2_in_IGB.JPG
Type: image/jpeg
Size: 170143 bytes
Desc: DAS2_in_IGB.JPG
URL: <http://lists.open-bio.org/pipermail/das2/attachments/20060210/6d528b58/attachment.jpe>

From Gregg_Helt at affymetrix.com  Fri Feb 10 12:01:11 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Fri, 10 Feb 2006 09:01:11 -0800
Subject: [DAS2] Proposed agenda for DAS/2 Code Sprint teleconference, Feb 10
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9D9@msex02.affymetrix.com>

Properties
Range-based queries
Status reports - summarize overall progress during code sprint
Discuss next code sprint - goals, etc. 
???


From dalke at dalkescientific.com  Fri Feb 10 13:14:47 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 18:14:47 +0000
Subject: [DAS2] changes commited
Message-ID: <6425fabe79dc6d27fd3a797b837d32de@dalkescientific.com>

removed the <PROP> href= and type= options in
the spec and all examples.

changed the url "," syntax for OR'ed terms into
multiple "key=value;key=value" terms.

changed "att=key:value" into "prop-key=value"


					Andrew
					dalke at dalkescientific.com


From suzi at fruitfly.org  Fri Feb 10 14:48:58 2006
From: suzi at fruitfly.org (Suzanna Lewis)
Date: Fri, 10 Feb 2006 11:48:58 -0800
Subject: [DAS2] question on properties
In-Reply-To: <4712c640d908a4e9aed099dd1ef1398b@dalkescientific.com>
References: <4712c640d908a4e9aed099dd1ef1398b@dalkescientific.com>
Message-ID: <26b9a50a02deb2be55150c6a5a47d419@fruitfly.org>

You probably know the answer to this Andrew.

One of the cases we encountered was unique properties vs cumulative 
properties.

For a simplistic (i.e. don't quibble too closely, I'm just trying to 
explain) example pretend that "ssn" and "comment" are both properties.

On the client side the appropriate behavior for these is different if 
the data coming over from the server contains >1 prop element with that 
tag.

If the client sees "ssn" twice it winces and then either ignores or 
overwrites with the 2nd value.

If the client sees "comment" twice then it appends the additional 
comment.

Question: Is this kind of information included in the spec? Uniqueness 
vs. cumulative
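
To make the two behaviours concrete, here is a rough client-side sketch.
The set of unique keys, the function name and the on_duplicate policy
names are all hypothetical; nothing in the spec defines them.

```python
from typing import Dict, Iterable, List, Set, Tuple, Union

PropValue = Union[str, List[str]]


def collect_props(pairs: Iterable[Tuple[str, str]],
                  unique_keys: Set[str] = frozenset({"ssn"}),
                  on_duplicate: str = "error") -> Dict[str, PropValue]:
    """Merge (key, value) pairs; unique keys hold one value, others a list.

    on_duplicate controls a repeated unique key:
    'error' raises, 'keep_first' ignores it, 'overwrite' takes the new one.
    """
    if on_duplicate not in ("error", "keep_first", "overwrite"):
        raise ValueError("unknown policy %r" % on_duplicate)
    props: Dict[str, PropValue] = {}
    for key, value in pairs:
        if key not in unique_keys:
            # Cumulative property: append every value in order.
            props.setdefault(key, []).append(value)
        elif key not in props or on_duplicate == "overwrite":
            props[key] = value
        elif on_duplicate == "error":
            raise ValueError("duplicate unique property %r" % key)
        # 'keep_first': silently ignore the later value.
    return props
```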

  
From Steve_Chervitz at affymetrix.com  Fri Feb 10 17:10:28 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Fri, 10 Feb 2006 14:10:28 -0800
Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint,
 10 Feb 2006
Message-ID: <C0124F54.1BEF6%Steve_Chervitz@affymetrix.com>

Notes from the DAS/2 teleconference for the code sprint, 10 Feb 2006

$Id: das2-teleconf-2006-02-10.txt,v 1.1 2006/02/10 22:13:17 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed E., Gregg Helt
  CSHL: Lincoln Stein
  Sanger: Thomas Down, Andreas Prlic
  Sweden: Andrew Dalke
  UCLA: Allen Day
        
Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


[note taker missed the first 5 minutes]

Topic: Properties
-----------------

gh: Properties are all tag-value
ad: yes
gh: don't think we need your binary thing.
ad: ok drop it
gh: href is needed. can always point it to a binary something out there.
can the value just be a url?
ad: can make it relative to xml base
gh: do you need some property with tag value and href at same time?
ls: how would you interpret that? should be either value or href.
ad: there's nothing to say how to interpret the url.
gh: nice to have multiple links out to somewhere else and to have some
indication what they are w/out traversing the link. e.g., this is the
genbank ref, ensembl ref, protein, etc.
if xid had an extra field with label, title e.g. that would suffice.
ad: sounds ok

[A] xids will have title + href, properties will have tag + value

Topic: Exercising the spec
---------------------------

gh: we need the reference server to actually exercise this part of the
spec. xid. possibly other things like: target overlap, inside, cigar
strings. encoding, decoding.
aday: oh no. 
ls: line element. cigar string is something that no one has tested yet.
gh: if we don't have server doing it by next code sprint
aday: any impls out there we could use?
gh: bioperl has a gff3 parser.
aday: I wrote it, and I didn't impl cigar string parsing.
ls: there's a cigar processor in bioperl AlignIO. in theory not hard
to do. 
gh: lbl folks (Nomi et al) have a java one, too. I think.
gh: other parts of spec that aren't getting exercised? I doubt if
anyone has used xml lang.
ad: added xml id. just there for other reasons, but not what we need
it for.
gh: we talked about all ids being xml ids and combining xml id and xml
base, can't remember why we stopped discussing.
ad: don't think we need to. style sheet has uses for this maybe.
ad: has anyone generated doc href yet?
td: can add this stuff easily now.
gh: for testing purposes, just throw a doc href everywhere it's
allowed.
ad: are servers supporting retrieval of seq data?
aday: yes
ad: support for alt feature formats?
aday: can do old compact formats, not sure about coverage.
gh: yes, alt feat formats are handled, but server isn't up and running
yet. igb das/2 client can handle it already.
ad: retrieval of assembly?
aday: no assembly data
ad: i don't touch assembly
gh: may be for next code sprint.


Topic: range based query
------------------------

gh: thomas and i don't like optional mins and maxes.
ls: fine as long as you can always determine the size of the
reference. provide beginning and end.
gh: exception: if you want the whole sequence, can you just not supply
range?
ad: yes
gh: :1 and :-1 how to interpret nothing for strand on end and 0 for
strand at end?
ls: features that have strand +1, -1, features that have no strand or
on both strands (0) features that may have a strand but you don't know
(empty)
gh: when you put it in the query there's a difference between i don't
know and i will accept anything.
use case: transfrags from transcriptome project. unknown strand, but I
know it *is* one or the other strand.
ls: how about this arrangement:
 empty = i don't care
    0  = has strand but i dont know
    1  = forward strand
   -1  = reverse strand
    2  = both strands
ad: could be organized by track (everything in a track has same strand).
gh: don't think it is good to structure a query so it's required that
you do have strand. you might have diff strand designations on the same
track. 
ls: you want to be able to distinguish things that are on both
strands, things that are on either strand, but you don't know which.
gh: biggest concern: given a range based query to server
1000-2000 means everything that overlaps, any strandedness within this
range.
ad: should support stranded searches. client can filter out
opposed to do a strand request against seq to get the rev comp. client
should be able to do this.
gh: in range attrib of features, you can add colon to indicate
strandedness.
ad: yes
gh: if no :strand does this mean unknown or don't care?
ls: defaults to *, anything. you get fwd, rev, don't know, don't care.
gh: require things on fwd strand to be :1, not make it a default.
ad: ok. if not there, means ambiguous, unknown, or not
appropriate. see email i sent.
if you get rid of search for strand in region query, most of this
issue goes away.
gh: don't think people would use this often (stranded query)
ad: you can make two queries to server instead of one.
gh: this is a resolution for all range-related issues.
ad: check my email to make sure it covers this.

[A] everyone review andrew's email re: range queries and strand issues.

gh: also or-ing of diff range-based queries is not useful for me.
I mainly need intersects of overlaps and inside. or-ing is equivalent
to using multiple queries.
td: why do you need and-ing of overlaps and inside?
gh: optimization on client side. keeps track of what it has
received. wants to minimize re-fetching.
td: can you just use overlap and not overlap?
gh: that may be equivalent, but the way I do it, you can guarantee you
never get the same feat twice with that combo. will require and-ing of
two range-based queries.

ad: modifying query lang, or-ing together two. include first range and
include second range should use multiple query keys because of the
comma. you will have to escape any comma if it's inside of query
string. 
gh: don't like the implicit 'and' if different but 'or' if keys the
same. it depends on the query.
ad: now all queries are and-ed, but commas mean multiple.
ls: comma syntax seems natural. the occasional query that had to have
an escaped comma didn't cause any bother.
td: this was as it is in das/1. exons and repeat. type=exon,
type=repeat. so the suggestion is to use the das/1 behavior.
ad: three independent segments
gh: types as well. can have any number of types= and segment= all
or-ed together. I still need anding of overlaps and inside.
td: different keys are and-ed, same keys are or-ed.
ls: hoisted by my own petard here. works for me.
gh: allen?
aday: what's changed?
ls: the whole query language has changed in a fundamental way.
aday: dealing with multiple attributes with same name. fine.
gh: will server accept full urls for types?
aday: not now but will impl this.
gh: all types should be full uri's now. my client can't deal but will
soon.

Topic: status reports
---------------------
gh: state what you hoped to accomplish and what you actually
accomplished. 

gh: hoped to get igb das client up to date with spec, working with one
das2 server, and get affy das2 server up and going.
affy das2 server will take longer. maybe by next code sprint.
igb is now using latest das2 spec, calling allen's server, and using
registry as well. happy with results. not everything done, but some
unexpected things (registry).
wrote up progress report for grant: going out 3pm today (we got
another day) a 2pg summary. will send out to everyone later.
todo: get das2 server up. client: deal with full uri issue. this is a
basic functionality of the client. smart handling of uris.

ee: igb client. big thing is to make all data sources behave in a
similar way (das1/das2, quick load, separate files), regardless
of the data format. want to make it all seamless. going well.

sc: streamlined pipeline for populating das server with affy exon array
data. didn't get to pipeline for external data (UCSC tracks), but have
basic framework in place.

ad: decided to do more writeback at next sprint. when is next sprint?
gh: march 13-17. lincoln will be in UK and can participate from there.
ad: I'm in the states next week. will come to emeryville for next
sprint.

[A] next code sprint is 13-17 March. Mark your calendars.

ad: hoped to work on spec, resolve detailed questions, make sure it works
with people's needs. will work on incorporating latest ideas into spec.
validator: have one but is not fit for public consumption. not at
where it was last summer on the previous version of spec.

ap: das interface for registry, can serve das1 and das2 sources w/ new
source command. java client - not yet. registry: todo UI so users can
upload to das registry.

td: hoping to write server. got something up for feat, types,
segments, need to run through andrew's validator. hope to work on
writeback, but didn't happen (but good discussion on it). want to get
more data included, ensembl database.
roy has been working on zmap client, coming along fine.

aday: primary goals: to support new version of spec -- not fully done
(uri problem in query parsing). apache config integration is
done. installation and rpm for server - done for FC3 i386, available
in the next couple of days (brian o'connor). general documentation
improvement in code for server - not done.
Next step: post, put, delete, writeback framework (originally planned
this but may need to rethink),  impl transaction logs (maybe in
flux). adding more unit tests.
ad: writeback spec won't happen for at least 2 weeks. need to write up
what we've done on current spec first.

ls: will be available from 14th on. at ensembl meeting up to the 13th.
gh: allen come to emeryville?
aday: maybe.
gh: will have to explore how to fund hosting folks here for next
codesprint. 

gh: speaking for nomi - she had apollo working for parsing features
and displaying them. some issues with higher level integration into
apollo. making good progress.

gh: time to wrap it up. thanks for your hard work.
[applause]

[A] next teleconf will be on 20 Feb, 9:30 PST 5:30 UK (regular time)
we're skipping 13 feb (next monday) given all our time this week.


From dalke at dalkescientific.com  Fri Feb 10 21:11:05 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat, 11 Feb 2006 02:11:05 +0000
Subject: [DAS2] Re: question on properties
In-Reply-To: <26b9a50a02deb2be55150c6a5a47d419@fruitfly.org>
References: <4712c640d908a4e9aed099dd1ef1398b@dalkescientific.com>
	<26b9a50a02deb2be55150c6a5a47d419@fruitfly.org>
Message-ID: <c9f971ac2d043705eb5ff56e6051217c@dalkescientific.com>

Suzi:
> On the client side the appropriate behavior for these is different if 
> the data coming over from the server contains >1 prop element with 
> that tag.
>
> If the client sees "ssn" twice it winces and then either ignores or 
> overwrites with the 2nd value.

Or it says "error, error, cannot compute" and stops.  From one
of the guidelines ("the zen") of Python: "In the face of ambiguity,
refuse the temptation to guess."

> If the client sees "comment" twice then it appends the additional 
> comment.
>
> Question: Is this kind of information included in the spec? Uniqueness 
> vs. cumulative

Here are my thoughts.

We have several points for client/server extensions.
One is this property table, which is a set of key/value
strings.

Because they are strings you can use them for almost anything,
with the correct interpretation by the client and server.
That requires collusion between the two.

This is the extension point which is most familiar to everyone.
But it's open to the problem you pointed out.

The other is this non-DAS extension XML, which lets the
server add *anything*.  If the client doesn't know what the
field does it must ignore it.  If it does writeback with
that feature it must include the ignored element, and not
make any changes.

That means your server can add

<suzi:ssn xmlns:suzi="mailto:suzi at fruitfly.org">123-45-1534</suzi:ssn>

If the client doesn't know what to do, it ignores it.
It will never change the field.

If the client knows what that field does it must follow the
constraints set down for it, else the server should stop
with an error and not allow the update to occur.
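
A small sketch of a client that edits a property and leaves an
extension element it does not understand untouched.  The FEATURE
wrapper, the namespace URI and the function name are assumptions made
for illustration; only the PROP key/value form comes from this thread.

```python
import xml.etree.ElementTree as ET

SUZI_NS = "http://example.org/suzi"  # hypothetical extension namespace


def update_prop(feature_xml: str, key: str, value: str) -> str:
    """Set PROP[key] to value, passing unknown child elements through."""
    # Keep a readable prefix on output; the content is preserved either way.
    ET.register_namespace("suzi", SUZI_NS)
    root = ET.fromstring(feature_xml)
    for prop in root.iter("PROP"):
        if prop.get("key") == key:
            prop.set("value", value)
            break
    else:
        # No existing PROP with this key: add one.
        ET.SubElement(root, "PROP", {"key": key, "value": value})
    # Elements the client does not know (e.g. suzi:ssn) were never
    # touched, so they are serialized back unchanged.
    return ET.tostring(root, encoding="unicode")
```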

There are two downsides to this approach.  There's no
way for a dumb client to understand that field, so no user
will ever see it, and there's no way to do a search on
that field.

(A server can extend the search syntax and tell the client
about the new syntax, but a dumb client doesn't know about
that.)

If there is need to support the dumb client then the
only way to support the data type constraints is in
the server.  It must check a given field and possibly
stop with an error or resolve ambiguities.  We can have
that the server reports an error message that the client
and/or user can use to figure out what's wrong.

Thinking about it a bit, it's possible to combine these
two.  For example, a server can have

   <PROP key="ssn" value="123-45-1534" />

then list as an extension

   <suzi:says-the-ssn-in-special/>

All this latter XML does is flag sufficiently aware clients
that the server implements the special SSN requirements.

A dumb client can ignore the flag, users add a new SSN,
and the server bails out, while the smart client early
on knows that that isn't going to be allowed.

This hybrid solution doesn't seem right to me though.

I currently (and without any experience) prefer putting
schema constrained fields in as extension elements.
Think of the property table as something exposed to the
user as a completely editable table, with no ability to
limit what that person does.

For the case of the SSN that might be overkill.  For
other things, like the current stage of a feature in
the curational process, it's best to put that data
there and not in the generic property table.

There is a long history of using generic key/value
tables as an ad-hoc way to extend a protocol.  I'm
trying to improve on that by defining a way for a
server to add well-structured, schema-dependent and
searchable data (for smart clients) without needing
to piggy back on a bunch of strings.


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Mon Feb 20 10:31:42 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 20 Feb 2006 08:31:42 -0700
Subject: [DAS2] today's conf. call and President's Day
Message-ID: <ec529ed5e113de48c2dfd64380c3ce4d@dalkescientific.com>

Today is President's Day in the US.

Are the other US people working today?

					Andrew
					dalke at dalkescientific.com


From Gregg_Helt at affymetrix.com  Mon Feb 20 11:47:13 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 20 Feb 2006 08:47:13 -0800
Subject: [DAS2] today's conf. call and President's Day
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9EC@msex02.affymetrix.com>

It's a day off for Affymetrix, but I'm working anyway.  Unless there are
major objections I'd like to go ahead and do the conference call at the
standard time (9:30 AM Pacific time).  There may be a few less people
joining in from the US.

	thanks,
	gregg



From lstein at cshl.edu  Mon Feb 20 12:37:06 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 20 Feb 2006 12:37:06 -0500
Subject: [DAS2] today's conf. call and President's Day
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9EC@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9EC@msex02.affymetrix.com>
Message-ID: <200602201237.06497.lstein@cshl.edu>

Hi,

I've dialed in and all I'm getting is hold music. Could you confirm this info?

 800 531-3250
 287-9055

Thanks!

Lincoln


-- 
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)


From lstein at cshl.edu  Mon Feb 20 11:50:38 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 20 Feb 2006 11:50:38 -0500
Subject: [DAS2] today's conf. call and President's Day
In-Reply-To: <ec529ed5e113de48c2dfd64380c3ce4d@dalkescientific.com>
References: <ec529ed5e113de48c2dfd64380c3ce4d@dalkescientific.com>
Message-ID: <200602201150.38431.lstein@cshl.edu>

I am working today!

Lincoln


-- 
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)


From dalke at dalkescientific.com  Mon Feb 20 12:28:56 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 20 Feb 2006 10:28:56 -0700
Subject: [DAS2] today's conf. call and President's Day
In-Reply-To: <E07FE3FF-7EA2-4BDC-8FBA-2992A5CBBEDE@sanger.ac.uk>
References: <ec529ed5e113de48c2dfd64380c3ce4d@dalkescientific.com>
	<E07FE3FF-7EA2-4BDC-8FBA-2992A5CBBEDE@sanger.ac.uk>
Message-ID: <db7e92f877d56f0b329931710318f0cc@dalkescientific.com>

Thomas Down wrote:
> Well, I can't speak for US people, but I do know that Andreas Prlic is 
> on holiday today and I presume won't be joining the conference call.  
> I can join if there's anything that needs discussing urgently, but 
> otherwise I'd be happy to leave it 'til next week.

Status update for me:

   Last week was a break for me from the sprint - I was winded.  I
worked a bit here and there on how to do a GUI for the validation.
I hope to get a demo page of the results up within a day or so.

   This week I'll be working on that and a new draft of the spec.

   Also, I'm now back home in Santa Fe, where we haven't had rain or
snow for 100 days - my cacti are drooping!  :(


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Mon Feb 27 09:50:10 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 27 Feb 2006 08:50:10 -0600
Subject: [DAS2] will miss today's conf. call
Message-ID: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>

Hi all,

   Not only am I on the road back from the Python conference but
my cell phone battery is nearing dead so I won't be able to make
it to today's phone conference call.

   Here's my status.  I've been working on the validator, to
the detriment of the next spec rewrite.

   This validator does single-document checks.  That is, it
does not do internal integrity checks to make sure that
the results of, say, a range query only include features in
that range, or that the features are in the range given by
the segments.
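
A rough sketch of the kind of integrity check described above as out
of scope for the validator.  The Feature record, its field names and
the closed-interval overlap rule are all assumptions made for
illustration; they are not taken from the spec or from the validator.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    # Hypothetical record; field names are illustrative only.
    feature_id: str
    segment: str
    start: int  # assumed inclusive
    end: int    # assumed inclusive


def features_outside_range(features, segment, start, end):
    """Return the features that do not overlap [start, end] on `segment`."""
    bad = []
    for f in features:
        overlaps = f.segment == segment and f.start <= end and f.end >= start
        if not overlaps:
            bad.append(f)
    return bad


if __name__ == "__main__":
    returned = [
        Feature("f1", "chr1", 100, 200),
        Feature("f2", "chr1", 500, 600),  # same segment, outside the range
        Feature("f3", "chr2", 100, 200),  # wrong segment
    ]
    for f in features_outside_range(returned, "chr1", 150, 300):
        print("unexpected feature in range query result:", f.feature_id)
```

A check like this needs the query parameters as well as the response,
which is why it cannot be done from a single document alone.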

   I plugged the results into a web server running on my
laptop.  It's using some new Python libraries which are
not yet installed on the OBF machine, but which I can install
after I get back to Santa Fe.  The GUI is similar to what
I threw together at Sanger during the Sprint - enter a URL
and a document type, view the results.

   What took the longest was the code to pin down where the errors
happened, for example, to show which attribute was the
extra attribute in an element.
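
One way to pin an extra attribute to a position in the document,
sketched with Python's standard expat bindings.  This is not the
validator's code; the ALLOWED table and the element names in it are
made up for the example, and a real check would take them from the
RNC schema.

```python
import xml.parsers.expat

# Hypothetical table of permitted attributes per element name.
ALLOWED = {
    "FEATURES": set(),
    "FEATURE": {"id", "type"},
}


def find_extra_attributes(xml_text, allowed=ALLOWED):
    """Return (line, element, attribute) for each attribute not in `allowed`."""
    errors = []
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        # Elements missing from the table are treated as allowing no attributes.
        extra = sorted(set(attrs) - allowed.get(name, set()))
        for attr in extra:
            # CurrentLineNumber is the line where the current start tag begins.
            errors.append((parser.CurrentLineNumber, name, attr))

    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return errors


if __name__ == "__main__":
    doc = '<FEATURES>\n  <FEATURE id="a" colour="red"/>\n</FEATURES>'
    for line, element, attr in find_extra_attributes(doc):
        print(f"line {line}: unexpected attribute {attr!r} on <{element}>")
```

The line number only locates the start tag, not the attribute itself;
finer positions would need extra bookkeeping.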

  I've attached sample output for your viewing pleasure.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/das2/attachments/20060227/026be2fd/attachment.html>
-------------- next part --------------


There is enough there for a Javascript jockey to make a
neat little interactive viewer, e.g., click on the error
message to be shown where it occurs in the document.
Also, the marker I'm using to show where the error occurs
in the body of the text needs work - the method I use
isn't very portable across platforms.

I think the next steps for me are:
   - get the validator working as-is on the OBF web site
       (should be on-line by tomorrow)
   - get back to writing the 3rd draft of the spec.

					Andrew
					dalke at dalkescientific.com

From ap3 at sanger.ac.uk  Mon Feb 27 12:41:08 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Mon, 27 Feb 2006 17:41:08 +0000
Subject: [DAS2] will miss today's conf. call
In-Reply-To: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>
References: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>
Message-ID: <197aeffa03988a8fc098f27926ee511d@sanger.ac.uk>

any conference call today?
- listening to the hold music

Andreas

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
			 +44 (0) 1223 49 6891


From nomi at fruitfly.org  Mon Feb 27 12:43:00 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Mon, 27 Feb 2006 09:43:00 -0800
Subject: [DAS2] will miss today's conf. call
In-Reply-To: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>
References: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>
Message-ID: <17411.14884.410370.608675@spongecake.lbl.gov>

are we having a teleconference today?  i got bored of waiting on hold for
the moderator.  someone email me if it's happening.

the validator sounds useful!

    Nomi


From boconnor at ucla.edu  Mon Feb 27 19:46:02 2006
From: boconnor at ucla.edu (Brian O'Connor)
Date: Mon, 27 Feb 2006 16:46:02 -0800
Subject: [DAS2] DAS2 Reference Server @ UCLA
Message-ID: <44039D4A.5000503@ucla.edu>

Hi,

If anyone is using the DAS/2 server at UCLA (das.biopackages.net), there 
will be some maintenance on the server later today (after 5pm Pacific).  
This won't affect the DAS/2 codebase; I'm just moving around some of our 
other production websites and there will be some downtime.  The outage 
should last only a few minutes.

--Brian


From ap3 at sanger.ac.uk  Wed Feb  1 12:42:16 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Wed, 1 Feb 2006 12:42:16 +0000
Subject: [DAS2] code sprint final infos
Message-ID: <11a8e78c4d35855959fa41eb23a4039d@sanger.ac.uk>

Hi!

This is to provide final organisational information about the DAS 2 code 
sprint next week.

- We start Monday 10:00 (Sanger time) in the Morgan building -
   meeting point is the small meeting room next to the kitchen 1st floor 
(we get a better room later).

- The Sanger guest wireless network supports Skype, so instant 
messaging and voice-over-IP calls will be possible the whole time.

- every day at 17:00 (Sanger time = 9:00 pacific time) there will be a 
conference call on the usual DAS2 line

Greetings,
Andreas


-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
			 +44 (0) 1223 49 6891


From allenday at ucla.edu  Wed Feb  1 22:42:26 2006
From: allenday at ucla.edu (Allen Day)
Date: Wed, 1 Feb 2006 14:42:26 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
Message-ID: <Pine.LNX.4.58.0602011439140.1651@sumo.ctrl.ucla.edu>

I just looked over your changes, and will begin making the changes to the
server repository today.

I'd like to update the server at das.biopackages.net with my changes on
Friday, unless there are objections.

I'll be taking notes along the way and will post to the list if anything
in your document is unclear to me.

At first glance, I agree -- the changes are minor.

-Allen


On Mon, 30 Jan 2006, Andrew Dalke wrote:

> Allen:
> > Is the spec going to be in a stable state for the code sprint?  I'd 
> > like
> > to use this time to sync the server implementation with a stable 
> > version
> > of the spec.  It looks like there have been many substantial changes.
> 
> I have just (within the last few minutes) completed the first draft
> of the update of the spec.
> 
> It's not in HTML - that calls for too much work for this stage.
> It's text, in CVS under das/das2/new_spec.txt
> 
> There are many parts which need clarification.  These are marked
> with a "XXX" along with my comments.
> 
> The RNC files are in
> 
>    das/das2/scratch/*.rnc
> along with some test XML files.  These XML files are not meant
> to be realistic.  They are meant more to check edge cases.
> 
> I do not think there are major changes to the spec.  Most of the
> changes have actually trimmed things down, like getting rid of
> the "properties" subtree and merging the different "sources" requests
> into a single document.
> 
> 
> Here are the major interfaces
> 
> $PREFIX/sequence - a "sources" request
>    This is the top-level entry point to a DAS 2 server.  It returns a
>    list of the available genomic sequence and their versions.
>    [sequence-namespace]
> 
> $PREFIX/sequence/$SOURCE - a "source" request
>    Returns the available versions of the given genomic sequence.
> 
> $PREFIX/sequence/$SOURCE/$VERSION - a "versioned source" request
>    Returns information about a given version of a genomic sequence.
>    Clients may assume that the sequence and assembly are constant for a
>    given version of a source. Note that annotation data on a server
>    with curational write-back support may change without changing the
>    version.
> 
> 
> For a given version here are the sub-parts.  Note that I've gone ahead
> and split the query urls (segment, features and types each have query
> interfaces) from the base directory used as containers for the segments,
> features and types.
> 
>   $VERSION/segments - the segments query URL; summarizes the top-level
>      segments in the data source
> 
>   $VERSION/segment/$SEGMENT_ID - a segment query; used to get detailed
>      information about the identified segment
> 
>   $VERSION/features - the feature filter query URL.  Features are
>     locatable annotations or experimental results.  The feature filter
>     URL supports query parameters to select a subset of the features
>     based on position, feature type and other properties.
> 
>   $VERSION/feature/$FEATURE_ID - a feature query; used to get detailed
>      information about the identified feature
> 
>   $VERSION/types - the types query URL which returns a list of all
>     feature types.  Feature types include ontology and depiction
>     details for all features of the given type.
> 
>   $VERSION/type/$TYPE_ID - details about the specified feature type
> 
> Oh, and there are internal conflicts which will be straightened
> out in the next draft.  These shouldn't be big.
> 
> 					Andrew
> 					dalke at dalkescientific.com
> 
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2
> 
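
The URL layout quoted above can be sketched as a small helper.  This
only mirrors the draft layout as described in the quoted message; the
function names, the dictionary keys, the example host and the
source/version values are made up for illustration, and the draft was
still expected to change.

```python
from urllib.parse import quote


def das2_urls(prefix, source, version):
    """Build the request URLs named in the draft for one versioned source."""
    prefix = prefix.rstrip("/")
    source_url = f"{prefix}/sequence/{quote(source, safe='')}"
    version_url = f"{source_url}/{quote(version, safe='')}"
    return {
        "sources": f"{prefix}/sequence",        # top-level entry point
        "source": source_url,                   # versions of one sequence
        "versioned_source": version_url,        # one version of a sequence
        "segments": f"{version_url}/segments",  # segments query URL
        "features": f"{version_url}/features",  # feature filter query URL
        "types": f"{version_url}/types",        # types query URL
    }


def entity_url(version_url, kind, entity_id):
    """URL for a single segment, feature or type under a versioned source."""
    if kind not in ("segment", "feature", "type"):
        raise ValueError(f"unknown kind: {kind!r}")
    return f"{version_url}/{kind}/{quote(entity_id, safe='')}"


if __name__ == "__main__":
    urls = das2_urls("http://example.org/das2", "human", "17")
    print(urls["features"])
    print(entity_url(urls["versioned_source"], "segment", "chr1"))
```

Note how the plural names (segments, features, types) are the query
URLs while the singular names are containers for individual records,
which is the split described in the quoted message.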


From Gregg_Helt at affymetrix.com  Wed Feb  1 23:14:30 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 1 Feb 2006 15:14:30 -0800
Subject: [DAS2] Re: Apollo and DAS/2 priorities
Message-ID: <C71929195D04BF48BAECD499AF717B480198C993@msex02.affymetrix.com>


That would be great if you could update the biopackages server before
the code sprint starts!  Then client implementers will have a server to
test with.

	thanks,
	gregg

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org
[mailto:das2-bounces at portal.open-
> bio.org] On Behalf Of Allen Day
> Sent: Wednesday, February 01, 2006 2:42 PM
> To: Andrew Dalke
> Cc: DAS/2
> Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities
> 
> I just looked over your changes, and will begin making the changes to
the
> server repository today.
> 
> I'd like to update the server at das.biopackages.net with my changes
on
> Friday, unless there are objections.
> 
> I'll be taking notes along the way and will post to the list if
anything
> in your document is unclear to me.
> 
> At first glance, I agree -- the changes are minor.
> 
> -Allen
> 
> 
> On Mon, 30 Jan 2006, Andrew Dalke wrote:
> 
> > Allen:
> > > Is the spec going to be in a stable state for the code sprint?
I'd
> > > like
> > > to use this time to sync the server implementation with a stable
> > > version
> > > of the spec.  It looks like there have been many substantial
changes.
> >
> > I have just (within the last few minutes) completed the first draft
> > of the update of the spec.
> >
> > It's not in HTML - that calls for too much work for this stage.
> > It's text, in CVS under das/das2/new_spec.txt
> >
> > There are many parts which need clarification.  These are marked
> > with a "XXX" along with my comments.
> >
> > The RNC files are in
> >
> >    das/das2/scratch/*.rnc
> > along with some test XML files.  These XML files are not meant
> > to be realistic.  They are meant more to check edge cases.
> >
> > I do not think there are major changes to the spec.  Most of the
> > changes have actually trimmed things down, like getting rid of
> > the "properties" subtree and merging the different "sources"
requests
> > into a single document.
> >
> >
> > Here are the major interfaces
> >
> > $PREFIX/sequence - a "sources" request
> >    This is the top-level entry point to a DAS 2 server.  It returns
a
> >    list of the available genomic sequence and their versions.
> >    [sequence-namespace]
> >
> > $PREFIX/sequence/$SOURCE - a "source" request
> >    Returns the available versions of the given genomic sequence.
> >
> > $PREFIX/sequence/$SOURCE/$VERSION - a "versioned source" request
> >    Returns information about a given version of a genomic sequence.
> >    Clients may assume that the sequence and assembly are constant
for a
> >    given version of a source. Note that annotation data on a server
> >    with curational write-back support may change without changing
the
> >    version.
> >
> >
> > For a given version here are the sub-parts.  Note that I've gone
ahead
> > and split the query urls (segment, features and types each have
query
> > interfaces) from the base directory used as containers for the
segments,
> > features and types.
> >
> >   $VERSION/segments - the segments query URL; summarizes the
top-level
> >      segments in the data source
> >
> >   $VERSION/segment/$SEGMENT_ID - a segment query; used to get
detailed
> >      information about the identified segment
> >
> >   $VERSION/features - the feature filter query URL.  Features are
> >     locatable annotations or experimental results.  The feature
filter
> >     URL supports query parameters to select a subset of the features
> >     based on position, feature type and other properties.
> >
> >   $VERSION/feature/$FEATURE_ID - a feature query; used to get
detailed
> >      information about the identified feature
> >
> >   $VERSION/types - the types query URL which returns a list of all
> >     feature types.  Feature types include ontology and depiction
> >     details for all features of the given type.
> >
> >   $VERSION/type/$TYPE_ID - details about the specified feature type
> >
> > Oh, and there are internal conflicts which will be straightened
> > out in the next draft.  These shouldn't be big.
> >
> > 					Andrew
> > 					dalke at dalkescientific.com
> >
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/das2
> >
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2


From allenday at ucla.edu  Wed Feb  1 23:27:11 2006
From: allenday at ucla.edu (Allen Day)
Date: Wed, 1 Feb 2006 15:27:11 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C993@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C993@msex02.affymetrix.com>
Message-ID: <Pine.LNX.4.58.0602011526030.1651@sumo.ctrl.ucla.edu>

That's what I was thinking too, but I was worried about the existing 
Genoviz clients "in the wild" having the server suddenly break.

So you're saying it's okay with you if those clients have a service
interruption?

-Allen


On Wed, 1 Feb 2006, Helt,Gregg wrote:

> 
> That would be great if you could update the biopackages server before
> the code sprint starts!  Then client implementers will have a server to
> test with.
> 
> 	thanks,
> 	gregg
> 
> > -----Original Message-----
> > From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-
> > bio.org] On Behalf Of Allen Day
> > Sent: Wednesday, February 01, 2006 2:42 PM
> > To: Andrew Dalke
> > Cc: DAS/2
> > Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities
> > 
> > I just looked over your changes, and will begin making the changes to
> the
> > server repository today.
> > 
> > I'd like to update the server at das.biopackages.net with my changes
> on
> > Friday, unless there are objections.
> > 
> > I'll be taking notes along the way and will post to the list if
> anything
> > in your document is unclear to me.
> > 
> > At first glance, I agree -- the changes are minor.
> > 
> > -Allen
> > 
> > 
> > On Mon, 30 Jan 2006, Andrew Dalke wrote:
> > 
> > > Allen:
> > > > Is the spec going to be in a stable state for the code sprint?
> I'd
> > > > like
> > > > to use this time to sync the server implementation with a stable
> > > > version
> > > > of the spec.  It looks like there have been many substantial
> changes.
> > >
> > > I have just (within the last few minutes) completed the first draft
> > > of the update of the spec.
> > >
> > > It's not in HTML - that calls for too much work for this stage.
> > > It's text, in CVS under das/das2/new_spec.txt
> > >
> > > There are many parts which need clarification.  These are marked
> > > with a "XXX" along with my comments.
> > >
> > > The RNC files are in
> > >
> > >    das/das2/scratch/*.rnc
> > > along with some test XML files.  These XML files are not meant
> > > to be realistic.  They are meant more to check edge cases.
> > >
> > > I do not think there are major changes to the spec.  Most of the
> > > changes have actually trimmed things down, like getting rid of
> > > the "properties" subtree and merging the different "sources"
> requests
> > > into a single document.
> > >
> > >
> > > Here are the major interfaces
> > >
> > > $PREFIX/sequence - a "sources" request
> > >    This is the top-level entry point to a DAS 2 server.  It returns
> a
> > >    list of the available genomic sequence and their versions.
> > >    [sequence-namespace]
> > >
> > > $PREFIX/sequence/$SOURCE - a "source" request
> > >    Returns the available versions of the given genomic sequence.
> > >
> > > $PREFIX/sequence/$SOURCE/$VERSION - a "versioned source" request
> > >    Returns information about a given version of a genomic sequence.
> > >    Clients may assume that the sequence and assembly are constant
> for a
> > >    given version of a source. Note that annotation data on a server
> > >    with curational write-back support may change without changing
> the
> > >    version.
> > >
> > >
> > > For a given version here are the sub-parts.  Note that I've gone
> ahead
> > > and split the query urls (segment, features and types each have
> query
> > > interfaces) from the base directory used as containers for the
> segments,
> > > features and types.
> > >
> > >   $VERSION/segments - the segments query URL; summarizes the
> top-level
> > >      segments in the data source
> > >
> > >   $VERSION/segment/$SEGMENT_ID - a segment query; used to get
> detailed
> > >      information about the identified segment
> > >
> > >   $VERSION/features - the feature filter query URL.  Features are
> > >     locatable annotations or experimental results.  The feature
> filter
> > >     URL supports query parameters to select a subset of the features
> > >     based on position, feature type and other properties.
> > >
> > >   $VERSION/feature/$FEATURE_ID - a feature query; used to get
> detailed
> > >      information about the identified feature
> > >
> > >   $VERSION/types - the types query URL which returns a list of all
> > >     feature types.  Feature types include ontology and depiction
> > >     details for all features of the given type.
> > >
> > >   $VERSION/type/$TYPE_ID - details about the specified feature type
> > >
> > > Oh, and there are internal conflicts which will be straightened
> > > out in the next draft.  These shouldn't be big.
> > >
> > > 					Andrew
> > > 					dalke at dalkescientific.com
> > >
> > > _______________________________________________
> > > DAS2 mailing list
> > > DAS2 at portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/das2
> > >
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/das2
> 


From allenday at ucla.edu  Wed Feb  1 23:30:22 2006
From: allenday at ucla.edu (Allen Day)
Date: Wed, 1 Feb 2006 15:30:22 -0800 (PST)
Subject: [DAS2] code sprint final infos
In-Reply-To: <11a8e78c4d35855959fa41eb23a4039d@sanger.ac.uk>
References: <11a8e78c4d35855959fa41eb23a4039d@sanger.ac.uk>
Message-ID: <Pine.LNX.4.58.0602011527310.1651@sumo.ctrl.ucla.edu>

What IM service are we using, and where can we collate all user IDs?  
Perhaps it would be better to meet up in an IRC channel.

I propose gathering in #codesprint on EFnet.

-Allen

On Wed, 1 Feb 2006, Andreas Prlic wrote:

> Hi!
> 
> This is to provide final organisatorial infos about the DAS 2 code 
> sprint next week.
> 
> - We start Monday 10:00 (Sanger time) in the Morgan building -
>    meeting point is the small meeting room next to the kitchen 1st floor 
> (we get a better room later).
> 
> - The sanger guest wireless network supports Skype. so instant 
> messaging and voice over IP calls
> will be possible during all the time.
> 
> - every day at 17:00 (Sanger time = 9:00 pacific time) there will be a 
> conference call on the usual DAS2 line
> 
> Greetings,
> Andreas
> 
> 
> 
> 
> -----------------------------------------------------------------------
> 
> Andreas Prlic      Wellcome Trust Sanger Institute
>                                Hinxton, Cambridge CB10 1SA, UK
> 			 +44 (0) 1223 49 6891
> 
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2
> 


From nomi at fruitfly.org  Thu Feb  2 00:37:44 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Wed, 1 Feb 2006 16:37:44 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C993@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C993@msex02.affymetrix.com>
Message-ID: <17377.21592.854840.243376@kinked.lbl.gov>

On 1 February 2006, Helt,Gregg wrote:
 > That would be great if you could update the biopackages server before
 > the code sprint starts!  Then client implementers will have a server to
 > test with.

yes!!

On 1 February 2006, Allen Day wrote:
 > That's what I was thinking too, but I was worried about the existing 
 > Genoviz clients "in the wild" having the server suddenly break.

are there really a lot of users (as opposed to das developers) who are
using the biopackages server?

On 1 February 2006, Allen Day wrote:
 > What IM service are we using, and where can we collate all user IDs?  
 > Perhaps it would be better to meet up in an IRC channel.
 > 
 > I propose gathering in #codesprint on EFnet.

i need details on this as well.  i've never bothered registering for an
IM service or IRC channel.

   Nomi


From ed_erwin at affymetrix.com  Wed Feb  1 23:44:35 2006
From: ed_erwin at affymetrix.com (Ed Erwin)
Date: Wed, 01 Feb 2006 15:44:35 -0800
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <Pine.LNX.4.58.0602011526030.1651@sumo.ctrl.ucla.edu>
References: <C71929195D04BF48BAECD499AF717B480198C993@msex02.affymetrix.com>
	<Pine.LNX.4.58.0602011526030.1651@sumo.ctrl.ucla.edu>
Message-ID: <43E147E3.1030705@affymetrix.com>


Gregg asked me to say "No".  Please do not break the current server that 
IGB is using.

Please make your changes on a server at a different URL.

Thanks
Ed

Allen Day wrote:
> That's what I was thinking too, but I was worried about the existing 
> Genoviz clients "in the wild" having the server suddenly break.
> 
> So you're saying it's okay with you if those clients have a service
> interruption?
> 
> -Allen
> 
> 
> On Wed, 1 Feb 2006, Helt,Gregg wrote:
> 
> 
>>That would be great if you could update the biopackages server before
>>the code sprint starts!  Then client implementers will have a server to
>>test with.
>>
>>	thanks,
>>	gregg
>>
>>
>>>-----Original Message-----
>>>From: das2-bounces at portal.open-bio.org
>>
>>[mailto:das2-bounces at portal.open-
>>
>>>bio.org] On Behalf Of Allen Day
>>>Sent: Wednesday, February 01, 2006 2:42 PM
>>>To: Andrew Dalke
>>>Cc: DAS/2
>>>Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities
>>>
>>>I just looked over your changes, and will begin making the changes to
>>
>>the
>>
>>>server repository today.
>>>
>>>I'd like to update the server at das.biopackages.net with my changes
>>
>>on
>>
>>>Friday, unless there are objections.
>>>
>>>I'll be taking notes along the way and will post to the list if
>>
>>anything
>>
>>>in your document is unclear to me.
>>>
>>>At first glance, I agree -- the changes are minor.
>>>
>>>-Allen
>>>
>>>
>>>On Mon, 30 Jan 2006, Andrew Dalke wrote:
>>>
>>>
>>>>Allen:
>>>>
>>>>>Is the spec going to be in a stable state for the code sprint?
>>
>>I'd
>>
>>>>>like
>>>>>to use this time to sync the server implementation with a stable
>>>>>version
>>>>>of the spec.  It looks like there have been many substantial
>>
>>changes.
>>
>>>>I have just (within the last few minutes) completed the first draft
>>>>of the update of the spec.
>>>>
>>>>It's not in HTML - that calls for too much work for this stage.
>>>>It's text, in CVS under das/das2/new_spec.txt
>>>>
>>>>There are many parts which need clarification.  These are marked
>>>>with a "XXX" along with my comments.
>>>>
>>>>The RNC files are in
>>>>
>>>>   das/das2/scratch/*.rnc
>>>>along with some test XML files.  These XML files are not meant
>>>>to be realistic.  They are meant more to check edge cases.
>>>>
>>>>I do not think there are major changes to the spec.  Most of the
>>>>changes have actually trimmed things down, like getting rid of
>>>>the "properties" subtree and merging the different "sources"
>>
>>requests
>>
>>>>into a single document.
>>>>
>>>>
>>>>Here are the major interfaces
>>>>
>>>>$PREFIX/sequence - a "sources" request
>>>>   This is the top-level entry point to a DAS 2 server.  It returns
>>
>>a
>>
>>>>   list of the available genomic sequence and their versions.
>>>>   [sequence-namespace]
>>>>
>>>>$PREFIX/sequence/$SOURCE - a "source" request
>>>>   Returns the available versions of the given genomic sequence.
>>>>
>>>>$PREFIX/sequence/$SOURCE/$VERSION - a "versioned source" request
>>>>   Returns information about a given version of a genomic sequence.
>>>>   Clients may assume that the sequence and assembly are constant
>>
>>for a
>>
>>>>   given version of a source. Note that annotation data on a server
>>>>   with curational write-back support may change without changing
>>
>>the
>>
>>>>   version.
>>>>
>>>>
>>>>For a given version here are the sub-parts.  Note that I've gone
>>
>>ahead
>>
>>>>and split the query urls (segment, features and types each have
>>
>>query
>>
>>>>interfaces) from the base directory used as containers for the
>>
>>segments,
>>
>>>>features and types.
>>>>
>>>>  $VERSION/segments - the segments query URL; summarizes the
>>
>>top-level
>>
>>>>     segments in the data source
>>>>
>>>>  $VERSION/segment/$SEGMENT_ID - a segment query; used to get
>>
>>detailed
>>
>>>>     information about the identified segment
>>>>
>>>>  $VERSION/features - the feature filter query URL.  Features are
>>>>    locatable annotations or experimental results.  The feature
>>
>>filter
>>
>>>>    URL supports query parameters to select a subset of the features
>>>>    based on position, feature type and other properties.
>>>>
>>>>  $VERSION/feature/$FEATURE_ID - a feature query; used to get
>>
>>detailed
>>
>>>>     information about the identified feature
>>>>
>>>>  $VERSION/types - the types query URL which returns a list of all
>>>>    feature types.  Feature types include ontology and depiction
>>>>    details for all features of the given type.
>>>>
>>>>  $VERSION/type/$TYPE_ID - details about the specified feature type
>>>>
>>>>Oh, and there are internal conflicts which will be straightened
>>>>out in the next draft.  These shouldn't be big.
>>>>
>>>>					Andrew
>>>>					dalke at dalkescientific.com
>>>>
>>>>_______________________________________________
>>>>DAS2 mailing list
>>>>DAS2 at portal.open-bio.org
>>>>http://portal.open-bio.org/mailman/listinfo/das2
>>>>
>>>
>>>_______________________________________________
>>>DAS2 mailing list
>>>DAS2 at portal.open-bio.org
>>>http://portal.open-bio.org/mailman/listinfo/das2
>>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2


From Gregg_Helt at affymetrix.com  Wed Feb  1 23:51:23 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 1 Feb 2006 15:51:23 -0800
Subject: [DAS2] Re: Apollo and DAS/2 priorities
Message-ID: <C71929195D04BF48BAECD499AF717B480198C994@msex02.affymetrix.com>

Yes, what Ed said, that's what I meant.  Updated server, but at a
different address.  Otherwise the current release of IGB will break when
trying to use the biopackages server.

Once our IGB code has caught up to the updated server, we can roll out a
new release to point to the new server instead of the old one.  But not
yet.

	Thanks,
	Gregg

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org
[mailto:das2-bounces at portal.open-
> bio.org] On Behalf Of Ed Erwin
> Sent: Wednesday, February 01, 2006 3:45 PM
> To: Allen Day
> Cc: DAS/2
> Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities
> 
> 
> Gregg asked me to say "No".  Please do not break the current server
that
> IGB is using.
> 
> Please make your changes on a server at a different URL.
> 
> Thanks
> Ed
> 
> Allen Day wrote:
> > That's what I was thinking too, but I was worried about the existing
> > Genoviz clients "in the wild" having the server suddenly break.
> >
> > So you're saying it's okay with you if those clients have a service
> > interruption?
> >
> > -Allen


From allenday at ucla.edu  Thu Feb  2 00:07:54 2006
From: allenday at ucla.edu (Allen Day)
Date: Wed, 1 Feb 2006 16:07:54 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C994@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C994@msex02.affymetrix.com>
Message-ID: <Pine.LNX.4.58.0602011600150.1651@sumo.ctrl.ucla.edu>

Okay, I will tag the current server and leave it at:

http://das.biopackages.net/das

I saw in the most recent commits by Andrew that the root-level "/das" is
no longer needed, so I propose putting an updated server at:

http://das.biopackages.net/codesprint

If we're going to keep the current server in a "maintained but deprecated"  
mode like this, I'll be making changes to the "new" server before Friday.

When the new version of IGB comes out we can then upgrade the current
server.

Sound good?

-Allen


On Wed, 1 Feb 2006, Helt,Gregg wrote:

> Yes, what Ed said, that's what I meant.  Updated server, but at a
> different address.  Otherwise the current release of IGB will break when
> trying to use the biopackages server.
> 
> Once our IGB code has caught up to the updated server, we can roll out a
> new release to point to the new server instead of the old one.  But not
> yet.
> 
> 	Thanks,
> 	Gregg
> 
> > -----Original Message-----
> > From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-
> > bio.org] On Behalf Of Ed Erwin
> > Sent: Wednesday, February 01, 2006 3:45 PM
> > To: Allen Day
> > Cc: DAS/2
> > Subject: Re: [DAS2] Re: Apollo and DAS/2 priorities
> > 
> > 
> > Gregg asked me to say "No".  Please do not break the current server
> that
> > IGB is using.
> > 
> > Please make your changes on a server at a different URL.
> > 
> > Thanks
> > Ed
> > 
> > Allen Day wrote:
> > > That's what I was thinking too, but I was worried about the existing
> > > Genoviz clients "in the wild" having the server suddenly break.
> > >
> > > So you're saying it's okay with you if those clients have a service
> > > interruption?
> > >
> > > -Allen
> 


From Gregg_Helt at affymetrix.com  Thu Feb  2 01:03:47 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 1 Feb 2006 17:03:47 -0800
Subject: [DAS2] Alternative feature formats in current DAS/2 spec
Message-ID: <C71929195D04BF48BAECD499AF717B480198C995@msex02.affymetrix.com>

When discussing alternative feature formats, the spec reads:

    The feature query URL supports the optional "format" parameter used to
    request that the results be returned in an alternative format.  The
    format names are listed in the versioned source document in the
    <FORMAT> element of the "feature" <CATEGORY>.

I think the second sentence should instead read something like:

    The possible format names for a particular feature type are listed in
    the types document in the <FORMAT> elements for a given type.

Also, the spec says:

    Some of the search results may not be expressible in the specified
    format.  The server should silently skip those feature records and
    return only those records which can be converted.

I would argue that if any of the search results cannot be returned in
the specified format, then the server should return an error.  Silently
suppressing information is not good.  A generic 400 "Bad Request" would
work, although a 415 "Unsupported Media Type" might be more appropriate.
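
A minimal sketch of the stricter behaviour argued for here, in Python.
Everything in it is hypothetical: the convert_features function, the
UnsupportedFormat exception and the toy to_bed converter are illustrative
names, not part of the DAS/2 spec or of any existing server, and the
mapping of the failure onto 415 (rather than 400) is an assumption.

```python
class UnsupportedFormat(Exception):
    """Raised when a feature cannot be expressed in the requested format."""

    def __init__(self, feature_id, format_name):
        super().__init__(
            f"feature {feature_id!r} cannot be returned as {format_name!r}"
        )
        self.status = 415  # assumed mapping; a generic 400 would also work


def to_bed(feature):
    # Toy converter (hypothetical): it needs a start/end range to produce a line.
    if feature.get("start") is None or feature.get("end") is None:
        return None
    return f"{feature['segment']}\t{feature['start']}\t{feature['end']}\t{feature['id']}"


def convert_features(features, format_name, converter):
    """Convert every feature or fail; never silently drop a record."""
    converted = []
    for feature in features:
        line = converter(feature)
        if line is None:
            raise UnsupportedFormat(feature["id"], format_name)
        converted.append(line)
    return converted
```

The point of the design is that the client either gets every record it
asked for or learns that the format was unsuitable; it never receives a
partial result that looks complete.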
 
        gregg
 

From allenday at ucla.edu  Thu Feb  2 01:16:04 2006
From: allenday at ucla.edu (Allen Day)
Date: Wed, 1 Feb 2006 17:16:04 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C993@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C993@msex02.affymetrix.com>
Message-ID: <Pine.LNX.4.58.0602011714370.1651@sumo.ctrl.ucla.edu>

There are still many references to "region" in Andrew's .txt document.  
Is it safe to assume that anywhere "region" is mentioned, it should really
be "segment" now?  I believe the answer is yes.

I'm asking to see if I need to change the feature filter implementation.

-Allen


On Wed, 1 Feb 2006, Helt,Gregg wrote:

> 
> That would be great if you could update the biopackages server before
> the code sprint starts!  Then client implementers will have a server to
> test with.
> 
> 	thanks,
> 	gregg
> 


From allenday at ucla.edu  Sat Feb  4 10:43:10 2006
From: allenday at ucla.edu (Allen Day)
Date: Sat, 4 Feb 2006 02:43:10 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <Pine.LNX.4.58.0602011600150.1651@sumo.ctrl.ucla.edu>
References: <C71929195D04BF48BAECD499AF717B480198C994@msex02.affymetrix.com>
	<Pine.LNX.4.58.0602011600150.1651@sumo.ctrl.ucla.edu>
Message-ID: <Pine.LNX.4.58.0602040232090.19184@sumo.ctrl.ucla.edu>

There is a database server down, which is why I haven't posted the new
code to /codesprint yet.  Hopefully it will be back online tomorrow.

However, on my dev box I was able to make the server code serve up almost
all of what is described in Andrew's new_spec.txt file.  The large
remaining problems are:

* Properties ( <PROP/> elements ).  I still don't fully understand how
these work, or whether the previous implementation is still valid or has
been invalidated by the new document.

* Alternate default Content-Type header for the same command, e.g.

  /sequence/.../segment       # Content-Type: application/x-das-blah+xml
  /sequence/.../segment/chrM  # Content-Type: text/x-fasta

This is an artifact of an earlier design decision that assumed Content-Type
had a single default and would only be modified if a ?format= parameter was
passed.  This is difficult to fix properly, so right now the FASTA is
served up under the XML Content-Type.
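
One possible way to lift the single-default assumption, sketched in Python
rather than in the server's own code: choose the default Content-Type from
the shape of the request path and let an explicit ?format= override it.
The function name, the path test and the FORMAT_TYPES table are assumptions
made for illustration; the two media types are the ones from the example
above.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping from ?format= values to media types.
FORMAT_TYPES = {
    "fasta": "text/x-fasta",
    "xml": "application/x-das-blah+xml",
}


def content_type_for(url):
    """Return the Content-Type for a request URL (illustrative only)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if "format" in query:
        # An explicit ?format= always wins; unknown names raise KeyError here.
        return FORMAT_TYPES[query["format"][0]]
    segments = [s for s in parts.path.split("/") if s]
    # .../segment/<id> defaults to FASTA; .../segment (the listing) to XML.
    if len(segments) >= 2 and segments[-2] == "segment":
        return "text/x-fasta"
    return "application/x-das-blah+xml"
```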

-Allen




From allenday at ucla.edu  Mon Feb  6 07:13:59 2006
From: allenday at ucla.edu (Allen Day)
Date: Sun, 5 Feb 2006 23:13:59 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
Message-ID: <Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>

Okay folks, an implementation of the document cited below is available 
here:

http://das.biopackages.net/codesprint
http://das.biopackages.net/codesprint/sequence
http://das.biopackages.net/codesprint/sequence/yeast/S228C/segment
etc.

After looking closely over this first draft of new_spec.txt, it's apparent 
that there are still some holes, e.g. what should the response to the 
following requests look like?

http://das.biopackages.net/codesprint/sequence/yeast
http://das.biopackages.net/codesprint/sequence/yeast/S228C

For now I have left responses the same as in the old HTML version of the
spec.  Of course if you find bugs, let me know.

The server at:

http://das.biopackages.net/das

is currently unavailable.  This is due to limitations in Apache/mod_perl
that won't allow different versions of the same class to coexist in a
family of processes.  I'd like to discuss how we should handle this in the
conference call tomorrow (today, if you're not in GMT+8).

-Allen


On Mon, 30 Jan 2006, Andrew Dalke wrote:

> Allen:
> > Is the spec going to be in a stable state for the code sprint?  I'd 
> > like
> > to use this time to sync the server implementation with a stable 
> > version
> > of the spec.  It looks like there have been many substantial changes.
> 
> I have just (within the last few minutes) completed the first draft
> of the update of the spec.
> 
> It's not in HTML - that calls for too much work for this stage.
> It's text, in CVS under das/das2/new_spec.txt
> 
> There are many parts which need clarification.  These are marked
> with a "XXX" along with my comments.
> 
> The RNC files are in
> 
>    das/das2/scratch/*.rnc
> along with some test XML files.  These XML files are not meant
> to be realistic.  They are meant more to check edge cases.
> 
> I do not think there are major changes to the spec.  Most of the
> changes have actually trimmed things down, like getting rid of
> the "properties" subtree and merging the different "sources" requests
> into a single document.
> 
> 
> Here are the major interfaces
> 
> $PREFIX/sequence - a "sources" request
>    This is the top-level entry point to a DAS 2 server.  It returns a
>    list of the available genomic sequence and their versions.
>    [sequence-namespace]
> 
> $PREFIX/sequence/$SOURCE - a "source" request
>    Returns the available versions of the given genomic sequence.
> 
> $PREFIX/sequence/$SOURCE/$VERSION - a "versioned source" request
>    Returns information about a given version of a genomic sequence.
>    Clients may assume that the sequence and assembly are constant for a
>    given version of a source. Note that annotation data on a server
>    with curational write-back support may change without changing the
>    version.
> 
> 
> For a given version here are the sub-parts.  Note that I've gone ahead
> and split the query urls (segment, features and types each have query
> interfaces) from the base directory used as containers for the segments,
> features and types.
> 
>   $VERSION/segments - the segments query URL; summarizes the top-level
>      segments in the data source
> 
>   $VERSION/segment/$SEGMENT_ID - a segment query; used to get detailed
>      information about the identified segment
> 
>   $VERSION/features - the feature filter query URL.  Features are
>     locatable annotations or experimental results.  The feature filter
>     URL supports query parameters to select a subset of the features
>     based on position, feature type and other properties.
> 
>   $VERSION/feature/$FEATURE_ID - a feature query; used to get detailed
>      information about the identified feature
> 
>   $VERSION/types - the types query URL which returns a list of all
>     feature types.  Feature types include ontology and depiction
>     details for all features of the given type.
> 
>   $VERSION/type/$TYPE_ID - details about the specified feature type
> 
> Oh, and there are internal conflicts which will be straightened
> out in the next draft.  These shouldn't be big.
> 
> 					Andrew
> 					dalke at dalkescientific.com
> 
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2
> 
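
The URL layout quoted above can be sketched in a few lines of Python.
This is only an illustration of the draft's naming scheme, not code from
any DAS/2 implementation: the function names das2_urls and item_url and
the segment id "chrI" are made up for the example, and the prefix is the
code sprint server mentioned elsewhere in this thread.

```python
from urllib.parse import quote


def das2_urls(prefix, source, version):
    """Build the entry points the draft lists for one versioned source."""
    base = f"{prefix.rstrip('/')}/sequence"
    versioned = f"{base}/{quote(source)}/{quote(version)}"
    return {
        "sources": base,                          # $PREFIX/sequence
        "source": f"{base}/{quote(source)}",      # $PREFIX/sequence/$SOURCE
        "versioned_source": versioned,            # .../$SOURCE/$VERSION
        "segments": f"{versioned}/segments",      # segments query URL
        "features": f"{versioned}/features",      # feature filter query URL
        "types": f"{versioned}/types",            # types query URL
    }


def item_url(urls, kind, item_id):
    """URL of a single segment, feature or type under a versioned source."""
    if kind not in ("segment", "feature", "type"):
        raise ValueError(f"unknown kind: {kind}")
    return f"{urls['versioned_source']}/{kind}/{quote(item_id, safe='')}"


urls = das2_urls("http://das.biopackages.net/codesprint", "yeast", "S228C")
print(urls["features"])
print(item_url(urls, "segment", "chrI"))  # "chrI" is a hypothetical id
```

Note the split the draft describes: the plural names (segments, features,
types) are query URLs, while the singular names are containers for
individual records.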


From dalke at dalkescientific.com  Mon Feb  6 11:33:34 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Feb 2006 11:33:34 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
Message-ID: <c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>

Allen:
> After looking closely over this first draft of new_spec.txt, it's apparent
> that there are still some holes, e.g. what should the response to the
> following requests look like?
>
> http://das.biopackages.net/codesprint/sequence/yeast

<?xml version="1.0" encoding="UTF-8"?>
<SOURCES
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/">
   <SOURCE taxon="Yeast">
       <VERSION id="yeast/S228C" title="Sce" created="" modified="">

       <COORDINATES taxid="" source="" authority="">
         <VERSION name=""/>
       </COORDINATES>

       <ASSEMBLY>
         <LINK href="" priority=""/>
       </ASSEMBLY>

       <PROP key="" value=""/>

       <CATEGORY type="features" query_id="yeast/S228C/feature">
         <!-- list non-das2xml templates here -->
       </CATEGORY>
       <CATEGORY type="segments" query_id="yeast/S228C/segment"/>
       <CATEGORY type="types"    query_id="yeast/S228C/type"/>
       <CATEGORY type="locks"    query_id="yeast/S228C/lock"/>

     </VERSION>

   </SOURCE>
</SOURCES>


> http://das.biopackages.net/codesprint/sequence/yeast/S228C

The same for this case.  There is only one VERSION for "yeast".


Your XML, btw, starts

<?xml version="1.0" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="/xsl/das.xsl"?>
<!DOCTYPE DAS2DSN SYSTEM "http://www.biodas.org/dtd/das2dsn.dtd">
<!-- this doesn't work and screws up the xsl     
xmlns="http://www.biodas.org/ns/das/genome/2.00" -->
<SOURCES
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/">

The standalone="no" means that the DTD may affect the content of the
document.
   http://www.stylusstudio.com/w3c/xml11/sec-rmd.htm

> Markup declarations can affect the content of the document, as passed 
> from an XML Processor to an application; examples are attribute 
> defaults and entity declarations. The standalone document declaration, 
> which MAY appear as a component of the XML declaration, signals 
> whether or not there are such declarations which appear external to 
> the Document Entity or in parameter entities. An external markup 
> declaration is defined as a markup declaration occurring in the 
> external subset or in a parameter entity (external or internal, the 
> latter being included because non-validating processors are not 
> required to read them).

For what we're doing, we don't need nor (I think) want that.  There's
no reason for a client to consult the DTD to figure out the XML.

Instead, use

<?xml version="1.0"?>

and probably have the encoding

<?xml version="1.0" encoding="UTF-8"?>

That also means you can get rid of the

<!DOCTYPE DAS2DSN SYSTEM "http://www.biodas.org/dtd/das2dsn.dtd">

statements.

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Mon Feb  6 12:02:40 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Feb 2006 12:02:40 +0000
Subject: [DAS2] timezone change
Message-ID: <6c3ddd6d7dc01dc99f2e1e932e64e733@dalkescientific.com>

To make it easier for Thomas' Java library, the timezone
in the datestamps may also be of the form "0500".

Here are the valid forms and new examples

       TZD  = time zone designator (optional; one of the formats
                      "Z", +hh:mm, +hhmm, -hh:mm, or -hhmm)


    1959-21-52T09:35+0300

    2042-03-18T01:19:00-11:15


					Andrew
					dalke at dalkescientific.com
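
A minimal Python sketch of how a client might accept all of the TZD
forms listed above. The regular expression and the helper name
tzd_offset_minutes are assumptions made for this example; they are not
part of the spec or of Thomas' Java library.

```python
import re

# Matches "Z", +hh:mm, +hhmm, -hh:mm or -hhmm at the end of the time part.
TZD_RE = re.compile(r"(Z|[+-]\d{2}:?\d{2})$")


def tzd_offset_minutes(stamp):
    """Return the UTC offset in minutes, or None if the TZD is absent."""
    _date, sep, time_part = stamp.partition("T")
    if not sep:
        return None  # date-only stamp, so there is no time zone to read
    match = TZD_RE.search(time_part)
    if match is None:
        return None
    tzd = match.group(1)
    if tzd == "Z":
        return 0
    sign = -1 if tzd[0] == "-" else 1
    digits = tzd[1:].replace(":", "")  # "11:15" and "1115" both give "1115"
    return sign * (int(digits[:2]) * 60 + int(digits[2:]))


print(tzd_offset_minutes("2042-03-18T01:19:00-11:15"))
print(tzd_offset_minutes("2042-03-18T01:19:00+0300"))
```

Only the part after the "T" is searched, so the hyphens in the date
cannot be mistaken for a negative offset.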


From dhoworth at mrc-lmb.cam.ac.uk  Mon Feb  6 12:12:52 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Mon, 06 Feb 2006 12:12:52 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>	<17368.18146.195226.166165@kinked.lbl.gov>	<17369.24994.880706.685148@kinked.lbl.gov>	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
Message-ID: <43E73D44.5020107@mrc-lmb.cam.ac.uk>

Andrew Dalke wrote:
> That also means you can get rid of the
> 
> <!DOCTYPE DAS2DSN SYSTEM "http://www.biodas.org/dtd/das2dsn.dtd">

Doing that automatically invalidates the document does it not?

http://www.w3.org/TR/2004/REC-xml-20040204/#NT-prolog

"Definition: An XML document is valid if it has an associated document 
type declaration and if the document complies with the constraints 
expressed in it.

The document type declaration MUST appear before the first element in 
the document."

Cheers, Dave


From dalke at dalkescientific.com  Mon Feb  6 13:42:03 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Feb 2006 13:42:03 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <43E73D44.5020107@mrc-lmb.cam.ac.uk>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>	<17368.18146.195226.166165@kinked.lbl.gov>	<17369.24994.880706.685148@kinked.lbl.gov>	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
	<43E73D44.5020107@mrc-lmb.cam.ac.uk>
Message-ID: <57bc93f8acc736e752d048d970b1f332@dalkescientific.com>

Dave Howorth:
> Doing that automatically invalidates the document does it not?
>
> http://www.w3.org/TR/2004/REC-xml-20040204/#NT-prolog
>
> "Definition: An XML document is valid if it has an associated document 
> type declaration and if the document complies with the constraints 
> expressed in it.
>
> The document type declaration MUST appear before the first element in 
> the document."

I think this page summarizes it nicely:
http://www.xml.com/lpt/a/2002/09/04/xslt.html

     "Valid" is a technical term referring to the presence
     of and conformance to a DOCTYPE declaration.

XML documents w/o a DTD are "well-formed".  XML documents
with a DTD and which agree with the DTD are "valid".

In this case not being "valid" does not mean that the
document is "invalid XML".

As I understand things, it's perfectly fine to pass well-formed
but not valid XML documents around.

					Andrew
					dalke at dalkescientific.com
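
To illustrate the distinction, here is a small Python sketch using the
standard library parser, which checks well-formedness and does not
validate against a DTD. The sample document is a cut-down stand-in, not
real server output, and is_well_formed is a name invented for the
example.

```python
import xml.etree.ElementTree as ET

# No DOCTYPE at all: the document cannot be "valid" in the XML 1.0 sense,
# but a non-validating parser accepts it because it is well-formed.
doc = """<?xml version="1.0" encoding="UTF-8"?>
<SOURCES xmlns:xlink="http://www.w3.org/1999/xlink">
  <SOURCE taxon="Yeast"/>
</SOURCES>"""


def is_well_formed(text):
    """True if the text parses as XML; no DTD or schema is consulted."""
    try:
        ET.fromstring(text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


print(is_well_formed(doc))                            # True
print(is_well_formed("<SOURCES><SOURCE></SOURCES>"))  # False: unclosed SOURCE
```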


From dalke at dalkescientific.com  Mon Feb  6 13:53:10 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Feb 2006 13:53:10 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
Message-ID: <3a3400e925dccf8583a5b47104e43766@dalkescientific.com>

Trying out Allen's XML

> <?xml version="1.0" standalone="no"?>
> <?xml-stylesheet type="text/xsl" href="/xsl/das.xsl"?>
> <!DOCTYPE DAS2DSN SYSTEM "http://www.biodas.org/dtd/das2dsn.dtd">
> <!-- this doesn't work and screws up the xsl     
> xmlns="http://www.biodas.org/ns/das/genome/2.00" -->
> <SOURCES
>       xmlns:xlink="http://www.w3.org/1999/xlink"
>       xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/">
>

The xmlns is needed, else "SOURCES" is in the unnamed namespace,
rather than the DAS2 namespace.

It looks like your XSLT might not declare the namespace?  I
can't find the document to check, at either of

   http://das.biopackages.net/xsl/das.xsl
   http://radius.genomics.ctrl.ucla.edu/xsl/das.xsl

The page at
  http://www.xml.com/pub/a/2001/04/04/trxml/

describes a bit on how to include namespace in your xslt


> <!-- xq242.xsl: converts xq241.html into xq243.xml -->
>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                 xmlns:xlink="http://www.w3.org/1999/xlink"
>                 version="1.0">
> <xsl:output method="xml" omit-xml-declaration="yes"/>
>
> <xsl:template match="a">
>   <author xlink:type="simple" xlink:href="{@href}">
>     <xsl:apply-templates/></author>
> </xsl:template>
>
> <xsl:template match="p">
>   <para><xsl:apply-templates/></para>
> </xsl:template>
>
> </xsl:stylesheet>

Note the use of the "xlink:" namespace abbreviation.

					Andrew
					dalke at dalkescientific.com
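
The effect of the missing xmlns can be seen from Python as well. This is
a sketch with a cut-down document, and the "das" prefix is chosen only
for the example. It may be the same mismatch that trips up the XSLT: a
stylesheet matching the unqualified name "SOURCES" will not match the
element once it is in the DAS2 namespace.

```python
import xml.etree.ElementTree as ET

DAS2_NS = "http://www.biodas.org/ns/das/genome/2.00"

with_ns = f'<SOURCES xmlns="{DAS2_NS}"><SOURCE/></SOURCES>'
without_ns = "<SOURCES><SOURCE/></SOURCES>"

print(ET.fromstring(with_ns).tag)     # {http://www.biodas.org/ns/das/genome/2.00}SOURCES
print(ET.fromstring(without_ns).tag)  # SOURCES

root = ET.fromstring(with_ns)
# An unqualified lookup misses the namespaced child ...
assert root.find("SOURCE") is None
# ... while a lookup that declares the namespace finds it.
assert root.find("das:SOURCE", {"das": DAS2_NS}) is not None
```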


From dhoworth at mrc-lmb.cam.ac.uk  Mon Feb  6 14:27:34 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Mon, 06 Feb 2006 14:27:34 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <57bc93f8acc736e752d048d970b1f332@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>	<17368.18146.195226.166165@kinked.lbl.gov>	<17369.24994.880706.685148@kinked.lbl.gov>	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>	<43E73D44.5020107@mrc-lmb.cam.ac.uk>
	<57bc93f8acc736e752d048d970b1f332@dalkescientific.com>
Message-ID: <43E75CD6.7000909@mrc-lmb.cam.ac.uk>

Andrew Dalke wrote:
> Dave Howorth:
>> Doing that automatically invalidates the document does it not?
>>
>> http://www.w3.org/TR/2004/REC-xml-20040204/#NT-prolog
>>
>> "Definition: An XML document is valid if it has an associated document 
>> type declaration and if the document complies with the constraints 
>> expressed in it.
>>
>> The document type declaration MUST appear before the first element in 
>> the document."
> 
> I think this page summarizes it nicely:
> http://www.xml.com/lpt/a/2002/09/04/xslt.html
> 
>     "Valid" is a technical term referring to the presence
>     of and conformance to a DOCTYPE declaration.

I think that's a paraphrase of the first para I quoted above?

> XML documents w/o a DTD are "well-formed".  XML documents
> with a DTD and which agree with the DTD are "valid".
> 
> In this case not being "valid" does not mean that the
> document is "invalid XML".

No, I believe you're wrong there; 'not valid' and 'invalid' have the 
same meaning both colloquially and as used in the spec. It's either 
valid or it isn't, and if it isn't then it's invalid.

> As I understand things, it's perfectly fine to pass well-formed
> but not valid XML documents around.

I don't agree. There are occasions when it is acceptable but it's 
generally bad practice, IMHO. The discussion in sec 5 of the spec gives 
some motivation, particularly this section:

http://www.w3.org/TR/REC-xml/#safe-behavior

Or look here, or thousands of other places:
http://www.online-learning.com/demos/xml/valid_xml.html
http://en.wikipedia.org/wiki/Xml#Correctness_in_an_XML_document

In particular for interoperability of an open, distributed system with 
many writers and readers implemented by different groups (i.e. DAS), I 
suggest validity is essential.

I would have expected your experience of the PDB to make you keen on 
validation :)

Cheers, Dave


From dalke at dalkescientific.com  Mon Feb  6 15:09:58 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Feb 2006 15:09:58 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <43E75CD6.7000909@mrc-lmb.cam.ac.uk>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>	<17368.18146.195226.166165@kinked.lbl.gov>	<17369.24994.880706.685148@kinked.lbl.gov>	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>	<43E73D44.5020107@mrc-lmb.cam.ac.uk>
	<57bc93f8acc736e752d048d970b1f332@dalkescientific.com>
	<43E75CD6.7000909@mrc-lmb.cam.ac.uk>
Message-ID: <0aeda19421fdc7c75e2440ad0acd6391@dalkescientific.com>

Dave Howorth wrote:
> Andrew Dalke wrote:
>> I think this page summarizes it nicely:
>> http://www.xml.com/lpt/a/2002/09/04/xslt.html
>>     "Valid" is a technical term referring to the presence
>>     of and conformance to a DOCTYPE declaration.
>
> I think that's a paraphrase of the first para I quoted above?

It adds the phrase "technical term", making it (in my interpretation)
different from the word "valid" in its normal sense.

> No, I believe you're wrong there; 'not valid' and 'invalid' have the 
> same meaning both colloquially and as used in the spec. It's either 
> valid or it isn't, and if it isn't then it's invalid.

I now agree that in the spec sense "invalid" and "not valid" are the
same.

I still think it has a technical difference from its normal use.
See for example the thread at
   http://www.stylusstudio.com/xmldev/200411/post50310.html

part of which says

> >But does it matter if a document is Not valid?
>
> Not necessarily.  It's up to you.  Requiring a document to be valid is
> a way of putting some constraints on it.  If you don't have any such
> constraints (unlikely, unless you are writing some very generic
> software like an editor), then there's no need for validity.  More
> likely, not all your constraints can be expressed by a DTD, and you
> will need to express them some other way.
>
> And of course you can require the document to be valid according to
> some other kind of schema, such as XML schemas or RelaxNG or
> Schematron.


>> As I understand things, it's perfectly fine to pass well-formed
>> but not valid XML documents around.
>
> I don't agree. There are occasions when it is acceptable but it's 
> generally bad practice, IMHO. The discussion in sec 5 of the spec 
> gives some motivation, particularly this section:
>
> http://www.w3.org/TR/REC-xml/#safe-behavior
>
> Or look here, or thousands of other places:
> http://www.online-learning.com/demos/xml/valid_xml.html
> http://en.wikipedia.org/wiki/Xml#Correctness_in_an_XML_document
>
> In particular for interoperability of an open, distributed system with 
> many writers and readers implemented by different groups (i.e. DAS), I 
> suggest validity is essential.

Quoting the wikipedia reference to DTDs:

> The oldest schema format for XML is the Document Type Definition 
> (DTD), inherited from SGML. While DTD support is ubiquitous due to its 
> inclusion in the XML 1.0 standard, it is seen as limited for the 
> following reasons:
>   *  It has no support for newer features of XML, most importantly 
> namespaces.

DAS2 uses namespaces.  Hence it cannot use DTDs.

We are defining Relax-NG schemas for the different formats,
which can be used for better validity checking than is supported
by DTDs.

"valid DAS2 document" ::= "meets the DAS2 spec"

"meets the DAS2 spec" is a stricter definition than
   "well-formed XML" + "meets the RNG spec"
which is stricter than
   "well-formed XML" + "meets the (hypothetical namespace-aware) DTD"


> I would have expected your experience of the PDB to make you keen
> on validation :)

Indeed, I'm working on the validator for DAS2, which uses the Relax-NG
schemas.  ;)

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Mon Feb  6 16:03:07 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Feb 2006 16:03:07 +0000
Subject: [DAS2] <CATEGORY> elements
Message-ID: <4157a339b02c085fd6df0f4bf1ff1cc0@dalkescientific.com>

One discussion point from today is the <CATEGORY> elements.

The current draft of the spec says they look like this

       <CATEGORY type="segments" query_id="volvox/1/segments">
           <FORMAT name="fasta" mimetype="text/x-fasta" />
           <FORMAT name="raw" mimetype="text/plain" />
       </CATEGORY>
       <CATEGORY type="types" query_id="volvox/1/types">
           <FORMAT name="das2xml" mimetype="text/x-das-featuretype+xml" />
       </CATEGORY>
       <CATEGORY type="features" query_id="volvox/1/features">
           <FORMAT name="das2xml" mimetype="text/x-das-type+xml" />
       </CATEGORY>
       <CATEGORY type="locks" query_id="volvox/1/locks" />


Andreas Prlic pointed out that since the document says
the "volvox" version "1" url is already known ("volvox/1")
and the type="segments" then the query_id can be built
from appending "segments" to the "volvox/1" (plus the "/")
to get "volvox/1/segments".

I originally responded from a ReST purity argument, in that
URLs should not be constructed from non-URL data.  This
lets Thomas, for example, use GUIDs for the objects rather
than the hierarchical structure I and others recommend.

During discussion a better answer came up, which I think
we talked about earlier but which is worth emphasizing
is that the "query_id"s don't need to be on the same server.

For example, the "segments" URL may and likely will point
to a common reference server, and a database may offer
only one set of "types" for all of the "features".

That is, something like this

   DAS server example.com
      genome A
         version x
           segments at "ensembl.org/das2/genome_A/build_1/segments"
           features at "example.com/A/version_x/features"
           types at "example.com/A/types"

         version y
           segments at "ensembl.org/das2/genome_A/build_1/segments"
           features at "example.com/A/version_y/features"
           types at "example.com/A/types"

         version z
           segments at "ensembl.org/das2/genome_A/build_2/segments"
           features at "example.com/A/version_z/features"
           types at "example.com/A/types"

   DAS server biodas.org
      genome A
         version 1
           segments at "ensembl.org/das2/genome_A/build_2/segments"
           features at "biodas.org/A/1/features"
           types at "example.com/A/types"  (note: on other server!)


					Andrew
					dalke at dalkescientific.com
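
A Python sketch of why a client should resolve each query_id as a URL
rather than build it from the type name. The document below is invented
for the example: it mirrors "version x" of example.com from the outline
above, and the xml:base value and the relative query_id values are
assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree expands the predeclared xml: prefix to this namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

doc = """<VERSION xml:base="http://example.com/A/version_x/">
  <CATEGORY type="segments"
            query_id="http://ensembl.org/das2/genome_A/build_1/segments"/>
  <CATEGORY type="features" query_id="features"/>
  <CATEGORY type="types" query_id="../types"/>
</VERSION>"""


def query_urls(version_xml):
    """Map each CATEGORY type to its query URL, resolved against xml:base."""
    root = ET.fromstring(version_xml)
    base = root.get(XML_BASE, "")
    return {
        cat.get("type"): urljoin(base, cat.get("query_id"))
        for cat in root.iter("CATEGORY")
    }


for kind, url in query_urls(doc).items():
    print(kind, url)
```

Appending "segments" to the version URL would give the wrong answer for
the first category, since the segments live on the reference server.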


From Gregg_Helt at affymetrix.com  Mon Feb  6 17:13:18 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 09:13:18 -0800
Subject: [DAS2] Agenda for DAS/2 Code Sprint Teleconference 2005-02-06
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9A8@msex02.affymetrix.com>

Status report
DAS/2 XML - valid or not valid?
CATEGORY elements -- constructing query URLs
MAINTAINER information
Use of xml:base
update on feature properties - searching, etc.
 
 
From lstein at cshl.edu  Mon Feb  6 18:20:10 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 6 Feb 2006 13:20:10 -0500
Subject: [DAS2] Agenda for DAS/2 Code Sprint Teleconference 2005-02-06
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9A8@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9A8@msex02.affymetrix.com>
Message-ID: <200602061320.11360.lstein@cshl.edu>

Hi Gregg,

I had a conflicting teleconference and wasn't sure whether there was a 
teleconference scheduled for the code sprint, so I didn't dial in. Just got 
the agenda now.

I am online on both MSN and AOL chats, and will be all week, if anyone wants 
to IM me.

Lincoln

On Monday 06 February 2006 12:13, Helt,Gregg wrote:
> Status report
> DAS/2 XML - valid or not valid?
> CATEGORY elements -- constructing query URLs
> MAINTAINER information
> Use of xml:base
> update on feature properties - searching, etc.
>
>
>
>
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From dalke at dalkescientific.com  Mon Feb  6 18:42:24 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Feb 2006 18:42:24 +0000
Subject: [DAS2] version=
Message-ID: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com>

If we add a version= field to the Content-Type, or whatever
mechanism is proposed

Content-Type: application/x-das2features+xml; version=12345

What will a client do when it gets a version number it has
never heard of?  Should it use the newest version it supports?
The oldest?  Abort?


					Andrew
					dalke at dalkescientific.com
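
For concreteness, a Python sketch of reading the version parameter and
of one possible answer to the question (abort on an unknown version).
The thread has not settled on a policy, so the SUPPORTED set and the
choice to raise an error are assumptions made purely for illustration.

```python
from email.message import Message

SUPPORTED = {"1.0", "1.1"}  # hypothetical versions this client understands


def das2_version(content_type):
    """Split a Content-Type value into (mime type, version or None)."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_type(), msg.get_param("version")


def check_version(content_type):
    """Return (mime type, version); refuse versions the client does not know."""
    mimetype, version = das2_version(content_type)
    if version is not None and version not in SUPPORTED:
        raise ValueError(f"unsupported DAS/2 version {version!r} for {mimetype}")
    return mimetype, version


print(das2_version("application/x-das2features+xml; version=12345"))
```

Falling back to the newest supported version instead would be a one-line
change in check_version; which behaviour is right is exactly the open
question.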


From Steve_Chervitz at affymetrix.com  Mon Feb  6 19:50:14 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 06 Feb 2006 11:50:14 -0800
Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint,
 6 Feb 2006
Message-ID: <C00CE876.1BB01%Steve_Chervitz@affymetrix.com>

Notes from the DAS/2 teleconference for the code sprint, 6 Feb 2006

$Id: das2-teleconf-2006-02-06.txt,v 1.2 2006/02/06 19:57:05 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed E., Gregg Helt
  Sanger: Andreas Prlic, Thomas Down, Roy
  Sweden: Andrew Dalke
  UC Berkeley: Nomi Harris
  UCLA: Allen Day
       
Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


Gregg's topics for discussion:

* Status report 
* DAS/2 XML - valid or not valid?
* CATEGORY elements -- constructing query URLs
* MAINTAINER information
* Use of xml:base
* update on feature properties - searching, etc.

Status Reports - what people are working on for the code sprint
------------------------------------------------------------

andrew
- getting folks up to speed on the spec changes, what he wrote.
- getting a feel for ensembl schema.
- change today: time zone specification b/c td's java time lib did
  something different than iso did.

aday: tag & branch?
gh: no branch, maybe tag
ad: tagging probably not necessary

gh: brings up a related issue:
 what is our mechanism for versioning - client & spec to understand
 which version of the spec they are/should be implementing
- can talk about it later during the xml validation issue discussion

ap: [missed it -- sorry!]

td: java om, feature xml done, can read and write.

roy: zmap das2 client, read/write das2, written in C. working with
ed griffith who's not available this week.
currently just a reader. from james gilbert, based on fmap from Acedb

gh: updating client and server (mostly client). top down syncing in
parallel, one command at a time. sources request is working on both
sides. will start w/ allen's server today, doing gh's sources query
against allen's server. segments and types today.

nh: apollo das2 client. reads das2 xml from andrew's example, write
out features in das2, now working on get, testing with server.

sc: affy das2 server stuff. streamlining updating it with feature data
from UCSC. also working on updating exon array data for use in IGB
client. working w/ gregg on other server-related work.
gh: graph data as well.

ee: working on igb client. talk w/ gregg later to get specifics.
gh: lots of ui stuff

Topic: xml validation
---------------------

ad: dtd's don't support namespaces, so we can't support dtds
gh: not that simple. where do we add namespaces?
ad: schemas have ns's
    testing....
gh: concern #1: is one of perception. don't like telling people we
don't have valid xml
ad: only means supports the dtd, not in human sense.
gh: it's one of perception
td: self-contained document + validation

gh: getting rid of doctype declaration is issue of versioning. how
will client know which version of spec it's supposed to be implementing?
need to deal with spec crawl. The only way i'm aware of is via looking
at dtd pointer changing.
gh: not worried about new categories, but changing things like
optional vs req'd attributes/elements.
ad: content-type contains version
td: or content negotiation
ap: xml schema validator at w3c.org can use that and claim it is
valid. can upload your files, push a button.
ad: I have an extension of properties with arbitrary binary data vs
text vs href. this is ok with relaxng, not by xsd.
ad: we could say what is valid das2 since we're the arbiters of what
is valid das xml document. e.g., well-formed, validates against the rng
schemas

gh: the rng we now have allows arbitrary xml?
ad: yes. can say there are arbitrary elements under some node. checked
in as file named common.rnc
gh: ok, getting rid of requirement for doctype declaration. any
versioning is done via content-type

gh: if we don't do content neg, a sources query goes out, whatever
version that the server supports comes back. this will be the latest
version of the spec the server supports.
ad: for backwards compatibility that won't be needed. extensibility
will be sufficient for a few years.
gh: don't believe it.
td: spec is churning fast now. there'll be less churn once there are impls.
gh: there were impls 3 or 4 mos ago (allen, gregg). so there has been
plenty of churn even with impls. so we'll need versioning, ok on
content-type.
aday: we definitely need versioning. need it now. also want a tagged
version so we can work at the same time.
ad: content-type-xdas;version=1.1
in general not the right solution (not general purpose), but for this
case, makes sense. 
aday: can impl, header says 1.1
gh/ad: contents are a subset of the specification. so it's tied to a
version of the rng schema.
ad: the tag will be the cvs revision #

gh: this isn't temporary; there will never be a time when we are
not generating churn.
ad: believes this is temporary, won't have to have it long-term
aday: no mechanism for it now.
ad: need a way to turn it into meaning. agreement on what string means
which version of a program.

nh: second gregg. will always be an issue. ad says it's not good
long-term, maybe we should come up with it.
gh: we have some basis to go forward.

[A] das/2 server will specify spec version via content-type-xdas;version=X.X

Topic: category elements, how to construct a query url
------------------------------------------------------

ad: what is syntax of string used to specify ontology? SO:?
aday: attribute for it
gh: ontol term is a uri
aday: type element has ontology
gh: id of type is not nec an ontol term
ad: the attrib of feat type, ontol=something
gh: that's a uri, abs or rel point to a frag in so/fa ontol
ad: can't find how this should look. said SO:0000001. that should be
a uri?
gh: yes. in types xml that's returned, id and ontol are uri's. a
server will pick one for its xml base. the other will have to be a
full uri.
ad: how do diff clients know a given term corresponds to what term in
the ontol?
gh: they will have to understand sofa/so.
ad: do they have persistent ids?
gh: my understanding is that they can use fragment notation for a
stable url for the term
aday: ontol docs aren't xml, no anchors for pointing to a
fragment. they're their own format. nervous about building dependency
on fragment record uris into our system
gh: good point. would be happier if it was recast as xml
aday: is now pointing to an xml document for ontology nodes
ad: happier if we could use "SO:xxx" i.e., a urn
gh: would like a re-cast as xml document, hosted at so/sofa
website. that xml would be like a std ontology representation so you
could extend it. so someone could point to an extension of it.


Category elements -- constructing query URLs
--------------------------------------------

gh: andreas' point (email): query id attribute, constructing these out
of relative uri, or based on base uri.
agree with andreas: we know what those will be.
for clarity of spec, we should specify: here's base uri, here's how you
construct the segments query, etc.
ad: trouble for segments- could be on ref server
gh: doubt that people will impl this way. will be specific to server
and will be related to everyone else's notion of chromosomes and
assemblies.
ad: where does the distributed nature of das come from? ref server
gh: das/1: ref server has residues to serve, regions (entry pts)
served up by everyone. this was the notion of ref vs non-ref
server to carry forward. non-ref server still serves up segments.
will have segments in its reference space. reference would be genome
assembly version + organism. sufficient to globally identify it.
ap: had discussions about this. query id
td: issue comes from seqs being urls rather than opaque ids in a ns
defined by coord system. have a set of servers that share common coord
syst. then a seq identified by stringx on one server is same as on the
other server.
the remaining q: server that doesn't want to serve up seqs, what urls
does it use? can it use an opaque seq name that is known by that name of
ref server? 

gh: restating concerns here: using query string to construct uri's
1. confusion: arbitrary uri means more confusing spec, and how to
   implement it (can't just say /segment, but 'whatever is pointed at
   by such and such uri')
2. size of documents. right now, can use same xml:base for features
   document, can make feat ids and location id relative to it, nice
   and short. if seg is on other server, need to expand one of the ids

compresses well, but that will take longer than transmission.
this is only for features xml.

can use coords or assembly info to determine identity between urls.
want a defined ns.
ad: you want a way to say: these are relative urls to a base url for
that data type. so that this type url is relative to some base url for
types, similar for segments, features.
gh: we have this now, can be relative or absolute
ad: there is a default xml base like thing: one for type, segment,
features. so you could have relative ids to those bases.
gh: possibly, but not ideal. It's better to use a std xml base for all
of them. 
each server has its own unique uris for segments.

I'm proposing that we decouple segments from residues and having
segments doesn't mean we can serve residues. reasoning:
- this leads to smaller xml docs
- simplifies the spec if we didn't have to construct query ids from
  category element

would rather specify the string that's appended in the spec.

sc: might be able to deal with this issue by adding structure to the
document in order to add different xml:bases for different data
types. e.g., use different parent elements that could define their own
xml:bases, one each for types, segments, and features. might complicate
the spec tho. 
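
A minimal sketch of sc's suggestion, in Python. The wrapper elements
(SEGMENTS, TYPES), the ids, and every URL here are hypothetical and are
not taken from the spec; the point is only that each parent element can
carry its own xml:base, so ids stay short even when segments live on
another server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the predeclared XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical document: element names and URLs are made up for illustration.
DOC = """\
<FEATURES xml:base="http://example.com/das2/volvox/1/">
  <SEGMENTS xml:base="http://ref.example.org/das2/volvox/1/segment/">
    <SEGMENT id="chr1"/>
  </SEGMENTS>
  <TYPES xml:base="type/">
    <TYPE id="exon"/>
  </TYPES>
  <FEATURE id="feature/f1"/>
</FEATURES>
"""

def resolve_ids(element, inherited_base=""):
    """Yield (tag, absolute id) pairs, applying nested xml:base values."""
    # A relative xml:base is itself resolved against the inherited base.
    base = urljoin(inherited_base, element.get(XML_BASE, ""))
    if "id" in element.attrib:
        yield element.tag, urljoin(base, element.get("id"))
    for child in element:
        yield from resolve_ids(child, base)

resolved = dict(resolve_ids(ET.fromstring(DOC)))
for tag, uri in resolved.items():
    print(tag, uri)
```

The cost sc mentions shows up in resolve_ids(): a client has to track
the inherited base while walking the tree instead of applying one base
to the whole document.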

ad: single genome have same types across all dbs.
gh: across servers, dangerous.
ad/td: globally unique ids, could have everything in the same directory.
td: can we just use seq/name, type/name. i.e., codifying what the
convention now is.
ad: name is put at end of base url
a feature document may give types, segments, other features.
td: just use simple strings, not urls.
gh: std uri syntax isn't important, but a std query mechanism to get
all of these is. some uri you put a '/types' on or a '/segments'.
ad: you have this right now.
gh: but it's only defined for a server, not the whole spec. nowhere
in the spec does it say this. confusing for people
reading/implementing the spec.
ap: If you make it free text, you don't know what to put for a given server?
ad: you get a document
ap: I already know the server, not necessarily a document.

ad: taking out the mention of any hierarchy, just refer to things as
feat query url.

[note taker is having trouble following the thread of this discussion.]

gh: let's sleep on it, discuss tomorrow, vote then.


From nomi at fruitfly.org  Mon Feb  6 20:49:51 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Mon, 6 Feb 2006 12:49:51 -0800 (PST)
Subject: New DAS/2 server for codesprint [was Re: [DAS2] Re: Apollo and
	DAS/2 priorities]
In-Reply-To: <Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
Message-ID: <17383.46703.563017.422300@kinked.lbl.gov>

thanks for setting up the new das/2 server, allen.  i'm having trouble
with some of the queries.

On 5 February 2006, Allen Day wrote:
 > Okay folks, an implementation of the document cited below is available 
 > here:
 > 
 > http://das.biopackages.net/codesprint
I get "Internal Server Error"

 > http://das.biopackages.net/codesprint/sequence
 > http://das.biopackages.net/codesprint/sequence/yeast/S228C/segment
these both work.

 > http://das.biopackages.net/codesprint/sequence/yeast
 > http://das.biopackages.net/codesprint/sequence/yeast/S228C
for these i get
Error loading stylesheet: A network error occured loading an XSLT stylesheet:
http://das.biopackages.net/xsl/das.xsl

i'm running firefox on mozilla, so i'm not surprised when it has problems
with stylesheets, but i used to be able to get data from the old das/2
server, even though it did have some complaint about not finding the
stylesheet.

http://das.biopackages.net/codesprint/sequence/human/17/feature
churned forever (or, at least, for several minutes--maybe it will
eventually return).

           Nomi


From nomi at fruitfly.org  Mon Feb  6 22:34:30 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Mon, 6 Feb 2006 14:34:30 -0800 (PST)
Subject: [DAS2] Re: New DAS/2 server for codesprint
In-Reply-To: <17383.46703.563017.422300@kinked.lbl.gov>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<17383.46703.563017.422300@kinked.lbl.gov>
Message-ID: <17383.52982.274142.351003@kinked.lbl.gov>

On 6 February 2006, Nomi Harris wrote:
 > thanks for setting up the new das/2 server, allen.  i'm having trouble
 > with some of the queries.

ok, i realized that some of the queries i was trying were senseless, but
here are some that should work that are just hanging:
http://das.biopackages.net/codesprint/sequence/yeast/S228C/segments
http://das.biopackages.net/codesprint/sequence/yeast/S228C/types

        Nomi


From allenday at ucla.edu  Mon Feb  6 21:53:34 2006
From: allenday at ucla.edu (Allen Day)
Date: Mon, 6 Feb 2006 13:53:34 -0800 (PST)
Subject: New DAS/2 server for codesprint [was Re: [DAS2] Re: Apollo and
	DAS/2 priorities]
In-Reply-To: <17383.46703.563017.422300@kinked.lbl.gov>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<17383.46703.563017.422300@kinked.lbl.gov>
Message-ID: <Pine.LNX.4.58.0602061345360.29889@sumo.ctrl.ucla.edu>

On Mon, 6 Feb 2006, Nomi Harris wrote:

> thanks for setting up the new das/2 server, allen.  i'm having trouble
> with some of the queries.
> 
> On 5 February 2006, Allen Day wrote:
>  > Okay folks, an implementation of the document cited below is available 
>  > here:
>  > 
>  > http://das.biopackages.net/codesprint
> I get "Internal Server Error"

That's to be expected.  The spec does not specify what the response to
this request should be, or if it is even valid -- so I didn't implement
it.

>  > http://das.biopackages.net/codesprint/sequence
>  > http://das.biopackages.net/codesprint/sequence/yeast/S228C/segment
> these both work.
> 
>  > http://das.biopackages.net/codesprint/sequence/yeast
>  > http://das.biopackages.net/codesprint/sequence/yeast/S228C
> for these i get
> Error loading stylesheet: A network error occured loading an XSLT stylesheet:
> http://das.biopackages.net/xsl/das.xsl

This happens if you're browsing the URLs in a web browser that supports
xsl directives.  Previous versions of the server supported web browsers,
but at the cost of using a 'text/xml' Content-Type header.  Consensus in
the group was that web browsers are not a target platform, so this feature
no longer works -- so you won't be able to view the DAS2XML in your
browser anymore.  I just haven't removed the XSL references yet.

> i'm running firefox on mozilla, so i'm not surprised when it has problems
> with stylesheets, but i used to be able to get data from the old das/2
> server, even though it did have some complaint about not finding the
> stylesheet.
> 
> http://das.biopackages.net/codesprint/sequence/human/17/feature

The server is coded to throw an error if you ask for all features, so I'm
surprised it didn't just give you a 4xx or 5xx response.  I'll look into
it.

> churned forever (or, at least, for several minutes--maybe it will
> eventually return).
> 
>            Nomi
> 


From allenday at ucla.edu  Mon Feb  6 22:00:50 2006
From: allenday at ucla.edu (Allen Day)
Date: Mon, 6 Feb 2006 14:00:50 -0800 (PST)
Subject: [DAS2] Re: New DAS/2 server for codesprint
In-Reply-To: <17383.52982.274142.351003@kinked.lbl.gov>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<17383.46703.563017.422300@kinked.lbl.gov>
	<17383.52982.274142.351003@kinked.lbl.gov>
Message-ID: <Pine.LNX.4.58.0602061358060.29889@sumo.ctrl.ucla.edu>

Hi Nomi,

I just restarted the server, the "all features" request used all the
memory and hung the webserver.  I'll look into why that request wasn't
immediately denied as it used to be.

As for your .../segments and .../types, they should be .../segment and
.../type.  I see no reason to pluralize these URLs given that the sources
response allows me to provide them at any arbitrary URL:

  [...]
  <CATEGORY type="segments" query_id="yeast/S228C/segment"/>
  <CATEGORY type="types"    query_id="yeast/S228C/type"/>
  <CATEGORY type="locks"    query_id="yeast/S228C/lock"/>
  [...]
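
For a client this means looking the query URL up by CATEGORY type rather
than guessing at a plural. A rough Python sketch, assuming a VERSION
wrapper element and a base URL that are illustrative only (just the
three CATEGORY lines come from the response above):

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Trimmed-down sources fragment. The VERSION wrapper and BASE are assumptions.
SOURCES = """\
<VERSION id="yeast/S228C">
  <CATEGORY type="segments" query_id="yeast/S228C/segment"/>
  <CATEGORY type="types"    query_id="yeast/S228C/type"/>
  <CATEGORY type="locks"    query_id="yeast/S228C/lock"/>
</VERSION>
"""
BASE = "http://das.biopackages.net/codesprint/sequence/"

def query_urls(version_xml, base):
    """Map each CATEGORY type to its absolute query URL."""
    root = ET.fromstring(version_xml)
    return {
        category.get("type"): urljoin(base, category.get("query_id"))
        for category in root.iter("CATEGORY")
    }

urls = query_urls(SOURCES, BASE)
print(urls["segments"])
```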

-Allen


On Mon, 6 Feb 2006, Nomi Harris wrote:

> On 6 February 2006, Nomi Harris wrote:
>  > thanks for setting up the new das/2 server, allen.  i'm having trouble
>  > with some of the queries.
> 
> ok, i realized that some of the queries i was trying were senseless, but
> here are some that should work that are just hanging:
> http://das.biopackages.net/codesprint/sequence/yeast/S228C/segments
> http://das.biopackages.net/codesprint/sequence/yeast/S228C/types
> 
>         Nomi
> 


From Steve_Chervitz at affymetrix.com  Mon Feb  6 22:27:01 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 06 Feb 2006 14:27:01 -0800
Subject: [DAS2] version=
In-Reply-To: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com>
Message-ID: <C00D0D35.1BB30%Steve_Chervitz@affymetrix.com>


Andrew Dalke wrote on Mon, 6 Feb 2006 18:42:24
> 
> If we add a version= field to the Content-Type, or whatever
> mechanism is proposed
> 
> Content-Type: application/x-das2features+xml; version=12345
> 
> What will a client do when it gets a version number it has
> never heard of?  Should it use the newest version it supports?
> The oldest?  Abort?

Rather than have version data be something that the client has to discover
in the response, and then have to react to in some intelligent way, how about
adding an optional dasversion field to all requests, such as:

http://www.wormbase.org/das/genome/volvox/1/type?dasversion=1.1

The server would then either:

1) return the appropriate response document if the server supports the
requested version or a later version that is backward compatible with it,
or
2) return a 505 error 'DAS Version Not Supported', which we already have in
the spec.

This puts the onus on the server rather than the client, but I think it
would be less trouble on the server than the alternative scheme would be for
the client. The client can now be fairly dumb about versioning and assume
everything is kosher unless it gets an error.
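
A minimal sketch of that server-side check, in Python. The SUPPORTED
table, the version strings, and the negotiate() helper are hypothetical;
only the optional dasversion parameter and the 505 response come from
the proposal itself.

```python
# Hypothetical table: each version this server implements, mapped to the
# older versions it is documented to be backward compatible with.
SUPPORTED = {
    "1.1": {"1.0"},
    "2.0": set(),  # assumed to break compatibility with the 1.x series
}

def negotiate(requested, default="2.0"):
    """Return (http_status, version_to_serve) for a dasversion value."""
    if requested is None:
        # dasversion is optional, so fall back to the server's default.
        return 200, default
    if requested in SUPPORTED:
        return 200, requested
    for version, compatible_with in SUPPORTED.items():
        if requested in compatible_with:
            # A later version that is backward compatible with the request.
            return 200, version
    return 505, None  # 'DAS Version Not Supported'

print(negotiate("1.0"))
print(negotiate("0.9"))
```

The compatibility table is exactly the information the spec revisers
would have to publish with each release.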

We could put some of the onus for DAS version support on the revisers of the
spec: When a new version of the spec is released, we'll know right then what
parts will be backward compatible and what parts will not be. The reviser
could document which previous versions the new version of the spec is
backward compatible with, at the appropriate level of granularity
(e.g., "all requests are backward compatible except for the types request").
This would serve as a guide for maintainers of das2 servers.

Thoughts?

Steve


From nomi at fruitfly.org  Mon Feb  6 23:41:23 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Mon, 6 Feb 2006 15:41:23 -0800 (PST)
Subject: [DAS2] version=
In-Reply-To: <C00D0D35.1BB30%Steve_Chervitz@affymetrix.com>
References: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com>
	<C00D0D35.1BB30%Steve_Chervitz@affymetrix.com>
Message-ID: <17383.56995.914058.889189@kinked.lbl.gov>

i think it would be nice to have it work both ways--the version is
reported by the server, but the client can also request a particular
version as you suggest.

whatever we decide on, can we please make the version IDs numerical so
that they can be compared easily (e.g. "if (dasversion > 1.3) ...")?
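
one caveat, sketched in Python and assuming dotted version strings like
the 1.1 in the example URL (parse_version is a hypothetical helper):
comparing them as floats goes wrong once a minor number reaches two
digits, so comparing tuples of integers is safer.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.10' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

# As floats, 1.10 is just 1.1, which sorts before 1.9.
print(float("1.10") > float("1.9"))

# As integer tuples, (1, 10) sorts after (1, 9), as intended.
print(parse_version("1.10") > parse_version("1.9"))
```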

     Nomi


On 6 February 2006, Steve Chervitz wrote:
 > Andrew Dalke wrote on Mon, 6 Feb 2006 18:42:24
 > > 
 > > If we add a version= field to the Content-Type, or whatever
 > > mechanism is proposed
 > > 
 > > Content-Type: application/x-das2features+xml; version=12345
 > > 
 > > What will a client do when it gets a version number it has
 > > never heard of?  Should it use the newest version it supports?
 > > The oldest?  Abort?
 > 
 > Rather than have version data be something that the client has to discover
 > in the response, and then have to react to in some intelligent way, how about
 > adding an optional dasversion field to all requests, such as:
 > 
 > http://www.wormbase.org/das/genome/volvox/1/type?dasversion=1.1
 > 
 > The server would then either:
 > 
 > 1) return the appropriate response document if the server supports the
 > requested version or a later version that is backward compatible with it,
 > or
 > 2) return a 505 error 'DAS Version Not Supported', which we already have in
 > the spec.
 > 
 > This puts the onus on the server rather than the client, but I think it
 > would be less trouble on the server than the alternative scheme would be for
 > the client. The client can now be fairly dumb about versioning and assume
 > everything is kosher unless it gets an error.
 > 
 > We could put some of the onus for DAS version support on the revisers of the
 > spec: When a new version of the spec is released, we'll know right then what
 > parts will be backward compatible and what parts will not be. The reviser
 > could document which previous versions the new version of the spec is
 > backward compatible with, at the appropriate level of granularity
 > (e.g., "all requests are backward compatible except for the types request").
 > This would serve as a guide for maintainers of das2 servers.
 > 
 > Thoughts?
 > 
 > Steve


From ed_erwin at affymetrix.com  Mon Feb  6 22:48:49 2006
From: ed_erwin at affymetrix.com (Ed Erwin)
Date: Mon, 06 Feb 2006 14:48:49 -0800
Subject: [DAS2] <CATEGORY> elements
In-Reply-To: <4157a339b02c085fd6df0f4bf1ff1cc0@dalkescientific.com>
References: <4157a339b02c085fd6df0f4bf1ff1cc0@dalkescientific.com>
Message-ID: <43E7D251.8050703@affymetrix.com>

Andrew Dalke wrote:
> One discussion point from today is the <CATEGORY> elements.
> 
> The current draft of the spec says they look like this
> 
>       <CATEGORY type="segments" query_id="volvox/1/segments">
>           <FORMAT name="fasta" mimetype="text/x-fasta" />
>           <FORMAT name="raw" mimetype="text/plain" />
>       </CATEGORY>
>       <CATEGORY type="types" query_id="volvox/1/types">
>           <FORMAT name="das2xml" mimetype="text/x-das-featuretype+xml" />
>       </CATEGORY>
>       <CATEGORY type="features" query_id="volvox/1/features">
>           <FORMAT name="das2xml" mimetype="text/x-das-type+xml" />
>       </CATEGORY>
>       <CATEGORY type="locks" query_id="volvox/1/locks" />
> 
> 
> Andreas Prlic pointed out that since the document says
> the "volvox" version "1" url is already known ("volvox/1")
> and the type="segments" then the query_id can be built
> from appending "segments" to the "volvox/1" (plus the "/")
> to get "volvox/1/segments".
> 
> I originally responded from a ReST purity argument, in that
> URLs should not be constructed from non-URL data.  This
> lets Thomas, for example, use GUIDs for the objects rather
> than the hierarchical structure I and others recommend.
> 
> During discussion a better answer came up, which I think
> we talked about earlier but which is worth emphasizing
> is that the "query_id"s don't need to be on the same server.
> 
> For example, the "regions" URL may and likely will point
> to a common reference server, and a database may offer
> only one set of "types" for all of the "features".
> 
> That is, something like this
> 
>   DAS server example.com
>      genome A
>         version x
>           segments at "ensembl.org/das2/genome_A/build_1/segments"
>           features at "example.com/A/version_x/features"
>           types at "example.com/A/types"


None of your examples vary the words "segments", "types" or "features", 
but it is legal to do so, right?:

            segments at "ensembl.org/das2/genome_A/build_1/segment"
            features at "example.com/A/version_x/things/and/more/things"
            types at "example.com/A/rhinoceros"

OK, so no one is likely to go that far, but is it legal for example to 
use non-plural "segment", "feature" and "type" ?


From ed_erwin at affymetrix.com  Mon Feb  6 22:51:11 2006
From: ed_erwin at affymetrix.com (Ed Erwin)
Date: Mon, 06 Feb 2006 14:51:11 -0800
Subject: [DAS2] version=
In-Reply-To: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com>
References: <2c35bd2e8140fb8773547f2409d1db8d@dalkescientific.com>
Message-ID: <43E7D2DF.7060507@affymetrix.com>

Andrew Dalke wrote:
> If we add a version= field to the Content-Type, or whatever
> mechanism is proposed
> 
> Content-Type: application/x-das2features+xml; version=12345
> 
> What will a client do when it gets a version number it has
> never heard of?  Should it use the newest version it supports?
> The oldest?  Abort?
> 

It is up to the client to decide what to do, and this does not need to 
be specified here.
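
For illustration, a Python sketch of one policy a client could pick.
KNOWN_VERSIONS and the parse-or-abort choice are assumptions, not
anything the spec requires; the header value is the one from Andrew's
example.

```python
from email.message import Message

KNOWN_VERSIONS = {"1", "2"}  # hypothetical versions this client understands

def das_version(content_type):
    """Split a Content-Type value into (mime type, version parameter)."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_type(), msg.get_param("version")

def client_policy(content_type):
    """One possible policy: parse known or unversioned replies, else abort."""
    _mime, version = das_version(content_type)
    if version is None or version in KNOWN_VERSIONS:
        return "parse"
    return "abort"

header = "application/x-das2features+xml; version=12345"
print(das_version(header))
print(client_policy(header))
```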


From Gregg_Helt at affymetrix.com  Mon Feb  6 23:16:35 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 15:16:35 -0800
Subject: [DAS2] RE: New DAS/2 server for codesprint
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9A9@msex02.affymetrix.com>

Ack, you're right!  I didn't expect to get bitten by rogue query_ids so
soon...

	gregg

> -----Original Message-----
> From: Nomi Harris [mailto:nomi at fruitfly.org]
> Sent: Monday, February 06, 2006 3:48 PM
> To: Allen Day
> Cc: Helt,Gregg
> Subject: Re: New DAS/2 server for codesprint
> 
> On 6 February 2006, Allen Day wrote:
>  > Hi Nomi,
>  >
>  > I just restarted the server, the "all features" request used all the
>  > memory and hung the webserver.  I'll look into why that request wasn't
>  > immediately denied as it used to be.
>  >
>  > As for your .../segments and .../types, they should be .../segment and
>  > .../type.  I see no reason to pluralize these URLs given that the sources
>  > response allows me to provide them at any arbitrary URL:
> 
> oops, gregg led me astray with that one.  right, /segment and /type
> work.  sorry for hanging your server with my inadvertent "all features"
> request.
>         Nomi


From Gregg_Helt at affymetrix.com  Tue Feb  7 00:14:55 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 16:14:55 -0800
Subject: [DAS2] Re: New DAS/2 server for codesprint
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9AA@msex02.affymetrix.com>


Allen, can you recommend a reasonable region on yeast to do a features
query that will return features with some hierarchy (like
transcript/exons)?

	Thanks,
	Gregg


From Gregg_Helt at affymetrix.com  Tue Feb  7 00:29:12 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 16:29:12 -0800
Subject: [DAS2] Re: New DAS/2 server for codesprint
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9AB@msex02.affymetrix.com>

Actually, that "arbitrary URL" thing doesn't quite work with the current
biopackages server, which has an xml:base pointing to a server at UCLA
for the response to the sequence query:
http://das.biopackages.net/codesprint/sequence

<SOURCES
      xmlns:xlink="http://www.w3.org/1999/xlink"
      xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/">
  <SOURCE id="human" title="human genome" writeable="no" doc_href=""
taxon="Human">
      <VERSION id="human/17" title="Hsa" created="" modified="">
...
        <CATEGORY type="segments" query_id="human/17/segment"/>
      </VERSION>
...
  </SOURCE>
...
</SOURCES>

Which means (I think) that the segments query resolves to
http://radius.genomics.ctrl.ucla.edu/das/sequence/human/17/segment

which for me returns a 404 Not Found response.
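
The resolution can be checked with Python's standard URL joining. The
xml:base and query_id values are the ones from the response above; the
second base is hypothetical and only shows what the same query_id would
resolve to if xml:base pointed at the codesprint server instead.

```python
from urllib.parse import urljoin

# Values taken from the SOURCES response quoted above.
xml_base = "http://radius.genomics.ctrl.ucla.edu/das/sequence/"
query_id = "human/17/segment"
segments_url = urljoin(xml_base, query_id)
print(segments_url)

# Hypothetical alternative base, to show the effect of changing xml:base.
codesprint_base = "http://das.biopackages.net/codesprint/sequence/"
codesprint_url = urljoin(codesprint_base, query_id)
print(codesprint_url)
```

The trailing slash on the base matters: without it, the last path
component ("sequence") is replaced rather than extended.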

	gregg

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Allen Day
> Sent: Monday, February 06, 2006 2:01 PM
> To: Nomi Harris
> Cc: DAS/2
> Subject: [DAS2] Re: New DAS/2 server for codesprint
...
> As for your .../segments and .../types, they should be .../segment and
> .../type.  I see no reason to pluralize these URLs given that the sources
> response allows me to provide them at any arbitrary URL:
> 
>   [...]
>   <CATEGORY type="segments" query_id="yeast/S228C/segment"/>
>   <CATEGORY type="types"    query_id="yeast/S228C/type"/>
>   <CATEGORY type="locks"    query_id="yeast/S228C/lock"/>
>   [...]
> 
> -Allen
> 
> 
> 
> On Mon, 6 Feb 2006, Nomi Harris wrote:
> 
> > On 6 February 2006, Nomi Harris wrote:
> >  > thanks for setting up the new das/2 server, allen.  i'm having trouble
> >  > with some of the queries.
> >
> > ok, i realized that some of the queries i was trying were senseless, but
> > here are some that should work that are just hanging:
> > http://das.biopackages.net/codesprint/sequence/yeast/S228C/segments
> > http://das.biopackages.net/codesprint/sequence/yeast/S228C/types
> >
> >         Nomi
> >
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2


From Steve_Chervitz at affymetrix.com  Tue Feb  7 01:02:30 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Mon, 06 Feb 2006 17:02:30 -0800
Subject: [DAS2] Re: New DAS/2 server for codesprint
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9AA@msex02.affymetrix.com>
Message-ID: <C00D31A6.1BB4C%Steve_Chervitz@affymetrix.com>


There's a gene (RPL7A) with two introns on chr7 at roughly 366kbp - 364kbp:
http://db.yeastgenome.org/cgi-bin/locus.pl?locus=YGL076C

Most genes with introns in cerevisiae (which aren't many) have just a single
intron that creates a small 5' exon, such as the alpha and beta tubulin
genes on chr13. Tub1 is on chr13 at 99Kbp, and tub3 is also on chr13 at
23Kbp. So the first 100Kb of chr13 would be another region to try.
http://db.yeastgenome.org/cgi-bin/locus.pl?locus=tub1

Steve


> From: "Helt,Gregg" <Gregg_Helt at affymetrix.com>
> Date: Mon, 6 Feb 2006 16:14:55 -0800
> To: Allen Day <allenday at ucla.edu>
> Cc: DAS/2 <das2 at portal.open-bio.org>
> Conversation: [DAS2] Re: New DAS/2 server for codesprint
> Subject: RE: [DAS2] Re: New DAS/2 server for codesprint
> 
> 
> Allen, can you recommend a reasonable region on yeast to do a features
> query that will return features with some hierarchy (like
> transcript/exons)?
> 
> Thanks,
> Gregg
> 
> 


From Gregg_Helt at affymetrix.com  Tue Feb  7 02:42:18 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 18:42:18 -0800
Subject: [DAS2] Modifying com.affymetrix.igb.das2 classes
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9AE@msex02.affymetrix.com>

Brian and Marc, 
 
I'm about to start seriously modifying the IGB DAS/2 classes in the
com.affymetrix.igb.das2 package.  There's code in there you wrote to
work with materials, assays, results, and ontology.  I think we
discussed at some point splitting this stuff out into a separate
package(s).  Which sounds good, especially since (as I understand it),
these domains are separate from the DAS/2 "sequence" domain.  The only
place there's a lot of mixture of code for these domains with the
sequence parts is in Das2VersionedSource.  Is it okay if I move this out
(or comment it out) of Das2VersionedSource while I renovate other parts
of the class?
 
            thanks,
            Gregg
 

From Gregg_Helt at affymetrix.com  Tue Feb  7 03:34:48 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 19:34:48 -0800
Subject: [DAS2] RE: Modifying com.affymetrix.igb.das2 classes
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9B0@msex02.affymetrix.com>


You're right, it looks like some of this code was already getting moved
over to the das2.assay and das2.ontology packages as subclasses of
Das2VersionedSource.  

However it's not clear to me if the equivalent of source and versioned
source for assay, ontology, and other domains are going to be similar
enough to the DAS/2 sequence domain to justify sharing a base
class/interface.  What do/will they share?

I'll go ahead with changes to the das2 package, and look into moving
much of this code into a das2.sequence package.

	Thanks,
	Gregg

> -----Original Message-----
> From: Brian O'Connor [mailto:boconnor at ucla.edu]
> Sent: Monday, February 06, 2006 7:09 PM
> To: Helt,Gregg
> Cc: Marc Carlson; Allen Day; DAS/2
> Subject: Re: Modifying com.affymetrix.igb.das2 classes
> 
> Hi Gregg,
> 
> Go for it!! Marc and I can take a look at it again when you're happy
> with the changes. The versioned source object really needed an overhaul
> anyway to deal with the multiple domains of the DAS/2 server. I think
> there should be a VersionedSource parent and then children for each
> domain (i.e. VersionedSourceAssay). I think Marc started to do this but
> he was afraid to alter the VersionedSource object too much for fear of
> breaking the IGB client.
> 
> --Brian
> 
> Helt,Gregg wrote:
> 
> > Brian and Marc,
> >
> > I'm about to start seriously modifying the IGB DAS/2 classes in the
> > com.affymetrix.igb.das2 package. There's code in there you wrote to
> > work with materials, assays, results, and ontology. I think we
> > discussed at some point splitting this stuff out into a separate
> > package(s). Which sounds good, especially since (as I understand it),
> > these domains are separate from the DAS/2 "sequence" domain. The only
> > place there's a lot of mixture of code for these domains with the
> > sequence parts is in Das2VersionedSource. Is it okay if I move this
> > out (or comment it out) of Das2VersionedSource while I renovate other
> > parts of the class?
> >
> > thanks,
> >
> > Gregg
> >


From boconnor at ucla.edu  Tue Feb  7 03:09:22 2006
From: boconnor at ucla.edu (Brian O'Connor)
Date: Mon, 06 Feb 2006 19:09:22 -0800
Subject: [DAS2] Re: Modifying com.affymetrix.igb.das2 classes
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9AE@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9AE@msex02.affymetrix.com>
Message-ID: <43E80F62.4050403@ucla.edu>

Hi Gregg,

Go for it!! Marc and I can take a look at it again when you're happy 
with the changes. The versioned source object really needed an overhaul 
anyway to deal with the multiple domains of the DAS/2 server. I think 
there should be a VersionedSource parent and then children for each 
domain (i.e. VersionedSourceAssay). I think Marc started to do this but 
he was afraid to alter the VersionedSource object too much for fear of 
breaking the IGB client.

--Brian

Helt,Gregg wrote:

> Brian and Marc,
>
> I'm about to start seriously modifying the IGB DAS/2 classes in the 
> com.affymetrix.igb.das2 package. There's code in there you wrote to 
> work with materials, assays, results, and ontology. I think we 
> discussed at some point splitting this stuff out into a separate 
> package(s). Which sounds good, especially since (as I understand it), 
> these domains are separate from the DAS/2 "sequence" domain. The only 
> place there's a lot of mixture of code for these domains with the 
> sequence parts is in Das2VersionedSource. Is it okay if I move this 
> out (or comment it out) of Das2VersionedSource while I renovate other 
> parts of the class?
>
> thanks,
>
> Gregg
>


From Gregg_Helt at affymetrix.com  Tue Feb  7 05:43:07 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 21:43:07 -0800
Subject: [DAS2] RE: Modifying com.affymetrix.igb.das2 classes
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9B1@msex02.affymetrix.com>


Okay, I just split the code that was in Das2VersionedSource.  Now
regions and types (w/o ontology) are handled in Das2VersionedSource, and
ontology, materials, results, and assays are handled by a subclass,
Das2VersionedSourcePlus.  I might do some further refactoring at a later
date, but for right now this works (and compiles/runs).

I also went ahead and committed almost all my DAS/2 code changes to the
genoviz repository.

	Gregg

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Helt,Gregg
> Sent: Monday, February 06, 2006 7:35 PM
> To: Brian O'Connor
> Cc: DAS/2; Marc Carlson
> Subject: [DAS2] RE: Modifying com.affymetrix.igb.das2 classes
> 
> 
> You're right, it looks like some of this code was already getting moved
> over to the das2.assay and das2.ontology packages as subclasses of
> Das2VersionedSource.
> 
> However it's not clear to me if the equivalent of source and versioned
> source for assay, ontology, and other domains are going to be similar
> enough to the DAS/2 sequence domain to justify sharing a base
> class/interface.  What do/will they share?
> 
> I'll go ahead with changes to the das2 package, and look into moving
> much of this code into a das2.sequence package.
> 
> 	Thanks,
> 	Gregg
> 
> > -----Original Message-----
> > From: Brian O'Connor [mailto:boconnor at ucla.edu]
> > Sent: Monday, February 06, 2006 7:09 PM
> > To: Helt,Gregg
> > Cc: Marc Carlson; Allen Day; DAS/2
> > Subject: Re: Modifying com.affymetrix.igb.das2 classes
> >
> > Hi Gregg,
> >
> > Go for it!! Marc and I can take a look at it again when you're happy
> > with the changes. The versioned source object really needed an overhaul
> > anyway to deal with the multiple domains of the DAS/2 server. I think
> > there should be a VersionedSource parent and then children for each
> > domain (i.e. VersionedSourceAssay). I think Marc started to do this but
> > he was afraid to alter the VersionedSource object too much for fear of
> > breaking the IGB client.
> >
> > --Brian
> >
> > Helt,Gregg wrote:
> >
> > > Brian and Marc,
> > >
> > > I'm about to start seriously modifying the IGB DAS/2 classes in the
> > > com.affymetrix.igb.das2 package. There's code in there you wrote to
> > > work with materials, assays, results, and ontology. I think we
> > > discussed at some point splitting this stuff out into a separate
> > > package(s). Which sounds good, especially since (as I understand it),
> > > these domains are separate from the DAS/2 "sequence" domain. The only
> > > place there's a lot of mixture of code for these domains with the
> > > sequence parts is in Das2VersionedSource. Is it okay if I move this
> > > out (or comment it out) of Das2VersionedSource while I renovate other
> > > parts of the class?
> > >
> > > thanks,
> > >
> > > Gregg
> > >
> 
> 
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2


From Gregg_Helt at affymetrix.com  Tue Feb  7 05:46:37 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 6 Feb 2006 21:46:37 -0800
Subject: [DAS2] Agenda for DAS/2 Code Sprint Teleconference 2005-02-06
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9B2@msex02.affymetrix.com>

Will you be able to join the teleconference tomorrow (Tuesday)?  Suzi
is planning to join in, and I'm hoping we can spend some time discussing
ontologies.

	Thanks
	Gregg

P.S.  
   9 AM Pacific time
   800-531-3250
   id: 2879055	

> -----Original Message-----
> From: Lincoln Stein [mailto:lstein at cshl.edu]
> Sent: Monday, February 06, 2006 10:20 AM
> To: das2 at portal.open-bio.org
> Cc: Helt,Gregg
> Subject: Re: [DAS2] Agenda for DAS/2 Code Sprint Teleconference 2005-02-06
> 
> Hi Gregg,
> 
> I had a conflicting teleconference and wasn't sure whether there was a
> teleconference scheduled for the code sprint, so I didn't dial in. Just got
> the agenda now.
> 
> I am online on both MSN and AOL chats, and will be all week, if anyone wants
> to IM me.
> 
> Lincoln
> 
> On Monday 06 February 2006 12:13, Helt,Gregg wrote:
> > Status report
> > DAS/2 XML - valid or not valid?
> > CATEGORY elements -- constructing query URLs
> > MAINTAINER information
> > Use of xml:base
> > update on feature properties - searching, etc.
> >
> >
> >
> >
> >
> 
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From dalke at dalkescientific.com  Tue Feb  7 09:22:56 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 09:22:56 +0000
Subject: [DAS2] <CATEGORY> elements
In-Reply-To: <43E7D251.8050703@affymetrix.com>
References: <4157a339b02c085fd6df0f4bf1ff1cc0@dalkescientific.com>
	<43E7D251.8050703@affymetrix.com>
Message-ID: <8daf0ba1e5744f8e0b99fc644fb5dd38@dalkescientific.com>

Ed Erwin wrote:
> None of your examples vary the words "segments", "types" or 
> "features", but it is legal to do so, right?:
>
>            segments at "ensembl.org/das2/genome_A/build_1/segment"
>            features at "example.com/A/version_x/things/and/more/things"
>            types at "example.com/A/rhinoceros"
>
> OK, so no one is likely to go that far, but is it legal for example to 
> use non-plural "segment", "feature" and "type" ?

Yes.  My goal is two-fold.  First, make no assertions on the internal
organization of the DAS server.  Machines can change, directories
can move around.

The specific advantages are:
   - annotation servers can all point to the same "segments" server
   - multiple versions of the same genomic source and on the same
       machine can reuse the same "types" server

Another thought, perhaps too old-fashioned for modern web development,
is that the query URLs are cgi scripts in a "cgi-bin" directory
while the data files are flat-files in some other directory.

Similarly, the query URL, if it is a CGI script, might end with a ".cgi"
or ".pl" extension.

My second goal is to develop a recommended layout.
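
A minimal sketch of what this means for a client, assuming the
CATEGORY elements carry "type" and "query_id" attributes as in the
draft XML quoted elsewhere in this thread.  The VERSION fragment, the
base URL and the helper name category_urls are hypothetical; the point
is only that the client looks up each URL instead of building it from
a fixed "segments"/"types"/"features" layout.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical VERSION fragment; attribute names follow the draft in this thread.
VERSION_XML = """
<VERSION id="genome_A/build_1">
  <CATEGORY type="segments" query_id="http://ensembl.org/das2/genome_A/build_1/segment"/>
  <CATEGORY type="features" query_id="http://example.com/A/version_x/things/and/more/things"/>
  <CATEGORY type="types" query_id="rhinoceros"/>
</VERSION>
"""

def category_urls(version_xml, base_url):
    """Map each CATEGORY type to the absolute URL the server advertises."""
    root = ET.fromstring(version_xml)
    return {
        cat.get("type"): urljoin(base_url, cat.get("query_id"))
        for cat in root.iter("CATEGORY")
    }

# Relative ids are resolved against the (assumed) base; absolute ones pass through.
urls = category_urls(VERSION_XML, "http://example.com/A/")
for kind, url in urls.items():
    print(kind, url)
```

Because nothing in the lookup depends on the path, the segments URL can
live on a different machine from the features URL.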


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Tue Feb  7 09:32:11 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 09:32:11 +0000
Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint,
	6 Feb 2006
In-Reply-To: <C00CE876.1BB01%Steve_Chervitz@affymetrix.com>
References: <C00CE876.1BB01%Steve_Chervitz@affymetrix.com>
Message-ID: <97f6d51a2e54031ed49fe7997af383eb@dalkescientific.com>

> gh: would like a re-cast as xml document, hosted at so/sofa
> website. that xml would be like a std ontology representation so you
> could extend it. so someone could point to an extension of it.

I asked as an action item if Gregg would look into the solution
for this.  Do we refer to the ontology by a "GO:0123456" identifier
or by some URL scheme?  If so, what's the mapping from URL scheme
to something that clients and people can understand, eg, to
ask for everything which is an exon?

Does this mapping need a version number - does it change over time?

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Tue Feb  7 10:38:28 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 10:38:28 +0000
Subject: [DAS2] per-database MAINTAINER
Message-ID: <294a2caeb29a823dd93fa1155012c8cb@dalkescientific.com>

Based on Andreas Prlic's work with the DAS2 registry I've
added a new MAINTAINER element to the SOURCE/VERSION part
of the SOURCES document.

I've updated das/das2/scratch/sources4.xml to have an
example.  It looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<SOURCES
     xmlns="http://www.biodas.org/ns/das/genome/2.00"
     xml:base="http://dev.wormbase.org/das/genome/">

   <MAINTAINER email="someone at EBI" />

   <SOURCE id="volvox" title="Mr. Volvox" taxid="3066"
           xml:base="/DAS2/GENOME/">

     <VERSION id="volvox/b1" title="Build 1, October 2002"
            created="2002-10-15" modified="2002-10-25T09:56:23">

       <MAINTAINER name="Fred, down the hall" />
     </VERSION>
   </SOURCE>
</SOURCES>


The idea is that the database maintainer can be different
than the server maintainer.

In addition, if the SOURCES/SOURCE/VERSION/MAINTAINER
is not present then clients may assume that the database
maintainer is the same as the SOURCES/MAINTAINER.

The maintainer elements are both optional.
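
A rough sketch of that fallback on the client side, assuming the
namespace and element names from the example above.  The second
VERSION ("volvox/b2") and the function name maintainer_for are made up
for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"das": "http://www.biodas.org/ns/das/genome/2.00"}

# Trimmed copy of the example above, plus a hypothetical VERSION with no MAINTAINER.
SOURCES_XML = """
<SOURCES xmlns="http://www.biodas.org/ns/das/genome/2.00">
  <MAINTAINER email="someone at EBI" />
  <SOURCE id="volvox" title="Mr. Volvox" taxid="3066">
    <VERSION id="volvox/b1" title="Build 1, October 2002">
      <MAINTAINER name="Fred, down the hall" />
    </VERSION>
    <VERSION id="volvox/b2" title="Hypothetical build 2" />
  </SOURCE>
</SOURCES>
"""

def maintainer_for(root, version_id):
    """Return the attributes of the maintainer that applies to a VERSION."""
    server = root.find("das:MAINTAINER", NS)  # SOURCES/MAINTAINER, may be absent
    for version in root.iterfind("das:SOURCE/das:VERSION", NS):
        if version.get("id") == version_id:
            own = version.find("das:MAINTAINER", NS)
            chosen = own if own is not None else server
            # Both elements are optional, so there may be no maintainer at all.
            return dict(chosen.attrib) if chosen is not None else None
    raise KeyError(version_id)

root = ET.fromstring(SOURCES_XML)
print(maintainer_for(root, "volvox/b1"))
print(maintainer_for(root, "volvox/b2"))
```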

					Andrew
					dalke at dalkescientific.com


From allenday at ucla.edu  Tue Feb  7 10:52:12 2006
From: allenday at ucla.edu (Allen Day)
Date: Tue, 7 Feb 2006 02:52:12 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
Message-ID: <Pine.LNX.4.58.0602070245550.15849@sumo.ctrl.ucla.edu>

The XML is now as you requested, please confirm.

After some thought today I realized the new SOURCES response is fully
compatible with the existing server.  The doc at:

http://das.biopackages.net/codesprint/sequence

is now simply a static XML doc that points into the stable server (plus
the new "segments" response) implementation at:

http://das.biopackages.net/das/genome

The headers for the static document don't include the correct Content-Type
"application/x-das-blah ; version = XxX", it's simply "text/xml".  I'll
add the headers in the morning GMT+8.

There are probably also some other Content-Type headers that need to be
changed for the other responses -- let me know if you spot them.

-Allen


On Mon, 6 Feb 2006, Andrew Dalke wrote:

> Allen:
> > After looking closely over this first draft of new_spec.txt, it's 
> > apparent
> > that there are still some holes, e.g. what should the response to the
> > following requests look like?
> >
> > http://das.biopackages.net/codesprint/sequence/yeast
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <SOURCES
>        xmlns:xlink="http://www.w3.org/1999/xlink"
>        xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/">
> taxon="Yeast">
>        <VERSION id="yeast/S228C" title="Sce" created="" modified="">
> 
>        <COORDINATES taxid="" source="" authority="">
>          <VERSION name=""/>
>        </COORDINATES>
> 
>        <ASSEMBLY>
>          <LINK href="" priority=""/>
>        </ASSEMBLY>
> 
>        <PROP key="" value=""/>
> 
>        <CATEGORY type="features" query_id="yeast/S228C/feature">
>          <!-- list non-das2xml templates here -->
>        </CATEGORY>
>        <CATEGORY type="segments" query_id="yeast/S228C/segment"/>
>        <CATEGORY type="types"    query_id="yeast/S228C/type"/>
>        <CATEGORY type="locks"    query_id="yeast/S228C/lock"/>
> 
>      </VERSION>
> 
>    </SOURCE>
> </SOURCES>
> 
> 
> > http://das.biopackages.net/codesprint/sequence/yeast/S228C
> 
> The same for this case.  There is only one VERSION for "yeast".
> 
> 
> Your XML, btw, starts
> 
> <?xml version="1.0" standalone="no"?>
> <?xml-stylesheet type="text/xsl" href="/xsl/das.xsl"?>
> <!DOCTYPE DAS2DSN SYSTEM "http://www.biodas.org/dtd/das2dsn.dtd">
> <!-- this doesn't work and screws up the xsl     
> xmlns="http://www.biodas.org/ns/das/genome/2.00" -->
> <SOURCES
>        xmlns:xlink="http://www.w3.org/1999/xlink"
>        xml:base="http://radius.genomics.ctrl.ucla.edu/das/sequence/">
> 
> The standalone="no" declaration means that the DTD may affect the
> content of the document.
>    http://www.stylusstudio.com/w3c/xml11/sec-rmd.htm
> 
> > Markup declarations can affect the content of the document, as passed 
> > from an XML Processor to an application; examples are attribute 
> > defaults and entity declarations. The standalone document declaration, 
> > which MAY appear as a component of the XML declaration, signals 
> > whether or not there are such declarations which appear external to 
> > the Document Entity or in parameter entities. An external markup 
> > declaration is defined as a markup declaration occurring in the 
> > external subset or in a parameter entity (external or internal, the 
> > latter being included because non-validating processors are not 
> > required to read them).
> 
> For what we're doing, we don't need nor (I think) want that.  There's
> no reason for a client to consult the DTD to figure out the XML.
> 
> Instead, use
> 
> <?xml version="1.0"?>
> 
> and probably have the encoding
> 
> <?xml version="1.0" encoding="UTF-8"?>
> 
> That also means you can get rid of the
> 
> <!DOCTYPE DAS2DSN SYSTEM "http://www.biodas.org/dtd/das2dsn.dtd">
> 
> statements.
> 
> 					Andrew
> 					dalke at dalkescientific.com
> 


From dalke at dalkescientific.com  Tue Feb  7 12:19:28 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 12:19:28 +0000
Subject: [DAS2] properties and queries
Message-ID: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com>

We've had a long discussion here about properties and how to
search them.  As it stands now the spec has a few holes in it.

Here are the properties we've talked about.

program_name: the program used to make the annotation, like
   "BLASTX 1.2.3"

notes:
   There can be 0 or more notes.  Notes might refer to other
   notes (eg, "the previous note said XYZ but I think ABC")

phase: (is it 0, 1, 2 or 1, 2, 3?)
   (And does anyone use this? People here don't use it; Thomas
    "reinfers it by counting along the transcript" "but maybe
    that's just me".  Others say they don't use the DAS1 phase.)

icon: a hypothetical image used for the feature, perhaps as
    a binary png;

curation history:
   a list of elements, each with
    - person
    - timestamp
    - reason for change

score: a floating point number, which may be in exponential
    notation like "1E-3"

Each one needs different search mechanisms.  For example,
   "annotations done by that buggy version of BLAST 1.2.3"
   "scores better than 1E-2"
   "changes by Andrew done in August 2004"
   "notes with the substring 'helicase'" (case sensitive or not?)
   "notes with the phrase 'E. Coli'" (substring might not work
       if the note has 'E.\nColi')

The property storage scheme doesn't handle this quite correctly.
Here are problems:

   - how do you store multiple notes?

Answer 1: use structured names, like "note_1", "note_2", "note_3", ..
HACK! Then what if a note is deleted?  Bigger problem; how do you
search the "note" field using the existing query language?

Answer 2: allow duplicate note elements, like
   <prop key="note" value="This is a note" />
   <prop key="note" value="The previous note is a lie!" />
   <prop key="note" value="Ignore the 2nd note - silly Cretan!" />

Question: so the order must be preserved if two fields have the
same name?  Can't implement with a dictionary/hash data type.

Question: what if there are duplicate "score" or "phase" elements?
Which one wins?

Answer 3: Notes are important and we know we need them now.
Let's have a <NOTE> element and not make it be a property.

<NOTE>This is a note</NOTE>
<NOTE>The previous note is a lie!</NOTE>
<NOTE>Is this an E or a NOT-E?</NOTE>

(perhaps also with timestamp and author name, but that's a different
question.)  Then we also define that the "note=" parameter in a
DAS query is a substring search of the <NOTE> elements of a feature.

I like this one.
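
A sketch of how such a "note=" match might behave.  The matching rules
are not settled in this thread, so two choices here are assumptions:
whitespace (including line breaks) is collapsed before comparing, and
case-insensitivity is an option.  The FEATURE wrapper and the function
name note_matches are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical feature with two notes; the second has a line break inside "E. Coli".
FEATURE_XML = """
<FEATURE id="hypothetical/feature/1">
  <NOTE>This is a note</NOTE>
  <NOTE>Similar to a helicase from E.
Coli</NOTE>
</FEATURE>
"""

def note_matches(feature, query, ignore_case=True):
    """True if any NOTE of the feature contains the query as a substring."""
    needle = " ".join(query.split())  # collapse runs of whitespace
    if ignore_case:
        needle = needle.lower()
    for note in feature.iter("NOTE"):
        text = " ".join((note.text or "").split())
        if ignore_case:
            text = text.lower()
        if needle in text:
            return True
    return False

feature = ET.fromstring(FEATURE_XML)
print(note_matches(feature, "E. Coli"))
print(note_matches(feature, "HELICASE", ignore_case=False))
```

Collapsing whitespace is what lets the 'E.\nColi' case from the list
above match a plain "E. Coli" query.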


   - How do you do numeric searches?

This is hypothetical.  There hasn't been a requirement for this.
'Course it may be because people haven't had the ability.  In
any case, how to search numeric fields like "score" with comparisons?


  - querying non-queryable fields

If there's embedded binary data, like an image, is it searchable?
Does a server complain and die? Ignore the request?

  - more complex text searches

"proteinase but not inhibitor"

  - complex data

We have support for non-DAS extensions, which might be

<sanger:curation-history xmlns:sanger="http://www.sanger.ac.uk/das/ext">
  <sanger:curation name="Andrew" date="2005-06-07">
    Change the this into that because of some reason or other
  </sanger:curation>
</sanger:curation-history>


Thomas proposed that we support some sort of complex query
language, probably in XML, and get rid of the simple query scheme
we have now.

I argued against the complexity of that given that nearly all
of the queries will be "give me these feature types on this range
of that chromosome".  I also pointed out that developing a
generic query language is hard, and implementing it is harder.
Why require all that effort?

Roy commented the other way - in a server with only a few hundred
features, why require a query language at all?  Just return all
of the features in the request.

Here's what I proposed.

We have the "CATEGORY" (but after discussion I now want to take
it back to "CAPABILITY" since that's now much closer to what
it does - it describes where to go to do something)

So I'll use "CAPABILITY"

The current scheme has

<CAPABILITY type="features" query_url="http://...../features">
   <FORMAT ... />
</CAPABILITY>

This is an extensibility point.  Suppose Thomas has an XML
query search interface supported on his server, with Sanger
clients that handle it.  Then there can be

<CAPABILITY type="thomas-xml-search"
            query_url="http.../search-features">
   <FORMAT ... />
</CAPABILITY>

A client can see the list of CAPABILITIES and decide to
use the feature search mechanism it likes best.

In addition, we could say that "this supports the normal DAS
query scheme but also supports extension vocabulary."  For example,

<CAPABILITY type="features" query_url="http://...../features">
   <SUPPORTS name="sanger-curation" />
   <FORMAT ... />
</CAPABILITY>

With this a client knows that the query_url supports the normal
DAS queries and also supports the "annotator", "annotation_before"
and "annotation_after" queries, like this

   .../features?annotator=Andrew;annotation_before=2005

Possible idea: if there is no SUPPORTS tag then the server
implements no search syntax and instead returns everything,
for the example Roy mentioned.

Okay, we're off to lunch.

					Andrew
					dalke at dalkescientific.com


From ap3 at sanger.ac.uk  Tue Feb  7 12:21:53 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Tue, 7 Feb 2006 12:21:53 +0000
Subject: [DAS2] das-regstry sources response
Message-ID: <09c1c46ca239256496b81b2c67638bd5@sanger.ac.uk>

Hi!

I added a DAS2- sources response to a copy of the das registry running 
on my laptop.
the attached file shows how the das1 sources are described using the 
das2 spec.
- it fits together rather well.

I did not know what to put under the <ASSEMBLY>. The <COORDINATES> 
already contain all required info.
Therefore I propose to drop <ASSEMBLY>

Andreas


-------------- next part --------------
A non-text attachment was scrubbed...
Name: sources_response.xml
Type: application/octet-stream
Size: 32318 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/das2/attachments/20060207/922c594e/attachment-0001.obj>
-------------- next part --------------


-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
			 +44 (0) 1223 49 6891

From dalke at dalkescientific.com  Tue Feb  7 13:20:35 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 13:20:35 +0000
Subject: [DAS2] das-regstry sources response
In-Reply-To: <09c1c46ca239256496b81b2c67638bd5@sanger.ac.uk>
References: <09c1c46ca239256496b81b2c67638bd5@sanger.ac.uk>
Message-ID: <e906fc30e5d1c60aa2abcec7f6a4db56@dalkescientific.com>

Andreas:
> I did not know what to put under the <ASSEMBLY>. The <COORDINATES> 
> already contain all required info.
> Therefore I propose to drop <ASSEMBLY>

Removed and committed to CVS.

					Andrew
					dalke at dalkescientific.com


From Gregg_Helt at affymetrix.com  Tue Feb  7 15:34:21 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Tue, 7 Feb 2006 07:34:21 -0800
Subject: [DAS2] Ontologies in DAS/2
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>

I talked to Suzi, she's planning to join our teleconference today to
discuss ontologies, wearing her hat as co-PI of the National Center for
Biomedical Ontology.  Hopefully Lincoln can join too.

I took a closer look at the DAS/2 ontology work Allen has done (see
http://biodas.org/documents/das2/das2_ontology.html).  I urge anyone who
wants to contribute to the ontology discussion to read this doc.  It
specifies a way to retrieve ontologies in OBOXML format.  In this format
each ontology term gets an absolute URI through the same mechanism that
the rest of DAS/2 uses (URIs for ids, which can be either absolute or
relative but resolvable).  As Allen pointed out yesterday this would
solve our problem of how to uniquely specify ontology terms in the DAS/2
TYPES XML.

I couldn't find any documentation for the OBOXML format, other than the
code that generates it from OBO files.  But I'm using OBOXML as an
example here because it clearly has resolvable URIs for each ontology
term.  In Allen's spec, ontologies can also be returned in other
formats, but it's unclear to me whether terms in these other formats
would resolve to similar URIs.

	gregg

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke
> Sent: Tuesday, February 07, 2006 1:32 AM
> To: DAS/2
> Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code
> sprint, 6 Feb 2006
> 
> > gh: would like a re-cast as xml document, hosted at so/sofa
> > website. that xml would be like a std ontology representation so you
> > could extend it. so someone could point to an extension of it.
> 
> I asked as an action item if Gregg would look into the solution
> for this.  Do we refer to the ontology by a "GO:0123456" identifier
> or by some URL scheme?  If so, what's the mapping from URL scheme
> to something that clients and people can understand, eg, to
> ask for everything which is an exon?
> 
> Does this mapping need a version number - does it change over time?
> 
> 					Andrew
> 					dalke at dalkescientific.com
> 


From dalke at dalkescientific.com  Tue Feb  7 15:45:00 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 15:45:00 +0000
Subject: [DAS2] properties and queries
In-Reply-To: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com>
References: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com>
Message-ID: <16111cd36850795dfd46696a63fb1057@dalkescientific.com>

To summarize, the current thought here for properties and queries
is as follows  (it's a long summary.  More like an essay.  :)

Add support for zero or more <NOTE> elements in the feature, of
the form
   <NOTE>This is some arbitrary (but non-markup-ed) text</NOTE>


Add a features search keyword "note=" which takes a search string
to be found in the note elements.  (substring? soundex? regex?
the search engine calls up Lincoln and asks?)


Add support for zero or more <ALIAS> elements in the feature,
of the form
   <ALIAS name="Zorro" />

(I missed this in the redraft.  It should have been there.
Feature filter "name" already says it searches the "name" and
"alias" fields for a feature.)


Ignore the "phase" property (contentious, perhaps?) or add it
as an attribute of something else in the feature element.


Ignore the "score" property.  As written in the current spec
   "score" A floating point number indicating a context-dependent
   score. This is to be used only when a more specific ontology-driven
   score cannot be used.  (Umm, where do the other scores go?)
Unless someone wants to define that score ontology and what it means
to search that field, this is a can of worms I don't want to open.


Ignore the "editable" property.  As written (and kibbitzed)
   "editable" indicates that features may be updateable (this is at the
   discretion of the server).  (But this is potentially per-user data.)

This should either be in the feature type or it should be in
some write-back specific data structure the client can fetch.
(To be discussed) It isn't a feature property.

This gets rid of all stated needs for arbitrary key/value data.


That doesn't mean there won't be future needs.

In that case, here's how to add new pieces of data.

1) use a non-DAS extension element.  Clients must ignore elements
they don't understand.

This is good for storing data, but not for searching.  The
thing is, the search mechanism (or multiple search mechanisms
perhaps) is data field specific.  Hence,

2) servers may provide extensions to the basic DAS query mechanism.
Currently the mechanism is:
   and-ed set of zero or more  keyword = (set, of, or, terms, for, keyword)
where "keyword" is well-defined by DAS except for the "att"
property keywords.

Query extensions add new keywords in the same syntax, and define
somewhere how that syntax works.  It must be backwards compatible
to the existing syntax and semantics.

The problem then is clients don't know that a server supports a
given query extension, so

3) add a <SUPPORTS> element to the <CAPABILITY> element.
(Also proposed, renaming "CATEGORY" back to "CAPABILITY".)
The CAPABILITY may have zero or more of

   <SUPPORTS name="some-unique-string" />

Here are the two defined unique strings,

   <SUPPORTS name="all" />
   <SUPPORTS name="das2" />

The "all" query says that a client may reasonably fetch all
the features in one go.  This would occur with a small DAS
server containing only a few hundred features.  In that case
there's no need to even have a CGI script running on the
back end - just a set of flat files.  The query is done by
fetching the URL with no parameters.

A rich server with millions of features might decide to
not support an "all" query.

The "das2" query is the one we've been talking about.

If a site develops a query extension it adds

   <SUPPORTS name="sanger-curation-search" />

so clients know what the server can do.  (In this case supporting
searches for "annotator", "annotation_before" and "annotation_after"
fields.)
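
A sketch of how a client might act on the SUPPORTS names, under a few
assumptions of mine: the extension name maps to the three keywords
listed above, a filtered query also needs "das2", and parameters are
joined with ";" as in the earlier example URL.  The query_url value and
the function name build_query are placeholders.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical CAPABILITY element; the URL is a placeholder.
CAPABILITY_XML = """
<CAPABILITY type="features" query_url="http://example.org/das2/features">
  <SUPPORTS name="das2" />
  <SUPPORTS name="sanger-curation-search" />
</CAPABILITY>
"""

# Keywords added by each extension (assumed from the example in this message).
EXTENSION_KEYWORDS = {
    "sanger-curation-search": {"annotator", "annotation_before", "annotation_after"},
}

def build_query(capability, filters):
    """Return a query URL, or raise if the server did not advertise support."""
    supported = {s.get("name") for s in capability.iter("SUPPORTS")}
    url = capability.get("query_url")
    if not filters:
        # Fetching the bare URL is only reasonable if "all" is advertised.
        if "all" not in supported:
            raise ValueError("server does not advertise the 'all' query")
        return url
    if "das2" not in supported:
        raise ValueError("server does not advertise the 'das2' query")
    for name, keywords in EXTENSION_KEYWORDS.items():
        if filters.keys() & keywords and name not in supported:
            raise ValueError("server does not advertise " + name)
    return url + "?" + ";".join(f"{k}={quote(v)}" for k, v in filters.items())

capability = ET.fromstring(CAPABILITY_XML)
query = build_query(capability, {"annotator": "Andrew", "annotation_before": "2005"})
print(query)
```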

That all said, this doesn't mean that the server shouldn't
have a property table.  It's a question of what it means
to search the property table.

People here want the following:
   multiple properties may have the same key and different value
   the order of the properties is not important
   the "att:" search is renamed a "prop:" search, like "prop:author"
   the search is a substring search.
   a feature matches a search if any of the properties with that name
      match the substring search

For example,
   source = BLAST 2.3.4
   author = Andrew Dalke
   author = Thomas Down

lets me search for

   features?prop:author=Andrew
all features with "Andrew" as a substring in the "author" property

   features?prop:author=Andrew;source=BLAST
all features with "Andrew" as a substring in the "author"
and with "BLAST" in the source name

   features?prop:author=Andrew,Thomas
all features with "Andrew" or "Thomas" as an author
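
The matching rules above can be sketched as follows.  This is only an
illustration of the semantics (keywords and-ed, comma-separated terms
or-ed, substring match, any same-named property may match); treating
the unprefixed "source" keyword as a property lookup is my assumption,
and the function names are hypothetical.

```python
# Properties as (key, value) pairs so that keys may repeat.
PROPS = [
    ("source", "BLAST 2.3.4"),
    ("author", "Andrew Dalke"),
    ("author", "Thomas Down"),
]

def parse_query(query):
    """Split 'k1=a,b;k2=c' into {'k1': ['a', 'b'], 'k2': ['c']}."""
    wanted = {}
    for part in query.split(";"):
        key, _, terms = part.partition("=")
        wanted[key] = terms.split(",")
    return wanted

def feature_matches(props, query):
    """Every keyword must match; a keyword matches if any term is a
    substring of any property value stored under that name."""
    for key, terms in parse_query(query).items():
        name = key[len("prop:"):] if key.startswith("prop:") else key
        values = [v for k, v in props if k == name]
        if not any(term in value for value in values for term in terms):
            return False
    return True

print(feature_matches(PROPS, "prop:author=Andrew;source=BLAST"))
print(feature_matches(PROPS, "prop:author=Andrew,Thomas"))
```

Since the order of same-named properties does not matter here, a
server could store them in a multimap instead of an ordered list.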


Really what I think this essay is doing is saying that
storing data and searching data are different.  Servers can
develop new ways to extend DAS searches and flag that they
support new searches.  (Eg, the new search may be to support
a different way to search a field in the property table.)

But there needs to be a really basic substring search, given
that there will be simple string key/ string value data
for the property table.

Oh, and should the key/value table also include my proposed
"href" and embedded binary data fields like images?  Hmmmmm....

Lots of talk about this here.  Time for a tea break.

					Andrew
					dalke at dalkescientific.com


From lstein at cshl.edu  Tue Feb  7 16:00:52 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 7 Feb 2006 11:00:52 -0500
Subject: [DAS2] properties and queries
In-Reply-To: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com>
References: <84f283568ecaf1dbf6b91c71ade27ba8@dalkescientific.com>
Message-ID: <200602071100.52818.lstein@cshl.edu>

Hi,

I use the phase information quite a lot and I know that others do as well. The 
phase is {0,1,2} and the meaning is described here:

	For features of type "CDS", the phase indicates where the feature
	begins with reference to the reading frame.  The phase is one of the
	integers 0, 1, or 2, indicating the number of bases that should be
	removed from the beginning of this feature to reach the first base of
	the next codon. In other words, a phase of "0" indicates that the next
	codon begins at the first base of the region described by the current
	line, a phase of "1" indicates that the next codon begins at the
	second base of this region, and a phase of "2" indicates that the
	codon begins at the third base of this region. This is NOT to be
	confused with the frame, which is simply start modulo 3.
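
A small sketch of that definition, which also shows how the phase can
be re-derived by counting along the transcript.  It assumes the CDS
segments are given in transcript order (5' to 3') and that the first
segment starts on a codon boundary unless stated otherwise; the
function name cds_phases is hypothetical.

```python
def cds_phases(lengths, first_phase=0):
    """Return the phase (0, 1 or 2) of each CDS segment.

    lengths: CDS segment lengths in bases, in transcript order.
    first_phase: bases to skip at the start of the first segment.
    """
    phases = []
    phase = first_phase
    for length in lengths:
        phases.append(phase)
        # Bases left at the end of this segment after whole codons are used up.
        leftover = (length - phase) % 3
        # The next segment must supply the rest of that split codon first.
        phase = (3 - leftover) % 3
    return phases

# A 10-base segment ends one base into a codon, so the next one has phase 2.
print(cds_phases([10, 8, 7]))
```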

Lincoln

On Tuesday 07 February 2006 07:19, Andrew Dalke wrote:
> We've had a long discussion here about properties and how to
> search them.  As it stands now the spec has a few holes in it.
>
> Here are the properties we've talked about.
>
> program_name: the program used to make the annotation, like
>    "BLASTX 1.2.3"
>
> notes:
>    There can be 0 or more notes.  Notes might refer to other
>    notes (eg, "the previous note said XYZ but I think ABC")
>
> phase: (is it 0, 1, 2 or 1, 2, 3?)
>    (And does anyone use this? People here don't use it; Thomas
>     "reinfers it by counting along the transcript" "but maybe
>     that's just me".  Others say they don't use the DAS1 phase.)
>
> icon: a hypothetical image used for the feature, perhaps as
>     a binary png;
>
> curation history:
>    a list of elements, each with
>     - person
>     - timestamp
>     - reason for change
>
> score: a floating point number, which may be in exponential
>     notation like "1E-3"
>
> Each one needs different search mechanisms.  For example,
>    "annotations done by that buggy version of BLAST 1.2.3"
>    "scores better than 1E-2"
>    "changes by Andrew done in August 2004"
>    "notes with the substring 'helicase'" (case sensitive or not?)
>    "notes with the phrase 'E. Coli'" (substring might not work
>        if the note has 'E.\nColi')
>
> The property storage scheme doesn't handle this quite correctly.
> Here are problems:
>
>    - how do you store multiple notes?
>
> Answer 1: use structured names, like "note_1", "note_2", "note_3", ..
> HACK! Then what if a note is deleted?  Bigger problem; how do you
> search the "note" field using the existing query language?
>
> Answer 2: allow duplicate note elements, like
>    <prop key="note" value="This is a note" />
>    <prop key="note" value="The previous note is a lie!" />
>    <prop key="note" value="Ignore the 2nd note - silly Cretan!" />
>
> Question: so the order must be preserved if two fields have the
> same name?  Can't implement with a dictionary/hash data type.
>
> Question: what if there are duplicate "score" or "phase" elements?
> Which one wins?
>
> Answer 3: Notes are important and we know we need them now.
> Let's have a <NOTE> element and not make it be a property.
>
> <NOTE>This is a note</NOTE>
> <NOTE>The previous note is a lie!</NOTE>
> <NOTE>Is this an E or a NOT-E?</NOTE>
>
> (perhaps also with timestamp and author name, but that's a different
> question.)  Then we also define that the "note=" parameter in a
> DAS query is a substring search of the <NOTE> elements of a feature.
>
> I like this one.
>
>
>    - How do you do numeric searches?
>
> This is hypothetical.  There hasn't been a requirement for this.
> 'Course it may be because people haven't had the ability.  In
> any case, how to search numeric fields like "score" with comparisons?
>
>
>   - querying non-queryable fields
>
> If there's embedded binary data, like an image, is it searchable?
> Does a server complain and die? Ignore the request?
>
>   - more complex text searches
>
> "proteinase but not inhibitor"
>
>   - complex data
>
> We have support for non-DAS extensions, which might be
>
> <sanger:curation-history xmlns:sanger="http://www.sanger.ac.uk/das/ext"
>
>   <sanger:curation name="Andrew" date="2005-06-07">
>     Change the this into that because of some reason or other
>   </sanger:curation>
>
>
> Thomas proposed that we support some sort of complex query
> language, probably in XML, and get rid of the simple query scheme
> we have now.
>
> I argued against the complexity of that given that nearly all
> of the queries will be "give me these feature types on this range
> of that chromosome".  I also pointed out that developing a
> generic query language is hard, and implementing it is harder.
> Why require all that effort?
>
> Roy commented the other way - in a server with only a few hundred
> features, why require a query language at all?  Just return all
> of the features in the request.
>
> Here's what I proposed.
>
> We have the "CATEGORY" (but after discussion I now want to take
> it back to "CAPABILITY" since that's now much closer to what
> it does - it describes where to go to do something)
>
> So I'll use "CAPABILITY"
>
> The current scheme has
>
> <CAPABILITY type="features" query_url="http://...../features">
>    <FORMAT ... />
> </CAPABILITY>
>
> This is an extensibility point.  Suppose Thomas has an XML
> query search interface support on his server, with Sanger
> clients that handle it.  Then there can be
>
> <CAPABILITY type="thomas-xml-search"
> query_url="http.../search-features">
>    <FORMAT ... />
> </CAPABILITY>
>
> A client can see the list of CAPABILITIES and decide to
> use the feature search mechanism it likes best.
>
> In addition, we could say that "this supports the normal DAS
> query scheme but also supports an extension vocabulary".  For example,
>
> <CAPABILITY type="features" query_url="http://...../features">
>    <SUPPORTS name="sanger-curation" />
>    <FORMAT ... />
> </CAPABILITY>
>
> With this a client knows that the query_url supports the normal
> DAS queries and also supports the "annotator", "annotation_before"
> and "annotation_after" queries, like this
>
>    .../features?annotator=Andrew;annotation_before=2005
>
> Possible idea: if there is no SUPPORTS tag then the server
> implements no search syntax and instead returns everything,
> for the example Roy mentioned.
>
> Okay, we're off to lunch.
>
> 					Andrew
> 					dalke at dalkescientific.com
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From lstein at cshl.edu  Tue Feb  7 16:46:47 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 7 Feb 2006 11:46:47 -0500
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
Message-ID: <200602071146.48212.lstein@cshl.edu>

Hi,

I have group meeting from 12-1 every Tuesday, so I can't make this one. I'll 
be present for the telecon Wednesday at 12.

Lincoln


On Tuesday 07 February 2006 10:34, Helt,Gregg wrote:
> I talked to Suzi, she's planning to join our teleconference today to
> discuss ontologies, wearing her hat as co-PI of the National Center for
> Biomedical Ontology.  Hopefully Lincoln can join too.
>
> I took a closer look at the DAS/2 ontology work Allen has done (see
> http://biodas.org/documents/das2/das2_ontology.html).  I urge anyone who
> wants to contribute to the ontology discussion to read this doc.  It
> specifies a way to retrieve ontologies in OBOXML format.  In this format
> each ontology term gets an absolute URI through the same mechanism that
> the rest of DAS/2 uses (URIs for ids, which can be either absolute or
> relative but resolvable).  As Allen pointed out yesterday this would
> solve our problem of how to uniquely specify ontology terms in the DAS/2
> TYPES XML.
>
> I couldn't find any documentation for the OBOXML format, other than the
> code that generates it from OBO files.  But I'm using OBOXML as an
> example here because it clearly has resolvable URIs for each ontology
> term.  In Allen's spec, ontologies can also be returned in other
> formats, but it's unclear to me whether terms in these other formats
> would resolve to similar URIs.
>
> 	gregg
>
> > -----Original Message-----
> > From: das2-bounces at portal.open-bio.org
>
> [mailto:das2-bounces at portal.open-
>
> > bio.org] On Behalf Of Andrew Dalke
> > Sent: Tuesday, February 07, 2006 1:32 AM
> > To: DAS/2
> > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code
> > sprint,6 Feb 2006
> >
> > > gh: would like a re-cast as xml document, hosted at so/sofa
> > > website. that xml would be like a std ontology representation so you
> > > could extend it. so someone could point to an extension of it.
> >
> > I asked as an action item if Gregg would look into the solution
> > for this.  Do we refer to the ontology by a "GO:0123456" identifier
> > or by some URL scheme?  If so, what's the mapping from URL scheme
> > to something that clients and people can understand, eg, to
> > ask for everything which is an exon?
> >
> > Does this mapping need a version number - does it change over time?
> >
> > 					Andrew
> > 					dalke at dalkescientific.com
> >

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From dalke at dalkescientific.com  Tue Feb  7 16:50:56 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 16:50:56 +0000
Subject: [DAS2] query_api and server layout
Message-ID: <a66a09d2312ce8288b3e55fcd2c22d28@dalkescientific.com>

Continuing from yesterday's discussion...

There are several things in a DAS server

- there is the list of all sources and versions
- there is a list of all versions for a source
- there is the versioned source information

The versioned source only really provides a bit of
overall configuration information and links to three URLs:

   - the query interface for features
   - the query interface for types
   - the query interface for segments

It doesn't say anything about where the actual feature,
type and segment data is stored.  It doesn't even mean
that the query URLs are on the same machine as the versioned
source document.  Hence Andreas can have his registry server.

DAS defines what those queries do.  The segments query URL
interface can be a shared reference server.  It has a
rather simple interface:
   - get URLs and information for each segment
       - given a sequence URL return the sequence data
   - return the assembly data

The segment and sequence data does not need to be on the
same machine as the segments query URL.  It likely will
be but does not need to be.


DAS defines what the types interface does.  At present it
is also very simple.  By default it lists everything, or
you can ask it for an "ontology" or (proposed new query)
"exact_ontology", and it returns all DAS types which match
that request.

The actual DAS type data does not need to be on the same
server as the types query URL, though again it probably will
be.  The types query URL does not need to be on the same machine
as the segments query URL.

Similarly, the features query URL implements the DAS query
interface and returns a list of features.  The actual features
do not need to be on the same machine or directory location
as the feature query, or the types, or the segments.

Here are some possible reasons for the different locations:

Common case:
   - segments query URL and segments data on a reference server
   - versioned source provides its own types and features

New genome / internal project:
   - database implements all three query URLs

Registry server:
   - each versioned source entry points to the original machine's
       values for the segments, types and features query URLs

Multiple versions database, shared types:
   - segments points to the reference server
   - all versioned sources' "types" query URLs point to the same URL
   - each versioned source gets its own features query

old-style CGI-based web server:
   - the "segments" query url points to the reference server
   - the individual features, types and sources are ".xml" files
       in the file system
   - the query URLs end with ".cgi" and start a CGI script


If we say that the URL for doing a types query is composed as:
   <the versioned source URL> + "/" (if missing) + "types"

then at the very least we preclude CGI-based servers.  No big
deal perhaps?  It also forces some duplication when several
versions of the database share the same DAS "types" (and
"segments").

I also think using a server-provided URL is easier than constructing
the URL in code.  Get the "query_url", perhaps resolved by the
xml:base.  That's it.  No need to add in the "/types".
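
A minimal Python sketch of the two options, for illustration only.  The
host example.org, the paths and the function names are assumptions and
not part of the spec; the only rules taken from the text are the
"/" + "types" composition and resolving a server-provided query_url
against xml:base.

```python
from urllib.parse import urljoin


def composed_types_url(versioned_source_url: str) -> str:
    """Option 1: <versioned source URL> + "/" (if missing) + "types"."""
    if not versioned_source_url.endswith("/"):
        versioned_source_url += "/"
    return versioned_source_url + "types"


def provided_query_url(xml_base: str, query_url: str) -> str:
    """Option 2: use the server-provided value, resolved against xml:base."""
    return urljoin(xml_base, query_url)


# Hypothetical xml:base of a sources document.
base = "http://example.org/das/genome/"

print(composed_types_url("http://example.org/das/genome/h_sapiens/v1"))
print(provided_query_url(base, "h_sapiens/v1/types"))
# An absolute query_url (for example a CGI script on another host) passes
# through unchanged, which the composition rule cannot express.
print(provided_query_url(base, "http://other.example.org/cgi-bin/types.cgi"))
```

With the second option the client never needs to know how the server
lays out its URLs.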

Gregg worries about the network performance of having
   <FEATURE type="../../type/AB123">
    <LOC id="http://some.other.server" range="300:400"/>
    <REGION id="feature/QW41414" />
   </FEATURE>

because each location has the full URL to another server and
the type in this case refers to a types collection shared
by all of the versions of the source.

I've thought about that for a while.  It's a reasonable and
serious architectural concern.  I think the right response
is that that's an architecture decision we should leave up to
the data provider.  If Gregg wants more compact XML and finds that
on-the-fly compression slows things down too much, then his
DAS server can put the segments, types and features not only
on the same machine but in the same directory.

The following is valid (omitting some required parts)

<SOURCE>
   <VERSION id="/h_sapiens/v1/">
    <CAPABILITY type="features" query_id="/h_sapiens/v1/features" />
    <CAPABILITY type="types" query_id="/h_sapiens/v1/types" />
    <CAPABILITY type="segments" query_id="/h_sapiens/v1/segments" />
   </VERSION>
</SOURCE>

The features request can return

GET /h_sapiens/v1/features
<FEATURES xmlns:das="...">
  <FEATURE id="F12345" type="Tabcde">
    <LOC id="C1" range="32:34"/>
    <REGION id="F789" />
  </FEATURE>
</FEATURES>

In this architecture, features start with an 'F', like
   /h_sapiens/v1/F12345
types start with a 'T', like
   /h_sapiens/v1/Tabcde
and segments start with a 'C', like
   /h_sapiens/v1/C1

This is about as compact as I think you can make it, yet it
still fits into the current DAS spec.  (You don't even need
the special character - it only makes it easier to see that
the names/URLs will never collide.)
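
As a rough check of the compact layout, here is a Python sketch that
resolves the short ids against the URL the features document was
fetched from.  The host example.org and the function name are
assumptions, and the elided xmlns:das declaration is left out of the
sample document.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical URL of the features request shown above.
REQUEST_URL = "http://example.org/h_sapiens/v1/features"

FEATURES_XML = """\
<FEATURES>
  <FEATURE id="F12345" type="Tabcde">
    <LOC id="C1" range="32:34"/>
    <REGION id="F789" />
  </FEATURE>
</FEATURES>
"""


def resolve_ids(xml_text, document_url):
    """Return one dict of absolute URLs per FEATURE element."""
    resolved = []
    for feature in ET.fromstring(xml_text).iter("FEATURE"):
        resolved.append({
            "feature": urljoin(document_url, feature.get("id")),
            "type": urljoin(document_url, feature.get("type")),
            "segment": urljoin(document_url, feature.find("LOC").get("id")),
            "region": urljoin(document_url, feature.find("REGION").get("id")),
        })
    return resolved


for entry in resolve_ids(FEATURES_XML, REQUEST_URL):
    for name, url in entry.items():
        print(name, url)
```

Each short id replaces the last path component ("features"), so all
four URLs land directly under /h_sapiens/v1/.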

					Andrew
					dalke at dalkescientific.com


From lstein at cshl.edu  Tue Feb  7 16:51:55 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 7 Feb 2006 11:51:55 -0500
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
Message-ID: <200602071151.56939.lstein@cshl.edu>

Allen's ideas seem very sensible and easy to manage. We can already serve 
associations between genomic features and GO terms via properties, so the 
concerns expressed in the discussion section about the big GO API shouldn't 
apply.

Lincoln

On Tuesday 07 February 2006 10:34, Helt,Gregg wrote:
> I talked to Suzi, she's planning to join our teleconference today to
> discuss ontologies, wearing her hat as co-PI of the National Center for
> Biomedical Ontology.  Hopefully Lincoln can join too.
>
> I took a closer look at the DAS/2 ontology work Allen has done (see
> http://biodas.org/documents/das2/das2_ontology.html).  I urge anyone who
> wants to contribute to the ontology discussion to read this doc.  It
> specifies a way to retrieve ontologies in OBOXML format.  In this format
> each ontology term gets an absolute URI through the same mechanism that
> the rest of DAS/2 uses (URIs for ids, which can be either absolute or
> relative but resolvable).  As Allen pointed out yesterday this would
> solve our problem of how to uniquely specify ontology terms in the DAS/2
> TYPES XML.
>
> I couldn't find any documentation for the OBOXML format, other than the
> code that generates it from OBO files.  But I'm using OBOXML as an
> example here because it clearly has resolvable URIs for each ontology
> term.  In Allen's spec, ontologies can also be returned in other
> formats, but it's unclear to me whether terms in these other formats
> would resolve to similar URIs.
>
> 	gregg
>
> > -----Original Message-----
> > From: das2-bounces at portal.open-bio.org
>
> [mailto:das2-bounces at portal.open-
>
> > bio.org] On Behalf Of Andrew Dalke
> > Sent: Tuesday, February 07, 2006 1:32 AM
> > To: DAS/2
> > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code
> > sprint,6 Feb 2006
> >
> > > gh: would like a re-cast as xml document, hosted at so/sofa
> > > website. that xml would be like a std ontology representation so you
> > > could extend it. so someone could point to an extension of it.
> >
> > I asked as an action item if Gregg would look into the solution
> > for this.  Do we refer to the ontology by a "GO:0123456" identifier
> > or by some URL scheme?  If so, what's the mapping from URL scheme
> > to something that clients and people can understand, eg, to
> > ask for everything which is an exon?
> >
> > Does this mapping need a version number - does it change over time?
> >
> > 					Andrew
> > 					dalke at dalkescientific.com
> >

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From Gregg_Helt at affymetrix.com  Tue Feb  7 16:54:39 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Tue, 7 Feb 2006 08:54:39 -0800
Subject: [DAS2] Proposed agenda for DAS/2 code sprint teleconference,
	Tuesday Feb 7
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9B8@msex02.affymetrix.com>

Vote on how to construct URLs to query for segments, types, features: 
   1.) specified by query_id
   2.) hardwired to ~/segments, ~/types, ~/features
   3.) ?

Status Report

Integrating sequence ontology with DAS/2 (and possibly other ontologies)

Feature properties and queries over properties

MAINTAINER information

Use of xml:base

?


From dalke at dalkescientific.com  Tue Feb  7 17:01:38 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 17:01:38 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <Pine.LNX.4.58.0602070245550.15849@sumo.ctrl.ucla.edu>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
	<Pine.LNX.4.58.0602070245550.15849@sumo.ctrl.ucla.edu>
Message-ID: <5d78aba1ae5117624db08b4dd395c985@dalkescientific.com>

Allen
> The XML is now as you requested, please confirm.

Missing the namespace declaration.  You have


<SOURCES
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xml:base="http://das.biopackages.net/das/genome/">

should be

<SOURCES
       xmlns="http://www.biodas.org/ns/das/genome/2.00"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xml:base="http://das.biopackages.net/das/genome/">
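
To show why the default namespace matters to a client, here is a small
Python sketch using the standard ElementTree parser.  The two documents
are cut down to the root element and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

DAS_NS = "http://www.biodas.org/ns/das/genome/2.00"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # bound to the xml: prefix

WITHOUT_NS = (
    '<SOURCES xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xml:base="http://das.biopackages.net/das/genome/"/>'
)
WITH_NS = (
    '<SOURCES xmlns="http://www.biodas.org/ns/das/genome/2.00" '
    'xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xml:base="http://das.biopackages.net/das/genome/"/>'
)


def is_das_sources(xml_text):
    """True only if the root is SOURCES in the DAS/2 namespace."""
    return ET.fromstring(xml_text).tag == "{%s}SOURCES" % DAS_NS


print(is_das_sources(WITHOUT_NS))  # False: the tag is plain "SOURCES"
print(is_das_sources(WITH_NS))     # True
# xml:base is read through the XML namespace in either document.
print(ET.fromstring(WITH_NS).get("{%s}base" % XML_NS))
```

A namespace-aware reader looking for the DAS/2 SOURCES element skips
the first document entirely.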

The <PROP> element goes after the CATEGORY.  (Which I want to
rename back to CAPABILITY.)

The ASSEMBLY element no longer exists.

Fixing those by hand,

* file:///Users/dalke/cvses/das/das2/scratch/allen_sources.xml:7:84: 
error: attribute "writeable" not allowed at this point; ignored
* file:///Users/dalke/cvses/das/das2/scratch/allen_sources.xml:7:84: 
error: attribute "taxon" not allowed at this point; ignored

There is no more 'writeable'; that's (IMO) something to be decided
as part of the writeback spec.  It might be that we have a

<CAPABILITY type="writeback" />

and the existence of that indicates writeability.

It's also "taxid" and not "taxon".  I used "taxid" because that's
what NCBI uses for their data.

> There are probably also some other Content-Type headers that need to be
> changed for the other responses -- let me know if you spot them.

Haven't gotten that far yet.

					Andrew
					dalke at dalkescientific.com


From allenday at ucla.edu  Tue Feb  7 17:25:03 2006
From: allenday at ucla.edu (Allen Day)
Date: Tue, 7 Feb 2006 09:25:03 -0800 (PST)
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <5d78aba1ae5117624db08b4dd395c985@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
	<Pine.LNX.4.58.0602070245550.15849@sumo.ctrl.ucla.edu>
	<5d78aba1ae5117624db08b4dd395c985@dalkescientific.com>
Message-ID: <Pine.LNX.4.58.0602070923370.29889@sumo.ctrl.ucla.edu>

On Tue, 7 Feb 2006, Andrew Dalke wrote:

> Allen
> > The XML is now as you requested, please confirm.
> 
> Missing the namespace declaration.  You have
> 
> 
> <SOURCES
>        xmlns:xlink="http://www.w3.org/1999/xlink"
>        xml:base="http://das.biopackages.net/das/genome/">
> 
> should be
> 
> <SOURCES
>        xmlns="http://www.biodas.org/ns/das/genome/2.00"
>        xmlns:xlink="http://www.w3.org/1999/xlink"
>        xml:base="http://das.biopackages.net/das/genome/">

done

> 
> The <PROP> element goes after the CATEGORY.  (Which I want to
> rename back to CAPABILITY.)

done

> 
> The ASSEMBLY element no longer exists.

done

> 
> Fixing those by hand,
> 
> * file:///Users/dalke/cvses/das/das2/scratch/allen_sources.xml:7:84: 
> error: attribute "writeable" not allowed at this point; ignored
> * file:///Users/dalke/cvses/das/das2/scratch/allen_sources.xml:7:84: 
> error: attribute "taxon" not allowed at this point; ignored
> 
> There is no more 'writeable' (that's, IMO) something to be decided
> as part of the writeback spec.  It might be that we have a
> 
> <CAPABILITY type="writeback" />
> 
> and the existence of that indicate writeability.

i have not made the change if this is an IMO.

> 
> It's also "taxid" and not "taxon".  I used "taxid" because that's
> what NCBI uses for their data.

done

-Allen

> 
> > There are probably also some other Content-Type headers that need to be
> > changed for the other responses -- let me know if you spot them.
> 
> Haven't gotten that far yet.
> 
> 					Andrew
> 					dalke at dalkescientific.com
> 


From ap3 at sanger.ac.uk  Tue Feb  7 17:44:41 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Tue, 7 Feb 2006 17:44:41 +0000
Subject: [DAS2] toy - das2 registry
Message-ID: <d04da1c45044d91fdbe7842f3e23f63c@sanger.ac.uk>

Hi!

A "toy" das2 registry, serving das1 servers via das2 responses, can be
accessed at

http://www.spice-3d.org/dasregistry/das2/sources/

I will work on adding the first das2 servers tomorrow.

Andreas

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
			 +44 (0) 1223 49 6891


From cjm at fruitfly.org  Tue Feb  7 17:29:09 2006
From: cjm at fruitfly.org (Chris Mungall)
Date: Tue, 7 Feb 2006 09:29:09 -0800 (PST)
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
Message-ID: <Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>


Hi all

I'm concerned that the XML in the URL below isn't quite Obo-XML, it's
Allen's modified version of it. In particular, it adds an "id"
attribute which is redundant with the id element, and it modifies
the ID scheme to use slashes instead of :s.

I believe the latter may have been to make the ID scheme more DAS-y?

OBO IDs are composed of a prefix and a local ID. These are always joined
with a :. The prefix can be specified as shortform (eg GO) or a URI
prefix. When the long form is combined with the local ID you get your URI.
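
A short Python sketch of that ID scheme, for illustration.  The URI
prefix in the table is a made-up placeholder, and plain concatenation
of the long-form prefix with the local ID is an assumption about how
the two are combined.

```python
# Hypothetical mapping from short-form prefix to URI prefix.
URI_PREFIXES = {
    "GO": "http://example.org/obo/GO#",
}


def split_obo_id(obo_id):
    """Split "PREFIX:LOCAL" at the last ":" into (prefix, local_id)."""
    prefix, sep, local_id = obo_id.rpartition(":")
    if not sep or not prefix or not local_id:
        raise ValueError("not a prefix:local-id pair: %r" % obo_id)
    return prefix, local_id


def to_uri(obo_id):
    """Expand a short-form ID such as "GO:0123456" into a URI."""
    prefix, local_id = split_obo_id(obo_id)
    return URI_PREFIXES[prefix] + local_id


print(split_obo_id("GO:0123456"))
print(to_uri("GO:0123456"))
```

An ID joined with "/" instead of ":" fails the split, which is the
interoperability problem described here.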

If DAS wants to use a modified version of Obo-XML, that's fine, but please
don't call it Obo-XML, it will cause huge confusion!

I would much prefer if you used Obo-XML as it is - if there are things
you'd like to see changed about the format we can perhaps work that out.
I'm concerned by the change of the ID to use / instead of :. This is wrong,
and if it's something that's required for DAS, how will you interoperate
with RDF etc?

In fact there are other parts where the xml is definitely not Obo-XML - it
looks like Allen has coded these by hand rather than taking existing XML.
That's fine, but it should be marked as such. For example, there is no
develops_from element in Obo-XML; all relations bar is_a are encoded as
relationship elements.

There is a DTD at the moment
http://www.godatabase.org/dev/xml/dtd

The docs are minimal as the explanation of all the fields is in the docs
for the obo text file format
http://www.godatabase.org/dev/doc/obo_format_spec.{html,txt,pdf}

We'll be converting to RNG+XSD soon

You can get Obo-XML examples from
http://www.fruitfly.org/~cjm/obo-download

You can see the default rule for creating a URI in the OWL files; these
currently all get the geneontology.org URI prefix by default, but this
will change (we were going to use LSIDs but the majority of OWL tools
don't seem to handle URNs very well)

As far as DAS/2 supporting different file formats, Obo-XML and RDFS/OWL
would seem to be the natural contenders. We currently go from the former
to the latter via a simple XSLT, the reverse transformation is a little
more difficult.

Allen has inlined some comments from an email exchange with me in the
document. I agree about keeping the API minimal. On the other hand you
will need at least some inferencing machinery - I'd encourage you to reuse
existing reasoning services here.

Cheers
Chris

On Tue, 7 Feb 2006, Helt,Gregg wrote:

> I talked to Suzi, she's planning to join our teleconference today to
> discuss ontologies, wearing her hat as co-PI of the National Center for
> Biomedical Ontology.  Hopefully Lincoln can join too.
>
> I took a closer look at the DAS/2 ontology work Allen has done (see
> http://biodas.org/documents/das2/das2_ontology.html).  I urge anyone who
> wants to contribute to the ontology discussion to read this doc.  It
> specifies a way to retrieve ontologies in OBOXML format.  In this format
> each ontology term gets an absolute URI through the same mechanism that
> the rest of DAS/2 uses (URIs for ids, which can be either absolute or
> relative but resolvable).  As Allen pointed out yesterday this would
> solve our problem of how to uniquely specify ontology terms in the DAS/2
> TYPES XML.
>
> I couldn't find any documentation for the OBOXML format, other than the
> code that generates it from OBO files.  But I'm using OBOXML as an
> example here because it clearly has resolvable URIs for each ontology
> term.  In Allen's spec, ontologies can also be returned in other
> formats, but it's unclear to me whether terms in these other formats
> would resolve to similar URIs.
>
> 	gregg
>
> > -----Original Message-----
> > From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-
> > bio.org] On Behalf Of Andrew Dalke
> > Sent: Tuesday, February 07, 2006 1:32 AM
> > To: DAS/2
> > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code
> > sprint,6 Feb 2006
> >
> > > gh: would like a re-cast as xml document, hosted at so/sofa
> > > website. that xml would be like a std ontology representation so you
> > > could extend it. so someone could point to an extension of it.
> >
> > I asked as an action item if Gregg would look into the solution
> > for this.  Do we refer to the ontology by a "GO:0123456" identifier
> > or by some URL scheme?  If so, what's the mapping from URL scheme
> > to something that clients and people can understand, eg, to
> > ask for everything which is an exon?
> >
> > Does this mapping need a version number - does it change over time?
> >
> > 					Andrew
> > 					dalke at dalkescientific.com
> >


From cjm at fruitfly.org  Tue Feb  7 17:32:24 2006
From: cjm at fruitfly.org (chris mungall)
Date: Tue, 7 Feb 2006 09:32:24 -0800
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <200602071151.56939.lstein@cshl.edu>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
	<200602071151.56939.lstein@cshl.edu>
Message-ID: <afbcf2eb5f143730a0d0a64c70595d44@fruitfly.org>


What inferencing rules do you use for fetching features by their 
Ontology_terms?

On Feb 7, 2006, at 8:51 AM, Lincoln Stein wrote:

> Allen's ideas seem very sensible and easy to manage. We can already 
> serve
> associations between genomic features and GO terms via properties, so 
> the
> concerns expressed in the discussion section about the big GO API 
> shouldn't
> apply.
>
> Lincoln
>
> On Tuesday 07 February 2006 10:34, Helt,Gregg wrote:
>> I talked to Suzi, she's planning to join our teleconference today to
>> discuss ontologies, wearing her hat as co-PI of the National Center 
>> for
>> Biomedical Ontology.  Hopefully Lincoln can join too.
>>
>> I took a closer look at the DAS/2 ontology work Allen has done (see
>> http://biodas.org/documents/das2/das2_ontology.html).  I urge anyone 
>> who
>> wants to contribute to the ontology discussion to read this doc.  It
>> specifies a way to retrieve ontologies in OBOXML format.  In this 
>> format
>> each ontology term gets an absolute URI through the same mechanism 
>> that
>> the rest of DAS/2 uses (URIs for ids, which can be either absolute or
>> relative but resolvable).  As Allen pointed out yesterday this would
>> solve our problem of how to uniquely specify ontology terms in the 
>> DAS/2
>> TYPES XML.
>>
>> I couldn't find any documentation for the OBOXML format, other than 
>> the
>> code that generates it from OBO files.  But I'm using OBOXML as an
>> example here because it clearly has resolvable URIs for each ontology
>> term.  In Allen's spec, ontologies can also be returned in other
>> formats, but it's unclear to me whether terms in these other formats
>> would resolve to similar URIs.
>>
>> 	gregg
>>
>>> -----Original Message-----
>>> From: das2-bounces at portal.open-bio.org
>>
>> [mailto:das2-bounces at portal.open-
>>
>>> bio.org] On Behalf Of Andrew Dalke
>>> Sent: Tuesday, February 07, 2006 1:32 AM
>>> To: DAS/2
>>> Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code
>>> sprint,6 Feb 2006
>>>
>>>> gh: would like a re-cast as xml document, hosted at so/sofa
>>>> website. that xml would be like a std ontology representation so you
>>>> could extend it. so someone could point to an extension of it.
>>>
>>> I asked as an action item if Gregg would look into the solution
>>> for this.  Do we refer to the ontology by a "GO:0123456" identifier
>>> or by some URL scheme?  If so, what's the mapping from URL scheme
>>> to something that clients and people can understand, eg, to
>>> ask for everything which is an exon?
>>>
>>> Does this mapping need a version number - does it change over time?
>>>
>>> 					Andrew
>>> 					dalke at dalkescientific.com
>>>


From dalke at dalkescientific.com  Tue Feb  7 18:40:56 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 18:40:56 +0000
Subject: [DAS2] category -> capability
Message-ID: <98a28be1166142c23be61650f51b66ae@dalkescientific.com>

I've made the commit.  The element

SOURCES/SOURCE/VERSION/CATEGORY

  is now (in some shallow and some deep sense) back to

SOURCES/SOURCE/VERSION/CAPABILITY


					Andrew
					dalke at dalkescientific.com


From Gregg_Helt at affymetrix.com  Tue Feb  7 19:00:40 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Tue, 7 Feb 2006 11:00:40 -0800
Subject: [DAS2] Working with xml:base in Java?
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9B9@msex02.affymetrix.com>


	Thomas, I'm wondering what toolkits you're using for binding XML
to Java objects?  And particularly how you are dealing with resolving
URIs when xml:base is used.  So far I've mostly used various
implementations of SAX and DOM -- I've found some reports of builtin
xml:base support in Xerces SAX/DOM, but it's still unclear.

	I've been avoiding the issue up till now.  It won't be too hard
to implement URI resolution relative to xml:base, but I thought I'd
check around first and see if there's automated support of this in some
toolkit.

	Thanks,
	Gregg


From dalke at dalkescientific.com  Tue Feb  7 19:11:09 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 19:11:09 +0000
Subject: [DAS2] toy - das2 registry
In-Reply-To: <d04da1c45044d91fdbe7842f3e23f63c@sanger.ac.uk>
References: <d04da1c45044d91fdbe7842f3e23f63c@sanger.ac.uk>
Message-ID: <551a60258c89cd953f35c6a4450a444d@dalkescientific.com>

Andreas Prlic wrote:
> A  "toy" das2 registry serving das1 servers,  via das2 responses can 
> be accessed at
>
> http://www.spice-3d.org/dasregistry/das2/sources/
>
> I will work on adding the first das2 servers tomorrow.

There are differences between this and the spec.  These are

"CATEGORY" -> "CAPABILITY"

Andreas knew that but didn't get it changed before having
to head out for a bit.

"testcode" should be "test_range" - it was added this afternoon
but I changed the name on Andreas.  (He agreed to the change.)

   # this is a range string (eg, "Chr1/1:100" or "CloneABC123/500:599")
   # used in an "inside=" feature query.  It is used by the registry
   # server when doing a heartbeat check.
   attribute test_range { text }?,

The underlying problem is that a web server can be up while
the back-end database is down.  While a server should report
that as an error, sadly that's not always the case.  This
test_range is used by Andreas' registry server in a periodic
feature query.  It should return a "reasonable" number of
features.

I decided to make it part of the spec for two reasons:
  - it simplifies auto-fill-in during registry discovery
  - clients can also use it to query the server and see if
      it's really alive or if it really means to return
      an empty list of features all the time.

It is optional.
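
A Python sketch of how a registry or client might use test_range.  The
server URL, the function names and the "at least one feature" threshold
are assumptions; the text only says the range goes into an "inside="
feature query and should return a "reasonable" number of features.

```python
from urllib.parse import urlencode


def heartbeat_url(features_query_url, test_range):
    """Build the periodic feature query for a server's test_range.

    Assumes features_query_url carries no query string of its own.
    """
    # Keep "/" and ":" readable; urlencode would otherwise escape them.
    query = urlencode({"inside": test_range}, safe="/:")
    return features_query_url + "?" + query


def looks_alive(http_status, feature_count, minimum=1):
    """Treat an OK response with too few features as a dead back end."""
    return http_status == 200 and feature_count >= minimum


url = heartbeat_url("http://example.org/das/genome/h_sapiens/v1/features",
                    "Chr1/1:100")
print(url)
print(looks_alive(200, 0))   # web server up, database apparently down
print(looks_alive(200, 12))
```

The actual fetch and feature count are left out so the sketch stays
free of network access.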


The MAINTAINER "name" was required.  Andreas has examples where
there is only an email address and wants the name to be optional.
So now "name", "email" and "href" are all optional.  I would
like to require that at least one of them be provided.

Finally, the "taxid" in the COORDINATES is optional.  The
RNG schema thought it was mandatory.

I've updated the schemas and the spec for the last two.  Committed.

Looks like I'll be spending most of tomorrow updating the
rest of the spec document.

I got a copy of Andreas' document and edited it to meet the
current spec and I've checked it in under
   "scratch/registry_sources.xml"
Feel free to test it out with your parsers.

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Tue Feb  7 19:28:49 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 19:28:49 +0000
Subject: [DAS2] format version
Message-ID: <4cd0c60fb7871ad6a70ad2b25cb73406@dalkescientific.com>

Just committed to the spec.  If I'm wrong and the version number
proves useful, I'll make it less snarky.  :)


This document defines several new content-types.  These are

application/x-das-sources+xml
application/x-das-features+xml
application/x-das-types+xml
application/x-das-segments+xml

A server may supply an optional "version" value for the Content-Type,
to specify which version of the specification it provides.  This is
(at present and unless others can convince me otherwise) meant to be
used only during this period of specification development while things
are in flux.  A client can look at the version string and use an
appropriate reader to handle it.

Example:

   Content-Type: application/x-das-types+xml; version=1
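
A client could pull the version out of the header roughly like this.  This
is a sketch using Python's standard email header parsing; the function name
is an assumption, and a missing version comes back as None.

```python
from email.message import Message


def das_content_type(header_value):
    """Split a Content-Type value into (media type, version or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_param() returns None when the parameter is absent.
    return msg.get_content_type(), msg.get_param("version")
```

The client would then pick a reader keyed on the returned pair.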

The list of versions is as follows:

   601071920:  this version

The versions will be increasing integers.  The format will be
"YMMDDHHMM" where "Y" is the year - 2005.  (This makes it a 32 bit
integer, in case you were wondering.)  There's no way this spec will
be in flux in 4 years time.

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Tue Feb  7 19:14:15 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue, 7 Feb 2006 19:14:15 +0000
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <Pine.LNX.4.58.0602070923370.29889@sumo.ctrl.ucla.edu>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>
	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>
	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>
	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>
	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>
	<17368.18146.195226.166165@kinked.lbl.gov>
	<17369.24994.880706.685148@kinked.lbl.gov>
	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>
	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>
	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>
	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>
	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>
	<Pine.LNX.4.58.0602070245550.15849@sumo.ctrl.ucla.edu>
	<5d78aba1ae5117624db08b4dd395c985@dalkescientific.com>
	<Pine.LNX.4.58.0602070923370.29889@sumo.ctrl.ucla.edu>
Message-ID: <7ca6d9841d7c3c334589da147c38de53@dalkescientific.com>

>> There is no more 'writeable' (that's, IMO) something to be decided
>> as part of the writeback spec.  It might be that we have a

> i have not made the change if this is an IMO.

Okay.  There is no "writeable".  The writeability is determined
by the <CAPABILITY> element.  If there is a CAPABILITY with
a type == "locks" then the server is (potentially) writeable
in the same way that "writeable='yes'" means that it's writeable.
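
As a sketch of how a client might apply that rule (assuming the CAPABILITY
elements carry a "type" attribute and are not in an XML namespace; the
function name is made up for illustration):

```python
import xml.etree.ElementTree as ET


def is_potentially_writeable(source_xml):
    """Return True if any CAPABILITY element has type == "locks"."""
    root = ET.fromstring(source_xml)
    return any(el.get("type") == "locks" for el in root.iter("CAPABILITY"))
```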

Anyone else have an O?

					Andrew
					dalke at dalkescientific.com


From ed_erwin at affymetrix.com  Tue Feb  7 20:46:01 2006
From: ed_erwin at affymetrix.com (Ed Erwin)
Date: Tue, 07 Feb 2006 12:46:01 -0800
Subject: [DAS2] Re: Apollo and DAS/2 priorities
In-Reply-To: <7ca6d9841d7c3c334589da147c38de53@dalkescientific.com>
References: <5d78d8ff5605327d8a23bde5b4177f1d@dalkescientific.com>	<CE75C281-908F-4BC8-B01D-4239946F7FF9@sanger.ac.uk>	<08c9b852196e449cba6b16f99c3c3212@dalkescientific.com>	<8C0328BC-6AFE-4C35-B2EF-DF85FFA71E2D@sanger.ac.uk>	<8ddfbc3373d9d3cbcdb66e24d52d2ba7@dalkescientific.com>	<17368.18146.195226.166165@kinked.lbl.gov>	<17369.24994.880706.685148@kinked.lbl.gov>	<24afbfd39f79595678721f1ef75a239e@dalkescientific.com>	<Pine.LNX.4.58.0601271755170.1651@sumo.ctrl.ucla.edu>	<c53c8b6f9c6512e363b64d96641e07ea@dalkescientific.com>	<Pine.LNX.4.58.0602052306410.25579@sumo.ctrl.ucla.edu>	<c2d8869c1ff0e38ad53b7df455b240c8@dalkescientific.com>	<Pine.LNX.4.58.0602070245550.15849@sumo.ctrl.ucla.edu>	<5d78aba1ae5117624db08b4dd395c985@dalkescientific.com>	<Pine.LNX.4.58.0602070923370.29889@sumo.ctrl.ucla.edu>
	<7ca6d9841d7c3c334589da147c38de53@dalkescientific.com>
Message-ID: <43E90709.6060602@affymetrix.com>

This is something we should discuss when we discuss the 'writeable' 
parts of the spec.  But in my opinion, 'writeable' and 'lockable' are 
two separate <CAPABILITY>'s.  I see no reason not to allow some 
implementers to develop simple servers that are writeable but don't 
implement a locking mechanism.  Large public servers may want locking, 
but I'd bet that a non-locking server would very rarely lead to 
problems, especially in small projects.

(If the server is non-locking, the client could add a little more logic 
to check that nothing has changed since the last retrieval before doing 
a commit.)
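
A rough sketch of that client-side check, assuming the client keeps a
fingerprint of the document it last retrieved.  The function names are
hypothetical, and the check only narrows the window for conflicting edits:
another client can still commit between the check and the write.

```python
import hashlib


def fingerprint(document):
    """Return a stable digest of a retrieved document (a text string)."""
    return hashlib.sha256(document.encode("utf-8")).hexdigest()


def safe_to_commit(fingerprint_at_retrieval, current_document):
    """Return True if the server copy is unchanged since last retrieval."""
    return fingerprint(current_document) == fingerprint_at_retrieval
```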

Andrew Dalke wrote:
>>> There is no more 'writeable' (that's, IMO) something to be decided
>>> as part of the writeback spec.  It might be that we have a
> 
> 
>> i have not made the change if this is an IMO.
> 
> 
> Okay.  There is no "writeable".  The writeability is determined
> by the <CAPABILITY> element.  If there is a CAPABILITY with
> a type == "locks" then the server is (potentially) writeable
> in the same way that "writeable='yes'" means that it's writeable.
> 
> Anyone else have an O?
> 
>                     Andrew
>                     dalke at dalkescientific.com
> 
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2


From allenday at ucla.edu  Tue Feb  7 21:20:53 2006
From: allenday at ucla.edu (Allen Day)
Date: Tue, 7 Feb 2006 13:20:53 -0800 (PST)
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
	<Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>
Message-ID: <Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>

Hi Chris,

On Tue, 7 Feb 2006, Chris Mungall wrote:

> 
> Hi all
> 
> I'm concerned that the XML in the URL below isn't quite Obo-XML, it's
> Allen's modified version of it. In particular, the adding of an "id"
> attribute which is redundant with the id element, and the modification of
> the ID scheme to use slashes instead of :s.
> 
> I believe the latter may have been to make the ID scheme more DAS-y?

The slash was introduced to take advantage of xml:base and the
hierarchical relationship between namespaces and terms, e.g.

  xml:base="/das/ontology/obo/1/ontology" + id="SO/0000001"

is equivalent to:

  /das/ontology/obo/1/ontology/SO/0000001

If we want the identifier to be SO:0000001, it means that we have to make
xml:base="/das/ontology/obo/1/ontology/SO".  This is problematic for two 
reasons:

  1) multiple xml:base cannot be defined for the entire document, meaning
     that URIs for other records referenced become very long.

  2) different ontologies cannot use the same xml:base

The only way I see out of this ATM is to treat : as a / internal to the
Ontology-DAS service.
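
A sketch of that mapping, assuming IDs always have exactly one prefix and
one local part (function names are illustrative only):

```python
def obo_to_das(obo_id):
    """Turn an OBO-style ID such as "SO:0000001" into "SO/0000001"."""
    prefix, local = obo_id.split(":", 1)
    return f"{prefix}/{local}"


def das_to_obo(das_id):
    """Turn a service-internal ID such as "SO/0000001" into "SO:0000001"."""
    prefix, local = das_id.split("/", 1)
    return f"{prefix}:{local}"
```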

> OBO IDs are composed of a prefix and a local ID. These are always joined
> with a :. The prefix can be specified as shortform (eg GO) or a URI
> prefix. When the long form is combined with the local ID you get your URI.
> 
> If DAS wants to use a modified version of Obo-XML, that's fine, but please
> don't call it Obo-XML, it will cause huge confusion!
> 
> I would much prefer if you used Obo-XML as it is - if there are things
> you'd like to see changed about the format we can perhaps work that out.
> I'm concerned by the changing the ID to use / instead of :. This is wrong,
> and if it's something that's required for DAS, how will you interoperate
> with RDF etc?
> 
> In fact there are other parts where the xml is definitely not Obo-XML - it
> looks like Allen has coded these by hand rather than taking existing XML.
> That's fine, but it should be marked as such. For example, there is no
> develops_from element in Obo-XML; all relations bar is_a are encoded as
> relationship elements.

The XML provided by the Ontology-DAS server is using templates to mark up
ontology records that have been loaded to a chado database using
perl-go-perl.  The develops_from node, IIRC, was created because there is
a section in a perl-go-perl .xslt that creates elements for all
relationship types.

> 
> There is a DTD at the moment
> http://www.godatabase.org/dev/xml/dtd

This didn't exist at the time I wrote my templates ( 4-6 months ago), or I
would have validated.

-Allen


> 
> The docs are minimal as the explanation of all the fields is in the docs
> for the obo text file format
> http://www.godatabase.org/dev/doc/obo_format_spec.{html,txt,pdf}
> 
> We'll be converting to RNG+XSD soon
> 
> You can get Obo-XML examples from
> http://www.fruitfly.org/~cjm/obo-download
> 
> You can see the default rule for creating a URI in the OWL files; these
> currently all get the geneontology.org URI prefix by default, but this
> will change (we were going to use LSIDs but the majority of OWL tools
> don't seem to handle URNs very well)
> 
> As far as DAS/2 supporting different file formats, Obo-XML and RDFS/OWL
> would seem to be the natural contenders. We currently go from the former
> to the latter via a simple XSLT, the reverse transformation is a little
> more difficult.
> 
> Allen has inlined some comments from an email exchange with me in the
> document. I agree about keeping the API minimal. On the other hand you
> will need at least some inferencing machinery - I'd encourage you to reuse
> existing reasoning services here.
> 
> Cheers
> Chris
> 
> On Tue, 7 Feb 2006, Helt,Gregg wrote:
> 
> > I talked to Suzi, she's planning to join our teleconference today to
> > discuss ontologies, wearing her hat as co-PI of the National Center for
> > Biomedical Ontology.  Hopefully Lincoln can join too.
> >
> > I took a closer look at the DAS/2 ontology work Allen has done (see
> > http://biodas.org/documents/das2/das2_ontology.html).  I urge anyone who
> > wants to contribute to the ontology discussion to read this doc.  It
> > specifies a way to retrieve ontologies in OBOXML format.  In this format
> > each ontology term gets an absolute URI through the same mechanism that
> > the rest of DAS/2 uses (URIs for ids, which can be either absolute or
> > relative but resolvable).  As Allen pointed out yesterday this would
> > solve our problem of how to uniquely specify ontology terms in the DAS/2
> > TYPES XML.
> >
> > I couldn't find any documentation for the OBOXML format, other than the
> > code that generates it from OBO files.  But I'm using OBOXML as an
> > example here because it clearly has resolvable URIs for each ontology
> > term.  In Allen's spec, ontologies can also be returned in other
> > formats, but it's unclear to me whether terms in these other formats
> > would resolve to similar URIs.
> >
> > 	gregg
> >
> > > -----Original Message-----
> > > From: das2-bounces at portal.open-bio.org
> > [mailto:das2-bounces at portal.open-
> > > bio.org] On Behalf Of Andrew Dalke
> > > Sent: Tuesday, February 07, 2006 1:32 AM
> > > To: DAS/2
> > > Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code
> > > sprint,6 Feb 2006
> > >
> > > > gh: would like a re-cast as xml document, hosted at so/sofa
> > > > website. that xml would be like a std ontology representation so you
> > > > could extend it. so someone could point to an extension of it.
> > >
> > > I asked as an action item if Gregg would look into the solution
> > > for this.  Do we refer to the ontology by a "GO:0123456" identifier
> > > or by some URL scheme?  If so, what's the mapping from URL scheme
> > > to something that clients and people can understand, eg, to
> > > ask for everything which is an exon?
> > >
> > > Does this mapping need a version number - does it change over time?
> > >
> > > 					Andrew
> > > 					dalke at dalkescientific.com
> > >
> > > _______________________________________________
> > > DAS2 mailing list
> > > DAS2 at portal.open-bio.org
> >
> >
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/das2
> >
> 
> 
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2
> 


From cjm at fruitfly.org  Tue Feb  7 21:59:12 2006
From: cjm at fruitfly.org (chris mungall)
Date: Tue, 7 Feb 2006 13:59:12 -0800
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
	<Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>
	<Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>
Message-ID: <e00e781866375762b29061d2b510a10e@fruitfly.org>


On Feb 7, 2006, at 1:20 PM, Allen Day wrote:

> Hi Chris,
>
> On Tue, 7 Feb 2006, Chris Mungall wrote:
>
>>
>> Hi all
>>
>> I'm concerned that the XML in the URL below isn't quite Obo-XML, it's
>> Allen's modified version of it. In particular, the adding of an "id"
>> attribute which is redundant with the id element, and the 
>> modification of
>> the ID scheme to use slashes instead of :s.
>>
>> I believe the latter may have been to make the ID scheme more DAS-y?
>
> The slash was introduced to take advantage of xml:base and the
> hierarchical relationship between namespaces and terms, e.g.
>
>   xml:base="/das/ontology/obo/1/ontology" + id="SO/0000001"
>
> is equivalent to:
>
>   /das/ontology/obo/1/ontology/SO/0000001

it's actually equivalent to:
/das/ontology/obo/1/ontologySO/0000001
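
For a cross-check, assuming xml:base follows standard RFC 3986 relative
reference resolution, Python's urljoin can be used to see what a given base
and id resolve to.  The host name is a placeholder; note that the result
depends on whether the base ends in a slash.

```python
from urllib.parse import urljoin

HOST = "http://example.org"  # placeholder host, not a real DAS server

# Base without a trailing slash: the last path segment is replaced.
without_slash = urljoin(HOST + "/das/ontology/obo/1/ontology", "SO/0000001")

# Base with a trailing slash: the relative id is appended beneath it.
with_slash = urljoin(HOST + "/das/ontology/obo/1/ontology/", "SO/0000001")

print(without_slash)  # http://example.org/das/ontology/obo/1/SO/0000001
print(with_slash)     # http://example.org/das/ontology/obo/1/ontology/SO/0000001
```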

> If we want the identifier to be SO:0000001, it means that we have to 
> make
> xml:base="/das/ontology/obo/1/ontology/SO.  This is problematic for two
> reasons:
>
>   1) multiple xml:base cannot be defined for the entire document, 
> meaning
>      that URIs for other records referenced become very long.

Why not just define a qname for every idspace? This is the standard way 
of doing this in XML

Using xml:base is not a big gain for brevity, since fairly soon some 
obo ontologies will reference other obo ontologies.

In fact, is this even an issue if you get rid of the id attribute to 
conform to obo-xml? ids in obo-xml are encoded as elements, so xml:base 
rules are not applied. Obo has its own rules for ID generation. This 
has the arguable disadvantage that we can't directly use xml:base and 
the whole xml namespace system for OBO IDs, we layer our own system on 
top. This is actually preferable for us.

>   2) different ontologies cannot use the same xml:base
>
> The only way I see out of this ATM is to treat : as a / internal to the
> Ontology-DAS service.

I'm still not sure what the problem is, and I think you may be stuck 
anyway when it comes to RDF/OWL ontologies

>
>> OBO IDs are composed of a prefix and a local ID. These are always 
>> joined
>> with a :. The prefix can be specified as shortform (eg GO) or a URI
>> prefix. When the long form is combined with the local ID you get your 
>> URI.
>>
>> If DAS wants to use a modified version of Obo-XML, that's fine, but 
>> please
>> don't call it Obo-XML, it will cause huge confusion!
>>
>> I would much prefer if you used Obo-XML as it is - if there are things
>> you'd like to see changed about the format we can perhaps work that 
>> out.
>> I'm concerned by the changing the ID to use / instead of :. This is 
>> wrong,
>> and if it's something that's required for DAS, how will you 
>> interoperate
>> with RDF etc?
>>
>> In fact there are other parts where the xml is definitely not Obo-XML 
>> - it
>> looks like Allen has coded these by hand rather than taking existing 
>> XML.
>> That's fine, but it should be marked as such. For example, there is no
>> develops_from element in Obo-XML; all relations bar is_a are encoded 
>> as
>> relationship elements.
>
> The XML provided by the Ontology-DAS server is using templates to mark 
> up
> ontology records that have been loaded to a chado database using
> perl-go-perl.  The develops_from node, IIRC, was created because there 
> is
> a section in a perl-go-perl .xslt that creates elements for all
> relationship types.

hmmm, I don't think so, but the point is moot anyway, just so long as 
the final version uses xml that validates, either against obo-xml or 
your own documented variant

>
>>
>> There is a DTD at the moment
>> http://www.godatabase.org/dev/xml/dtd
>
> This didn't exist at the time I wrote my templates ( 4-6 months ago), 
> or I
> would have validated.

it did, it's just not well signposted! sorry about that

look forward to seeing a demo. I do think you have to work out the 
semantics of retrieval by ontology term though.

cheers
chris

>
> -Allen
>
>
>
>>
>> The docs are minimal as the explanation of all the fields is in the 
>> docs
>> for the obo text file format
>> http://www.godatabase.org/dev/doc/obo_format_spec.{html,txt,pdf}
>>
>> We'll be converting to RNG+XSD soon
>>
>> You can get Obo-XML examples from
>> http://www.fruitfly.org/~cjm/obo-download
>>
>> You can see the default rule for creating a URI in the OWL files; 
>> these
>> currently all get the geneontology.org URI prefix by default, but this
>> will change (we were going to use LSIDs but the majority of OWL tools
>> don't seem to handle URNs very well)
>>
>> As far as DAS/2 supporting different file formats, Obo-XML and 
>> RDFS/OWL
>> would seem to be the natural contenders. We currently go from the 
>> former
>> to the latter via a simple XSLT, the reverse transformation is a 
>> little
>> more difficult.
>>
>> Allen has inlined some comments from an email exchange with me in the
>> document. I agree about keeping the API minimal. On the other hand you
>> will need at least some inferencing machinery - I'd encourage you to 
>> reuse
>> existing reasoning services here.
>>
>> Cheers
>> Chris
>>
>> On Tue, 7 Feb 2006, Helt,Gregg wrote:
>>
>>> I talked to Suzi, she's planning to join our teleconference today to
>>> discuss ontologies, wearing her hat as co-PI of the National Center 
>>> for
>>> Biomedical Ontology.  Hopefully Lincoln can join too.
>>>
>>> I took a closer look at the DAS/2 ontology work Allen has done (see
>>> http://biodas.org/documents/das2/das2_ontology.html).  I urge anyone 
>>> who
>>> wants to contribute to the ontology discussion to read this doc.  It
>>> specifies a way to retrieve ontologies in OBOXML format.  In this 
>>> format
>>> each ontology term gets an absolute URI through the same mechanism 
>>> that
>>> the rest of DAS/2 uses (URIs for ids, which can be either absolute or
>>> relative but resolvable).  As Allen pointed out yesterday this would
>>> solve our problem of how to uniquely specify ontology terms in the 
>>> DAS/2
>>> TYPES XML.
>>>
>>> I couldn't find any documentation for the OBOXML format, other than 
>>> the
>>> code that generates it from OBO files.  But I'm using OBOXML as an
>>> example here because it clearly has resolvable URIs for each ontology
>>> term.  In Allen's spec, ontologies can also be returned in other
>>> formats, but it's unclear to me whether terms in these other formats
>>> would resolve to similar URIs.
>>>
>>> 	gregg
>>>
>>>> -----Original Message-----
>>>> From: das2-bounces at portal.open-bio.org
>>> [mailto:das2-bounces at portal.open-
>>>> bio.org] On Behalf Of Andrew Dalke
>>>> Sent: Tuesday, February 07, 2006 1:32 AM
>>>> To: DAS/2
>>>> Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code
>>>> sprint,6 Feb 2006
>>>>
>>>>> gh: would like a re-cast as xml document, hosted at so/sofa
>>>>> website. that xml would be like a std ontology representation so 
>>>>> you
>>>>> could extend it. so someone could point to an extension of it.
>>>>
>>>> I asked as an action item if Gregg would look into the solution
>>>> for this.  Do we refer to the ontology by a "GO:0123456" identifier
>>>> or by some URL scheme?  If so, what's the mapping from URL scheme
>>>> to something that clients and people can understand, eg, to
>>>> ask for everything which is an exon?
>>>>
>>>> Does this mapping need a version number - does it change over time?
>>>>
>>>> 					Andrew
>>>> 					dalke at dalkescientific.com


From Steve_Chervitz at affymetrix.com  Wed Feb  8 00:30:52 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Tue, 07 Feb 2006 16:30:52 -0800
Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint,
 7 Feb 2006
Message-ID: <C00E7BBC.1BC8A%Steve_Chervitz@affymetrix.com>

Notes from the DAS/2 teleconference for the code sprint, 7 Feb 2006

$Id: das2-teleconf-2006-02-07.txt,v 1.1 2006/02/08 00:37:41 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed E., Gregg Helt
  Sanger: Andreas Prlic, Thomas Down
  Sweden: Andrew Dalke
  UC Berkeley: Nomi Harris, Suzi Lewis
  UCLA: Allen Day
        
Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


Agenda:
* Vote on constructing URLs/URIs to query segments, types, features
* Status report from people
* Ontologies
* Feat property changes

Topic: Constructing URLS/URIs to query segments, types, features
----------------------------------------------------------------
1.) specified by query_id
2.) hardwired to ~/segments, ~/types, ~/features
3.) ?

ad: lots of people have left here so the vote won't include all.
see email why a query url is useful
agree w/ gregg: short names could be a nice to have.
shouldn't have to worry about how you organize your urls
gh: yes it does: this/types this/segments etc.
ad: can take it out if there's confusion
gh: recommended structure is good.
ee/gh: people will look at the examples and do it that way. they won't
look at .rnc file
gh: make it clearer in the spec that these are merely suggestions of the
hierarchy, you don't have to do it this way.

ad: roy's view: likes the query id url for doing search for all
features, or all types.
query id is the url used to do search against features.
uri could be relative or absolute.
gh: category element defines a query id for a subset of das.
it's the attribute query id in the category

ad: I also want to rename category back to capability.
how do we arrange urls in a versioned source.
construction off of strings or via attributes in a url
gh: votes for hardwired, but feels less strong today about it.
ad: majority vote is for query id, spec czar goes with that.

[A] query id
[A] andrew will update spec to have less mention of hierarchical structure
[A] allen will update server to do it the recommended way

gh: in addition to have an arbitrary query id to get segments, types,
features, there's a recommended way to do it via the hierarchy. server
should do it the recommended way (hierarchy)

ee: we should be flexible about it.
gh/ad: ok take out recommendation.
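
A minimal sketch of the query-id approach from a client's point of view:
read the query URL from the document rather than building it from a fixed
hierarchy.  The "query_id" and "type" attribute names follow the
discussion above; the element name (CATEGORY at the time, with a rename to
CAPABILITY proposed), the document layout and the function name are
assumptions.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


def query_url(version_xml, document_url, wanted_type, element="CAPABILITY"):
    """Return the absolute query URL for wanted_type, or None if absent."""
    root = ET.fromstring(version_xml)
    for el in root.iter(element):
        query_id = el.get("query_id")
        if el.get("type") == wanted_type and query_id is not None:
            # query_id may be relative or absolute; urljoin handles both.
            return urljoin(document_url, query_id)
    return None
```

With this, a server is free to put its feature query anywhere, and a client
never has to guess at ~/features.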

Topic: Status reports
---------------------

ad: see his emails.
gh: we need examples in spec document and scratch to be better
synchronized.
ad: should be, i've been trying to keep these in sync.
gh: plan to push into html, incorporate scratch into doc?
ad: yes, eventually.
will also add andreas' work to scratch too.

td: java xml binding libraries, how to put it into a workable server
ap: das registry, sources command, attribute handling, people can
connect to a toy server publicly available.
gh: registry will respond?
ap: yes. toy server, toy data like das1, returning sources command.
gh: can you add allen's codesprint server? consider it registered.
ap: is fully working?
gh: can allen send a command to it to register it?
ap: no.
gh: would like to tell my client to do discovery rather than hard
wiring.

gh: commits to igb das/2 client to handle seq, segment, types. not
features query yet. given decision about url construction, can do this
fast so we can test on codesprint server seq, seg, types to bring up
something meaningful in gui. not features by today. affy das/2 server
is running behind. will sync up today as well.

nh: apollo working out sequence, segment, types request. now does
versioned sources. integrating those into query gui as well.

aday: changes early this am. server running under /codesprint is now a
static doc pointing back to the old server. adding segment command,
merging region and seq command. has made everything except
capabilities writeback stuff.
ad: there's another request recently, see my email.
aday: have gotten 40 emails from you in the last day!

aday: brian oconnor is working on bundling dependencies for an rpm
based release.
gh: I also did significant refactoring/moving assay/ontology stuff
into subclasses on client side. haven't seen brian's code, but should
run fine. 

Topic: Integrating Sequence Ontology with DAS/2
-----------------------------------------------

suzi: national center for biomedical ontology, one of 7
natl centers for biomedical computing. focus on needs regarding
developing and using ontologies.

gh: hoping to have a typing system in das/2 via types queries that
references SO but doesn't require client to fully understand
ontologies. too much of a burden. that's the challenge. this
translates into referring to ontology terms as opaque uris
suzi: 'understands' means they're ignoring any relationships between
types. 
gh: yes.
currently type has attrib for id, attrib for ontology.
ad: uri or arbitrary string
suzi: can use uri or string, preprocessed
ad: one or the other
gh: prefers uri
suzi: from uri you can get the string
gh: not clear how to construct uri for particular terms in an ontology
doc
suzi: this will happen in next few months. talking with daniel rubin
about this.
gh: this is where allen comes in. ontology das.
aday: next step is getting it hosted on NCBO server.
currently communicating with chris mungall. said they're planning on
implementing something similar soon, not sure if they'd accept allen's
solution. unclear.
working with gavin sherlock on ontology support for microarray samples,
tissue type, phenotype. was hoping people could pick this up and use
it. 
suzi: gavin and I could help push this.
gh: chris m posted concerns that the obo xml in allen's scheme
isn't the same as what he's using. re: how you make absolute uris.
aday: there's not much docs on obo xml format. did the best I could.
suzi: should be able to sort it out. just an inertia problem of
getting it installed. not a competition issue. fine with me. not
difficult?
aday: by end of week we'll have an rpm.
suzi: let's keep pushing on this to make it happen. I'll talk to gavin
tomorrow. can we install on sf site, or do we need to set it up
elsewhere?
aday: could conceivably set up a cgi on sf. uses custom apache
handler tho.

gh: more ontology q's can wait till tomorrow w/ lincoln.
concern: how do we deal w/ types that represent more
than one ontology terms. defer discussion till tomorrow.

Topic: Feature Properties
-------------------------

See andrew's post today.

ad: this ties into ontologies. two ontology related issues: two different
ways to query. ontology of a feature, and two diff ways to search a db
for that property: exactly equal, or a subtype.
this is a property with two diff searches you may want to do on it.
properties like note, alias, phase have ability to search key/val
properties, e.g., att:alias=something.
score is a floating point number you may want to support > or < on it.
regular exp searches, identical, etc.
td says use xml query language, but worried about complexity of this.
99% of time this is way more than you need.

scenario: given 4 different notes in a feature, is order important?
extensions: curation point gives curator's name and time stamp.
e.g., search for all features modified by andrew in 2004.
discussion: pull this into a note element, perhaps phase and alias
too.
property table only supports a substring search. give me an author
name, e.g.
not saying getting rid of tag values.
server supporting new data types, extensions, feat search w/ sanger
curation elements for query. or thomas xml search.
this is why I want to move categories back to capabilities.
gh: more appropriate as capabilities than header.
ad: someone can get a document. andreas can combine many servers into
one, say: which one supports which.

to summarize: 
- properties are simple strings
- only substring searches
- change att: to prop:
- note and alias and phase are elements
- advertise that a server has extension to das query lang
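
A sketch of what "simple strings, substring search only" could mean on the
server side.  The dict-of-lists representation (a key may carry several
values) and the function name are assumptions made for illustration.

```python
def property_matches(properties, key, needle):
    """Return True if any value stored under key contains needle.

    properties maps a property name to a list of string values.
    """
    return any(needle in value for value in properties.get(key, []))
```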

gh: what about phase? lincoln needs it.
ad: if it's something that people will be editing, make it an element.
gh: phase is inappropriate for certain types. would like formal way
when it should be there or not.
ad: this is formalizing a way for server to tell client that there are
more types of searches available.
can't see how to do it automatically: eg for a given score, knowing
what is considered significant (low or high, e.g.).
td: if he needs a phase he re-infers it. doesn't work for partial CDS
tho.
gh: how much spec churn will this generate?
ad: [various things, half a dozen or so, some simplifying]
gh: does a colon in a query string need to be escaped? if so, this
makes it hard to read.
ad: could use prop_ rather than prop:
thomas and I had long discussion about this.
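
On the escaping question, a quick illustration with Python's urlencode.
RFC 3986 permits ":" in a query string, but many URL libraries escape it
by default, which is where the readability concern comes from.  The
property name and value here are made up.

```python
from urllib.parse import urlencode

# Default behaviour: the colon in the key is percent-encoded.
escaped = urlencode({"prop:alias": "BRCA2"})

# Marking ":" as safe keeps the query readable.
readable = urlencode({"prop:alias": "BRCA2"}, safe=":")

# An underscore needs no escaping in any case.
underscore = urlencode({"prop_alias": "BRCA2"})

print(escaped)     # prop%3Aalias=BRCA2
print(readable)    # prop:alias=BRCA2
print(underscore)  # prop_alias=BRCA2
```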

[A] andrew will incorporate these changes into feature properties

Topic: Maintainer information
-----------------------------

ad: modified examples under scratch
gh: maintainer at source or version level
ad: one for all sources level
ap: at sanger we have one central server with lots of sources. notes
who's responsible for which server.
gh: ownership cascades down to sub elements?
ad: yes


Topic: XML Base
---------------

gh: can be in any element. as well as xml:lang, don't really
understand.
ad: it's what the atom spec does, so we copied. maybe for
bidirectional languages.
gh: flexible uri resolution scheme w/ xml base. implementation in java
tools is spotty for xml:base. curious about java obj binding of xml
what support they have for resolving xml base. at this point will have
to roll it myself. want to ask thomas about this.
ap: he's using a StAX parser, gets global namespace.
gh: bigger concern for when we have to use sax, need to do xml:base
resolution, eg. when we need to retrieve lots of features.
ad: it can be done with sax.
gh: not hard, but it is a multistep process.
ad: multiple levels of xml:base
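
A sketch of xml:base resolution with a SAX-style parser, shown in Python
rather than Java: keep a stack of effective base URIs, push on every start
tag and pop on every end tag.  The "id" attribute name, the class and
function names, and the sample layout in the usage note are assumptions.

```python
import xml.sax
from urllib.parse import urljoin


class BaseTrackingHandler(xml.sax.ContentHandler):
    """Resolve id attributes against nested xml:base declarations."""

    def __init__(self, document_url, id_attribute="id"):
        super().__init__()
        self.bases = [document_url]  # stack of effective base URIs
        self.id_attribute = id_attribute
        self.resolved = []

    def startElement(self, name, attrs):
        base = self.bases[-1]
        if "xml:base" in attrs:
            # An xml:base may itself be relative to the enclosing base.
            base = urljoin(base, attrs["xml:base"])
        self.bases.append(base)
        if self.id_attribute in attrs:
            self.resolved.append(urljoin(base, attrs[self.id_attribute]))

    def endElement(self, name):
        self.bases.pop()


def resolve_ids(xml_text, document_url):
    """Return every id in xml_text as an absolute URI, in document order."""
    handler = BaseTrackingHandler(document_url)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.resolved
```

For example, with an outer xml:base of "http://example.org/das2/" and an
inner xml:base of "human/", an id of "f1" inside the inner element resolves
to "http://example.org/das2/human/f1".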

ad: tomorrow's agenda: go through roy's otter stuff, convert into new
das format. to get a feel for how data will look. see roy's email. to
use experience gathered from otter to make sure we're sufficiently
covering features.

gh: talking about writeback?
ad: premature. let's talk style sheets wed, and writeback
thursday. plus anything else that's come up about the spec.
want to know how style sheets will look. lincoln should be able to
help out there.


From nomi at fruitfly.org  Wed Feb  8 03:27:13 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Tue, 7 Feb 2006 19:27:13 -0800 (PST)
Subject: [DAS2] We need DAS/2 progress reports for the grant!
Message-ID: <17385.25873.660275.790249@kinked.lbl.gov>

Dear DAS/2 developers,

I am writing this on behalf of Gregg and the DAS/2 team.  This is so
important I'm actually using capital letters.

As you know, we have submitted a request for renewing the DAS/2 grant.
Our chances of having this renewal approved are iffy, especially since we
are asking for more money than in the original grant and NIH's budget is
very tight right now.

The reviewers are about to read our grant proposal and decide whether to
fund it, and we need to send them a supplementary progress report about
what we've accomplished since we submitted the grant in November.
Describing how much progress we've made towards implementing the DAS/2
protocol in both servers and clients will help make our case that we
deserve more funding to continue this important research.

Gregg has been trying for weeks to find out when this progress report was
due (we had figured we had until the end of February).  Today he
*finally* got through to our scientific review administrator, who said
that we have to send it to them no later than THIS THURSDAY.

Obviously, this is very short notice, so we are asking all of you to very
quickly put together a paragraph (no more!) describing your progress
between Nov 1 and the end of this week (i.e., you can project to
what you expect to have completed by Friday).  If you need context, I
have attached a copy of the grant; I will also send some of you
individual notes about what we need from you.

Please send us (the DAS2 mailing list, or, if you're feeling shy, just me
and Gregg) your paragraph in PLAIN TEXT so that I can more easily
assimilate them into a single document.  We plan to work on incorporating
your reports into our progress report tomorrow (Wed), send out a draft
tomorrow night (our time) for you to review, and incorporate any
suggestions into our final version that we'll send off on Thursday.

Sorry for the short notice, and thanks in advance for your help.

      Nomi and Gregg

-------------- next part --------------
A non-text attachment was scrubbed...
Name: DAS2_renewal_grant_final2l.doc
Type: application/octet-stream
Size: 453632 bytes
Desc: DAS2 renewal grant proposal
URL: <http://lists.open-bio.org/pipermail/das2/attachments/20060207/29609114/attachment-0001.obj>

From allenday at ucla.edu  Wed Feb  8 03:14:49 2006
From: allenday at ucla.edu (Allen Day)
Date: Tue, 7 Feb 2006 19:14:49 -0800 (PST)
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <e00e781866375762b29061d2b510a10e@fruitfly.org>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
	<Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>
	<Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>
	<e00e781866375762b29061d2b510a10e@fruitfly.org>
Message-ID: <Pine.LNX.4.58.0602071913010.29889@sumo.ctrl.ucla.edu>

Chris,

Why have you chosen to make <id/> a subelement of <term/>?  Is it expected
that there will be multiple IDs for a given term, and if so is there not a
primary ID?  Having an id attribute is a de facto standard for DOM libs, so
you can call getElementById().

-Allen

On Tue, 7 Feb 2006, chris mungall wrote:

> 
> On Feb 7, 2006, at 1:20 PM, Allen Day wrote:
> 
> > Hi Chris,
> >
> > On Tue, 7 Feb 2006, Chris Mungall wrote:
> >
> >>
> >> Hi all
> >>
> >> I'm concerned that the XML in the URL below isn't quite Obo-XML, it's
> >> Allen's modified version of it. In particular, the adding of an "id"
> >> attribute which is redundant with the id element, and the 
> >> modification of
> >> the ID scheme to use slashes instead of :s.
> >>
> >> I believe the latter may have been to make the ID scheme more DAS-y?
> >
> > The slash was introduced to take advantage of xml:base and the
> > hierarchical relationship between namespaces and terms, e.g.
> >
> >   xml:base="/das/ontology/obo/1/ontology" + id="SO/0000001"
> >
> > is equivalent to:
> >
> >   /das/ontology/obo/1/ontology/SO/0000001
> 
> it's actually equivalent to:
> /das/ontology/obo/1/ontologySO/0000001
> 
> > If we want the identifier to be SO:0000001, it means that we have to 
> > make
> > xml:base="/das/ontology/obo/1/ontology/SO".  This is problematic for two
> > reasons:
> >
> >   1) multiple xml:base cannot be defined for the entire document, 
> > meaning
> >      that URIs for other records referenced become very long.
> 
> Why not just define a qname for every idspace? This is the standard way 
> of doing this in XML
> 
> Using xml:base is not a big gain for brevity, since fairly soon some 
> obo ontologies will reference other obo ontologies.
> 
> In fact is this even an issue if you get rid of the id attribute to
> conform to obo-xml? ids in obo-xml are encoded as elements, so xml:base
> rules are not applied. Obo has its own rules for ID generation. This
> has the arguable disadvantage that we can't directly use xml:base and 
> the whole xml namespace system for OBO IDs, we layer our own system on 
> top. This is actually preferable for us.
> 
> >   2) different ontologies cannot use the same xml:base
> >
> > The only way I see out of this ATM is to treat : as a / internal to the
> > Ontology-DAS service.
> 
> I'm still not sure what the problem is, and I think you may be stuck 
> anyway when it comes to RDF/OWL ontologies
> 
> >
> >> OBO IDs are composed of a prefix and a local ID. These are always 
> >> joined
> >> with a :. The prefix can be specified as shortform (eg GO) or a URI
> >> prefix. When the long form is combined with the local ID you get your 
> >> URI.
> >>
> >> If DAS wants to use a modified version of Obo-XML, that's fine, but 
> >> please
> >> don't call it Obo-XML, it will cause huge confusion!
> >>
> >> I would much prefer if you used Obo-XML as it is - if there are things
> >> you'd like to see changed about the format we can perhaps work that 
> >> out.
> >> I'm concerned by the changing the ID to use / instead of :. This is 
> >> wrong,
> >> and if it's something that's required for DAS, how will you 
> >> interoperate
> >> with RDF etc?
> >>
> >> In fact there are other parts where the xml is definitely not Obo-XML 
> >> - it
> >> looks like Allen has coded these by hand rather than taking existing 
> >> XML.
> >> That's fine, but it should be marked as such. For example, there is no
> >> develops_from element in Obo-XML; all relations bar is_a are encoded 
> >> as
> >> relationship elements.
> >
> > The XML provided by the Ontology-DAS server is using templates to mark 
> > up
> > ontology records that have been loaded to a chado database using
> > perl-go-perl.  The develops_from node, IIRC, was created because there 
> > is
> > a section in a perl-go-perl .xslt that creates elements for all
> > relationship types.
> 
> hmmm, I don't think so, but the point is moot anyway, just so long as 
> the final version uses xml that validates, either against obo-xml or 
> your own documented variant
> 
> >
> >>
> >> There is a DTD at the moment
> >> http://www.godatabase.org/dev/xml/dtd
> >
> > This didn't exist at the time I wrote my templates ( 4-6 months ago), 
> > or I
> > would have validated.
> 
> it did, it's just not well signposted! sorry about that
> 
> look forward to seeing a demo. I do think you have to work out the
> semantics of retrieval by ontology term though.
> 
> cheers
> chris
> 
> >
> > -Allen
> >
> >
> >
> >>
> >> The docs are minimal as the explanation of all the fields is in the 
> >> docs
> >> for the obo text file format
> >> http://www.godatabase.org/dev/doc/obo_format_spec.{html,txt,pdf}
> >>
> >> We'll be converting to RNG+XSD soon
> >>
> >> You can get Obo-XML examples from
> >> http://www.fruitfly.org/~cjm/obo-download
> >>
> >> You can see the default rule for creating a URI in the OWL files; 
> >> these
> >> currently all get the geneontology.org URI prefix by default, but this
> >> will change (we were going to use LSIDs but the majority of OWL tools
> >> don't seem to handle URNs very well)
> >>
> >> As far as DAS/2 supporting different file formats, Obo-XML and 
> >> RDFS/OWL
> >> would seem to be the natural contenders. We currently go from the 
> >> former
> >> to the latter via a simple XSLT, the reverse transformation is a 
> >> little
> >> more difficult.
> >>
> >> Allen has inlined some comments from an email exchange with me in the
> >> document. I agree about keeping the API minimal. On the other hand you
> >> will need at least some inferencing machinery - I'd encourage you to 
> >> reuse
> >> existing reasoning services here.
> >>
> >> Cheers
> >> Chris
> >>
> >> On Tue, 7 Feb 2006, Helt,Gregg wrote:
> >>
> >>> I talked to Suzi, she's planning to join our teleconference today to
> >>> discuss ontologies, wearing her hat as co-PI of the National Center 
> >>> for
> >>> Biomedical Ontology.  Hopefully Lincoln can join too.
> >>>
> >>> I took a closer look at the DAS/2 ontology work Allen has done (see
> >>> http://biodas.org/documents/das2/das2_ontology.html).  I urge anyone 
> >>> who
> >>> wants to contribute to the ontology discussion to read this doc.  It
> >>> specifies a way to retrieve ontologies in OBOXML format.  In this 
> >>> format
> >>> each ontology term gets an absolute URI through the same mechanism 
> >>> that
> >>> the rest of DAS/2 uses (URIs for ids, which can be either absolute or
> >>> relative but resolvable).  As Allen pointed out yesterday this would
> >>> solve our problem of how to uniquely specify ontology terms in the 
> >>> DAS/2
> >>> TYPES XML.
> >>>
> >>> I couldn't find any documentation for the OBOXML format, other than 
> >>> the
> >>> code that generates it from OBO files.  But I'm using OBOXML as an
> >>> example here because it clearly has resolvable URIs for each ontology
> >>> term.  In Allen's spec, ontologies can also be returned in other
> >>> formats, but it's unclear to me whether terms in these other formats
> >>> would resolve to similar URIs.
> >>>
> >>> 	gregg
> >>>
> >>>> -----Original Message-----
> >>>> From: das2-bounces at portal.open-bio.org
> >>>> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke
> >>>> Sent: Tuesday, February 07, 2006 1:32 AM
> >>>> To: DAS/2
> >>>> Subject: Re: [DAS2] Notes from the DAS/2 teleconference for the code
> >>>> sprint,6 Feb 2006
> >>>>
> >>>>> gh: would like a re-cast as xml document, hosted at so/sofa
> >>>>> website. that xml would be like a std ontology representation so 
> >>>>> you
> >>>>> could extend it. so someone could point to an extension of it.
> >>>>
> >>>> I asked as an action item if Gregg would look into the solution
> >>>> for this.  Do we refer to the ontology by a "GO:0123456" identifier
> >>>> or by some URL scheme?  If so, what's the mapping from URL scheme
> >>>> to something that clients and people can understand, eg, to
> >>>> ask for everything which is an exon?
> >>>>
> >>>> Does this mapping need a version number - does it change over time?
> >>>>
> >>>> 					Andrew
> >>>> 					dalke at dalkescientific.com
> >>>>
> >>>> _______________________________________________
> >>>> DAS2 mailing list
> >>>> DAS2 at portal.open-bio.org
> >>>
> >>>
> >>> _______________________________________________
> >>> DAS2 mailing list
> >>> DAS2 at portal.open-bio.org
> >>> http://portal.open-bio.org/mailman/listinfo/das2
> >>>
> >>
> 
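
For reference, the xml:base arithmetic quoted above can be checked against a standard relative-reference resolver. This is a minimal sketch using Python's urllib.parse.urljoin; the host name das.example.org is an assumption, the paths are the ones from the thread. Under these rules the result depends on whether the base ends in a slash, and it matches neither of the two quoted results unless the trailing slash is present.

```python
from urllib.parse import urljoin

# Hypothetical host; the paths are the ones quoted in the thread.
base_no_slash = "http://das.example.org/das/ontology/obo/1/ontology"
base_slash = base_no_slash + "/"

# Without a trailing slash, the last path segment of the base is replaced.
without_slash = urljoin(base_no_slash, "SO/0000001")
print(without_slash)
# http://das.example.org/das/ontology/obo/1/SO/0000001

# With a trailing slash, the relative id lands under "ontology/".
with_slash = urljoin(base_slash, "SO/0000001")
print(with_slash)
# http://das.example.org/das/ontology/obo/1/ontology/SO/0000001
```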


From allenday at ucla.edu  Wed Feb  8 03:57:05 2006
From: allenday at ucla.edu (Allen Day)
Date: Tue, 7 Feb 2006 19:57:05 -0800 (PST)
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <e00e781866375762b29061d2b510a10e@fruitfly.org>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>
	<Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>
	<Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>
	<e00e781866375762b29061d2b510a10e@fruitfly.org>
Message-ID: <Pine.LNX.4.58.0602071950350.29889@sumo.ctrl.ucla.edu>

Hi Chris,

> Why not just define a qname for every idspace? This is the standard way 
> of doing this in XML

Can you give a concrete example of this?  a search for "qname idspace"
returns a single godatabase.org result.


Anyway, I have stripped out the id= attributes from the <term/> and
<typedef/> elements.  You can see valid (by your DTD) obo xml produced
from the das server here:

Entire SO:
http://das.biopackages.net/das/ontology/obo/1/ontology/SO?format=legacy1

SO "exon" record:
http://das.biopackages.net/das/ontology/obo/1/ontology/SO/0000147?format=legacy1

-Allen


From Gregg_Helt at affymetrix.com  Wed Feb  8 08:36:01 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 8 Feb 2006 00:36:01 -0800
Subject: [DAS2] Working with xml:base in Java?
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9BF@msex02.affymetrix.com>


	I've been mucking around trying to find an answer to my own
question about ways to easily handle xml:base in Java.  And I think the
answer if I want to continue to use DOM ends up being "code it
yourself".  But it took a while to get to that answer.  I'm writing down
these notes so I can refer back to them next time if the issues I
encountered come up again.  But I figured I might as well post in case
other DAS/2 implementers have similar problems.

	So the standard Java 1.5 distribution includes the
org.w3c.dom.Node interface, which conveniently enough has a getBaseURI()
method that should do exactly what I want -- for any node in an XML
document, give me the resolved base URI for that node (regardless of how
complex a combination of xml:base attributes are used in the path to
that node).  Which I can then combine with whatever id attribute I'm
interested in (via Java networking classes) to get the full URI.
	But I need to guarantee compatibility with Java 1.4, so I can't
rely on 1.5.  Java 1.4 has a previous version of org.w3c.dom.Node, but
with no getBaseURI() method.  Turns out this is because the 1.5 Node
interface complies with DOM-level3 spec (includes XML Base support) but
the 1.4 Node interface only supports DOM-level2 spec (no XML Base
support).  Okay, but I can download the Xerces2 distribution, which is a
Java library that also has a full implementation of DOM-level3.  So I
get that set up, add some calls to node.getBaseURI() to my code, and it
compiles fine.  But when I run the program I get an ugly
java.lang.NoSuchMethodError.  I dig around on the web and find the
problem is a class/package namespace collision -- both Xerces2 and the
builtin java libraries have a class named org.w3c.dom.Node, but of
course they're different.  And replacing built-in java classes is not
normally allowed, so when the program is actually run and classes are
loaded the builtin Node class wins (the one w/o the getBaseURI()
method).  It would have been nice if they mentioned this in the JDK
Compatibility section of the Xerces2 FAQ...
	But there is some discussion of solutions to this problem on the
Xerces mailing list. There is actually a way to replace builtin java
packages via an "Endorsed Standards Override Mechanism", if they're on
the list of endorsed standards, which org.w3c.dom is.  This involves
putting the replacement package in an endorsed directory and setting a
system property to direct the JVM to look there for replacement
packages. But... whatever solution I use has to work with Java WebStart.
I can't find _any_ info on whether the package override mechanism works
with WebStart.  And even if it does work for some WebStart
implementations, I'd be wary of assuming it works for others -- it seems
like one of those things IT folks on the user end might get concerned
about.  I've also found other solutions to the package name clash, but
none that seems compatible with WebStart.

	So it looks like, considering my other constraints, if I want to
stick with DOM I'll need to code xml:base handling myself.  Looking at
the source code for Xerces2, doesn't look too hard.  Except... damn, the
getBaseURI() method implementation is actually commented in the Xerces
code as "Experimental".  Looking closer... um, I think it actually
doesn't implement the spec correctly.  Grr... 
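
For anyone going the "code it yourself" route with a DOM, the idea can be sketched as follows. This is an illustrative Python version using xml.dom.minidom rather than Java, under stated assumptions: the GROUP element, the host das.example.org and the document URI are all made up, and external entities and redirects are ignored.

```python
from urllib.parse import urljoin
from xml.dom import minidom

XML_NS = "http://www.w3.org/XML/1998/namespace"

def base_uri(node, document_uri):
    """Resolve the effective base URI of a DOM element by hand."""
    # Collect xml:base values from this element up to the root.
    bases = []
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        value = node.getAttributeNS(XML_NS, "base")
        if value:
            bases.append(value)
        node = node.parentNode
    # Apply them outermost first, starting from the document's own URI.
    uri = document_uri
    for value in reversed(bases):
        uri = urljoin(uri, value)
    return uri

# Hypothetical document with two levels of xml:base.
doc = minidom.parseString(
    '<FEATURES xml:base="http://das.example.org/das/genome/1/">'
    '<GROUP xml:base="feature/"><FEATURE id="F1"/></GROUP>'
    '</FEATURES>'
)
feature = doc.getElementsByTagName("FEATURE")[0]
base = base_uri(feature, "http://das.example.org/das/")
full = urljoin(base, feature.getAttribute("id"))
print(full)  # http://das.example.org/das/genome/1/feature/F1
```

Passing the retrieval URI in explicitly keeps the fallback case (no xml:base anywhere) in the caller's hands.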

To summarize, when it's time for my status report tomorrow, I think it's
best if I just remain silent.

	gregg

P.S. I suspect the answer for SAX will be similar.
P.P.S. XOM (http://www.xom.nu/) is starting to look pretty good, but I
may just be hallucinating at this point...
	
 
> From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Helt,Gregg
> Sent: Tuesday, February 07, 2006 11:01 AM
> To: Thomas Down
> Cc: DAS/2
> Subject: [DAS2] Working with xml:base in Java?
> 
> 
> 	Thomas, I'm wondering what toolkits you're using for binding XML
> to Java objects?  And particularly how you are dealing with resolving
> URIs when xml:base is used.  So far I've mostly used various
> implementations of SAX and DOM -- I've found some reports of builtin
> xml:base support in Xerces SAX/DOM, but it's still unclear.
> 
> 	I've been avoiding the issue up till now.  It won't be too hard
> to implement URI resolution relative to xml:base, but I thought I'd
> check around first and see if there's automated support of this in
> some toolkit.
> 
> 	Thanks,
> 	Gregg
> 


From td2 at sanger.ac.uk  Wed Feb  8 08:44:38 2006
From: td2 at sanger.ac.uk (Thomas Down)
Date: Wed, 8 Feb 2006 08:44:38 +0000
Subject: [DAS2] Re: Working with xml:base in Java?
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9B9@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9B9@msex02.affymetrix.com>
Message-ID: <70790A43-AA5F-4F4A-8F20-50CDE30C7BB3@sanger.ac.uk>


On 7 Feb 2006, at 19:00, Helt,Gregg wrote:

>
> 	Thomas, I'm wondering what toolkits you're using for binding XML
> to Java objects?  And particularly how you are dealing with resolving
> URIs when xml:base is used.  So far I've mostly used various
> implementations of SAX and DOM -- I've found some reports of builtin
> xml:base support in Xerces SAX/DOM, but it's still unclear.
>
> 	I've been avoiding the issue up till now.  It won't be too hard
> to implement URI resolution relative to xml:base, but I thought I'd
> check around first and see if there's automated support of this in  
> some
> toolkit.

Hi Greg,

I'm actually using Stax (the streaming API for XML).  The  
implementation I use is called Woodstox:

          http://woodstox.codehaus.org/

(but there are a few others out there).  No builtin xml:base support  
but it's easy to write a little wrapper around XMLStreamReader to  
spot xml:base attributes and maintain a stack of base URIs.

I'm using java.net.URI to do the URI handling/resolution/ 
relativization.  Seems to be working okay... so far...
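
The "stack of base URIs" approach can be sketched roughly as below. This is not the Stax/Woodstox wrapper itself; it is a Python analogue using xml.etree.ElementTree.iterparse as the streaming reader, and the element names, host names and document URI are assumptions for illustration.

```python
import io
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def iter_ids(stream, document_uri):
    """Yield (tag, absolute id URI) for each element carrying an id."""
    bases = [document_uri]  # stack of base URIs, innermost last
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            # Every element pushes one entry so "end" can always pop.
            # urljoin(base, "") returns base unchanged.
            bases.append(urljoin(bases[-1], elem.get(XML_BASE, "")))
            if "id" in elem.attrib:
                yield elem.tag, urljoin(bases[-1], elem.get("id"))
        else:
            bases.pop()
            elem.clear()  # free memory while streaming large documents

# Hypothetical feature document and retrieval URI.
xml = (b'<FEATURES xml:base="feature/">'
       b'<FEATURE id="F1"/>'
       b'<FEATURE xml:base="http://other.example.org/x/" id="F2"/>'
       b'</FEATURES>')
results = list(iter_ids(io.BytesIO(xml), "http://das.example.org/das/genome/1/"))
for tag, uri in results:
    print(tag, uri)
# FEATURE http://das.example.org/das/genome/1/feature/F1
# FEATURE http://other.example.org/x/F2
```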

         Thomas.


From Gregg_Helt at affymetrix.com  Wed Feb  8 10:12:22 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 8 Feb 2006 02:12:22 -0800
Subject: [DAS2] RE: Working with xml:base in Java?
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9C0@msex02.affymetrix.com>


> -----Original Message-----
> From: Thomas Down [mailto:td2 at sanger.ac.uk]
> Sent: Wednesday, February 08, 2006 12:45 AM
> To: Helt,Gregg
> Cc: DAS/2
> Subject: Re: Working with xml:base in Java?
> 
> 
> On 7 Feb 2006, at 19:00, Helt,Gregg wrote:
> 
> >
> > 	Thomas, I'm wondering what toolkits you're using for binding XML
> > to Java objects?  And particularly how you are dealing with resolving
> > URIs when xml:base is used.  So far I've mostly used various
> > implementations of SAX and DOM -- I've found some reports of builtin
> > xml:base support in Xerces SAX/DOM, but it's still unclear.
> >
> > 	I've been avoiding the issue up till now.  It won't be too hard
> > to implement URI resolution relative to xml:base, but I thought I'd
> > check around first and see if there's automated support of this in
> > some
> > toolkit.
> 
> Hi Greg,
> 
> I'm actually using Stax (the streaming API for XML).  The
> implementation I use is called Woodstox:
> 
>           http://woodstox.codehaus.org/

I would like to check out Stax, haven't used it before.
 
> (but there are a few others out there).  No builtin xml:base support
> but it's easy to write a little wrapper around XMLStreamReader to
> spot xml:base attributes and maintain a stack of base URIs.
> 
> I'm using java.net.URI to do the URI handling/resolution/
> relativization.  Seems to be working okay... so far...

That's what I was thinking about when I said it wouldn't be too hard to
implement... But that was yesterday.  A long time ago.

Now I've taken a detour into re-reading the XML Base spec
http://www.w3.org/TR/xmlbase/, and things don't seem so easy.

I _think_ if there's at least one xml:base attribute in the element
hierarchy above where you're trying to determine a base URI, and
resolution of those xml:base attributes yields an absolute URI, it's all
good, that's the  base URI.  But on the other hand if this resolution
yields a relative URI instead of an absolute URI I'm not sure what
happens -- I would guess it's an error, but I can't see anywhere in the
XML Base spec that spells this out.  And if there's no xml:base to use
to determine a base URI, things get weird:
   if the document is "encapsulated within another entity", the base URI
is the URI of that entity (I have no idea if DAS/2 docs could appear in
such a context)
   otherwise the base URI is the URI used to retrieve the document
   oh, except if you burrow down into the spec pointers to RFC 2396
http://www.ietf.org/rfc/rfc2396.txt, if the request gets redirected you
need to make sure the base URI is the last URI used in the redirect
   oh yeah, and apparently external entity declarations can affect all
of this in ways I don't understand
   and there's probably other gotchas I've missed...

Now from the server side, none of this is really an issue.  Just pick
from a multitude of variants that XML Base allows when you send
responses to the client.  From the client side, if we really want DAS/2
to support XML Base (and I think we do), things get tricky.  It's
definitely pushing me towards using libraries that provide builtin
support for XML Base.

	Gregg


From dhoworth at mrc-lmb.cam.ac.uk  Wed Feb  8 11:54:54 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Wed, 08 Feb 2006 11:54:54 +0000
Subject: [DAS2] Ontologies in DAS/2
In-Reply-To: <Pine.LNX.4.58.0602071913010.29889@sumo.ctrl.ucla.edu>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>	<Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>	<Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>	<e00e781866375762b29061d2b510a10e@fruitfly.org>
	<Pine.LNX.4.58.0602071913010.29889@sumo.ctrl.ucla.edu>
Message-ID: <43E9DC0E.30809@mrc-lmb.cam.ac.uk>

Allen Day wrote:
> Why have you chosen to make <id/> a subelement of <term/>?  Is it expected
> that there will be multiple IDs for a given term, and if so is there not a
> primary ID?  Having an id attribute is a de facto standard for DOM libs, so
> you can call getElementById().

I'm curious about the DAS use of id attributes, especially given an 
expectation to use getElementById().

DAS has attributes that are URLs - they include the '/' character.

But getElementById() is an HTML or XHTML DOM method I believe.

Both HTML 4 and XHTML require that id attributes be of type ID, I think, 
and the ID type does not permit '/' characters (IDs are Names).

I find it pretty confusing that DAS uses an attribute that is called id 
that isn't an ID. And I'm curious to know if getElementById() works with 
it? Sounds like a sloppy implementation of the DOM. Or did I miss something?
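
As one data point, here is how a single DOM implementation behaves. This is a small sketch with Python's xml.dom.minidom (other DOMs may well differ), and the id value "type/T12345" is a made-up example. In this implementation an attribute that merely happens to be named "id" is not found by getElementById() until it is explicitly flagged as an ID, and the "/" in the value is not checked.

```python
from xml.dom import minidom

# Hypothetical document; no DTD, so "id" has no declared ID type.
doc = minidom.parseString('<TYPES><TYPE id="type/T12345"/></TYPES>')

# Without a declaration the lookup finds nothing.
before = doc.getElementById("type/T12345")

# Flag the attribute as an ID by hand, then look it up again.
elem = doc.getElementsByTagName("TYPE")[0]
elem.setIdAttribute("id")
after = doc.getElementById("type/T12345")

print(before is None)  # True
print(after is elem)   # True
```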

Cheers, Dave


From dalke at dalkescientific.com  Wed Feb  8 16:36:11 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Wed, 8 Feb 2006 16:36:11 +0000
Subject: ids and URLs (was Re: [DAS2] Ontologies in DAS/2)
In-Reply-To: <43E9DC0E.30809@mrc-lmb.cam.ac.uk>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>	<Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>	<Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>	<e00e781866375762b29061d2b510a10e@fruitfly.org>
	<Pine.LNX.4.58.0602071913010.29889@sumo.ctrl.ucla.edu>
	<43E9DC0E.30809@mrc-lmb.cam.ac.uk>
Message-ID: <701ca3642e6ed0ea3e12fbd26991a3fb@dalkescientific.com>

Dave Howorth wrote:
> I'm curious about the DAS use of id attributes, especially given an 
> expectation to use getElementById().
>
> DAS has attributes that are URLs - they include the '/' character.
>
> But getElementById() is an HTML or XHTML DOM method I believe.
>
> Both HTML 4 and XHTML require that id attributes be of type ID, I 
> think, and the ID type does not permit '/' characters (IDs are Names).
>
> I find it pretty confusing that DAS uses an attribute that is called 
> id that isn't an ID. And I'm curious to know if getElementById() works 
> with it? Sounds like a sloppy implementation of the DOM. Or did I miss 
> something?

We've been talking about this and related matters most of the
day.  It started with Thomas' question "How do I get all of the
exons in the database which are from Vega?"  (Vega being some
other database.)

All of the features which are exons from Vega have the same DAS
data type.  This means he wants to do a feature query with
type = <the DAS type id>

He needs to get the DAS type id.  He can get all of the exons
using an ontology search.  But he wants to search for the string
"exon".  Given the discussion yesterday, will the type query
support "ontology='exon'" or must he use some other service to
convert "exon" to "SO:exon" or to "http://some/server.url"?

Suppose for now it is "SO:exon".  He does

     http://das.server/../types?ontology=SO:exon

That gets all of the exon types, but not the ones from Vega.
The Vega types have a source="Vega".  DAS type queries do
not support searching on that field.

PROPOSAL:  Add a "source=" (case-insensitive substring search)
field to the types query.  (I don't think there is any contention
here so I'll add it.)

     http://das.server/../types?ontology=SO:exon;source=Vega

That comes back with a single DAS type.
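
A minimal sketch of what the server-side filter for this query might look like, assuming types are held as simple records. The record layout, the ids other than T12345 and the source names other than Vega are made up for illustration.

```python
# Hypothetical in-memory type records.
TYPES = [
    {"id": "T12345", "ontology": "SO:exon", "source": "Vega"},
    {"id": "T12346", "ontology": "SO:exon", "source": "Ensembl"},
    {"id": "T20001", "ontology": "SO:intron", "source": "vega-havana"},
]

def search_types(types, ontology=None, source=None):
    """Filter types: exact match on ontology, case-insensitive
    substring match on source, as in the proposal above."""
    hits = []
    for t in types:
        if ontology is not None and t["ontology"] != ontology:
            continue
        if source is not None and source.lower() not in t["source"].lower():
            continue
        hits.append(t)
    return hits

exon_vega = search_types(TYPES, ontology="SO:exon", source="vega")
print([t["id"] for t in exon_vega])  # ['T12345']
```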

He now wants to search for all features with that type.  What
does he use for the query?  Is it (assuming proper escaping)

    http://das.server/../features?type=http://das.server/../type/T12345

?  That's rather excessive, especially if there are many
DAS types derived from the given ontology term.

All around people want to use "T12345" for that, and not the full
URL.  Are there people who do want to use the full URL?

The current system comes from saying the URL is the identifier
for a DAS object.

If as Dave points out we have a "id" which is a simple string
(of the format /[A-Za-z0-9_]+/ or so) then there's no problem.
We can use that for the query, as

    http://das.server/../features?type=T12345
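
A short sketch of how a client might enforce the simple-id form before building such a query. The pattern is the one given above; the function names and the features URL are assumptions.

```python
import re

# The id form suggested above: letters, digits and underscore only.
ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")

def is_simple_id(value):
    """True if the whole string matches the simple id form."""
    return ID_PATTERN.fullmatch(value) is not None

def features_by_type(features_query_url, type_id):
    """Build a features query for one type id (hypothetical helper)."""
    if not is_simple_id(type_id):
        raise ValueError("not a simple id: %r" % type_id)
    # No escaping is needed because the id alphabet is URL-safe.
    return "%s?type=%s" % (features_query_url, type_id)

# Hypothetical query URL.
url = features_by_type("http://das.example.org/das/genome/1/features", "T12345")
print(url)  # http://das.example.org/das/genome/1/features?type=T12345
print(is_simple_id("http://das.example.org/type/T12345"))  # False
```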

PROPOSAL: do not use a URL for the identifier for objects

That fixes a few problems:
   - xml:base is no longer an issue; these are ids and not URLs
   - the names are short and sweet

It introduces a few problems.

Problem 1: a feature has a type.  How can the client get from the
type id to the type information if there is no URL to resolve?

   Solution 1: add a 'id=' term to the types query URL, eg
      http://das.server/../types?id=T12345
   (or possibly call it 'type=')

   Solution 2: append "/" + type id to the types query URL, eg
     http://das.server/../types/T1234

   Solution 3: have both an 'id' and an 'href' attribute

   Solution 4: the client downloads all the types and compares
    the id fields.

QUESTION:
   At Hinxton nearly all the DAS servers have only one or two types.
Ensembl has 45 types and Allen's has about 50.  Is it reasonable
to have clients just go ahead and download everything and not
worry about a query language?  Is Chado any different?

Problem 2: a feature can refer to its parent and part features.
It can refer to regions on other features.  How does a client get
information about the feature given the feature id?

   Solution 1: add a 'id=' term to the features query URL
   Solution 2: append "/" + feature id to the feature query URL
   Solution 3: have both an 'id' and an 'href' attribute


We discussed this a lot and decided on

PROPOSAL: add an 'id=' query to the types and features query.

We decided against solution 2 because of me - I don't like
working with URLs that way.  Thomas pointed out that an 'id='
query is useful, eg, if a feature has three parts then a client
can request

    http://das.server/../features?id=part1,part2,part3
(NOTE: we're also thinking of proposing this syntax for an 'OR'
query over the same term
    http://das.server/../features?id=part1;id=part2;id=part3
)
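
Both spellings of the 'OR' query can be handled by one small parser. A sketch, assuming ";" separates query terms as in the examples above and that ids are simple enough not to need percent-decoding; the function name is made up.

```python
def parse_id_query(query):
    """Collect ids from 'id=a,b,c' and/or repeated 'id=a;id=b' terms."""
    ids = []
    for pair in query.split(";"):
        key, _, value = pair.partition("=")
        if key == "id":
            # A comma-separated value contributes several ids.
            ids.extend(v for v in value.split(",") if v)
    return ids

comma_form = parse_id_query("id=part1,part2,part3")
repeat_form = parse_id_query("id=part1;id=part2;id=part3")
print(comma_form)   # ['part1', 'part2', 'part3']
print(repeat_form)  # ['part1', 'part2', 'part3']
```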

I pointed out that having both means there are two ways in the
server to look-up by id - extra machinery.

QUESTION: Who will want to refer to features and types by URL?

Possibilities:
   - hypothetical model where the queries return a list of URLs and
the client (through HTTP pipelining) asks only for the ones it
doesn't have already; saving bandwidth.  THIS IS NOT A USE CASE!

   - request a feature in a specific format (but that can be done
       through the query URL)

   - RDF people who want individually named items (not a use case)

We couldn't come up with a case where someone would want to
refer to features and types as an individually named URL!

For segments there is a use case - you can ask for sequence by
range, and that's through the segment URLs.  However, that could
be done with the segment query URL so it's not a strong use case.
In any case, it hasn't been a problem so I'll put that off for now.

That being the case, there's no need to consider "Solution 2".
Why have URLs if no one wants to use them?

What did come up during the discussion here was that we had
planned to use URLs for writeback.  That model seems rather
nice.  "DELETE" and "PUT" to the correct URLs, rather than
going through a "POST to delete.cgi?type_id=", etc.
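
To make the contrast concrete, here is a sketch of what such requests could look like from a client, built with Python's urllib.request but not sent. The feature URL and the XML payload are assumptions; the writeback format itself is not defined yet.

```python
from urllib.request import Request

# Hypothetical per-feature URL exposed by a writeback-enabled copy.
feature_url = "http://das.example.org/das/genome/1/feature/F1"

# Removing a feature: the HTTP method says what to do, the URL says to what.
delete_req = Request(feature_url, method="DELETE")

# Replacing a feature: PUT the new representation to the same URL.
put_req = Request(
    feature_url,
    data=b'<FEATURE id="F1"/>',  # placeholder payload
    headers={"Content-Type": "application/xml"},
    method="PUT",
)

print(delete_req.get_method(), delete_req.full_url)
print(put_req.get_method(), put_req.full_url)
```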

The model for writeback was something like "ask server to make
a copy, with region A:C available for editing.  User works
with region.  User commits region back to server."

In that case, the request for region might as easily make a
copy of the source, available through a special URL visible
only to that one user.  In this copy it can expose "url="
attributes for editing, perhaps also with a "writeable=" field
because some features will not be editable for that user.

I complained yesterday about "writeable" but that was because
for the general purpose server the concept of "writeable" was
user-specific and not appropriate.  In this writeback model
it's just fine.

Another thing came up during discussion of this.  Roy yesterday
proposed the idea of a simple server which only supports getting
"everything".  It doesn't support the DAS query specification.
That is, it only supports

   http://das.server/../types
   http://das.server/../features

and fetching those returns everything.  This is useful for small
data sets because those could be simple files, like

   http://das.server/../types.xml
   http://das.server/../features.xml

Still, for that case there would need to be "feature/F1", "type/T2",
etc.  In essence, a duplicate of every record.

Last December during discussion people said there was no use
case for this sort of flat-file oriented server.  This was not
a design goal.

Thomas mentioned that there is a use case.  Uploading of DAS
tracks to a server.  People complain now that it's hard to
do that.  With this url-less model people can upload a small
number of documents (or a .zip file of a directory) with
the versioned source, types, and features data.

<!-- this is "sources.xml" -->
<VERSION>
   <COORDINATES ... />
   <CAPABILITY type="types" query_url="types.xml">
     <FORMAT name="das2xml" />
     <SUPPORTS name="all" />
   </CAPABILITY>
   <CAPABILITY type="features" query_url="features.xml">
     <FORMAT name="das2xml" />
     <SUPPORTS name="all" />
   </CAPABILITY>
</VERSION>

<!-- this is features.xml -->
<FEATURES>
</FEATURES>

<!-- this is types.xml  -->
<TYPES>
</TYPES>

There is no need to have an "exploded" copy of all of the
records in parallel to the types and features xml files.
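
As a rough sketch of how a client might consume such a flat-file
upload, the following reads a sources document and maps each
capability type to its query_url.  It assumes Python's standard
ElementTree, leaves out the elided COORDINATES element, and uses no
namespace; the function name capability_urls is made up for
illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for "sources.xml" (COORDINATES omitted).
SOURCES_XML = """
<VERSION>
  <CAPABILITY type="types" query_url="types.xml">
    <FORMAT name="das2xml" />
    <SUPPORTS name="all" />
  </CAPABILITY>
  <CAPABILITY type="features" query_url="features.xml">
    <FORMAT name="das2xml" />
    <SUPPORTS name="all" />
  </CAPABILITY>
</VERSION>
"""

def capability_urls(sources_xml):
    """Map each CAPABILITY type to the document that answers it."""
    root = ET.fromstring(sources_xml.strip())
    return {cap.get("type"): cap.get("query_url")
            for cap in root.iter("CAPABILITY")}

urls = capability_urls(SOURCES_XML)
```

With only these two entries to follow, the whole data set is three
files and no per-record URLs.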

Big Advantage:

Stylesheets are much easier to write.  Refer to fields by
short id instead of long URL.

Conclusion:
   Proposal 1: "id"s are of the form /[A-Za-z0-9_]+/
   Proposal 2: FEATURE and TYPE elements have an optional "url"
             (or "href") attribute
   Proposal 3: the feature and type queries support a 'id=' search
   Proposal 4: the type query supports a "source=" search
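
Proposals 1 and 3 can be illustrated with a short sketch: check an
id against the proposed pattern, then build an 'id=' search.  The
server URL and both function names are assumptions for the example,
not something the spec defines.

```python
import re
from urllib.parse import urlencode

# Proposal 1: ids are of the form /[A-Za-z0-9_]+/
ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")

def is_valid_id(text):
    """True if the whole string matches the proposed id pattern."""
    return ID_PATTERN.fullmatch(text) is not None

def id_query(query_url, ident):
    """Proposal 3: look up one record through an 'id=' search."""
    if not is_valid_id(ident):
        raise ValueError("not a valid id: %r" % ident)
    return query_url + "?" + urlencode({"id": ident})

# Hypothetical feature query URL.
url = id_query("http://das.example/volvox/1/features", "F001")
```

Because the pattern excludes "/" and ":", a short id can never be
mistaken for a URL.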

Churn factor:
   Allen's server doesn't need the 'type/' and 'feature/' fields
   Gregg and others don't need to worry about xml:base any more.
   Type and feature lookups need to track the query URL as well
     as the type and feature id
   We need a new 'id=' search capability

These don't seem big in a programming sense, more a conceptual one.

					Andrew
					dalke at dalkescientific.com


From cjm at fruitfly.org  Wed Feb  8 18:03:41 2006
From: cjm at fruitfly.org (chris mungall)
Date: Wed, 8 Feb 2006 10:03:41 -0800
Subject: ids and URLs (was Re: [DAS2] Ontologies in DAS/2)
In-Reply-To: <701ca3642e6ed0ea3e12fbd26991a3fb@dalkescientific.com>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>	<Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>	<Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>	<e00e781866375762b29061d2b510a10e@fruitfly.org>
	<Pine.LNX.4.58.0602071913010.29889@sumo.ctrl.ucla.edu>
	<43E9DC0E.30809@mrc-lmb.cam.ac.uk>
	<701ca3642e6ed0ea3e12fbd26991a3fb@dalkescientific.com>
Message-ID: <94bafd156da54842f9093244ca6083d1@fruitfly.org>


I mostly skim the messages here, so I may be missing something, but
I'm a little confused by this:

On Feb 8, 2006, at 8:36 AM, Andrew Dalke wrote:

>
>     http://das.server/../types?ontology=SO:exon

I don't understand this - SO:exon isn't an ontology

>
> That gets all of the exon types, but not the ones from Vega.
> The Vega types have a source="Vega".  DAS type queries do
> not support searching on that field.
>
> PROPOSAL:  Add a "source=" (case-insensitive substring search)
> field to the types query.  (I don't think there is any contention
> here so I'll add it.)
>
>     http://das.server/../types?ontology=SO:exon;source=Vega

What does 'types' return? A type from an ontology (eg SO:exon) or 
something else? Why would source be recorded here? Surely source would 
be a valid constraint on a feature query, but not a type query.

Perhaps it's the case that in DAS a 'type' means some kind of arbitrary 
grouping (eg features of type X and source Y), and 'ontology' means a 
term/type from an ontology. If it isn't too late I'd suggest changing 
these conventions.


From Gregg_Helt at affymetrix.com  Wed Feb  8 18:12:46 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 8 Feb 2006 10:12:46 -0800
Subject: [DAS2] Why use URIs for feature IDs?
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>

      Regarding using URIs for DAS features, here's the quote from Paul
Prescod that I used in the original DAS/2 grant proposal addressing the
question "why use URIs?".  From
http://www.prescod.net/rest/rpc_for_get.html : 

You can give that URI address to anyone, anywhere and they can reuse it.
In particular this means that we can compose applications that were not
thought of in advance. Google is an example of an application that was
composed "after the fact" out of URIs. Yahoo is another...There are a
raft of deployed W3C recommendations that work with information related
through URIs. Many of these are XML-related specifications that work as
well in API-like applications as in user interface-based applications.
These include: XPath, XPointer, XSLT, XLink, RDF, XInclude, XQuery,
xml-stylesheet.  Information published through HTTP URIs can be combined
through XInclude, queried and sorted through XQuery and XSLT, visually
rendered with xml-stylesheet, related through RDF, linked through XLink,
pointed into through XPointer.


From dalke at dalkescientific.com  Wed Feb  8 19:24:06 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Wed, 8 Feb 2006 19:24:06 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>
Message-ID: <4d48d7749f04bb97f6791a5726230fee@dalkescientific.com>

Yes.  I like URLs.  I've been so in favor of URLs that until
this morning I had in the spec that the "id" *is* the URL.
There was no short form for the URL.  (still /is/ no short form
since it hasn't changed ;)

That meant several things:
   - everyone needs to disambiguate through the xml:base to
      figure out if two features are the same. (Neither Gregg nor
      Thomas liked that)

   - queries of the style we are doing become more complex
      (type=http://www.server/path/to/das/type/000A956826C8  vs.
       type=000A956826C8 )

   - passing URLs about makes for bigger XML, hence slower.

The first is technical.  The second is emotional - that sort of
query looks ugly.  The last is .. I can't speak for the last.
In an earlier email I showed how a different site layout can
be as efficient as any id scheme.  Quickly, use
    http://www.../volvox/1/S      <- versioned source URL
    http://www.../volvox/1/T?..   <- types query url
    http://www.../volvox/1/T001   <- type urls
    http://www.../volvox/1/F?..   <- feature query urls
    http://www.../volvox/1/F001   <- feature urls
and don't worry about any sort of hierarchy in the system.
Everything has the xml:base of "http://www.../volvox/1/"
so relative URLs are trivial strings.
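
A small sketch of that resolution step, using urljoin from Python's
standard library.  The host das.example stands in for the elided
"www..." above and is purely hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical xml:base shared by every document of this versioned source.
XML_BASE = "http://das.example/volvox/1/"

def resolve(relative):
    """Turn a relative URL from a document into an absolute one."""
    return urljoin(XML_BASE, relative)

type_url = resolve("T001")
feature_url = resolve("F001")
query_url = resolve("F?type=T001")
```

Two features are the same exactly when their resolved URLs compare
equal, which is the disambiguation step mentioned earlier.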


Several said "just chop off the last bit of the URL to get
the id" or "combine some base feature URL with the feature
id to get the full URL."

Why is that useful?  Lincoln said on today's phone call that
he wants both a URL and an id, and expected that both would
be there.

I'm now going to be either stubborn or irritating or both.
Why have an id at all?  That is, why at all have a short string
(say of the form /[A-Za-z0-9_]+/) when the URL is there and
meets all the functional requirements of an identifier?

(I'll use 'id' to refer to a short string, 'url' to refer to
a URL.  Both are identifiers.  I should be using 'uri' for
the latter, I know.  See comment below.)

Today I thought I came up with one reason to have ids and
to have a non-existent URL for a <FEATURE> element.  I
think now that I was wrong.

My use case was for uploading data to the Ensembl viewer
to display a new DAS track.  Put all of the types into one
file, in the types XML format.  Put all of the features into
another file in a features XML format.  Use arbitrary ids for
cross referencing, because there is no URL for them - they
don't exist in any form outside the document.

Upload them to the server.  The server reassembles the
annotations by cross referencing the ids.

I now see that that's a mistake.  As Gregg corrected me,
they use URIs not just URLs.  They could use
"das_private:ABC123" or a fully-qualified URL or a
xml:base and the partial URL or whatever scheme.  All
the server needs to know is how to compare the two URI
strings.  It's free to rename the strings if need be.

(Could it keep the original URLs?  Perhaps, but the
original data might not be accessible.  Consider an
exon predictor whose output you want to upload to the
Ensembl viewer.  There is no URL for that.)


Given that this isn't a valid use case for having an 'id'
and not having a 'url' now I ask again, what's the point of
having *both* a unique URL and a unique 'id' for the elements?

Tradition?  Elegance?

With Dave Howorth's comment about the specialness of 'id'
I can see changing the attribute name to 'url'.... or 'uri'.

I've got to write a couple paragraphs for Nomi now.
I'll leave with the following comment from

http://tbray.org/ongoing/When/200x/2006/01/08/No-New-XML-Languages

> Designing XML Languages is hard. It's boring, political,
> time-consuming, unglamorous, irritating work. It always takes longer
> than you think it will, and when you're finished, there's always this
> feeling that you could have done more or should have done less or got
> some detail essentially wrong.


					Andrew
					dalke at dalkescientific.com


From Gregg_Helt at affymetrix.com  Wed Feb  8 21:46:37 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Wed, 8 Feb 2006 13:46:37 -0800
Subject: [DAS2] Re: New DAS/2 server for codesprint
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9C5@msex02.affymetrix.com>

Following Steve's suggestion, I'm focusing on the region around YGL076C
(also known as RPL7A) on the yeast genome to get a small slice of
feature XML back from the codesprint server for a region where I know
what the genes  should be:

http://das.biopackages.net/das/genome/yeast/S228C/feature?overlaps=chrVII/364251:366080;type=SO:gene

This returns the YGL076C gene with three CDS and two introns.  A nearby
snoRNA also gets returned. 
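
For reference, a sketch of how a client might assemble a query like
the one above.  The helper name feature_query is made up, and the
values are not percent-encoded, which matches the URL as quoted but
may not be what a real client should do.

```python
def feature_query(query_url, filters):
    """Join (name, value) filters with ';', as in the query quoted above."""
    return query_url + "?" + ";".join("%s=%s" % pair for pair in filters)

url = feature_query(
    "http://das.biopackages.net/das/genome/yeast/S228C/feature",
    [("overlaps", "chrVII/364251:366080"), ("type", "SO:gene")],
)
```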

	Gregg

> -----Original Message-----
> From: Chervitz, Steve 
> Sent: Monday, February 06, 2006 5:03 PM
> To: Helt,Gregg; Allen Day
> Cc: DAS/2
> Subject: Re: [DAS2] Re: New DAS/2 server for codesprint
> 
> 
> 
> There's a gene (RPL7A) with two introns on chr7 at roughly 
> 366kbp - 364kbp: 
> http://db.yeastgenome.org/cgi-bin/locus.pl?locus=YGL076C
> 
> Most genes with introns in cerevisiae (which aren't many) 
> have just a single intron that creates a small 5' exon, such 
> as the alpha and beta tubulin genes on chr13. Tub1 is on 
> chr13 at 99Kbp, and tub3 is also on chr13 at 23Kbp. So the 
> first 100Kb of chr13 would be another region to try. 
> http://db.yeastgenome.org/cgi-bin/locus.pl?locus=tub1
> 
> Steve
> 
> 
> > From: "Helt,Gregg" <Gregg_Helt at affymetrix.com>
> > Date: Mon, 6 Feb 2006 16:14:55 -0800
> > To: Allen Day <allenday at ucla.edu>
> > Cc: DAS/2 <das2 at portal.open-bio.org>
> > Conversation: [DAS2] Re: New DAS/2 server for codesprint
> > Subject: RE: [DAS2] Re: New DAS/2 server for codesprint
> > 
> > 
> > Allen, can you recommend a reasonable region on yeast to do 
> a features 
> > query that will return features with some hierarchy (like 
> > transcript/exons)?
> > 
> > Thanks,
> > Gregg
> > 
> > 
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at portal.open-bio.org 
> > http://portal.open-bio.org/mailman/listinfo/das2
> 
> 


From Steve_Chervitz at affymetrix.com  Wed Feb  8 21:47:18 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Wed, 08 Feb 2006 13:47:18 -0800
Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint,
 8 Feb 2006h
Message-ID: <C00FA6E6.1BD57%Steve_Chervitz@affymetrix.com>

Notes from the DAS/2 teleconference for the code sprint, 8 Feb 2006

$Id: das2-teleconf-2006-02-08.txt,v 1.1 2006/02/08 21:51:14 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed E., Gregg Helt
  CSHL: Lincoln Stein
  Sanger: Thomas Down
  Sweden: Andrew Dalke
  UC Berkeley: Nomi Harris
  UCLA: Allen Day, Brian O'connor
        
Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


Agenda:

* progress report for grant renewal
* ontologies
* ids and urls
* style sheets
* status reports

Topic: Progress report for grant
--------------------------------

gh: needs to be in the mail by 5pm tomorrow, to be included as a hard
copy addendum to grant. will improve chances of funding for next cycle.
review will be done by end of feb.
nh: no later than 4pm pst today. state what you've accomplished since
Nov 1 and now, in particular this week. one or two paragraphs.
gh: 
1. highlight significant enhancements
2. involvement of sanger, ebi
3. registry work from andreas, http spec for that registry
4. writeback 

ad: andreas worked on registry server, will send write up soon post
teleconference.

[A] Everyone write up 1-2 paragraphs of progress and send to Nomi ASAP


Topic: Ontologies
-----------------

gh: concerned about ontol attrib in types doc because, do we want it
to be possible for a type to be an instantiation of multiple terms in
the ontology.
ls: will make it hard to validate. one type = many ontol terms. don't
like it. types will be specializations of SO terms and will not have
multiple parents.
gh: thinking about people doing curation. if a type is anchored to one
term in the ontol, and a feat can have only one type, a feat won't be
able to refer to >1 term in SO.
ls: any use case for this?
gh: still exploring this. eg., both a computed feature and an exon?
ls: no. separate category for predicted genes.
gh: is there something for 'computed exon' or 'computed cds'?
ls: think so.
sc: multiple branches like go?
ls: multiple relationship types do exist. something can be is_a or
part_of.
I wanted das/2 to be limited to what you can say in SO, with notion
that you can extend it. e.g., three predicted exons one with genefinder,
exonerate, etc.

ad: given a string 'exon' how does that get used to query server?
ls: find exon SO term, download list of types from das server, find
everything that inherits from exon ontology term.
clients need to know how to search the SO list.
they will have a local copy of SO that they'll refresh from time to
time.
gh: client isn't required to know the full structure, except maybe to
search higher-level terms. but the term in the ontology attribute is
sufficient. 
ls: could just search types and desc to find exons, but that relies on
implementer describing their types correctly.
gh: if a client wants to understand an ontol, the best way to go is
via what allen's proposing, searching via ontology das, preferably via
NCBO server.
ad: what is the actual string we're searching on?
aday: name or definition, or id.
ls: client should have a copy of the SO. unambiguous in this opinion.
client has SO, looks through types XML to find what the local types
are which the server supports which match what it's looking for in the
SO.
here's a flowchart:

- client downloads SO, caches.
- client downloads seq types list, caches.
- user searches to find exon
- client looks to find matches against 'exon', maybe 5 hits.
- prompts user to select which he's looking for
- client looks thru cached types xml to find server types of SO term
  that user selected
- client does feature query.
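
The flowchart above can be sketched roughly as follows.  Everything
in the data is a placeholder: the accessions are made up (not real SO
identifiers), the type ids are invented, and each term is given a
single parent to keep the example short.

```python
# Placeholder stand-in for a locally cached SO: accession -> (name, parent).
SO_TERMS = {
    "SO:1": ("region", None),
    "SO:2": ("exon", "SO:1"),
    "SO:3": ("predicted_exon", "SO:2"),
}

# Placeholder stand-in for a cached types document: (type id, SO accession).
SERVER_TYPES = [
    ("foobar", "SO:2"),
    ("genefinder_exon", "SO:3"),
    ("band", "SO:1"),
]

def search_terms(text):
    """Accessions of cached SO terms whose name contains the search text."""
    return sorted(acc for acc, (name, _) in SO_TERMS.items() if text in name)

def is_a(accession, ancestor):
    """Walk parent links to see whether accession is ancestor or below it."""
    while accession is not None:
        if accession == ancestor:
            return True
        accession = SO_TERMS[accession][1]
    return False

def server_types_for(ancestor):
    """Local type ids whose SO term is the chosen term or inherits from it."""
    return [tid for tid, acc in SERVER_TYPES if is_a(acc, ancestor)]

hits = search_terms("exon")          # candidates shown to the user
chosen = hits[0]                     # pretend the user picked the first one
type_ids = server_types_for(chosen)  # types to use in the feature query
```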

ad: what is the string that the user is looking for URL or string?
ls: in type xml how do we indicate the term?
gh: we've been discussing this the past few days
ls: why not replace the term with SO accession number? then we don't
have to figure out the correct representation of ontology in an
xml. can finish this by friday. chris mungall has weighed in, and xml
version of SO ontology is not completely stable.

gh: perfectly ok for client to know nothing about SO and treat these
as unique string.
ls: right. names will eventually be things like 'exon'.
aday: chris's main complaint is that the doc didn't validate. I didn't
have a dtd. got it and now it validates. I thought this was a done
deal. there is a document written that describes how to do what we're
talking about.
ls: the only thing to be resolved, in types xml document, how do we
refer to SO terms?
aday: an attribute there that allows you to put in uri. it's a
relative url that points to ontology das server to get obo xml for
that term.
ad: how do I go from string 'exon' to find out what that is?
aday: 
ls: lets say administrator of das server has local type called
foobar. associated w/ url for SO 'exon' term. andrew's question is,
user wants to search for exons, how to go from 'exon' to correct url
in SO to find what types correspond to that? what's to go from 'exon'
to foobar. 
aday: search SO for exon, local types.
there's a filter on ontology that lets you search all terms and
definitions
gh: there's a reqt now that server must understand parent child
relationships in ontology.
aday: server could do xpath query to pull out the terms you're
interested in w/o understanding ontology
ls: user types 'exon' returns all feats in the genome that are exons.
aday: two servers, feat and ontol server
gets all types from feat server, each has url to ontology das server,
maybe multiple ontology das servers. each must have its ontology
searched returns supported or not. client assembles all search results
from static obo xml documents,
gh: for most clients this will be irrelevant. user will get a list of
types - genscan, blat alignment, for things they may be interested
in. they don't need to understand ontology nor does client. there may
be a url to look up info about the term. this is the typical
case. more sophisticated use cases can be put off till later.
ls: in types xml can we have two attributes, url and accession
so_accession="SO:12414", other will be url for obo xml.

[A] types will have separate attributes for URI and SO accession number

Topic: IDs and URLs
-------------------

ad: discussion about searching for exon, use case: client goes to
server to get list of all types, wants all features of a
given type in a given range. may filter based on contains or inside,
das-type=xxxxx. 
talking about that being a URL to get full name for it.
what is the thing you send to server to ask for the types?
gh: url
ad: make this an id so it's not a long complex url. just an id
specific to that server. such that you go to feat query url and get
it.
ls: can just choose the last component of the url, type id.
ad: why have ability to get feature type individually?
ls: will have to be uniquified, by adding url to types query.
ad: feat query =
ls: isn't this the way it was?
gh: every feat has unique uri.
ad: talking about filtering and querying.
ls: just give it the id not the whole url.
ad: now it is the url
ls: should be the id
does it make sense to be something that another server has defined?
probably not. just a local type.

[lots of back and forth here, didn't catch it all...]

ad: do we need ability to refer to feature or type by url?
gh: yes. for making rdf statements about das2 features.
ad: who will do this?
gh: I will if no one else does. web technology is moving in this direction.
ls: we want every object a das server serves to be referencable as a
url/uri. as for filtering mechanism, for type filter we can just use
the id of the type, a short string.
ad: agree, as of this morning the url and id are same thing.
ls: a relative uri, by definition the server should implicitly attach
the versioned data source url to it.
ad: xml processors
ls: define the way the filter query mechanism, hard code implicit
paths into it.
ls: featuresquery?type=something if 'something' has no slashes, server
implicitly adds http://myserver/das/types/...
ad: don't like pasting urls and strings together to get things.
don't like queries with implicit logic like that.
ls: perfectly happy saying you can use urls in the query strings. I'd
go with short ids
ad: proposing we have both, id and href. here's the case: people
uploading to server want to provide a das track, can provide two
documents. works well for < 1000 features

gh: we have to have uri for features.
ad: why?
gh: I will send you the page from the first grant.
ls: main reason is: to avoid namespace clashes when integrating data sets.
td: what do you mean by integrate?
ls: view of features from 4 diff annotation groups, want to search for
a particular feature by its id, need to indicate which data source
it's coming from.
td: won't you be keeping track of which data source anyway?
you never get a track that's a mixture of diff sources.
gh: dangerous to do this.
td: there must be something keeping track of where each track is from.
gh: my assumption is that this is with uri
td: there's nothing that constrains a server to only use uris from itself.
gh: we sacrificed this when we went with capabilities.
ls: a server can emit a set of features, some use relative uris and
some absolute ones. if my server starts emitting features with
affymetrix uris, the assumption is these originate from affymetrix.
uris indicate that they originate from diff places even though you may
physically get them from a das server at a different location.
gh: thomas is right. given a feature uri you have no way to tell which
das server it came from. clients must keep track of this themselves.
ls: we wanted to divorce the origin of the feat from the server that
serves it. should be possible to serve features that come from
somewhere else.
gh: making feature uri opaque was deliberate.
ad: when you do a feat query it could return the whole db. so the
server must know how to return a feature document that contains all
features. that server must know all the data.
gh: don't see problem
ad: all features and types have id and url. different. url is optional
gh: no, required. also, not url, but uri.
ad: ok. why should all records have a uri?
gh: compatibility with semantic web/rdf, lsid, future proofing.
ad: if they want to they can, if not they shouldn't be required. no
one is doing rdf now.

ls: what issue are you concerned about with respect to uri?
ad: like ontology search. give me all features of this das type, you
then have to give the url. this is different than id.
ls: completely happy treating id as the last component of uri and
doing a paste. why don't you like the paste?
ad: you can get features from two diff places, each ending with same
last word.
ls: what query is it that allows you to filter by feature id? we have
positional, type filtering and getting a single feature from server of
origin.
gh: there shouldn't be an id filter. just resolving uri for that
feature.
ls: we can't search a feature by regex match on its id.
ad: i'm not saying that. I'm suggesting that the url be optional.
ls: I don't understand the point.
gh: why can't uri be required?
ad: see use case in email today subject="ids and urls". involves
uploading das tracks to a server.

[some trouble: not everyone has seen it]

ls: I say we have a policy that if there is big discussion, the email
should come more than 30 minutes before conf call.
gh: I've read most of it and am still confused.
ls: I still don't understand it after reading. you'll have to rephrase
it.
ad: all types and features have id and url.
ls: no, explain in a follow up email.
ad: ok

[A] Andrew will send follow up email to elaborate on his "ids and urls" use
case

[A] Everyone will try to absorb andrew's ids and urls use case

Topic: Style Sheets
-------------------

ad: how do you refer to elements in style sheets, by id or url?
gh: no opinion
ad: if everything is referred to by id, that makes style sheets easier to
write.
gh: has anyone gotten to implementation of style sheets for das/2?
ad: my proposal was a straw man.

Topic: Status reports
---------------------

gh: reading lots of specs. after yesterday's rant about xml:base last
night, implemented a stack. works fine for our current server.
we shouldn't throw out xml:base because of a few edge cases. we might
want to specify which subset of xml:base we use.
checked in code for igb client, does capabilities, specify feat,
types, segments. trouble when modeling sequences.

ee: working on das/2 client. building new widget as gregg asked for.

ad: working with andreas write up for registry.

td: understanding the spec. xml parsing.
gh: you are using stacks, have experience with it?
td: yes, less painful. streaming api for xml.
gh: tried xom. picky about namespaces. difficult to use with spec
that's not stable.
td: some trouble with dom
gh: sources, types, segments I use dom (small document). for features
use sax
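
A rough sketch of the stack approach to xml:base mentioned above,
using the streaming iterparse API from Python's standard library.
This is not the IGB code: the document, hosts, and function name are
invented, and the elements carry no namespace to keep it short.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# The xml: prefix is bound to this namespace by the XML specification.
XML_BASE_ATTR = "{http://www.w3.org/XML/1998/namespace}base"

SAMPLE = (
    b'<FEATURES xml:base="http://das.example/volvox/1/">'
    b'<FEATURE id="F001" />'
    b'<FEATURE xml:base="http://other.example/das/" id="F002" />'
    b'</FEATURES>'
)

def resolved_feature_ids(stream, document_url):
    """Stream through a features document, resolving each FEATURE id."""
    base_stack = [document_url]
    ids = []
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            # Push this element's base; an absent xml:base inherits the parent's.
            base_stack.append(urljoin(base_stack[-1], elem.get(XML_BASE_ATTR, "")))
            if elem.tag == "FEATURE":
                ids.append(urljoin(base_stack[-1], elem.get("id")))
        else:
            base_stack.pop()
            elem.clear()  # free memory once the element is finished
    return ids

ids = resolved_feature_ids(
    io.BytesIO(SAMPLE), "http://das.example/volvox/1/features.xml"
)
```

The stack depth never exceeds the element nesting depth, so this
works the same for a small types document and a large features one.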

nh: progress with apollo. list of versioned sources, show segments,
user picks, gets features. something that the parser doesn't like.
not sure where the problem comes from.

sc: working on setting up internal das server on 64bit machine
here. refining the pipeline for generating files for loading the affy
das server with updated data for various public and affy data
sources. also writing up and posting meeting notes.

aday: message from gavin about ontology responses. caching issue caused
trouble with model/controller. chris's obo dtd.
dependencies for server rpm were finished. now building the rpm.

td: parsing xml from codesprint server. a few things are matching the
spec from a few weeks back. prop, loc elements. will these be changed?
aday: feature xml?
td: yes. I'm still absorbing the changes, dozens of mails about feat
properties.
gh: more important is loc element, splitting into id and range. used
to be one thing, now is two. one is id, other is start,end,strand.
aday: will look into today.

nh: I'm also taking charge of getting grant progress report
done. especially need allen re: server, andreas via registry.

gh: any reports for write back.
brian: some work on that. not ready for prime time.
gh: roy?
ad: some talk about this puts and deletes on the urls.
gh: let's talk about it tomorrow.


From td2 at sanger.ac.uk  Wed Feb  8 23:20:34 2006
From: td2 at sanger.ac.uk (Thomas Down)
Date: Wed, 8 Feb 2006 23:20:34 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>
Message-ID: <7C8A251D-7712-489A-BAEB-4701F46BAF1D@sanger.ac.uk>

[I should prefix my comments here by saying that I don't actually  
have a terribly strong opinion on this matter *except that* I'd  
really like the spec to be explicit on how feature query language  
works...  Does it go .../features?type=exon, .../features?type=types/ 
exon, or .../features?type=http://das2.sanger.ac.uk/ensembl35/types/ 
exon?].

Anyway, I'm still having a bit of trouble seeing why features need  
individually GETable URIs.  The use case I remember from the  
conference call was that it would be nice to be able to describe DAS/ 
2 features in RDF documents.  I guess that makes sense to me, but for  
this purpose is there anything wrong with a URI like:

            http://das2.sanger.ac.uk/ensembl35/features#id12345

This seems compatible with Andrew's ID proposal.

My memory of RDF/DAML/OWL/etc is that most objects which get  
described in such documents are actually fragment identifiers in  
larger documents, rather than individually GETable entities.  Am I  
missing something here?
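
To make the fragment idea concrete, here is a small sketch that
splits such a URI into the document to fetch and the id inside it,
using urldefrag from Python's standard library.  It illustrates the
URI shape only and makes no claim about what the spec will adopt.

```python
from urllib.parse import urldefrag

uri = "http://das2.sanger.ac.uk/ensembl35/features#id12345"

# The part before '#' is what a client would GET; the fragment names
# one record inside the returned document.
document_url, feature_id = urldefrag(uri)
```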

                Thomas


On 8 Feb 2006, at 18:12, Helt,Gregg wrote:

>       Regarding using URIs for DAS features, here's the quote from  
> Paul
> Prescod that I used in the original DAS/2 grant proposal addressing  
> the
> question "why use URIs?".  From
> http://www.prescod.net/rest/rpc_for_get.html :
>
> You can give that URI address to anyone, anywhere and they can  
> reuse it.
> In particular this means that we can compose applications that were  
> not
> thought of in advance. Google is an example of an application that was
> composed "after the fact" out of URIs. Yahoo is another...There are a
> raft of deployed W3C recommendations that work with information  
> related
> through URIs. Many of these are XML-related specifications that  
> work as
> well in API-like applications as in user interface-based applications.
> These include: XPath, XPointer, XSLT, XLink, RDF, XInclude, XQuery,
> xml-stylesheet.  Information published through HTTP URIs can be  
> combined
> through XInclude, queried and sorted through XQuery and XSLT, visually
> rendered with xml-stylesheet, related through RDF, linked through  
> XLink,
> pointed into through XPointer.
>
>
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2


From dalke at dalkescientific.com  Thu Feb  9 09:35:19 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 09:35:19 +0000
Subject: [DAS2] Re: New DAS/2 server for codesprint
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9C5@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C5@msex02.affymetrix.com>
Message-ID: <f9b035387c49e21c30707eb2df61c3b2@dalkescientific.com>

In the das2/scratch directory is a program called "verify_examples.py"
I ran it against

http://das.biopackages.net/das/genome/yeast/S228C/feature?overlaps=chrVII/364251:366080;type=SO:gene

as follows

[guest276:das/das2/scratch] dalke% python ./verify_examples.py
load FEATURES  
"http://das.biopackages.net/das/genome/yeast/S228C/feature? 
overlaps=chrVII/364251:366080;type=SO:gene"
! expected root tag  
'{http://www.biodas.org/ns/das/genome/2.00}FEATURES' got  
'{http://www.biodas.org/ns/das/2.00}FEATURELIST'
^D
[guest276:das/das2/scratch] dalke%

That is, it's a simple command language.  The command to
load a URL of the given type is

   load FEATURES "url"

In this case it warns that the top-level name is "FEATURELIST"
instead of "FEATURES", which is something that was changed
last summer, I think.

Saving locally and editing by hand I then get

! expected root tag  
'{http://www.biodas.org/ns/das/genome/2.00}FEATURES' got  
'{http://www.biodas.org/ns/das/2.00}FEATURES'

That's because

<FEATURES
   xmlns="http://www.biodas.org/ns/das/2.00"

should be

<FEATURES
   xmlns="http://www.biodas.org/ns/das/genome/2.00"

according to the spec.  I don't like the namespace though.
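
The root-tag check reported above can be sketched in a few lines.
This is not the code of verify_examples.py, only an illustration of
the same comparison using Python's standard ElementTree; the function
name check_root is made up.

```python
import xml.etree.ElementTree as ET

EXPECTED_NS = "http://www.biodas.org/ns/das/genome/2.00"

def check_root(xml_text, local_name):
    """Return a warning string if the root tag is not {EXPECTED_NS}local_name."""
    root = ET.fromstring(xml_text)
    expected = "{%s}%s" % (EXPECTED_NS, local_name)
    if root.tag != expected:
        return "! expected root tag %r got %r" % (expected, root.tag)
    return None

old = check_root(
    '<FEATURELIST xmlns="http://www.biodas.org/ns/das/2.00"/>', "FEATURES"
)
new = check_root(
    '<FEATURES xmlns="http://www.biodas.org/ns/das/genome/2.00"/>', "FEATURES"
)
```

ElementTree folds the namespace into the tag name, so one string
comparison catches both the wrong element name and the wrong
namespace.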

*** Does anyone mind if we change the namespace URL?  ***

Next is
* fatal: file not found: http://www.biodas.org/dtd/das2feature.dtd

That occurs because the XML says that it requires the DTD
to be understood (with the 'standalone="no"' at the top)

Taking that out and the DTD link,

*  
file:///Users/dalke/cvses/das/das2/scratch/biopackages_features.xml:10: 
4: error: attribute "type" not allowed at this point; ignored

That should be "type_id" instead of "type".  I've used "id"
as a convention to indicate that something is a URL inside of
DAS.  Change it to "url" or "uri" instead?

The PARENT should be after the LOC.  However, I think that the
ordering requirement is too fragile so I'll change the schema
so the elements can go in more arbitrary order.

There was an issue with the <PROP> element.  I'll explain
in the next email.

*  
file:///Users/dalke/cvses/das/das2/scratch/biopackages_features.xml:95: 
57: error: element "LOC" from namespace  
"http://www.biodas.org/ns/das/genome/2.00" not allowed in this context

That came from

   <FEATURE
     id="feature/Affymetrix_YG-S98:3128_at"
     type_id="type/SO:PCR_product"
     name="Affymetrix_YG-S98:3128_at"
   >

       <LOC id="segment/chrVII" range="319671:774428:-1"/>
       <LOC id="segment/chrVII" range="735985:736081:1"/>

   </FEATURE>


The RNC had a bug - it only allowed a single LOC element.  Fixed.
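
A sketch of reading several LOC elements from one FEATURE, assuming
the range attribute always has the three fields start:end:strand as
in the example above (whether any field is optional is not settled
here).  Namespaces are left out and the function names are invented.

```python
import xml.etree.ElementTree as ET

FEATURE_XML = """
<FEATURE id="feature/Affymetrix_YG-S98:3128_at"
         type_id="type/SO:PCR_product"
         name="Affymetrix_YG-S98:3128_at">
  <LOC id="segment/chrVII" range="319671:774428:-1"/>
  <LOC id="segment/chrVII" range="735985:736081:1"/>
</FEATURE>
"""

def parse_range(text):
    """Split 'start:end:strand' into three integers."""
    start, end, strand = text.split(":")
    return int(start), int(end), int(strand)

def locations(feature_xml):
    """List (segment id, start, end, strand) for every LOC of a feature."""
    feature = ET.fromstring(feature_xml.strip())
    return [(loc.get("id"),) + parse_range(loc.get("range"))
            for loc in feature.findall("LOC")]

locs = locations(FEATURE_XML)
```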


I've updated the schema and committed a copy of a features
data set from Allen's server to CVS under
    das/das2/scratch/biopackages_features.xml


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Thu Feb  9 10:00:45 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 10:00:45 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <7C8A251D-7712-489A-BAEB-4701F46BAF1D@sanger.ac.uk>
References: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>
	<7C8A251D-7712-489A-BAEB-4701F46BAF1D@sanger.ac.uk>
Message-ID: <631777ea4ff08f68b6dde657effc18fe@dalkescientific.com>

Thomas Down wrote:
> Anyway, I'm still having a bit of trouble seeing why features need 
> individually GETable URIs.  The use case I remember from the 
> conference call was that it would be nice to be able to describe DAS/2 
> features in RDF documents.  I guess that makes sense to me, but for 
> this purpose is there anything wrong with a URI like:
>
>            http://das2.sanger.ac.uk/ensembl35/features#id12345

For that matter, the spec doesn't at present say that the
individual URLs need to be fetchable.  A client could treat them
as opaque and unresolvable URLs and still do what it wants.

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Thu Feb  9 11:15:18 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 11:15:18 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <4d48d7749f04bb97f6791a5726230fee@dalkescientific.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>
	<4d48d7749f04bb97f6791a5726230fee@dalkescientific.com>
Message-ID: <4baed72f7f93ba4a90e521da4e27cc58@dalkescientific.com>

I'm going to incur the possibility of pitchforks here.. :)

Me:
> Yes.  I like URLs.  I've been so in favor of URLs that until
> this morning I had in the spec that the "id" *is* the URL.
> There was no short form for the URL.  (still /is/ no short form
> since it hasn't changed ;)
>

> I'm now going to be either stubborn or irritating or both.
> Why have an id at all?  That is, why at all have a short string
> (say of the form /[A-Za-z0-9_]/) when the URL is there and
> meets all the functional requirements of an identifier?

Here's the change - or not change since it reflects the
current spec.

Features and types have a single "id".  That id is a uri
in all its glory.

Referring to Dave's email, yes, special characters are
included - this is a uri.  Looking at
   http://blog.bitflux.ch/wiki/GetElementById_Pitfalls
the getElementById refers to the attribute with type "ID"
which happens to be named "id" for XHTML and SVG.  Given
   http://www.w3.org/TR/xml-id/
I have added xml:id as a common attribute for all of the
DAS items for independent and optional identification of
an element in a document.

There is no short-form id for features and types.  Queries
are done using the full URL.  For example, to find all elements
of type "http://www.example.com/das2/human/1/type/T12345" the
query (assuming the query url is ".../1/feature_search.cgi") is

   http://www.example.com/das2/human/1/feature_search.cgi?
type=http%3A%2F%2Fwww.example.com%2Fdas2%2Fhuman%2F1%2Ftype%2FT12345
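
A minimal sketch of how a client might build that query in Python.
urllib.parse.urlencode percent-escapes the ":" and "/" characters in the
type URL; the function name type_query_url is an illustrative assumption,
not something from the spec.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def type_query_url(query_url, type_url):
    """Build a feature query that filters on one fully qualified type URL."""
    return query_url + "?" + urlencode({"type": type_url})


if __name__ == "__main__":
    url = type_query_url(
        "http://www.example.com/das2/human/1/feature_search.cgi",
        "http://www.example.com/das2/human/1/type/T12345",
    )
    print(url)
    # A server decodes the parameter back to the original type URL.
    print(parse_qs(urlsplit(url).query)["type"][0])
```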

The single and sole exception is for range queries.  Each
segment has a URL and a "name" attribute.  This name is a
unique short-form identifier used for range queries.  The name is
of the form /[A-Za-z_][A-Za-z_0-9]*/ .  To do a range query
for all features on a segment with name X and range 50 to 100 use
the format "X/50:100" and the query looks like

   http://www.example.com/das2/human/1/feature_search.cgi?
overlaps=X%2F50%3A100
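
The same range query can be sketched in Python.  The regular expression is
the name pattern given above; the function name overlaps_query is a
hypothetical helper, and only a single range is handled here.

```python
import re
from urllib.parse import quote

# Segment names must look like /[A-Za-z_][A-Za-z_0-9]*/ .
SEGMENT_NAME = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def overlaps_query(query_url, segment_name, start, end):
    """Build an 'overlaps' range query of the form name/start:end."""
    if not SEGMENT_NAME.fullmatch(segment_name):
        raise ValueError("not a valid segment name: %r" % (segment_name,))
    region = "%s/%d:%d" % (segment_name, start, end)
    # safe="" makes quote() escape "/" as well as ":".
    return query_url + "?overlaps=" + quote(region, safe="")


if __name__ == "__main__":
    print(overlaps_query(
        "http://www.example.com/das2/human/1/feature_search.cgi",
        "X", 50, 100))
```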

The reason for this exception is three-fold:
   - the syntax for merging the URL and two/three fields became ugly
   - Gregg wants to send multiple ranges at a time, if the client
       knows enough about what it has already
   - the client may consult one of several reference servers given
      the coordinate system for the annotations.

These do not hold for feature types (features are independent
objects; there will be at most a handful in most servers; the
types are specific to the given set of features)

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Thu Feb  9 11:41:35 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 11:41:35 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <4baed72f7f93ba4a90e521da4e27cc58@dalkescientific.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>
	<4d48d7749f04bb97f6791a5726230fee@dalkescientific.com>
	<4baed72f7f93ba4a90e521da4e27cc58@dalkescientific.com>
Message-ID: <0255ae96de376ffd89e2af0d9766aed6@dalkescientific.com>

> I'm going to incur the possibility of pitchforks here.. :)

To mollify or intensify the pitchforks ...

Several people have said that "the id is the last component
of the URL" or "the URL is the base + '/' + the id".

That's what DAS1 did.  I don't like URL construction like
this.  It means the specification imposes a URL organization
when it doesn't need to do so.  For example,

Allen prefers his URLs like this
    /feature?this=that    is the query interface
    /feature/F00001       is an identifier for the features

I might like it like this
    /feature_search.cgi?..   is the query interface
    /feature/F00001       is an identifier for the features

Still others as
    /features?this=that   is the query interface
    /feature/exon/A1      is an identifier for the features
    /feature/contig/A     is another identifier for the features

** NOTE: in this case the "last term of the URL" is not
sufficient as a unique short-form id  **

Or still others as
    /cgi-bin/fsearch.rb?this=that   is the query interface
    /data/F1                   is an identifier for the features
    /data/F2                   is another identifier
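
To make the note about the "last term of the URL" concrete, here is a small
Python sketch.  The two feature URLs are invented for illustration (they
follow the /feature/exon/... and /feature/contig/... layout above but share
a final path component); nothing here comes from a real server.

```python
from urllib.parse import urlsplit


def last_component(url):
    """Return the final path component of a URL, as DAS1-style ids assumed."""
    return urlsplit(url).path.rstrip("/").rsplit("/", 1)[-1]


# Hypothetical feature URLs: distinct features whose last component matches.
FEATURE_URLS = [
    "http://www.example.com/das2/feature/exon/A1",
    "http://www.example.com/das2/feature/contig/A1",
]

if __name__ == "__main__":
    by_url = {url: "record for " + url for url in FEATURE_URLS}
    by_short_id = {last_component(url): "record for " + url
                   for url in FEATURE_URLS}
    # Keyed by full URL both records survive; keyed by the last
    # component the second record silently replaces the first.
    print(len(by_url), len(by_short_id))
```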

One advantage to hard-coding the URL organization into the
spec is the tradition from DAS1, and the general practice of
expecting one-off URL schemes during web scraping.

Another is that people understand it more easily.  It's
a lot easier to write out examples in one naming scheme than
it is to say "using the identifier from the record ..."

On the other hand, the programming is easier.

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Thu Feb  9 11:48:02 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 11:48:02 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <631777ea4ff08f68b6dde657effc18fe@dalkescientific.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C1@msex02.affymetrix.com>
	<7C8A251D-7712-489A-BAEB-4701F46BAF1D@sanger.ac.uk>
	<631777ea4ff08f68b6dde657effc18fe@dalkescientific.com>
Message-ID: <2878cecec027ce28826c48d1a3a68e30@dalkescientific.com>

Churn factor:

The only part of the spec that changes is the query interface
for types.  The type feature filter must take a full URL and
not a partial URL nor a non-existent 'short id'.

Allen's server does not support queries given the full URL.

Here's what the spec says -- note that it quotes the previous
draft and I added some comments.

> Query parameter "type"
>
>   type=type_url
>
> Example:
>   $FQ?type=http%3A%2F%2Fwww.biodas.org%2FtypeA
>
> Match features with the given feature type.
>
> XXX the previous version of this document says
>
> Match features of the given type. A type is one of:
>   1. a typeid returned by the feature type document described
>   earlier. Only features exactly matching the type are returned.
>
>   2. a sequence ontology term, such as "exon". Features matching the
>   term or *any of its ISA descendents* are returned.
>
>   3. a sequence ontology accession number, such as SO:12345. Features
>   matching the accession number or *any of its ISA descendents* are
>   returned.
>
>   4. a reserved type beginning with the namespace "das:". The only such
>   reserved type is currently "das:feature-lock", used for feature
>   updating.
>
> XXX I think we should only have it do 1.  For 2 and 3 use the query
> parameter 'ontology'.  For 4, use a different query term, or don't use
> locks as features.

Based on the discussion yesterday, this changes to:
   1. we support this one, with fully resolved URLs
   2. the searching is done in the client so this option is removed
   3. the searching is done in the client so this option is removed
   4. we can always define "http://www.biodas.org/spec/special-type"
as a URL to send to the server if we want to define a special query.


					Andrew
					dalke at dalkescientific.com


From Gregg_Helt at affymetrix.com  Thu Feb  9 15:27:57 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Thu, 9 Feb 2006 07:27:57 -0800
Subject: [DAS2] Why use URIs for feature IDs?
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9C9@msex02.affymetrix.com>

I think that as Thomas says, using URI fragment notation, 
http://das2.sanger.ac.uk/ensembl35/features#id12345
is a perfectly valid URI and thus is acceptable as a feature ID.

But, if the intent is to construct feature URIs using fragment
identifiers in combination with either ID attributes (as defined in a
DTD) or xml:id attributes, as an alternative approach to URI = ID
attribute with xml:base resolution, I think it would get messy.

As I understand it a fragment identifier approach would mean
URI = (URL of doc feature XML is embedded in) + "#" + value of feature's
ID attribute.  But then if the feature is returned as part of a query,
say:
http://das2.sanger.ac.uk/ensembl35/features?overlaps=chr1/10000:20000
and the feature has attribute id="id12345", then the feature URI using
standard fragment notation would be 
http://das2.sanger.ac.uk/ensembl35/features?overlaps=chr1/10000:20000#id12345
In other words there would be a very large number of possible feature
URIs, with query string gunk in them, identifying the same feature.
Unless we define a nonstandard way of constructing fragment identifiers
that chops off the query string.

Instead of something nonstandard I'd rather use xml:base, adhere to the
XML Base spec, and allow the feature id attribute to be full or relative
URIs.  Then specifying in the top element that 
xml:base = http://das2.sanger.ac.uk/ensembl35/features/, a feature
returned by the features query with attribute id="id12345"
resolves to the feature URI:
http://das2.sanger.ac.uk/ensembl35/features/id12345

There might even be a way to fiddle with xml:base and id to use a "#"
instead of the last "/", though I'm not at all sure about that.
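
A short Python sketch of the two resolutions compared above, using
urllib.parse.urljoin from the standard library as a stand-in for XML Base
resolution.  The helper name resolve_feature_id is an assumption made for
this example.

```python
from urllib.parse import urljoin


def resolve_feature_id(base_uri, id_attribute):
    """Resolve a full or relative feature id against a base URI."""
    return urljoin(base_uri, id_attribute)


if __name__ == "__main__":
    # xml:base approach: one stable URI, whatever query returned the feature.
    print(resolve_feature_id(
        "http://das2.sanger.ac.uk/ensembl35/features/", "id12345"))

    # Fragment approach: the query string of the retrieved document is
    # carried along into the feature URI.
    print(resolve_feature_id(
        "http://das2.sanger.ac.uk/ensembl35/features?overlaps=chr1/10000:20000",
        "#id12345"))

    # A full URI in the id attribute passes through unchanged.
    print(resolve_feature_id(
        "http://das2.sanger.ac.uk/ensembl35/features/",
        "http://www.example.com/das2/feature/F1"))
```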

	gregg

> From: Thomas Down [mailto:td2 at sanger.ac.uk]
> Sent: Wednesday, February 08, 2006 3:21 PM
> To: Helt,Gregg
> Cc: DAS/2
> Subject: Re: [DAS2] Why use URIs for feature IDs?
> 
> [I should prefix my comments here by saying that I don't actually
> have a terribly strong opinion on this matter *except that* I'd
> really like the spec to be explicit on how feature query language
> works...  Does it go .../features?type=exon,
> .../features?type=types/exon, or
> .../features?type=http://das2.sanger.ac.uk/ensembl35/types/exon?].
> 
> Anyway, I'm still having a bit of trouble seeing why features need
> individually GETable URIs.  The use case I remember from the
> conference call was that it would be nice to be able to describe
> DAS/2 features in RDF documents.  I guess that makes sense to me, but for
> this purpose is there anything wrong with a URI like:
> 
>             http://das2.sanger.ac.uk/ensembl35/features#id12345
> 
> This seems compatible with Andrew's ID proposal.
> 
> My memory of RDF/DAML/OWL/etc is that most objects which get
> described in such documents are actually fragment identifiers in
> larger documents, rather than individually GETable entities.  Am I
> missing something here?
> 
>                 Thomas
> 
> 
> On 8 Feb 2006, at 18:12, Helt,Gregg wrote:
> 
> >       Regarding using URIs for DAS features, here's the quote from
> > Paul Prescod that I used in the original DAS/2 grant proposal
> > addressing the question "why use URIs?".  From
> > http://www.prescod.net/rest/rpc_for_get.html :
> >
> > You can give that URI address to anyone, anywhere and they can
> > reuse it.  In particular this means that we can compose applications
> > that were not thought of in advance. Google is an example of an
> > application that was composed "after the fact" out of URIs. Yahoo is
> > another...There are a raft of deployed W3C recommendations that work
> > with information related through URIs. Many of these are XML-related
> > specifications that work as well in API-like applications as in user
> > interface-based applications.  These include: XPath, XPointer, XSLT,
> > XLink, RDF, XInclude, XQuery, xml-stylesheet.  Information published
> > through HTTP URIs can be combined through XInclude, queried and
> > sorted through XQuery and XSLT, visually rendered with
> > xml-stylesheet, related through RDF, linked through XLink, pointed
> > into through XPointer.
> >
> >
> >
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/das2


From dalke at dalkescientific.com  Thu Feb  9 15:43:27 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 15:43:27 +0000
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9C9@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C9@msex02.affymetrix.com>
Message-ID: <5920623233379c4200775188315082bb@dalkescientific.com>

Gregg
> As I understand it a fragment identifier approach would mean
> URI = (URL of doc feature XML is embedded in) + "#" + value of 
> feature's
> ID attribute.

As I understand it the part after the '#' is a query language
which is document type specific and used by the client.  DAS does not
define how that query language is used, so it has no meaning in the
DAS world.

http://www.ietf.org/rfc/rfc2396.txt

4. URI References

    The term "URI-reference" is used here to denote the common usage of a
    resource identifier.  A URI reference may be absolute or relative,
    and may have additional information attached in the form of a
    fragment identifier.  However, "the URI" that results from such a
    reference includes only the absolute URI after the fragment
    identifier (if any) is removed and after any relative URI is resolved
    to its absolute form.  Although it is possible to limit the
    discussion of URI syntax and semantics to that of the absolute
    result, most usage of URI is within general URI references, and it is
    impossible to obtain the URI from such a reference without also
    parsing the fragment and resolving the relative form.
  ....
4.1. Fragment Identifier

    When a URI reference is used to perform a retrieval action on the
    identified resource, the optional fragment identifier, separated from
    the URI by a crosshatch ("#") character, consists of additional
    reference information to be interpreted by the user agent after the
    retrieval action has been successfully completed.  As such, it is not
    part of a URI, but is often used in conjunction with a URI.

       fragment      = *uric

    The semantics of a fragment identifier is a property of the data
    resulting from a retrieval action, regardless of the type of URI used
    in the reference.  Therefore, the format and interpretation of
    fragment identifiers is dependent on the media type [RFC2046] of the
    retrieval result.  The character restrictions described in Section 2
    for URI also apply to the fragment in a URI-reference.  Individual
    media types may define additional restrictions or structure within
    the fragment for specifying different types of "partial views" that
    can be identified within that media type.

    A fragment identifier is only meaningful when a URI reference is
    intended for retrieval and the result of that retrieval is a document
    for which the identified fragment is consistently defined.


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Thu Feb  9 15:53:38 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 15:53:38 +0000
Subject: [DAS2] writeback via diffs
Message-ID: <7a182cd18dacf110341f5cec43436f38@dalkescientific.com>

Summary: We've been talking about the "update via a delta" model
as an alternative to the "lots of changes to the server" model.
Deltas mean the heavy work is done in the client (or middleware),
vs. the server.


We've been looking at the writeback spec.  It doesn't handle
the case of a complex feature with a parent/part relationship.

In the current scheme that's done as a:
   - get the write lock
   - POST the new feature (parent)
   - POST the new feature (child)
   - commit on the lock

What URL does the parent record have to point to the child?
Does the database defer referential integrity checks until
the commit on the lock?  Is this a case where the POST for
that feature returns an UPDATELIST document for every unknown/
placeholder identifier in the record?  Probably.

Another solution is to ask the server "give me two identifiers
which can be used for features".  (NOTE: must do this for
either URLs or 'short ids' because the client might guess
and override an existing feature.)  Cute. But no real takers here.


BTW, does the full DAS query system support searches of the
modified version of the server?  How does the server know that
the search request comes from a client working in an editable
view?

In talking about it we've been working on an idea we all
talked about last year; submitting a delta to the server
and moving the heavy work into the client.

That is, after the client is done locally it sends a
document which looks like

<WRITEBACK>
   <DELETE id="http://www/das/type/T12345" />
   <DELETE id="http://www/das/feature/exon/1" />
   <DELETE id="http://www/das/feature/exon/2" />
   <DELETE id="http://www/das/feature/contig/Ctg9" />
   <TYPES>
     <!-- this modifies an existing type -->
     <TYPE id="http://www/das/type/DEADBEEF">
       ... updated type information here ...
       <PROP key="name" value="Pa Cartwright" />
     </TYPE>
     <!-- this creates a new type -->
     <TYPE id="XXXXXXXXXX" >  <!-- see below for id discussion -->
     </TYPE>
  </TYPES>
  <FEATURES>
     <!-- this updates an existing feature -->
     <FEATURE id="http://www/das/feature/F9415"
          type_id="http://www/das/type/T12345">
       ...
     </FEATURE>
     <!-- this creates a new feature -->
     <FEATURE id="YYYYYY" type_id="http://www/das/type/T12345">
       ...
     </FEATURE>
   </FEATURES>
</WRITEBACK>

There are several things to note:
   - the <DELETE> elements, to remove existing types and features
   - the types and features are in the normal formats.
   - there is no way to update a part of a record; the record
       is sent in full
   - new identifiers are still a problem
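
As a rough illustration, a client could assemble such a delta with
Python's xml.etree.ElementTree.  The element and attribute names are the
ones in the example document above; the function name build_writeback and
its argument layout are assumptions, and the record bodies (the "..."
parts) are left empty.

```python
import xml.etree.ElementTree as ET


def build_writeback(deleted_ids, type_ids, features):
    """Serialize a WRITEBACK delta.

    deleted_ids: ids of types or features to remove
    type_ids:    ids of new or updated types (sent in full)
    features:    (feature id, type id) pairs, new or updated
    """
    root = ET.Element("WRITEBACK")
    for uri in deleted_ids:
        ET.SubElement(root, "DELETE", id=uri)
    types_el = ET.SubElement(root, "TYPES")
    for type_id in type_ids:
        ET.SubElement(types_el, "TYPE", id=type_id)
    features_el = ET.SubElement(root, "FEATURES")
    for feature_id, type_id in features:
        ET.SubElement(features_el, "FEATURE", id=feature_id, type_id=type_id)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_writeback(
        deleted_ids=["http://www/das/feature/exon/1"],
        type_ids=["http://www/das/type/DEADBEEF"],
        # "YYYYYY" is the placeholder id for a new feature, as above.
        features=[("YYYYYY", "http://www/das/type/T12345")],
    ))
```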


The use model for this is as follows, based on Otter.

   - get the SOURCES document, which will have

<CAPABILITY type="locks" url="http://www/../get_lock_info.pl" />

<CAPABILITY type="writeback" url="http://../post_updated_delta.py" />

   - get an exclusive write lock on a region
       - POST to the locks URL (and GET gets a list of the locks?)

       - only one region locked at a time (current spec allows the
          full query language; is that needed?)

       - user is authenticated via HTTP-level authentication
           (Q: allow https for any of this?)

       - optional timeout time in request; server may give shorter
           or longer timeout

       - user is allowed to edit all features in the given region

   - get all the features in that region  (because there may have
       been a commit before the write lock)

   - work with the data on the local copy of the server data

   - push the big red "COMMIT" button

   - client POSTs the delta to the server
       - user authentication again
       - also sends a lock-id or a nonce so the server can
           double-check that there wasn't some other change

   - server checks payload for referential integrity

The problem is the need for a URL.  We've come up with two
solutions.

   1. ask the server for things which can be used as identifiers.
These identifiers live for the life of the lock.

   2. reserve a private URI scheme, like "das-private:" followed
by a client-defined identifier.  On upload the server maps those
into valid local identifiers.  To work correctly for the client
the response document would need to contain a mapping from private
identifiers to server identifiers.

The current spec uses the latter mechanism but does not specify
how the placeholder identifier is generated.  The mapping is
essentially the "UPDATELIST" from the current spec, though with
no need to support the status field on a per item basis - it
should be an all or none transaction.
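
A sketch of the second option in Python, under several assumptions: records
are plain dicts, parent links live in a "parents" list, and new_id is some
server-supplied callable that hands out fresh identifiers.  None of those
names come from the spec.  The checks run before anything is rewritten so
that the mapping is all or none.

```python
PRIVATE_SCHEME = "das-private:"


def map_private_ids(records, new_id):
    """Replace das-private: placeholder ids with server ids.

    Returns (rewritten records, mapping from private id to server id).
    Raises ValueError, changing nothing, if the payload is inconsistent.
    """
    mapping = {}
    for record in records:
        rid = record["id"]
        if rid.startswith(PRIVATE_SCHEME):
            if rid in mapping:
                raise ValueError("duplicate placeholder: %s" % rid)
            mapping[rid] = new_id()

    # Referential integrity: every private reference must be defined
    # somewhere in this same payload.
    for record in records:
        for ref in record.get("parents", []):
            if ref.startswith(PRIVATE_SCHEME) and ref not in mapping:
                raise ValueError("unknown placeholder: %s" % ref)

    rewritten = [
        dict(record,
             id=mapping.get(record["id"], record["id"]),
             parents=[mapping.get(p, p) for p in record.get("parents", [])])
        for record in records
    ]
    return rewritten, mapping
```

The returned mapping is what the response document would carry back to the
client, in the role of the "UPDATELIST".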


Sending a delta gets rid of the DELETE and PUT (and POST update)
methods on the server.  Not ReSTful.  It places the burden on the
client for tracking the user edits instead of in the server.
But we have a good sense that it will work and is understandable.

It maps much more closely to the current Otter use.  We don't
know how Apollo/Chado wants to support writeback.

If we decide to stay with the existing ReSTy spec then our
recommendations are:

   - there's no need to support partial updates; clients send
the complete record to the server for update

   - the query language does not need to support the full
      DAS query language; only the "region" query (based on
      Otter experience)

   - there's no current need to extend the range of a lock
       nor to extend the time of the lock.

And I don't like that "lock=" is a parameter to the feature
and types URLs which creates locks for those types rather than
performs queries.  I would rather these be new URLs.

					Andrew
					dalke at dalkescientific.com


From lstein at cshl.edu  Thu Feb  9 16:12:32 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 9 Feb 2006 11:12:32 -0500
Subject: [DAS2] Why use URIs for feature IDs?
In-Reply-To: <5920623233379c4200775188315082bb@dalkescientific.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C9@msex02.affymetrix.com>
	<5920623233379c4200775188315082bb@dalkescientific.com>
Message-ID: <200602091112.33548.lstein@cshl.edu>

Hi Folks,

I've drunk the W3C Kool-Aid and do feel that a major feature of DAS/2 as it 
now stands is that all data objects are referenceable as URIs. Furthermore, I 
think it is a handy-dandy feature for them to be fetchable URLs as well, 
having, I suppose, drunk the REST Kool-Aid. For this reason, I prefer the / 
notation to the # notation. Over and above the fact that the #fragment is not 
a part of the URI at all (according to the part of the spec that Andrew 
quoted), a practical issue with the # notation is that all browsers (and, I 
believe, some client-side libraries, although not the Perl LWP) strip out the 
# and whatever follows it. The server never gets a chance to act on the 
fragment.
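
A small Python illustration of that point: splitting a URI reference into
the part a client would put on the HTTP request line and the fragment it
keeps to itself.  The helper name request_target is made up for this
sketch, and it only models the splitting, not a real HTTP client.

```python
from urllib.parse import urldefrag, urlsplit


def request_target(url):
    """Return (path-and-query sent to the server, fragment kept by client)."""
    without_fragment, fragment = urldefrag(url)
    parts = urlsplit(without_fragment)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target, fragment


if __name__ == "__main__":
    # With "#", the server only ever sees the collection URL.
    print(request_target("http://das2.sanger.ac.uk/ensembl35/features#id12345"))
    # With "/", the feature id is part of what the server receives.
    print(request_target("http://das2.sanger.ac.uk/ensembl35/features/id12345"))
```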

Since xml:base is giving us a hard time with respect to the queries, and 
causing major confusion and dissension in the group, I'd prefer to go with 
Andrew's strict idea of making all the IDs passed to the queries full URIs. 
In other words, including the properly escaped http://etc.etc in the query 
string. This is going to make it a bit annoying to debug servers from within 
browsers, but will clean up the semantics considerably and once and for all 
remove the confusion about who "owns" a feature versus who "serves" a 
feature.

Lincoln


On Thursday 09 February 2006 10:43, Andrew Dalke wrote:
> Gregg
>
> > As I understand it a fragment identifier approach would mean
> > URI = (URL of doc feature XML is embedded in) + "#" + value of
> > feature's
> > ID attribute.
>
> As I understand it the part after the '#' is a query language
> which is document type specific and used by the client.  DAS does not
> define how that query language is used, so it has no meaning in the
> DAS world.
>
> http://www.ietf.org/rfc/rfc2396.txt
>
> 4. URI References
>
>     The term "URI-reference" is used here to denote the common usage of a
>     resource identifier.  A URI reference may be absolute or relative,
>     and may have additional information attached in the form of a
>     fragment identifier.  However, "the URI" that results from such a
>     reference includes only the absolute URI after the fragment
>     identifier (if any) is removed and after any relative URI is resolved
>     to its absolute form.  Although it is possible to limit the
>     discussion of URI syntax and semantics to that of the absolute
>     result, most usage of URI is within general URI references, and it is
>     impossible to obtain the URI from such a reference without also
>     parsing the fragment and resolving the relative form.
>   ....
> 4.1. Fragment Identifier
>
>     When a URI reference is used to perform a retrieval action on the
>     identified resource, the optional fragment identifier, separated from
>     the URI by a crosshatch ("#") character, consists of additional
>     reference information to be interpreted by the user agent after the
>     retrieval action has been successfully completed.  As such, it is not
>     part of a URI, but is often used in conjunction with a URI.
>
>        fragment      = *uric
>
>     The semantics of a fragment identifier is a property of the data
>     resulting from a retrieval action, regardless of the type of URI used
>     in the reference.  Therefore, the format and interpretation of
>     fragment identifiers is dependent on the media type [RFC2046] of the
>     retrieval result.  The character restrictions described in Section 2
>     for URI also apply to the fragment in a URI-reference.  Individual
>     media types may define additional restrictions or structure within
>     the fragment for specifying different types of "partial views" that
>     can be identified within that media type.
>
>     A fragment identifier is only meaningful when a URI reference is
>     intended for retrieval and the result of that retrieval is a document
>     for which the identified fragment is consistently defined.
>
>
>
> 					Andrew
> 					dalke at dalkescientific.com

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From lstein at cshl.edu  Thu Feb  9 16:15:48 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 9 Feb 2006 11:15:48 -0500
Subject: [DAS2] RE: Working with xml:base in Java?
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9C0@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9C0@msex02.affymetrix.com>
Message-ID: <200602091115.49675.lstein@cshl.edu>

The Perl libraries provide a very simple HTTP_Base attribute. As you parse 
your way through the XML, you can change the HTTP_Base using any of the 
relative or absolute address resolution modes, so that subsequent URLs are 
correctly resolved. Unfortunately it is a SAX model, so that you have to push 
previous bases onto a stack and restore them as needed.
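
The same stack-of-bases idea, sketched in Python with xml.sax rather than
Perl.  The handler pushes a base URI for every element and pops it again at
the end tag, so each id is resolved against the innermost xml:base.  The
class and function names are invented for this example, and the GROUP
element in the usage is a made-up wrapper, not part of DAS/2.

```python
import xml.sax
from urllib.parse import urljoin


class BaseTrackingHandler(xml.sax.ContentHandler):
    """Collect FEATURE ids resolved against the xml:base in scope."""

    def __init__(self, document_url):
        super().__init__()
        # The URL used to fetch the document is the outermost base.
        self.bases = [document_url]
        self.feature_uris = []

    def startElement(self, name, attrs):
        base = self.bases[-1]
        if "xml:base" in attrs:
            # xml:base may itself be relative to the enclosing base.
            base = urljoin(base, attrs["xml:base"])
        self.bases.append(base)
        if name == "FEATURE" and "id" in attrs:
            self.feature_uris.append(urljoin(base, attrs["id"]))

    def endElement(self, name):
        # Restore the base that was in effect outside this element.
        self.bases.pop()


def resolve_feature_uris(xml_bytes, document_url):
    handler = BaseTrackingHandler(document_url)
    xml.sax.parseString(xml_bytes, handler)
    return handler.feature_uris


if __name__ == "__main__":
    doc = b"""<FEATURES xml:base="http://das2.sanger.ac.uk/ensembl35/features/">
      <FEATURE id="id12345"/>
      <GROUP xml:base="sub/"><FEATURE id="x1"/></GROUP>
      <FEATURE id="id2"/>
    </FEATURES>"""
    for uri in resolve_feature_uris(doc, "http://das2.sanger.ac.uk/ensembl35/q"):
        print(uri)
```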

Lincoln


On Wednesday 08 February 2006 05:12, Helt,Gregg wrote:
> > -----Original Message-----
> > From: Thomas Down [mailto:td2 at sanger.ac.uk]
> > Sent: Wednesday, February 08, 2006 12:45 AM
> > To: Helt,Gregg
> > Cc: DAS/2
> > Subject: Re: Working with xml:base in Java?
> >
> > On 7 Feb 2006, at 19:00, Helt,Gregg wrote:
> > > 	Thomas, I'm wondering what toolkits you're using for binding XML
> > > to Java objects?  And particularly how you are dealing with
> > > resolving URIs when xml:base is used.  So far I've mostly used
> > > various implementations of SAX and DOM -- I've found some reports
> > > of builtin xml:base support in Xerces SAX/DOM, but it's still
> > > unclear.
> > >
> > > 	I've been avoiding the issue up till now.  It won't be too hard
> > > to implement URI resolution relative to xml:base, but I thought I'd
> > > check around first and see if there's automated support of this in
> > > some toolkit.
> >
> > Hi Greg,
> >
> > I'm actually using Stax (the streaming API for XML).  The
> > implementation I use is called Woodstox:
> >
> >           http://woodstox.codehaus.org/
>
> I would like to check out Stax, haven't used it before.
>
> > (but there are a few others out there).  No builtin xml:base support
> > but it's easy to write a little wrapper around XMLStreamReader to
> > spot xml:base attributes and maintain a stack of base URIs.
> >
> > I'm using java.net.URI to do the URI handling/resolution/
> > relativization.  Seems to be working okay... so far...
>
> That's what I was thinking about when I said it wouldn't be too hard to
> implement... But that was yesterday.  A long time ago.
>
> Now I've taken a detour into re-reading the XML Base spec
> http://www.w3.org/TR/xmlbase/, and things don't seem so easy.
>
> I _think_ if there's at least one xml:base attribute in the element
> hierarchy above where you're trying to determine a base URI, and
> resolution of those xml:base attributes yields an absolute URI, it's all
> good, that's the  base URI.  But on the other hand if this resolution
> yields a relative URI instead of an absolute URI I'm not sure what
> happens -- I would guess it's an error, but I can't see anywhere in the
> XML Base spec that spells this out.  And if there's no xml:base to use
> to determine a base URI, things get weird:
>    if the document is "encapsulated within another entity", the base URI
> is the URI of that entity (I have no idea if DAS/2 docs could appear in
> such a context)
>    otherwise the base URI is the URI used to retrieve the document
>    oh, except if you burrow down into the spec pointers to RFC 2396
> http://www.ietf.org/rfc/rfc2396.txt, if the request gets redirected you
> need to make sure the base URI is the last URI used in the redirect
>    oh yeah, and apparently external entity declarations can affect all
> of this in ways I don't understand
>    and there's probably other gotchas I've missed...
>
> Now from the server side, none of this is really an issue.  Just pick
> from a multitude of variants that XML Base allows when you send
> responses to the client.  From the client side, if we really want DAS/2
> to support XML Base (and I think we do), things get tricky.  It's
> definitely pushing me towards using libraries that provide builtin
> support for XML Base.
>
> 	Gregg

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From dalke at dalkescientific.com  Thu Feb  9 16:37:12 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 16:37:12 +0000
Subject: ids and URLs (was Re: [DAS2] Ontologies in DAS/2)
In-Reply-To: <94bafd156da54842f9093244ca6083d1@fruitfly.org>
References: <C71929195D04BF48BAECD499AF717B480198C9B4@msex02.affymetrix.com>	<Pine.OSX.4.58.0602070908130.19561@adsl-68-126-147-89.dsl.pltn13.pacbell.net>	<Pine.LNX.4.58.0602071303540.15849@sumo.ctrl.ucla.edu>	<e00e781866375762b29061d2b510a10e@fruitfly.org>
	<Pine.LNX.4.58.0602071913010.29889@sumo.ctrl.ucla.edu>
	<43E9DC0E.30809@mrc-lmb.cam.ac.uk>
	<701ca3642e6ed0ea3e12fbd26991a3fb@dalkescientific.com>
	<94bafd156da54842f9093244ca6083d1@fruitfly.org>
Message-ID: <c538ce3b541fc8430dee213bf9f6b45f@dalkescientific.com>

[Top-posting summary]

I agree with Chris that the DAS "type"s aren't really types.

Chris Mungall:
> I mostly skim the messages here, so I may be missing something, but 
> I'm a little confused by this:
>
> On Feb 8, 2006, at 8:36 AM, Andrew Dalke wrote:
>
>>
>>     http://das.server/../types?ontology=SO:exon
>
> I don't understand this - SO:exon isn't an ontology

I made it up; I mean "whatever the SO term is for an exon".
I think it's SO:0005845 ("single_exon") or SO:0000147 ("exon")


>> PROPOSAL:  Add a "source=" (case-insensitive substring search)
>> field to the types query.  (I don't think there is any contention
>> here so I'll add it.)
>>
>>     http://das.server/../types?ontology=SO:exon;source=Vega
>
> What does 'types' return? A type from an ontology (eg SO:exon) or 
> something else? Why would source be recorded here? Surely source would 
> be a valid constraint on a feature query, but not a type query.

A DAS type is a somewhat strange thing, in the type sense.  It
stores:
   - the link to the ontology
   - a list of the formats available for features of that type
   - this "source" field
   - potentially some per-source data used for depiction, or
      perhaps not

Thomas Down here has this use case.

He has a program which searches for exons.  All of the annotations
it makes for a month are from that program.  He wants them to be
the same type - conceptually "the exons predicted by the program".

Some of that data could be moved into the feature. The feature
can point directly to the ontology, and have a "source".

> Perhaps it's the case that in DAS a 'type' means some kind of 
> arbitrary grouping (eg features of type X and source Y), and 
> 'ontology' means a
> term/type from an ontology. If it isn't too late I'd suggest changing
> these conventions.

That is more like the case.  Got a better name?  "class"?  ROFL.  Or 
not.

It is not a type system.  It is closer to a group than
anything else.  I agree that "type" has connotations which are
not true for this case.

					Andrew
					dalke at dalkescientific.com


From Gregg_Helt at affymetrix.com  Thu Feb  9 16:40:34 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Thu, 9 Feb 2006 08:40:34 -0800
Subject: [DAS2] Why use URIs for feature IDs?
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9CB@msex02.affymetrix.com>

Interesting, I hadn't fully absorbed part 4 of the URI spec (rfc2396).
So if I understand correctly:

If we replace everywhere we've called something a "URI" with "URI
reference" we're being correct -- a URI reference can be an absolute or
relative URI, and can also include a fragment identifier.  And according
to the spec, saying "the URI" means the absolute URI, not the relative
URI.  So to restate, I think the ids we use in DAS/2 should be URI
references.  Maybe instead of "id" or "uri" we should use "uri_ref" for
the attribute name?

I still see no reason to exclude URI references with fragment
identifiers, though I agree with Lincoln that actually resolving a URL
with a fragment is problematic.  But we're not guaranteeing that these
URI references are URLs anyway.

The capabilities "query_id" attributes are another story.  These need to
be not just URI references but also resolve via XML-Base to full URLs.
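The resolution step described above can be sketched with Python's standard library. This is only an illustration, not anything the DAS/2 spec mandates: the base URI and the example references are hypothetical, and the base is assumed to have already been determined (from xml:base or from the URL used to retrieve the document).

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical base URI, e.g. taken from xml:base or the retrieval URL.
BASE = "http://das.example.org/das2/genome/yeast/"

def resolve(uri_ref, base_uri=BASE):
    """Resolve a URI reference; return (absolute URI, fragment identifier)."""
    absolute = urljoin(base_uri, uri_ref)      # relative -> absolute
    uri, fragment = urldefrag(absolute)        # split off "#fragment", if any
    return uri, fragment

print(resolve("feature/exon123"))               # relative reference
print(resolve("features.xml#exon123"))          # relative reference with fragment
print(resolve("http://other.example.org/f/1"))  # already absolute
```

In RFC 2396 terms the first element of each result is "the URI" and the second is the fragment, which is left for the client to interpret.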

	gregg  

> From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke
> Sent: Thursday, February 09, 2006 7:43 AM
> To: DAS/2
> Subject: Re: [DAS2] Why use URIs for feature IDs?
> 
> Gregg
> > As I understand it a fragment identifier approach would mean
> > URI = (URL of doc feature XML is embedded in) + "#" + value of
> > feature's
> > ID attribute.
> 
> As I understand it the part after the '#' is a query language
> which is document type specific and used by the client.  DAS does not
> define how that query language is used, so it has no meaning in the
> DAS world.
> 
> http://www.ietf.org/rfc/rfc2396.txt
> 
> 4. URI References
> 
>     The term "URI-reference" is used here to denote the common usage of a
>     resource identifier.  A URI reference may be absolute or relative,
>     and may have additional information attached in the form of a
>     fragment identifier.  However, "the URI" that results from such a
>     reference includes only the absolute URI after the fragment
>     identifier (if any) is removed and after any relative URI is resolved
>     to its absolute form.  Although it is possible to limit the
>     discussion of URI syntax and semantics to that of the absolute
>     result, most usage of URI is within general URI references, and it is
>     impossible to obtain the URI from such a reference without also
>     parsing the fragment and resolving the relative form.
>   ....
> 4.1. Fragment Identifier
> 
>     When a URI reference is used to perform a retrieval action on the
>     identified resource, the optional fragment identifier, separated from
>     the URI by a crosshatch ("#") character, consists of additional
>     reference information to be interpreted by the user agent after the
>     retrieval action has been successfully completed.  As such, it is not
>     part of a URI, but is often used in conjunction with a URI.
> 
>        fragment      = *uric
> 
>     The semantics of a fragment identifier is a property of the data
>     resulting from a retrieval action, regardless of the type of URI used
>     in the reference.  Therefore, the format and interpretation of
>     fragment identifiers is dependent on the media type [RFC2046] of the
>     retrieval result.  The character restrictions described in Section 2
>     for URI also apply to the fragment in a URI-reference.  Individual
>     media types may define additional restrictions or structure within
>     the fragment for specifying different types of "partial views" that
>     can be identified within that media type.
> 
>     A fragment identifier is only meaningful when a URI reference is
>     intended for retrieval and the result of that retrieval is a document
>     for which the identified fragment is consistently defined.
> 
> 
> 
> 					Andrew
> 					dalke at dalkescientific.com
> 


From Gregg_Helt at affymetrix.com  Thu Feb  9 16:57:02 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Thu, 9 Feb 2006 08:57:02 -0800
Subject: [DAS2] Proposed agenda for DAS/2 code sprint teleconference, Feb 9
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9CC@msex02.affymetrix.com>

ids for features, sequences, types, etc.
stylesheets
writeback
update to NIH grant proposal
status report
 
Anything else we should add?
 

From dalke at dalkescientific.com  Thu Feb  9 18:28:48 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu, 9 Feb 2006 18:28:48 +0000
Subject: [DAS2] arbitrary data in writeback
Message-ID: <a64069f19dbc770901366184200854e2@dalkescientific.com>

The DAS spec for features looks something like this

<FEATURES>
  <FEATURE>
   ...
   <PROP key="name" value="some data goes here" />
   <PROP key="homepage" href="http://blah/" />
   <PROP key="icon" mimetype="image/png">
iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFAQAAAABUH0DFAAAANUlEQVR4nGIM1AcAAAD// 
2LiYgAA
AAD//2I6wAAAAAD//2JycAAAAAD//2L6dgAAAAD//wMAEoUDipTjFscAAAAASUVORK5CYII=
   </PROP>

   <some_non_das_namespace:curation-history>
     ...
   </some_non_das_namespace:curation-history>

   <flybase:substitution>
      ..
   </flybase:substitution>
  </FEATURE>
</FEATURES>

There are two points for extension.  One is the PROP table
which is meant to be simple.  Clients can do substring searches
of PROP elements with "value"s, as in

    prop-name=blah+blah

All clients should be able to understand these data formats, though
there is no constraint for the key names.  They are convention only.

Right now a key gets either a string, a URL, or a chunk of binary data
which is uuencoded.  (The key can be present many times; is that
a problem with Apollo?)  The latter two (URL and binary data)
are *proposals*.  They are neat, but not based on user demand.  No
one has told me that they will use it.

Allen wants one more possibility, "existence", with no associated
value at all.  Nomi says that Apollo can't round-trip that data
except by also tracking the input XML.  I don't want a "it just
exists" field and would prefer those stored with an empty string.


Then there is the support for non-DAS elements as extensions.
These can contain arbitrary XML, so long as they are not in the
DAS XML namespace.

A client can ignore elements it doesn't understand.  However,
if it does writeback of a feature it *MUST* include all elements
it doesn't understand.  I can write that into the spec.

It doesn't need to do anything with that data.  It can keep it
around as a chunk of text.  It just needs to send it back to
the server when it does the writeback.

For that matter, it doesn't even need to keep it around.  It
can throw the unknown data to the wind and work with the stuff
it does know.  Just before doing the writeback, go back to the
server and get the features again.  From the documents get the
unknown extension elements and insert them into the data - as
text! - to be sent back to the server.
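A minimal sketch of that round trip, using Python's xml.etree.ElementTree. Both namespace URIs, the feature id and the helper names are made up for illustration, and the sketch carries the unknown elements as parsed subtrees rather than as raw text; either way the client never has to interpret them.

```python
import xml.etree.ElementTree as ET

DAS_NS = "http://example.org/ns/das2"      # hypothetical DAS namespace URI
EXT_NS = "http://example.org/ns/flybase"   # hypothetical extension namespace

DOC = f"""
<FEATURES xmlns="{DAS_NS}" xmlns:flybase="{EXT_NS}">
  <FEATURE id="f1">
    <PROP key="name" value="some data goes here" />
    <flybase:substitution>opaque payload</flybase:substitution>
  </FEATURE>
</FEATURES>
"""

def split_feature(feature):
    """Separate children in the DAS namespace from extension elements."""
    known, unknown = [], []
    for child in feature:
        is_das = child.tag.startswith("{" + DAS_NS + "}")
        (known if is_das else unknown).append(child)
    return known, unknown

def build_writeback(feature, edited_props, unknown):
    """Build a FEATURE holding the edited PROPs plus the untouched extensions."""
    out = ET.Element(feature.tag, feature.attrib)
    for key, value in edited_props.items():
        ET.SubElement(out, "{" + DAS_NS + "}PROP", key=key, value=value)
    out.extend(unknown)   # re-attached as-is; the client never looks inside
    return out

root = ET.fromstring(DOC)
feature = root.find("{" + DAS_NS + "}FEATURE")
known, unknown = split_feature(feature)
result = build_writeback(feature, {"name": "renamed"}, unknown)
print(ET.tostring(result, encoding="unicode"))
```

The same helper would work if the unknown elements were fetched from the server again just before the writeback instead of being kept in memory.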

Clients may mess up and commit records without these elements.
The server will treat those as a delete of those records, because
it cannot tell if the client really knows what to do with that
data.

This is the easiest solution as a spec writer.  We have nearly
all of the format for that transaction, excepting a bit about
being able to delete.

NOTE: a server may ignore the uploaded data.  For example, it
may modify the transaction history and throw out whatever the
client sent to it -- if that's how the <transaction-history>
element is specified.

The other solution is to be more fine grained, so that clients
send deltas, like

<FEATURES>
  <FEATURE>
   ...
   <PROP key="name" value="some data goes here" />
   <PROP key="homepage" href="http://blah/" />
   <PROP key="icon" mimetype="image/png">
iVBORw0KGgoAAAANSUhEUgAAAAoAAAAFAQAAAABUH0DFAAAANUlEQVR4nGIM1AcAAAD// 
2LiYgAA
AAD//2I6wAAAAAD//2JycAAAAAD//2L6dgAAAAD//wMAEoUDipTjFscAAAAASUVORK5CYII=
   </PROP>

   <delete>
     <some_non_das_namespace:curation-history />
   </delete>

   <replace>
     <flybase:substitution>
        ..
     </flybase:substitution>
   </replace>
  </FEATURE>
</FEATURES>

but that gets complex.  You end up with a grammar for the
deltas.  Eg, "delete the first 'some_non_das_namespace:curation-history'
but not the others".  It's a harder grammar to write and a
harder semantic to implement on client and server.


I don't understand the case where complete writeback is a problem.
There was mention of a client deleting a feature when it shouldn't
have, because of extra data that it just didn't know about.

I didn't follow that at all.

Please enlighten me!  :)

					Andrew
					dalke at dalkescientific.com


From Steve_Chervitz at affymetrix.com  Thu Feb  9 19:06:03 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Thu, 09 Feb 2006 11:06:03 -0800
Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint,
 9 Feb 2006
Message-ID: <C010D29B.1BDDB%Steve_Chervitz@affymetrix.com>

Notes from the DAS/2 teleconference for the code sprint, 9 Feb 2006

$Id: das2-teleconf-2006-02-09.txt,v 1.1 2006/02/09 19:13:39 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed E., Gregg Helt
  CSHL: Lincoln Stein
  Sanger: Thomas Down, Roy
  Sweden: Andrew Dalke
  UC Berkeley: Nomi Harris, Suzi Lewis
  UCLA: Allen Day, Brian O'Connor
        
Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


[note taker missed the first 5-10 minutes]

Topic: encoded URLs
-------------------

ls: apache bug - unescaped //. must be percent encoded or apache can
run into problems
gh: most people don't bother escaping, we should make this clear in
the spec. every major library has ways of doing this automatically.

[A] update spec to state: contained urls w/in das query urls should be
encoded
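As an illustration of the action item, a URL embedded in a query string can be percent-encoded with Python's urllib.parse.quote. The server name, the endpoint and the "type" parameter used here are hypothetical.

```python
from urllib.parse import quote, unquote

type_uri = "http://das.example.org/das2/type/exon"    # hypothetical embedded URL
endpoint = "http://das.example.org/das2/genome/feature"  # hypothetical query URL

# safe="" makes quote() escape "/" and ":" as well, so no unescaped "//"
# from the embedded URL ends up inside the query string.
encoded = quote(type_uri, safe="")
query = endpoint + "?type=" + encoded
print(query)

# The server side recovers the original URL by decoding.
assert unquote(encoded) == type_uri
```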

Topic: Style sheets
-------------------

ad: see Jan 26/27 email, "style sheet question"
what i described is not the same as what das/1 style sheets supply.
we already have a mechanism
gh: embed ss in types element?
ad: or, new capability or link server for a given source.
gh: prefer this
td: easy to have a single style element
gh: would a types elem have ptr to ss or do you query for the
capability?
ad: if no one's interested we don't have to answer the
question. sounds like no one's interested in style sheets.
gh: we'll keep what you have in the spec for style sheets and move on.
ls: what is it? 
ad: yes. style is embedded in type record. it's now on a per-element
basis. 
ls: ok with this. attributes of types. is there a need for a separate
ss? true it mixes presentation with data model. people will look for the
info they need and can ignore.
ls: transition to separate sheets - visual style id pointing to ss
url. same as with html. instead of 'i' tag moved to font style info.

Topic: Writeback
----------------

gh: discussion in progress in uk. how big a change from current
writeback spec?
ad: spec: server does modification to data. this proposal: client can
now do more stuff with the data.
gh: writeback for client is considerably harder, rarer to impl.
ad: issues: can you still do searches for modified data on server?
ls: building objs from bottom up (children, to parent) so everything
has a url.
ad: each feat has parent and a part.
ls: true. temporary id mechanism, response indicates what the mapping to
local id is.
what happens is: client locks, uploads parents, children with temp
ids, does referential integrity checking, then reports mapping from
temp to local id.
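A rough sketch of the temporary-id flow just described, written from the server's point of view. The dict layout, the "tmp:" prefix and the "feature/N" id form are invented for illustration, locking is left out, and parents are assumed to arrive in the same upload.

```python
import itertools

def commit_writeback(features, next_id):
    """Assign server ids to uploaded features; return (mapping, stored).

    features: list of dicts with a temporary 'id' and an optional 'parent'
    naming another temporary id in the same upload (hypothetical layout).
    """
    temp_ids = [f["id"] for f in features]
    if len(set(temp_ids)) != len(temp_ids):
        raise ValueError("duplicate temporary id")
    # Referential integrity: every parent must be part of the same upload.
    for f in features:
        parent = f.get("parent")
        if parent is not None and parent not in temp_ids:
            raise ValueError(f"unknown parent {parent!r}")
    # Ids are only handed out after all checks pass, so a failed upload
    # leaves nothing behind.
    mapping = {temp: f"feature/{next(next_id)}" for temp in temp_ids}
    stored = [
        {**f, "id": mapping[f["id"]], "parent": mapping.get(f.get("parent"))}
        for f in features
    ]
    return mapping, stored

upload = [
    {"id": "tmp:gene1"},
    {"id": "tmp:exon1", "parent": "tmp:gene1"},
]
mapping, stored = commit_writeback(upload, itertools.count(1001))
print(mapping)   # the temp -> real id report sent back to the client
```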
gh: doing http DELETE imposes a constraint
ls: how handling id issue?
gh: you need something to create new, real id
ad: b/c they're in one transaction, server can
ls: delete is a problem because http delete only permits one at a
time. updates a problem too. post that creates new objs allows you to
create multiple new objs at same time, but push and delete only
operate one at time.
ad: at this point don't want to change data model.
ls: so everything will be a post then, under your proposal, for
writeback url.
ad: a single post.
gh: moving from http delete to a
trying to understand how this is a delta model.
ad: only updates things that changed, and listed deletions
ls: fine. writeback, create update and delete sections
td: granularity. not single characters. one feature.
ls: three transactions we previously had, put, post, and delete, and
roll up into a single transaction.
gh: when you send back a feat you ve already seen, do you restate all
the xml for that feature, since otherwise it is deleted?
ad: yes.
gh: would like the unit of ro
ls: this achieves per transaction integrity, since you don't have to
do multiple deletes. the lock idea, had to persist over multiple
transactions to allow for that atomicity.
gh: we need to keep lock so curators can guarantee that nothing
changes underneath them.
td: lock corresponds to a db transaction as well.
ls: no one's impl this writeback so there's no friction against
changing it. i'm fine with it. as long as people don't mind we're
losing a cute feature described in a grant.
gh: what does roy or ed g. think?
roy: have been involved in this. this mirrors some features that otter
does. a good idea. deletes and put aren't big winners, if updating
multiple feats and they refer to each other.
roy: whole xml doc is the transaction
ls: if anything doesn't make sense, all requests in the writeback doc
are rolled back.
roy: yes. some error messages to understand what might be going wrong.

gh: splits and merges work too? merging one feature from two, or
splitting one transcript into two.
roy: fits in well. get back two ids of new features. otter gives a lot
back in the xml after posting the data.
gh: treats id in feat as a placeholder and it sends a real id back to
you. 
ls: you're given a temporary placeholder then it gives you real id.
might want to put a formal merge and split commands. because in
proposed new system (and old) to split one exon to two, you have to
either delete the original one, or update it to change one boundary
and create a new one. you've lost the ability to keep track of the
original and the two new ones.
ad: feats have place for arbitrary annotations. creational history log
could be maintained.
ls: how upload this to a server. splitting exon into two daughters is
different from deleting and creating two new ones.
ad: no one needs this, for future.
gh: it's needed now.
ls: splitting genes into two pieces is important. people want to keep
track of this. formal merges and splits permits this tracking.
gh: my take, prefer as few verbs as possible. if we can formally define
splits and merges as combos of deletes and creates, prefer this.
ls: semantically difficult for server to know that a delete followed
by two creates is different than a split.
td: ancestor id on the features can solve this.
ad: haven't heard about this use case. features have place where you
can stick in new data. database can read it to understand history.
gh: like idea of curational track of ancestors. before, people said
we can't require dbs to do this.
td: optional property
ls: could thread it through feature properties.
ad: this version, or for 2.1?
gh: initial write back must support splits and merges.
[broad agreement]
ls: make sure it will work.
what happens when track of ancestors and the ancestor object disappears.
gh: can't assume a db has identifier for every curation in its past
state.
roy: weakness of the current otter schema, james is working on a
fix. tag a release and go back to genes as of that release.
ls: acedb had this feature to rollback to older versions of gene
model.
aday: the schema we're using has support for previous versions.
roy: tedious. big script, but a good thing to have.
ls: a few hours of more discussion to see what's involved in
supporting tracking curational merges, splits, renames, etc. to make
sure it's the right decision to put it into a curational property of
feature rather than having a formal database merges and split
operations. i'm ok doing it this way if it seems ok.
gh, aday: me too

Topic: NIH grant proposal
-------------------------

gh: i'm the bottleneck

Status reports:
---------------

gh: igb das client still. checked in code. you can get das2 client in
igb pointing to codesprint das2 server. sources, segments, types. no
features yet. working on this today. should go faster today.
ad: sent email to allen about some things about server that don't
agree with spec. properties
aday: features have no properties associated with them. do we need
valtype or href.
nh: a key with no value doesn't make sense. using 'true' if no value.
aday: ok. but need an agreement on what to do for properties with no
associated value or type
ad: can make it so.
aday: now put in empty string
ad: use for both value and href
aday: can't have both.
ad: what's interpretation if you have both?
can take out href part and have value= empty string
nh: client deals with empty value.
ad: leave it as a string
suzi: uneasy about this.
td: it does have a value, empty string.
suzi: some places where empty string doesn't make sense. data gets
dirty. if you're gonna have a tag-value structure, and may or may not
be a value, it's bad. some things are tag-value, some things just have
a value. it seems ambiguous, no guaranteed behavior.
ad: guarantee is for all keys to have a value. can be empty string.
gh: string or empty string is ok
ad: only used for clients who know what it means.
may have to update apollo
gh: if we allow arbitrary xml in features, client will have to
remember this xml or it will disappear.
ls: a huge issue w/ apollo in past. when communicating w/ db's that
have extra stuff, in the xml that isn't on client side data model.
suzi: my take, the client should not have to pass it all through.
nh: it forces client to be a complete database
gh: then the delta writeback
ls: works ok for deletes, updates become an issue
ad: you have to deal with text you don't understand.
ls: you have to keep track of tags you don't understand, other wise
they are deleted.
gh: trade off, simplicity of writeback, and what client has to
remember.
ls: client says: i don't understand it, but i can't delete it.
gh: how hard is it to have an arbitrary xml chunk by client?
ls: give it an empty tag to say you want it to go away.
nh: how do you delete things that came in empty and you want to delete
them?
ls: can have attribute="delete me". this creates a burden on server
side. 
[client folks like this..]
decided to keep everything you know and send it back. round trip
it.
ad: client can throw away what it wants. can go back to server
ls: boomerang.
gh: a variety of ways to make sure the data gets stored.
roy: will be in feature. just hold a pointer to it.
suzi: hard for apollo. passive round tripping is fine.. difficulty is
with deletes. ignoring stuff, don't know what it is. delete a
transcript or whole gene. some of that stuff you don't know what it
is, describes a mutant phenotype. you deleted from genomic record, but
there's other data that shouldn't be deleted. client would have to be
fully cognizant of it, beyond genome sequence features. client now
needs to model all the other data too.
ls: difficult to understand how a client could deal with it.
ad: just xml is a opaque chunk.
why can't client send back full record?
suzi: won't solve the full problem. if annotator said delete it
gh: client says delete that feature. it won't pass back any stuff
underneath the feature. some stuff underneath it that shouldn't be
deleted.
ad: that's what you have back ups for.
suzi: beyond this.
to deal with this, we made deletes be more atomic. had to be handled
at server side, otherwise, we have to put all that knowledge into
client. gets tied to a particular group.
ad: knowledge of what?
suzi: additional information
if you delete whole thing at top, any pass through data is also gone.
gh: not hard on client, just what does the server do with that?
suzi: this is why it belongs on server side. knows what matters and
what doesn't matter. if you don't want clients tied to a particular
db. that solution will be inadequate. we had to put the info on the
client and make the operations as fine grained as we could.

ap: writeback issues have been discussed. suggest to take this up
tomorrow. 
ad: could someone write up why a client couldn't just track the things
that it wanted? then we can consider.


Status reports, cont'd
----------------------

roy: zmap client. can get sources and types from server. parsing it
creating internal objects. can't draw features yet. long discussion
about write back today.
ad: validator stuff
td: talking about writeback.
ap: working on registry. first das/2 server. distinguish between das/1
and das/2 via accession points.


brian: rpm build for allen's server. will post today at
biopackages.net
suzi: spoke to chris about web services for ontology. he will talk
with allen. thing about ids to deal with. also, if we do a web service
that isn't das like, it should be doable. should be able to get the
terms. also, if we want to have stop codon replacement, you also have
to say what position, what it's replaced with (uridine). how is this
done in das spec?
gh: can you post to the list?
suzi: yes. 
aday: will raise writeback issues as well.
suzi: small point mutations, indel, substitution (base and position)
aday: nearly got apache config file done, impl new std error
documents, 300, with error document.
nh: more apollo client progress. haven't dealt with types yet.
ee: igb improvements.
sc: pipeline for populating affy das server with array data. completed
pipeline for exon array design data.


From nomi at fruitfly.org  Thu Feb  9 20:08:33 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Thu, 9 Feb 2006 12:08:33 -0800 (PST)
Subject: [DAS2] unary properties
In-Reply-To: <a64069f19dbc770901366184200854e2@dalkescientific.com>
References: <a64069f19dbc770901366184200854e2@dalkescientific.com>
Message-ID: <17387.41281.765157.17683@kinked.lbl.gov>

On 9 February 2006, Andrew Dalke wrote:
 > Allen wants one more possibility, "existence", with no associated
 > value at all.  Nomi says that Apollo can't round-trip that data
 > except by also tracking the input XML.  I don't want a "it just
 > exists" field and would prefer those stored with an empty string.

fwiw, the empty string (rather than no string) doesn't help apollo--the
way it stores properties, if you ask for the value of property "foo" and
there's no "foo" in the property table, you get back "" (this was to
avoid having to put a million null-pointer checks).  so apollo would not
be able to differentiate--for purposes of writeback OR display without
apollo--between
    <PROP key="foo" value="">
and
    <PROP key="foo">
internally, both of these would look like "i don't know anything about
property foo," unless i saved them as "foo=true" when they were read in,
and then how would it know how to write them out correctly?

i would suggest that either
1. we use two different terms to differentiate between key/value
properties and properties that are valueless (though really i think they
are *keyless* rather than valueless).  perhaps the latter could be called
"attributes" or something?
   <PROP key="foo" value="true">
   <ATTRIBUTE value="foo">
(actually, ATTRIBUTE is probably a bad choice since it has a meaning in
xml, but you get the idea.)

OR (and i prefer this):
2. every property is required to have a key and either a value or an
href.
the valueless (or keyless) properties in the yeast data look like
      <PROP ptype="property/molecular_function unknown"/>

i guess these are like the default cases where other features might
(although i haven't seen any of these) have properties like
    <PROP key="molecular_function" value="transcription regulator activity">

but where did "property/molecular_function unknown" come from in the
first place?  what i think it should look like is
    <PROP key="molecular_function" value="unknown">

and then we avoid the whole keyless-property issue and make the
information more accessible to clients (and hence to users).  the way it
is now, it's an uninterpretable blob of text (really more of a comment
than a property), whereas separating into key/value suddenly gives it
more meaning.
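Option 2 could be checked mechanically. The sketch below is one possible reading, not spec text: it assumes an empty string counts as "no value", which is stricter than the empty-string convention discussed elsewhere in this thread, and check_prop is a made-up helper name.

```python
import xml.etree.ElementTree as ET

def check_prop(prop):
    """Return a list of problems for one PROP element (empty list = OK)."""
    problems = []
    if not prop.get("key"):
        problems.append("missing key")
    # Assumption: an empty string does not count as a value or an href.
    if not prop.get("value") and not prop.get("href"):
        problems.append("missing value or href")
    return problems

samples = [
    '<PROP key="molecular_function" value="unknown"/>',
    '<PROP ptype="property/molecular_function unknown"/>',
    '<PROP key="nucleolus" value="" href=""/>',
]
for text in samples:
    print(text, "->", check_prop(ET.fromstring(text)))
```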

     Nomi


From Gregg_Helt at affymetrix.com  Thu Feb  9 20:05:14 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Thu, 9 Feb 2006 12:05:14 -0800
Subject: [DAS2] unary properties
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9CD@msex02.affymetrix.com>

Looks to me like these might be GO terms, which should probably be
represented more like:

<PROP key="gene_ontology" value="rRNA modification" />

and possibly include an href to a description of that GO term.

Of course one could argue whether the attribute values should be URI
references rather than arbitrary strings, but you get the idea.

	gregg

> -----Original Message-----
> From: Nomi Harris [mailto:nomi at fruitfly.org]
> Sent: Thursday, February 09, 2006 12:56 PM
> To: Andrew Dalke; allenday at ucla.edu
> Cc: nomi at fruitfly.org; Helt,Gregg
> Subject: Re: [DAS2] unary properties
> 
> On 9 February 2006, Nomi Harris wrote:
>  > the valueless (or keyless) properties in the yeast data look like
>  >       <PROP ptype="property/molecular_function unknown"/>
> 
> i just looked at another region and found some more interesting valueless
> (though i think they should be called keyless) properties:
> 
>       <PROP key="rRNA modification" value="" href=""/>
>       <PROP key="nucleolus" value="" href=""/>
>       <PROP key="snRNA 2'-O-ribose methylation guide activity" value="" href=""/>
> 
> these really seem to me to be missing important information.  "nucleolus"?
> we're going to randomly mention cell parts?  what this really should say
> is
>       <PROP key="cellular_component" value="nucleolus"/>
> right?
> 
> so i think this is buggy data--it is missing the keys, and that should be
> fixed.  in fact, i think having the spec insist that properties have both
> key and value would help to catch errors like this.
> 
>         Nomi


From Gregg_Helt at affymetrix.com  Thu Feb  9 23:18:42 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Thu, 9 Feb 2006 15:18:42 -0800
Subject: [DAS2] Refinements to range attribute and query filters in spec
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9D3@msex02.affymetrix.com>

 
In the latest spec, the format for range queries is 
      seqid/min:max:strand
and the format for range attributes in feature elements is 
      min:max:strand
 
In the earlier spec
(http://biodas.org/documents/das2/das2_get.html#ranges) everything but
the seqid component of the range query was optional.  Are min and max
still optional, as in these examples from the previous version of the
spec?
    Chr1/1000     Chr1 beginning at position 1000 and going to the end.
    Chr1/:2000    Chr1 from the start to position 2000.
I personally find these kinds of ranges confusing and not particularly
useful, and would rather make min and max required for both the range
attribute and range-based query filters. 
 
Also, the latest spec states: 
 
A region may be on the forward or reverse strand or on both strands.
These are respectively denoted 1, -1 and 0.  The reverse strand is the
reverse complement of the forward strand.  Unspecified strand means
forward strand.
 
So for a features query, are the four overlap filters below equivalent?
Chr1/1000:2000
Chr1/1000:2000:1
Chr1/1000:2000:-1
Chr1/1000:2000:0
Or does the addition of strand information further filter the returned
features by strand?  But if that's the case, then according to the spec
having no strand specified means forward.  So that would mean
overlaps="Chr1/1000:2000" would only return forward strand annotations,
and not any on the reverse strand?  To me that's counterintuitive; from
a filtering perspective I'd rather no strand info mean "both strands".
My main point though is we need to be explicit about how strand info or
lack thereof affects features queries with range-based filters.
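To make the ambiguity concrete, here is a small parser for the seqid/min:max:strand form. It keeps "no strand given" (None) distinct from an explicit 0 so that either policy can be applied afterwards. The names Range, parse_range and strand_matches are hypothetical, and strand_matches implements the "no strand info means both strands" reading preferred above, not what the current spec text says.

```python
from typing import NamedTuple, Optional

class Range(NamedTuple):
    seqid: str
    min_pos: Optional[int]   # None when omitted (old-spec style "Chr1/:2000")
    max_pos: Optional[int]   # None when omitted (old-spec style "Chr1/1000")
    strand: Optional[int]    # 1, -1, 0, or None when not specified

def parse_range(text):
    """Parse 'seqid/min:max:strand'; everything after seqid is optional here."""
    seqid, sep, rest = text.partition("/")
    if not seqid:
        raise ValueError("missing seqid")
    parts = rest.split(":") if sep else []
    if len(parts) > 3:
        raise ValueError("too many ':' separated fields")
    parts += [""] * (3 - len(parts))
    lo = int(parts[0]) if parts[0] else None
    hi = int(parts[1]) if parts[1] else None
    strand = int(parts[2]) if parts[2] else None
    if strand not in (None, 1, -1, 0):
        raise ValueError("strand must be 1, -1 or 0")
    return Range(seqid, lo, hi, strand)

def strand_matches(feature_strand, query_strand):
    """Filtering policy sketch: an unspecified or 0 query strand matches anything."""
    return query_strand in (None, 0) or feature_strand == query_strand

for q in ("Chr1/1000:2000", "Chr1/1000:2000:-1", "Chr1/1000", "Chr1/:2000"):
    print(q, "->", parse_range(q))
```

Making min and max required, as proposed above, would only need the two None branches replaced by an error.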
 
      gregg
 

From suzi at fruitfly.org  Fri Feb 10 00:29:57 2006
From: suzi at fruitfly.org (Suzanna Lewis)
Date: Thu, 9 Feb 2006 16:29:57 -0800
Subject: [DAS2] question or two
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9D3@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9D3@msex02.affymetrix.com>
Message-ID: <54bc0e433303827918fe475855669a89@fruitfly.org>

if an annotator wants to indicate a stop-codon-readthrough (which may 
or may not be a seleno-cysteine mechanism). how would DAS send this 
info through? need SO type (the readthrough), the location (relative to 
transcript or genome), and the mechanism.

tRNA anticodon or AA?

alternative translation table? infer this from organism?

-S


From Gregg_Helt at affymetrix.com  Fri Feb 10 01:43:16 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Thu, 9 Feb 2006 17:43:16 -0800
Subject: [DAS2] feature NOTE and ALIAS elements?
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9D5@msex02.affymetrix.com>


> -----Original Message-----
> From: das2-bounces at portal.open-bio.org 
> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke
> Sent: Tuesday, February 07, 2006 7:45 AM
> To: DAS/2
> Subject: Re: [DAS2] properties and queries
> 
> 
> To summarize, the current thought here for properties and 
> queries is as follows  (it's a long summary.  More like an essay.  :)
> 
> Add support for zero or more <NOTE> elements in the feature, 
> of the form
>    <NOTE>This is some arbitrary (but non-markup-ed) text</NOTE>
> 
> 
> Add a features search keyword "note=" which takes a search 
> string to be found in the note elements.  (substring? 
> soundex? regex? the search engine calls up Lincoln and asks?)
> 
> 
> Add support for zero or more <ALIAS> elements in the feature, 
> of the form
>    <ALIAS name="Zorro">
> 
> (I missed this in the redraft.  It should have been there. 
> Feature filter "name" already says it searches the "name" and 
> "alias" fields for a feature.)

Is the plan still as stated above, to have optional NOTE and ALIAS
elements in features?  I don't see these elements in the feature schema,
and the spec doc says they're built-in properties instead (values for
PROP key attribute that have defined meaning).

	Gregg
  

From td2 at sanger.ac.uk  Fri Feb 10 08:54:16 2006
From: td2 at sanger.ac.uk (Thomas Down)
Date: Fri, 10 Feb 2006 08:54:16 +0000
Subject: [DAS2] Refinements to range attribute and query filters in spec
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9D3@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9D3@msex02.affymetrix.com>
Message-ID: <4A9E3BE1-9E24-4D25-AAD1-1851F18857D0@sanger.ac.uk>


On 9 Feb 2006, at 23:18, Helt,Gregg wrote:

>
> In the latest spec, the format for range queries is
>       seqid/min:max:strand
> and the format for range attributes in feature elements is
>       min:max:strand
>
> In the earlier spec
> (http://biodas.org/documents/das2/das2_get.html#ranges) everything but
> the seqid component of the range query was optional.  Are min and max
> still optional, as in these examples from the previous version of the
> spec?
>     Chr1/1000     Chr1 beginning at position 1000 and going to the  
> end.
>     Chr1/:2000    Chr1 from the start to position 2000.
> I personally find these kind of ranges confusing and not particularly
> useful, and would rather make min and max required for both the range
> attribute and range-based query filters.

I think it's reasonable for a client to want to fetch all features  
attached to a given sequence ID.  This would certainly be sensible  
behaviour for clients which always work on reasonably short sequences  
(e.g. protein-specialized clients), but even genome-centric clients  
might want to do this when they've had a hint that a particular  
feature type is "low density" (e.g. chromosome banding patterns?).

I'm not sure if anyone would want to query a range where only one of  
min and max are specified.

> Also, the latest spec states:
>
> A region may be on the forward or reverse strand or on both strands.
> These are respectively denoted 1, -1 and 0.  The reverse strand is the
> reverse complement of the forward strand.  Unspecified strand means
> forward strand.
>
> So for a features query, are the four overlap filters below  
> equivalent?
> Chr1/1000:2000
> Chr1/1000:2000:1
> Chr1/1000:2000:-1
> Chr1/1000:2000:0
> Or does the addition of strand information further filter the returned
> features by strand?  But if that's the case, then according to the  
> spec
> having no strand specified means forward.  So that would mean
> overlaps="Chr1/1000:2000" would only return forward strand  
> annotations,
> and not any on the reverse strand?  To me that's counterintuitive,  
> from
> a filtering perspective I'd rather no strand info mean "both strands".
> My main point though is we need to be explicit about how strand  
> info or
> lack thereof affects features queries with range-based filters.

Hmmm, I'd been interpreting Chr1/1000:2000 as "return features on  
both strands", but from the paragraph you quote I guess this is  
wrong.  I'd be happy to see this changed to "Unspecified strand means  
both strands".

              Thomas.


From dalke at dalkescientific.com  Fri Feb 10 10:47:26 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 10:47:26 +0000
Subject: [DAS2] Refinements to range attribute and query filters in spec
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9D3@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9D3@msex02.affymetrix.com>
Message-ID: <c6739648961f07e66a22c6471b78211e@dalkescientific.com>

Gregg:
> In the latest spec, the format for range queries is
>       seqid/min:max:strand
> and the format for range attributes in feature elements is
>       min:max:strand


> I personally find these kind of ranges confusing and not particularly
> useful, and would rather make min and max required for both the range
> attribute and range-based query filters.

Agreed on this side.  All clients can easily get the upper limit,
and the lower limit is always 0.

> My main point though is we need to be explicit about how strand info or
> lack thereof affects features queries with range-based filters.

It was a confusion on my part.  There are three places which
refer to location + strand.

   1. specifying a feature location
   2. fetching a sequence
   3. doing a range search

"1. specifying a feature location"

We've been talking here about limiting the use of strands
for these.  Features definitely need a strand.  If the
strand is not specified then the feature is on both strands,
or strand has no meaning for it.  If needed, resolve the ambiguity by
looking at the type (or other property).  If you really,
really want to specify that it's on both strands then use
the 0.

The location element currently looks like this
   <LOC id="some_url_for_sequence"/>  <!-- on whole sequence -->
   <LOC id="some_url_for_sequence" range="300:500" />
   <LOC id="some_url_for_sequence" range="300:500:-1" />  <!-- on strand -->

Given the decision yesterday that segments are special,
in terms of identification, I propose using the short id,
so these look like, respectively

   <LOC segment="Chr1"/>
   <LOC segment="Chr1/300:500"/>
   <LOC segment="Chr1/300:500:-1"/>

"2. fetching a sequence"

Why does the server need to support a reverse complement feature?
Let's leave it out and make the client do a string reversal if
it needs it.
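
A client-side version is only a few lines.  Here is a minimal Python
sketch, assuming plain DNA text with IUPAC ambiguity codes.  The function
name is made up for illustration and is not part of the spec.  Note that
it complements the bases as well as reversing the string.

```python
# Hypothetical client-side helper; nothing here comes from the DAS/2 spec.
# Translation table for DNA bases and IUPAC ambiguity codes, both cases.
# S, W and any other character are left unchanged.
_COMPLEMENT = str.maketrans(
    "ACGTRYKMBDHVNacgtrykmbdhvn",
    "TGCAYRMKVHDBNtgcayrmkvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCCN"))
```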

"3. doing a range search"

Is there any reason to specify the strandedness when doing
a feature query?

Discussion here seems to be "would be nice but that lack
is one of the things people have never complained about
in DAS1".

I propose removing strandedness from the features query.

If others disagree then here are two solutions:
   A. have a "strand=" parameter, so that the strandedness
is different from the ranges.  If you want a query for
  the union of range Chr1/A:B:-1 and range Chr1/X:Y:1
then tough - make two requests, one for each strand.

   B. ranges may specify the strand (as now) but if not
specified then it means "of any strand".

We worked through a few cases where it might be useful to
make mixed strand queries.  There weren't any compelling
reasons.  Without strand support in the features query, the
worst case is that you get on average twice the number of
features back; the worst case for option A is the need to
make two queries.
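
To make option B concrete, here is a rough Python sketch of a parser for
the short range form, with min and max required whenever a range is given
and a missing strand meaning "any strand".  The names Range, parse_range
and strand_matches are invented for this example, and it assumes segment
ids contain no "/".

```python
from typing import NamedTuple, Optional


class Range(NamedTuple):
    segment: str
    start: Optional[int]   # None when the whole segment is meant
    end: Optional[int]
    strand: Optional[int]  # 1, -1, 0, or None when no strand was given


def parse_range(text: str) -> Range:
    """Parse 'seqid', 'seqid/min:max' or 'seqid/min:max:strand'."""
    segment, sep, rest = text.partition("/")
    if not segment:
        raise ValueError(f"missing segment id in {text!r}")
    if not sep:
        return Range(segment, None, None, None)
    parts = rest.split(":")
    if len(parts) not in (2, 3) or not parts[0] or not parts[1]:
        raise ValueError(f"min and max are both required in {text!r}")
    start, end = int(parts[0]), int(parts[1])
    if start > end:
        raise ValueError(f"min is greater than max in {text!r}")
    strand = None
    if len(parts) == 3:
        strand = int(parts[2])
        if strand not in (1, -1, 0):
            raise ValueError(f"strand must be 1, -1 or 0 in {text!r}")
    return Range(segment, start, end, strand)


def strand_matches(query: Range, feature_strand: Optional[int]) -> bool:
    """Option B: a query without a strand accepts features on any strand."""
    return query.strand is None or query.strand == feature_strand
```

With this, "Chr1/1000:2000" matches features on either strand, while
"Chr1/1000:2000:-1" matches only reverse-strand features.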


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Fri Feb 10 10:48:18 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 10:48:18 +0000
Subject: [DAS2] Re: feature NOTE and ALIAS elements?
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9D5@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9D5@msex02.affymetrix.com>
Message-ID: <bd2c309106b184d5a31d540afa353abc@dalkescientific.com>

Gregg:
> Is the plan still as stated above, to have optional NOTE and ALIAS
> elements in features?  I don't see these elements in the feature 
> schema,
> and the spec doc says they're built-in properties instead (values for
> PROP key attribute that have defined meaning).

Yes.  I haven't updated the spec other than a few minor
points in the last couple of days.


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Fri Feb 10 15:04:45 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 15:04:45 +0000
Subject: [DAS2] 'OR' syntax in query language
Message-ID: <8593bb5041e0d054840da98c200d3e03@dalkescientific.com>

We talked a bit about the DAS query language.

It is currently of the form (modulo URL escaping)

   name=Andrew,Roy;inside=Chr/100:200

This is the same as

(    name contains the substring "Andrew"
   OR name contains the substring "Roy"
) AND (
      feature is inside 100:200 on the segment named 'Chr'
)

That is, there is an AND of all terms, and a single term
may have multiple OR-ed subqueries, merged by commas.

We want to change this to the form

   name=Andrew;name=Roy;inside

That is, the query key can exist more than once.  Queries
with the same key are 'OR'ed, elsewise they are 'AND'ed.


The advantage is the simplicity of not having to worry
about another quoting rule, in this case how to search
for terms containing a ",".

The only disadvantage is with servers which don't handle
multiple keys in a query - but we think those client
libraries are long since deceased.
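
A rough Python sketch of the proposed rule, for illustration only.  The
function names are invented, and every key is treated as a plain
substring filter, which is a simplification: range keys such as "inside"
would need real range logic.

```python
from collections import defaultdict
from urllib.parse import unquote_plus


def parse_query(query: str) -> dict[str, list[str]]:
    """Group the values of a 'key=value;key=value' query by key."""
    terms: dict[str, list[str]] = defaultdict(list)
    for pair in query.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Commas need no special quoting rule; they are ordinary data.
        terms[unquote_plus(key)].append(unquote_plus(value))
    return dict(terms)


def matches(feature: dict[str, str], terms: dict[str, list[str]]) -> bool:
    """AND across different keys, OR across the values of one key."""
    return all(
        any(value in feature.get(key, "") for value in values)
        for key, values in terms.items()
    )


if __name__ == "__main__":
    terms = parse_query("name=Andrew;name=Roy;type=exon")
    print(terms)
    print(matches({"name": "Roy", "type": "exon"}, terms))
```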

					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Fri Feb 10 15:15:05 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 15:15:05 +0000
Subject: [DAS2] range searches
Message-ID: <80684d437a99822fd017cceee83b02b4@dalkescientific.com>

I think Gregg has thought the most about this one.

We have 4 classes of range search:

'inside' (feature completely inside request range)
'overlaps' (feature overlaps the request range)
'contains' (feature completely contains request range)
'identical' (feature is exactly the request range)

They exist for smart clients which want to limit the
region request size based on previously fetched knowledge.

Example: client is viewing "500:600" and zooms out to
"400:700".  In that case the client could ask for
features which
   overlap 400:500 OR overlap 600:700
   excluding those which overlap 500:600.

If that's the case, the selection language isn't powerful
enough.  There's no way to choose "excluding".

The other option is to issue only the overlap queries.

Does the query language need to be more powerful to
allow "excluding what I know about these regions" for
people like Gregg?

Another question came up; are queries like

   overlap 400:500 OR inside 900:1000

useful?  I don't think so.  If it is, it is not supported
by the current language which only does AND of dissimilar
terms.
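
For reference, here is how the four classes and the zoom-out example
might look in Python.  This is only a sketch: it assumes half-open
(min, max) intervals on a single segment, and the function names are
made up.  Since the query language has no "excluding", the exclusion
step is done on the client after issuing only the overlap queries.

```python
Interval = tuple[int, int]  # (min, max), assumed half-open: min <= x < max


def overlaps(feature: Interval, query: Interval) -> bool:
    return feature[0] < query[1] and query[0] < feature[1]


def inside(feature: Interval, query: Interval) -> bool:
    return query[0] <= feature[0] and feature[1] <= query[1]


def contains(feature: Interval, query: Interval) -> bool:
    return feature[0] <= query[0] and query[1] <= feature[1]


def identical(feature: Interval, query: Interval) -> bool:
    return feature == query


def newly_visible(features: list[Interval], old_view: Interval,
                  new_view: Interval) -> list[Interval]:
    """Features in the zoomed-out view that the client cannot already hold.

    Keeps features overlapping either flank added by the zoom, then drops
    those overlapping the old view, since they were fetched earlier.
    """
    left = (new_view[0], old_view[0])
    right = (old_view[1], new_view[1])
    return [
        f for f in features
        if (overlaps(f, left) or overlaps(f, right))
        and not overlaps(f, old_view)
    ]


if __name__ == "__main__":
    feats = [(420, 480), (450, 520), (550, 560), (650, 690)]
    print(newly_visible(feats, (500, 600), (400, 700)))
```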


					Andrew
					dalke at dalkescientific.com


From ap3 at sanger.ac.uk  Fri Feb 10 15:21:25 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Fri, 10 Feb 2006 15:21:25 +0000
Subject: [DAS2] registry status
Message-ID: <2fa320fbca91abfa9f175b64d0d8105c@sanger.ac.uk>

Hi!

the developmental registry has been updated:
it now supports 2 requests:

http://www.spice-3d.org/dasregistry/das2/sources
lists das2 servers

http://www.spice-3d.org/dasregistry/das1/sources
lists das1 servers.

The next step will be to provide user upload of das2 sources

Andreas


-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
			 +44 (0) 1223 49 6891


From dalke at dalkescientific.com  Fri Feb 10 15:49:10 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 15:49:10 +0000
Subject: [DAS2] curation history and splits&merges
Message-ID: <4712c640d908a4e9aed099dd1ef1398b@dalkescientific.com>

We talked some on tracking curation history.

We decided it was a hard topic and we would defer further
discussion to the next sprint.  We're getting rather
frazzled here after nearly 5 days of hard work.

Here are some things that came up.

The writeback delta needs a field for user comments.

How persistent is an identifier for an object?  Is
it for the exact version of a feature or is it for
the concept of the given feature?

That is, if there's a feature change the server could
assign it a new id/url.  It would need to tell the
annotation about the new id, just like it tells the
client about the newly created ids.

This makes updates more like a changeset version control
system, where there is a version number for each stable
data set.  Compare to CVS where there is a version number
for each file/record but not for the whole system.

But the current Otter database is more the CVS route.
While the changeset version seems nicer, there will
be some (I assume non-trivial) work to make Otter support
it.

There are advantages.  You could do searches with
timewarps by using a "changeset=" parameter in the
query.  The DAS mechanism handles that just fine,
since interlinks between no-longer current URLs would
be correct.

There needs to be a way to get the history of an
element. There are two thoughts:

   - put the curation history in the feature document (via
some embedded XML)

   - link to a URL which provides the curational history
document for the given element

We prefer the latter.


For splits and merges there needs to be support in
the delta to say if there is a relationship to existing
or about to be deleted features.  We did not work on
that, other than to get a feel that it works.

Again, no server handles this so we decided to table it
for the future, and work on it more for the next sprint.

					Andrew
					dalke at dalkescientific.com


From Gregg_Helt at affymetrix.com  Fri Feb 10 16:36:49 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Fri, 10 Feb 2006 08:36:49 -0800
Subject: [DAS2] IGB DAS/2 client partially working -- and using registry!
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9D7@msex02.affymetrix.com>


Attached is a screenshot of IGB with data from a yeast test region
(chrVII, ~364-366kb) loaded from Allen's codesprint server by way of
Andreas' DAS/2 registry.  Still need to work on synchronizing up source
names, etc., but this is looking good.  As we had planned, having the
registry return a sources document allowed very easy integration!  

You may notice there is also a branch of the sources tree that is a
direct path to the codesprint server.  That just means I gave the
discovery engine two URLs to start from -- the registry and the
codesprint server.

This is the same version of IGB as the current head of the CVS
repository (as of today 8:30 AM PST).  I'm tempted to roll up a jar so
people can try it without having to compile the source, but on the other
hand it's pretty fragile right now, and the image conveys the gist of
it.

	gregg

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andreas Prlic
> Sent: Friday, February 10, 2006 7:21 AM
> To: DAS/2
> Subject: [DAS2] registry status
> 
> Hi!
> 
> the developmental registry has been updated:
> it now supports 2 requests:
> 
> http://www.spice-3d.org/dasregistry/das2/sources
> lists das2 servers
> 
> http://www.spice-3d.org/dasregistry/das1/sources
> lists das1 servers.
> 
> The next step will be to provide user upload of das2 sources
> 
> Andreas
> 
> 
> 
> 
>
> -----------------------------------------------------------------------
> 
> Andreas Prlic      Wellcome Trust Sanger Institute
>                                Hinxton, Cambridge CB10 1SA, UK
> 			 +44 (0) 1223 49 6891
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DAS2_in_IGB.JPG
Type: image/jpeg
Size: 170143 bytes
Desc: DAS2_in_IGB.JPG
URL: <http://lists.open-bio.org/pipermail/das2/attachments/20060210/6d528b58/attachment-0001.jpe>

From Gregg_Helt at affymetrix.com  Fri Feb 10 17:01:11 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Fri, 10 Feb 2006 09:01:11 -0800
Subject: [DAS2] Proposed agenda for DAS/2 Code Sprint teleconference, Feb 10
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9D9@msex02.affymetrix.com>

Properties
Range-based queries
Status reports - summarize overall progress during code sprint
Discuss next code sprint - goals, etc. 
???


From dalke at dalkescientific.com  Fri Feb 10 18:14:47 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Fri, 10 Feb 2006 18:14:47 +0000
Subject: [DAS2] changes commited
Message-ID: <6425fabe79dc6d27fd3a797b837d32de@dalkescientific.com>

removed the <PROP> href= and type= options in
the spec and all examples.

changed the url "," syntax for OR'ed terms into
multiple "key=value;key=value" terms.

changed "att=key:value" into "prop-key=value"


					Andrew
					dalke at dalkescientific.com


From suzi at fruitfly.org  Fri Feb 10 19:48:58 2006
From: suzi at fruitfly.org (Suzanna Lewis)
Date: Fri, 10 Feb 2006 11:48:58 -0800
Subject: [DAS2] question on properties
In-Reply-To: <4712c640d908a4e9aed099dd1ef1398b@dalkescientific.com>
References: <4712c640d908a4e9aed099dd1ef1398b@dalkescientific.com>
Message-ID: <26b9a50a02deb2be55150c6a5a47d419@fruitfly.org>

You probably know the answer to this Andrew.

One of the cases we encountered was unique properties vs cumulative 
properties.

For a simplistic (i.e. don't quibble too closely, I'm just trying to 
explain) example pretend that "ssn" and "comment" are both properties.

On the client side the appropriate behavior for these is different if 
the data coming over from the server contains >1 prop element with that 
tag.

If the client sees "ssn" twice it winces and then either ignores or 
overwrites with the 2nd value.

If the client sees "comment" twice then it appends the additional 
comment.

Question: Is this kind of information included in the spec? Uniqueness 
vs. cumulative
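
The client behaviour described above could be sketched in Python roughly
as follows.  This is only an illustration: the set of unique keys is
hypothetical (nothing in the spec is known to define it), and the strict
flag shows the alternative of refusing duplicates instead of overwriting.

```python
# Hypothetical: which keys are "unique" would have to come from the spec
# or from an agreement between client and server.
UNIQUE_KEYS = frozenset({"ssn"})


def merge_props(pairs, unique_keys=UNIQUE_KEYS, strict=False):
    """Merge (key, value) pairs from repeated PROP elements.

    Unique keys keep only the last value seen, or raise ValueError on a
    repeat when strict is true.  All other keys accumulate their values.
    """
    merged = {}
    for key, value in pairs:
        if key in unique_keys:
            if strict and key in merged:
                raise ValueError(f"duplicate value for unique key {key!r}")
            merged[key] = [value]
        else:
            merged.setdefault(key, []).append(value)
    return merged


if __name__ == "__main__":
    props = [("ssn", "1"), ("comment", "a"), ("ssn", "2"), ("comment", "b")]
    print(merge_props(props))
```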

  
From Steve_Chervitz at affymetrix.com  Fri Feb 10 22:10:28 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Fri, 10 Feb 2006 14:10:28 -0800
Subject: [DAS2] Notes from the DAS/2 teleconference for the code sprint,
 10 Feb 2006
Message-ID: <C0124F54.1BEF6%Steve_Chervitz@affymetrix.com>

Notes from the DAS/2 teleconference for the code sprint, 10 Feb 2006

$Id: das2-teleconf-2006-02-10.txt,v 1.1 2006/02/10 22:13:17 sac Exp $

Note taker: Steve Chervitz

Attendees: 
  Affy: Steve Chervitz, Ed E., Gregg Helt
  CSHL: Lincoln Stein
  Sanger: Thomas Down, Andreas Prlic
  Sweden: Andrew Dalke
  UCLA: Allen Day
        
Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER: 
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit. 


[note taker missed the first 5 minutes]

Topic: Properties
-----------------

gh: Properties are all tag-value
ad: yes
gh: don't think we need your binary thing.
ad: ok drop it
gh: href is needed. can always point it to a binary something out there.
can the value just be a url?
ad: can make it relative to xml base
gh: do you need some property with tag value and href at same time?
ls: how would you interpret that? should be either value or href.
ad: there's nothing to say how to interpret the url.
gh: nice to have multiple links out to somewhere else and to have some
indication what they are w/out traversing the link. e.g., this is the
genbank ref, ensembl ref, protein, etc.
if xid had an extra field with label, title e.g. that would suffice.
ad: sounds ok

[A] xids will have title + href, properties will have tag + value

Topic: Exercising the spec
---------------------------

gh: we need the reference server to actually exercise this part of the
spec. xid. possibly other things like: target overlap, inside, cigar
strings. encoding, decoding.
aday: oh no. 
ls: line element. cigar string is something that no one has tested yet.
gh: if we don't have server doing it by next code sprint
aday: any impls out there we could use?
gh: bioperl has a gff3 parser.
aday: I wrote it, and I didn't impl cigar string parsing.
ls: there's a cigar processor in bioperl AlignIO. in theory not hard
to do. 
gh: lbl folks (Nomi et al) have a java one, too. I think.
gh: other parts of spec that aren't getting exercised? I doubt if
anyone has used xml lang.
ad: added xml id. just there for other reasons, but not what we need
it for.
gh: we talked about all ids being xml ids and combining xml id and xml
base, can't remember why we stopped discussing.
ad: don't think we need to. style sheet has uses for this maybe.
ad: has anyone generated doc href yet?
td: can add this stuff easily now.
gh: for testing purposes, just throw a doc href everywhere it's
allowed.
ad: are servers supporting retrieval of seq data?
aday: yes
ad: support for alt feature formats?
aday: can do old compact formats, not sure about coverage.
gh: yes, alt feat formats are handled, but server isn't up and running
yet. igb das/2 client can handle it already.
ad: retrieval of assembly?
aday: no assembly data
ad: i don't touch assembly
gh: may be for next code sprint.


Topic: range based query
------------------------

gh: thomas and i don't like optional mins and maxes.
ls: fine as long as you can always determine the size of the
reference. provide beginning and end.
gh: exception: if you want the whole sequence, can you just not supply
range?
ad: yes
gh: :1 and :-1 how to interpret nothing for strand on end and 0 for
strand at end?
ls: features that have strand +1, -1, features that have no strand or
on both strands (0) features that may have a strand but you don't know
(empty)
gh: when you put it in the query there's a difference between i don't
know and i will accept anything.
use case: transfrags from transcriptome project. unknown strand, but I
know it *is* one or the other strand.
ls: how about this arrangement:
 empty = i don't care
    0  = has strand but i dont know
    1  = forward strand
   -1  = reverse strand
    2  = both strands
ad: could be organized by track (everything in a track has same strand).
gh: don't think it's good to structure a query so it's required that you
do have strand. you might have diff strand designations on the same
track. 
ls: you want to be able to distinguish things that are on both
strands, things that are on either strand, but you don't know which.
gh: biggest concern: given a range based query to server
1000-2000 means everything that overlaps, any strandedness within this
range.
ad: should support stranded searches. client can filter out
opposed to do a strand request against seq to get the rev comp. client
should be able to do this.
gh: in range attrib of features, you can add colon to indicate
strandedness.
ad: yes
gh: if no :strand does this mean unknown or don't care?
ls: defaults to *, anything. you get fwd, rev, don't know, don't care.
gh: require things on fwd strand to be :1, don't make it a default.
ad: ok. if not there, means ambiguous, unknown, or not
appropriate. see email i sent.
if you get rid of search for strand in region query, most of this
issue goes away.
gh: don't think people would use this often (stranded query)
ad: you can make two queries to server instead of one.
gh: this is a resolution for all range-related issues.
ad: check my email to make sure it covers this.

[A] everyone review andrew's email re: range queries and strand issues.

gh: also or-ing of diff range-based queries is not useful for me.
I mainly need intersects of overlaps and inside. or-ing is equivalent
to using multiple queries.
td: why do you need and-ing of overlaps and inside?
gh: optimization on client side. keeps track of what it has
received. wants to minimize re-fetching.
td: can you just use overlap and not overlap?
gh: that may be equivalent, but the way I do it, you can guarantee you
never get the same feat twice with that combo. will require and-ing of
two range-based queries.

ad: modifying query lang, or-ing together two. include first range and
include second range should use multiple query keys because of the
comma. you will have to escape any comma if it's inside of query
string. 
gh: don't like the implicit 'and' if different but 'or' if keys the
same. it depends on the query.
ad: now all queries are and-ed, but commas mean multiple.
ls: comma syntax seems natural. the occasional query that had to have
an escaped comma didn't cause any bother.
td: this was as it is in das/1. exons and repeat. type=exon,
type=repeat. so the suggestion is to use the das/1 behavior.
ad: three independent segments
gh: types as well. can have any number of types= and segment= all
or-ed together. I still need anding of overlaps and inside.
td: same keys are or-ed, different keys are and-ed.
ls: hoisted by my own petard here. works for me.
gh: allen?
aday: what's changed?
ls: the whole query language has changed in a fundamental way.
aday: dealing with multiple attributes with same name. fine.
gh: will server accept full urls for types?
aday: not now but will impl this.
gh: all types should be full uri's now. my client can't deal but will
soon.

Topic: status reports
---------------------
gh: state what you hoped to accomplish and what you actually
accomplished. 

gh: hoped to get igb das client up to date with spec, working with one
das2 server, and get affy das2 server up and going.
affy das2 server will take longer. maybe by next code sprint.
igb is now using latest das2 spec, calling allen's server, and using
registry as well. happy with results. not everything done, but some
unexpected things (registry).
wrote up progress report for grant: going out 3pm today (we got
another day) a 2pg summary. will send out to everyone later.
todo: get das2 server up. client: deal with full uri issue. this is a
basic functionality of the client. smart handling of uris.

ee: igb client. big thing is to make all data sources behave in a
similar way (das1/das2, quick load, separate files), regardless
of the data format. want to make it all seamless. going well.

sc: streamlined pipeline for populating das server with affy exon array
data. didn't get to pipeline for external data (UCSC tracks), but have
basic framework in place.

ad: decided to do more writeback at next sprint. when is next sprint?
gh: march 13-17. lincoln will be in UK and can participate from there.
ad: I'm in the states next week. will come to emeryville for next
sprint.

[A] next code sprint is 13-17 March. Mark your calendars.

ad: hoped to work on spec, resolve detailed questions, make sure it works
with people's needs. will work on incorporating latest ideas into spec.
validator: have one but is not fit for public consumption. not at
where it was last summer on the previous version of spec.

ap: das interface for registry, can serve das1 and das2 sources w/ new
source command. java client - not yet. registry: todo UI so users can
upload to das registry.

td: hoping to write server. got something up for feat, types,
segments, need to run through andrew's validator. hope to work on
writeback, but didn't happen (but good discussion on it). want to get
more data included, ensembl database.
roy has been working on zmap client, coming along fine.

aday: primary goals: to support new version of spec -- not fully done
uri problem in query parsing. apache config integration is
done. installation and rpm for server - done for FC3 i386, available
in the next couple of days (brian o'connor). general documentation
improvement in code for server - not done.
Next step: post, put, delete, writeback framework (originally planned
this but may need to rethink),  impl transaction logs (maybe in
flux). adding more unit tests.
ad: writeback spec won't happen for at least 2 weeks. need to write up
what we've done on current spec first.

ls: will be available from 14th on. at ensembl meeting up to the 13th.
gh: allen come to emeryville?
aday: maybe.
gh: will have to explore how to fund hosting folks here for next
codesprint. 

gh: speaking for nomi - she had apollo working for parsing features
and displaying them. some issues with higher level integration into
apollo. making good progress.

gh: time to wrap it up. thanks for your hard work.
[applause]

[A] next teleconf will be on 20 Feb, 9:30 PST 5:30 UK (regular time)
we're skipping 13 feb (next monday) given all our time this week.


From dalke at dalkescientific.com  Sat Feb 11 02:11:05 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Sat, 11 Feb 2006 02:11:05 +0000
Subject: [DAS2] Re: question on properties
In-Reply-To: <26b9a50a02deb2be55150c6a5a47d419@fruitfly.org>
References: <4712c640d908a4e9aed099dd1ef1398b@dalkescientific.com>
	<26b9a50a02deb2be55150c6a5a47d419@fruitfly.org>
Message-ID: <c9f971ac2d043705eb5ff56e6051217c@dalkescientific.com>

Suzi:
> On the client side the appropriate behavior for these is different if 
> the data coming over from the server contains >1 prop element with 
> that tag.
>
> If the client sees "ssn" twice it winces and then either ignores or 
> overwrites with the 2nd value.

Or it says "error, error, cannot compute" and stops.  From one
of the guidelines ("the zen") of Python: "when in doubt, refuse
the temptation to guess."

> If the client sees "comment" twice then it appends the additional 
> comment.
>
> Question: Is this kind of information included in the spec? Uniqueness 
> vs. cumulative

Here are my thoughts.

We have several points for client/server extensions.
One is this property table, which is a set of key/value
strings.

Because they are strings you can use them for almost anything,
with the correct interpretation by the client and server.
That requires collusion between the two.

This is the extension point which is most familiar to everyone.
But it's open to the problem you pointed out.

The other is this non-DAS extension XML, which lets the
server add *anything*.  If the client doesn't know what the
field does it must ignore it.  If it does writeback with
that feature it must include the ignored element, and not
make any changes.

That means your server can add

<suzi:ssn xmlns:suzi="mailto:suzi at fruitfly.org">123-45-1534</suzi:ssn>

If the client doesn't know what to do, it ignores it.
It will never change the field.

If the client knows what that field does it must follow the
constraints set down for it, else the server should stop
with an error and not allow the update to occur.
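
One way a client can honor the "ignore but preserve" rule is to edit the
parsed tree in place instead of rebuilding the document from only the
fields it understands.  Here is a small Python sketch using the standard
library's ElementTree.  The FEATURE element name, the namespace URI and
the add_comment helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SUZI_NS = "http://example.org/suzi"  # hypothetical namespace URI

# Keep the "suzi" prefix readable when the tree is written back out.
ET.register_namespace("suzi", SUZI_NS)

FEATURE_XML = (
    f'<FEATURE id="feature/1" xmlns:suzi="{SUZI_NS}">'
    '<PROP key="comment" value="first pass"/>'
    "<suzi:ssn>123-45-1534</suzi:ssn>"
    "</FEATURE>"
)


def add_comment(feature_xml: str, text: str) -> str:
    """Add a comment PROP and leave every other child element untouched."""
    feature = ET.fromstring(feature_xml)
    ET.SubElement(feature, "PROP", {"key": "comment", "value": text})
    # Unknown extension elements are still in the tree, so they are
    # written back exactly as they were received.
    return ET.tostring(feature, encoding="unicode")


if __name__ == "__main__":
    print(add_comment(FEATURE_XML, "second look"))
```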

There are two downsides to this approach.  There's no
way for a dumb client to understand that field, so no user
will ever see it, and there's no way to do a search on
that field.

(A server can extend the search syntax and tell the client
about the new syntax, but a dumb client doesn't know about
that.)

If there is need to support the dumb client then the
only way to support the data type constraints is in
the server.  It must check a given field and possibly
stop with an error or resolve ambiguities.  We can have
that the server reports an error message that the client
and/or user can use to figure out what's wrong.

Thinking about it a bit, it's possible to combine these
two.  For example, a server can have

   <PROP key="ssn" value="123-45-1534" />

then list as an extension

   <suzi:says-the-ssn-in-special/>

All this latter XML does is flag sufficiently aware clients
that the server implements the special SSN requirements.

A dumb client can ignore the flag: its user adds a new SSN
and the server bails out.  A smart client knows early on
that the change isn't going to be allowed.
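
Here is roughly what the smart client's early check could look
like in Python.  It assumes the "special SSN requirement" is that
a feature may carry only one ssn property, and it uses a made-up
FEATURE wrapper; neither is defined anywhere yet:

```python
import xml.etree.ElementTree as ET

SUZI_NS = "mailto:suzi at fruitfly.org"
FLAG_TAG = "{%s}says-the-ssn-in-special" % SUZI_NS

# Hypothetical feature carrying both the plain property and the flag.
FLAGGED = """<FEATURE xmlns:suzi="mailto:suzi at fruitfly.org">
  <PROP key="ssn" value="123-45-1534" />
  <suzi:says-the-ssn-in-special/>
</FEATURE>"""


def server_enforces_ssn(root):
    """True if the server flagged that it applies the special SSN rules."""
    return any(child.tag == FLAG_TAG for child in root)


def can_add_ssn(feature_xml):
    """Client-side guess at whether adding another ssn would be accepted."""
    root = ET.fromstring(feature_xml)
    has_ssn = any(p.get("key") == "ssn" for p in root.findall("PROP"))
    # Assumed rule: with the flag present, a second ssn is rejected.
    return not (server_enforces_ssn(root) and has_ssn)


print(can_add_ssn(FLAGGED))
```

A client that doesn't know the flag never runs this check and
only finds out when the server rejects the update.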

This hybrid solution doesn't seem right to me though.

I currently (and without any experience) prefer putting
schema-constrained fields in as extension elements.
Think of the property table as something exposed to the
user as a completely editable table, with no ability to
limit what that person does.

For the case of the SSN that might be overkill.  For
other things, like the current stage of a feature in
the curational process, it's best to put that data
there and not in the generic property table.

There is a long history of using generic key/value
tables as an ad-hoc way to extend a protocol.  I'm
trying to improve on that by defining a way for a
server to add well-structured, schema-dependent and
searchable data (for smart clients) without needing
to piggyback on a bunch of strings.


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Mon Feb 20 15:31:42 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 20 Feb 2006 08:31:42 -0700
Subject: [DAS2] today's conf. call and President's Day
Message-ID: <ec529ed5e113de48c2dfd64380c3ce4d@dalkescientific.com>

Today is President's Day in the US.

Are the other US people working today?

					Andrew
					dalke at dalkescientific.com


From Gregg_Helt at affymetrix.com  Mon Feb 20 16:47:13 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 20 Feb 2006 08:47:13 -0800
Subject: [DAS2] today's conf. call and President's Day
Message-ID: <C71929195D04BF48BAECD499AF717B480198C9EC@msex02.affymetrix.com>

It's a day off for Affymetrix, but I'm working anyway.  Unless there are
major objections I'd like to go ahead and do the conference call at the
standard time (9:30 AM Pacific time).  There may be a few less people
joining in from the US.

	thanks,
	gregg

> -----Original Message-----
> From: das2-bounces at portal.open-bio.org
> [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke
> Sent: Monday, February 20, 2006 7:32 AM
> To: DAS/2
> Subject: [DAS2] today's conf. call and President's Day
> 
> Today is President's Day in the US.
> 
> Are the other US people working today?
> 
> 					Andrew
> 					dalke at dalkescientific.com
> 
> _______________________________________________
> DAS2 mailing list
> DAS2 at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/das2


From lstein at cshl.edu  Mon Feb 20 17:37:06 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 20 Feb 2006 12:37:06 -0500
Subject: [DAS2] today's conf. call and President's Day
In-Reply-To: <C71929195D04BF48BAECD499AF717B480198C9EC@msex02.affymetrix.com>
References: <C71929195D04BF48BAECD499AF717B480198C9EC@msex02.affymetrix.com>
Message-ID: <200602201237.06497.lstein@cshl.edu>

Hi,

I've dialed in and all I'm getting is hold music. Could you confirm this info?

 800 531-3250
 287-9055

Thanks!

Lincoln

On Monday 20 February 2006 11:47, Helt,Gregg wrote:
> It's a day off for Affymetrix, but I'm working anyway.  Unless there are
> major objections I'd like to go ahead and do the conference call at the
> standard time (9:30 AM Pacific time).  There may be a few less people
> joining in from the US.
>
>  thanks,
>  gregg
>
> > -----Original Message-----
> > From: das2-bounces at portal.open-bio.org
> > [mailto:das2-bounces at portal.open-bio.org] On Behalf Of Andrew Dalke
> > Sent: Monday, February 20, 2006 7:32 AM
> > To: DAS/2
> > Subject: [DAS2] today's conf. call and President's Day
> >
> > Today is President's Day in the US.
> >
> > Are the other US people working today?
> >
> >      Andrew
> >      dalke at dalkescientific.com

-- 
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)


From lstein at cshl.edu  Mon Feb 20 16:50:38 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 20 Feb 2006 11:50:38 -0500
Subject: [DAS2] today's conf. call and President's Day
In-Reply-To: <ec529ed5e113de48c2dfd64380c3ce4d@dalkescientific.com>
References: <ec529ed5e113de48c2dfd64380c3ce4d@dalkescientific.com>
Message-ID: <200602201150.38431.lstein@cshl.edu>

I am working today!

Lincoln

On Monday 20 February 2006 10:31, Andrew Dalke wrote:
> Today is President's Day in the US.
>
> Are the other US people working today?
>
>      Andrew
>      dalke at dalkescientific.com

-- 
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)


From dalke at dalkescientific.com  Mon Feb 20 17:28:56 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 20 Feb 2006 10:28:56 -0700
Subject: [DAS2] today's conf. call and President's Day
In-Reply-To: <E07FE3FF-7EA2-4BDC-8FBA-2992A5CBBEDE@sanger.ac.uk>
References: <ec529ed5e113de48c2dfd64380c3ce4d@dalkescientific.com>
	<E07FE3FF-7EA2-4BDC-8FBA-2992A5CBBEDE@sanger.ac.uk>
Message-ID: <db7e92f877d56f0b329931710318f0cc@dalkescientific.com>

Thomas Down wrote:
> Well, I can't speak for US people, but I do know that Andreas Prlic is 
> on holiday today and I presume won't be joining the conference call.  
> I can join if there's anything that needs discussing urgently, but 
> otherwise I'd be happy to leave it 'til next week.

Status update for me:

   Last week was a break for me from the sprint - I was winded.  I
worked a bit here and there on how to do a GUI for the validation.
I hope to get a demo page of the results up within a day or so.

   This week I'll be working on that and a new draft of the spec.

   Also, I'm now back home in Santa Fe, where we haven't had rain or
snow for 100 days - my cacti are drooping!  :(


					Andrew
					dalke at dalkescientific.com


From dalke at dalkescientific.com  Mon Feb 27 14:50:10 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 27 Feb 2006 08:50:10 -0600
Subject: [DAS2] will miss today's conf. call
Message-ID: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>

Hi all,

   Not only am I on the road back from the Python conference but
my cell phone battery is nearly dead, so I won't be able to make
it to today's conference call.

   Here's my status.  I've been working on the validator, to
the detriment of the next spec rewrite.

   This validator does single-document checks.  That is, it
does not do internal integrity checks to make sure that,
say, a range query returns only features in that range,
or that the features are in the range given by the
segments.
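
For illustration only, a check of the kind the validator does
*not* do could be as small as this Python sketch.  It assumes
features are (id, start, end) tuples with inclusive coordinates
and that "in the range" means overlapping it; both are my
assumptions here, not statements about the spec:

```python
def features_outside_range(features, start, end):
    """Return ids of features that do not overlap [start, end].

    features is an iterable of (feature_id, feat_start, feat_end)
    tuples with inclusive coordinates (an assumed convention).
    """
    return [
        fid
        for fid, fstart, fend in features
        if fend < start or fstart > end
    ]


# Hypothetical result of a range query for 10..25.
returned = [("a", 10, 20), ("b", 30, 40), ("c", 5, 9)]
print(features_outside_range(returned, 10, 25))
```

Anything in the returned list would be reported as an error in
the query result.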

   I plugged the results into a web server running on my
laptop.  It's using some new Python libraries which are
not yet installed on the OBF machine, but which I can install
after I get back to Santa Fe.  The GUI is similar to what
I threw together at Sanger during the Sprint - enter a URL
and a document type, view the results.

   What took the longest was the code to pin down where the errors
happened, for example, to show which attribute was the
extra attribute in an element.
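
To give a flavor of the problem (this is not the validator's
actual code), here is a small Python sketch that uses the
standard expat bindings to report the line, element and name of
each unexpected attribute.  The ALLOWED table is a made-up
stand-in for the real schema:

```python
import xml.parsers.expat

# Hypothetical schema fragment: element name -> permitted attribute names.
ALLOWED = {"PROP": {"key", "value"}}


def find_extra_attributes(xml_text):
    """Return (line, element, attribute) for each attribute not in ALLOWED."""
    problems = []
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        allowed = ALLOWED.get(name)
        if allowed is None:
            return  # element not covered by this toy schema
        for attr in sorted(set(attrs) - allowed):
            # CurrentLineNumber is the line where this start tag begins.
            problems.append((parser.CurrentLineNumber, name, attr))

    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return problems


DOC = """<FEATURE>
  <PROP key="a" value="b" />
  <PROP key="c" value="d" colour="red" />
</FEATURE>"""

for line, element, attr in find_extra_attributes(DOC):
    print("line %d: <%s> has unexpected attribute %r" % (line, element, attr))
```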

  I've attached sample output for your viewing pleasure.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/das2/attachments/20060227/026be2fd/attachment-0001.html>
-------------- next part --------------


There is enough there for a Javascript jockey to make a
neat little interactive viewer, e.g., click on the error
message to be shown where it occurs in the document.
Also, the marker I'm using to show where the error occurs
in the body of the text needs work - the method I use
isn't very portable across platforms.

I think the next steps for me are:
   - get the validator working as-is on the OBF web site
       (should be on-line by tomorrow)
   - get back to writing the 3rd draft of the spec.

					Andrew
					dalke at dalkescientific.com

From ap3 at sanger.ac.uk  Mon Feb 27 17:41:08 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Mon, 27 Feb 2006 17:41:08 +0000
Subject: [DAS2] will miss today's conf. call
In-Reply-To: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>
References: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>
Message-ID: <197aeffa03988a8fc098f27926ee511d@sanger.ac.uk>

any conference call today?
- listening to the hold music

Andreas

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
			 +44 (0) 1223 49 6891


From nomi at fruitfly.org  Mon Feb 27 17:43:00 2006
From: nomi at fruitfly.org (Nomi Harris)
Date: Mon, 27 Feb 2006 09:43:00 -0800
Subject: [DAS2] will miss today's conf. call
In-Reply-To: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>
References: <5977901a6c564ba1062589a0c2c80288@dalkescientific.com>
Message-ID: <17411.14884.410370.608675@spongecake.lbl.gov>

are we having a teleconference today?  i got bored of waiting on hold for
the moderator.  someone email me if it's happening.

the validator sounds useful!

    Nomi


From boconnor at ucla.edu  Tue Feb 28 00:46:02 2006
From: boconnor at ucla.edu (Brian O'Connor)
Date: Mon, 27 Feb 2006 16:46:02 -0800
Subject: [DAS2] DAS2 Reference Server @ UCLA
Message-ID: <44039D4A.5000503@ucla.edu>

Hi,

If anyone is using the DAS/2 server at UCLA (das.biopackages.net) there 
will be some maintenance on the server later today (after 5pm Pacific).  
This won't affect the DAS/2 codebase, I'm just moving around some of our 
other production websites and there will be some downtime.  The outage 
should just last a few minutes.

--Brian