From andy.jenkinson at ebi.ac.uk Sat Nov 1 11:31:04 2008 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Sat, 01 Nov 2008 15:31:04 +0000 Subject: [DAS2] [DAS] [Fwd: Re: Writeback implementation] In-Reply-To: <50158cb00810311034y62ecbe24yc13f77d04fd01e6d@mail.gmail.com> References: <49087FF9.5080704@ebi.ac.uk> <4908837C.1000902@nbn.ac.za> <490891A4.8050908@ebi.ac.uk> <4909821B.6050300@nbn.ac.za> <50158cb00810310212i3a44b91ev30244955818796db@mail.gmail.com> <490B0CE0.5070708@nbn.ac.za> <490B1490.6090900@gmail.com> <50158cb00810311034y62ecbe24yc13f77d04fd01e6d@mail.gmail.com> Message-ID: <490C7638.70101@ebi.ac.uk> Gregg Helt wrote: > On Fri, Oct 31, 2008 at 7:22 AM, Andy Jenkinson wrote: > >> I can't find a description of the response to a writeback command in Asia's >> thesis. Does it contain features (as in DAS2) or just a confirmation? > > > Take a look at the writeback spec ( > http://biodas.org/documents/das2/das2_writeback.html ), it's much shorter > than the retrieval spec, just a few pages. > > The general idea is that a server may not be able to do all the > creations/edits/deletes a client is requesting in exactly the same form the > client has specified, and furthermore that changes a client requests in one > feature can possibly trigger changes in other features. Therefore the > semantics of the client request are "here's what I want to do" and the > writeback server responds with "here's what I actually did". In the DAS2 > writeback spec these are communicated mostly by passing back and forth > feature XML, except for deletion getting it's own special bit of XML. I looked at the DAS2 spec, but I was wondering specifically about Asia's implementation - whether it did the same or returned either a simple confirmation or a DAS 1.53 features response. 
From asia at student.chalmers.se Sun Nov 2 18:52:02 2008 From: asia at student.chalmers.se (Asia Grzibovska) Date: Mon, 3 Nov 2008 00:52:02 +0100 (CET) Subject: [DAS2] [DAS] [Fwd: Re: Writeback implementation] In-Reply-To: <490C7638.70101@ebi.ac.uk> References: <49087FF9.5080704@ebi.ac.uk> <4908837C.1000902@nbn.ac.za> <490891A4.8050908@ebi.ac.uk> <4909821B.6050300@nbn.ac.za> <50158cb00810310212i3a44b91ev30244955818796db@mail.gmail.com> <490B0CE0.5070708@nbn.ac.za> <490B1490.6090900@gmail.com> <50158cb00810311034y62ecbe24yc13f77d04fd01e6d@mail.gmail.com> <490C7638.70101@ebi.ac.uk> Message-ID: <63877.81.26.234.157.1225669922.squirrel@webmail.chalmers.se> Hello, I have noticed a writeback thread in the mailing list and read it with a big interest. I agree with most of the ideas expressed by participants. The writeback specification was quite uncertain at the time when I was implementing it, and the aim of my project was rather to prove the concept. In principle it was proved, and writeback could be implemented in the real-life. However, the writeback specification needs to be more concrete. 1)about response to a writeback command. The response was a simple confirmation, it did not contain features, because writeback was not so complex. It saved exactly as "here's what I want to do". If the changes were successfully saved into the database, a simple confirmation was enough. Otherwise nothing was saved and a full error description was sent back. The source code can be found here http://code.google.com/p/daswriteback/source/browse/trunk/servlet/src/uk/ac/sanger/DatabaseUtilities.java 2)about URI. It is correct that "every feature in DAS/2.0 has a unique URI". For simplicity I did not include it into the writeback document, but it would be good to have it in the real implementation. 
3)meta-annotations could simplify many things and add more functionality

Best regards,
Asia
> _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > From andy.jenkinson at ebi.ac.uk Tue Nov 4 07:05:01 2008 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Tue, 04 Nov 2008 12:05:01 +0000 Subject: [DAS2] [DAS] [Fwd: Re: Writeback implementation] In-Reply-To: <491022D6.8070204@nbn.ac.za> References: <49087FF9.5080704@ebi.ac.uk> <4908837C.1000902@nbn.ac.za> <490891A4.8050908@ebi.ac.uk> <4909821B.6050300@nbn.ac.za> <50158cb00810310212i3a44b91ev30244955818796db@mail.gmail.com> <490B0CE0.5070708@nbn.ac.za> <490B1490.6090900@gmail.com> <50158cb00810311034y62ecbe24yc13f77d04fd01e6d@mail.gmail.com> <490C7638.70101@ebi.ac.uk> <63877.81.26.234.157.1225669922.squirrel@webmail.chalmers.se> <491022D6.8070204@nbn.ac.za> Message-ID: <49103A6D.3060201@ebi.ac.uk> Gustavo Salazar wrote: > xml:base="http://das.sanger.ac.uk:80/das/interpro/features"> > phosphoinositide 3-kinase in leukocytes > > type="inferred from sequence similarity (ECO:0000044)" > > > > > > > > > > > Then for this example the id for the feature will be the uri > http://das.sanger.ac.uk:80/das/interpro/features/G3DSA:1.10.1070.11 When updating a feature the URI field can be whatever would go into the ID field, because both the server and the client need them to be the same. The ID/URI can be assumed to be relative to the base of the document in the same way as HTML relative links are processed. However, your example is a URI in its own right because it contains a colon but no /. 
Some examples: URI - foo:something base - http://das.sanger.ac.uk/das/interpro/features final URI - foo:something URI - something base - http://das.sanger.ac.uk/das/interpro/features final URI - http://das.sanger.ac.uk/das/interpro/something URI - ./foo:something base - http://das.sanger.ac.uk/das/interpro/features final URI - http://das.sanger.ac.uk/das/interpro/foo:something See page 26 of RFC3986, http://www.apps.ietf.org/rfc/rfc3986.html So long as the absolute URI is derived in the same way for both servers you should be fine using whatever the "base server" uses. > I'm not sure if this is the right way to submit the types... I'm parsing > the types element that is optional in the writeback document, however as > I understood that part of the document is used to add new types, which I > think is out of the scope of my project. Other issue that i have is > where should I put the information of the method maybe another PROP tag > like ?? It looks like only some of the fields in DAS are supported here, some of those missing are constrained by the ontology (in parentheses): type ID (e.g. SO:001234) type label (e.g. exon) type category (e.g. inferred from sequence similarity...) method ID method label Really any of these can be changed so they should be represented. In the interests of keeping your project manageable we can limit the implementation to not adding new types (DAS and DAS/2 have different ways of interpreting them anyway). But I can especially see a case for changing the category (evidence). Since DAS/2 does not have an equivalent for this, you could put it in a PROP element. 
From gregghelt at gmail.com Tue Nov 4 17:58:05 2008
From: gregghelt at gmail.com (Gregg Helt)
Date: Tue, 4 Nov 2008 14:58:05 -0800
Subject: [DAS2] [DAS] [Fwd: Re: Writeback implementation]
In-Reply-To: <491085EC.2090003@nbn.ac.za>
References: <49087FF9.5080704@ebi.ac.uk> <50158cb00810310212i3a44b91ev30244955818796db@mail.gmail.com> <490B0CE0.5070708@nbn.ac.za> <490B1490.6090900@gmail.com> <50158cb00810311034y62ecbe24yc13f77d04fd01e6d@mail.gmail.com> <490C7638.70101@ebi.ac.uk> <63877.81.26.234.157.1225669922.squirrel@webmail.chalmers.se> <491022D6.8070204@nbn.ac.za> <49103A6D.3060201@ebi.ac.uk> <491085EC.2090003@nbn.ac.za>
Message-ID: <50158cb00811041458t5e00c5d2l4022039774db2ecb@mail.gmail.com>

On Tue, Nov 4, 2008 at 2:24 AM, Gustavo Salazar wrote:
>
> About the URI, can this URI be built using the uri in the base of the
> document + the uri attribute in the feature??

On Tue, Nov 4, 2008 at 4:05 AM, Andy Jenkinson wrote:
> When updating a feature the URI field can be whatever would go into the ID
> field, because both the server and the client need them to be the same. The
> ID/URI can be assumed to be relative to the base of the document in the same
> way as HTML relative links are processed. However, your example is a URI in
> its own right because it contains a colon but no /.
> ...
> So long as the absolute URI is derived in the same way for both servers you
> should be fine using whatever the "base server" uses.
>

Please note that in DAS2.0 the way to resolve relative URI references to absolute URIs does not have to be "assumed". As I mentioned before in this thread:

> DAS/2.0 uses the XML Base specification to
> resolve relative URI references via xml:base attributes and/or the URI the
> document is a representation of.
>

Most software libraries that deal with URIs have a method to resolve a relative URI reference against a base URI to yield an absolute URI, so that people don't have to hand-code the URI syntax and resolution rules.
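To illustrate the point about library support, here is a minimal sketch using Python's standard-library `urllib.parse.urljoin`, which is one such method. The base URI and the relative references are taken from Andy's examples earlier in the thread; the helper name `resolve_feature_uri` is hypothetical and not part of any DAS library.

```python
from urllib.parse import urljoin

# Base taken from Andy's examples; in a real document this would come from
# the xml:base attribute or the URI the document was retrieved from.
BASE = "http://das.sanger.ac.uk/das/interpro/features"


def resolve_feature_uri(base: str, reference: str) -> str:
    """Resolve a feature's URI reference against the document base (hypothetical helper)."""
    return urljoin(base, reference)


if __name__ == "__main__":
    for ref in ("foo:something", "something", "./foo:something",
                "G3DSA:1.10.1070.11", "./G3DSA:1.10.1070.11"):
        # A reference such as "G3DSA:1.10.1070.11" parses as having its own
        # scheme, so it is returned unchanged; the "./" prefix makes it relative.
        print(f"{ref:24} -> {resolve_feature_uri(BASE, ref)}")
```

The practical consequence for feature IDs that contain a colon is that they need a leading "./" if they are meant to be resolved against the base.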
Just to check, by "base of the document" do you mean the value of the "xml:base" attribute in the root XML element, or the URL used to retrieve the document? According to the XML Base spec these are both incorporated in a hierarchy of URI reference resolution, so it's possible to have a doc with no "xml:base" attributes and still follow the XML Base spec. However I strongly recommend using an "xml:base" with an absolute URI value on the root XML element of a doc to be more assertive about URI reference resolution. This is particularly important for DAS writeback, since at least so far the URI to POST modifications to has been different than the URIs to GET features/types/segments from. Gregg From gregghelt at gmail.com Tue Nov 4 18:02:41 2008 From: gregghelt at gmail.com (Gregg Helt) Date: Tue, 4 Nov 2008 15:02:41 -0800 Subject: [DAS2] Fwd: [DAS] [Fwd: Re: Writeback implementation] In-Reply-To: <491022D6.8070204@nbn.ac.za> References: <49087FF9.5080704@ebi.ac.uk> <490891A4.8050908@ebi.ac.uk> <4909821B.6050300@nbn.ac.za> <50158cb00810310212i3a44b91ev30244955818796db@mail.gmail.com> <490B0CE0.5070708@nbn.ac.za> <490B1490.6090900@gmail.com> <50158cb00810311034y62ecbe24yc13f77d04fd01e6d@mail.gmail.com> <490C7638.70101@ebi.ac.uk> <63877.81.26.234.157.1225669922.squirrel@webmail.chalmers.se> <491022D6.8070204@nbn.ac.za> Message-ID: <50158cb00811041502g3b0f7fd1xf1324240cb812474@mail.gmail.com> Forwarding more messages that didn't get cross-posted to das2 list: ---------- Forwarded message ---------- From: Gustavo Salazar Date: Tue, Nov 4, 2008 at 2:24 AM Subject: Re: [DAS] [DAS2] [Fwd: Re: Writeback implementation] To: Asia Grzibovska Cc: das at lists.open-bio.org Hello, Thanks Asia for clarify some of our doubts of your implementation. 
I agree with the idea of answer with a confirmation or a detailed error, specially if the writeback is going to process atomic requests, because if the "here's what I want to do" is just one task, the answer just could be or DONE or a detailed error (even if the error is about dependencies with other features). About the URI, can this URI be built using the uri in the base of the document + the uri atributte in the feature?? So far I have implemented a parser for a document that follows the DAS/2 schema and uses the properties defined by Asia for the features, an example to add or update(in the DAS/2 schema there is not difference between those commands): phosphoinositide 3-kinase in leukocytes Then for this example the id for the feature will be the uri http://das.sanger.ac.uk:80/das/interpro/features/G3DSA:1.10.1070.11 I'm not sure if this is the right way to submit the types... I'm parsing the types element that is optional in the writeback document, however as I understood that part of the document is used to add new types, which I think is out of the scope of my project. Other issue that i have is where should I put the information of the method maybe another PROP tag like ?? If anybody have any comments about what else should I include in this document please let me know. Cheers, Gustavo. Asia Grzibovska wrote: > Hello, > I have noticed a writeback thread in the mailing list and read it with a > big interest. I agree with most of the ideas expressed by participants. > The writeback specification was quite uncertain at the time when I was > implementing it, and the aim of my project was rather to prove the > concept. In principle it was proved, and writeback could be implemented > in the real-life. However, the writeback specification needs to be more > concrete. > > 1)about response to a writeback command. The response was a simple > confirmation, it did not contain features, because writeback was not so > complex. 
>> _______________________________________________ >> DAS2 mailing list >> DAS2 at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/das2 >> >> >> > > > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das > > _______________________________________________ DAS mailing list DAS at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/das From gregghelt at gmail.com Tue Nov 4 18:04:23 2008 From: gregghelt at gmail.com (Gregg Helt) Date: Tue, 4 Nov 2008 15:04:23 -0800 Subject: [DAS2] Fwd: [DAS] [Fwd: Re: Writeback implementation] In-Reply-To: <491085EC.2090003@nbn.ac.za> References: <49087FF9.5080704@ebi.ac.uk> <50158cb00810310212i3a44b91ev30244955818796db@mail.gmail.com> <490B0CE0.5070708@nbn.ac.za> <490B1490.6090900@gmail.com> <50158cb00810311034y62ecbe24yc13f77d04fd01e6d@mail.gmail.com> <490C7638.70101@ebi.ac.uk> <63877.81.26.234.157.1225669922.squirrel@webmail.chalmers.se> <491022D6.8070204@nbn.ac.za> <49103A6D.3060201@ebi.ac.uk> <491085EC.2090003@nbn.ac.za> Message-ID: <50158cb00811041504v5795198fwab29092aa8d73513@mail.gmail.com> Forwarding more messages that didn't get cross-posted to das2 list: ---------- Forwarded message ---------- From: Gustavo Salazar Date: Tue, Nov 4, 2008 at 9:27 AM Subject: Re: [DAS] [DAS2] [Fwd: Re: Writeback implementation] To: Andy Jenkinson Cc: das at lists.open-bio.org Hello, Thanks Andy for the examples about the use of the URI, now is much clear for me. I haven't implemented the small code to built the URI from the URI and base parameters but I will soon. 
Following the idea of use the PROP tag to put the attributes missed for DAS/2 that are required in DAS/1.53 I have modify my XML example as follows: phosphoinositide 3-kinase in leukocytes As you can notice i've also add the properties start and stop for the position of the feature since i noticed that the range in LOC is for the location of the segment and a segment can have several features in different positions. In my first tests with a document like this I can built the next answer with MyDas that looks correct for DAS/1.53 polypeptide_structural_domain GENE3D 830 1031 0.0 0 - Cheers, Gustavo. _______________________________________________ DAS mailing list DAS at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/das From andy.jenkinson at ebi.ac.uk Wed Nov 5 18:15:37 2008 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Wed, 05 Nov 2008 23:15:37 +0000 Subject: [DAS2] [DAS] [Fwd: Re: Writeback implementation] In-Reply-To: <491085EC.2090003@nbn.ac.za> References: <49087FF9.5080704@ebi.ac.uk> <4908837C.1000902@nbn.ac.za> <490891A4.8050908@ebi.ac.uk> <4909821B.6050300@nbn.ac.za> <50158cb00810310212i3a44b91ev30244955818796db@mail.gmail.com> <490B0CE0.5070708@nbn.ac.za> <490B1490.6090900@gmail.com> <50158cb00810311034y62ecbe24yc13f77d04fd01e6d@mail.gmail.com> <490C7638.70101@ebi.ac.uk> <63877.81.26.234.157.1225669922.squirrel@webmail.chalmers.se> <491022D6.8070204@nbn.ac.za> <49103A6D.3060201@ebi.ac.uk> <491085EC.2090003@nbn.ac.za> Message-ID: <49122919.6040907@ebi.ac.uk> As I understand, the DAS/2 "type" attribute is a URI, to match the "types" schema. This is really the equivalent of the type ID. Also, elements cannot have arbitrary attributes, instead they need two attributes, one called "key" and the other "value": so it would look more like: With regards to the element, this refers to the location of the feature within a segment. In DAS/2 there can be more than one (i.e. 
the same feature in multiple locations), but for DAS just send one LOC element. The range attribute should be the start/end/strand positions of the _feature_, so you do not need to send separate start and end properties: However I would be careful with the numbers as I believe DAS/2 uses zero-indexed start positions. Gustavo Salazar wrote: > > Hello, > > Thanks Andy for the examples about the use of the URI, now is much clear > for me. I haven't implemented the small code to built the URI from the > URI and base parameters but I will soon. > Following the idea of use the PROP tag to put the attributes missed for > DAS/2 that are required in DAS/1.53 I have modify my XML example as > follows: > > xml:base="http://das.sanger.ac.uk:80/das/interpro/features"> > phosphoinositide 3-kinase in leukocytes > > type="inferred from sequence similarity (ECO:0000044)" > > > > > > > > > > > > > > > > > > As you can notice i've also add the properties start and stop for the > position of the feature since i noticed that the range in LOC is for the > location of the segment and a segment can have several features in > different positions. > In my first tests with a document like this I can built the next answer > with MyDas that looks correct for DAS/1.53 > > > href="http://localhost:8080/MyDas/das/writeback/features?segment=O00329"> > label="O00329"> > > polypeptide_structural_domain > GENE3D > 830 > 1031 > 0.0 > 0 > - > > > > > > Cheers, > > Gustavo. 
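On the caution above about zero-indexed start positions, the conversion could be sketched as follows. This assumes DAS 1.53 start/end values are 1-based and inclusive and that DAS/2 ranges are 0-based with an exclusive end, which should be checked against the DAS/2 specification before relying on it. The function names are hypothetical.

```python
def das1_to_das2_range(start: int, end: int) -> tuple[int, int]:
    """Convert a 1-based inclusive (start, end) to a 0-based half-open pair.

    Assumption: DAS/2 positions count from zero and the end is exclusive,
    so only the start shifts by one.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


def das2_to_das1_range(start: int, end: int) -> tuple[int, int]:
    """Inverse of das1_to_das2_range, under the same assumption."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


if __name__ == "__main__":
    # The feature in Gustavo's MyDas response spans 830..1031 in DAS/1.53 terms.
    print(das1_to_das2_range(830, 1031))
```

Both conventions describe the same number of residues (end - start in the 0-based form, end - start + 1 in the 1-based form), which makes a quick sanity check possible.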
From gregghelt at gmail.com Thu Nov 6 10:30:51 2008 From: gregghelt at gmail.com (Gregg Helt) Date: Thu, 6 Nov 2008 07:30:51 -0800 Subject: [DAS2] Links to old DAS/2 grant and final progress report In-Reply-To: <5E69F583-38C2-45DD-9F0D-571C35E0FA27@pantherinformatics.com> References: <50158cb00810291414u52198f35kd23774b8e538ba87@mail.gmail.com> <5E69F583-38C2-45DD-9F0D-571C35E0FA27@pantherinformatics.com> Message-ID: <50158cb00811060730r11dc4c29od5669fe519ea5fa3@mail.gmail.com> On Wed, Oct 29, 2008 at 1:41 PM, Brian Gilman < gilmanb at pantherinformatics.com> wrote: > Hey Gregg, > > I was wondering why DAS/DAS2 couldn't be represented using different > wire protocols? For instance, why not das in json and das in xml etc. etc? The DAS2.0 spec does allow server responses in alternative content formats. Every capability in the "sources" doc can specify alternative content formats via the element, and also in the "types" doc alternative formats for annotations can be listed on a per-type basis via elements. Retrieving annotations in these alternative formats is done by adding a "format=XYZ" to the retrieval URI's query parameters. We considered using standard HTTP content negotiation but felt that it would be better to have the format embedded in the retrieval URIs rather than in the HTTP headers, and that this approach also allowed more granularity without additional request overhead. This alternative content ability is used heavily by the Integrated Genome Browser (IGB) DAS2.0 client and the Genometry DAS2.0 server to request/deliver very efficient binary formats when possible. If you take a look at IGB's console output when loading annotations from the Affymetrix Genometry server, you'll see that for many annotation types IGB sends feature requests for custom binary formats like "bgn", "bps", or "brs". In more recent work I'm starting to use Google protocol buffers for alternative content formats, and fiddling a little with json too. 
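As a rough sketch of the "format=XYZ" mechanism described above, a client could add the parameter to a retrieval URI like this. The server URL in the example is made up, and `with_format` is a hypothetical helper; only the `format` query parameter and the format names ("bgn", "bps", "brs") come from the message.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_format(retrieval_uri: str, fmt: str) -> str:
    """Return retrieval_uri with its 'format' query parameter set to fmt."""
    parts = urlsplit(retrieval_uri)
    # Keep every existing parameter except a previous 'format', then append ours.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "format"]
    query.append(("format", fmt))
    # Note: urlencode percent-encodes reserved characters in parameter values.
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical features URI; a real one would come from the "sources" doc.
    print(with_format("http://example.org/das2/genome/features?segment=chr1", "bps"))
```

A client would normally check the formats advertised in the "sources" or "types" documents first and fall back to the default XML when the format it wants is not listed.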
One of the nice things about the Trellis DAS2 server framework I've been developing is that the framework handles conversion of data model to data format, at least when it understands the requested format. So I hope to generalize the conversion and add support for various formats into Trellis, so they don't have to be reimplemented for every backend data source. Regarding the discussion of SOAP and REST -- I agree that trying to compare the two directly confuses the issue. SOAP is a specific protocol whereas REST is a set of principles for software design. I try nowadays to compare specific "RESTful" protocols to SOAP when needed. However the grant proposal was written for people who don't generally read web tech specifications, back then whatever else SOAP was it definitely wasn't RESTful, and there was no other handy term for an alternative to SOAP, so I glossed over the category differences when comparing SOAP and REST. Gregg Also, SOAP is a wire protocol while REST may not be... I think this confuses > the issue? > > -B > -- > Brian Gilman > President Panther Informatics Inc. > E-Mail: gilmanb at pantherinformatics.com > gilmanb at jforge.net > AIM: gilmanb1 > > > > > > On Oct 29, 2008, at 5:14 PM, Gregg Helt wrote: > > I've been reminded recently that when I say stuff like "well in the DAS/2 >> grant we proposed XYZ" few people can actually look at the old grant and >> see >> what I'm talking about. So I've posted a copy in a more permanent >> location: >> http://biodas.s3.amazonaws.com/das2grant/DAS2+Grant+Proposal+Feb2003.doc >> . >> I've also posted a copy of the final grant progress report: >> >> http://biodas.s3.amazonaws.com/das2grant/DAS2+Grant+Final+Progress+Report+Aug2008.doc >> . >> If you do take a look at the grant, keep in mind it was written over >> five >> years ago. 
Back then for example REST was still a relatively new concept, >> and SOAP hype was peaking, so there's a fair amount of text dedicated to >> explaining the RESTful approach we wanted to take and why we weren't just >> using SOAP. Since then some of the specifics have definitely changed, but >> I >> think the general concepts hold up pretty well. For instance the current >> thread about meta-annotation had me looking back at the section on >> "Feature >> References" and finding it's still relevant. >> >> I've also added links on the BioDAS wiki DAS/2 pages to the grant and >> progress report. >> >> Gregg >> _______________________________________________ >> DAS2 mailing list >> DAS2 at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/das2 >> >> > From jw12 at sanger.ac.uk Wed Nov 12 05:41:59 2008 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Wed, 12 Nov 2008 10:41:59 +0000 Subject: [DAS2] DAS workshop registration 2009 Message-ID: <1226486519.4881.33.camel@deskpro20727.dynamic.sanger.ac.uk> Registration is open for the 2009 DAS workshop (8,9,10th March) at the Genome Campus, Hinxton UK. If you are interested in attending, please find out more by going to http://www.dasregistry.org/course.jsp and register via the web link at the bottom of the page. Closing date for registration is 1st Feb 2009. If you register now you can change the details of your registration any time up until this closing date. Please register early as places will be limited. Also if you would be interested in presenting your work please write to Jonathan Warren at jw12 at sanger.ac.uk with a synopsis and a title. -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
From gregghelt at gmail.com Wed Nov 12 14:28:15 2008 From: gregghelt at gmail.com (Gregg Helt) Date: Wed, 12 Nov 2008 11:28:15 -0800 Subject: [DAS2] DAS Teleconference Thursday November 13th, 9 AM PST Message-ID: <50158cb00811121128k543321acu6d302bb85e2bffe2@mail.gmail.com> Just a reminder to everyone that there's a DAS teleconference tomorrow, Thursday November 13th, at 9AM PST (4 PM GMT). We're switching back to using Suzi's conference line: US Teleconference #: 866-692-3582 Conference #: 4977624 For UK callins, one of these, not sure which is most appropriate for Hinxton: BIRMINGHAM: 0808-238-6019 GLASGOW: 0808-238-6019 LEEDS: 0808-238-6019 LONDON: 0808-238-6019 MANCHESTER: 0808-238-6019 I'd like to talk about possible modifications to the DAS1+2 sources doc, and writeback. If you would like to add to the agenda, please do. Gregg From Steve_Chervitz at affymetrix.com Wed Nov 12 22:52:42 2008 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Wed, 12 Nov 2008 19:52:42 -0800 Subject: [DAS2] [DAS] DAS Teleconference Thursday November 13th, 9 AM PST In-Reply-To: <50158cb00811121128k543321acu6d302bb85e2bffe2@mail.gmail.com> Message-ID: The teleconf info is now updated on the biodas.org wiki community portal page here: http://www.biodas.org/wiki/BioDAS:Community_Portal#Teleconference Also note: 9am PST is 5pm GMT, not 4. Steve > From: Gregg Helt > Date: Wed, 12 Nov 2008 11:28:15 -0800 > To: , > Subject: [DAS] DAS Teleconference Thursday November 13th, 9 AM PST > > Just a reminder to everyone that there's a DAS teleconference tomorrow, > Thursday November 13th, at 9AM PST (4 PM GMT). 
From gregghelt at gmail.com Thu Nov 13 02:39:18 2008
From: gregghelt at gmail.com (Gregg Helt)
Date: Wed, 12 Nov 2008 23:39:18 -0800
Subject: [DAS2] Coordinates and sequence URIs
Message-ID: <50158cb00811122339x6e727e82n3427a4358fdd78db@mail.gmail.com>

On Thu, Oct 30, 2008 at 2:01 PM, Garret Wilson wrote:
> ...
> This brings up a related issue regarding the assembly and sequence URIs at
> http://www.biodas.org/wiki/GlobalSeqIDs . Before on this list I've brought
> up the issue of whether DAS has authority to maintain identifiers in
> namespaces from domains controlled by third parties (i.e. NCBI). This still
> worries me.
>
> How confident can we be that the DAS GlobalSeqIDs are stable and will not
> change for a while?
The GlobalSeqIDs were created because at the time there were no stable URIs for genome assemblies and assembly sequences from authoritative sources like NCBI. As far as I know that's still the case, though since then there's been some movement towards stable URIs at NCBI (see here) and other authoritative sources. Also at the time the GlobalSeqIDs were created the DAS registry used IDs for coordinates but not URIs. But now the DAS registry uses the DAS1.53E/2.0 "sources" document, so every COORDINATES entry has a URI. For example: http://www.dasregistry.org/coordsys/CS_DS40 is the registry coordinates URI corresponding to the GlobalSeqID URI http://www.ncbi.nlm.nih.gov/genome/H_sapiens/B36.1/ . Given that we have a central DAS registry I do think it makes sense that maintaining stable URIs for sequences and assemblies (and other collections of sequences) be handled in the registry -- at least when there's no stable URIs from an authoritative source. I think there are better ways to assign URIs then either the way currently used in the DAS1 registry (very opaque) or the DAS2 GlobalSeqIDs (transparent but encroaching on NCBI namespace), but the more important point is that we should only have one strategy for all versions of DAS. We discussed this back in 2006/2007, and I know Andreas Prlic joined in on several teleconference conversations about merging the DAS2 notion of global seq and assembly IDs into the DAS registry and "sources" doc coordinates elements. Secondly, related to URI resolution, I note that I cannot take an assembly > URI such as http://www.ncbi.nlm.nih.gov/genome/H_sapiens/B36.1/ and simply > resolve the chromosome ID (e.g. chr1) against it to form the sequence URI. > My application instead has to have specific knowledge of this particular > assembly namespace, knowing that it must first append the path segment > "dna/" to the URI, yielding > http://www.ncbi.nlm.nih.gov/genome/H_sapiens/B36.1/dna/chr1 . 
> > I'd rather my application, once it knew the assembly URI, simply need to > resolve the chromosome ID to the assembly URI to determine the sequence URI, > such as http://www.ncbi.nlm.nih.gov/genome/H_sapiens/B36.1/chr1 . > > Garret This illustrates one weakness of the current DAS sources XML -- given the coordinates URI, there is still no ability to directly determine authoritative/reference sequence URIs for those coordinates. These sequence URIs can't be reliably inferred from the coordinates URIs, and I don't think they should be inferred (or constructed) at all. Attempts to infer sequence URIs currently lead to all sorts of trouble, as I've found in working on the Trellis/Ivy DAS1-->DAS2 proxy. For example the proxy assumes that if versioned source V1 has coordinates C and entry_points capability E1, then E1 describes the segments available for coordinates C. Based on this assumption if versioned source V2 also has coordinates C but doesn't have an entry_points capability then the proxy uses E1 from V1 instead since the versioned sources share the same coordinates. Which sometimes works but not always -- what happens if versioned source V3 also has coordinates C but has an entry_points capability E3 that disagrees with E1? I'm seeing the above situation in the DAS1 registry -- for example, for coordinates .../CS_DS40 (NCBI human genome assembly v.36) which has 44 different versioned sources in the registry. 2 of these versioned sources have entry_points capabilities: A) http://hgwdev-gencode.cse.ucsc.edu/cgi-bin/das/hg18/entry_points B) http://www.snpbox.org/cgi-box/das/SNPbox_human_44_36f/entry_points However, these entry_points queries don't return the same thing. They agree on naming for chromosome IDs, but for non-chromosomal sequences the naming starts varying, for instance "M" vs "MT" for the mitochondrial DNA. More importantly, they disagree on the stop/length value for nearly every chromosome! 
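The kind of disagreement described above can be checked mechanically. What follows is only a minimal sketch in Python: the segment tables are made-up stand-ins (not the real responses from the two entry_points URLs), and `compare_entry_points` is a hypothetical helper, not part of any DAS library or of the Trellis/Ivy proxy.

```python
from typing import Dict, List, Tuple

# Hypothetical segment tables: segment ID -> stop/length value.
# These numbers are invented for illustration; they are NOT the actual
# entry_points responses from the two servers named in the message.
Segments = Dict[str, int]


def compare_entry_points(a: Segments, b: Segments) -> Tuple[List[str], List[str], List[str]]:
    """Return (IDs only in a, IDs only in b, shared IDs whose lengths differ)."""
    only_in_a = sorted(set(a) - set(b))
    only_in_b = sorted(set(b) - set(a))
    length_mismatches = sorted(seg for seg in set(a) & set(b) if a[seg] != b[seg])
    return only_in_a, only_in_b, length_mismatches


# Two sources that claim the same coordinates but disagree in both ways
# described above: naming ("M" vs "MT") and stop/length values.
source_a = {"1": 1000, "2": 900, "M": 16}
source_b = {"1": 1001, "2": 900, "MT": 16}

only_in_a, only_in_b, length_mismatches = compare_entry_points(source_a, source_b)
print("only in A:", only_in_a)
print("only in B:", only_in_b)
print("length mismatches:", length_mismatches)
```

A check along these lines is one way a proxy or registry could detect that two versioned sources sharing a coordinates URI do not actually agree on the segments, rather than silently borrowing one source's entry_points for another.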
So I think the sequence URIs should be specified -- given the coordinate URIs and capability URIs of a versioned source, there should be a query mechanism to return sequence info for the coordinate URI and this info should include sequence URIs. As illustrated above, both the DAS1 entry_points and DAS2 segments queries currently seem too disconnected from the coordinates URIs without some changes to the sources XML. One change would be to add to the entry_points and/or segments capabilities of "authoritative" versioned sources a coordinates attribute which would be a relative URI reference to the coordinates for which they are the authoritative list of sequences. This is actually in the RelaxNG schema for DAS2, but currently commented out.

Gregg

From andy.jenkinson at ebi.ac.uk Thu Nov 13 07:29:58 2008
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Thu, 13 Nov 2008 12:29:58 +0000
Subject: [DAS2] [DAS] Coordinates and sequence URIs
In-Reply-To: <50158cb00811122339x6e727e82n3427a4358fdd78db@mail.gmail.com>
References: <50158cb00811122339x6e727e82n3427a4358fdd78db@mail.gmail.com>
Message-ID: <491C1DC6.90003@ebi.ac.uk>

Gregg Helt wrote:
> I think there are better ways to assign URIs then
> either the way currently used in the DAS1 registry (very opaque) or the DAS2
> GlobalSeqIDs (transparent but encroaching on NCBI namespace), but the more
> important point is that we should only have one strategy for all versions of
> DAS.

Currently DAS1 does not formally include URIs; should it do so, we can improve how the registry handles them.

>
> Attempts to infer sequence URIs currently lead to all sorts of trouble, as
> I've found in working on the Trellis/Ivy DAS1-->DAS2 proxy. For example the
> proxy assumes that if versioned source V1 has coordinates C and entry_points
> capability E1, then E1 describes the segments available for coordinates C.
> Based on this assumption if versioned source V2 also has coordinates C but > doesn't have an entry_points capability then the proxy uses E1 from V1 > instead since the versioned sources share the same coordinates. Which > sometimes works but not always -- what happens if versioned source V3 also > has coordinates C but has an entry_points capability E3 that disagrees with > E1? > > I'm seeing the above situation in the DAS1 registry -- for example, for > coordinates .../CS_DS40 (NCBI human genome assembly v.36) which has 44 > different versioned sources in the registry. 2 of these versioned sources > have entry_points capabilities: > A) http://hgwdev-gencode.cse.ucsc.edu/cgi-bin/das/hg18/entry_points > B) http://www.snpbox.org/cgi-box/das/SNPbox_human_44_36f/entry_points > However, these entry_points queries don't return the same thing. They agree > on naming for chromosome IDs, but for non-chromosomal sequences the naming > starts varying, for instance "M" vs "MT" for the mitochondrial DNA. More > importantly, they disagree on the stop/length value for nearly every > chromosome! > > So I think the sequence URIs should be specified -- given the coordinate > URIs and capability URIs of a versioned source, there should be a query > mechanism to return sequence info for the coordinate URI and this info > should include sequence URIs. As illustrated above both the DAS1 > entry_points and DAS2 segments queries currently seem too disconnected from > the coordinates URIs without some changes to the sources XML. One would be > to add to the entry_points and/or segments capabilities of "authoritative" > versioned sources a coordinates attribute which would be a relative URI > reference to the coordinates for which they are the authoratative list of > sequences. This is actually in the RelaxNG schema for DAS2, but currently > commented out. Merging sequence info with sequence URIs won't work for UniProt, it's just too big. 
We need either to make one source authoritative for a coordinate system (in the sources or coordsys documents), or to have the registry validate coordinate system compliance. I'd suggest the latter because it allows for redundancy. Either way we need to make it a requirement that a coordinate system has at least one server providing segments and sequence, which is not currently the case.

From Steve_Chervitz at affymetrix.com Thu Nov 13 20:16:13 2008
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Thu, 13 Nov 2008 17:16:13 -0800
Subject: [DAS2] Minutes from 13 Nov 2008 DAS teleconference
Message-ID:

Here are my notes from today's teleconf. I changed the syntax for action items to indicate the date on which they originated. Should help prevent excessive slippage.

Steve

======================================
Minutes from 13 Nov 2008 DAS teleconference

Teleconference Info:
See http://www.biodas.org/wiki/BioDAS:Community_Portal#Teleconference

Attendees:
Free agent: Gregg Helt
Affymetrix: Steve Chervitz
EBI: Andy Jenkinson
Sanger: Jonathan Warren
LBNL (Suzi's Lab): Ed Lee, Leo(?), Nomi Harris

Note taker: Steve Chervitz

Action items are flagged with '[A-YYMMDD]' indicating the date they originated. New items arising in the discussion are flagged with '[A-new]'. All pending action items are summarized at the bottom of the minutes.

The teleconference schedule and links to past minutes are available from the Community Portal section of the biodas.org site:
http://www.biodas.org/wiki/BioDAS:Community_Portal#Teleconference

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are not always achievable. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit on the discussion list.
========================================================

Agenda:
========
* Matters arising
* Review progress on action items from last week (based on minutes)
* Discuss possible modifications to the DAS1+2 sources doc
* Writeback issues

Matters arising
=====================

GBOL Discussion [see: http://gmod.org/wiki/Gbol ]

EL: Gbol is still in dev. Requires a chado backend, flybase db.
GH: UML diags for gbol?
EL: No. For simple obj layer is a direct mimic of chado data model. Some menu stuff for convenience. For the biological layer haven't created any diags yet. Based on a subset of SO.
GH: For simple obj layer I should refer to the chado object diag?
EL: Each table is an obj, with some convenience things.
EL: A data model, lightweight, versatile. Chado is complex, not very user friendly. Gbol layer is geared toward biologist. A gene object, not worrying about underlying structure. Plug and play architecture. Set up factory to take care of I/O. Easy to add new data sources. Read from chado and write to GFF3, e.g., or from two diff data sources. One test: Everyone has diff implementation of chado. Gbol will do on the fly translation, based on controlled vocabularies.
GH: Primary use case is Apollo.
EL: Planned for Jbrowse (Ajax-based Gbrowse in Ian's lab). All done on the server (I/O), not client using web services. Java based.
GH: Looking at data model, not hard to do a DAS1/2 translation. What is current das support?
EL: No good chado-specific das servers. Can use Gbrowse as an intermediary. Old gbrowse is being deprecated. Gbol will act as a JSON provider for Gbrowse, but no reason it could not act as a DAS server too.

Discussion of action items from 30 Oct 2008 teleconf:
=====================================================

[A-081030] All: Review Gregg's DAS UML modeling, post any comments to list.
AJ: Looked at it.
GH: Let me know if you see anything problematic. It's a pretty realistic representation.
AJ: Regarding methods: In das/1 should a method be part of a type or an entity in itself?
GH: Das/2 combines method and type into the type. There's an optional method in type. Types use ontological terms (not reps thereof). Das/2 type 'transcript' is not a 1:1 mapping to the SO term (e.g., method=Genscan, type=transcript). So you may have more types than SO terms.
AJ: Can do it another way. DAS/1 id for a type is the ontology ID. Important for translation issues.
GH: Haven't done much. Where do you see it changing?
AJ: An optional element with a required attrib. You might have an ontology to describe method. Might want to say something is result from a type of experiment, type of algorithm, type of sample. May not want to shoehorn them into the type. Das has moved away from complex query capabilities. Servers don't impl types. People tend to make a separate das source for each data type. Moving away from queryability and towards simplicity.
GH: Complexity is then pushed into understanding different sources. Good to

[A-new] Gregg: Work on translation of method and type in Trellis Ivy proxy.

[A-081030] All: Review Gregg's DAS1->DAS2 proxy work (Trellis/Ivy/Vine), post any comments to list.
[A-081030] AJ: Continue checking out Gregg's DAS1->DAS2 proxy, esp. the XML.
GH: Any feedback?
JW: Had a look. Interested in locations.
GH: Translating das1 feats starts/stops into location. Also translating target starts/stop and group.
AJ: Seemed to work quite well. Problem comes when people abuse the spec a bit.
GH: If no start/stop, = locationless feature. Phase and score are additional complexity. If they are non-numbers it filters them out now. If numbers, uses das1 score element.
AJ: What non-numbers in score?
GH: Dash is allowed = no score available. Sometimes '*' or '.'
AJ: DAS spec sometimes uses '.' or '0', for strand.

[A-081030] AJ: Post info about March '09 Hinxton DAS workshop to biodas.org/current_events
JW: Done. Got a lot of registrations already.
Aiming for 30 for accommodations, 50 total (including campus folks). Will hit these numbers easily.
GH: Hoping to attend.
AJ/JW: May have trouble accommodating everyone who wants to talk.

[A-new] Gregg: register for '09 Hinxton DAS workshop soon!

[A-081030] GH: Send out action and agenda items well in advance of teleconf.
Done.

[A-081030] GH: Add auth and security on the agenda so interested folks can call in.
[A-081030] GH: Solicit feedback about security/auth from interested parties.
GH: Not added to agenda this week.

[A-081030] GH: Contribute to the DAS changes document re: DAS/2, sources & deprecating DSN.
GH: Still pending. Hopefully next week.

[A-081030] GH: Get new teleconf number from Suzi; post to list with agenda.
GH: We are going to use Suzi's number going forward.
SC: I put this on the biodas.org wiki. Can also post the date of upcoming teleconfs.

[A-081030] JN: Post preliminary java web start IGB release on bioviz.org
GH: Not on this call today. Next time.

[A-081030] SC: Merge DAS2 subscribers to DAS list. Redirect DAS2 posts to DAS list.
[A-081030] SC: Consider making DAS list auto reject posts from non-subscribers.
[A-081030] SC: Add Andy J and Jonathan W as admins to the DAS mailing list.
SC: All pending, though I did update the section of the biodas.org wiki to indicate that the das2 list is being retired and all traffic should be sent to the das list.

[A-081030] SC: Change 2 -> 2.1 and say it is "evolving"; declare the HTML spec as "frozen"
[A-081030] SC: Send link to the 2.1 wiki spec to list.
SC: Done.
GH: Need to do the same for the 1.5 vs 1.6 spec.

[A-new] SC: Add AJ and JW as biodas.org sysops (can't edit side bar)
[A-new] AJ/JW: Put link to 1.6 evolving version of the 1.5 das spec on biodas.org sidebar.

[A-081030] SC: Fix Affy IGB launching links on SF page.
SC: Have not done. Noticed today that they appear to be fixed (probably by Ann's group -- thanks!)

[A-081030] SC: Update biodas.org community portal page with new teleconf number.
SC: Done

[A-081016] SL: Summarize authentication pros and cons. Review descriptions, make a decision.
EL: Was there a write up of this?
GH: People posted comments to the list: David Nix, Andy, Steven Blanchard. Suzi is supposed to summarize.

[A-081016] SL: Decide Ian or Suzi is PI on grant. Issue reciprocal letters of collab.
GH: Suzi's grant action item: (Feb 2009) Feedback from funding people is that they're interested in DAS part of it (distributed annotation). Suzi will have some feedback after conf call on 11/14.

Topic: Writeback
==================

GH: Given LBL folks are here. How does it work in Apollo, retrieve and edit curations?
EL: Rudimentary via das. Supports a number of data sources, load into Apollo data model, modify, translate. Data sources: Chado, chado-xml, gff3, genbank records, some others.
GH: Thinking about for das/2 writeback: ID assignment and batch operation. How do you do that?
EL: Id assignment is a chado (db) issue. Configurable by user (in following format), vs database ID. In the db, at time of writeback writing to chado instance, gets next available ID (pk), meaningless to user, just db internal.
GH: DAS/2 writeback spec, if it's new curation, client assigns temp id, post of xml for that feature to server, server responds back with same xml but with temp id in 'old-uri' and new id in the 'uri' field.
EL: Similar idea. When working with db, will generate temp id, and modify it.
GH: Related to that: changing one feature can have side effects on other features. Change one exon's boundaries, changes phase of other exons downstream.
EL: Done via client side through Apollo. Didn't like having server do it, since it ties to a particular db; relying on stored procs ties you into a specific DBMS. Decided to do it on client-side. When time to write to db, client queries db to determine available id space.
GH: Queries db before it creates a feature? JDBC?
EL: Yes and yes. Type 3 drivers.
GH: Changing in light of Gbol?
EL: Planning major rewrite of Apollo. Gbol will be able to handle it. Apollo won't care about I/O. That's all through Gbol. For das/1->2 translation, should be efficient with our framework. Conversion between different data sources via the data model should be easy.
GH: Regarding batch operations: easy via JDBC? Integrity across several operations.
EL: Many DBMS don't work well across lots of transactions. Run out of log space. Forces you to do lots of mini-transactions, with transaction management. Can't do massive update of whole genome. We can do per-CDS/protein/gene type edits as atomic operations.
GH: Transactional integrity in DAS/2: a single http call is the atomic unit. Any changes specified there are to be an atomic operation.
EL: Will be an issue with large writeback.
GH: Our model is a single human curator editing one gene at a time. Not via a major automated pipeline script. Not sure what happens in http when sending large amounts of data back and forth.
EL: Problem with timeouts while client is waiting for response.
GH: Have considered an arrangement where client receives 'accepted' (HTTP 202) and then a redirect to another source to receive the writeback, or check status. Not in the spec now.
AJ: Has been mentioned before, "come back later" not just for writeback. Not doing anything about it yet. Not hard to add something like this, since most libraries support redirection. Just check the header.
GH: Only sending data for features that change, not everything (delta).
EL: ...
GH: Some of this will take trials. Getting to work with single user.
AJ: Keep it simple, add it as needed.
GH: Write back spec discussion on the mailing list (Gustavo). Can be generalized. Very few things in there now. Think we can have the thing that gets posted be the feature XML (DAS/1 or DAS/2). Can strip out, simplify it. RESTful. Have a link for this on wiki. Not yet populated.

[A-new]: Gregg write up new writeback proposal on wiki.
[A-new]: Steve - wikify the das/2 writeback here first.

AJ: Focused around proteins. Just get it working with Dasty (which uses OpenID). Better for him to post them as DAS/1 style features.
GH: Like it because: more restful, and not just for features (applies to seqs, types, alignments, etc.)
AJ: Use diff http commands to do different things. Post, put, get.
GH: Problem for post, put, delete: you might want to do all of those in one operation. In the general case. Something that Google data folks are writing over posts, but are effectively doing puts and deletes too.
AJ: Simplicity is the way to go.
GH: Reduces the number of elements.

Pending Action Items:
========================
[A-081016] SL: Decide Ian or Suzi is PI on grant. Issue reciprocal letters of collab.
[A-081016] SL: Summarize authentication pros and cons. Review descriptions, make a decision.
[A-081030] All: Review Gregg's DAS UML modeling, post any comments to list.
[A-081030] GH: Solicit feedback about security/auth from interested parties. Add to agenda.
[A-081030] GH: Contribute to the DAS changes document re: DAS/2, sources & deprecating DSN.
[A-081030] JN: Post preliminary java web start IGB release on bioviz.org
[A-081030] SC: Merge DAS2 subscribers to DAS list. Redirect DAS2 posts to DAS list.
[A-081030] SC: Consider making DAS list auto reject posts from non-subscribers.
[A-081030] SC: Add Andy J and Jonathan W as admins to the DAS mailing list.
[A-081113] AJ/JW: Put link to 1.6 evolving version of the 1.5 das spec on biodas.org sidebar.
[A-081113] GH: Work on translation of method and type in Trellis Ivy proxy.
[A-081113] GH: register for '09 Hinxton DAS workshop soon!
[A-081113] GH: Write up writeback proposal ideas in the DAS/2.1 wiki.
[A-081113] SC: Add AJ and JW as biodas.org sysops (so they can edit side bar)
[A-081113] SC: Wikify the das/2.0 writeback HTML document in das/2.1 wiki.
[A-081113] All: Next teleconf in three weeks: 04-Dec-08
[A-081113] All: Anyone that has items they want discussed, send to Gregg.

=======================================
CVS Repository version:
$Id: das2-teleconference-2008-11-13.txt,v 1.3 2008/11/14 01:14:55 sac Exp $

From andy.jenkinson at ebi.ac.uk Tue Nov 4 12:05:01 2008
From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson)
Date: Tue, 04 Nov 2008 12:05:01 +0000
Subject: [DAS2] [DAS] [Fwd: Re: Writeback implementation]
In-Reply-To: <491022D6.8070204@nbn.ac.za>
References: <49087FF9.5080704@ebi.ac.uk> <4908837C.1000902@nbn.ac.za> <490891A4.8050908@ebi.ac.uk> <4909821B.6050300@nbn.ac.za> <50158cb00810310212i3a44b91ev30244955818796db@mail.gmail.com> <490B0CE0.5070708@nbn.ac.za> <490B1490.6090900@gmail.com> <50158cb00810311034y62ecbe24yc13f77d04fd01e6d@mail.gmail.com> <490C7638.70101@ebi.ac.uk> <63877.81.26.234.157.1225669922.squirrel@webmail.chalmers.se> <491022D6.8070204@nbn.ac.za>
Message-ID: <49103A6D.3060201@ebi.ac.uk>

Gustavo Salazar wrote: > xml:base="http://das.sanger.ac.uk:80/das/interpro/features"> > phosphoinositide 3-kinase in leukocytes > > type="inferred from sequence similarity (ECO:0000044)" > > > > > > > > > > > Then for this example the id for the feature will be the uri > http://das.sanger.ac.uk:80/das/interpro/features/G3DSA:1.10.1070.11

When updating a feature the URI field can be whatever would go into the ID field, because both the server and the client need them to be the same. The ID/URI can be assumed to be relative to the base of the document in the same way as HTML relative links are processed. However, your example is a URI in its own right because it contains a colon but no /.
Some examples:

URI - foo:something
base - http://das.sanger.ac.uk/das/interpro/features
final URI - foo:something

URI - something
base - http://das.sanger.ac.uk/das/interpro/features
final URI - http://das.sanger.ac.uk/das/interpro/something

URI - ./foo:something
base - http://das.sanger.ac.uk/das/interpro/features
final URI - http://das.sanger.ac.uk/das/interpro/foo:something

See page 26 of RFC3986, http://www.apps.ietf.org/rfc/rfc3986.html

So long as the absolute URI is derived in the same way for both servers you should be fine using whatever the "base server" uses.

> I'm not sure if this is the right way to submit the types... I'm parsing
> the types element that is optional in the writeback document, however as
> I understood that part of the document is used to add new types, which I
> think is out of the scope of my project. Other issue that i have is
> where should I put the information of the method maybe another PROP tag
> like ??

It looks like only some of the fields in DAS are supported here, some of those missing are constrained by the ontology (in parentheses):

type ID (e.g. SO:001234)
type label (e.g. exon)
type category (e.g. inferred from sequence similarity...)
method ID
method label

Really any of these can be changed so they should be represented. In the interests of keeping your project manageable we can limit the implementation to not adding new types (DAS and DAS/2 have different ways of interpreting them anyway). But I can especially see a case for changing the category (evidence). Since DAS/2 does not have an equivalent for this, you could put it in a PROP element.
From gregghelt at gmail.com Tue Nov 4 22:58:05 2008
From: gregghelt at gmail.com (Gregg Helt)
Date: Tue, 4 Nov 2008 14:58:05 -0800
Subject: [DAS2] [DAS] [Fwd: Re: Writeback implementation]
In-Reply-To: <491085EC.2090003@nbn.ac.za>
References: <49087FF9.5080704@ebi.ac.uk> <50158cb00810310212i3a44b91ev30244955818796db@mail.gmail.com> <490B0CE0.5070708@nbn.ac.za> <490B1490.6090900@gmail.com> <50158cb00810311034y62ecbe24yc13f77d04fd01e6d@mail.gmail.com> <490C7638.70101@ebi.ac.uk> <63877.81.26.234.157.1225669922.squirrel@webmail.chalmers.se> <491022D6.8070204@nbn.ac.za> <49103A6D.3060201@ebi.ac.uk> <491085EC.2090003@nbn.ac.za>
Message-ID: <50158cb00811041458t5e00c5d2l4022039774db2ecb@mail.gmail.com>

On Tue, Nov 4, 2008 at 2:24 AM, Gustavo Salazar wrote:
>
> About the URI, can this URI be built using the uri in the base of the
> document + the uri atributte in the feature??

On Tue, Nov 4, 2008 at 4:05 AM, Andy Jenkinson wrote:
> When updating a feature the URI field can be whatever would go into the ID
> field, because both the server and the client need them to be the same. The
> ID/URI can be assumed to be relative to the base of the document in the same
> way as HTML relative links are processed. However, your example is a URI in
> its own right because it contains a colon but no /.
> ...
> So long as the absolute URI is derived in the same way for both servers you
> should be fine using whatever the "base server" uses.
>

Please note that in DAS2.0 the way to resolve relative URI references to absolute URIs does not have to be "assumed". As I mentioned before in this thread:

> DAS/2.0 uses the XML Base specification to
> resolve relative URI references via xml:base attributes and/or the URI the
> document is a representation of.
>

Most software libraries that deal with URIs have a method to resolve a relative URI reference against a base URI to yield an absolute URI, so that people don't have to hand-code the URI syntax and resolution rules.
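As one illustration of such a library method, here is a minimal sketch using `urljoin` from Python's standard `urllib.parse` module. Python is only an assumption for the example (the thread does not say which language any client or server uses), and `resolve` is a hypothetical wrapper. The base URI and the three references are the ones from Andy's examples earlier in the thread.

```python
from urllib.parse import urljoin

# Base URI taken from the examples earlier in the thread.
BASE = "http://das.sanger.ac.uk/das/interpro/features"


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve a (possibly relative) URI reference against a base URI."""
    return urljoin(base, reference)


# The three references discussed in the thread.
references = ["foo:something", "something", "./foo:something"]
resolved = {ref: resolve(ref) for ref in references}

for ref, uri in resolved.items():
    print(f"{ref} -> {uri}")
```

With this function, "foo:something" comes back unchanged because it parses as a URI with its own scheme, while "something" and "./foo:something" are resolved against the base path. Whether a given server treats a feature ID such as G3DSA:1.10.1070.11 the same way depends on the URI library it uses, so client and server should agree on one resolution routine.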
Just to check, by "base of the document" do you mean the value of the "xml:base" attribute in the root XML element, or the URL used to retrieve the document? According to the XML Base spec these are both incorporated in a hierarchy of URI reference resolution, so it's possible to have a doc with no "xml:base" attributes and still follow the XML Base spec. However I strongly recommend using an "xml:base" with an absolute URI value on the root XML element of a doc to be more assertive about URI reference resolution. This is particularly important for DAS writeback, since at least so far the URI to POST modifications to has been different than the URIs to GET features/types/segments from. Gregg From gregghelt at gmail.com Tue Nov 4 23:02:41 2008 From: gregghelt at gmail.com (Gregg Helt) Date: Tue, 4 Nov 2008 15:02:41 -0800 Subject: [DAS2] Fwd: [DAS] [Fwd: Re: Writeback implementation] In-Reply-To: <491022D6.8070204@nbn.ac.za> References: <49087FF9.5080704@ebi.ac.uk> <490891A4.8050908@ebi.ac.uk> <4909821B.6050300@nbn.ac.za> <50158cb00810310212i3a44b91ev30244955818796db@mail.gmail.com> <490B0CE0.5070708@nbn.ac.za> <490B1490.6090900@gmail.com> <50158cb00810311034y62ecbe24yc13f77d04fd01e6d@mail.gmail.com> <490C7638.70101@ebi.ac.uk> <63877.81.26.234.157.1225669922.squirrel@webmail.chalmers.se> <491022D6.8070204@nbn.ac.za> Message-ID: <50158cb00811041502g3b0f7fd1xf1324240cb812474@mail.gmail.com> Forwarding more messages that didn't get cross-posted to das2 list: ---------- Forwarded message ---------- From: Gustavo Salazar Date: Tue, Nov 4, 2008 at 2:24 AM Subject: Re: [DAS] [DAS2] [Fwd: Re: Writeback implementation] To: Asia Grzibovska Cc: das at lists.open-bio.org Hello, Thanks Asia for clarify some of our doubts of your implementation. 
I agree with the idea of answer with a confirmation or a detailed error, specially if the writeback is going to process atomic requests, because if the "here's what I want to do" is just one task, the answer just could be or DONE or a detailed error (even if the error is about dependencies with other features). About the URI, can this URI be built using the uri in the base of the document + the uri atributte in the feature?? So far I have implemented a parser for a document that follows the DAS/2 schema and uses the properties defined by Asia for the features, an example to add or update(in the DAS/2 schema there is not difference between those commands): phosphoinositide 3-kinase in leukocytes Then for this example the id for the feature will be the uri http://das.sanger.ac.uk:80/das/interpro/features/G3DSA:1.10.1070.11 I'm not sure if this is the right way to submit the types... I'm parsing the types element that is optional in the writeback document, however as I understood that part of the document is used to add new types, which I think is out of the scope of my project. Other issue that i have is where should I put the information of the method maybe another PROP tag like ?? If anybody have any comments about what else should I include in this document please let me know. Cheers, Gustavo. Asia Grzibovska wrote: > Hello, > I have noticed a writeback thread in the mailing list and read it with a > big interest. I agree with most of the ideas expressed by participants. > The writeback specification was quite uncertain at the time when I was > implementing it, and the aim of my project was rather to prove the > concept. In principle it was proved, and writeback could be implemented > in the real-life. However, the writeback specification needs to be more > concrete. > > 1)about response to a writeback command. The response was a simple > confirmation, it did not contain features, because writeback was not so > complex. 
It saved exactly as "here's what I want to do". If the changes > were successfully saved into the database, a simple confirmation was > enough. Otherwise nothing was saved and a full error description was sent > back. The source code can be found here > > http://code.google.com/p/daswriteback/source/browse/trunk/servlet/src/uk/ac/sanger/DatabaseUtilities.java > > 2)about URI. It is correct that "every feature in DAS/2.0 has a unique > URI". For simplicity I did not include it into the writeback document, but > it would be good to have it in the real implementation. > > 3)meta-annotations could simplify many things and add more functionality > > >> >>> >>> >>> >>> >>> >> > Best regards, > Asia > > > >> Gregg Helt wrote: >> >> >>> On Fri, Oct 31, 2008 at 7:22 AM, Andy Jenkinson >>> wrote: >>> >>> >>> >>>> I can't find a description of the response to a writeback command in >>>> Asia's >>>> thesis. Does it contain features (as in DAS2) or just a confirmation? >>>> >>>> >>> Take a look at the writeback spec ( >>> http://biodas.org/documents/das2/das2_writeback.html ), it's much >>> shorter >>> than the retrieval spec, just a few pages. >>> >>> The general idea is that a server may not be able to do all the >>> creations/edits/deletes a client is requesting in exactly the same form >>> the >>> client has specified, and furthermore that changes a client requests in >>> one >>> feature can possibly trigger changes in other features. Therefore the >>> semantics of the client request are "here's what I want to do" and the >>> writeback server responds with "here's what I actually did". In the >>> DAS2 >>> writeback spec these are communicated mostly by passing back and forth >>> feature XML, except for deletion getting it's own special bit of XML. >>> >>> >> I looked at the DAS2 spec, but I was wondering specifically about Asia's >> implementation - whether it did the same or returned either a simple >> confirmation or a DAS 1.53 features response. 
>> _______________________________________________ >> DAS2 mailing list >> DAS2 at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/das2 >> >> >> > > > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das > > _______________________________________________ DAS mailing list DAS at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/das From gregghelt at gmail.com Tue Nov 4 23:04:23 2008 From: gregghelt at gmail.com (Gregg Helt) Date: Tue, 4 Nov 2008 15:04:23 -0800 Subject: [DAS2] Fwd: [DAS] [Fwd: Re: Writeback implementation] In-Reply-To: <491085EC.2090003@nbn.ac.za> References: <49087FF9.5080704@ebi.ac.uk> <50158cb00810310212i3a44b91ev30244955818796db@mail.gmail.com> <490B0CE0.5070708@nbn.ac.za> <490B1490.6090900@gmail.com> <50158cb00810311034y62ecbe24yc13f77d04fd01e6d@mail.gmail.com> <490C7638.70101@ebi.ac.uk> <63877.81.26.234.157.1225669922.squirrel@webmail.chalmers.se> <491022D6.8070204@nbn.ac.za> <49103A6D.3060201@ebi.ac.uk> <491085EC.2090003@nbn.ac.za> Message-ID: <50158cb00811041504v5795198fwab29092aa8d73513@mail.gmail.com> Forwarding more messages that didn't get cross-posted to das2 list: ---------- Forwarded message ---------- From: Gustavo Salazar Date: Tue, Nov 4, 2008 at 9:27 AM Subject: Re: [DAS] [DAS2] [Fwd: Re: Writeback implementation] To: Andy Jenkinson Cc: das at lists.open-bio.org Hello, Thanks Andy for the examples about the use of the URI, now is much clear for me. I haven't implemented the small code to built the URI from the URI and base parameters but I will soon. 
Following the idea of use the PROP tag to put the attributes missed for DAS/2 that are required in DAS/1.53 I have modify my XML example as follows: phosphoinositide 3-kinase in leukocytes As you can notice i've also add the properties start and stop for the position of the feature since i noticed that the range in LOC is for the location of the segment and a segment can have several features in different positions. In my first tests with a document like this I can built the next answer with MyDas that looks correct for DAS/1.53 polypeptide_structural_domain GENE3D 830 1031 0.0 0 - Cheers, Gustavo. _______________________________________________ DAS mailing list DAS at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/das From andy.jenkinson at ebi.ac.uk Wed Nov 5 23:15:37 2008 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Wed, 05 Nov 2008 23:15:37 +0000 Subject: [DAS2] [DAS] [Fwd: Re: Writeback implementation] In-Reply-To: <491085EC.2090003@nbn.ac.za> References: <49087FF9.5080704@ebi.ac.uk> <4908837C.1000902@nbn.ac.za> <490891A4.8050908@ebi.ac.uk> <4909821B.6050300@nbn.ac.za> <50158cb00810310212i3a44b91ev30244955818796db@mail.gmail.com> <490B0CE0.5070708@nbn.ac.za> <490B1490.6090900@gmail.com> <50158cb00810311034y62ecbe24yc13f77d04fd01e6d@mail.gmail.com> <490C7638.70101@ebi.ac.uk> <63877.81.26.234.157.1225669922.squirrel@webmail.chalmers.se> <491022D6.8070204@nbn.ac.za> <49103A6D.3060201@ebi.ac.uk> <491085EC.2090003@nbn.ac.za> Message-ID: <49122919.6040907@ebi.ac.uk> As I understand, the DAS/2 "type" attribute is a URI, to match the "types" schema. This is really the equivalent of the type ID. Also, elements cannot have arbitrary attributes, instead they need two attributes, one called "key" and the other "value": so it would look more like: With regards to the element, this refers to the location of the feature within a segment. In DAS/2 there can be more than one (i.e. 
the same feature in multiple locations), but for DAS just send one LOC element. The range attribute should be the start/end/strand positions of the _feature_, so you do not need to send separate start and end properties: However I would be careful with the numbers as I believe DAS/2 uses zero-indexed start positions. Gustavo Salazar wrote: > > Hello, > > Thanks Andy for the examples about the use of the URI, now is much clear > for me. I haven't implemented the small code to built the URI from the > URI and base parameters but I will soon. > Following the idea of use the PROP tag to put the attributes missed for > DAS/2 that are required in DAS/1.53 I have modify my XML example as > follows: > > xml:base="http://das.sanger.ac.uk:80/das/interpro/features"> > phosphoinositide 3-kinase in leukocytes > > type="inferred from sequence similarity (ECO:0000044)" > > > > > > > > > > > > > > > > > > As you can notice i've also add the properties start and stop for the > position of the feature since i noticed that the range in LOC is for the > location of the segment and a segment can have several features in > different positions. > In my first tests with a document like this I can built the next answer > with MyDas that looks correct for DAS/1.53 > > > href="http://localhost:8080/MyDas/das/writeback/features?segment=O00329"> > label="O00329"> > > polypeptide_structural_domain > GENE3D > 830 > 1031 > 0.0 > 0 > - > > > > > > Cheers, > > Gustavo. 
From gregghelt at gmail.com Thu Nov 6 15:30:51 2008 From: gregghelt at gmail.com (Gregg Helt) Date: Thu, 6 Nov 2008 07:30:51 -0800 Subject: [DAS2] Links to old DAS/2 grant and final progress report In-Reply-To: <5E69F583-38C2-45DD-9F0D-571C35E0FA27@pantherinformatics.com> References: <50158cb00810291414u52198f35kd23774b8e538ba87@mail.gmail.com> <5E69F583-38C2-45DD-9F0D-571C35E0FA27@pantherinformatics.com> Message-ID: <50158cb00811060730r11dc4c29od5669fe519ea5fa3@mail.gmail.com> On Wed, Oct 29, 2008 at 1:41 PM, Brian Gilman < gilmanb at pantherinformatics.com> wrote: > Hey Gregg, > > I was wondering why DAS/DAS2 couldn't be represented using different > wire protocols? For instance, why not das in json and das in xml etc. etc? The DAS2.0 spec does allow server responses in alternative content formats. Every capability in the "sources" doc can specify alternative content formats via the element, and also in the "types" doc alternative formats for annotations can be listed on a per-type basis via elements. Retrieving annotations in these alternative formats is done by adding a "format=XYZ" to the retrieval URI's query parameters. We considered using standard HTTP content negotiation but felt that it would be better to have the format embedded in the retrieval URIs rather than in the HTTP headers, and that this approach also allowed more granularity without additional request overhead. This alternative content ability is used heavily by the Integrated Genome Browser (IGB) DAS2.0 client and the Genometry DAS2.0 server to request/deliver very efficient binary formats when possible. If you take a look at IGB's console output when loading annotations from the Affymetrix Genometry server, you'll see that for many annotation types IGB sends feature requests for custom binary formats like "bgn", "bps", or "brs". In more recent work I'm starting to use Google protocol buffers for alternative content formats, and fiddling a little with json too. 
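As a rough illustration of the "format=XYZ" query parameter described above, the following sketch adds a format to a retrieval URI with Python's standard `urllib.parse`. The server URI is hypothetical and the helper name `with_format` is made up for this example; "bps" is one of the format names mentioned above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_format(uri, fmt):
    """Return the retrieval URI with its 'format' query parameter set to fmt."""
    parts = urlsplit(uri)
    # Keep every existing parameter except a previous 'format', then append ours.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "format"]
    query.append(("format", fmt))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical feature query; only the 'format' handling is the point here.
plain = "http://example.org/das2/genome/features?segment=chr1"
binary = with_format(plain, "bps")
print(binary)
```

Because the format lives in the URI, the same request can be bookmarked, cached, or logged without inspecting HTTP headers.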
One of the nice things about the Trellis DAS2 server framework I've been developing is that the framework handles conversion of data model to data format, at least when it understands the requested format. So I hope to generalize the conversion and add support for various formats into Trellis, so they don't have to be reimplemented for every backend data source. Regarding the discussion of SOAP and REST -- I agree that trying to compare the two directly confuses the issue. SOAP is a specific protocol whereas REST is a set of principles for software design. I try nowadays to compare specific "RESTful" protocols to SOAP when needed. However the grant proposal was written for people who don't generally read web tech specifications, back then whatever else SOAP was it definitely wasn't RESTful, and there was no other handy term for an alternative to SOAP, so I glossed over the category differences when comparing SOAP and REST. Gregg Also, SOAP is a wire protocol while REST may not be... I think this confuses > the issue? > > -B > -- > Brian Gilman > President Panther Informatics Inc. > E-Mail: gilmanb at pantherinformatics.com > gilmanb at jforge.net > AIM: gilmanb1 > > > > > > On Oct 29, 2008, at 5:14 PM, Gregg Helt wrote: > > I've been reminded recently that when I say stuff like "well in the DAS/2 >> grant we proposed XYZ" few people can actually look at the old grant and >> see >> what I'm talking about. So I've posted a copy in a more permanent >> location: >> http://biodas.s3.amazonaws.com/das2grant/DAS2+Grant+Proposal+Feb2003.doc >> . >> I've also posted a copy of the final grant progress report: >> >> http://biodas.s3.amazonaws.com/das2grant/DAS2+Grant+Final+Progress+Report+Aug2008.doc >> . >> If you do take a look at the grant, keep in mind it was written over >> five >> years ago. 
Back then for example REST was still a relatively new concept, >> and SOAP hype was peaking, so there's a fair amount of text dedicated to >> explaining the RESTful approach we wanted to take and why we weren't just >> using SOAP. Since then some of the specifics have definitely changed, but >> I >> think the general concepts hold up pretty well. For instance the current >> thread about meta-annotation had me looking back at the section on >> "Feature >> References" and finding it's still relevant. >> >> I've also added links on the BioDAS wiki DAS/2 pages to the grant and >> progress report. >> >> Gregg >> _______________________________________________ >> DAS2 mailing list >> DAS2 at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/das2 >> >> > From jw12 at sanger.ac.uk Wed Nov 12 10:41:59 2008 From: jw12 at sanger.ac.uk (Jonathan Warren) Date: Wed, 12 Nov 2008 10:41:59 +0000 Subject: [DAS2] DAS workshop registration 2009 Message-ID: <1226486519.4881.33.camel@deskpro20727.dynamic.sanger.ac.uk> Registration is open for the 2009 DAS workshop (8,9,10th March) at the Genome Campus, Hinxton UK. If you are interested in attending, please find out more by going to http://www.dasregistry.org/course.jsp and register via the web link at the bottom of the page. Closing date for registration is 1st Feb 2009. If you register now you can change the details of your registration any time up until this closing date. Please register early as places will be limited. Also if you would be interested in presenting your work please write to Jonathan Warren at jw12 at sanger.ac.uk with a synopsis and a title. -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
From gregghelt at gmail.com Wed Nov 12 19:28:15 2008 From: gregghelt at gmail.com (Gregg Helt) Date: Wed, 12 Nov 2008 11:28:15 -0800 Subject: [DAS2] DAS Teleconference Thursday November 13th, 9 AM PST Message-ID: <50158cb00811121128k543321acu6d302bb85e2bffe2@mail.gmail.com> Just a reminder to everyone that there's a DAS teleconference tomorrow, Thursday November 13th, at 9AM PST (4 PM GMT). We're switching back to using Suzi's conference line: US Teleconference #: 866-692-3582 Conference #: 4977624 For UK callins, one of these, not sure which is most appropriate for Hinxton: BIRMINGHAM: 0808-238-6019 GLASGOW: 0808-238-6019 LEEDS: 0808-238-6019 LONDON: 0808-238-6019 MANCHESTER: 0808-238-6019 I'd like to talk about possible modifications to the DAS1+2 sources doc, and writeback. If you would like to add to the agenda, please do. Gregg From Steve_Chervitz at affymetrix.com Thu Nov 13 03:52:42 2008 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Wed, 12 Nov 2008 19:52:42 -0800 Subject: [DAS2] [DAS] DAS Teleconference Thursday November 13th, 9 AM PST In-Reply-To: <50158cb00811121128k543321acu6d302bb85e2bffe2@mail.gmail.com> Message-ID: The teleconf info is now updated on the biodas.org wiki community portal page here: http://www.biodas.org/wiki/BioDAS:Community_Portal#Teleconference Also note: 9am PST is 5pm GMT, not 4. Steve > From: Gregg Helt > Date: Wed, 12 Nov 2008 11:28:15 -0800 > To: , > Subject: [DAS] DAS Teleconference Thursday November 13th, 9 AM PST > > Just a reminder to everyone that there's a DAS teleconference tomorrow, > Thursday November 13th, at 9AM PST (4 PM GMT). 
We're switching back to > using Suzi's conference line: > > US Teleconference #: 866-692-3582 > Conference #: 4977624 > > For UK callins, one of these, not sure which is most appropriate for > Hinxton: > BIRMINGHAM: 0808-238-6019 > GLASGOW: 0808-238-6019 > LEEDS: 0808-238-6019 > LONDON: 0808-238-6019 > MANCHESTER: 0808-238-6019 > > I'd like to talk about possible modifications to the DAS1+2 sources doc, and > writeback. > If you would like to add to the agenda, please do. > > Gregg > _______________________________________________ > DAS mailing list > DAS at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das ------------------------------------------------------------ This transmission is intended for the sole use of the individual and entity to whom it is addressed, and may contain information that is privileged, confidential and exempt from disclosure under applicable law. You are hereby notified that any use, dissemination, distribution or duplication of this transmission by someone other than the intended addressee or its designated agent is strictly prohibited. If you have received this transmission in error, please notify the sender immediately by reply to this transmission and delete it from your computer. From gregghelt at gmail.com Thu Nov 13 07:39:18 2008 From: gregghelt at gmail.com (Gregg Helt) Date: Wed, 12 Nov 2008 23:39:18 -0800 Subject: [DAS2] Coordinates and sequence URIs Message-ID: <50158cb00811122339x6e727e82n3427a4358fdd78db@mail.gmail.com> On Thu, Oct 30, 2008 at 2:01 PM, Garret Wilson wrote: > ... > This brings up a related issue regarding the assembly and sequence URIs at > http://www.biodas.org/wiki/GlobalSeqIDs . Before on this list I've brought > up the issue of whether DAS has authority to maintain identifiers in > namespaces from domains controlled by third parties (i.e. NCBI). This still > worries me. > > How confident can we be that the DAS GlobalSeqIDs are stable and will not > change for a while? 
The GlobalSeqIDs were created because at the time there were no stable URIs for genome assemblies and assembly sequences from authoritative sources like NCBI. As far as I know that's still the case, though since then there's been some movement towards stable URIs at NCBI (see here) and other authoritative sources.

Also at the time the GlobalSeqIDs were created the DAS registry used IDs for coordinates but not URIs. But now the DAS registry uses the DAS1.53E/2.0 "sources" document, so every COORDINATES entry has a URI. For example: http://www.dasregistry.org/coordsys/CS_DS40 is the registry coordinates URI corresponding to the GlobalSeqID URI http://www.ncbi.nlm.nih.gov/genome/H_sapiens/B36.1/ .

Given that we have a central DAS registry I do think it makes sense that maintaining stable URIs for sequences and assemblies (and other collections of sequences) be handled in the registry -- at least when there's no stable URIs from an authoritative source. I think there are better ways to assign URIs than either the way currently used in the DAS1 registry (very opaque) or the DAS2 GlobalSeqIDs (transparent but encroaching on NCBI namespace), but the more important point is that we should only have one strategy for all versions of DAS. We discussed this back in 2006/2007, and I know Andreas Prlic joined in on several teleconference conversations about merging the DAS2 notion of global seq and assembly IDs into the DAS registry and "sources" doc coordinates elements.

> Secondly, related to URI resolution, I note that I cannot take an assembly
> URI such as http://www.ncbi.nlm.nih.gov/genome/H_sapiens/B36.1/ and simply
> resolve the chromosome ID (e.g. chr1) against it to form the sequence URI.
> My application instead has to have specific knowledge of this particular
> assembly namespace, knowing that it must first append the path segment
> "dna/" to the URI, yielding
> http://www.ncbi.nlm.nih.gov/genome/H_sapiens/B36.1/dna/chr1 .
> > I'd rather my application, once it knew the assembly URI, simply need to > resolve the chromosome ID to the assembly URI to determine the sequence URI, > such as http://www.ncbi.nlm.nih.gov/genome/H_sapiens/B36.1/chr1 . > > Garret This illustrates one weakness of the current DAS sources XML -- given the coordinates URI, there is still no ability to directly determine authoritative/reference sequence URIs for those coordinates. These sequence URIs can't be reliably inferred from the coordinates URIs, and I don't think they should be inferred (or constructed) at all. Attempts to infer sequence URIs currently lead to all sorts of trouble, as I've found in working on the Trellis/Ivy DAS1-->DAS2 proxy. For example the proxy assumes that if versioned source V1 has coordinates C and entry_points capability E1, then E1 describes the segments available for coordinates C. Based on this assumption if versioned source V2 also has coordinates C but doesn't have an entry_points capability then the proxy uses E1 from V1 instead since the versioned sources share the same coordinates. Which sometimes works but not always -- what happens if versioned source V3 also has coordinates C but has an entry_points capability E3 that disagrees with E1? I'm seeing the above situation in the DAS1 registry -- for example, for coordinates .../CS_DS40 (NCBI human genome assembly v.36) which has 44 different versioned sources in the registry. 2 of these versioned sources have entry_points capabilities: A) http://hgwdev-gencode.cse.ucsc.edu/cgi-bin/das/hg18/entry_points B) http://www.snpbox.org/cgi-box/das/SNPbox_human_44_36f/entry_points However, these entry_points queries don't return the same thing. They agree on naming for chromosome IDs, but for non-chromosomal sequences the naming starts varying, for instance "M" vs "MT" for the mitochondrial DNA. More importantly, they disagree on the stop/length value for nearly every chromosome! 
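To make this kind of disagreement concrete, here is a small sketch that compares two entry_points responses. It assumes the DAS1 shape in which each entry point is a SEGMENT element with id and stop attributes; the two documents below are made-up miniature examples, not the actual UCSC or SNPbox output, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def segment_stops(entry_points_xml):
    """Map segment id -> stop for every SEGMENT element in the document."""
    root = ET.fromstring(entry_points_xml)
    return {seg.get("id"): int(seg.get("stop")) for seg in root.iter("SEGMENT")}


def compare_entry_points(a, b):
    """Return (ids only in a, ids only in b, {shared id: (stop in a, stop in b)})."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = {sid: (a[sid], b[sid])
                 for sid in sorted(set(a) & set(b)) if a[sid] != b[sid]}
    return only_a, only_b, differing


# Made-up data: the stop values are placeholders, not real chromosome lengths.
SOURCE_A = """<DASEP><ENTRY_POINTS>
  <SEGMENT id="1" start="1" stop="1000"/>
  <SEGMENT id="2" start="1" stop="900"/>
  <SEGMENT id="M" start="1" stop="16"/>
</ENTRY_POINTS></DASEP>"""

SOURCE_B = """<DASEP><ENTRY_POINTS>
  <SEGMENT id="1" start="1" stop="1001"/>
  <SEGMENT id="2" start="1" stop="900"/>
  <SEGMENT id="MT" start="1" stop="16"/>
</ENTRY_POINTS></DASEP>"""

report = compare_entry_points(segment_stops(SOURCE_A), segment_stops(SOURCE_B))
print(report)
```

A proxy or registry could run a check like this before treating one source's entry points as a stand-in for another source that shares the same coordinates.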
So I think the sequence URIs should be specified -- given the coordinate URIs and capability URIs of a versioned source, there should be a query mechanism to return sequence info for the coordinate URI and this info should include sequence URIs. As illustrated above both the DAS1 entry_points and DAS2 segments queries currently seem too disconnected from the coordinates URIs without some changes to the sources XML. One would be to add to the entry_points and/or segments capabilities of "authoritative" versioned sources a coordinates attribute which would be a relative URI reference to the coordinates for which they are the authoratative list of sequences. This is actually in the RelaxNG schema for DAS2, but currently commented out. Gregg From andy.jenkinson at ebi.ac.uk Thu Nov 13 12:29:58 2008 From: andy.jenkinson at ebi.ac.uk (Andy Jenkinson) Date: Thu, 13 Nov 2008 12:29:58 +0000 Subject: [DAS2] [DAS] Coordinates and sequence URIs In-Reply-To: <50158cb00811122339x6e727e82n3427a4358fdd78db@mail.gmail.com> References: <50158cb00811122339x6e727e82n3427a4358fdd78db@mail.gmail.com> Message-ID: <491C1DC6.90003@ebi.ac.uk> Gregg Helt wrote: > I think there are better ways to assign URIs then > either the way currently used in the DAS1 registry (very opaque) or the DAS2 > GlobalSeqIDs (transparent but encroaching on NCBI namespace), but the more > important point is that we should only have one strategy for all versions of > DAS. Currently DAS1 does not formally include URIs, should it do so we can improve how the registry handles them. > > Attempts to infer sequence URIs currently lead to all sorts of trouble, as > I've found in working on the Trellis/Ivy DAS1-->DAS2 proxy. For example the > proxy assumes that if versioned source V1 has coordinates C and entry_points > capability E1, then E1 describes the segments available for coordinates C. 
> Based on this assumption if versioned source V2 also has coordinates C but > doesn't have an entry_points capability then the proxy uses E1 from V1 > instead since the versioned sources share the same coordinates. Which > sometimes works but not always -- what happens if versioned source V3 also > has coordinates C but has an entry_points capability E3 that disagrees with > E1? > > I'm seeing the above situation in the DAS1 registry -- for example, for > coordinates .../CS_DS40 (NCBI human genome assembly v.36) which has 44 > different versioned sources in the registry. 2 of these versioned sources > have entry_points capabilities: > A) http://hgwdev-gencode.cse.ucsc.edu/cgi-bin/das/hg18/entry_points > B) http://www.snpbox.org/cgi-box/das/SNPbox_human_44_36f/entry_points > However, these entry_points queries don't return the same thing. They agree > on naming for chromosome IDs, but for non-chromosomal sequences the naming > starts varying, for instance "M" vs "MT" for the mitochondrial DNA. More > importantly, they disagree on the stop/length value for nearly every > chromosome! > > So I think the sequence URIs should be specified -- given the coordinate > URIs and capability URIs of a versioned source, there should be a query > mechanism to return sequence info for the coordinate URI and this info > should include sequence URIs. As illustrated above both the DAS1 > entry_points and DAS2 segments queries currently seem too disconnected from > the coordinates URIs without some changes to the sources XML. One would be > to add to the entry_points and/or segments capabilities of "authoritative" > versioned sources a coordinates attribute which would be a relative URI > reference to the coordinates for which they are the authoratative list of > sequences. This is actually in the RelaxNG schema for DAS2, but currently > commented out. Merging sequence info with sequence URIs won't work for UniProt, it's just too big. 
We need to either make one source authoritative for a coordinate system, either in sources or coordsys documents, or have the registry validate coordinate system compliance. I'd suggest the latter because it allows for redundancy. Either way we need to make it a requirement that a coordinate system has at least one server providing segments and sequence, which is not currently the case.

From Steve_Chervitz at affymetrix.com Fri Nov 14 01:16:13 2008
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Thu, 13 Nov 2008 17:16:13 -0800
Subject: [DAS2] Minutes from 13 Nov 2008 DAS teleconference
Message-ID:

Here are my notes from today's teleconf. I changed the syntax for action items to indicate the date on which they originated. Should help prevent excessive slippage.

Steve

======================================
Minutes from 13 Nov 2008 DAS teleconference

Teleconference Info: See http://www.biodas.org/wiki/BioDAS:Community_Portal#Teleconference

Attendees:
Free agent: Gregg Helt
Affymetrix: Steve Chervitz
EBI: Andy Jenkinson
Sanger: Jonathan Warren
LBNL (Suzi's Lab): Ed Lee, Leo(?), Nomi Harris

Note taker: Steve Chervitz

Action items are flagged with '[A-YYMMDD]' indicating the date they originated. New items arising in the discussion are flagged with '[A-new]'. All pending action items are summarized at the bottom of the minutes.

The teleconference schedule and links to past minutes are available from the Community Portal section of the biodas.org site: http://www.biodas.org/wiki/BioDAS:Community_Portal#Teleconference

DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit on the discussion list.
========================================================

Agenda:
========
* Matters arising
* Review progress on action items from last week (based on minutes)
* Discuss possible modifications to the DAS1+2 sources doc
* Writeback issues

Matters arising
=====================

GBOL Discussion [see: http://gmod.org/wiki/Gbol ]

EL: Gbol is still in dev. Requires a chado backend, flybase db.
GH: UML diags for gbol?
EL: No. For simple obj layer is a direct mimic of chado data model. Some menu stuff for convenience. For the biological layer haven't created any diags yet. Based on a subset of SO.
GH: For simple obj layer I should refer to the chado object diag?
EL: Each table is an obj, with some convenience things.
EL: A data model, lightweight, versatile. Chado is complex, not very user friendly. Gbol layer is geared toward biologist. A gene object, not worrying about underlying structure. Plug and play architecture. Set up factory to take care of I/O. Easy to add new data sources. Read from chado and write to GFF3, e.g., or from two diff data sources. One test: Everyone has diff implementation of chado. Gbol will do on the fly translation, based on controlled vocabularies.
GH: Primary use case is Apollo.
EL: Planned for Jbrowse (Ajax-based Gbrowse in Ian's lab). All done on the server (I/O), not client using web services. Java based.
GH: Looking at data model, not hard to do a DAS1/2 translation. What is current das support?
EO: No good chado-specific das servers. Can use Gbrowse as an intermediary. Old gbrowse is being deprecated. Gbol will act as a JSON provider for Gbrowse, but no reason it could not act as a DAS server too.

Discussion of action items from 30 Oct 2008 teleconf:
=====================================================

[A-081030] All: Review Gregg's DAS UML modeling, post any comments to list.
AJ: Looked at it.
GH: Let me know if you see anything problematic. It's a pretty realistic representation.
AJ: Regarding methods: In das/1 should a method be part of a type or an entity in itself.
GH: Das/2 combine method and type into the type. There's an optional method in type. Types use ontological terms (not reps thereof). Das/2 type 'transcript' is not a 1:1 mapping to the SO term (e.g., method=Genscan, type=transcript). So you may have more types than SO terms.
AJ: Can do it another way. DAS/1 id for a type is the ontology ID. Important for translation issues.
GH: Haven't done much. Where do you see it changing?
AJ: An optional element with a required attrib. You might have an ontology to describe method. Might want to say something is result from a type of experiment, type of algorithm, type of sample. May not want to shoehorn them into the type. Das has moved away from complex query capabilities. Servers don't impl types. People tend to make a separate das source for each data type. Moving away from queryability and towards simplicity.
GH: Complexity is then pushed into understanding different sources. Good to

[A-new] Gregg: Work on translation of method and type in Trellis Ivy proxy.

[A-081030] All: Review Gregg's DAS1->DAS2 proxy work (Trellis/Ivy/Vine), post any comments to list.
[A-081030] AJ: Continue checking out Gregg's DAS1->DAS2 proxy, esp. the XML.
GH: Any feedback?
JW: Had a look. Interested in locations.
GH: Translating das1 feats starts/stops into location. Also translating target starts/stop and group.
AJ: Seemed to work quite well. Problem comes when people abuse the spec a bit.
GH: If no start/stop, = locationless feature. Phase and score are additional complexity. If they are non-numbers it filters them out now. If numbers, uses das1 score element.
AJ: What non-numbers in score?
GH: Dash is allowed = no score available. Sometimes '*' or '.'
AJ: DAS spec sometimes uses '.' or '0', for strand

[A-081030] AJ: Post info about March '09 Hinxton DAS workshop to biodas.org/current_events
JW: Done. Got a lot of registrations already.
Aiming for 30 for accommodations, 50 total (including campus folks). Will hit these numbers easily.
GH: Hoping to attend.
AJ/JW: may have trouble accommodating everyone who wants to talk.

[A-new] Gregg: register for '09 Hinxton DAS workshop soon!

[A-081030] GH: Send out action and agenda items well in advance of teleconf.
Done.

[A-081030] GH: Add auth and security on the agenda so interested folks can call in.
[A-081030] GH: Solicit feedback about security/auth from interested parties.
GH: Not added to agenda this week.

[A-081030] GH: Contribute to the DAS changes document re: DAS/2, sources & deprecating DSN.
GH: Still pending. Hopefully next week.

[A-081030] GH: Get new teleconf number from Suzi; post to list with agenda.
GH: We are going to use Suzi's number going forward.
SC: I put this on the biodas.org wiki. Can also post the date of upcoming teleconfs.

[A-081030] JN: Post preliminary java web start IGB release on bioviz.org
GH: Not on this call today. Next time.

[A-081030] SC: Merge DAS2 subscribers to DAS list. Redirect DAS2 posts to DAS list.
[A-081030] SC: Consider making DAS list auto reject posts from non-subscribers.
[A-081030] SC: Add Andy J and Jonathan W as admins to the DAS mailing list.
SC: All pending, though I did update the section of the biodas.org wiki to indicate that the das2 list is being retired and all traffic should be sent to the das list.

[A-081030] SC: Change 2 -> 2.1 and say it is "evolving"; declare the HTML spec as "frozen"
[A-081030] SC: Send link to the 2.1 wiki spec to list.
SC: Done.
GH: Need to do the same for the 1.5 vs 1.6 spec.

[A-new] SC: Add AJ and JW as biodas.org sysops (can't edit side bar)
[A-new] AJ/JW: Put link to 1.6 evolving version of the 1.5 das spec on biodas.org sidebar.

[A-081030] SC: Fix Affy IGB launching links on SF page.
SC: Have not done. Noticed today that they appear to be fixed (probably by Ann's group -- thanks!)

[A-081030] SC: Update biodas.org community portal page with new teleconf number.
SC: Done

[A-081016] SL: Summarize authentication pros and cons. Review descriptions, make a decision.
EL: Was there a write up of this?
GH: People posted comments to the list: David Nix, Andy, Steven Blanchard. Suzi is supposed to summarize.

[A-081016] SL: Decide Ian or Suzi is PI on grant. Issue reciprocal letters of collab.
GH: Suzi's grant action item: (Feb 2009) Feedback from funding people is that they're interested in DAS part of it (distributed annotation). Suzi will have some feedback after conf call on 11/14.

Topic: Writeback
==================
GH: Given LBL folks are here: how does it work in Apollo to retrieve and edit curations?
EL: Rudimentary via DAS. Supports a number of data sources, load into Apollo data model, modify, translate. Data sources: Chado, chado-xml, gff3, genbank records, some others.
GH: Thinking about for DAS/2 writeback: ID assignment and batch operation. How do you do that?
EL: ID assignment is a chado (db) issue. Configurable by user (in following format), vs database ID. In the db, at time of writeback, writing to chado instance, gets next available ID (pk), meaningless to user, just db internal.
GH: DAS/2 writeback spec: if it's a new curation, client assigns temp id, posts XML for that feature to server, server responds back with same XML but with temp id in 'old-uri' and new id in the 'uri' field.
EL: Similar idea. When working with db, will generate temp id, and modify it.
GH: Related to that: changing one feature can have side effects on other features. Change one exon's boundaries, changes phase of other exons downstream.
EL: Done via client side through Apollo. Didn't like having server do it, since it ties to a particular db; relying on stored procs ties you into a specific DBMS. Decided to do it on client-side. When time to write to db, client queries db to determine available id space.
GH: Queries db before it creates a feature? JDBC?
EL: Yes and yes. Type 3 drivers.
GH: Changing in light of Gbol?
EL: Planning major rewrite of Apollo. Gbol will be able to handle it. Apollo won't care about I/O. That's all through Gbol. For DAS/1->2 translation, should be efficient with our framework. Conversion between different data sources via the data model should be easy.
GH: Regarding batch operations: easy via JDBC? Integrity across several operations.
EL: Many DBMS don't work well across lots of transactions. Run out of log space. Forces you to do lots of mini-transactions, with transaction management. Can't do massive update of whole genome. We can do per-CDS/protein/gene type edits as atomic operations.
GH: Transactional integrity in DAS/2: a single HTTP call is the atomic unit. Any changes specified there are to be an atomic operation.
EL: Will be an issue with large writeback.
GH: Our model is a single human curator editing one gene at a time. Not via a major automated pipeline script. Not sure what happens in HTTP when sending large amounts of data back and forth.
EL: Problem with timeouts while client is waiting for response.
GH: Have considered an arrangement where client receives 'accepted' (HTTP 202) and then a redirect to another source to receive the writeback, or check status. Not in the spec now.
AJ: Has been mentioned before, "come back later" not just for writeback. Not doing anything about it yet. Not hard to add something like this, since most libraries support redirection. Just check the header.
GH: Only sending data for features that change, not everything (delta).
EL: ...
GH: Some of this will take trials. Getting to work with single user.
AJ: Keep it simple, add it as needed.
GH: Writeback spec discussion on the mailing list (Gustavo). Can be generalized. Very few things in there now. Think we can have the thing that gets posted be the feature XML (DAS/1 or DAS/2). Can strip out, simplify it. RESTful. Have a link for this on wiki. Not yet populated.

[A-new]: Gregg write up new writeback proposal on wiki.
[A-new]: Steve - wikify the DAS/2 writeback here first.

AJ: Focused around proteins. Just get it working with Dasty (which uses OpenID). Better for him to post them as DAS/1 style features.
GH: Like it because: more RESTful, and not just for features (applies to seqs, types, alignments, etc.)
AJ: Use different HTTP commands to do different things. Post, put, get.
GH: Problem for post, put, delete: you might want to do all of those in one operation, in the general case. Something that Google data folks are writing over posts, but are effectively doing puts and deletes too.
AJ: Simplicity is the way to go.
GH: Reduces the number of elements.

Pending Action Items:
========================
[A-081016] SL: Decide Ian or Suzi is PI on grant. Issue reciprocal letters of collab.
[A-081016] SL: Summarize authentication pros and cons. Review descriptions, make a decision.
[A-081030] All: Review Gregg's DAS UML modeling, post any comments to list.
[A-081030] GH: Solicit feedback about security/auth from interested parties. Add to agenda.
[A-081030] GH: Contribute to the DAS changes document re: DAS/2, sources & deprecating DSN.
[A-081030] JN: Post preliminary java web start IGB release on bioviz.org
[A-081030] SC: Merge DAS2 subscribers to DAS list. Redirect DAS2 posts to DAS list.
[A-081030] SC: Consider making DAS list auto reject posts from non-subscribers.
[A-081030] SC: Add Andy J and Jonathan W as admins to the DAS mailing list.
[A-081113] AJ/JW: Put link to 1.6 evolving version of the 1.5 das spec on biodas.org sidebar.
[A-081113] GH: Work on translation of method and type in Trellis Ivy proxy.
[A-081113] GH: register for '09 Hinxton DAS workshop soon!
[A-081113] GH: Write up writeback proposal ideas in the DAS/2.1 wiki.
[A-081113] SC: Add AJ and JW as biodas.org sysops (so they can edit side bar)
[A-081113] SC: Wikify the das/2.0 writeback HTML document in das/2.1 wiki.
[A-081113] All: Next teleconf in three weeks: 04-Dec-08
[A-081113] All: Anyone that has items they want discussed, send to Gregg.

=======================================
CVS Repository version: $Id: das2-teleconference-2008-11-13.txt,v 1.3 2008/11/14 01:14:55 sac Exp $
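
Illustration for the Writeback topic in the minutes above: GH describes the client assigning a temp id to a new curation, and the server responding with the same feature but with the temp id in 'old-uri' and the new id in 'uri'. A minimal sketch of that exchange follows. It is only an illustration under stated assumptions: features are modeled as plain dicts rather than DAS/2 feature XML, and the function name, the "temp:" prefix, the URI pattern, and the in-memory counter (standing in for a database's next available id) are all hypothetical and not taken from the spec.

```python
from itertools import count


def assign_permanent_uris(features, base_uri, next_id=None):
    """Return copies of the posted features as a server might echo them back.

    Hypothetical sketch: a feature whose 'uri' starts with 'temp:' is treated
    as a new curation. Its temporary id moves to 'old-uri' and a newly
    allocated id goes into 'uri'. Other features are returned unchanged.
    """
    # The counter is an assumption standing in for a database id sequence.
    ids = next_id if next_id is not None else count(1)
    response = []
    for feature in features:
        echoed = dict(feature)  # copy, so the client's request is not mutated
        if feature["uri"].startswith("temp:"):
            echoed["old-uri"] = feature["uri"]
            echoed["uri"] = f"{base_uri}/feature/{next(ids)}"
        response.append(echoed)
    return response


# Example request: one new curation and one existing feature.
posted = [
    {"uri": "temp:1", "type": "exon"},
    {"uri": "http://example.org/das2/feature/7", "type": "exon"},
]
result = assign_permanent_uris(posted, "http://example.org/das2")
```

A client could then walk the response and, wherever 'old-uri' is present, replace its locally held temp id with the returned 'uri'.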