[DAS] maxbins in DAS1.6?

Andy Jenkinson andy.jenkinson at ebi.ac.uk
Wed Sep 16 11:07:55 UTC 2009


But once you're reading the sources document (which is where you would  
be adding the "root URI") you don't need it... knowing the root URI is  
only useful to formulate the query URI (which you already know) and to  
find a place to go to get the metadata (which you already know). The  
sources document you are parsing can be obtained from a registry -  
whether that is the public DAS registry or individual "servers", there  
is no problem for them to support 'single source' access. That is a  
separate issue from having an explicit extra URI in the sources  
document.
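
To illustrate the point (a rough sketch only, not taken from the spec or from any existing client): given a sources document, a client can collect the query URIs per source and, if it really wants a "root", infer one when every query URI has the form root/command. The trimmed XML below is modelled on the registry example quoted further down; the function names are made up for the illustration, and the "strip the last path segment" rule is an assumption that holds only while all commands share one path.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sources document modelled on the registry
# example quoted in this thread (not a complete registry response).
SOURCES_XML = """<SOURCES>
  <SOURCE uri="DS_109" title="uniprot aristotle">
    <VERSION uri="DS_109" created="2005-03-21T16:26:03+0000">
      <CAPABILITY type="das1:features" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/features" />
      <CAPABILITY type="das1:types" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/types" />
      <CAPABILITY type="das1:maxbins" />
    </VERSION>
  </SOURCE>
</SOURCES>"""


def query_uris(source):
    """Map capability type -> query_uri, skipping capabilities without a URI."""
    return {
        cap.get("type"): cap.get("query_uri").strip()
        for cap in source.iter("CAPABILITY")
        if cap.get("query_uri")
    }


def inferred_root(uris):
    """Return the shared prefix if every query URI is <root>/<command>, else None."""
    roots = {uri.rsplit("/", 1)[0] for uri in uris.values()}
    return roots.pop() if len(roots) == 1 else None


def describe(xml_text):
    """Map each SOURCE uri to its inferred root (or None if there is no single root)."""
    doc = ET.fromstring(xml_text)
    return {
        source.get("uri"): inferred_root(query_uris(source))
        for source in doc.iter("SOURCE")
    }


if __name__ == "__main__":
    for uri, root in describe(SOURCES_XML).items():
        print(uri, "->", root)
```

Note that the registry's own identifier (DS_109) and the inferred root are different things, and that the inference returns nothing as soon as commands are served from different locations, which is the case the query URI system is meant to allow.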

I don't think I'm explaining it very well :/ And to be honest it's  
academic for the moment since, as you rightly say, we are only  
'consolidating' in 1.6.

On 16 Sep 2009, at 11:46, Jonathan Warren wrote:

> which takes us back to my very first point that it would need to be
> a command url in itself and specified; otherwise how do you get the
> info for a single source?
>
> On 16 Sep 2009, at 11:28, Andy Jenkinson wrote:
>
>> Taking aside the issue surrounding the paradigm I mentioned and  
>> Thomas expanded on, why do you actually need to have a URL for the  
>> "server" itself? Given you already have all the metadata and  
>> command URLs you can't learn anything more from it.
>>
>> On 16 Sep 2009, at 10:28, Jonathan Warren wrote:
>>
>>> I think Thomas is right in that we can't change the das1 base url  
>>> principle at least for 1.6 anyway, as it is supposed to be a  
>>> consolidation.
>>>
>>> As there have been no objections to using for example http://www.ebi.ac.uk/das-srv/genomicdas/das/eqtl_rat_cis_fat 
>>>  as a single source request we can put that into 1.6. The only  
>>> real change would need to be in the registry. See explanation  
>>> below. But we can get around that.
>>>
>>>> What I meant was that the root URI isn't actually used for  
>>>> anything, at best it's just the location of the description  
>>>> you're already reading.
>>> Except for the registry sources command where there is then no  
>>> link back to where the server you are talking about is (as you are  
>>> not at the server) apart from the query_uri's (example 1 below).
>>>
>>> das2 has "xml:base", but that is then for all sources so wouldn't
>>> work for the registry (see example 2 below). We could always add
>>> another prop to the registry I guess ;)
>>>
>>>
>>> example1 registry sources:
>>> <SOURCES>
>>> <SOURCE uri="DS_109" title="uniprot aristotle" doc_href="http://www.ebi.ac.uk/uniprot-das/" description="This datasource (aristotle) is a legacy datasource that comprises the new 'uniprot', 'ipi' and 'uniparc' datasources that are available from the http://www.ebi.ac.uk/das-srv/uniprot/das server. Despite being a legacy dsn, there are no plans to remove this DAS datasource from service.">
>>>   <MAINTAINER email="rantunes at ebi.ac.uk" />
>>>   <VERSION uri="DS_109" created="2005-03-21T16:26:03+0000">
>>>     <COORDINATES uri="http://www.dasregistry.org/dasregistry/coordsys/CS_DS93" source="Protein Sequence" authority="UniParc" test_range="UPI00000017EA">UniParc,Protein Sequence</COORDINATES>
>>>     <COORDINATES uri="http://www.dasregistry.org/dasregistry/coordsys/CS_DS35" source="Protein Sequence" authority="IPI" test_range="IPI00000021">IPI,Protein Sequence</COORDINATES>
>>>     <COORDINATES uri="http://www.dasregistry.org/dasregistry/coordsys/CS_DS6" source="Protein Sequence" authority="UniProt" test_range="P00280">UniProt,Protein Sequence</COORDINATES>
>>>     <CAPABILITY type="das1:stylesheet" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/stylesheet" />
>>>     <CAPABILITY type="das1:features" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/features" />
>>>     <CAPABILITY type="das1:types" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/types" />
>>>     <CAPABILITY type="das1:sequence" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/sequence" />
>>>     <CAPABILITY type="das1:entry_points" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/entry_points" />
>>>     <CAPABILITY type="das1:unknown_segment" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/unknown_segment" />
>>>     <CAPABILITY type="das1:error_segment" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/error_segment" />
>>>     <PROP name="label" value="Predicted" />
>>>     <PROP name="label" value="Manually curated" />
>>>     <PROP name="label" value="ENSEMBL" />
>>>     <PROP name="leaseTime" value="2009-09-15T11:00:15+0000" />
>>>     <PROP name="projectHome" value="http://www.biosapiens.info" />
>>>     <PROP name="projectIcon" value="http://www.dasregistry.org/ProjectIcon?id=74" />
>>>     <PROP name="projectDesc" value="BioSapiens is a Network of Excellence, funded by the European Union's 6th Framework Programme, and made up of bioinformatics researchers from 25 institutions based in 14 countries throughout Europe.
>>>
>>> The objective of the BioSapiens is to provide a large" />
>>>     <PROP name="projectName" value="BioSapiens" />
>>>     <PROP name="valid" value="stylesheet" />
>>>     <PROP name="valid" value="features" />
>>>     <PROP name="valid" value="types" />
>>>     <PROP name="valid" value="sequence" />
>>>     <PROP name="valid" value="entry_points" />
>>>     <PROP name="valid" value="error_segment" />
>>>   </VERSION>
>>> </SOURCE>
>>>
>>>
>>>
>>>
>>>
>>> das2 has xml:base, but that is then for all sources so wouldn't  
>>> work for the registry:
>>>
>>> xml:base="http://bioserver.hci.utah.edu:8080/DAS2/das2/" >
>>> <MAINTAINER email="david.nix at hci.utah.edu" />
>>> <SOURCE uri="H_sapiens" title="H_sapiens" >
>>>     <VERSION uri="H_sapiens_Mar_2006" title="H_sapiens_Mar_2006" created="2008-01-03 14:39:44" >
>>>          <COORDINATES uri="http://www.ncbi.nlm.nih.gov/genome/H_sapiens/B36.1/" authority="NCBI" taxid="9606" version="36" source="Chromosome" />
>>>          <CAPABILITY type="segments" query_uri="H_sapiens_Mar_2006/segments" />
>>>          <CAPABILITY type="types" query_uri="H_sapiens_Mar_2006/types" />
>>>          <CAPABILITY type="features" query_uri="H_sapiens_Mar_2006/features" />
>>>     </VERSION>
>>> </SOURCE>
>>>
>>> On 16 Sep 2009, at 09:25, Andy Jenkinson wrote:
>>>
>>>> What I meant was that the root URI isn't actually used for  
>>>> anything, at best it's just the location of the description  
>>>> you're already reading. That would mean that adding another field  
>>>> to capture it wouldn't be of particular benefit.
>>>>
>>>> Whether we can easily remove the 'paradigm' of
>>>> server/das/source/command without confusing people is something else!
>>>>
>>>> On 15 Sep 2009, at 18:11, Jonathan Warren wrote:
>>>>
>>>>> Andy, I wasn't suggesting we get rid of query_uri - quite the
>>>>> opposite in fact. Just that the single source uri would have to
>>>>> be specified with a uri, as conceptually all other commands may
>>>>> not contain the root uri. This also seems to me to mean we will
>>>>> have to update das1 code to cope with multiple query uris.
>>>>>
>>>>> On 15 Sep 2009, at 17:56, Andy Jenkinson wrote:
>>>>>
>>>>>> On 15 Sep 2009, at 16:35, Jonathan Warren wrote:
>>>>>>
>>>>>>> I agree with Andy on both these (we talked about versioning  
>>>>>>> before).
>>>>>>> The version numbers really have no meaning at the moment (no
>>>>>>> web pages anywhere actually explain what a different version
>>>>>>> means) and don't seem to be used at all in data sources (I'm
>>>>>>> guessing people end up just copying the version numbers from
>>>>>>> the examples given).
>>>>>>>
>>>>>>> I've always had an issue with the commands like this http://www.ebi.ac.uk/das-srv/genomicdas/das/eqtl_rat_cis_fat 
>>>>>>>  not being a valid das command as it's the most natural  
>>>>>>> request for a person new to das to make. So giving it a  
>>>>>>> specific purpose and response is a good idea.
>>>>>>>
>>>>>>> My only concern is how to handle these if we start using the  
>>>>>>> power of multiple query_uris per das source (inherited from
>>>>>>> DAS2, which we have started to talk about, rather than the  
>>>>>>> das1 style where all urls have a root) as currently there is  
>>>>>>> no "root" url specified in the DAS2 spec in the sources  
>>>>>>> document...?? So this would have to be specified as another  
>>>>>>> capability? or you could infer it from the features command,  
>>>>>>> but obviously not the sources cmd!!!
>>>>>>
>>>>>> My take on this is that the root URI identifies the source. In  
>>>>>> a conceptual sense the definition of a source is merely a  
>>>>>> combination of commands acting on a common set of data. It is  
>>>>>> not really important where that information comes from (a  
>>>>>> registry, a server, a flat file...) because a server by itself  
>>>>>> does not really mean anything. So the URI http://www.ebi.ac.uk/das-srv/genomicdas/das/eqtl_rat_cis_fat 
>>>>>>  is not actually meaningful, even less so given it is not even  
>>>>>> a resolvable URL.
>>>>>>
>>>>>> The query URI system inherited from DAS/2 has the potential to  
>>>>>> allow the commands to be served from different locations on the  
>>>>>> web. It is not something we have needed up to now (all query  
>>>>>> URIs start with the same path), and does add confusion but I  
>>>>>> can see it being used for stylesheets. For example a "sequence  
>>>>>> ontology stylesheet" served from a single location.
>>>>>>
>>>>>> But the biggest reason to have it is because of the registry.  
>>>>>> The registry assigns its own root URIs for a DAS source (e.g.  
>>>>>> DS_1234), which means it is necessary to provide another URI  
>>>>>> used to actually query it. Since we already have a way of doing  
>>>>>> it in the sources document, I don't really want to change it  
>>>>>> now. It seems we might as well just embrace the extra  
>>>>>> flexibility and merely describe it better.
>>>>>>
>>>>>>> On 15 Sep 2009, at 15:47, Andy Jenkinson wrote:
>>>>>>>
>>>>>>>> On 15 Sep 2009, at 15:19, Thomas Down wrote:
>>>>>>>>> Capabilities are stated in the sources document:
>>>>>>>>> <CAPABILITY type="das1:maxbins" />
>>>>>>>>>
>>>>>>>>> Ah, interesting.  I'd seen that, of course, but hadn't  
>>>>>>>>> explicitly linked this with the idea of capabilities as  
>>>>>>>>> listed in the X-DAS-Capabilities header (although of course  
>>>>>>>>> it makes a lot more sense to have one set of capability  
>>>>>>>>> metadata, rather than two!). There are a couple of issues  
>>>>>>>>> here:
>>>>>>>>>
>>>>>>>>>      1. The SOURCES examples all say "das command" in the  
>>>>>>>>> type attribute of the CAPABILITY element, whereas many of  
>>>>>>>>> the capabilities don't actually map to commands.  I notice  
>>>>>>>>> that the latest DAS1.6 draft does give an example to clarify  
>>>>>>>>> this.
>>>>>>>>>
>>>>>>>>>      2. X-DAS-Capabilities entries are versioned whereas  
>>>>>>>>> SOURCES capabilities aren't, which makes them look rather  
>>>>>>>>> different. (and I note that the 1.6 spec is bumping up the  
>>>>>>>>> version numbers on some of the existing capabilities...)
>>>>>>>>>
>>>>>>>>> How about versioning capabilities in SOURCES, e.g.:
>>>>>>>>>
>>>>>>>>>   <CAPABILITY type="features" version="1.1" query_uri="http://noranti.derkholm.net/das/mydata/features 
>>>>>>>>> " />
>>>>>>>>>   <CAPABILITY type="maxbins" version="1.0" />
>>>>>>>>>
>>>>>>>>> Assume any missing version attributes are "1.0" and  
>>>>>>>>> everything should be backwards compatible.
>>>>>>>>
>>>>>>>> Indeed I did increment the version, just because it seemed  
>>>>>>>> the right thing to do. However as far as I am aware these
>>>>>>>> per-capability versions are totally superfluous when taken in
>>>>>>>> context with the X-DAS-Version header, i.e. we do NOT want to  
>>>>>>>> make it possible to implement DAS 1.6 and features 1.0, for  
>>>>>>>> example. This could create a whole world of pain!
>>>>>>>>
>>>>>>>> IMO the per-capability version is unnecessary and confusing.  
>>>>>>>> ProServer does use it internally, but that can be easily  
>>>>>>>> changed. Getting rid of it would make the spec less  
>>>>>>>> confusing, but will of course break things that depend on the  
>>>>>>>> current format (if there are any).
>>>>>>>>
>>>>>>>> What do others think?
>>>>>>>>
>>>>>>>>> The only snag is that right now you have to parse all
>>>>>>>>> sources. Technically both the registry and proserver allow
>>>>>>>>> you to do:
>>>>>>>>> http://www.ebi.ac.uk/das-srv/genomicdas/das/sources/eqtl_rat_cis_fat
>>>>>>>>>
>>>>>>>>> But IIRC I didn't include this in the spec to keep things  
>>>>>>>>> simple.
>>>>>>>>>
>>>>>>>>> If this isn't specified yet, how about allowing:
>>>>>>>>>
>>>>>>>>>       http://www.ebi.ac.uk/das-srv/genomicdas/das/eqtl_rat_cis_fat/sources
>>>>>>>>>
>>>>>>>>> ?
>>>>>>>>>
>>>>>>>>> Then it's possible to stick with the model of passing a  
>>>>>>>>> single URI around to refer to a "DAS datasource", and stick  
>>>>>>>>> a command on the end of it to get the data you're after.
>>>>>>>>
>>>>>>>> Well, the reason we didn't use this format is simply that it  
>>>>>>>> doesn't "read" well, if only because "sources" is plural.  
>>>>>>>> What would perhaps make sense, and which would allow for  
>>>>>>>> quickly 'pinging' a source for other similar uses, is to use  
>>>>>>>> this URL format:
>>>>>>>> http://www.ebi.ac.uk/das-srv/genomicdas/das/eqtl_rat_cis_fat
>>>>>>>>
>>>>>>>> Again, this is what seems most 'sensible' to me but I am  
>>>>>>>> happy to go with the consensus.
>>>>>>>> _______________________________________________
>>>>>>>> DAS mailing list
>>>>>>>> DAS at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/das
>>>>>>>
>>>>>>> Jonathan Warren
>>>>>>> Senior Developer and DAS coordinator
>>>>>>> jw12 at sanger.ac.uk
>>>>>>> Ext: 2314
>>>>>>> Telephone: 01223 492314
>>>>>>>
>>>>>>> -- 
>>>>>>> The Wellcome Trust Sanger Institute is operated by Genome
>>>>>>> Research Limited, a charity registered in England with number
>>>>>>> 1021457 and a company registered in England with number
>>>>>>> 2742969, whose registered office is 215 Euston Road, London,
>>>>>>> NW1 2BE.
>>>>>>
>>>>>
>>>>
>>>
>>
>



