[DAS] maxbins in DAS1.6?

Andy Jenkinson andy.jenkinson at ebi.ac.uk
Wed Sep 16 10:28:14 UTC 2009


Leaving aside the issue surrounding the paradigm I mentioned and Thomas
expanded on, why do you actually need to have a URL for the "server"
itself? Given that you already have all the metadata and command URLs,
you can't learn anything more from it.

On 16 Sep 2009, at 10:28, Jonathan Warren wrote:

> I think Thomas is right in that we can't change the das1 base url  
> principle at least for 1.6 anyway, as it is supposed to be a  
> consolidation.
>
> As there have been no objections to using, for example,
> http://www.ebi.ac.uk/das-srv/genomicdas/das/eqtl_rat_cis_fat
> as a single source request, we can put that into 1.6. The only real
> change would need to be in the registry (see explanation below), but
> we can get around that.
>
>> What I meant was that the root URI isn't actually used for  
>> anything, at best it's just the location of the description you're  
>> already reading.
> Except for the registry sources command: there, nothing links back to
> the server you are talking about (as you are not at the server)
> apart from the query_uris (example 1 below).
>
> das2 has "xml:base", but that applies to all sources in the document,
> so it wouldn't work for the registry (see example 2 below). We could
> always add another prop to the registry I guess ;)
>
>
> example1 registry sources:
> <SOURCES>
>   <SOURCE uri="DS_109" title="uniprot aristotle" doc_href="http://www.ebi.ac.uk/uniprot-das/"
>     description="This datasource (aristotle) is a legacy datasource that comprises the new 'uniprot', 'ipi' and 'uniparc' datasources that are available from the http://www.ebi.ac.uk/das-srv/uniprot/das server. Despite being a legacy dsn, there are no plans to remove this DAS datasource from service.">
>     <MAINTAINER email="rantunes at ebi.ac.uk" />
>     <VERSION uri="DS_109" created="2005-03-21T16:26:03+0000">
>       <COORDINATES uri="http://www.dasregistry.org/dasregistry/coordsys/CS_DS93" source="Protein Sequence" authority="UniParc" test_range="UPI00000017EA">UniParc,Protein Sequence</COORDINATES>
>       <COORDINATES uri="http://www.dasregistry.org/dasregistry/coordsys/CS_DS35" source="Protein Sequence" authority="IPI" test_range="IPI00000021">IPI,Protein Sequence</COORDINATES>
>       <COORDINATES uri="http://www.dasregistry.org/dasregistry/coordsys/CS_DS6" source="Protein Sequence" authority="UniProt" test_range="P00280">UniProt,Protein Sequence</COORDINATES>
>       <CAPABILITY type="das1:stylesheet" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/stylesheet" />
>       <CAPABILITY type="das1:features" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/features" />
>       <CAPABILITY type="das1:types" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/types" />
>       <CAPABILITY type="das1:sequence" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/sequence" />
>       <CAPABILITY type="das1:entry_points" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/entry_points" />
>       <CAPABILITY type="das1:unknown_segment" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/unknown_segment" />
>       <CAPABILITY type="das1:error_segment" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/error_segment" />
>       <PROP name="label" value="Predicted" />
>       <PROP name="label" value="Manually curated" />
>       <PROP name="label" value="ENSEMBL" />
>       <PROP name="leaseTime" value="2009-09-15T11:00:15+0000" />
>       <PROP name="projectHome" value="http://www.biosapiens.info" />
>       <PROP name="projectIcon" value="http://www.dasregistry.org/ProjectIcon?id=74" />
>       <PROP name="projectDesc" value="BioSapiens is a Network of Excellence, funded by the European Union's 6th Framework Programme, and made up of bioinformatics researchers from 25 institutions based in 14 countries throughout Europe. The objective of the BioSapiens is to provide a large" />
>       <PROP name="projectName" value="BioSapiens" />
>       <PROP name="valid" value="stylesheet" />
>       <PROP name="valid" value="features" />
>       <PROP name="valid" value="types" />
>       <PROP name="valid" value="sequence" />
>       <PROP name="valid" value="entry_points" />
>       <PROP name="valid" value="error_segment" />
>     </VERSION>
>   </SOURCE>
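
The point about the registry response can be illustrated with a rough Python sketch: since the SOURCE uri is only a registry ID (DS_109), the only way back to the server is through the query_uris. The helper names `infer_root` and `roots_by_source` are made up for this illustration, the XML is a cut-down copy of example 1, and the approach assumes absolute query_uris. It is not something the spec or the registry defines.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

# Cut-down copy of example 1 (assumption: only the attributes needed
# for this illustration are kept).
REGISTRY_XML = """<SOURCES>
  <SOURCE uri="DS_109" title="uniprot aristotle">
    <VERSION uri="DS_109" created="2005-03-21T16:26:03+0000">
      <CAPABILITY type="das1:stylesheet" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/stylesheet" />
      <CAPABILITY type="das1:features" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/features" />
      <CAPABILITY type="das1:types" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/types" />
    </VERSION>
  </SOURCE>
</SOURCES>"""


def infer_root(query_uris):
    """Guess a shared root URL from absolute query_uris, or return None."""
    parts = [urlsplit(uri) for uri in query_uris]
    if not parts:
        return None
    # Commands served from different hosts have no common root.
    if len({(p.scheme, p.netloc) for p in parts}) != 1:
        return None
    # Drop the trailing command name, then keep the shared leading segments.
    dirs = [posixpath.dirname(p.path) or "/" for p in parts]
    common = posixpath.commonpath(dirs)
    return urlunsplit((parts[0].scheme, parts[0].netloc, common, "", ""))


def roots_by_source(sources_xml):
    """Map each registry SOURCE uri to the root inferred from its query_uris."""
    doc = ET.fromstring(sources_xml)
    result = {}
    for source in doc.iter("SOURCE"):
        uris = [
            cap.get("query_uri")
            for cap in source.iter("CAPABILITY")
            if cap.get("query_uri")
        ]
        result[source.get("uri")] = infer_root(uris)
    return result


print(roots_by_source(REGISTRY_XML))
```

Once one command (say a shared stylesheet) lives on another host, `infer_root` returns None, which is exactly the case where an explicit root would be needed.
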
>
>
>
>
>
> example2 das2 sources (das2 has xml:base, but that then applies to
> all sources, so it wouldn't work for the registry):
>
> xml:base="http://bioserver.hci.utah.edu:8080/DAS2/das2/" >
>   <MAINTAINER email="david.nix at hci.utah.edu" />
>   <SOURCE uri="H_sapiens" title="H_sapiens" >
>       <VERSION uri="H_sapiens_Mar_2006" title="H_sapiens_Mar_2006" created="2008-01-03 14:39:44" >
>            <COORDINATES uri="http://www.ncbi.nlm.nih.gov/genome/H_sapiens/B36.1/" authority="NCBI" taxid="9606" version="36" source="Chromosome" />
>            <CAPABILITY type="segments" query_uri="H_sapiens_Mar_2006/segments" />
>            <CAPABILITY type="types" query_uri="H_sapiens_Mar_2006/types" />
>            <CAPABILITY type="features" query_uri="H_sapiens_Mar_2006/features" />
>       </VERSION>
>   </SOURCE>
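
For reference, here is a small Python sketch of how a client might resolve the relative query_uris in the das2 excerpt against xml:base. It assumes the enclosing element is SOURCES with no XML namespace (the opening tag is not shown in the excerpt), and it only handles a single document-level xml:base; `resolve_query_uris` is a name invented for this illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the predeclared XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Assumption: the wrapping element is SOURCES; the excerpt starts mid-tag.
DAS2_XML = """<SOURCES xml:base="http://bioserver.hci.utah.edu:8080/DAS2/das2/">
  <SOURCE uri="H_sapiens" title="H_sapiens">
    <VERSION uri="H_sapiens_Mar_2006" title="H_sapiens_Mar_2006">
      <CAPABILITY type="segments" query_uri="H_sapiens_Mar_2006/segments" />
      <CAPABILITY type="types" query_uri="H_sapiens_Mar_2006/types" />
      <CAPABILITY type="features" query_uri="H_sapiens_Mar_2006/features" />
    </VERSION>
  </SOURCE>
</SOURCES>"""


def resolve_query_uris(sources_xml):
    """Return {capability type: absolute query URL} using the root xml:base."""
    doc = ET.fromstring(sources_xml)
    base = doc.get(XML_BASE, "")
    resolved = {}
    for cap in doc.iter("CAPABILITY"):
        query_uri = cap.get("query_uri")
        if query_uri:
            # urljoin leaves already-absolute URLs untouched.
            resolved[cap.get("type")] = urljoin(base, query_uri)
    return resolved


print(resolve_query_uris(DAS2_XML)["features"])
```

Because the base is declared once for the whole document, every source in it resolves against the same server, which is why this does not carry over to a registry listing sources from many servers.
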
>
> On 16 Sep 2009, at 09:25, Andy Jenkinson wrote:
>
>> What I meant was that the root URI isn't actually used for  
>> anything, at best it's just the location of the description you're  
>> already reading. That would mean that adding another field to  
>> capture it wouldn't be of particular benefit.
>>
>> Whether we can easily remove the 'paradigm' of server/das/source/ 
>> command without confusing people is something else!
>>
>> On 15 Sep 2009, at 18:11, Jonathan Warren wrote:
>>
>>> Andy, I wasn't suggesting we get rid of query_uri; quite the
>>> opposite in fact. Just that the single source uri would have to be
>>> specified with a uri, as conceptually all other commands may not
>>> contain the root uri. It also seems to me that this means we will
>>> have to update das1 code to cope with multiple query uris.
>>>
>>> On 15 Sep 2009, at 17:56, Andy Jenkinson wrote:
>>>
>>>> On 15 Sep 2009, at 16:35, Jonathan Warren wrote:
>>>>
>>>>> I agree with Andy on both these (we talked about versioning  
>>>>> before).
>>>>> The version numbers really have no meaning at the moment (no web
>>>>> pages anywhere actually explain what a different version means)
>>>>> and don't seem to be used at all in data sources (I'm guessing
>>>>> people end up just copying the version numbers from the examples
>>>>> given).
>>>>>
>>>>> I've always had an issue with requests like
>>>>> http://www.ebi.ac.uk/das-srv/genomicdas/das/eqtl_rat_cis_fat
>>>>> not being a valid das command, as it's the most natural request
>>>>> for a person new to das to make. So giving it a specific purpose
>>>>> and response is a good idea.
>>>>>
>>>>> My only concern is how to handle these if we start using the
>>>>> power of multiple query_uris per das source (inherited from
>>>>> DAS2, which we have started to talk about, rather than the das1
>>>>> style where all urls have a root), as currently there is no
>>>>> "root" url specified in the sources document in the DAS2 spec.
>>>>> So would this have to be specified as another capability? Or you
>>>>> could infer it from the features command, but obviously not from
>>>>> the sources cmd!
>>>>
>>>> My take on this is that the root URI identifies the source. In a  
>>>> conceptual sense the definition of a source is merely a  
>>>> combination of commands acting on a common set of data. It is not  
>>>> really important where that information comes from (a registry, a  
>>>> server, a flat file...) because a server by itself does not  
>>>> really mean anything. So the URI http://www.ebi.ac.uk/das-srv/genomicdas/das/eqtl_rat_cis_fat 
>>>>  is not actually meaningful, even less so given it is not even a  
>>>> resolvable URL.
>>>>
>>>> The query URI system inherited from DAS/2 has the potential to
>>>> allow the commands to be served from different locations on the
>>>> web. It is not something we have needed up to now (all query URIs
>>>> start with the same path), and it does add confusion, but I can
>>>> see it being used for stylesheets: for example, a "sequence
>>>> ontology stylesheet" served from a single location.
>>>>
>>>> But the biggest reason to have it is because of the registry. The  
>>>> registry assigns its own root URIs for a DAS source (e.g.  
>>>> DS_1234), which means it is necessary to provide another URI used  
>>>> to actually query it. Since we already have a way of doing it in  
>>>> the sources document, I don't really want to change it now. It  
>>>> seems we might as well just embrace the extra flexibility and  
>>>> merely describe it better.
>>>>
>>>>> On 15 Sep 2009, at 15:47, Andy Jenkinson wrote:
>>>>>
>>>>>> On 15 Sep 2009, at 15:19, Thomas Down wrote:
>>>>>>> Capabilities are stated in the sources document:
>>>>>>> <CAPABILITY type="das1:maxbins" />
>>>>>>>
>>>>>>> Ah, interesting.  I'd seen that, of course, but hadn't  
>>>>>>> explicitly linked this with the idea of capabilities as listed  
>>>>>>> in the X-DAS-Capabilities header (although of course it makes  
>>>>>>> a lot more sense to have one set of capability metadata,  
>>>>>>> rather than two!). There are a couple of issues here:
>>>>>>>
>>>>>>>        1. The SOURCES examples all say "das command" in the  
>>>>>>> type attribute of the CAPABILITY element, whereas many of the  
>>>>>>> capabilities don't actually map to commands.  I notice that  
>>>>>>> the latest DAS1.6 draft does give an example to clarify this.
>>>>>>>
>>>>>>>        2. X-DAS-Capabilities entries are versioned whereas  
>>>>>>> SOURCES capabilities aren't, which makes them look rather  
>>>>>>> different. (and I note that the 1.6 spec is bumping up the  
>>>>>>> version numbers on some of the existing capabilities...)
>>>>>>>
>>>>>>> How about versioning capabilities in SOURCES, e.g.:
>>>>>>>
>>>>>>>     <CAPABILITY type="features" version="1.1" query_uri="http://noranti.derkholm.net/das/mydata/features" />
>>>>>>>     <CAPABILITY type="maxbins" version="1.0" />
>>>>>>>
>>>>>>> Assume any missing version attributes are "1.0" and everything  
>>>>>>> should be backwards compatible.
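
A minimal Python sketch of how a client could read the proposed version attribute with the suggested "1.0" default. The version attribute is only a proposal in this thread, and the function name `capability_versions` and the wrapping VERSION element in the sample are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Sample built from the two CAPABILITY lines proposed above.
SAMPLE = """<VERSION uri="mydata">
  <CAPABILITY type="features" version="1.1" query_uri="http://noranti.derkholm.net/das/mydata/features" />
  <CAPABILITY type="maxbins" version="1.0" />
  <CAPABILITY type="types" />
</VERSION>"""


def capability_versions(version_xml, default="1.0"):
    """Return {capability type: version}, using the default when omitted."""
    root = ET.fromstring(version_xml)
    return {
        cap.get("type"): cap.get("version", default)
        for cap in root.iter("CAPABILITY")
    }


print(capability_versions(SAMPLE))
```

Existing documents without any version attributes would simply report "1.0" everywhere, which is the backwards-compatible behaviour described.
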
>>>>>>
>>>>>> Indeed I did increment the version, just because it seemed the  
>>>>>> right thing to do. However as far as I am aware these per- 
>>>>>> capability versions are totally superfluous when taken in  
>>>>>> context with the X-DAS-Version header, i.e. we do NOT want to  
>>>>>> make it possible to implement DAS 1.6 and features 1.0, for  
>>>>>> example. This could create a whole world of pain!
>>>>>>
>>>>>> IMO the per-capability version is unnecessary and confusing.  
>>>>>> ProServer does use it internally, but that can be easily  
>>>>>> changed. Getting rid of it would make the spec less confusing,  
>>>>>> but will of course break things that depend on the current  
>>>>>> format (if there are any).
>>>>>>
>>>>>> What do others think?
>>>>>>
>>>>>>> The only snag is that right now you have to parse all sources.  
>>>>>>> Technically both the registry and proserver allow you to do:
>>>>>>> http://www.ebi.ac.uk/das-srv/genomicdas/das/sources/eqtl_rat_cis_fat
>>>>>>>
>>>>>>> But IIRC I didn't include this in the spec to keep things  
>>>>>>> simple.
>>>>>>>
>>>>>>> If this isn't specified yet, how about allowing:
>>>>>>>
>>>>>>>         http://www.ebi.ac.uk/das-srv/genomicdas/das/eqtl_rat_cis_fat/sources
>>>>>>>
>>>>>>> ?
>>>>>>>
>>>>>>> Then it's possible to stick with the model of passing a single  
>>>>>>> URI around to refer to a "DAS datasource", and stick a command  
>>>>>>> on the end of it to get the data you're after.
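
The "single URI plus command" model amounts to simple string handling on the client side. A tiny Python sketch, where `command_url` is a hypothetical helper and not part of any DAS library:

```python
def command_url(source_uri, command=""):
    """Append a DAS command name to a source URI, tolerating a trailing slash."""
    base = source_uri.rstrip("/")
    return f"{base}/{command}" if command else base


SOURCE = "http://www.ebi.ac.uk/das-srv/genomicdas/das/eqtl_rat_cis_fat"

print(command_url(SOURCE, "features"))
print(command_url(SOURCE, "sources"))
```

This only works while every command shares the source's root; with independent query_uris per capability the client has to look each URL up in the sources document instead.
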
>>>>>>
>>>>>> Well, the reason we didn't use this format is simply that it  
>>>>>> doesn't "read" well, if only because "sources" is plural. What  
>>>>>> would perhaps make sense, and which would allow for quickly  
>>>>>> 'pinging' a source for other similar uses, is to use this URL  
>>>>>> format:
>>>>>> http://www.ebi.ac.uk/das-srv/genomicdas/das/eqtl_rat_cis_fat
>>>>>>
>>>>>> Again, this is what seems most 'sensible' to me but I am happy  
>>>>>> to go with the consensus.
>>>>>> _______________________________________________
>>>>>> DAS mailing list
>>>>>> DAS at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/das
>>>>>
>>>>> Jonathan Warren
>>>>> Senior Developer and DAS coordinator
>>>>> jw12 at sanger.ac.uk
>>>>> Ext: 2314
>>>>> Telephone: 01223 492314
>>>>>
>>>>> -- 
>>>>> The Wellcome Trust Sanger Institute is operated by Genome
>>>>> Research Limited, a charity registered in England with number
>>>>> 1021457 and a company registered in England with number 2742969,
>>>>> whose registered office is 215 Euston Road, London, NW1 2BE.
>>>>
>>>
>>
>



