[DAS] maxbins in DAS1.6?
Andy Jenkinson
andy.jenkinson at ebi.ac.uk
Wed Sep 16 10:28:14 UTC 2009
Leaving aside the issue of the paradigm that I mentioned and Thomas
expanded on, why do you actually need a URL for the "server" itself?
Given that you already have all the metadata and command URLs, you
can't learn anything more from it.
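
To illustrate, here is a minimal Python sketch of a client that works
purely from the query_uri values in a sources document and never needs
a separate "server" URL. The XML is abbreviated from the registry
example quoted below, plus the das1:maxbins capability mentioned further
down the thread; the function name command_urls is hypothetical, not
part of any DAS library.

```python
import xml.etree.ElementTree as ET

# Abbreviated registry-style sources document (assumption: trimmed by hand
# from the aristotle example; only a few capabilities are kept).
SOURCES_XML = """
<SOURCES>
  <SOURCE uri="DS_109" title="uniprot aristotle">
    <VERSION uri="DS_109" created="2005-03-21T16:26:03+0000">
      <CAPABILITY type="das1:features" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/features" />
      <CAPABILITY type="das1:types" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/types" />
      <CAPABILITY type="das1:maxbins" />
    </VERSION>
  </SOURCE>
</SOURCES>
"""


def command_urls(sources_xml):
    """Map each SOURCE uri to a dict of {capability type: query_uri}.

    Capabilities without a query_uri (such as das1:maxbins) are kept with
    a value of None, since they describe behaviour rather than a command.
    """
    result = {}
    root = ET.fromstring(sources_xml)
    for source in root.iter("SOURCE"):
        caps = {}
        for cap in source.iter("CAPABILITY"):
            caps[cap.get("type")] = cap.get("query_uri")
        result[source.get("uri")] = caps
    return result


urls = command_urls(SOURCES_XML)
print(urls["DS_109"]["das1:features"])
```

Everything a client needs to issue a command is in that mapping; the
registry's own identifier (DS_109) only serves as the key.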
On 16 Sep 2009, at 10:28, Jonathan Warren wrote:
> I think Thomas is right that we can't change the das1 base url
> principle, at least for 1.6, as it is supposed to be a
> consolidation.
>
> As there have been no objections to using, for example,
> http://www.ebi.ac.uk/das-srv/genomicdas/das/eqtl_rat_cis_fat
> as a single source request, we can put that into 1.6. The only real
> change would need to be in the registry (see explanation below), but
> we can get around that.
>
>> What I meant was that the root URI isn't actually used for
>> anything, at best it's just the location of the description you're
>> already reading.
> Except for the registry sources command, where there is then no link
> back to the server you are talking about (as you are not at the
> server) apart from the query_uris (example 1 below).
>
> das2 has "xml:base", but that then applies to all sources, so it
> wouldn't work for the registry (see example 2 below). We could always
> add another prop to the registry I guess ;)
>
>
> example1 registry sources:
> <SOURCES>
> <SOURCE uri="DS_109" title="uniprot aristotle" doc_href="http://www.ebi.ac.uk/uniprot-das/"
> description="This datasource (aristotle) is a legacy datasource
> that comprises the new 'uniprot', 'ipi' and 'uniparc' datasources
> that are available from the http://www.ebi.ac.uk/das-srv/uniprot/das
> server. Despite being a legacy dsn, there are no plans to
> remove this DAS datasource from service.">
> <MAINTAINER email="rantunes at ebi.ac.uk" />
> <VERSION uri="DS_109" created="2005-03-21T16:26:03+0000">
> <COORDINATES uri="http://www.dasregistry.org/dasregistry/coordsys/CS_DS93"
> source="Protein Sequence" authority="UniParc"
> test_range="UPI00000017EA">UniParc,Protein Sequence</COORDINATES>
> <COORDINATES uri="http://www.dasregistry.org/dasregistry/coordsys/CS_DS35"
> source="Protein Sequence" authority="IPI"
> test_range="IPI00000021">IPI,Protein Sequence</COORDINATES>
> <COORDINATES uri="http://www.dasregistry.org/dasregistry/coordsys/CS_DS6"
> source="Protein Sequence" authority="UniProt"
> test_range="P00280">UniProt,Protein Sequence</COORDINATES>
> <CAPABILITY type="das1:stylesheet" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/stylesheet" />
> <CAPABILITY type="das1:features" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/features" />
> <CAPABILITY type="das1:types" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/types" />
> <CAPABILITY type="das1:sequence" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/sequence" />
> <CAPABILITY type="das1:entry_points" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/entry_points" />
> <CAPABILITY type="das1:unknown_segment" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/unknown_segment" />
> <CAPABILITY type="das1:error_segment" query_uri="http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/error_segment" />
> <PROP name="label" value="Predicted" />
> <PROP name="label" value="Manually curated" />
> <PROP name="label" value="ENSEMBL" />
> <PROP name="leaseTime" value="2009-09-15T11:00:15+0000" />
> <PROP name="projectHome" value="http://www.biosapiens.info" />
> <PROP name="projectIcon" value="http://www.dasregistry.org/ProjectIcon?id=74" />
> <PROP name="projectDesc" value="BioSapiens is a Network of
> Excellence, funded by the European Union's 6th Framework Programme,
> and made up of bioinformatics researchers from 25 institutions based
> in 14 countries throughout Europe.
>
> The objective of the BioSapiens is to provide a large" />
> <PROP name="projectName" value="BioSapiens" />
> <PROP name="valid" value="stylesheet" />
> <PROP name="valid" value="features" />
> <PROP name="valid" value="types" />
> <PROP name="valid" value="sequence" />
> <PROP name="valid" value="entry_points" />
> <PROP name="valid" value="error_segment" />
> </VERSION>
> </SOURCE>
>
> example2 das2 sources (das2 has xml:base, but that then applies to all
> sources, so it wouldn't work for the registry):
>
> <SOURCES xml:base="http://bioserver.hci.utah.edu:8080/DAS2/das2/" >
> <MAINTAINER email="david.nix at hci.utah.edu" />
> <SOURCE uri="H_sapiens" title="H_sapiens" >
> <VERSION uri="H_sapiens_Mar_2006" title="H_sapiens_Mar_2006"
> created="2008-01-03 14:39:44" >
> <COORDINATES uri="http://www.ncbi.nlm.nih.gov/genome/H_sapiens/B36.1/"
> authority="NCBI" taxid="9606" version="36" source="Chromosome" />
> <CAPABILITY type="segments" query_uri="H_sapiens_Mar_2006/segments" />
> <CAPABILITY type="types" query_uri="H_sapiens_Mar_2006/types" />
> <CAPABILITY type="features" query_uri="H_sapiens_Mar_2006/features" />
> </VERSION>
> </SOURCE>
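
To make the xml:base point concrete, here is a minimal Python sketch of
how the relative query_uri values in the das2 example resolve against
the document-wide base, while an absolute query_uri (as the registry
uses) is unaffected by it. The helper name resolve is hypothetical; it
relies only on the standard library's urljoin.

```python
from urllib.parse import urljoin

# xml:base taken from the das2 example; it applies to every source in
# the document, which is why one value cannot serve a registry listing
# sources hosted on many different servers.
XML_BASE = "http://bioserver.hci.utah.edu:8080/DAS2/das2/"


def resolve(query_uri, base=XML_BASE):
    """Resolve a query_uri against the document's xml:base."""
    return urljoin(base, query_uri)


relative = resolve("H_sapiens_Mar_2006/features")
absolute = resolve("http://www.ebi.ac.uk/das-srv/uniprot/das/aristotle/features")
print(relative)
print(absolute)
```

The relative value picks up the base, whereas the absolute registry-style
value comes back unchanged, so absolute query_uris remain usable without
any per-document base.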
>
> On 16 Sep 2009, at 09:25, Andy Jenkinson wrote:
>
>> What I meant was that the root URI isn't actually used for
>> anything, at best it's just the location of the description you're
>> already reading. That would mean that adding another field to
>> capture it wouldn't be of particular benefit.
>>
>> Whether we can easily remove the 'paradigm' of
>> server/das/source/command without confusing people is something else!
>>
>> On 15 Sep 2009, at 18:11, Jonathan Warren wrote:
>>
>>> Andy, I wasn't suggesting we get rid of query_uri - quite the
>>> opposite in fact. Just that the single source uri would have to be
>>> specified with a uri, as conceptually all other commands may not
>>> contain the root uri. It also seems to me that this means we will
>>> have to update das1 code to cope with multiple query uris.
>>>
>>> On 15 Sep 2009, at 17:56, Andy Jenkinson wrote:
>>>
>>>> On 15 Sep 2009, at 16:35, Jonathan Warren wrote:
>>>>
>>>>> I agree with Andy on both of these (we talked about versioning
>>>>> before).
>>>>> The version numbers really have no meaning at the moment (no web
>>>>> pages anywhere actually explain what a different version means)
>>>>> and don't seem to be used at all in data sources (I'm guessing
>>>>> people end up just copying the version numbers from the examples
>>>>> given).
>>>>>
>>>>> I've always had an issue with requests like this
>>>>> http://www.ebi.ac.uk/das-srv/genomicdas/das/eqtl_rat_cis_fat
>>>>> not being a valid das command, as it's the most natural request
>>>>> for a person new to das to make. So giving it a specific purpose
>>>>> and response is a good idea.
>>>>>
>>>>> My only concern is how to handle these if we start using the
>>>>> power of multiple query_uris per das source (inherited from
>>>>> DAS2, which we have started to talk about, rather than the das1
>>>>> style where all urls have a root), as currently there is no
>>>>> "root" url specified in the DAS2 spec in the sources
>>>>> document. So would this have to be specified as another
>>>>> capability? You could infer it from the features command, but
>>>>> obviously not from the sources cmd!
>>>>
>>>> My take on this is that the root URI identifies the source. In a
>>>> conceptual sense the definition of a source is merely a
>>>> combination of commands acting on a common set of data. It is not
>>>> really important where that information comes from (a registry, a
>>>> server, a flat file...) because a server by itself does not
>>>> really mean anything. So the URI http://www.ebi.ac.uk/das-srv/genomicdas/das/eqtl_rat_cis_fat
>>>> is not actually meaningful, even less so given it is not even a
>>>> resolvable URL.
>>>>
>>>> The query URI system inherited from DAS/2 has the potential to
>>>> allow the commands to be served from different locations on the
>>>> web. It is not something we have needed up to now (all query URIs
>>>> start with the same path), and it does add confusion, but I can
>>>> see it being used for stylesheets: for example, a "sequence
>>>> ontology stylesheet" served from a single location.
>>>>
>>>> But the biggest reason to have it is because of the registry. The
>>>> registry assigns its own root URIs for a DAS source (e.g.
>>>> DS_1234), which means it is necessary to provide another URI used
>>>> to actually query it. Since we already have a way of doing it in
>>>> the sources document, I don't really want to change it now. It
>>>> seems we might as well just embrace the extra flexibility and
>>>> merely describe it better.
>>>>
>>>>> On 15 Sep 2009, at 15:47, Andy Jenkinson wrote:
>>>>>
>>>>>> On 15 Sep 2009, at 15:19, Thomas Down wrote:
>>>>>>> Capabilities are stated in the sources document:
>>>>>>> <CAPABILITY type="das1:maxbins" />
>>>>>>>
>>>>>>> Ah, interesting. I'd seen that, of course, but hadn't
>>>>>>> explicitly linked this with the idea of capabilities as listed
>>>>>>> in the X-DAS-Capabilities header (although of course it makes
>>>>>>> a lot more sense to have one set of capability metadata,
>>>>>>> rather than two!). There are a couple of issues here:
>>>>>>>
>>>>>>> 1. The SOURCES examples all say "das command" in the
>>>>>>> type attribute of the CAPABILITY element, whereas many of the
>>>>>>> capabilities don't actually map to commands. I notice that
>>>>>>> the latest DAS1.6 draft does give an example to clarify this.
>>>>>>>
>>>>>>> 2. X-DAS-Capabilities entries are versioned whereas
>>>>>>> SOURCES capabilities aren't, which makes them look rather
>>>>>>> different. (and I note that the 1.6 spec is bumping up the
>>>>>>> version numbers on some of the existing capabilities...)
>>>>>>>
>>>>>>> How about versioning capabilities in SOURCES, e.g.:
>>>>>>>
>>>>>>> <CAPABILITY type="features" version="1.1" query_uri="http://noranti.derkholm.net/das/mydata/features" />
>>>>>>> <CAPABILITY type="maxbins" version="1.0" />
>>>>>>>
>>>>>>> Assume any missing version attributes are "1.0" and everything
>>>>>>> should be backwards compatible.
>>>>>>
>>>>>> Indeed I did increment the version, just because it seemed the
>>>>>> right thing to do. However, as far as I am aware these
>>>>>> per-capability versions are totally superfluous when taken in
>>>>>> context with the X-DAS-Version header, i.e. we do NOT want to
>>>>>> make it possible to implement DAS 1.6 and features 1.0, for
>>>>>> example. This could create a whole world of pain!
>>>>>>
>>>>>> IMO the per-capability version is unnecessary and confusing.
>>>>>> ProServer does use it internally, but that can be easily
>>>>>> changed. Getting rid of it would make the spec less confusing,
>>>>>> but will of course break things that depend on the current
>>>>>> format (if there are any).
>>>>>>
>>>>>> What do others think?
>>>>>>
>>>>>>> The only snag is that right now you have to parse all sources.
>>>>>>> Technically both the registry and proserver allow you to do:
>>>>>>> http://www.ebi.ac.uk/das-srv/genomicdas/das/sources/eqtl_rat_cis_fat
>>>>>>>
>>>>>>> But IIRC I didn't include this in the spec to keep things
>>>>>>> simple.
>>>>>>>
>>>>>>> If this isn't specified yet, how about allowing:
>>>>>>>
>>>>>>> http://www.ebi.ac.uk/das-srv/genomicdas/das/eqtl_rat_cis_fat/sources
>>>>>>>
>>>>>>> ?
>>>>>>>
>>>>>>> Then it's possible to stick with the model of passing a single
>>>>>>> URI around to refer to a "DAS datasource", and stick a command
>>>>>>> on the end of it to get the data you're after.
>>>>>>
>>>>>> Well, the reason we didn't use this format is simply that it
>>>>>> doesn't "read" well, if only because "sources" is plural. What
>>>>>> would perhaps make sense, and which would allow for quickly
>>>>>> 'pinging' a source for other similar uses, is to use this URL
>>>>>> format:
>>>>>> http://www.ebi.ac.uk/das-srv/genomicdas/das/eqtl_rat_cis_fat
>>>>>>
>>>>>> Again, this is what seems most 'sensible' to me but I am happy
>>>>>> to go with the consensus.
>>>>>> _______________________________________________
>>>>>> DAS mailing list
>>>>>> DAS at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/das
>>>>>
>>>>> Jonathan Warren
>>>>> Senior Developer and DAS coordinator
>>>>> jw12 at sanger.ac.uk
>>>>> Ext: 2314
>>>>> Telephone: 01223 492314
>>>>>
>>>>> --
>>>>> The Wellcome Trust Sanger Institute is operated by Genome
>>>>> Research Limited, a charity registered in England with number
>>>>> 1021457 and a company registered in England with number 2742969,
>>>>> whose registered office is 215 Euston Road, London, NW1 2BE.
>>>>
>>>
>>
>