[DAS2] Re: what info is needed for DAS/2 registration?

Andreas Prlic ap3 at sanger.ac.uk
Mon Nov 14 13:29:26 UTC 2005


Hi Andrew!

>   Looks like I will be more involved with the DAS/2 spec development,
> and I'll be visiting the UK more often.

good!

>   I want to make sure that the spec includes more of what's
> needed for registration.


o.k. very good, let's go through your mail:

>  My thought is to let the registration
> system be able to query the DAS/2 server to get most of the fields
> it needs, if not all.

o.k.

>   There may still be some need to override the
> definitions,

Experience with the DAS/1 registry shows that some corrections are
needed every now and then. It seems inevitable that users sometimes
make mistakes or enter inaccurate information.

> so at the manual registration level this will be used
> more to pre-populate an entry with a default.

Sounds good. So this means the configuration for setting up a DAS
source will get a little bigger.
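
A minimal sketch of the pre-populate-then-override idea, in Python. The
field names and the function are hypothetical and not part of the DAS/2
spec or of the registry code: values reported by the server act as
defaults, and anything entered by hand on the registration page wins.

```python
def build_registry_entry(server_fields, manual_overrides=None):
    """Merge server-reported defaults with manual corrections.

    Both arguments are plain dicts (hypothetical field names). A manual
    value of None means "no correction" and leaves the default in place.
    """
    entry = dict(server_fields)
    for key, value in (manual_overrides or {}).items():
        if value is not None:
            entry[key] = value
    return entry


# Defaults as they might be read from the server (illustrative values).
server_fields = {
    "title": "volvox",
    "description": "Volvox Example Database",
    "admin_email": None,
}

# A correction made by hand on the registration page.
entry = build_registry_entry(server_fields, {"admin_email": "admin@example.org"})
```

Keeping the server-reported values and the manual corrections separate
would also let the registry re-query the server later without losing
the corrections.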

> In looking at the manual registration page I see the following,
> along with comparisons to the existing DAS/2 spec
>
>  ** Title/Nickname

used by DAS clients for the display of the das tracks

>  ** Description

This lets the user get a quick grasp of what the data is about. We have
60 sources in the registry by now and expect to reach around 100 soon,
so one needs a way to learn which sources serve the data of particular
interest.

>  ** URL for more detailed description

a link back to the homepage  of the project that provides the data

>
> DAS/2 does not have this information for the service as a whole.
> It does have it for each of the databases, somewhat.  Here is
> an example from the spec.
>
>   <SOURCE id="volvox" description="Volvox Example Database"
>           taxon="http://www.ncbi.nlm.nih.gov/taxon-browser?id=29118"
>           doc_href="http://www.wormbase.org/documentation/users_guide/volvox.html" >
>
>
> Should we add a "title" field to each data source?

yes that would be good

> Should we
> add title/description/url fields to the DAS/2 service as a whole?

not sure what you mean by that

>   ** coordinate system
>
> Each data source may have 1 or more versions.  The version information
> looks like
>      <VERSION id="volvox/1" description="Build 1, October 2002">
>        <ASSEMBLY id="http://www.ensembl.org/das/genome/vv116" />
>      </VERSION>
>
> In theory that assembly id could be a URL with more detailed
> information about the assembly.  Right now it's used as a unique
> identifier.  There is nothing there to convert these URLs into
> something human-readable.

Hm. I am not completely convinced about representing a coordinate
system as a URL. What if two reference servers provide the same
assembly or are mirrors of each other?

I would see it this way: a DAS client asks the registry "where are all
the reference servers for NCBI 35 - Homo sapiens?" and gets back a list
providing, e.g., an American and a European mirror server. The client
could then choose the one that is geographically closer.
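
A rough sketch of that lookup, in Python. Everything here is an
assumption made for illustration: the class names, the in-memory
registry table, the example.org URLs and the region labels. The
coordinate system is keyed on the authority/version/type/organism
fields rather than on a single server URL, so several mirrors can
share one coordinate system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoordinateSystem:
    authority: str
    version: str
    coord_type: str
    organism: str


@dataclass(frozen=True)
class ReferenceServer:
    url: str
    region: str


NCBI35_HUMAN = CoordinateSystem("NCBI", "35", "Chromosome", "Homo sapiens")

# Hypothetical registry content: one coordinate system, two mirrors.
REGISTRY = {
    NCBI35_HUMAN: [
        ReferenceServer("http://us.example.org/das/ncbi35", "america"),
        ReferenceServer("http://eu.example.org/das/ncbi35", "europe"),
    ],
}


def reference_servers(coords):
    """Return all reference servers registered for a coordinate system."""
    return list(REGISTRY.get(coords, []))


def pick_server(coords, client_region):
    """Prefer a mirror in the client's region, else the first one listed."""
    servers = reference_servers(coords)
    if not servers:
        return None
    for server in servers:
        if server.region == client_region:
            return server
    return servers[0]
```

A client in Europe would call pick_server(NCBI35_HUMAN, "europe") and
get the European mirror. A real client would need a better notion of
"closer" than a region label.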


>
> Possible solutions for this are:
>   - define an "assembly" document, to be put at that URL and
>      include the authority/version/type/organism data mentioned at
>      http://das.sanger.ac.uk/registry/help_coordsys.jsp

something like that.


>  ** DAS url
>
> Yep, DAS/2 has that one.  :)

:-)

>
>   ** Admin email
>
> Hmm.  Yeah, there should be more information about the service as
> a whole.  Admin email and perhaps a documentation href, eg, with
> information about planned downtime.

would be good.

>
>   ** DAS capabilities
>
> That's handled differently in DAS/2.  Did people really use this
> information?

Actually this information is important (for DAS/1). It is used to
distinguish reference servers from annotation servers (on the client
side) and is needed for validation (on the registry side).
"Capabilities" are also related to data types. E.g. a genome DAS client
does not need to query a protein structure source, because it cannot
do 3D.
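
To illustrate the client-side use, a small Python sketch. The source
records, the capability names and the rule "a source offering
'sequence' is a reference server" are assumptions for the example, not
the registry's actual data model.

```python
# Hypothetical registry records; capability names are illustrative.
SOURCES = [
    {"name": "genome_ref", "capabilities": {"sequence", "features"}},
    {"name": "gene_annot", "capabilities": {"features"}},
    {"name": "structure_src", "capabilities": {"structure"}},
]


def is_reference_server(source):
    """Assumed rule: a source that serves sequence is a reference server."""
    return "sequence" in source["capabilities"]


def usable_sources(sources, supported):
    """Keep only sources sharing at least one capability with the client."""
    return [s for s in sources if s["capabilities"] & supported]


# A genome client that cannot display 3D structures.
genome_client = {"sequence", "features"}
visible = [s["name"] for s in usable_sources(SOURCES, genome_client)]
```

With these records the genome client never sees structure_src, so it
does not waste a request on data it cannot display.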

>   ** Test access/ segment code labels

I think there is a misunderstanding here: the test code is not a
"label". The test code is, e.g., a chromosomal segment or an accession
code for a protein database for which annotations are returned when a
feature request is made.

The "label" is used mainly to describe which project funds a source.

>> We are currently discussing if the labels should be used to describe
>> a DAS source in more detail. e.g. "experimentally verified",
>> "computational prediction", etc.
>
> These are two different things in one field.

Yes, you are quite right. Together with the BioSapiens DAS people we
recently decided that it should be possible to assign Gene Ontology
evidence codes to each DAS source, so this will be changed in the next
update of the registry.


>
> What I'm going to propose is a generic key/value data structure
> for just about all records.  Some of the key names will be well
> defined.  Others can add new fields to experiment with / extend
> the spec in a semi-constrained fashion.  This would let people
> try out a new property easily.

sounds good.
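
One way such a key/value structure could look, sketched in Python. The
set of well-defined keys is a guess based on the fields discussed in
this thread, and the "x-evidence" key is a made-up example of an
experimental property.

```python
# Assumed set of well-defined keys, taken from the fields in this thread.
WELL_DEFINED_KEYS = {"title", "description", "doc_href", "admin_email"}


def split_properties(props):
    """Separate well-defined keys from experimental extension keys."""
    known = {k: v for k, v in props.items() if k in WELL_DEFINED_KEYS}
    experimental = {k: v for k, v in props.items() if k not in WELL_DEFINED_KEYS}
    return known, experimental


record = {
    "title": "volvox",
    "description": "Volvox Example Database",
    "x-evidence": "computational prediction",  # hypothetical extension key
}
known, experimental = split_properties(record)
```

A registry could validate the well-defined part strictly and simply
store the rest, which keeps new properties cheap to try out.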

> In summary it sounds like DAS/2 needs:
>   - a few more pieces of meta data (eg, information about the
>       service as a whole)
>   - a bit better defined way to get information about the
>       reference assembly
>

I agree with both.

Greetings,
Andreas

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                   Hinxton, Cambridge CB10 1SA, UK
                   +44 (0) 1223 49 6891



