[BioPerl] Re: [Bioperl-l] What's that number?

Fri Oct 24 10:26:26 EDT 2003

There are at least two "initiatives" looking at exactly this problem,
but they won't solve the precise problem that you raise (i.e. determining
the namespace of an in-hand ID number post facto):

1)  The LSID proposal from the I3C makes ID numbers "resolvable" so that
you can always determine the namespace of an ID by querying a resolver
service with your in-hand data.  In this system the ID number must take
the form of an LSID-type URI, and someone (the authority, or someone
else) must have set up a resolver service for URIs of that type.

2)  The BioMOBY project, in which ID numbers are always passed together
with a second identifier indicating the namespace under which that ID is
to be interpreted.  Although, unlike LSIDs, we in the BioMOBY project
use the existing ID numbers _verbatim_, this still won't help you in this
case since the person who gave you that ID is obligated by the BioMOBY
API to give you the namespace identifier at the same time.  We do have a
public registry of namespaces, however, largely based on the GO Xref
Abbreviations List.
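
To make the contrast between the two approaches concrete, here is a
minimal sketch in Python. It assumes the LSID URN layout
"urn:lsid:authority:namespace:object[:revision]"; the authority
"example.org", the namespace labels, and the names Lsid, parse_lsid and
NamespacedId are made up for illustration and are not part of either
project's API. No resolver is contacted; the point is only that in both
schemes the namespace travels with the ID.

```python
import re
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Parts of an LSID-style URI (hypothetical helper type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


# Assumed layout: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
_LSID_RE = re.compile(
    r"^urn:lsid:([^:]+):([^:]+):([^:]+)(?::([^:]+))?$", re.IGNORECASE
)


def parse_lsid(uri: str) -> Lsid:
    """Split an LSID-style URI into its parts; the namespace is explicit."""
    m = _LSID_RE.match(uri)
    if m is None:
        raise ValueError(f"not an LSID-style URI: {uri!r}")
    return Lsid(*m.groups())


class NamespacedId(NamedTuple):
    """BioMOBY-style pair: the ID is kept verbatim, next to its namespace."""
    namespace: str
    id: str


# Illustrative values only; "example.org" is not a real LSID authority.
lsid = parse_lsid("urn:lsid:example.org:embl:X93993")
pair = NamespacedId(namespace="EMBL", id="X93993")
print(lsid.namespace, pair.namespace)
```

A bare "X93993" fits neither shape, which is exactly the situation
described in the original question.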

In any case, once you get an ID number without namespace information you
are throwing yourself on the mercy of regular expression searches of
loosely defined conventions...  bad luck for you!
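
A rough sketch of what such a regular-expression guesser might look
like, in Python. The Swiss-Prot pattern is the one quoted in the
original question below; the EMBL pattern is an assumption inferred
only from the two examples given there (one letter plus five digits, or
two letters plus six digits). PATTERNS and guess_namespaces are
hypothetical names, and the table is nowhere near complete.

```python
import re

# Pattern table built only from the examples in this thread (assumption:
# real-world conventions are looser and there are many more databases).
PATTERNS = {
    "Swiss-Prot": re.compile(r"[OPQ]\d[A-Z\d][A-Z\d][A-Z\d]\d"),
    "EMBL": re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}"),
}


def guess_namespaces(identifier):
    """Return every namespace whose pattern matches the whole identifier."""
    candidate = identifier.strip().upper()
    return [name for name, pattern in PATTERNS.items()
            if pattern.fullmatch(candidate)]


for accession in ("X93993", "AJ010957", "P47202"):
    print(accession, guess_namespaces(accession))
```

With these two patterns, P47202 matches both, so the function can only
return a list of candidates rather than an answer. That ambiguity is
the reason the namespace has to be carried along with the ID.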

All the more reason we should start using these two technologies as
quickly as possible!! (though call me biased ;-) )

Mark

On Thu, 2003-10-23 at 19:41, Peter Wilkinson wrote:
> Well it would be nice if there existed some type of global namespace 
> management for that sort of thing, however there is none.
> 
> I am afraid that we will have to put up with the mountainous clutter of 
> sequences and annotations.
> 
> Now if someone wants to start some type of public registry for defining 
> these namespaces ... which would not be such a bad idea if one could 
> prove how it would help the community. GO is an attempt for function, and 
> HUGO for gene names ... no reason not to do it for namespaces 
> (databasenamespaces ;) ).
> 
> Peter
> 
> p.s. perhaps a big pair of billy boots is good enough to wade through it all.
> 
> 
> 
> 
> 
> At 09:01 PM 23/10/2003 +0100, michael wrote:
> 
> 
> >         Is there any code out there that can make an educated guess as to
> >the origin of any given (biological) ID/accession number?
> >
> >         X93993 AJ010957 are both valid EMBL accession numbers, P47202
> >looks similar but is from swissprot (a bit of digging reveals the
> >swissprot pattern as /[OPQ]\d[A-Z\d][A-Z\d][A-Z\d]\d/ ).  Failing code, is
> >there a biological namespace document out there?
> >
> >         I'm writing a public gene symbol submission form and want it to do
> >the right thing regardless of where the data is pasted in.
> >
> >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >Michael John Lush PhD                   Tel:44-20-7679-5027
> >Nomenclature Bioinformatics Support     Fax:44-20-7387-3496
> >HUGO Gene Nomenclature Committee Email:  nome at galton.ucl.ac.uk
> >The Galton Laboratory
> >University College London, UK
> >URL: http://www.gene.ucl.ac.uk/nomenclature/
> >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l at portal.open-bio.org
> >http://portal.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> -------------------------------------
> Peter Wilkinson
> Bioinformatics Consultant
> 
> -------------------------------------  
> 
> 
-- 
Mark Wilkinson <markw at illuminae.com>
Illuminae