[Open-bio-l] LSIDs

Lincoln Stein lstein at cshl.org
Thu Apr 3 13:10:34 EST 2003


The format/alphabet nomenclature was strongly pressed by Thomas!

Lincoln

On Friday 28 March 2003 05:47 am, Matthew Pocock wrote:
> Hi,
>
> I've got mailing list fatigue. Which one should I be
> posting to about LSIDs for file formats?
>
> Anyway, I was about to add a whole load of common
> formats to biojava and hit a snag. For your
> convenience, I've pasted in part of the spec below.
>
> I know I'm risking being a radical pedant by even
> bringing this up, but presumably these ids are meant
> to be used by more than one individual.
>
> The spec says that we should use things like:
>
> URN:LSID:open-bio.org:<format>/<alphabet>
>
> This is bad for several reasons. The first one is that
> file formats and sequence databases can become
> trivially confused. Does URN:LSID:open-bio.org:embl
> refer to the embl database, or to the embl format with
> default alphabet?
>
> Secondly, what do we do with non-sequence formats?
> For example, Unigene and Enzyme don't fit into this
> world very well.
>
> Thirdly, (and bless them for doing this) there are
> some ambiguities about format names unless scoped
> properly. An example is the Enzyme db's enzyme.dat
> file which is similar to embl in structure, and the
> ligand enzyme file which is shaped like genbank. They
> both tell you things about ec numbers, but are
> definitely not the same format.
>
> I propose that we carve up the fourth field more
> sanely. We can firstly prefix the format name with the
> constant string "format/", leaving room in the future
> for namespaces like "database" or "application".
> Secondly, the format name should (optionally) be
> compound. Thirdly, variables (like alphabet) should be
> encoded using an agreed-upon URL query scheme.
>
> URN:LSID:open-bio.org:format/enzyme
>
> URN:LSID:open-bio.org:format/ligand/enzyme
> URN:LSID:open-bio.org:format/ligand/compound
> URN:LSID:open-bio.org:format/ligand/ligand
>
> URN:LSID:open-bio.org:format/embl?alphabet=DNA
> URN:LSID:open-bio.org:format/genbank
> URN:LSID:open-bio.org:format/genbank?alphabet=PROTEIN
>
> This leaves us room for URNs for other things that
> perhaps people haven't named yet:
>
> URN:LSID:open-bio.org:database/embl
> URN:LSID:open-bio.org:database/swissprot
> URN:LSID:open-bio.org:database/ligand
>
> URN:LSID:open-bio.org:application/blast/n:2.2.5
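
The proposed layout above (a namespace such as "format/", an optionally compound name, and variables in a URL-style query) can be illustrated with a small parser. This is only a sketch under assumptions: the function name parse_proposed_lsid and the returned dictionary layout are hypothetical and not part of any OBDA spec or BioJava API, and a trailing version such as ":2.2.5" is simply kept as part of the last name component rather than interpreted.

```python
from urllib.parse import parse_qs

# Fixed prefix shared by every example in the proposal (compared case-insensitively).
PREFIX = "urn:lsid:open-bio.org:"


def parse_proposed_lsid(urn):
    """Split a URN in the proposed scheme into namespace, name parts and variables.

    Hypothetical helper for illustration only.
    """
    if not urn.lower().startswith(PREFIX):
        raise ValueError("not an open-bio.org LSID: %r" % urn)
    rest = urn[len(PREFIX):]

    # Variables such as alphabet=DNA follow a '?' in URL query style.
    path, _, query = rest.partition("?")

    # The first path segment is the namespace ("format", "database", ...).
    namespace, _, name = path.partition("/")
    if not namespace or not name:
        raise ValueError("expected <namespace>/<name>, got %r" % path)

    # Keep the first value of each query variable.
    variables = {key: values[0] for key, values in parse_qs(query).items()}
    return {
        "namespace": namespace,
        "name": name.split("/"),  # compound names become several parts
        "variables": variables,
    }


if __name__ == "__main__":
    for example in (
        "URN:LSID:open-bio.org:format/embl?alphabet=DNA",
        "URN:LSID:open-bio.org:format/ligand/enzyme",
        "URN:LSID:open-bio.org:database/embl",
    ):
        print(parse_proposed_lsid(example))
```

Under this sketch an unscoped identifier such as URN:LSID:open-bio.org:embl is rejected, because it carries no namespace to say whether the database or the format is meant.
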
>
> Can I amend the OBDA documentation to match this, or
> is this the wrong way to go?
>
> Matthew
>
> (from
> http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/obda-specs/registry/lsid_for_dbformats.txt?rev=1.1&cvsroot=obf-common&content-type=text/vnd.viewcvs-markup)
>
> All flat file formats are identified using this
> format:
>
> URN:LSID:open-bio.org:<format>/<alphabet>
>
> where <format> is one of:
>       embl
>       genbank
>       fasta
>       swiss
>       pdb
>
> and <alphabet> is one of:
>     dna
>     rna
>     protein
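
For comparison, the quoted spec defines a closed set of identifiers: every pairing of the listed formats and alphabets. A minimal sketch that enumerates them follows; the names FORMATS, ALPHABETS and current_spec_lsids are hypothetical and chosen only for illustration.

```python
from itertools import product

# Values copied from the quoted spec text.
FORMATS = ["embl", "genbank", "fasta", "swiss", "pdb"]
ALPHABETS = ["dna", "rna", "protein"]


def current_spec_lsids():
    """Return every URN of the form URN:LSID:open-bio.org:<format>/<alphabet>."""
    return [
        "URN:LSID:open-bio.org:%s/%s" % (fmt, alphabet)
        for fmt, alphabet in product(FORMATS, ALPHABETS)
    ]


if __name__ == "__main__":
    for urn in current_spec_lsids():
        print(urn)
```

Nothing in this set can name a non-sequence format such as the Enzyme or Unigene files, which is the gap the message above describes.
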
>
>

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein at cshl.org                                Cold Spring Harbor, NY
========================================================================



