[Open-bio-l] LSIDs

Thomas Down td2 at sanger.ac.uk
Fri Apr 4 09:40:20 EST 2003


Well, I definitely wanted something which specified format
and alphabet information.  I don't think I said very much
about the specifics -- Matthew's proposals look nicely general
to me.

     Thomas.

Examples he's shown me:

   urn:lsid:open-bio.org:format:swissprot
   urn:lsid:open-bio.org:format:ligand/enzyme
   urn:lsid:open-bio.org:format:enzyme
   urn:lsid:open-bio.org:format:fasta?alphabet=DNA
   urn:lsid:open-bio.org:format:fasta?alphabet=PEPTIDE

The last two (FASTA) cases are the ones where the alphabet
is genuinely ambiguous, and therefore needs to be specified
explicitly.

For cases like Swissprot, if you know how to parse it you
should already know the alphabet...
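As a rough illustration of the distinction drawn here, the following is a minimal Python sketch. It assumes the colon-separated form used in the examples above (urn:lsid:<authority>:<namespace>:<name>, with an optional URL-style query) and does not handle LSID revision fields. The set `AMBIGUOUS_ALPHABET` and the functions `parse_format_lsid` and `alphabet_for` are hypothetical names for illustration. They are not BioJava or OBDA API.

```python
from urllib.parse import parse_qs

# Hypothetical: formats whose alphabet cannot be inferred from the format name.
AMBIGUOUS_ALPHABET = {"fasta"}


def parse_format_lsid(lsid):
    """Split e.g. 'urn:lsid:open-bio.org:format:fasta?alphabet=DNA'.

    Returns (authority, namespace, name, params); params keeps the first
    value of each query variable. Revision fields are not handled.
    """
    urn, _, query = lsid.partition("?")
    parts = urn.split(":")
    if len(parts) != 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"expected urn:lsid:<authority>:<namespace>:<name>, got {lsid!r}")
    _, _, authority, namespace, name = parts
    params = {key: values[0] for key, values in parse_qs(query).items()}
    return authority, namespace, name, params


def alphabet_for(lsid):
    """Return the explicit alphabet, or None if the format implies it."""
    _, namespace, name, params = parse_format_lsid(lsid)
    if namespace != "format":
        raise ValueError(f"not a format LSID: {lsid!r}")
    alphabet = params.get("alphabet")
    if alphabet is None and name in AMBIGUOUS_ALPHABET:
        # e.g. plain fasta: a parser cannot know whether it holds DNA or protein
        raise ValueError(f"format {name!r} needs an explicit alphabet")
    return alphabet


for example in (
    "urn:lsid:open-bio.org:format:swissprot",
    "urn:lsid:open-bio.org:format:ligand/enzyme",
    "urn:lsid:open-bio.org:format:fasta?alphabet=DNA",
):
    print(example, "->", alphabet_for(example))
```

Under this rule, swissprot and ligand/enzyme return None (the alphabet is implied by the format), while a bare urn:lsid:open-bio.org:format:fasta raises an error because the alphabet must be given.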

     Thomas.


On Thu, Apr 03, 2003 at 01:10:34PM -0500, Lincoln Stein wrote:
> The format/alphabet nomenclature was strongly pressed by Thomas!
> 
> Lincoln
> 
> On Friday 28 March 2003 05:47 am, Matthew Pocock wrote:
> > Hi,
> >
> > I've got mailing list fatigue. Which one should I be
> > posting to about LSIDs for file formats?
> >
> > Anyway, I was about to add a whole load of common
> > formats to biojava and hit a snag. For your
> > convenience, I've pasted in part of the spec below.
> >
> > I know I'm risking being a radical pedant by even
> > bringing this up, but presumably these ids are meant
> > to be used by more than one individual.
> >
> > The spec says that we should use things like:
> >
> > URN:LSID:open-bio.org:<format>/<alphabet>
> >
> > This is bad for several reasons. The first one is that
> > file formats and sequence databases can become
> > trivially confused. Does URN:LSID:open-bio.org:embl
> > refer to the embl database, or to the embl format with
> > default alphabet?
> >
> > Secondly, what do we do with non-sequence formats?
> > For example, Unigene and Enzyme don't fit into this
> > world very well.
> >
> > Thirdly (and bless them for doing this), there are
> > some ambiguities about format names unless they are
> > scoped properly. An example is the Enzyme db's
> > enzyme.dat file, which is similar to embl in structure,
> > and the ligand enzyme file, which is shaped like
> > genbank. They both tell you things about EC numbers,
> > but are definitely not the same format.
> >
> > I propose that we carve up the fourth field more
> > sanely. We can firstly prefix the format name with the
> > constant string "format/", leaving room in the future
> > for namespaces like "database" or "application".
> > Secondly, the format name should (optionally) be
> > compound. Thirdly, variables (like alphabet) should be
> > encoded using an agreed upon URL query scheme.
> >
> > URN:LSID:open-bio.org:format/enzyme
> >
> > URN:LSID:open-bio.org:format/ligand/enzyme
> > URN:LSID:open-bio.org:format/ligand/compound
> > URN:LSID:open-bio.org:format/ligand/ligand
> >
> > URN:LSID:open-bio.org:format/embl?alphabet=DNA
> > URN:LSID:open-bio.org:format/genbank
> > URN:LSID:open-bio.org:format/genbank?alphabet=PROTEIN
> >
> > This leaves us room for URNs for other things that
> > perhaps people haven't named yet:
> >
> > URN:LSID:open-bio.org:database/embl
> > URN:LSID:open-bio.org:database/swissprot
> > URN:LSID:open-bio.org:database/ligand
> >
> > URN:LSID:open-bio.org:application/blast/n:2.2.5
> >
> > Can I amend the OBDA documentation to match this, or
> > is this the wrong way to go?
> >
> > Matthew
> >
> > (from
> > http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/obda-specs/registry/lsid_for_dbformats.txt?rev=1.1&cvsroot=obf-common&content-type=text/vnd.viewcvs-markup)
> >
> > All flat file formats are identified using this
> > format:
> >
> > URN:LSID:open-bio.org:<format>/<alphabet>
> >
> > where <format> is one of:
> >       embl
> >       genbank
> >       fasta
> >       swiss
> >       pdb
> >
> > and <alphabet> is one of:
> >     dna
> >     rna
> >     protein
> >
> >
> > _______________________________________________
> > Open-Bio-l mailing list
> > Open-Bio-l at open-bio.org
> > http://open-bio.org/mailman/listinfo/open-bio-l
> 
> -- 
> ========================================================================
> Lincoln D. Stein                           Cold Spring Harbor Laboratory
> lstein at cshl.org			                  Cold Spring Harbor, NY
> ========================================================================
> 
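Matthew's proposed layout in the quoted message above (a constant namespace prefix such as "format/", an optionally compound name, and variables like alphabet in a URL-style query string) can be sketched as a small Python parser. This is a minimal sketch under stated assumptions: it treats everything after the first "/" in the fourth field as the compound name, and reads a trailing colon field (as in application/blast/n:2.2.5) as a revision. The names `ScopedLsid` and `parse_scoped_lsid` are hypothetical. They are not BioJava or OBDA API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs


@dataclass
class ScopedLsid:
    authority: str
    kind: str                  # first path segment, e.g. "format", "database"
    name: Tuple[str, ...]      # remaining segments, e.g. ("ligand", "enzyme")
    revision: Optional[str] = None
    params: Dict[str, str] = field(default_factory=dict)


def parse_scoped_lsid(lsid: str) -> ScopedLsid:
    """Parse URN:LSID:<authority>:<kind>/<name>[/<name>...][:<revision>][?k=v&...].

    Hypothetical layout based on the proposal; not a published spec.
    """
    urn, _, query = lsid.partition("?")
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    authority, path = parts[2], parts[3]
    revision = parts[4] if len(parts) == 5 else None
    kind, *name = path.split("/")
    if not name:
        # A bare 'URN:LSID:open-bio.org:embl' does not say whether it is a
        # database or a format, so this sketch rejects it.
        raise ValueError(f"missing '<kind>/' prefix in {lsid!r}")
    # Keep the first value of each query variable (e.g. alphabet=DNA).
    params = {key: values[0] for key, values in parse_qs(query).items()}
    return ScopedLsid(authority, kind, tuple(name), revision, params)


for example in (
    "URN:LSID:open-bio.org:format/ligand/enzyme",
    "URN:LSID:open-bio.org:format/embl?alphabet=DNA",
    "URN:LSID:open-bio.org:database/swissprot",
    "URN:LSID:open-bio.org:application/blast/n:2.2.5",
):
    print(parse_scoped_lsid(example))
```

Under this sketch, format/ligand/enzyme and format/enzyme stay distinct, which is the scoping the proposal asks for. The bare URN:LSID:open-bio.org:embl raises ValueError instead of silently being read as either a database or a format.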

