[Open-bio-l] LSIDs

Fri Mar 28 15:55:14 EST 2003

Thanks James.

After re-reading the lsid spec, I realised that I
hadn't even written down legal LSIDs! Duh. Replacing
the first '/' with ':' fixes things and gets us to IDs
like this:

URN:LSID:open-bio.org:format:enzyme

URN:LSID:open-bio.org:format:ligand/enzyme
URN:LSID:open-bio.org:format:ligand/compound
URN:LSID:open-bio.org:format:ligand/ligand

URN:LSID:open-bio.org:format:embl?alphabet=DNA
URN:LSID:open-bio.org:format:genbank
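
For anyone wiring these into biojava, each corrected ID splits into three parts: an authority, a namespace and an object. The compound format name and any query variables are carried in the object part. Below is a minimal Java sketch. It assumes the simple key=value&key=value query scheme proposed in the quoted message and does no percent-decoding. FormatLsid is a hypothetical name, not an existing biojava class:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a biojava class): splits a format LSID such as
// URN:LSID:open-bio.org:format:embl?alphabet=DNA into its parts.
// Assumes a plain key=value&key=value query with no percent-decoding.
final class FormatLsid {
    public final String authority;           // e.g. "open-bio.org"
    public final String namespace;           // e.g. "format"
    public final String[] nameParts;         // compound name split on '/'
    public final Map<String, String> params; // query variables, e.g. alphabet

    private FormatLsid(String authority, String namespace,
                       String[] nameParts, Map<String, String> params) {
        this.authority = authority;
        this.namespace = namespace;
        this.nameParts = nameParts;
        this.params = params;
    }

    public static FormatLsid parse(String lsid) {
        // Limit 5: any further colons (e.g. a revision) stay in the object part.
        String[] fields = lsid.split(":", 5);
        if (fields.length != 5
                || !fields[0].equalsIgnoreCase("URN")
                || !fields[1].equalsIgnoreCase("LSID")) {
            throw new IllegalArgumentException("not a legal LSID: " + lsid);
        }
        String object = fields[4];
        Map<String, String> params = new LinkedHashMap<>();
        int q = object.indexOf('?');
        if (q >= 0) {
            for (String pair : object.substring(q + 1).split("&")) {
                if (pair.isEmpty()) {
                    continue; // tolerate "a=1&&b=2"
                }
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    params.put(pair, ""); // bare flag with no value
                } else {
                    params.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
            object = object.substring(0, q);
        }
        if (fields[2].isEmpty() || fields[3].isEmpty() || object.isEmpty()) {
            throw new IllegalArgumentException("empty LSID field: " + lsid);
        }
        return new FormatLsid(fields[2], fields[3], object.split("/"), params);
    }

    public static void main(String[] args) {
        FormatLsid id = parse("URN:LSID:open-bio.org:format:embl?alphabet=DNA");
        // prints: open-bio.org format embl {alphabet=DNA}
        System.out.println(id.authority + " " + id.namespace + " "
                + String.join("/", id.nameParts) + " " + id.params);
    }
}
```

The original slash form, URN:LSID:open-bio.org:format/embl, has only four colon-separated fields, so this sketch rejects it with an IllegalArgumentException. That is the legality problem described above.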

With regard to the issue of using domain names as the
authority, I guess technically at some point in the
future the publisher of the LSID could lose control
of the domain. However, the same is true for
XML Schema URIs or Java package names, both of which
tend to have domain names encoded in them to provide
uniqueness. Anyway, probably nothing you haven't
already discussed to the point of boredom.

--- James Freeman <jmfreeman at attbi.com> wrote:
> Hi Matthew,
> On Friday, March 28, 2003, at 05:47 AM, Matthew Pocock wrote:
> 
> > Hi,
> >
> > I've got mailing list fatigue. Which one should I
> > be posting to about LSIDs for file formats?
> >
> 
> The list with the most traffic about LSIDs at the
> moment is the i3c-techarch committee mailing list.
> I am forwarding your mail to them in my response
> to you.
> 
> An excellent commentary on current thinking about
> implementing LSIDs can be found in a document by
> Joshua Phillips of the National Cancer Institute; see:
>
> ftp://ftp1.nci.nih.gov/pub/cacore/caBIO/lsid/lsid_memo.doc
> 
> There is currently an open issue about using a DNS
> name as part of the LSID, regarding the American
> HIPAA regulation on patient privacy. See:
>
> http://answers.hhs.gov/cgi-bin/hhs.cfg/php/enduser/std_alp.php
> 
> My understanding is that a medical record used in
> research must have its protected health information
> (PHI) removed, and part of that removal is any DNS
> entry or IP address. An LSID containing a DNS name
> or IP address could therefore be stripped from the
> record under this rule. Research done for eventual
> release to the FDA requires that the complete PHI be
> available to the FDA, and in that case the LSID would
> be kept. For research on American medical records
> that is not intended for the FDA, this means that
> either the LSID is removed as part of the standard
> removal of PHI, or a statistician must justify to
> each Independent Review Board (IRB) using
> LSID-indexed information as part of its medical
> records that the LSID could not be used to identify
> a patient. If the DNS entry is removed, LSID
> resolvers cannot use standard DNS directly, which
> complicates resolution and might require the current
> spec to be modified:
>
> http://www.i3c.org/workgroups/technical_architecture/resources/lsid/docs/LSIDSyntax9-20-02.htm
> 
> I am sending this along for further comment.
> 
> Warmest Regards,
> 
> Jim Freeman
> 
> > Anyway, I was about to add a whole load of common
> > formats to biojava and hit a snag. For your
> > convenience, I've pasted in part of the spec below.
> >
> > I know I'm risking being a radical pedant by even
> > bringing this up, but presumably these IDs are meant
> > to be used by more than one individual.
> >
> > The spec says that we should use things like:
> >
> > URN:LSID:open-bio.org:<format>/<alphabet>
> >
> > This is bad for several reasons. The first one is
> > that file formats and sequence databases can become
> > trivially confused. Does URN:LSID:open-bio.org:embl
> > refer to the embl database, or to the embl format
> > with the default alphabet?
> >
> > Secondly, what do we do with non-sequence formats?
> > For example, Unigene and Enzyme don't fit into this
> > world very well.
> >
> > Thirdly (and bless them for doing this), there are
> > some ambiguities about format names unless they are
> > scoped properly. An example is the Enzyme db's
> > enzyme.dat file, which is similar to embl in
> > structure, and the Ligand enzyme file, which is
> > shaped like GenBank. They both tell you things about
> > EC numbers, but are definitely not the same format.
> >
> > I propose that we carve up the fourth field more
> > sanely. Firstly, we can prefix the format name with
> > the constant string "format/", leaving room in the
> > future for namespaces like "database" or
> > "application". Secondly, the format name should
> > (optionally) be compound. Thirdly, variables (like
> > alphabet) should be encoded using an agreed-upon URL
> > query scheme.
> >
> > URN:LSID:open-bio.org:format/enzyme
> >
> > URN:LSID:open-bio.org:format/ligand/enzyme
> > URN:LSID:open-bio.org:format/ligand/compound
> > URN:LSID:open-bio.org:format/ligand/ligand
> >
> > URN:LSID:open-bio.org:format/embl?alphabet=DNA
> > URN:LSID:open-bio.org:format/genbank
> > URN:LSID:open-bio.org:format/genbank?alphabet=PROTEIN
> >
> > This leaves us room for URNs for other things that
> > perhaps people haven't named yet:
> >
> > URN:LSID:open-bio.org:database/embl
> > URN:LSID:open-bio.org:database/swissprot
> > URN:LSID:open-bio.org:database/ligand
> >
> > URN:LSID:open-bio.org:application/blast/n:2.2.5
> >
> > Can I amend the OBDA documentation to match this, or
> > is this the wrong way to go?
> >
> > Matthew
> >
> > (from
> > http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/obda-specs/registry/lsid_for_dbformats.txt?rev=1.1&cvsroot=obf-common&content-type=text/vnd.viewcvs-markup)
> >
> > All flat file formats are identified using this format:
> >
> > URN:LSID:open-bio.org:<format>/<alphabet>
> >
> > where <format> is one of:
> >       embl
> >       genbank
> >       fasta
> >       swiss
> >       pdb
> >
> > and <alphabet> is one of:
> >     dna
> >     rna
> >     protein
> >
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Everything you'll ever need on one web page
> > from News and Sport to Email and Music Charts
> > http://uk.my.yahoo.com
> > _______________________________________________
> > Open-Bio-l mailing list
> > Open-Bio-l at open-bio.org
> > http://open-bio.org/mailman/listinfo/open-bio-l
> >