[Open-bio-l] LSIDs
James Freeman
jmfreeman at attbi.com
Fri Mar 28 09:43:03 EST 2003
Hi Matthew,
On Friday, March 28, 2003, at 05:47 AM, Matthew Pocock wrote:
> Hi,
>
> I've got mailing list fatigue. Which one should I be
> posting to about LSIDs for file formats?
>
The list with the most traffic about LSIDs at the moment is the
i3c-techarch committee mailing list. I am forwarding your mail to
them along with this reply.
An excellent commentary on the current thinking about implementing
LSIDs can be found in a document by Joshua Phillips of the
National Cancer Institute; see:
ftp://ftp1.nci.nih.gov/pub/cacore/caBIO/lsid/lsid_memo.doc
There is currently an open issue about using a DNS name as part of the
LSID, because of the American HIPAA regulation on patient privacy.
See:
http://answers.hhs.gov/cgi-bin/hhs.cfg/php/enduser/std_alp.php
My understanding is that a medical record used in research must have
its protected health information (PHI) removed, and that this removal
includes any DNS name or IP address. An LSID containing a DNS name or
IP address could therefore be stripped from the record. Research done
for eventual release to the FDA requires that the complete PHI be
available to the FDA, so in that case the LSID would be kept. For
research using American medical records that is not intended for the
FDA, either the LSID is potentially removed as part of the standard
removal of PHI, or a statistician must justify, for each Institutional
Review Board (IRB) using the LSID-indexed information as part of its
medical records, that the LSID could not be used to identify a patient.
If the DNS name is removed, LSID resolvers cannot use standard DNS
directly, which complicates resolution and might require the current
spec to be modified:
http://www.i3c.org/workgroups/technical_architecture/resources/lsid/docs/LSIDSyntax9-20-02.htm
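To make the resolution concern concrete, here is a minimal Python
sketch. It is not taken from the LSID spec or from any resolver. It
assumes only that the third colon-separated field of an LSID is the
authority, written as a DNS name (as in URN:LSID:open-bio.org:...).
The helper names split_lsid and redact_authority, and the "REDACTED"
placeholder, are hypothetical.

```python
# Hypothetical helpers for illustration; not part of any LSID library.


def split_lsid(lsid):
    """Split an LSID into (authority, rest).

    Assumes the form URN:LSID:<authority>:<rest>, where <authority>
    is a DNS name such as open-bio.org.
    """
    parts = lsid.split(":", 3)
    if (len(parts) != 4
            or parts[0].upper() != "URN"
            or parts[1].upper() != "LSID"):
        raise ValueError("not an LSID: %r" % lsid)
    return parts[2], parts[3]


def redact_authority(lsid, placeholder="REDACTED"):
    """Blank out the DNS authority, as a PHI scrubber might do."""
    _authority, rest = split_lsid(lsid)
    return "URN:LSID:%s:%s" % (placeholder, rest)


original = "URN:LSID:open-bio.org:format/embl"
scrubbed = redact_authority(original)
```

Here scrubbed is "URN:LSID:REDACTED:format/embl". The identifier is
still unique as a string, but a resolver that starts from a DNS lookup
on the authority field has nothing left to look up.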
I am sending this along for further comment.
Warmest Regards,
Jim Freeman
> Anyway, I was about to add a whole load of common
> formats to biojava and hit a snag. For your
> convenience, I've pasted in part of the spec below.
>
> I know I'm risking being a radical pedant by even
> bringing this up, but presumably these ids are meant
> to be used by more than one individual.
>
> The spec says that we should use things like:
>
> URN:LSID:open-bio.org:<format>/<alphabet>
>
> This is bad for several reasons. The first one is that
> file formats and sequence databases can become
> trivially confused. Does URN:LSID:open-bio.org:embl
> refer to the embl database, or to the embl format with
> default alphabet?
>
> Secondly, what do we do with non-sequence formats?
> For example, Unigene and Enzyme don't fit into this
> world very well.
>
> Thirdly (and bless them for doing this), there are
> some ambiguities about format names unless scoped
> properly. An example is the Enzyme db's enzyme.dat
> file, which is similar to embl in structure, and the
> ligand enzyme file, which is shaped like genbank. They
> both tell you things about EC numbers, but are
> definitely not the same format.
>
> I propose that we carve up the fourth field more
> sanely. We can firstly prefix the format name with the
> constant string "format/", leaving room in the future
> for namespaces like "database" or "application".
> Secondly, the format name should (optionally) be
> compound. Thirdly, variables (like alphabet) should be
> encoded using an agreed-upon URL query scheme.
>
> URN:LSID:open-bio.org:format/enzyme
>
> URN:LSID:open-bio.org:format/ligand/enzyme
> URN:LSID:open-bio.org:format/ligand/compound
> URN:LSID:open-bio.org:format/ligand/ligand
>
> URN:LSID:open-bio.org:format/embl?alphabet=DNA
> URN:LSID:open-bio.org:format/genbank
> URN:LSID:open-bio.org:format/genbank?alphabet=PROTEIN
>
> This leaves us room for URNs for other things that
> perhaps people haven't named yet:
>
> URN:LSID:open-bio.org:database/embl
> URN:LSID:open-bio.org:database/swissprot
> URN:LSID:open-bio.org:database/ligand
>
> URN:LSID:open-bio.org:application/blast/n:2.2.5
>
> Can I amend the OBDA documentation to match this, or
> is this the wrong way to go?
>
> Matthew
>
> (from
> http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/obda-specs/registry/lsid_for_dbformats.txt?rev=1.1&cvsroot=obf-common&content-type=text/vnd.viewcvs-markup)
>
> All flat file formats are identified using this
> format:
>
> URN:LSID:open-bio.org:<format>/<alphabet>
>
> where <format> is one of:
> embl
> genbank
> fasta
> swiss
> pdb
>
> and <alphabet> is one of:
> dna
> rna
> protein
>
>
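The proposal quoted above (a namespace prefix, an optionally compound
name, and variables in a URL-style query) can be sketched as a small
Python parser. This is one possible reading, not an agreed scheme: the
function name parse_proposed_lsid and the returned keys are
hypothetical, and treating a trailing colon-separated field as a
version is only my interpretation of the blast example.

```python
from urllib.parse import parse_qsl


def parse_proposed_lsid(lsid):
    """Parse URN:LSID:<authority>:<namespace>/<name>[?<query>][:<version>].

    Hypothetical parser for the scheme proposed in the quoted mail.
    """
    parts = lsid.split(":")
    if (len(parts) < 4
            or parts[0].upper() != "URN"
            or parts[1].upper() != "LSID"):
        raise ValueError("not an LSID: %r" % lsid)
    authority, fourth = parts[2], parts[3]
    # Assumption: a fifth field, if present, is a version string.
    version = parts[4] if len(parts) > 4 else None
    path, _, query = fourth.partition("?")
    namespace, _, name = path.partition("/")
    if not name:
        # e.g. URN:LSID:open-bio.org:embl: database or format?
        raise ValueError("no namespace prefix in %r" % fourth)
    return {
        "authority": authority,
        "namespace": namespace,        # "format", "database", ...
        "name": name.split("/"),       # compound names become a list
        "params": dict(parse_qsl(query)),
        "version": version,
    }


genbank = parse_proposed_lsid(
    "URN:LSID:open-bio.org:format/genbank?alphabet=PROTEIN")
ligand = parse_proposed_lsid("URN:LSID:open-bio.org:format/ligand/enzyme")
```

With this reading, genbank["params"] is {"alphabet": "PROTEIN"} and
ligand["name"] is ["ligand", "enzyme"], while the ambiguous
URN:LSID:open-bio.org:embl is rejected because it carries no namespace
prefix.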