[BioSQL-l] reference.reference_medline datatype (and name)
Hilmar Lapp
hlapp@gnf.org
Tue, 19 Nov 2002 14:44:59 -0800
Reference.reference_medline used to be a number. I've some comments
and complaints about this attribute.
1) I don't see why this should be restricted to pointing at Medline.
Hence, in our Oracle version of the schema I in fact named this
Document_ID from the beginning. Would anyone have an issue with me
changing the name to Document_ID (or an even better one) in the
MySQL (and Pg) versions?
2) Conceiving it as a Document_ID, the type should be a VARCHAR
because not everyone may use an integer number as ID.
3) In order to support updates without multiplying identical
references, I have to be able to identify references by a unique
key (UK). For
those that have a Medline ID this is their UK. However, in the
absence of a Medline ID there is no UK that can be easily enforced
by the database. In essence, the only natural UK is the combination
of all other attributes, i.e., authors, title, location. Since their
values are far too large for an efficient RDBMS-enforced UK, I
decided to encode those into a CRC. It turns out this seems to work
fairly well. Right now I'm using the Medline/Document_ID column to
store the CRC, which is a kludge if you understand Document_ID as a
value that means something outside of biosql (the CRC obviously
doesn't). I could introduce another column CRC with a UK on it. What
are people's thoughts on this?
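
To make the option in 3) concrete, here is a minimal sketch in Python of a
separate CRC column with a UK on it. It is an illustration, not the actual
BioSQL schema or loader code: the table layout, the column name crc, the
helper names reference_crc and store_reference, the use of CRC-32 from zlib,
and the in-memory SQLite database are all assumptions made for the example.

```python
import sqlite3
import zlib


def reference_crc(authors, title, location):
    """Return a CRC-32 (as a decimal string) over the natural-key fields."""
    # Join with a separator that should not occur in the data, so that
    # ("ab", "c") and ("a", "bc") do not produce the same input string.
    payload = "\x1f".join((authors or "", title or "", location or ""))
    return str(zlib.crc32(payload.encode("utf-8")))


# Hypothetical, cut-down reference table: document_id holds an external
# identifier (e.g. a Medline ID) and may be NULL; crc carries the UK.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE reference (
        reference_id INTEGER PRIMARY KEY,
        document_id  VARCHAR(32) UNIQUE,
        authors      TEXT,
        title        TEXT,
        location     TEXT,
        crc          VARCHAR(32) NOT NULL UNIQUE
    )
    """
)


def store_reference(conn, authors, title, location, document_id=None):
    """Insert the reference unless an identical one exists; return its id."""
    crc = reference_crc(authors, title, location)
    row = conn.execute(
        "SELECT reference_id FROM reference WHERE crc = ?", (crc,)
    ).fetchone()
    if row is not None:
        return row[0]  # already loaded, do not multiply it
    cur = conn.execute(
        "INSERT INTO reference (document_id, authors, title, location, crc) "
        "VALUES (?, ?, ?, ?, ?)",
        (document_id, authors, title, location, crc),
    )
    return cur.lastrowid


# Made-up example data: loading the same reference twice yields one row.
first = store_reference(conn, "Smith J, Doe A", "An example title", "J Example 1:1-10")
again = store_reference(conn, "Smith J, Doe A", "An example title", "J Example 1:1-10")
```

With this layout Document_ID stays free for values that mean something
outside of biosql, and the CRC only serves lookups on update. A 32-bit CRC
can collide for distinct references, so the sketch trades a small risk of
false matches for a compact key.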
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------