[BioSQL-l] reference.reference_medline datatype (and name)

Hilmar Lapp hlapp@gnf.org
Tue, 19 Nov 2002 14:44:59 -0800


Reference.reference_medline used to be a number. I've some comments 
and complaints about this attribute.

1) I don't see why this should be restricted to pointing at Medline. 
Hence, in our Oracle version of the schema I in fact named this 
Document_ID from the beginning. Would anyone have an issue with me
changing the name to Document_ID (or an even better one) in the
MySQL (and Pg) version?

2) If it is conceived as a Document_ID, the type should be VARCHAR,
because not everyone uses an integer as an ID.

3) In order to support updates without multiplying identical
references, I have to be able to identify references by a unique key
(UK). For those that have a Medline ID, this is their UK. However, in the
absence of a Medline ID there is no UK that can be easily enforced 
by the database. In essence, the only natural UK is the combination 
of all other attributes, i.e., authors, title, location. Since their 
values are far too large for an efficient RDBMS-enforced UK, I 
decided to encode those into a CRC. This seems to work
fairly well. Right now I'm using the Medline/Document_ID column to
store the CRC, which is a kludge if you understand Document_ID as a 
value that means something outside of biosql (the CRC obviously 
doesn't). I could introduce another column CRC with a UK on it. What 
are people's thoughts on this?
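
To make the separate-CRC-column option in point 3 concrete, here is a minimal sketch, not the actual BioSQL schema or loader code. It assumes CRC-32 (via Python's zlib) as the checksum, an in-memory SQLite database standing in for MySQL/Pg/Oracle, and hypothetical names throughout: the crc column, the reference_crc and store_reference helpers, and the field separator are all illustrative.

```python
import sqlite3
import zlib


def reference_crc(authors, title, location):
    """Hypothetical helper: CRC-32 over authors, title and location."""
    # Join with a separator unlikely to occur in the data, so that
    # ("ab", "c") and ("a", "bc") do not produce the same input string.
    payload = "\x1f".join((authors or "", title or "", location or ""))
    return format(zlib.crc32(payload.encode("utf-8")) & 0xFFFFFFFF, "08X")


# Assumed, simplified table layout; column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE reference (
        reference_id INTEGER PRIMARY KEY,
        document_id  VARCHAR(32) UNIQUE,          -- external ID, may be NULL
        authors      TEXT,
        title        TEXT,
        location     TEXT,
        crc          VARCHAR(32) NOT NULL UNIQUE  -- database-enforced UK
    )
    """
)


def store_reference(authors, title, location, document_id=None):
    """Insert the reference unless its CRC already exists; return its id."""
    crc = reference_crc(authors, title, location)
    row = conn.execute(
        "SELECT reference_id FROM reference WHERE crc = ?", (crc,)
    ).fetchone()
    if row:
        return row[0]  # identical reference already stored, reuse it
    cur = conn.execute(
        "INSERT INTO reference (document_id, authors, title, location, crc) "
        "VALUES (?, ?, ?, ?, ?)",
        (document_id, authors, title, location, crc),
    )
    return cur.lastrowid


# Loading the same placeholder reference twice yields a single row.
first = store_reference("Author A", "Example title", "Example J. 1:1-2")
again = store_reference("Author A", "Example title", "Example J. 1:1-2")
```

With this layout Document_ID keeps its external meaning and may stay NULL, while the CRC column carries the uniqueness constraint. One caveat of the sketch: a 32-bit checksum can collide, in which case two distinct references would be treated as the same one.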

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------