[BioSQL-l] Bug in loading duplicate but non-identical swissprot references

Elia Stupka elia at tll.org.sg
Thu Apr 17 16:00:29 EDT 2003


Hello there,

we are finally finding the time to tackle BioSQL properly, and we are 
loading in entire databases, hoping to help fix issues on the BioSQL 
front and/or the SeqIO front. As always, these are most likely to arise 
not from bugs in the code but from dirt in the databases...

The first problem we have identified is that the same reference is 
sometimes cited without its MEDLINE identifier and other times with 
it. If the version without the MEDLINE id is encountered first, it is 
given a CRC-64 value and a NULL dbxref foreign key, and thus the UK 
check is done on the CRC. The next time, it has a MEDLINE id, so the UK 
check is done on the dbxref_id, no match is found, and it gets stored 
as if it were a new record... at which point the insert fails because 
the CRC is duplicated.
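
To make the failure concrete, here is a minimal sketch in Python with 
sqlite3. It is not the bioperl-db code: the table is a simplified 
stand-in for the BioSQL reference table (assumed to have one unique key 
on dbxref_id and one on crc), and naive_store is a hypothetical helper 
that mimics the lookup order described above.

```python
import sqlite3

def make_db():
    """Create an in-memory table with two unique keys (simplified, assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE reference (
            reference_id INTEGER PRIMARY KEY,
            dbxref_id    INTEGER UNIQUE,       -- NULL when no MEDLINE id was cited
            title        TEXT,
            crc          TEXT NOT NULL UNIQUE  -- checksum of the citation
        )""")
    return conn

def naive_store(conn, title, crc, dbxref_id=None):
    """Look up by dbxref_id when one is given, otherwise by crc; insert if not found."""
    if dbxref_id is not None:
        row = conn.execute(
            "SELECT reference_id FROM reference WHERE dbxref_id = ?",
            (dbxref_id,)).fetchone()
    else:
        row = conn.execute(
            "SELECT reference_id FROM reference WHERE crc = ?",
            (crc,)).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO reference (dbxref_id, title, crc) VALUES (?, ?, ?)",
        (dbxref_id, title, crc))
    return cur.lastrowid

def demo_failure():
    """Store the same citation twice: first without, then with a MEDLINE dbxref."""
    conn = make_db()
    naive_store(conn, "Some paper", "CRC-A")            # stored with NULL dbxref_id
    try:
        naive_store(conn, "Some paper", "CRC-A", dbxref_id=42)
    except sqlite3.IntegrityError as exc:
        return str(exc)                                  # unique key on crc is violated
    return None

error_message = demo_failure()
```

The second call finds nothing under dbxref_id 42, attempts an insert, 
and trips the unique key on crc.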

Checking only the CRC in the get_unique_key_query method of 
ReferenceAdaptor solves the duplication problem and lets the MEDLINE 
dbxref be stored when it is encountered; however, it does not trigger 
the update of the dbxref column in the reference table...

...I am still venturing into this wonderful world of UKs and FKs and 
persistence, so I got stuck at this point. Any suggestions? The main 
problem seems to be that we want to convert an orphan (with no FK) to a 
child...
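
One possible shape for the orphan-to-child step, sketched at the plain 
SQL level in Python with sqlite3 rather than through the persistence 
layer: always look up by CRC, and if the stored row has a NULL 
dbxref_id while the incoming reference carries one, update the row in 
place. The schema is a simplified assumption and store_reference is a 
hypothetical helper, not a method of ReferenceAdaptor.

```python
import sqlite3

def store_reference(conn, title, crc, dbxref_id=None):
    """Find a reference by crc; insert it if missing, or adopt a newly seen dbxref_id."""
    row = conn.execute(
        "SELECT reference_id, dbxref_id FROM reference WHERE crc = ?",
        (crc,)).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO reference (dbxref_id, title, crc) VALUES (?, ?, ?)",
            (dbxref_id, title, crc))
        return cur.lastrowid
    reference_id, existing_dbxref = row
    if existing_dbxref is None and dbxref_id is not None:
        # The orphan gains its foreign key: same row, dbxref_id filled in.
        conn.execute(
            "UPDATE reference SET dbxref_id = ? WHERE reference_id = ?",
            (dbxref_id, reference_id))
    return reference_id

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reference (
        reference_id INTEGER PRIMARY KEY,
        dbxref_id    INTEGER UNIQUE,       -- NULL when no MEDLINE id was cited
        title        TEXT,
        crc          TEXT NOT NULL UNIQUE  -- checksum of the citation
    )""")

first = store_reference(conn, "Some paper", "CRC-A")                # no MEDLINE id yet
second = store_reference(conn, "Some paper", "CRC-A", dbxref_id=42)  # MEDLINE id appears
rows = conn.execute("SELECT reference_id, dbxref_id FROM reference").fetchall()
```

Both calls return the same reference_id, and the table ends up with a 
single row whose dbxref_id is set. This sketch does not handle the case 
where the stored row already has a different dbxref_id.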

Elia

---
Bioinformatics Program Manager
Temasek Life Sciences Laboratory
1, Research Link
Singapore 117604
Tel. +65 6874 4945
Fax. +65 6872 7007


