[BioSQL-l] Bug in loading duplicate but non-identical swissprot references
Hilmar Lapp
hlapp at gnf.org
Thu Apr 17 02:01:26 EDT 2003
This has been seen and debated before. See
http://open-bio.org/pipermail/biosql-l/2003-March/000277.html
for the start of the thread (it may be useful to read the whole
thread if you didn't follow it at the time). There are some
non-obvious implications.
I haven't gotten around to implementing the solution yet. It involves
special-case code, and I wanted to come up with an implementation that
limits the damage.
-hilmar
On Thursday, April 17, 2003, at 12:00 AM, Elia Stupka wrote:
> Hello there,
>
> we are finally finding the time to tackle BioSQL properly and we are
> loading in entire databases, hoping to help fix issues on the BioSQL
> front and/or on the SeqIO front. As always, these are most likely to
> arise not from bugs in the code but from dirt in the databases....
>
> The first problem we have identified is that the same reference is
> sometimes cited without its MEDLINE identifier and at other times
> with it. The first time such a reference is encountered, it is given
> a CRC-64 value and a NULL dbxref foreign key, so the UK check is done
> on the CRC. The next time, it has a MEDLINE id, so the UK check is
> done on the dbxref_id, finds nothing, and the reference is stored as
> if it were a new record... at which point the insert fails because
> the CRC is duplicated.
>
> Checking only for the CRC in the get_unique_key_query method of
> ReferenceAdaptor solves the duplication problem and lets the MEDLINE
> dbxref be stored when it is encountered; however, it does not trigger
> the update of the dbxref column in the reference table...
>
> ...I am still venturing into this wonderful world of UKs and FKs and
> persistence, so I got stuck at this point. Any suggestions? The main
> problem seems to be that we want to convert an orphan (with no FK)
> into a child...
>
> Elia
>
> ---
> Bioinformatics Program Manager
> Temasek Life Sciences Laboratory
> 1, Research Link
> Singapore 117604
> Tel. +65 6874 4945
> Fax. +65 6872 7007
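
For illustration only, the failure described above and one possible way
around it can be sketched as follows. This is not the bioperl-db code
and not the solution discussed in the earlier thread. It is a minimal
Python/sqlite3 sketch that assumes a simplified reference table with
unique keys on crc and on dbxref_id. The table layout and the function
names (make_db, store_naive, store_adopting) are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a simplified, assumed reference table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE reference (
               reference_id INTEGER PRIMARY KEY,
               dbxref_id    INTEGER UNIQUE,       -- NULL if no MEDLINE id was cited
               crc          TEXT NOT NULL UNIQUE  -- checksum of the citation
           )"""
    )
    return conn


def store_naive(conn, crc, dbxref_id=None):
    """Look up by dbxref_id when one is given, otherwise by crc; insert on a miss."""
    if dbxref_id is not None:
        row = conn.execute(
            "SELECT reference_id FROM reference WHERE dbxref_id = ?",
            (dbxref_id,),
        ).fetchone()
    else:
        row = conn.execute(
            "SELECT reference_id FROM reference WHERE crc = ?", (crc,)
        ).fetchone()
    if row is not None:
        return row[0]
    # Raises sqlite3.IntegrityError if the crc is already stored.
    cur = conn.execute(
        "INSERT INTO reference (dbxref_id, crc) VALUES (?, ?)", (dbxref_id, crc)
    )
    return cur.lastrowid


def store_adopting(conn, crc, dbxref_id=None):
    """Look up by dbxref_id, fall back to crc, and fill in a missing dbxref_id."""
    row = None
    if dbxref_id is not None:
        row = conn.execute(
            "SELECT reference_id, dbxref_id FROM reference WHERE dbxref_id = ?",
            (dbxref_id,),
        ).fetchone()
    if row is None:
        row = conn.execute(
            "SELECT reference_id, dbxref_id FROM reference WHERE crc = ?", (crc,)
        ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO reference (dbxref_id, crc) VALUES (?, ?)",
            (dbxref_id, crc),
        )
        return cur.lastrowid
    reference_id, stored_dbxref_id = row
    if stored_dbxref_id is None and dbxref_id is not None:
        # The "orphan" row gains its foreign key instead of being duplicated.
        conn.execute(
            "UPDATE reference SET dbxref_id = ? WHERE reference_id = ?",
            (dbxref_id, reference_id),
        )
    return reference_id


if __name__ == "__main__":
    conn = make_db()
    store_naive(conn, "CRC-A")            # first citation, no MEDLINE id
    try:
        store_naive(conn, "CRC-A", 42)    # same citation, now with a MEDLINE dbxref
    except sqlite3.IntegrityError as err:
        print("naive load failed:", err)

    conn = make_db()
    first = store_adopting(conn, "CRC-A")
    second = store_adopting(conn, "CRC-A", 42)
    print("same row reused:", first == second)
```

The sketch deliberately ignores the harder case in which the dbxref_id
and the crc match two different existing rows, so it should be read as
a picture of the problem rather than as a complete fix.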
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------