[BioSQL-l] Bug in loading duplicate but non-identical swissprot references

Hilmar Lapp hlapp at gnf.org
Sun Apr 20 19:27:19 EDT 2003


On Thursday, April 17, 2003, at 01:56  AM, Elia Stupka wrote:

>
>> What needs to be done is overriding find_by_unique_key() in 
>> Bio::DB::BioSQL::ReferenceAdaptor and calling the inherited method 
>> with the three searches until it is found. I can do this tomorrow 
>> (today that is). It shouldn't be that hard.
>
> Ok, that's much clearer. It's the end of the day here and the beginning of a
> long weekend, but if you don't find the time to do it I'll be happy to
> give it a shot after the weekend.
>

On second thought, I decided to implement this capability in the
base adaptor's find_by_unique_key() implementation instead, so that it
is available to every adaptor that wants it. I've committed the changes
to both the code and the documentation, so how to enable this feature
is hopefully not entirely obscure. Basically, get_unique_key_query()
can now return an array of queries, and ReferenceAdaptor does exactly
that. The keys are searched for references in the following order:

	- Medline ID (if $reference->medline returns a value)
	- PubMed ID (if $reference->pubmed returns a value)
	- CRC (if at least $reference->authors returns a value).

Also, I added code so that the PubMed ID substitutes for the Medline ID
if the latter is absent (i.e., the Medline ID takes precedence).
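The ordered fallback lookup described above can be sketched as follows. This is a minimal Python illustration of the idea, not bioperl-db's Perl code: `Reference`, `unique_key_queries`, `primary_xref`, `reference_crc` and the in-memory `rows` list are hypothetical stand-ins. Using `zlib.crc32` over authors, title and location is only an assumed substitute for whatever checksum ReferenceAdaptor actually computes, and reading the Medline/PubMed substitution as the choice of the reference's primary cross-reference is likewise an assumption.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reference:
    # Hypothetical stand-in for a bioperl reference object.
    authors: Optional[str] = None
    title: Optional[str] = None
    location: Optional[str] = None
    medline: Optional[str] = None
    pubmed: Optional[str] = None


def reference_crc(ref: Reference) -> str:
    # Assumption: zlib.crc32 over authors, title and location stands in
    # for the checksum bioperl-db really uses.
    text = "|".join(p or "" for p in (ref.authors, ref.title, ref.location))
    return format(zlib.crc32(text.encode("utf-8")), "08x")


def unique_key_queries(ref: Reference) -> list:
    """Hypothetical analogue of get_unique_key_query() returning several
    queries, in the order in which they should be tried."""
    queries = []
    if ref.medline:
        queries.append({"medline": ref.medline})
    if ref.pubmed:
        queries.append({"pubmed": ref.pubmed})
    if ref.authors:
        # The CRC key is only usable if at least the authors are present.
        queries.append({"crc": reference_crc(ref)})
    return queries


def primary_xref(ref: Reference) -> Optional[tuple]:
    # Medline ID takes precedence; PubMed ID substitutes when it is absent.
    if ref.medline:
        return ("MEDLINE", ref.medline)
    if ref.pubmed:
        return ("PUBMED", ref.pubmed)
    return None


def find_by_unique_key(rows: list, ref: Reference) -> Optional[dict]:
    """Try each unique-key query in order and return the first stored row
    matching all of that query's key/value pairs, or None if none does."""
    for query in unique_key_queries(ref):
        for row in rows:
            if all(row.get(k) == v for k, v in query.items()):
                return row
    return None
```

With this ordering, a duplicate reference whose author string differs slightly (so its CRC differs) is still found through its PubMed ID, which is the situation that triggered the bug.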

All tests pass, but that does not prove that the case that triggered the
problem is solved, since it is not yet covered by any of the tests. I'll
add one later; Elia, you're also welcome to add it to a test.

	-hilmar

BTW, this is really about bioperl-db: is bioperl-l or biosql-l supposed
to be the forum for bioperl-db, or should it get its own list?

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------