[BioSQL-l] Bug in loading duplicate but non-identical swissprot references

Hilmar Lapp hlapp at gnf.org
Thu Apr 17 02:45:20 EDT 2003


On Thursday, April 17, 2003, at 01:16  AM, Elia Stupka wrote:

> If, as I was mentioning, one checks uniqueness only by crc, isn't 
> there an easy way to force the update of the reference table once the 
> medline id is found?
>

Once it's been located you can just say $reference->store(), and the 
medline ID would be updated.

The problem is in locating it, and I'd be happy to hear how the 'old' 
bioperl-db would have solved this, given that it did not employ the 
solution strategy outlined in the aforementioned thread under 3). The 
problem is that one and the same reference isn't always given with the 
same literal title/author/journal or medline ID. I.e., you do not know 
a priori which search is going to locate the reference at hand in the 
database. Sometimes (in fact, very often) it is going to be the medline 
ID, *not* the CRC (due to slight variations in authors or journal 
between, say, Swissprot and genbank). It could, however, also be the 
pubmed ID. Or indeed the CRC.

Hence, you need to conduct all three searches (and stop at the first 
hit). The reason this requires special-case code is that for all other 
unique key searches the result of the first search for which you have 
all attribute values available is the definitive answer. There is a 
single instance of find_by_unique_key() in 
Bio::DB::BioSQL::BasePersistenceAdaptor that implements this approach 
for all adaptors.

What needs to be done is to override find_by_unique_key() in 
Bio::DB::BioSQL::ReferenceAdaptor and call the inherited method once 
for each of the three searches until the reference is found. I can do 
this tomorrow (today, that is). It shouldn't be that hard.
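
To make the difference between the two lookup strategies concrete, here 
is a rough sketch. It is written in Python purely for illustration and 
is not the bioperl-db code: the class bodies, the in-memory "table", the 
column names (medline_id, pubmed_id, crc), the `keys` parameter and the 
order of the three searches are all assumptions made for the example. 
Only the idea is taken from the description above: the generic method 
treats the first fully populated key as definitive, while the reference 
override retries the inherited method one key at a time.

```python
class BaseAdaptor:
    """Toy stand-in for a generic persistence adaptor (hypothetical)."""

    # Unique keys in priority order; each key is a tuple of attribute names.
    unique_keys = ()

    def __init__(self, rows):
        # A list of dicts stands in for the database table.
        self.rows = rows

    def find_by_unique_key(self, obj, keys=None):
        # Generic rule: search only the first key for which obj has all
        # attribute values, and treat that result (hit or miss) as final.
        for key in (self.unique_keys if keys is None else keys):
            if all(obj.get(attr) is not None for attr in key):
                return self._lookup(obj, key)
        return None

    def _lookup(self, obj, key):
        for row in self.rows:
            if all(row.get(attr) == obj[attr] for attr in key):
                return row
        return None


class ReferenceAdaptor(BaseAdaptor):
    # Assumed column names and search order, for illustration only.
    unique_keys = (("medline_id",), ("pubmed_id",), ("crc",))

    def find_by_unique_key(self, obj, keys=None):
        # Special case: a miss on one key is not definitive, so call the
        # inherited method once per key and stop at the first hit.
        for key in (self.unique_keys if keys is None else keys):
            found = super().find_by_unique_key(obj, keys=(key,))
            if found is not None:
                return found
        return None


# A reference stored earlier without a medline ID, and the same reference
# arriving later with one (made-up values).
stored = [{"id": 1, "medline_id": None, "pubmed_id": None, "crc": "ABC"}]
incoming = {"medline_id": "12345", "pubmed_id": None, "crc": "ABC"}

generic_hit = BaseAdaptor(stored).find_by_unique_key(
    incoming, keys=ReferenceAdaptor.unique_keys)
fallback_hit = ReferenceAdaptor(stored).find_by_unique_key(incoming)
```

With this made-up data the generic strategy stops after the medline ID 
search and misses the stored row, whereas the override falls through to 
the CRC search and finds it. Once the row has been located this way, 
storing the reference would update its medline ID, as described above.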

> I guess I really come to BioSQL from a different angle, i.e. I would 
> like to keep the beauty of the new and shiny BioSQL but regain its 
> ability to be a complete repository of public database sequences for 
> which we are still using the old bioperl-db....
>

That's not really such a different angle. More or less, this is what I 
have been using it for. The problem you've encountered is just one we 
didn't have before Singapore (not because of the schema changes, but 
because swissprot changed).

	-hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


