[BioSQL-l] Bug in loading duplicate but non-identical swissprot references
Hilmar Lapp
hlapp at gnf.org
Thu Apr 17 02:45:20 EDT 2003
On Thursday, April 17, 2003, at 01:16 AM, Elia Stupka wrote:
> If, as I was mentioning, one checks uniqueness only by CRC, isn't there
> an easy way to force an update of the reference table once the
> MEDLINE ID is found?
>
Once it has been located, you can simply call $reference->store(), and
the MEDLINE ID will be updated.
The problem is in locating it, and I'd be happy to hear how the 'old'
bioperl-db would have solved this, given that it did not employ the
solution strategy outlined under 3) in the aforementioned thread. The
problem is that one and the same reference isn't always given with the
same literal title, authors, journal, or MEDLINE ID. That is, you do not
know a priori which search is going to locate the reference at hand in
the database. Sometimes (in fact, very often) it is going to be the
MEDLINE ID, *not* the CRC (due to slight variations in authors or
journal between, say, Swiss-Prot and GenBank). It could, however, also
be the PubMed ID, or indeed the CRC.
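A minimal Python sketch of why a checksum over the literal citation text misses such duplicates. It assumes, for illustration only, that the checksum is CRC32 over the authors, title and journal strings; `crc_of` and the citation data are hypothetical, and bioperl-db may compute its CRC differently.

```python
import zlib


def crc_of(ref: dict) -> int:
    """Hypothetical reference checksum: CRC32 over the literal authors,
    title and journal text (assumed field set, not taken from bioperl-db)."""
    text = "|".join((ref["authors"], ref["title"], ref["journal"]))
    return zlib.crc32(text.encode("utf-8"))


# Made-up citation as one database might render it,
swissprot_ref = {
    "medline": "12345678",
    "authors": "Smith J., Doe A.",
    "title": "An example title.",
    "journal": "J. Example 1:1-9(2003).",
}
# and as another renders it: same MEDLINE ID, one punctuation mark differs.
genbank_ref = dict(swissprot_ref, authors="Smith J.; Doe A.")

# A single-character substitution is always detected by CRC32,
# so a CRC-only lookup cannot match these two records.
print(crc_of(swissprot_ref) == crc_of(genbank_ref))  # False
print(swissprot_ref["medline"] == genbank_ref["medline"])  # True
```

Only the MEDLINE ID search would locate the second record here, which is why the CRC cannot be the only lookup.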
Hence, you need to conduct all three searches (stopping as soon as one
succeeds). This requires special-case code because, for all other
unique-key searches, the result of the first search for which all
attribute values are available is the definitive answer. There is a
single implementation of find_by_unique_key() in
Bio::DB::BioSQL::BasePersistenceAdaptor that applies this rule for all
adaptors.
What needs to be done is to override find_by_unique_key() in
Bio::DB::BioSQL::ReferenceAdaptor so that it calls the inherited method
once per search until the reference is found. I can do this tomorrow
(today, that is). It shouldn't be that hard.
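The lookup rule and the proposed override can be sketched in Python. This is not the bioperl-db code (which is Perl and queries the database): `BasePersistenceAdaptor`, `ReferenceAdaptor` and `find_by_unique_key` below are hypothetical stand-ins named after the Perl modules, rows live in an in-memory list, a key counts as usable only if all its attribute values are non-empty, and the MEDLINE, PubMed, CRC search order is an assumption.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]
UniqueKey = Tuple[str, ...]


class BasePersistenceAdaptor:
    """Hypothetical generic adaptor: an in-memory table plus unique keys."""

    unique_keys: List[UniqueKey] = []

    def __init__(self, rows: List[Record]):
        self.rows = rows

    def _lookup(self, obj: Record, key: UniqueKey) -> Optional[Record]:
        # Return the first stored row matching obj on every key attribute.
        for row in self.rows:
            if all(row.get(attr) == obj[attr] for attr in key):
                return row
        return None

    def find_by_unique_key(self, obj: Record,
                           keys: Optional[List[UniqueKey]] = None) -> Optional[Record]:
        # Generic rule: the first key whose attribute values are all present
        # decides the answer, whether or not a matching row exists.
        for key in (self.unique_keys if keys is None else keys):
            if all(obj.get(attr) for attr in key):
                return self._lookup(obj, key)
        return None


class ReferenceAdaptor(BasePersistenceAdaptor):
    """Hypothetical reference adaptor; the key order is an assumption."""

    unique_keys = [("medline",), ("pubmed",), ("crc",)]

    def find_by_unique_key(self, obj: Record,
                           keys: Optional[List[UniqueKey]] = None) -> Optional[Record]:
        # Special case: run the inherited search once per key and stop at
        # the first hit instead of trusting the first complete key.
        for key in (self.unique_keys if keys is None else keys):
            found = super().find_by_unique_key(obj, [key])
            if found is not None:
                return found
        return None


# Made-up data: a reference stored earlier without a MEDLINE ID.
rows = [{"crc": "c1"}]
incoming = {"medline": "111", "crc": "c1"}
adaptor = ReferenceAdaptor(rows)

print(BasePersistenceAdaptor.find_by_unique_key(adaptor, incoming))  # None
found = adaptor.find_by_unique_key(incoming)
found["medline"] = incoming["medline"]  # stands in for $reference->store()
print(rows)  # [{'crc': 'c1', 'medline': '111'}]
```

The generic method stops after the MEDLINE search misses, because that key was complete; the override keeps going and finds the row by CRC, after which storing it fills in the MEDLINE ID.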
> I guess I really come to BioSQL from a different angle, i.e., I would
> like to keep the beauty of the new and shiny BioSQL but regain its
> ability to be a complete repository of public database sequences, for
> which we are still using the old bioperl-db...
>
That's not really such a different angle. More or less, this is what I
have been using it for. The problem you've encountered is just one we
didn't have before Singapore (not because of the schema changes, but
because Swiss-Prot changed).
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------