[BioSQL-l] null title and CRC
Angel Pizarro
angel at mail.med.upenn.edu
Thu Aug 10 20:05:08 UTC 2006
Here is a set of records that makes a new BioSQL install fail because of
the CRC constraint when loaded with the script:
bioperl-db/scripts/biosql/load_seqdatabase.pl
My test setup was the latest bioperl CVS tarball (as of last week ;) ) and
MySQL 5.0. I also reproduced the error on fresh PostgreSQL 7.4.8 and 8.1
installs. I ran the script like so:
perl ~/bin/load_seqdatabase.pl --dsn "dbi:mysql:bstest" --format genbank --dbuser xxxx --dbpass xxxx --namespace gb --lookup test_load_seqdatabase_crc.gbff
Here is the debug error message from one of the runs I did:
> --------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed,
> values were ("","Danio rerio small nuclear ribonucleoprotein
> polypeptide C, mRNA (cDNA clone MGC:109792
> IMAGE:7292940)","Unpublished
> (2005)","CRC-0E44E80E2C988097","1","159","") FKs (<NULL>)
> ERROR: duplicate key violates unique constraint "reference_crc_key"
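The error above comes from a UNIQUE constraint on the reference CRC: once a row with a given CRC exists, a second reference with the same CRC cannot be inserted. Below is a minimal Python sketch of that failure mode and of a find-or-insert pattern that reuses the existing row instead, using an in-memory SQLite table. The table is a hypothetical, simplified stand-in for the BioSQL reference table, and insert_reference / find_or_insert_reference are illustrative names, not bioperl-db functions; only the CRC string is taken from the log.

```python
import sqlite3

# Hypothetical, simplified stand-in for the BioSQL "reference" table.
# Only the columns that matter here are modelled; crc carries a UNIQUE
# constraint, playing the role of "reference_crc_key" in the error above.
SCHEMA = """
CREATE TABLE reference (
    reference_id INTEGER PRIMARY KEY,
    authors      TEXT,
    title        TEXT,
    location     TEXT NOT NULL,
    crc          TEXT UNIQUE
)
"""


def insert_reference(conn, authors, title, location, crc):
    """Plain INSERT: raises sqlite3.IntegrityError if the crc already exists."""
    cur = conn.execute(
        "INSERT INTO reference (authors, title, location, crc) VALUES (?, ?, ?, ?)",
        (authors, title, location, crc),
    )
    return cur.lastrowid


def find_or_insert_reference(conn, authors, title, location, crc):
    """Return the id of an existing row with this crc; insert only if none exists."""
    row = conn.execute(
        "SELECT reference_id FROM reference WHERE crc = ?", (crc,)
    ).fetchone()
    if row is not None:
        return row[0]
    return insert_reference(conn, authors, title, location, crc)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    crc = "CRC-0E44E80E2C988097"  # value taken from the warning above
    first = insert_reference(conn, "", "some title", "Unpublished (2005)", crc)
    try:
        # Same crc again: the UNIQUE constraint rejects the row.
        insert_reference(conn, "", "some title", "Unpublished (2005)", crc)
    except sqlite3.IntegrityError as err:
        print("second insert rejected:", err)
    # Looking the crc up first returns the existing row's id instead.
    again = find_or_insert_reference(conn, "", "some title", "Unpublished (2005)", crc)
    print("reused existing row:", again == first)
```

This only illustrates the general pattern; it does not show how bioperl-db itself decides whether to look up or insert a reference.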
Cheers,
-angel
Angel Pizarro wrote:
> Hilmar Lapp wrote:
>
>> I think I need to debug this. If bioperl-db stumbles over this, then
>> it sounds like that's what needs to be fixed.
>>
>> Can you or somebody else provide me with two sample records that
>> exemplify (i.e., replicate) the problem and which I can turn into a
>> test case?
>>
>>
> Since these were bulk loads, I am not sure which records conflicted,
> but I'll have a poke around and see if I can grab a test set for you.
> -angel
>
>
>> -hilmar
>>
>> On Aug 3, 2006, at 2:12 PM, Angel Pizarro wrote:
>>
>>
>>> From Hilmar:
>>>
>>>> The CRC for references uses the authors, title, and location
>>>> attributes in Bioperl-db, and empty (or null) strings default to the
>>>> string "<undef>".
>>>>
>>>> If title is empty and authors and location do not distinguish two
>>>> references, then why do you want to have two rows for those
>>>> references? Basically, they are identical for all intents and
>>>> purposes, or are they not?
>>>>
>>>> -hilmar
>>>>
>>> Sorry for not replying to the original thread, but I just joined this
>>> list.
>>> This was an issue for me with bioperl loading as well, since I was using
>>> the same BioSQL instance to load two different biodatabases with the
>>> same entry. Specifically, I loaded IPI, which has no feature table in
>>> the entries, and then the GenBank equivalents to get the feature tables.
>>> The constraint caused an error when the GenBank record was
>>> loaded.
>>>
>>> I think that this is primarily an issue with bioperl, but I raise it
>>> here to make the Java folks aware of the potential pitfall and to ask
>>> whether the CRC should be calculated with the biodatabase in mind.
>>> Probably not, since, as Hilmar states, it's still the same reference.
>>>
>>> BTW - I solved the issue by dropping the constraint, since I really
>>> don't care about references. Not optimal, but certainly the easiest
>>> thing to do ;)
>>>
>>> -angel
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>>
>>>
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>> ===========================================================
>
>
--
Angel Pizarro
Director, Bioinformatics Facility
Institute for Translational Medicine and Therapeutics
University of Pennsylvania
806 BRB II/III
421 Curie Blvd.
Philadelphia, PA 19104-6160
P: 215-573-3736
F: 215-573-9004
E: angel at mail.med.upenn.edu
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test_load_seqdatabase_crc.gbff
URL: <http://lists.open-bio.org/pipermail/biosql-l/attachments/20060810/1e7d3285/attachment.ksh>