[BioSQL-l] Problem loading GenPept files into mysql biosql
Neil Saunders
n.saunders at uq.edu.au
Tue Jun 6 05:41:34 UTC 2006
Hi,
I've installed the MySQL BioSQL schema (Ubuntu Linux 5.10, BioPerl 1.5, MySQL
4.1.12). I have written a script that uses Bio::DB::GenPept to retrieve files
by GI and then tries to load them using load_seqdatabase.pl:
  load_seqdatabase.pl --safe --dbname DBNAME --dbuser DBUSER --dbpass DBPASS \
    --namespace genpept --format genbank <files>
I'm getting a lot of warnings like this one:
-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were
("","Direct Submission","Submitted (11-SEP-2004) National Center for
Biotechnology Information, NIH, Bethesda, MD 20894,
USA","CRC-EFE0D20CE0E07E7D","1","637","") FKs (<NULL>)
Duplicate entry 'CRC-EFE0D20CE0E07E7D' for key 3
---------------------------------------------------
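To see how many distinct CRCs are involved, the loader output could be tallied with a short script. This is only a sketch: it assumes the warnings were captured as text, and the function name count_duplicate_keys and the regular expression are my own, written to match the "Duplicate entry" line shown above.

```python
import re
from collections import Counter

# Matches the MySQL message shown above, e.g.
#   Duplicate entry 'CRC-EFE0D20CE0E07E7D' for key 3
DUPLICATE_RE = re.compile(r"Duplicate entry '([^']+)' for key")


def count_duplicate_keys(log_text: str) -> Counter:
    """Count how often each duplicated key value appears in the log text."""
    return Counter(DUPLICATE_RE.findall(log_text))


SAMPLE_LOG = """\
-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were
("","Direct Submission","Submitted (11-SEP-2004) National Center for
Biotechnology Information, NIH, Bethesda, MD 20894,
USA","CRC-EFE0D20CE0E07E7D","1","637","") FKs (<NULL>)
Duplicate entry 'CRC-EFE0D20CE0E07E7D' for key 3
---------------------------------------------------
"""

if __name__ == "__main__":
    for key, n in count_duplicate_keys(SAMPLE_LOG).most_common():
        print(key, n)
```

A small number of distinct CRCs, each repeated many times, would be consistent with a shared REFERENCE block being the cause.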
This seems to be related to a similar problem using UniProt discussed on this list:
http://lists.open-bio.org/pipermail/biosql-l/2006-May/000977.html
Am I right in thinking that a CRC is generated from the JOURNAL line of a
GenPept file and that non-unique CRCs are causing this problem? My GenPept
files are actually RefSeq entries from complete microbial genomes; an example
is NP_378145 (GI 15922476). The REFERENCE for such records is often "Direct
Submission" rather than a journal article, so every protein from a given
genome carries the same REFERENCE and, if I'm right, the same CRC. Requiring
CRCs to be unique therefore doesn't seem like a good idea for these records.
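The suspected failure mode can be reproduced in miniature. The sketch below is not the BioPerl or BioSQL code: the checksum (zlib.crc32 over the authors, title and location fields), the toy table layout, and the names reference_crc and load_references are all assumptions, and SQLite stands in for MySQL. It only shows that identical reference fields give an identical checksum, which a UNIQUE column then rejects.

```python
import sqlite3
import zlib


def reference_crc(authors: str, title: str, location: str) -> str:
    """Stand-in checksum over the reference fields (assumed recipe)."""
    text = "|".join((authors, title, location))
    return "CRC-%08X" % (zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF)


def load_references(rows):
    """Insert rows into a toy table with a UNIQUE crc column.

    Returns a tuple (inserted, rejected).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reference ("
        " reference_id INTEGER PRIMARY KEY,"
        " authors TEXT, title TEXT, location TEXT,"
        " crc TEXT UNIQUE)"
    )
    inserted, rejected = 0, 0
    for authors, title, location in rows:
        crc = reference_crc(authors, title, location)
        try:
            conn.execute(
                "INSERT INTO reference (authors, title, location, crc)"
                " VALUES (?, ?, ?, ?)",
                (authors, title, location, crc),
            )
            inserted += 1
        except sqlite3.IntegrityError:
            # Same failure class as MySQL's "Duplicate entry ... for key"
            rejected += 1
    conn.close()
    return inserted, rejected


# Reference fields taken from the warning quoted earlier in this message.
DIRECT_SUBMISSION = (
    "",
    "Direct Submission",
    "Submitted (11-SEP-2004) National Center for Biotechnology Information, "
    "NIH, Bethesda, MD 20894, USA",
)

if __name__ == "__main__":
    # Two proteins from the same genome carry the same REFERENCE block.
    print(load_references([DIRECT_SUBMISSION, DIRECT_SUBMISSION]))
```

With two identical "Direct Submission" references, the first insert succeeds and the second is rejected, which matches the pattern of one warning per additional protein from the same genome.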
I'd be grateful if anyone could confirm that these records are a problem and
suggest any workarounds,
thanks,
Neil
--
School of Molecular and Microbial Sciences
University of Queensland
Brisbane 4072 Australia
http://psychro.bioinformatics.unsw.edu.au/neil