[BioSQL-l] load_seqdatabase.pl warnings and errors

michael watson (IAH-C) michael.watson at bbsrc.ac.uk
Wed May 20 11:25:52 UTC 2009


We have a winner :)

NC_003992, NC_011452, NC_011451, NC_011450 all share at least one reference.
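
To double-check this kind of overlap directly from the flat files, here is a minimal Python sketch. It collapses all whitespace before searching, so the padding after the JOURNAL keyword and the wrapped continuation line in a GenBank record do not get in the way. The helper name files_containing and the file names passed to it are assumptions for illustration; none of this is part of load_seqdatabase.pl or BioPerl.

```python
from pathlib import Path


def files_containing(phrase, paths):
    """Return the paths whose text contains phrase once whitespace runs are collapsed."""
    # Normalise the search phrase the same way as the file contents.
    wanted = " ".join(phrase.split())
    hits = []
    for path in paths:
        # split() with no arguments breaks on any run of spaces, tabs or newlines,
        # so a reference wrapped over two lines becomes one single-spaced string.
        text = " ".join(Path(path).read_text().split())
        if wanted in text:
            hits.append(str(path))
    return hits


# Example call (hypothetical file list):
# files_containing(
#     "Submitted (12-AUG-2004) National Center for Biotechnology Information",
#     ["NC_003992.gbk"],
# )
```

Any file that comes back from this call carries the same submission text, so it is a candidate for the shared reference.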

Would changing --flatlookup to --lookup change the behaviour so it checks for an existing reference before trying to insert the duplicate?

The answer is no :( (see below).

I guess this may need some coding then!
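
As a rough sketch of the check-before-insert behaviour asked about above: look the reference up by its CRC first, and only insert when nothing is found. This is not how bioperl-db implements its adaptors. The one-table schema is a cut-down stand-in for the BioSQL reference table, run on an in-memory SQLite database, and the names make_reference_table and get_or_create_reference are hypothetical.

```python
import sqlite3


def make_reference_table():
    """Create an in-memory stand-in for the reference table with a unique CRC."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reference ("
        " reference_id INTEGER PRIMARY KEY,"
        " title TEXT, location TEXT, crc TEXT UNIQUE)"
    )
    return conn


def get_or_create_reference(conn, title, location, crc):
    """Return the reference_id for crc, inserting a new row only if none exists."""
    row = conn.execute(
        "SELECT reference_id FROM reference WHERE crc = ?", (crc,)
    ).fetchone()
    if row is not None:
        # The reference is already stored: reuse it instead of inserting a duplicate.
        return row[0]
    cur = conn.execute(
        "INSERT INTO reference (title, location, crc) VALUES (?, ?, ?)",
        (title, location, crc),
    )
    return cur.lastrowid
```

Calling get_or_create_reference twice with the same CRC returns the same id and leaves a single row, which is the outcome one would want when several genomes share a reference.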

Thanks!
Mick

perl load_seqdatabase.pl --host localhost --dbname fmd_biosql --format genbank --dbuser removed --dbpass removed --lookup --remove NC_003992.gbk 
Loading NC_003992.gbk ...

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values were ("","Direct Submission","Submitted (12-AUG-2004) National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA","CRC-E8D3CBBD80002FA1","1","8203","") FKs (<NULL>)
Duplicate entry 'CRC-E8D3CBBD80002FA1' for key 3
---------------------------------------------------
Could not store NC_003992: 
------------- EXCEPTION  -------------
MSG: create: object (Bio::Annotation::Reference) failed to insert or to be found by unique key
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271
STACK Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:217
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271
STACK Bio::DB::BioSQL::SeqAdaptor::store_children /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/SeqAdaptor.pm:224
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK Bio::DB::Persistent::PersistentObject::store /usr/users/bioinformatics/data/foot-and-mouth/bioperl-db-1.5.2_100/Bio/DB/Persistent/PersistentObject.pm:271
STACK (eval) load_seqdatabase.pl:622
STACK toplevel load_seqdatabase.pl:604

--------------------------------------

 at load_seqdatabase.pl line 635

-----Original Message-----
From: p.j.a.cock at googlemail.com [mailto:p.j.a.cock at googlemail.com] On Behalf Of Peter
Sent: 20 May 2009 11:59
To: michael watson (IAH-C)
Cc: Hilmar Lapp; biosql-l at lists.open-bio.org
Subject: Re: [BioSQL-l] load_seqdatabase.pl warnings and errors

On Wed, May 20, 2009 at 10:52 AM, michael watson (IAH-C)
<michael.watson at bbsrc.ac.uk> wrote:
>
> Hi Guys
>
> Ok, the warnings were due to duplicate sequences - I had downloaded a
> stream using Bio::DB::GenBank and I guess I assumed that would mean only
> unique entries were sent back.  Using "--flatlookup --remove" gets rid
> of the warnings.

Great - easy :)

> Now for NC_003992.gbk...
>
> To answer Hilmar's question:
> ...
> And when I run load_seqdatabase.pl on NC_003992.gbk alone I still get:
>
> perl load_seqdatabase.pl --host localhost --dbname fmd_biosql --format
> genbank --dbuser removed --dbpass removed --flatlookup --remove
> NC_003992.gbk
>
> Loading NC_003992.gbk ...
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed, values
> were ("","Direct Submission","Submitted (12-AUG-2004) National Center
> for Biotechnology Information, NIH, Bethesda, MD 20894,
> USA","CRC-E8D3CBBD80002FA1","1","8203","") FKs (<NULL>)
> Duplicate entry 'CRC-E8D3CBBD80002FA1' for key 3
> ---------------------------------------------------
> Could not store NC_003992:
> ------------- EXCEPTION  -------------
> MSG: create: object (Bio::Annotation::Reference) failed to insert or to
> be found by unique key
> ...

I would guess the problem is that this rather generic reference in
NC_003992 is repeated exactly in another genome (causing the CRC
collision):

CONSRTM   NCBI Genome Project
TITLE     Direct Submission
JOURNAL   Submitted (12-AUG-2004) National Center for Biotechnology
Information, NIH, Bethesda, MD 20894, USA

See http://www.ncbi.nlm.nih.gov/nuccore/NC_011452

i.e. could there be another direct submission by the NCBI on that date
in your collection?  You could search the database for that CRC and
trace it back to a bioentry, or just grep your GenBank files for
"Submitted (12-AUG-2004) National Center for Biotechnology" (leaving
out the JOURNAL keyword avoids having to match the padding after it).
Something like this SQL statement might be interesting:

SELECT bioentry.accession, reference.title
FROM bioentry, bioentry_reference, reference
WHERE bioentry.bioentry_id = bioentry_reference.bioentry_id
  AND bioentry_reference.reference_id = reference.reference_id
  AND reference.crc = 'CRC-E8D3CBBD80002FA1';
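
The same lookup can be tried out from a script. The sketch below writes the query with explicit JOINs and a bound parameter, and runs it against an in-memory SQLite database. The three tables are cut-down stand-ins for the BioSQL tables (only the columns the query touches), the sample rows are invented apart from the two accessions and the CRC quoted in this thread, and XX_000001 is a made-up accession. Against a real MySQL BioSQL database the driver and its placeholder style would differ.

```python
import sqlite3

# Cut-down stand-ins for three BioSQL tables; the real schema has more columns.
SCHEMA = """
CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, accession TEXT);
CREATE TABLE reference (reference_id INTEGER PRIMARY KEY, title TEXT, crc TEXT UNIQUE);
CREATE TABLE bioentry_reference (bioentry_id INTEGER, reference_id INTEGER);
"""

QUERY = """
SELECT bioentry.accession, reference.title
FROM bioentry
JOIN bioentry_reference
  ON bioentry.bioentry_id = bioentry_reference.bioentry_id
JOIN reference
  ON bioentry_reference.reference_id = reference.reference_id
WHERE reference.crc = ?
ORDER BY bioentry.accession
"""


def demo_connection():
    """Build an in-memory database with illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO bioentry VALUES (?, ?)",
        [(1, "NC_003992"), (2, "NC_011452"), (3, "XX_000001")],
    )
    conn.executemany(
        "INSERT INTO reference VALUES (?, ?, ?)",
        [(1, "Direct Submission", "CRC-E8D3CBBD80002FA1"),
         (2, "Some other title", "CRC-0000000000000000")],
    )
    # Two bioentries point at the same reference row; the third has its own.
    conn.executemany(
        "INSERT INTO bioentry_reference VALUES (?, ?)",
        [(1, 1), (2, 1), (3, 2)],
    )
    return conn


def entries_for_crc(conn, crc):
    """Return (accession, title) rows for every bioentry linked to the given CRC."""
    return conn.execute(QUERY, (crc,)).fetchall()
```

Every accession returned for the CRC is a record that already owns the reference the loader is trying to insert again.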

Peter



