[BioSQL-l] Help with load_seqdatabase.pl

Hilmar Lapp hlapp at gnf.org
Sun Jan 26 00:02:42 EST 2003


Jansen, sorry for the late response. The problem is that PostgreSQL 
handles failures within a transaction differently from MySQL/InnoDB 
and Oracle. The adaptor layer does not look up entities that are 
practically infinite in number before inserting them; instead, it 
detects that one is already present when the insert fails the unique 
key (UK) constraint. Comment is such an entity. PostgreSQL, however, 
aborts the entire transaction upon such a failure, whether it is 
handled or not. I have yet to write certain functions in PL/pgSQL 
that will get around that problem.
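
To illustrate the insert-first, look-up-on-failure pattern, here is a 
minimal sketch. It is not the adaptor code and not the PL/pgSQL 
functions mentioned above. It uses Python's sqlite3 module as a 
stand-in database, a hypothetical simplified comment table (the column 
names and the unique key on bioentry_id plus comment_rank are 
assumptions), and a SAVEPOINT around the insert, which is one way to 
keep the outer transaction usable on databases that support 
savepoints. The function name insert_or_lookup is made up for this 
example.

```python
import sqlite3

# Hypothetical, simplified stand-in for a comment table; not the real BioSQL DDL.
COMMENT_SCHEMA = """
CREATE TABLE comment (
    comment_id   INTEGER PRIMARY KEY,
    bioentry_id  INTEGER NOT NULL,
    comment_text TEXT    NOT NULL,
    comment_rank INTEGER NOT NULL,
    UNIQUE (bioentry_id, comment_rank)
)
"""


def insert_or_lookup(conn, bioentry_id, text, comment_rank):
    """Insert first; on a unique-key violation, find the existing row instead.

    Returns a tuple (comment_id, inserted).
    """
    conn.execute("SAVEPOINT try_insert")
    try:
        cur = conn.execute(
            "INSERT INTO comment (bioentry_id, comment_text, comment_rank) "
            "VALUES (?, ?, ?)",
            (bioentry_id, text, comment_rank),
        )
        new_id = cur.lastrowid
        conn.execute("RELEASE SAVEPOINT try_insert")
        return new_id, True
    except sqlite3.IntegrityError:
        # Undo only the failed insert so the enclosing transaction survives.
        conn.execute("ROLLBACK TO SAVEPOINT try_insert")
        conn.execute("RELEASE SAVEPOINT try_insert")
        row = conn.execute(
            "SELECT comment_id FROM comment "
            "WHERE bioentry_id = ? AND comment_rank = ?",
            (bioentry_id, comment_rank),
        ).fetchone()
        return row[0], False


def demo():
    # isolation_level=None: transactions are controlled explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(COMMENT_SCHEMA)
    conn.execute("BEGIN")
    first = insert_or_lookup(conn, 3, "PROVISIONAL REFSEQ", 1)
    second = insert_or_lookup(conn, 3, "PROVISIONAL REFSEQ", 1)  # duplicate key
    third = insert_or_lookup(conn, 3, "another comment", 2)  # still in the same transaction
    conn.execute("COMMIT")
    count = conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]
    conn.close()
    return first, second, third, count
```

In this sketch the duplicate insert is rolled back to the savepoint 
and the existing row is returned, so the later insert in the same 
transaction still goes through. SQLite itself does not abort the 
transaction on a constraint failure, so the savepoint only matters on 
a database that behaves as described above.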

Generally speaking, though, updating bioentries through --update is 
not very robust, because 1-n and n-n connected relations require more 
than a simple update (e.g., the new version of a sequence may have 
fewer features, or features with a different key, than the old 
version; a simple update would leave you with stale features attached 
to the bioentry).

I have found it much more robust to simply delete the associations 
and FK-connected relations and re-insert the new set. So, all that is 
really UPDATEd in this case are the bioentry and biosequence tables. 
For an example of how to do this, have a look at 
scripts/update-on-new-version.pl, which is a closure you can pass to 
the --mergeobjs option of load_seqdatabase.pl. I wrote this to update 
RefSeq, and it works well for me.
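
The delete-and-re-insert idea can be sketched as follows. This is not 
the content of scripts/update-on-new-version.pl; it is a minimal 
illustration in Python with sqlite3, assuming a hypothetical 
two-table schema (bioentry and seqfeature with made-up columns) and a 
made-up function name replace_entry.

```python
import sqlite3

# Hypothetical, heavily simplified schema; the real BioSQL tables differ.
ENTRY_SCHEMA = """
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    accession   TEXT    NOT NULL UNIQUE,
    version     INTEGER NOT NULL
);
CREATE TABLE seqfeature (
    seqfeature_id INTEGER PRIMARY KEY,
    bioentry_id   INTEGER NOT NULL REFERENCES bioentry (bioentry_id),
    feature_key   TEXT    NOT NULL
);
"""


def replace_entry(conn, accession, version, feature_keys):
    """Store a new version of an entry without leaving stale features behind.

    Only the bioentry row is UPDATEd; its features are deleted and re-inserted.
    """
    with conn:  # commits on success, rolls back if anything raises
        row = conn.execute(
            "SELECT bioentry_id FROM bioentry WHERE accession = ?",
            (accession,),
        ).fetchone()
        if row is None:
            cur = conn.execute(
                "INSERT INTO bioentry (accession, version) VALUES (?, ?)",
                (accession, version),
            )
            entry_id = cur.lastrowid
        else:
            entry_id = row[0]
            conn.execute(
                "UPDATE bioentry SET version = ? WHERE bioentry_id = ?",
                (version, entry_id),
            )
            # Drop every FK-connected child row of the old version.
            conn.execute(
                "DELETE FROM seqfeature WHERE bioentry_id = ?", (entry_id,)
            )
        # Attach the complete feature set of the new version.
        conn.executemany(
            "INSERT INTO seqfeature (bioentry_id, feature_key) VALUES (?, ?)",
            [(entry_id, key) for key in feature_keys],
        )
    return entry_id
```

Calling replace_entry twice for the same accession, the second time 
with a shorter feature list, leaves only the second list attached to 
the entry, which is the behaviour a plain UPDATE of the child rows 
would not give you.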

	-hilmar

On Thursday, January 23, 2003, at 11:24  AM, Jansen E Lim wrote:

> Hello,
>
> I seem to be having trouble using the -lookup option of
> load_seqdatabase.pl script.  In particular, I wanted to see what
> the option would do as documented as follows:
>             --lookup
>             flag to look-up by unique key first, converting the
>             insert into an update if the object is found
>
> I also tried using --lookup 1 without success.  I have no trouble
> using  -noupdate and -remove option with -lookup.
>
> Here's how I invoke the script:  load_seqdatabase.pl  -dbname
> refseq -driver Pg -lookup -format genbank dup.dat
> Here's the error message I get:
>
> DBD::Pg::st execute failed: ERROR:  Cannot insert a duplicate key
> into unique index comment_bioentry_id_key at
> /libpath/Bio/DB/BioSQL/BaseDriver.pm line 564, <GEN0> line 116.
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed,
> values were ("PROVISIONAL REFSEQ: This record
> has not yet been subject to final NCBI review. The reference
> sequence was derived from J04733.1. ","1") FKs (3)
> ERROR:  Cannot insert a duplicate key into unique index
> comment_bioentry_id_key
> ---------------------------------------------------
> NOTICE:  current transaction is aborted, queries ignored until
> end of transaction block
> DBD::Pg::st fetchall_arrayref failed: no statement executing at
> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line
> 801, <GEN0> line 116.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Could not store NM_012500:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: create: object (Bio::Annotation::Comment) failed to insert
> or to be found by unique key
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /stf/sys64/perl/newlib/Bio/Root/Root.pm:342
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:197
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:240
> STACK: Bio::DB::Persistent::PersistentObject::store
> /stf/biocgi/limje/Bio/DB/Persistent/PersistentObject.pm:266
> STACK:
> Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children
> /stf/biocgi/limje/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:220
>
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:240
> STACK: Bio::DB::Persistent::PersistentObject::store
> /stf/biocgi/limje/Bio/DB/Persistent/PersistentObject.pm:266
> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children
> /stf/biocgi/limje/Bio/DB/BioSQL/SeqAdaptor.pm:179
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:260
> STACK: Bio::DB::Persistent::PersistentObject::store
> /stf/biocgi/limje/Bio/DB/Persistent/PersistentObject.pm:266
> STACK: ../load_seqdatabase.pl:400
> -----------------------------------------------------------
>
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /stf/sys64/perl/newlib/Bio/Root/Root.pm:342
> STACK: ../load_seqdatabase.pl:409
> -----------------------------------------------------------
>
> Thanks for helping out.
>
> -Jansen
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


