[BioSQL-l] Help with load_seqdatabase.pl
Hilmar Lapp
hlapp at gnf.org
Sun Jan 26 00:02:42 EST 2003
Jansen, sorry for the late response. The problem is due to PostgreSQL
handling failures within a transaction differently (than MySQL/InnoDB
and Oracle). The way the adaptor layer works is that those entities
which are practically infinite in number are not looked up before
insert, but instead their presence is detected by an insert failing the
UK constraint. Comment is such an entity. PostgreSQL, however, aborts
the entire transaction upon such a (handled or not) failure. I have yet
to write certain functions in PL/PgSQL that will get around that
problem.
Generally speaking though, updating bioentries through --update is not
very robust, because 1-n and n-n connected relations require more than
a simple update (e.g., the new version of a sequence may have less
features or features with a different key than the old version; a
simple update would leave you with stale features attached to the
bioentry).
I have found it much more robust to simply delete associations and
FK-connected relations, and re-inserting the new set. So, all that is
really UPDATEd in this case is the bioentry (and biosequence) table.
For an example of how to do this, have a look at
scripts/update-on-new-version.pl, which is a closure you can pass to
the --mergeobjs option of load_seqdatabase.pl. I wrote this to update
RefSeq, and it works well for me.
-hilmar
On Thursday, January 23, 2003, at 11:24 AM, Jansen E Lim wrote:
> Hello,
>
> I seem to be having trouble using the -lookup option of
> load_seqdatabase.pl script. In particular, I wanted to see what
> the option
> would
> do as documented as follows:
> --lookup
> flag to look-up by unique key first, converting the
> insert
> into an update if the object is found
>
> I also tried using --lookup 1 without success. I have no trouble
> using -noupdate and -remove option with -lookup.
>
> Here's how I invoke the script: load_seqdatabase.pl -dbname
> refseq -driver Pg -lookup -format genbank dup.dat
> Here's the error message I get:
>
> DBD::Pg::st execute failed: ERROR: Cannot insert a duplicate key
> into unique index comment_bioentry_id_key at
> /libpath/Bio/DB/BioSQL/BaseDriver.pm line 564, <GEN0> line 116.
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::CommentAdaptor (driver) failed,
> values were ("PROVISIONAL REFSEQ: This record
> has not yet been subject to final NCBI review. The reference
> sequence was derived from J04733.1. ","1") FKs (3)
> ERROR: Cannot insert a duplicate key into unique index
> comment_bioentry_id_key
> ---------------------------------------------------
> NOTICE: current transaction is aborted, queries ignored until
> end of transaction block
> DBD::Pg::st fetchall_arrayref failed: no statement executing at
> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line
> 801, <GEN0> line 116.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Could not store NM_012500:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: create: object (Bio::Annotation::Comment) failed to insert
> or to be found by unique key
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /stf/sys64/perl/newlib/Bio/Root/Root.pm:342
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:197
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:240
> STACK: Bio::DB::Persistent::PersistentObject::store
> /stf/biocgi/limje/Bio/DB/Persistent/PersistentObject.pm:266
> STACK:
> Bio::DB::BioSQL::AnnotationCollectionAdaptor::store_children
> /stf/biocgi/limje/Bio/DB/BioSQL/AnnotationCollectionAdaptor.pm:220
>
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create
> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:240
> STACK: Bio::DB::Persistent::PersistentObject::store
> /stf/biocgi/limje/Bio/DB/Persistent/PersistentObject.pm:266
> STACK: Bio::DB::BioSQL::SeqAdaptor::store_children
> /stf/biocgi/limje/Bio/DB/BioSQL/SeqAdaptor.pm:179
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store
> /stf/biocgi/limje/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:260
> STACK: Bio::DB::Persistent::PersistentObject::store
> /stf/biocgi/limje/Bio/DB/Persistent/PersistentObject.pm:266
> STACK: ../load_seqdatabase.pl:400
> -----------------------------------------------------------
>
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /stf/sys64/perl/newlib/Bio/Root/Root.pm:342
> STACK: ../load_seqdatabase.pl:409
> -----------------------------------------------------------
>
> Thanks for helping out.
>
> -Jansen
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
More information about the BioSQL-l
mailing list