[Bioperl-l] Error while running load_seqdatabase.pl
Hilmar Lapp
hlapp at gmx.net
Tue Jan 9 04:11:38 UTC 2007
George,
this is almost certainly caused by using FASTA format and Bioperl's
treatment of it. I am guilty of not having written a FAQ for
Bioperl-db yet; this question would certainly be in it.
Specifically, the Bioperl fasta SeqIO parser (load_seqdatabase.pl
uses Bioperl to parse sequence files) does not extract the accession
number from the description line of the fasta sequence, and instead
sets the accession_number property of the sequence objects it creates
to "unknown" (the value visible in your error message). Since there is
a unique key constraint on (accession,version,namespace), the second
sequence loaded will raise an exception because it violates the
constraint.
The simplest way to deal with this is to write a SeqProcessor that
massages the accession_number appropriately and then supply the
module to load_seqdatabase.pl using the --pipeline command line switch.
There are several examples for how to do this in the email archives.
See for example this thread on the Biosql list:
http://lists.open-bio.org/pipermail/biosql-l/2005-August/000901.html
with two links to examples, and Marc Logghe gives another one in the
thread itself.
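As a rough illustration of what such a module could look like, here is
a minimal sketch. The package name SeqProcessor::Accession is made up
for this example, and copying display_id into accession_number is only
one possible choice: it assumes the display ids in your file are unique
and fit into the accession column. If your ids embed the real accession,
you would parse it out instead.

```perl
package SeqProcessor::Accession;   # hypothetical module name

use strict;
use warnings;
use base qw(Bio::Seq::BaseSeqProcessor);

# process_seq() is the method a SeqProcessor overrides. It receives one
# sequence object and returns the list of sequence objects to pass on.
sub process_seq {
    my ($self, $seq) = @_;

    my $acc = $seq->accession_number;

    # Only touch sequences for which the parser could not determine an
    # accession; the fasta parser leaves the value at "unknown".
    if (!defined($acc) || $acc eq '' || $acc eq 'unknown') {
        # Assumption: the display id is unique within the namespace.
        $seq->accession_number($seq->display_id);
    }

    return ($seq);
}

1;
```

Saved as SeqProcessor/Accession.pm somewhere in Perl's module search
path, it would be passed to the script by adding
--pipeline "SeqProcessor::Accession" to the command line you already
use.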
Hth,
-hilmar
On Jan 8, 2007, at 3:17 PM, George Heller wrote:
> Hi all.
>
> I am new to Bioperl and am trying to run the load_seqdatabase.pl
> script to load sequence data from a file into Postgres database. I
> am invoking the script through the following command:
>
> perl load_seqdatabase.pl -host localhost -dbname biodb06 -format fasta -dbuser postgres -driver Pg <name of file>
>
> I am getting the following error:
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were ("FGENESHT0000001||AC155633|570|4400|1","FGENESHT0000001||AC155633|570|4400|1","unknown","","0","") FKs (1,<NULL>)
> ERROR: duplicate key violates unique constraint "bioentry_accession_key"
> ---------------------------------------------------
> Could not store unknown:
> ------------- EXCEPTION -------------
> MSG: error while executing statement in Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current transaction is aborted, commands ignored until end of transaction block
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:203
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
> STACK Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm:271
> STACK (eval) load_seqdatabase.pl:620
> STACK toplevel load_seqdatabase.pl:602
> --------------------------------------
> at load_seqdatabase.pl line 633
>
> Can anyone tell me how I can correct this error and get my script
> running? Thanks!!!
>
> George.
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================