[Bioperl-l] load_seqdatabase.pl does not like fasta format
Hilmar Lapp
hlapp at gmx.net
Sat Jun 12 02:58:31 EDT 2004
First off, note that if you don't specify a namespace for your
sequences, they will all go into the default namespace ("bioperl"). If
your GenBank load and your fasta file contain redundant sequences and
you want both copies in the database, you will need to specify a
different namespace for each of the two uploads.
What are you printing out to get the NM accession number? The first
line of your stack trace ("Could not store unknown") basically means
that the accession number of your fasta sequence was 'unknown'. The
triple (accession, version, namespace) is enforced as a unique key, and
fasta doesn't provide version numbers, so your sequences will all be
considered identical if the accession is 'unknown' for all of them.
I.e., after the first one is inserted, the second one and all others
will fail to insert. That would also fit the length complaint: the
lookup by unique key finds the first stored sequence (length 2742)
while you are trying to store one of length 1433.
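
To make the unique-key point concrete, here is a minimal sketch in Python with an in-memory SQLite table. It is not the BioSQL schema or the loader's code: the table name `entry`, the helper `try_insert`, and the namespace `refseq_fasta` are hypothetical, and only the three key columns plus a length are modeled. It shows that two records whose accession is 'unknown' collide within one namespace, while the same accession can coexist in two different namespaces.

```python
import sqlite3

# Hypothetical, simplified stand-in for a sequence table: only the three
# columns of the unique key (plus a length) are modeled, not the BioSQL schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE entry (
        accession TEXT    NOT NULL,
        version   INTEGER NOT NULL DEFAULT 0,
        namespace TEXT    NOT NULL,
        length    INTEGER NOT NULL,
        UNIQUE (accession, version, namespace)
    )
    """
)


def try_insert(accession, length, namespace="bioperl", version=0):
    """Insert one row; return True on success, False on a unique-key collision."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO entry (accession, version, namespace, length) "
                "VALUES (?, ?, ?, ?)",
                (accession, version, namespace, length),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("unknown", 2742),    # first record without a real accession
    try_insert("unknown", 1433),    # same (accession, version, namespace): collides
    try_insert("NM_000597", 1433),  # a real accession is accepted
    try_insert("NM_000597", 1433, namespace="refseq_fasta"),  # other namespace: accepted
]
row_count = conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]
print(results, row_count)
```

In this sketch the collision surfaces as a plain constraint violation. The real loader looks up the existing entry by its unique key first, which is why the error in the trace shows up as a length mismatch instead.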
Do you have proper identifiers in the fasta file(s)?
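
One quick way to answer that is to scan the header lines before loading. The sketch below is Python and assumes the identifier is the first whitespace-delimited token after '>'; how the loader maps that token to an accession is not covered here. The function names and the sample records are made up for illustration.

```python
from collections import Counter


def fasta_ids(lines):
    """Yield the first token of each '>' header line ('' if the header is empty)."""
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            yield tokens[0] if tokens else ""


def audit_ids(lines):
    """Return (number of headers without an id, sorted list of repeated ids)."""
    counts = Counter(fasta_ids(lines))
    missing = counts.pop("", 0)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    return missing, duplicates


# Made-up sample: one empty header and one identifier used twice.
sample = [
    ">NM_000367 first record\n",
    "ACGT\n",
    ">\n",
    "GGCC\n",
    ">NM_000367\n",
    "TTAA\n",
]
missing, duplicates = audit_ids(sample)
print(missing, duplicates)
```

For a real file, pass an open file handle instead of `sample`, e.g. `audit_ids(open("refMrna.fa"))`.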
-hilmar
On Friday, June 11, 2004, at 08:09 PM, Andy Hammer wrote:
> I used load_seqdatabase.pl just fine to load over
> 20,000 genbank sequences into a biosql database. Then
> I tried to load a fasta file into a new biosql
> database and got the following:
>
> postgres@westwater:/var/local/ucsc$ ./load_seqdatabase.pl -dbname ucsc -dbuser postgres -format fasta refMrna.fa
> Loading refMrna.fa ...
> Processing NM_000367 at length 2742
> Processing NM_000597 at length 1433
> Could not store unknown:
> ------------- EXCEPTION -------------
> MSG: You're trying to lie about the length: is 1433 but you say 2742
> STACK Bio::PrimarySeq::length /usr/local/share/perl/5.6.1/Bio/PrimarySeq.pm:419
> STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD /usr/local/share/perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm:541
> STACK Bio::Seq::length /usr/local/share/perl/5.6.1/Bio/Seq.pm:612
> STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD /usr/local/share/perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm:541
> STACK Bio::DB::BioSQL::BiosequenceAdaptor::populate_from_row /usr/local/share/perl/5.6.1/Bio/DB/BioSQL/BiosequenceAdaptor.pm:251
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object /usr/local/share/perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1300
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/local/share/perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:977
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /usr/local/share/perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:856
> STACK Bio::DB::BioSQL::PrimarySeqAdaptor::attach_children /usr/local/share/perl/5.6.1/Bio/DB/BioSQL/PrimarySeqAdaptor.pm:284
> STACK Bio::DB::BioSQL::SeqAdaptor::attach_children /usr/local/share/perl/5.6.1/Bio/DB/BioSQL/SeqAdaptor.pm:279
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object /usr/local/share/perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1331
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/local/share/perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:977
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /usr/local/share/perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:856
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/local/share/perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:204
> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/local/share/perl/5.6.1/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:253
> STACK Bio::DB::Persistent::PersistentObject::store /usr/local/share/perl/5.6.1/Bio/DB/Persistent/PersistentObject.pm:270
> STACK (eval) ./load_seqdatabase.pl:521
> STACK toplevel ./load_seqdatabase.pl:504
>
>
> I added the "Processing ... at length ..." lines to see what was
> going on. Only the first entry actually makes it into
> the db. It seems to keep the last sequence in memory
> for some reason. I also tried destroying $seq at
> the end of the loop with a $seq->DESTROY; call but
> got the same results.
>
> Any ideas on this?
> Thanks.
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------