[Bioperl-l] loading yeast data failing...

Hilmar Lapp hlapp at gmx.net
Tue Jan 3 21:07:54 EST 2006


I suggest you read the SeqIO HOWTO and have a look at the FASTA format
definition (try Google - it's your friend).

Hint: you're answering your own question. Did someone forbid you to
play around and use the debugger (or simple print statements for that
matter)?
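
To make the hint concrete: the FASTA parser in Bio::SeqIO puts the first whitespace-delimited token of the header line into display_id, so that is where the "gi|...|" string ends up. Below is a minimal sketch of splitting such a value apart. It assumes the IDs follow the gi|<number>|<db>|<accession>.<version>| layout seen in this thread; parse_ncbi_id is a hypothetical helper name, not a BioPerl function.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split an NCBI-style FASTA ID
# such as "gi|51013395|gb|AAT92991.1|" into its parts.
# Returns undef if the ID does not start with gi|<number>|<db>|<acc>.
sub parse_ncbi_id {
    my ($id) = @_;
    my @tok = split /\|/, $id;    # trailing empty fields are dropped by split
    return unless @tok >= 4 && $tok[0] eq 'gi';

    my ($gi, $db, $accver) = @tok[1, 2, 3];
    # Separate an optional ".<digits>" version suffix from the accession.
    my ($acc, $ver) = $accver =~ /^(.+?)(?:\.(\d+))?$/;

    return {
        identifier => $gi,
        namespace  => $db,
        accession  => $acc,
        version    => defined $ver ? $ver : 0,
    };
}

my $p = parse_ncbi_id('gi|51013395|gb|AAT92991.1|');
print join(' ', @{$p}{qw(identifier namespace accession version)}), "\n";
# prints: 51013395 gb AAT92991 1
```

Inside process_seq one could then call, for example, $seq->accession_number($p->{accession}) and $seq->primary_id($p->{identifier}) when the helper returns a result, and leave the sequence untouched otherwise (an ID like AT1G08520.1 does not match this layout). Whether primary_id is the right slot for the identifier column in your setup is an assumption to verify.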

On 1/3/06, Angshu Kar <angshu96 at gmail.com> wrote:
> Thanks Hilmar.
> Now I've another query:
>
> Here is the accessor.pm I'm using (one written by Marc):
>
> use strict;
> use vars qw(@ISA);
> use lib '/home/akar/local/perl/';
> use Bio::Seq::BaseSeqProcessor;
> use Bio::SeqFeature::Generic;
>
> @ISA = qw(Bio::Seq::BaseSeqProcessor);
>
> sub process_seq
> {
>     my ($self, $seq) = @_;
>     $seq->accession_number($seq->display_id);
>     return ($seq);
> }
>
> Could you please let me know what display_id is here? Also, which variable
> contains the "gi|51013395|gb|AAT92991.1|" string?
>
>
> Thanks,
> Angshu
>
>
> On 1/3/06, Hilmar Lapp <hlapp at gmx.net> wrote:
> > On 1/3/06, Angshu Kar <angshu96 at gmail.com> wrote:
> > > Hi Hilmar,
> > >
> > > On what basis should I parse? I found the following 3 entries (arbitrary) in
> > > the bioentry table. The same 3 entries all went to each of the name,
> > > identifier and accession fields! And the version field contains all 0s!
> > >
> > >
> > > gi|51013395|gb|AAT92991.1|
> > > gi|732941|emb|CAA54130.1|
> > > gi|6321883|ref|NP_011959.1|
> > >
> > > So, here for record 1: gi|51013395 is the identifier, AAT92991 is the
> > > accession number, 1 is the version. Am I right? And then what is the name?
> >
> > I'd only use 51013395 as the identifier. Other than that: correct.
> > There is no name in the above examples, either because the entry
> > doesn't have one designated, or because the tool that wrote the FASTA
> > file didn't put it into the identifier part. FASTA format doesn't
> > define these things. Have you checked whether the description contains
> > a name somewhere? If there isn't one, I'd default name to the accession
> > number.
> >
> > >
> > > Also, I found just the following entry in the same 3 fields in the same
> > > table:
> > >
> > > AT1G08520.1
> > >
> > > I'm not getting this! I used the TAIR6 dataset. How do I parse this data?
> > > Could you please advise on how to resolve this?
> >
> > I have no idea about the TAIR6 datasets - why don't you ask the people
> > who create those files?
> >
> >   -hilmar
> >
> > >
> > > Thanks,
> > > Angshu
> > >
> > >
> > >
> > > On 1/3/06, Hilmar Lapp <hlapp at gmx.net> wrote:
> > > > You could do that, but first, it puts you out of sync with the
> > > > official schema, and second, if you look at the value that's causing
> > > > the problem, it isn't really an accession number anyway but rather a
> > > > concatenation of identifiers, accession numbers, and namespace
> > > > acronyms. Since you're already using a custom SeqProcessor, why
> > > > don't you just add a line or two of code that parses the display_id
> > > > value into the accession and identifier? (For instance, the accession
> > > > is the token between the two '|' characters following the token 'gb'.)
> > > >
> > > >    -hilmar
> > > >
> > > > On 1/3/06, Angshu Kar <angshu96 at gmail.com> wrote:
> > > > > Hi,
> > > > >
> > > > > Could you please help me resolve the following error?
> > > > >
> > > > > I run:
> > > > >
> > > > > ./load_seqdatabase.pl --dbname=USBA --dbuser=postgres --format=fasta
> > > > > --driver=Pg --pipeline="SeqProcessor::Accession" yeast_nrpep.fasta
> > > > >
> > > > > The error:
> > > > >
> > > > > Loading yeast_nrpep.fasta ...
> > > > >
> > > > > -------------------- WARNING ---------------------
> > > > > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were
> > > > > ("gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111","Unknown [Saccharomyces cerevisiae]","0","") FKs (19,<NULL>)
> > > > > ERROR:  value too long for type character varying(40)
> > > > > ---------------------------------------------------
> > > > > Could not store gi|4261605|gb|AAD13905.1|S58126_11111111111111:
> > > > > ------------- EXCEPTION  -------------
> > > > > MSG: error while executing statement in
> > > > > Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR:  current transaction
> > > > > is aborted, commands ignored until end of transaction block
> > > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951
> > > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
> > > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205
> > > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
> > > > > STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272
> > > > > STACK (eval) ./load_seqdatabase.pl:621
> > > > > STACK toplevel ./load_seqdatabase.pl:604
> > > > >
> > > > > --------------------------------------
> > > > >
> > > > >  at ./load_seqdatabase.pl line 634
> > > > >
> > > > > Should I change the field lengths for accession, name and identifier to some
> > > > > value >40 in the bioentry table? What should I change it to?
> > > > >
> > > > > Thanks,
> > > > > Angshu
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > ----------------------------------------------------------
> > > > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> > > > ----------------------------------------------------------
> > > >
> > >
> > >
> >
> >
> >
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------


