[Bioperl-l] loading yeast data failing...

Angshu Kar angshu96 at gmail.com
Tue Jan 3 21:15:11 EST 2006


I'll try that out, Hilmar, and thanks for the clue. :)
I sense a good mentor in you. :)

Thanks again,
Angshu

PS: No one forbade me, but being a tyro I don't feel confident enough to
fiddle with the real data!


On 1/3/06, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> I suggest you read the SeqIO HOWTO and have a look at the FASTA format
> definition (try Google - it's your friend).
>
> Hint: you're answering your own question. Did someone forbid you to
> play around and use the debugger (or simple print statements for that
> matter)?
>
> On 1/3/06, Angshu Kar <angshu96 at gmail.com> wrote:
> > Thanks Hilmar.
> > Now I have another query:
> >
> > Here is the accessor.pm I'm using (one written by Marc):
> >
> > use strict;
> > use vars qw(@ISA);
> > use lib '/home/akar/local/perl/';
> > use Bio::Seq::BaseSeqProcessor;
> > use Bio::SeqFeature::Generic;
> >
> > @ISA = qw(Bio::Seq::BaseSeqProcessor);
> >
> > # Use the sequence's display_id as its accession number.
> > sub process_seq
> > {
> >     my ($self, $seq) = @_;
> >     $seq->accession_number($seq->display_id);
> >     return ($seq);
> > }
> >
> > Could you please let me know what display_id is here? Also, which variable
> > contains the "gi|51013395|gb|AAT92991.1|" string?
> >
> >
> > Thanks,
> > Angshu
> >
> >
> > On 1/3/06, Hilmar Lapp <hlapp at gmx.net> wrote:
> > > On 1/3/06, Angshu Kar <angshu96 at gmail.com> wrote:
> > > > Hi Hilmar,
> > > >
> > > > On what basis should I parse? I found the following 3 (arbitrary)
> > > > entries in the bioentry table. The same 3 entries all went to each of
> > > > the name, identifier and accession fields! And the version field
> > > > contains all 0s!
> > > >
> > > >
> > > > gi|51013395|gb|AAT92991.1|
> > > > gi|732941|emb|CAA54130.1|
> > > >  gi|6321883|ref|NP_011959.1|
> > > >
> > > > So, here for record 1: gi|51013395 is the identifier, AAT92991 is the
> > > > accession number, 1 is the version. Am I right? And then what is the
> > > > name?
> > >
> > > I'd only use 51013395 as the identifier. Other than that: correct.
> > > There is no name in the above examples, either because the entry
> > > doesn't have one designated, or because the tool that wrote the FASTA
> > > file didn't put it into the identifier part. FASTA format doesn't
> > > define these things. Have you checked whether there is a name
> > > somewhere in the description? If there isn't one, I'd default the name
> > > to the accession number.
> > >
> > > >
> > > > Also, I found just the following entry in the same 3 fields in the
> > > > same table:
> > > >
> > > >  AT1G08520.1
> > > >
> > > > I'm not getting this! I used the TAIR6 dataset. How do I parse this
> > > > data? Could you please advise on how to resolve this?
> > >
> > > I have no idea about the TAIR6 datasets - why don't you ask the people
> > > who create those files?
> > >
> > >   -hilmar
> > >
> > > >
> > > > Thanks,
> > > > Angshu
> > > >
> > > >
> > > >
> > > > On 1/3/06, Hilmar Lapp < hlapp at gmx.net> wrote:
> > > > > You could do that, but first, that puts you out of sync with the
> > > > > official schema, and second, if you look at the value, it isn't
> > > > > really an accession number that's causing the problem anyway, but
> > > > > rather a concatenation of identifiers, accession numbers, and
> > > > > namespace acronyms. Since you're already using a custom SeqProcessor,
> > > > > why don't you just add a line or two of code that parses the
> > > > > display_id value into the accession and identifier? (For instance,
> > > > > the token between two '|' characters following the token 'gb'.)
> > > > >
> > > > >    -hilmar
> > > > >
> > > > > On 1/3/06, Angshu Kar < angshu96 at gmail.com> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > > Could you please help me resolve the following error?
> > > > > >
> > > > > > I run:
> > > > > >
> > > > > > > ./load_seqdatabase.pl --dbname=USBA --dbuser=postgres --format=fasta
> > > > > > > --driver=Pg --pipeline="SeqProcessor::Accession" yeast_nrpep.fasta
> > > > > >
> > > > > > The error:
> > > > > >
> > > > > > Loading yeast_nrpep.fasta ...
> > > > > >
> > > > > > -------------------- WARNING ---------------------
> > > > > > > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were
> > > > > > > ("gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111","Unknown
> > > > > > > [Saccharomyces cerevisiae]","0","") FKs (19,<NULL>)
> > > > > > ERROR:  value too long for type character varying(40)
> > > > > > ---------------------------------------------------
> > > > > > > Could not store gi|4261605|gb|AAD13905.1|S58126_11111111111111:
> > > > > > > ------------- EXCEPTION  -------------
> > > > > > > MSG: error while executing statement in
> > > > > > > Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR:  current transaction
> > > > > > > is aborted, commands ignored until end of transaction block
> > > > > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951
> > > > > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
> > > > > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205
> > > > > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
> > > > > > > STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272
> > > > > > > STACK (eval) ./load_seqdatabase.pl:621
> > > > > > > STACK toplevel ./load_seqdatabase.pl:604
> > > > > >
> > > > > > --------------------------------------
> > > > > >
> > > > > >  at ./load_seqdatabase.pl line 634
> > > > > >
> > > > > > > Should I change the field lengths for accession, name and identifier
> > > > > > > to some value >40 in the bioentry table? What should I change them to?
> > > > > >
> > > > > > Thanks,
> > > > > > Angshu
> > > > > >
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
>
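
The parsing step suggested in the thread (pull the identifier and accession
out of the display_id instead of widening the bioentry columns) could look
roughly like the sketch below. It is only an illustration under stated
assumptions: parse_ncbi_id is a hypothetical helper, not a BioPerl function,
and it assumes identifiers of the form gi|<number>|<namespace>|<accession>.<version>|
as in the examples quoted above. It accepts any namespace acronym rather than
only 'gb', and it leaves anything else (such as AT1G08520.1) alone.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: split an NCBI-style FASTA
# identifier such as "gi|51013395|gb|AAT92991.1|" into its parts.
# Returns a hash ref, or undef if the ID is not in gi|...| form.
sub parse_ncbi_id {
    my ($id) = @_;

    # split drops the empty field after a trailing '|'.
    my @tok = split /\|/, $id;
    return undef
        unless @tok >= 4 && $tok[0] eq 'gi' && $tok[1] =~ /^\d+$/;

    # Assumption: the accession is the token right after the namespace
    # acronym (gb, emb, ref, ...); fall back to the next token if empty.
    my $acc_tok = length($tok[3]) ? $tok[3] : ($tok[4] // '');
    return undef unless length($acc_tok);

    # A trailing ".N" is taken as the version; default to 0 when absent.
    my ($accession, $version) =
        $acc_tok =~ /^(.+)\.(\d+)$/ ? ($1, $2) : ($acc_tok, 0);

    return {
        identifier => $tok[1],
        namespace  => $tok[2],
        accession  => $accession,
        version    => $version,
    };
}

# Try it on identifiers quoted in this thread.
for my $id ('gi|51013395|gb|AAT92991.1|',
            'gi|4261605|gb|AAD13905.1|S58126_11111111111111',
            'AT1G08520.1') {
    my $p = parse_ncbi_id($id);
    if ($p) {
        print "$id => id=$p->{identifier} acc=$p->{accession} ",
              "ver=$p->{version}\n";
    }
    else {
        print "$id => not a gi|...| style identifier, left unchanged\n";
    }
}
```

Inside process_seq, one could call parse_ncbi_id($seq->display_id) and, when
it returns a hash, pass $p->{accession} to $seq->accession_number instead of
the whole display_id. Which setters end up in the identifier and version
columns of bioentry is not shown in this thread, so check that against the
BioPerl and BioSQL documentation before relying on it.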


