[Bioperl-l] loading yeast data failing...
Angshu Kar
angshu96 at gmail.com
Tue Jan 3 20:56:21 EST 2006
Thanks Hilmar.
Now I've another query:
Here is the accessor.pm I'm using (one written by Marc):
use strict;
use vars qw(@ISA);
use lib '/home/akar/local/perl/';
use Bio::Seq::BaseSeqProcessor;
use Bio::SeqFeature::Generic;
@ISA = qw(Bio::Seq::BaseSeqProcessor);

# Called for each sequence in the pipeline; copies the display ID
# unchanged into the accession number.
sub process_seq
{
    my ($self, $seq) = @_;
    $seq->accession_number($seq->display_id);
    return ($seq);
}
Could you please tell me what display_id is here? Also, which variable
contains the "gi|51013395|gb|AAT92991.1|" string?
Thanks,
Angshu
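
For reference, here is a minimal Perl sketch of the parsing suggested in the
quoted reply below (taking the token after 'gb' as the accession). It assumes
that display_id is the first whitespace-delimited token of the FASTA header
line and that the rest of the line is the description, which is how the
BioPerl FASTA parser appears to behave. The helper names split_fasta_header
and parse_ncbi_id are hypothetical and not part of BioPerl, and the header
line in the demo is reconstructed from the warning message rather than copied
from the file.

```perl
use strict;
use warnings;

# Assumption: the ID is the first whitespace-delimited token after '>',
# and everything after it is the description.
sub split_fasta_header {
    my ($header) = @_;
    $header =~ s/^>\s*//;
    my ($id, $desc) = split /\s+/, $header, 2;
    return ($id, defined $desc ? $desc : '');
}

# Hypothetical helper: break an NCBI-style ID such as
# "gi|51013395|gb|AAT92991.1|" into its parts. Returns an empty list
# for IDs that do not follow this pattern (e.g. "AT1G08520.1").
sub parse_ncbi_id {
    my ($display_id) = @_;
    return unless defined $display_id;
    if ($display_id =~ /^gi\|(\d+)\|([a-z]+)\|([^|.\s]+)(?:\.(\d+))?(?:\||$)/) {
        return (
            identifier => $1,                    # GI number only, without "gi|"
            namespace  => $2,                    # gb, emb, ref, ...
            accession  => $3,                    # accession without version
            version    => defined $4 ? $4 : 0,   # 0 if no ".N" suffix
        );
    }
    return;
}

# Demo: header reconstructed from the warning message in this thread.
my ($demo_id, $demo_desc) = split_fasta_header(
    '>gi|4261605|gb|AAD13905.1|S58126_11111111111111 Unknown [Saccharomyces cerevisiae]');
print "display_id: $demo_id\ndescription: $demo_desc\n";

for my $id ($demo_id, 'gi|732941|emb|CAA54130.1|', 'AT1G08520.1') {
    my %p = parse_ncbi_id($id);
    if (%p) {
        print "$id => identifier=$p{identifier} accession=$p{accession} version=$p{version}\n";
    } else {
        print "$id => not an NCBI-style ID, left unchanged\n";
    }
}
```

Inside process_seq, the parsed values could then replace the single
accession_number call, for example by setting $seq->primary_id,
$seq->accession_number and $seq->version when the returned hash is non-empty.
That mapping onto the identifier, accession and version columns is an
assumption and should be checked against the bioperl-db documentation.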
On 1/3/06, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> On 1/3/06, Angshu Kar <angshu96 at gmail.com> wrote:
> > Hi Hilmar,
> >
> > On what basis should I parse? I found the following 3 (arbitrary)
> > entries in the bioentry table. The same 3 entries all went to each of
> > the name, identifier and accession fields! And the version field
> > contains all 0s!
> >
> >
> > gi|51013395|gb|AAT92991.1|
> > gi|732941|emb|CAA54130.1|
> > gi|6321883|ref|NP_011959.1|
> >
> > So, here for record 1: gi|51013395 is the identifier, AAT92991 is the
> > accession number, 1 is the version. Am I right? And then what is the
> > name?
>
> I'd only use 51013395 as the identifier. Other than that: correct.
> There is no name in the above examples, either because the entry
> doesn't have one designated, or because the tool that wrote the FASTA
> file didn't put it into the identifier part. FASTA format doesn't
> define these things. Have you checked whether the description contains
> a name somewhere? If there isn't one, I'd default the name to the
> accession number.
>
> >
> > Also, I found just the following entry in the same 3 fields in the
> > same table:
> >
> > AT1G08520.1
> >
> > I don't understand this! I used the TAIR6 dataset. How should I parse
> > this data? Could you please advise on how to resolve this?
>
> I have no idea about the TAIR6 datasets - why don't you ask the people
> who create those files?
>
> -hilmar
>
> >
> > Thanks,
> > Angshu
> >
> >
> >
> > On 1/3/06, Hilmar Lapp <hlapp at gmx.net> wrote:
> > > You could do that, but first, that puts you out of sync with the
> > > official schema, and second, if you look at the value that's causing
> > > the problem, it isn't really an accession number anyway but rather a
> > > concatenation of identifiers, accession numbers, and namespace
> > > acronyms. Since you're already using a custom SeqProcessor anyway, why
> > > don't you just add a line or two of code that parses the display_id
> > > value into the accession and identifier? (For instance, the accession
> > > is the token between the two '|' characters following the token 'gb'.)
> > >
> > > -hilmar
> > >
> > > On 1/3/06, Angshu Kar <angshu96 at gmail.com> wrote:
> > > > Hi,
> > > >
> > > > Could you please help me resolve the following error?
> > > >
> > > > I run:
> > > >
> > > > ./load_seqdatabase.pl --dbname=USBA --dbuser=postgres --format=fasta \
> > > >     --driver=Pg --pipeline="SeqProcessor::Accession" yeast_nrpep.fasta
> > > >
> > > > The error:
> > > >
> > > > Loading yeast_nrpep.fasta ...
> > > >
> > > > -------------------- WARNING ---------------------
> > > > MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values were
> > > > ("gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111","gi|4261605|gb|AAD13905.1|S58126_11111111111111","Unknown [Saccharomyces cerevisiae]","0","") FKs (19,<NULL>)
> > > > ERROR: value too long for type character varying(40)
> > > > ---------------------------------------------------
> > > > Could not store gi|4261605|gb|AAD13905.1|S58126_11111111111111:
> > > > ------------- EXCEPTION -------------
> > > > MSG: error while executing statement in
> > > > Bio::DB::BioSQL::SeqAdaptor::find_by_unique_key: ERROR: current transaction
> > > > is aborted, commands ignored until end of transaction block
> > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:951
> > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
> > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205
> > > > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store /home/akar/local/perl//Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
> > > > STACK Bio::DB::Persistent::PersistentObject::store /home/akar/local/perl//Bio/DB/Persistent/PersistentObject.pm:272
> > > > STACK (eval) ./load_seqdatabase.pl:621
> > > > STACK toplevel ./load_seqdatabase.pl:604
> > > >
> > > > --------------------------------------
> > > >
> > > > at ./load_seqdatabase.pl line 634
> > > >
> > > > Should I change the field lengths for accession, name and identifier
> > > > in the bioentry table to some value >40? What should I change them to?
> > > >
> > > > Thanks,
> > > > Angshu
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at portal.open-bio.org
> > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > > >
> > >
> > >
> > > --
> > >
> > ----------------------------------------------------------
> > > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> > >
> > ----------------------------------------------------------
> > >
> >
> >
>
>
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
>