[Bioperl-l] error running load_seqdatabase.pl

Chris Fields cjfields at uiuc.edu
Fri Jan 13 11:06:43 EST 2006


Sorry, I should have clarified: NP_249092.gpt is a file containing all protein
sequences with significant BLASTP hits to NP_249092, in GenPept format (it
also includes the sequence NP_249092 itself).  Only some of these had
problems, all of which appear to be Uniprot records.  My script could not
download the sequences because of NCBI's limit on batch sequence extraction,
so I used the Batch Entrez interface instead (i.e. they come directly from
the protein database at NCBI).  NP_252217.gpt is the same kind of file
(significant hits to NP_252217) but had fewer hits, so batch extraction
through Bio::DB::GenPept worked (the sequences were then written out in
GenBank format through Bio::SeqIO).  As reported before, there were no errors
with that file.

The other issue, with taxonomy, was my mistake: I had dropped the old
database and reinstalled the schema but forgot to reload the taxonomic info.
Loading it with load_ncbi_taxonomy.pl fixed the problem.

I think we should give credit to Baohua Wang for spotting the change to
throw.  If it pans out, this may be what is behind the error messages that
pop up every once in a while with bioperl scripts.  One thing of note: Steve
mentions that Error.pm should be present:

> -----Original Message-----
> From: Steve Chervitz [mailto:Steve_Chervitz at affymetrix.com]
> Sent: Friday, January 13, 2006 4:26 AM
> To: Hilmar Lapp
> Cc: Chris Fields; Steve Chervitz; bioperl-l at portal.open-bio.org
> Subject: Re: [Bioperl-l] error running load_seqdatabase.pl
> 
> looks like the trouble is when Bio::Root::Root::throw() tries to call
> Error::throw(). Perhaps there is some windows-specific problem with
> Error.pm? Can't say I've seen this before since I don't use perl on
> windows.
> 
> Some things to try, in this order:
> 
> * Verify that Error.pm is installed for perl on your system.
> * Try running t/Exception.t and
> the examples/root/exceptions[1-4].pl scripts and see if they
> produce the expected behavior.
> * Try changing the 'throw $class ...' statements in Root.pm to
> 'Error::throw $class ...'
> * If Error.pm seems to be installed but isn't working right, either
> uninstall it or get in the habit of putting this line in your main
> scripts: INIT { $DONT_USE_ERROR=1; }
> 
> Steve

Error.pm didn't come up as a requirement when creating the PPM distro.  It
also isn't included with ActivePerl, but it is available.  I've installed it
and will go through the steps above with an unmodified Root.pm to see if it
changes anything.
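
For anyone following along, here is a rough sketch of the first check on
Steve's list, plus the 'throw $class (...)' form under discussion next to the
explicit arrow call.  The package My::Thrower and its throw() are stand-ins I
made up for illustration; this is not the code in Bio/Root/Root.pm, and it
assumes nothing beyond core perl.

```perl
use strict;
use warnings;

# Step 1: can Error.pm be loaded at all?  require() dies when the module
# is missing, so trap that with eval and record a plain 0/1 flag.
my $have_error = eval { require Error; 1 } ? 1 : 0;
print "Error.pm ", ($have_error ? "loads fine" : "could not be loaded"), "\n";

# Hypothetical stand-in class, NOT Bio::Root::Root: its throw() simply
# dies with the class name and the -text argument it was given.
package My::Thrower;
sub throw {
    my ($class, %args) = @_;
    my $text = $args{'-text'};
    die "$class: $text\n";
}

package main;

my $class = 'My::Thrower';
my @messages;

# Indirect object syntax, the 'throw $class (...)' style from this thread.
# The parser has to guess that 'throw' is a method name here.
eval { throw $class ( -text => 'demo' ); };
push @messages, $@;

# Explicit arrow syntax: the same call, with nothing left to guess.
eval { $class->throw( -text => 'demo' ); };
push @messages, $@;

print "indirect: $messages[0]";
print "arrow:    $messages[1]";
```

If the two lines printed at the end match, perl is parsing both forms as the
same method call in this toy case.  That says nothing by itself about what
happens inside Root.pm on Windows, but it is a quick sanity check of the
local perl.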

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar
> Lapp
> Sent: Thursday, January 12, 2006 10:28 PM
> To: Chris Fields
> Cc: Steve Chervitz; bioperl-l at portal.open-bio.org
> Subject: Re: [Bioperl-l] error running load_seqdatabase.pl
> 
> On 1/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
> > Looks like the below modification Baohua Wang made to Root.pm works.  I did
> > run into another weird issue, but I think it is a sequence formatting
> > problem.  I try loading in a file with protein sequences in GenPept format
> > (pulled from BLASTP output using Bio::DB::GenPept and saved in a file using
> > SeqIO) after changing Root.pm:
> > ______________________________________________________________________
> >
> > C:\Perl\Scripts>load_seqdatabase.pl -dbname biosql -dbuser root -dbpass
> > ****** -format genbank -safe NP_252217.gpt
> > Loading NP_252217.gpt ...
> >
> > C:\Perl\Scripts>
> > ______________________________________________________________________
> >
> > Good!
> 
> Great! So we'll have to test that the effect of adding that comma
> isn't negative on Unix platforms but I suspect it's in fact required
> by syntax and maybe on Windows perl is less lenient? Odd at any rate.
> 
> >
> > The strangeness comes in when using Genpept seqs NOT passed through SeqIO
> > (pulled directly from NCBI, saved in a similar file).  Most sequences will
> > load, but a number of them will not:
> >
> > ______________________________________________________________________
> > C:\Perl\Scripts>load_seqdatabase.pl -dbname biosql -dbuser root -dbpass
> > ****** -format genbank -safe NP_249092.gpt
> > Loading NP_249092.gpt ...
> >
> > -------------------- WARNING ---------------------
> > MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were
> > ("","HAMAPMF_00220","0") FKs ()
> > Column 'dbname' cannot be null
> > ---------------------------------------------------
> > Could not store Q59712:
> 
> Are you sure you pulled this from NCBI using NP_249092 as the
> accession? I'm asking because NP_249092 is a perfectly sane looking
> RefSeq record and in fact does not contain the string HAMAPMF, whereas
> Q59712 in reality is a Uniprot record moulded into GenPept format;
> some of the db_xrefs come out odd and in fact for the one above
> (HAMAPMF_00220) there is no dbname, most likely because dbname and
> accession are concatenated like for the following InterPro db_xref.
> 
> So I don't think this is worrisome unless you insist you used the
> NP_249092 entry ...
> 
> I would generally advise against taking Uniprot/Swissprot entries from
> their GenPept reincarnation. The formats are incompatible in some
> aspects (e.g., Swissprot, like EMBL, has first-level db_xrefs, whereas
> GenBank format doesn't; instead it puts db_xrefs into the feature
> table).
> 
> > [...]
> > at C:\Perl\Scripts\load_seqdatabase.pl line 633
> > Could not store AAU82296:
> > ------------- EXCEPTION  -------------
> > MSG: create: object (Bio::Species) failed to insert or to be found by unique
> > key
> 
> "uncultured archaeon GZfos13E1" is not something Bioperl will parse
> correctly into the appropriate Bio::Species structure (not that I
> would even know what that would have to look like ;).
> 
> However, if you preload your Biosql instance with the NCBI taxonomy
> database then this is not a problem because the species will be looked
> up correctly by its NCBI taxon ID (which the genbank SeqIO parser
> extracts from the feature table if it's there - and it is in this
> case).
> 
> > [...]
> > I'll check them out to try and derive what the differences are.  I will also
> > pass the above file through SeqIO to see what happens.
> 
> Note that everything you pull down through Bio::DB::GenPept does get
> parsed by Bio::SeqIO::genbank - if there is any difference it must be
> because the input files aren't identical.
> 
> > I think it could be some of the GenPept formatted stuff is clogging up the
> > works since I saved everything in Genbank format through SeqIO.
> 
> Ah - meaning you got the file by calling $seqio->write_seq($seq) ?
> That could cause its own problems (even though theoretically it
> shouldn't and therefore if it does it counts as a bug).
> 
> >  For now, though, bioperl-db on
> > Windows works!  Any idea why the 'throw' change works?
> 
> No, no idea - but great that you found out.
> 
>    -hilmar
> 
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> > -----Original Message-----
> > From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar Lapp
> > Sent: Wednesday, January 11, 2006 5:13 PM
> > To: Chris Fields; Steve Chervitz
> > Cc: bioperl-l at portal.open-bio.org
> > Subject: Re: [Bioperl-l] error running load_seqdatabase.pl
> >
> > Interesting. That posting didn't receive much attention did it. So he
> > states:
> >
> > <quote>
> > The script failed on throw() in loading Bio/Root/Root.pm on Windows.
> > The problem lines are those "throw $class (...".   After I put comma
> > after $class as "throw $class, (...", the BioSQL tests and load scripts
> > are succeeded
> > </quote>
> >
> > Can anyone of those who wrote the Root exception and warning code
> > comment? Maybe Steve?
> >
> >    -hilmar
> >
> > On 1/11/06, Chris Fields <cjfields at uiuc.edu> wrote:
> > > Hilmar,
> > >
> > > As an update on what's going on:
> > >
> > > I've run into a few problems with load_seqdatabase.pl and bioperl-db on
> > > cygwin which I'll try to hash through this week; I'll post if I can't
> > > figure it out soon.  It's not as buggy as trying to run it using the latest
> > > ActivePerl on WinXP, but it still has issues.
> > >
> > > I'm also looking through the ActiveState documentation for the latest
> > > version of perl they have (5.8.7), which I am running.  AFAIK, they enable
> > > dynamic loading when building.  I'll send them an email directly to see what
> > > they say.  There may be some Win32-specific way of configuring a script for
> > > dynamic loading of perl modules which isn't needed in other environments.
> > >
> > > There was also this previous email on bioperl-l:
> > >
> > > http://portal.open-bio.org/pipermail/bioperl-l/2005-May/018937.html
> > >
> > > Baohua Wang seemed to narrow it down somewhat, but I'm not sure if changing
> > > the modules is a solution until I figure out why he made the changes.  They
> > > seem mainly geared towards getting load_seqdatabase to work with MsSQL, but
> > > if he got it to work on Windows, then he may be onto something.  The
> > > modified Bio* modules can be found at:
> > >
> > > ftp://ftp.tc.cornell.edu/Outgoing/bwang/BioSQL-On-Windows
> > >
> > > I'll check them out to see if they work out and see what specific
> > > modifications he made (they're not detailed).
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher - Switzer Lab
> > > Dept. of Biochemistry
> > > University of Illinois Urbana-Champaign
> > > -----Original Message-----
> > > From: bioperl-l-bounces at portal.open-bio.org
> > > [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Chris Fields
> > > Sent: Friday, January 06, 2006 1:28 PM
> > > To: 'Hilmar Lapp'
> > > Cc: bioperl-l at portal.open-bio.org
> > > Subject: RE: [Bioperl-l] error running load_seqdatabase.pl
> > >
> > > I'll try installing bioperl-db using Cygwin.  I know that I can connect to
> > > the native Windows mysql database from inside cygwin, so perhaps this will
> > > do as a short term workaround.  I'll also try using a different native win32
> > > Perl version (maybe 5.6) and look into the dynamic loading issue.  I know
> > > that the AS Perl has given errors like this before and not had problems (I
> > > think it was also cranky with older versions of bioperl), but this one is
> > > pretty serious.
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher - Switzer Lab
> > > Dept. of Biochemistry
> > > University of Illinois Urbana-Champaign
> > > -----Original Message-----
> > > From: Hilmar Lapp [mailto:hlapp at gmx.net]
> > > Sent: Friday, January 06, 2006 12:02 PM
> > > To: Chris Fields
> > > Cc: bioperl-l at portal.open-bio.org
> > > Subject: Re: [Bioperl-l] error running load_seqdatabase.pl
> > >
> > >
> > > On Jan 6, 2006, at 9:20 AM, Chris Fields wrote:
> > >
> > > > Hilmar,
> > > >
> > > > Did this ever get resolved?  I tried to reinstall a biosql database using
> > > > bioperl-db and got the same problems.  I'll list out everything I ran into
> > > > and what I plan on trying, as it's been a long time since I've tried this.
> > > >
> > > > Currently, I'm using ActiveState Perl 5.8.7.813 on WinXP and MySQL 4.1.14.
> > > > Using nmake and installing worked fine.  Loading the biosql schema and
> > > > loading taxonomy info also worked fine, although I had to manually untar the
> > > > taxonomy archive so load_ncbi_taxonomy.pl could find the files (stupid
> > > > windows).  However, this is what happens when using load_seqdatabase.pl:
> > > >
> > > > C:\Perl\Scripts>load_seqdatabase.pl -dbname dihydroorotase -dbuser root
> > > > NP_249092.gpt
> > > > Loading NP_249092.gpt ...
> > > > Undefined subroutine &Bio::Root::Root::debug called at
> > > > C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 1537, <GEN0>
> > > > line 65.
> > > >
> > > > If I removed all args except the sequence file, it gives the same response,
> > > > which means it happens before the connection is made to the database:
> > > >
> > >
> > > This happens indeed before a connection is made because it happens at
> > > the point it tries to dynamically load the BioSQL driver for the
> > > adaptor:
> > >
> > >         $self->debug("attempting to load driver for adaptor class $class\n");
> > >
> > > The BioSQL driver is loaded before the DBD driver is loaded.
> > >
> > > The module in which this happens (i.e., the persistence adaptor) has
> > > been loaded dynamically as well.
> > >
> > > Bio::Root::Root is in the 'use' statements, and the debug() method
> > > clearly exists. I'm at a loss as to why perl complains on certain
> > > Windows platforms. If somebody can tell me what, if anything, can be
> > > done to make this work on those platforms too I'll be glad to implement
> > > it.
> > >
> > > > [...]
> > > > Here's the error messages from that first test (warning it's very messy):
> > > >
> > > > C:\Perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0,
> > > > 'blib\lib', 'blib\arch')" t\01dbadaptor.t t\02species.t t\03simpleseq.t
> > > > t\04swiss.t t\05seqfeature.t t\06comment.t t\07dblink.t t\08genbank.t
> > > > t\09fuzzy2.t t\10ensembl.t t\11locuslink.t t\12ontology.t t\13remove.t
> > > > t\14query.t t\15cluster.t
> > > > t\01dbadaptor.....ok 1/19Subroutine new redefined at
> > > > [...]
> > > > Subroutine debug redefined at C:/Perl/site/lib/Bio\Root\Root.pm line 356.
> > >
> > > So obviously it is there, right? So why doesn't perl see it a minute
> > > later?
> > >
> > > > [...]
> > > > I'll end with that.  At this moment, I can't see it working with the current
> > > > setup.  I was using perl 5.8 with the old setup but I upgraded mysql at some
> > > > point when working with gbrowse (I can't remember what the old version was);
> > > > I'll try upgrading to the newest ActiveState version to see what happens.
> > > > Could it be the MySQL version?
> > >
> > > I don't think it has anything to do with the MySQL version, or the DBD
> > > driver for that matter. Instead, it looks like an issue with dynamic
> > > loading of perl modules on your particular platform.
> > >
> > >         -hilmar
> > >
> > > >
> > > > Christopher Fields
> > > > Postdoctoral Researcher - Switzer Lab
> > > > Dept. of Biochemistry
> > > > University of Illinois Urbana-Champaign
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at portal.open-bio.org
> > > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > > >
> > > --
> > > -------------------------------------------------------------
> > > Hilmar Lapp                            email: lapp at gnf.org
> > > GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> > > -------------------------------------------------------------
> > >
> > >
> > >
> > >
> >
> >
> > --
> > ----------------------------------------------------------
> > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> > ----------------------------------------------------------
> >
> >
> 
> 
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------


