[Bioperl-l] error running load_seqdatabase.pl

Hilmar Lapp hlapp at gmx.net
Thu Jan 12 23:28:14 EST 2006


On 1/12/06, Chris Fields <cjfields at uiuc.edu> wrote:
> Looks like the below modification Baohua Wang made to Root.pm works.  I did
> run into another weird issue, but I think it is a sequence formatting
> problem.  I try loading in a file with protein sequences in GenPept format
> (pulled from BLASTP output using Bio::DB::GenPept and saved in a file using
> SeqIO) after changing Root.pm:
> ______________________________________________________________________
>
> C:\Perl\Scripts>load_seqdatabase.pl -dbname biosql -dbuser root -dbpass
> ****** -format genbank -safe NP_252217.gpt
> Loading NP_252217.gpt ...
>
> C:\Perl\Scripts>
> ______________________________________________________________________
>
> Good!

Great! We'll have to test that adding that comma has no negative
effect on Unix platforms, but I suspect it's in fact required by
syntax, and maybe perl on Windows is less lenient? Odd at any rate.
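
For reference, "throw $class (...)" is Perl's indirect object syntax,
which the perlobj documentation describes as relying on parser
heuristics. The sketch below is only an illustration and not the code
in Bio/Root/Root.pm: My::Exception and the caught() helper are made-up
names. It shows the indirect form next to the explicit arrow form,
which does not depend on those heuristics. It assumes a perl where
indirect object syntax is still enabled, which is the default unless a
"use v5.36" or later bundle is in effect.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an exception class; not part of BioPerl.
package My::Exception;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Class method: build an exception object and die with it.
sub throw {
    my ($class, %args) = @_;
    die $class->new(%args);
}

package main;

# Run a code block and return whatever it died with (undef if it lived).
sub caught {
    my ($code) = @_;
    my $ok = eval { $code->(); 1 };
    return $ok ? undef : $@;
}

my $class = 'My::Exception';

# Indirect object syntax: parsed as a method call only because no sub
# named throw() is visible in package main at this point.
my $indirect = caught(sub { throw $class (-text => 'boom') });

# Explicit method call: same effect, no reliance on parser heuristics.
my $explicit = caught(sub { $class->throw(-text => 'boom') });

printf "indirect: %s, explicit: %s\n", ref $indirect, ref $explicit;
```

Whether the arrow form would also have fixed the Windows failure is
untested here; it is only the spelling that leaves the parser no room
for a different interpretation.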

>
> The strangeness comes in when using Genpept seqs NOT passed through SeqIO
> (pulled directly from NCBI, saved in a similar file).  Most sequences will
> load, but a number of them will not:
>
> ______________________________________________________________________
> C:\Perl\Scripts>load_seqdatabase.pl -dbname biosql -dbuser root -dbpass
> ****** -format genbank -safe NP_249092.gpt
> Loading NP_249092.gpt ...
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were
> ("","HAMAPMF_00220","0") FKs ()
> Column 'dbname' cannot be null
> ---------------------------------------------------
> Could not store Q59712:

Are you sure you pulled this from NCBI using NP_249092 as the
accession? I'm asking because NP_249092 is a perfectly sane looking
RefSeq record and in fact does not contain the string HAMAPMF, whereas
Q59712 is in reality a Uniprot record moulded into GenPept format.
Some of its db_xrefs come out odd, and the one above (HAMAPMF_00220)
has no dbname, most likely because dbname and accession are
concatenated, as for the following InterPro db_xref.

So I don't think this is worrisome unless you insist you used the
NP_249092 entry ...
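
To illustrate why a concatenated db_xref ends up without a dbname,
here is a minimal sketch that splits a db_xref value at the first
colon. split_db_xref() is a hypothetical helper and is not claimed to
be how the BioPerl parser does it; 'ExampleDB:X00001' is a made-up
value, and only 'HAMAPMF_00220' comes from the warning above.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl function: split a db_xref value of
# the form "dbname:accession" at the first colon.
sub split_db_xref {
    my ($xref) = @_;
    my ($dbname, $accession) = split /:/, $xref, 2;

    # No colon at all: there is no database name to report.
    return (undef, $xref) unless defined $accession;
    return ($dbname, $accession);
}

for my $xref ('ExampleDB:X00001', 'HAMAPMF_00220') {
    my ($db, $acc) = split_db_xref($xref);
    if (defined $db && length $db) {
        print "database '$db', accession '$acc'\n";
    }
    else {
        print "no database name in '$xref'\n";
    }
}
```

A value with no separator leaves nothing to put into the dbname
column, which is consistent with the "Column 'dbname' cannot be null"
message.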

I would generally advise against taking Uniprot/Swissprot entries from
their GenPept reincarnation. The formats are incompatible in some
aspects (e.g., Swissprot, like EMBL, has first-level db_xrefs, whereas
GenBank format doesn't; instead it puts db_xrefs into the feature
table).

> [...]
> at C:\Perl\Scripts\load_seqdatabase.pl line 633
> Could not store AAU82296:
> ------------- EXCEPTION  -------------
> MSG: create: object (Bio::Species) failed to insert or to be found by unique
> key

"uncultured archaeon GZfos13E1" is not something Bioperl will parse
correctly into the appropriate Bio::Species structure (not that I
would even know what that would have to look like ;).

However, if you preload your Biosql instance with the NCBI taxonomy
database then this is not a problem because the species will be looked
up correctly by its NCBI taxon ID (which the genbank SeqIO parser
extracts from the feature table if it's there - and it is in this
case).

> [...]
> I'll check them out to try and derive what the differences are.  I will also
> pass the above file through SeqIO to see what happens.

Note that everything you pull down through Bio::DB::GenPept does get
parsed by Bio::SeqIO::genbank - if there is any difference it must be
because the input files aren't identical.
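
A quick way to check that is to look for the first line at which the
two files differ. The following is only a sketch: first_difference()
is a hypothetical helper, and the two paths are whatever files you
pass on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based number of the first line at
# which two text files differ, or 0 if they are identical.
sub first_difference {
    my ($path_a, $path_b) = @_;
    open my $fh_a, '<', $path_a or die "Cannot open $path_a: $!";
    open my $fh_b, '<', $path_b or die "Cannot open $path_b: $!";

    my $line_no = 0;
    while (1) {
        my $line_a = <$fh_a>;
        my $line_b = <$fh_b>;
        $line_no++;

        # Both files ended at the same point: no difference found.
        return 0 if !defined $line_a && !defined $line_b;

        # One file is shorter than the other.
        return $line_no if !defined $line_a || !defined $line_b;

        return $line_no if $line_a ne $line_b;
    }
}

# Only runs when two file names are given on the command line.
if (@ARGV == 2) {
    my $n = first_difference(@ARGV);
    print $n ? "first difference at line $n\n" : "files are identical\n";
}
```

The comparison is byte-for-byte per line, so differing line endings
alone will also be reported as a difference.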

> I think it could be some of the GenPept formatted stuff is clogging up the
> works since I saved everything in Genbank format through SeqIO.

Ah - meaning you got the file by calling $seqio->write_seq($seq) ?
That could cause its own problems (even though theoretically it
shouldn't, and if it does, that counts as a bug).

>  For now, though, bioperl-db on
> Windows works!  Any idea why the 'throw' change works?

No, no idea - but great that you found out.

   -hilmar

>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> -----Original Message-----
> From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar Lapp
> Sent: Wednesday, January 11, 2006 5:13 PM
> To: Chris Fields; Steve Chervitz
> Cc: bioperl-l at portal.open-bio.org
> Subject: Re: [Bioperl-l] error running load_seqdatabase.pl
>
> Interesting. That posting didn't receive much attention, did it? So he
> states:
>
> <quote>
> The script failed on throw() in loading Bio/Root/Root.pm on Windows.
> The problem lines are those "throw $class (...".   After I put comma
> after $class as "throw $class, (...", the BioSQL tests and load scripts
> are succeeded
> </quote>
>
> Can anyone of those who wrote the Root exception and warning code
> comment? Maybe Steve?
>
>    -hilmar
>
> On 1/11/06, Chris Fields <cjfields at uiuc.edu> wrote:
> > Hilmar,
> >
> > As an update on what's going on:
> >
> > I've run into a few problems with load_seqdatabase.pl and bioperl-db on
> > cygwin which I'll try to hash through this week; I'll post if I can't
> > figure it out soon.  It's not as buggy as trying to run it using the
> > latest ActivePerl on WinXP, but it still has issues.
> >
> > I'm also looking through the ActiveState documentation for the latest
> > version of perl they have (5.8.7), which I am running.  AFAIK, they enable
> > dynamic loading when building.  I'll send them an email directly to see
> > what they say.  There may be some Win32-specific way of configuring a
> > script for dynamic loading of perl modules which isn't needed in other
> > environments.
> >
> > There was also this previous email on bioperl-l:
> >
> > http://portal.open-bio.org/pipermail/bioperl-l/2005-May/018937.html
> >
> > Baohua Wang seemed to narrow it down somewhat, but I'm not sure if
> > changing the modules is a solution until I figure out why he made the
> > changes.  They seem mainly geared towards getting load_seqdatabase to work
> > with MsSQL, but if he got it to work on Windows, then he may be onto
> > something.  The modified Bio* modules can be found at:
> >
> > ftp://ftp.tc.cornell.edu/Outgoing/bwang/BioSQL-On-Windows
> >
> > I'll check them out to see if they work out and see what specific
> > modifications he made (they're not detailed).
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> > -----Original Message-----
> > From: bioperl-l-bounces at portal.open-bio.org
> > [mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Chris Fields
> > Sent: Friday, January 06, 2006 1:28 PM
> > To: 'Hilmar Lapp'
> > Cc: bioperl-l at portal.open-bio.org
> > Subject: RE: [Bioperl-l] error running load_seqdatabase.pl
> >
> > I'll try installing bioperl-db using Cygwin.  I know that I can connect to
> > the native Windows mysql database from inside cygwin, so perhaps this will
> > do as a short term workaround.  I'll also try using a different native
> > win32 Perl version (maybe 5.6) and look into the dynamic loading issue.  I
> > know that the AS Perl has given errors like this before and not had
> > problems (I think it was also cranky with older versions of bioperl), but
> > this one is pretty serious.
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> > -----Original Message-----
> > From: Hilmar Lapp [mailto:hlapp at gmx.net]
> > Sent: Friday, January 06, 2006 12:02 PM
> > To: Chris Fields
> > Cc: bioperl-l at portal.open-bio.org
> > Subject: Re: [Bioperl-l] error running load_seqdatabase.pl
> >
> >
> > On Jan 6, 2006, at 9:20 AM, Chris Fields wrote:
> >
> > > Hilmar,
> > >
> > > Did this ever get resolved?  I tried to reinstall a biosql database
> > > using bioperl-db and got the same problems.  I'll list out everything I
> > > ran into and what I plan on trying, as it's been a long time since I've
> > > tried this.
> > >
> > > Currently, I'm using ActiveState Perl 5.8.7.813 on WinXP and MySQL
> > > 4.1.14.  Using nmake and installing worked fine.  Loading the biosql
> > > schema and loading taxonomy info also worked fine, although I had to
> > > manually untar the taxonomy archive so load_ncbi_taxonomy.pl could find
> > > the files (stupid windows).  However, this is what happens when using
> > > load_seqdatabase.pl:
> > >
> > > C:\Perl\Scripts>load_seqdatabase.pl -dbname dihydroorotase -dbuser root
> > > NP_249092.gpt
> > > Loading NP_249092.gpt ...
> > > Undefined subroutine &Bio::Root::Root::debug called at
> > > C:/Perl/site/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm line 1537,
> > > <GEN0> line 65.
> > >
> > > If I removed all args except the sequence file, it gives the same
> > > response, which means it happens before the connection is made to the
> > > database:
> > >
> >
> > This happens indeed before a connection is made because it happens at
> > the point it tries to dynamically load the BioSQL driver for the
> > adaptor:
> >
> >         $self->debug("attempting to load driver for adaptor class $class\n");
> >
> > The BioSQL driver is loaded before the DBD driver is loaded.
> >
> > The module in which this happens (i.e., the persistence adaptor) has
> > been loaded dynamically as well.
> >
> > Bio::Root::Root is in the 'use' statements, and the debug() method
> > clearly exists. I'm at a loss as to why perl complains on certain
> > Windows platforms. If somebody can tell me what, if anything, can be
> > done to make this work on those platforms too I'll be glad to implement
> > it.
> >
> > > [...]
> > > Here's the error messages from that first test (warning it's very
> > > messy):
> > >
> > > C:\Perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0,
> > > 'blib\lib', 'blib\arch')" t\01dbadaptor.t t\02species.t t\03simpleseq.t
> > > t\04swiss.t t\05seqfeature.t t\06comment.t t\07dblink.t t\08genbank.t
> > > t\09fuzzy2.t t\10ensembl.t t\11locuslink.t t\12ontology.t t\13remove.t
> > > t\14query.t t\15cluster.t
> > > t\01dbadaptor.....ok 1/19Subroutine new redefined at
> > > [...]
> > > Subroutine debug redefined at C:/Perl/site/lib/Bio\Root\Root.pm line 356.
> >
> > So obviously it is there, right? So why doesn't perl see it a minute
> > later?
> >
> > > [...]
> > > I'll end with that.  At this moment, I can't see it working with the
> > > current setup.  I was using perl 5.8 with the old setup but I upgraded
> > > mysql at some point when working with gbrowse (I can't remember what the
> > > old version was); I'll try upgrading to the newest ActiveState version to
> > > see what happens.  Could it be the MySQL version?
> >
> > I don't think it has anything to do with the MySQL version, or the DBD
> > driver for that matter. Instead, it looks like an issue with dynamic
> > loading of perl modules on your particular platform.
> >
> >         -hilmar
> >
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher - Switzer Lab
> > > Dept. of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > >
> > --
> > -------------------------------------------------------------
> > Hilmar Lapp                            email: lapp at gnf.org
> > GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> > -------------------------------------------------------------
> >
> >
>
>
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------


