[Bioperl-l] Loading taxonomy data into BioSQL

Hilmar Lapp hlapp at gmx.net
Fri Mar 18 23:40:33 EST 2005


On Friday, March 18, 2005, at 06:25  AM, SG Edwards wrote:

> Thanks Hilmar,
>
> Yeah I am using Postgres, should I take it out of auto-commit mode?
> I thought the script deals with this but maybe not?

Why would you ever want to run a database in auto-commit mode unless
that's the only option you have, as with MySQL?

If you run this in auto-commit mode, other users will see a totally
inconsistent state, possibly for more than half an hour. The script goes
to great lengths not to leave the transaction unless it really doesn't
know any better.
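
To illustrate the point about auto-commit, here is a minimal Perl sketch
(not taken from load_ncbi_taxonomy.pl) of wrapping a whole load in one
transaction with plain DBI. The helper name insert_in_one_transaction and
its arguments are hypothetical. It assumes a handle that was connected
with AutoCommit => 0 and RaiseError => 1, so that nothing becomes visible
to other sessions until the single commit at the end.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: run a batch of inserts as one
# transaction on an already-connected DBI handle.
# Assumption: the handle was opened with something like
#   DBI->connect("dbi:Pg:dbname=milk", $user, $pass,
#                { AutoCommit => 0, RaiseError => 1 });
sub insert_in_one_transaction {
    my ($dbh, $sql, $rows) = @_;

    my $ok = eval {
        my $sth = $dbh->prepare($sql);
        # Each element of $rows is an array ref of bind values for one row.
        $sth->execute(@$_) for @$rows;
        $dbh->commit;    # other sessions see all rows at once, or none
        1;
    };

    unless ($ok) {
        my $err = $@ || "unknown error\n";
        # Undo everything since the last commit; this only has an effect
        # because AutoCommit is off.
        eval { $dbh->rollback };
        die "load failed, rolled back: $err";
    }

    return scalar @$rows;
}
```

With AutoCommit enabled there is no open transaction to undo, which is
why DBI reports "rollback ineffective with AutoCommit enabled" in the
output quoted further down.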

>
> If I run it with:
>
> perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 -dbpass password -directory /home/s0460205/
>
> I get the error message:
>
> loading NCBI taxon database in /home/s0460205:
> ... retrieving all taxon nodes in the database
>     ... reading in taxon nodes from nodes.dmp
>     ... insert/update/delete taxon nodes
> failed to insert node (1;1;1;no rank;1;0): ERROR: column "taxon_id" is of type integer but expression is of type character varying
> HINT: You will need to rewrite or cast the expression

OK, this is the piece that reveals it. It's a bug in DBD::Pg 1.40
against PostgreSQL 8.0.x servers.

Check here for the thread; a fix is in preparation but apparently
doesn't fully catch it yet.

http://gborg.postgresql.org/pipermail/dbdpg-general/2005-March/001514.html

Maybe 1.41 is out already? Or you can downgrade PostgreSQL to 7.4.x? Or
wait until the DBD::Pg people have fixed it?

In any event, this is beyond our control.

	-hilmar

>
>
> Quoting Hilmar Lapp <hlapp at gmx.net>:
>
>> Why do you believe the script thinks that taxon_id is a varchar? It
>> doesn't AFAIK.
>>
>> Also, not sure why your Pg (you are using PostgreSQL, right?) is in
>> auto-commit mode. That doesn't sound right.
>>
>> 	-hilmar
>>
>> On Friday, March 18, 2005, at 06:05  AM, SG Edwards wrote:
>>
>>> I find that if I manually gunzip and untar the download from NCBI then
>>> the script finds the file nodes.dmp (N.B. not sure if this is a fault
>>> with load_ncbi_taxonomy.pl or something with my system?!)
>>>
>>> The script then tries to load the data into the taxon table, but the
>>> column "taxon_id" is of type INTEGER while the script thinks it is
>>> varchar. So I either need to change the database column to varchar or
>>> change the Perl script to INTEGER.
>>>
>>> Has anyone had this problem?!
>>>
>>>
>>> Quoting s0460205 at sms.ed.ac.uk:
>>>
>>>> I have been trying:
>>>>
>>>> perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 -dbpass password -download
>>>>
>>>> and this gave me the error message below.
>>>> If I download the ncbi_taxonomy data manually and direct the Perl
>>>> script to it using:
>>>>
>>>> perl load_ncbi_taxonomy.pl -dbname milk -driver Pg -dbuser s0460205 -dbpass password -directory /home/s0460205/
>>>>
>>>> This seems to get a bit further but still results in an error:
>>>>
>>>> loading NCBI taxon database in /home/s0460205:
>>>>    ... retrieving all taxon nodes in the database
>>>>    ... reading in taxon nodes from nodes.dmp
>>>> Couldn't open data file taxdata/nodes.dmp: No such file or directory
>>>> rollback ineffective with AutoCommit enabled at load_ncbi_taxonomy.pl line 818.
>>>> Use of uninitialized value in concatenation (.) or string at
>>>> load_ncbi_taxonomy.pl line 820.
>>>> rollback failed
>>>>
>>>> It seems to be choking on finding the nodes.dmp but I'm not sure why?!
>>>>
>>>>
>>>> Quoting Brian Osborne <brian_osborne at cognia.com>:
>>>>
>>>>> SG,
>>>>>
>>>>> =head1 DESCRIPTION
>>>>>
>>>>> This script loads or updates a biosql schema with the NCBI Taxon
>>>>> Database. There are a number of options to do with where the biosql
>>>>> database is (i.e., database name, hostname, user for database,
>>>>> password, database name).
>>>>>
>>>>> This script may download the NCBI Taxon Database from the NCBI FTP
>>>>> server on-the-fly (ftp://ftp.ncbi.nih.gov/pub/taxonomy/). Otherwise
>>>>> it expects the files to be downloaded already.
>>>>>
>>>>>
>>>>>
>>>>> Brian O.
>>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at portal.open-bio.org
>>>>> [mailto:bioperl-l-bounces at portal.open-bio.org]On Behalf Of SG  
>>>>> Edwards
>>>>> Sent: Friday, March 18, 2005 6:45 AM
>>>>> To: bioperl-l at portal.open-bio.org
>>>>> Subject: [Bioperl-l] Loading taxonomy data into BioSQL
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> Can you please help me with an error message? I have just installed a
>>>>> BioSQL database and am trying to run the load_ncbi_taxonomy.pl script
>>>>> to get taxonomy data into my database before I start to load sequences
>>>>> in. The database has been created and is empty, however, I get the
>>>>> following error message:
>>>>>
>>>>>
>>>>> Cannot open Local file taxdata/taxdump.tar.gz: No such file or
>>>>> directory at
>>>>> load_ncbi_taxonom.pl line 628
>>>>> gunzip: taxdata/taxdump.tar.gz: No such file or directory
>>>>> sh: line 1: cd: taxdata: No such file or directory
>>>>> tar: taxdump.tar: cannot open: No such file or directory
>>>>> tar: error is not recoverable: exiting now
>>>>> loading NCBI taxon database in taxdata:
>>>>>        ... retrieving all taxon nodes in the database
>>>>>        ... reading in taxon nodes from nodes.dmp
>>>>> Couldn't open data file taxdata/nodes.dmp: No such file or  
>>>>> directory
>>>>> rollback ineffective with AutoCommit enabled at
>>>>> load_ncbi_taxonomy.pl line
>>>>> 818.
>>>>> Use of uninitialized value in concatenation (.) or string at
>>>>> load_ncbi_taxonomy.pl line 820.
>>>>> rollback failed
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at portal.open-bio.org
>>>>> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>> --
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>>
>>
>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



