[BioSQL-l] load_ncbi_taxonomy.pl

Peter biopython at maubp.freeserve.co.uk
Fri Aug 1 20:29:23 UTC 2008


On Fri, Aug 1, 2008 at 7:18 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
>
> On Aug 1, 2008, at 1:35 PM, Peter wrote:
>
>>> So long story short it [load_ncbi_taxonomy.pl] should be fixed now. There
>>> may be some remnant bugs so any testing would be much appreciated.
>>> The changes are committed to svn, but may need a bit more time to
>>> percolate to the anonymous svn server.
>>
>> I won't be able to make time to try this until next week at the
>> earliest (i.e. after your planned release), but when I get back to
>> using Biopython with BioSQL again in earnest I will check this out.
>
> By testing I meant primarily if people use other platforms than I do
> (PostgreSQL on MacOSX), such as MySQL or Oracle on Linux, and can give this
> a whirl, as in: load the NCBI taxonomy into a scratch database (using the
> script), then load it again (simulating an update), and see whether there
> are any error or warning messages. That'd be great.

OK, as a very cursory check I did a quick test on a Linux machine
using MySQL. I just grabbed the latest script via the SVN webpage,
then ran it against an existing (partly populated) database:

$ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql \
    --dbuser root --download true
Downloading NCBI taxon database to taxdata
Unable to close datastream at ./load_ncbi_taxonomy.pl line 726

This may be a network issue... the taxdata/taxdump.tar.gz file had
downloaded OK, so I manually unzipped it, and then:

$ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql \
    --dbuser root
Loading NCBI taxon database in taxdata:
        ... retrieving all taxon nodes in the database
        ... reading in taxon nodes from nodes.dmp
        ... insert / update / delete taxon nodes
        ... updating new parent IDs
        ... (committing nodes)
        ... rebuilding nested set left/right values
        ... reading in taxon names from names.dmp
        ... deleting old taxon names
        ... inserting new taxon names
        ... cleaning up
Done.

So no further error messages - however, I have not actually checked to
see what exactly this did to my database ;)
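
For anyone who does want to check what the script did, one option is
to look at the nested set values it rebuilt. Below is a minimal
sketch in Python, not something I have run against a real BioSQL
database. The function name check_nested_set is made up for this
example. It assumes the rows are (taxon_id, parent_taxon_id,
left_value, right_value) tuples from the taxon table, and that the
root node either has no parent or is its own parent; both of those
are assumptions you should confirm against your own schema and data.

```python
def check_nested_set(rows):
    """Return a list of problems found in nested set left/right values.

    rows: iterable of (taxon_id, parent_taxon_id, left_value, right_value).
    An empty list means no problems were found by these checks.
    """
    problems = []
    by_id = {row[0]: row for row in rows}
    seen = {}  # left/right value -> taxon_id that first used it

    for taxon_id, parent_id, left, right in by_id.values():
        if left is None or right is None:
            problems.append(f"taxon {taxon_id}: missing left/right value")
            continue
        if left >= right:
            problems.append(f"taxon {taxon_id}: left {left} >= right {right}")
        # Every left/right value should be used by exactly one node.
        for value in (left, right):
            if value in seen:
                problems.append(
                    f"taxon {taxon_id}: value {value} also used by "
                    f"taxon {seen[value]}"
                )
            else:
                seen[value] = taxon_id
        # Assumption: a node with no known parent, or one that is its
        # own parent, is treated as a root and skipped here.
        parent = by_id.get(parent_id)
        if parent is None or parent_id == taxon_id:
            continue
        p_left, p_right = parent[2], parent[3]
        if p_left is None or p_right is None:
            continue  # already reported when the parent itself is visited
        # A child's interval must sit strictly inside its parent's.
        if not (p_left < left and right < p_right):
            problems.append(
                f"taxon {taxon_id}: interval ({left}, {right}) not inside "
                f"parent {parent_id} interval ({p_left}, {p_right})"
            )
    return problems
```

The rows could come from something like "SELECT taxon_id,
parent_taxon_id, left_value, right_value FROM taxon" through whatever
database driver you normally use. Running it once after the first
load and again after the second load would show whether the update
pass leaves the tree consistent.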

Peter


