[BioSQL-l] load_ncbi_taxonomy.pl
Peter
biopython at maubp.freeserve.co.uk
Fri Aug 1 20:58:14 UTC 2008
>> By testing I meant primarily if people use other platforms than I do
>> (PostgreSQL on MacOSX), such as MySQL or Oracle on Linux, and can give this
>> a whirl, as in: load the NCBI taxonomy into a scratch database (using the
>> script), then load it again (simulating an update), and see whether there
>> are any error or warning messages. That'd be great.
>
> OK, as a very cursory check I did a quick test on a Linux machine
> using MySQL. I just grabbed the latest script via the SVN webpage,
> then using an existing (partly populated) database:
>
> $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql
> --dbuser root --download true
> Downloading NCBI taxon database to taxdata
> Unable to close datastream at ./load_ncbi_taxonomy.pl line 726
>
> This may be a network issue... the taxdata/taxdump.tar.gz file had
> downloaded OK, so I manually unzipped it, and then:
>
> $ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql
> --dbuser root
> Loading NCBI taxon database in taxdata:
> ... retrieving all taxon nodes in the database
> ... reading in taxon nodes from nodes.dmp
> ... insert / update / delete taxon nodes
> ... updating new parent IDs
> ... (committing nodes)
> ... rebuilding nested set left/right values
> ... reading in taxon names from names.dmp
> ... deleting old taxon names
> ... inserting new taxon names
> ... cleaning up
> Done.
>
> So no further error messages - however, I have not actually checked to
> see what exactly this did to my database ;)

I then simulated an update by deleting the downloaded taxdata directory
and rerunning the script:

$ perl ./load_ncbi_taxonomy.pl --dbname bioseqdb --driver mysql
--dbuser root --download true
Downloading NCBI taxon database to taxdata
Unable to close datastream at ./load_ncbi_taxonomy.pl line 726
Loading NCBI taxon database in taxdata:
... retrieving all taxon nodes in the database
... reading in taxon nodes from nodes.dmp
... insert / update / delete taxon nodes
... updating new parent IDs
... (committing nodes)
... rebuilding nested set left/right values
... reading in taxon names from names.dmp
... deleting old taxon names
... inserting new taxon names
... cleaning up
Done.
[Note that after the "unable to close" message I just left the script
running this time, and it continued fine]
Again, I haven't checked the database.
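
For anyone who does want to check the result, here is a minimal sketch of
a sanity check on the nested set left/right values that the script
rebuilds. It is only an illustration, not part of
load_ncbi_taxonomy.pl. It assumes the rows come from something like
"SELECT taxon_id, parent_taxon_id, left_value, right_value FROM taxon"
(column names as I understand the BioSQL schema), and that a root node
has either no parent or itself as parent. The function name
check_nested_set is made up for this example.

```python
def check_nested_set(rows):
    """Return a list of (taxon_id, message) for nested-set violations.

    rows: iterable of (taxon_id, parent_taxon_id, left_value, right_value)
    tuples, assumed to come from the BioSQL taxon table.
    """
    nodes = {tid: (parent, left, right) for tid, parent, left, right in rows}
    problems = []
    seen = {}  # left/right value -> taxon_id that first used it

    for tid, (parent, left, right) in nodes.items():
        if left is None or right is None:
            problems.append((tid, "missing left/right value"))
            continue
        if left >= right:
            problems.append((tid, "left_value is not smaller than right_value"))
        # Each left/right number should be used exactly once in the tree.
        for value in (left, right):
            if value in seen:
                problems.append(
                    (tid, "value %d also used by taxon %d" % (value, seen[value]))
                )
            else:
                seen[value] = tid
        # A child's interval must sit strictly inside its parent's interval.
        # Assumption: a root has parent None or points at itself.
        if parent is not None and parent != tid and parent in nodes:
            p_left, p_right = nodes[parent][1], nodes[parent][2]
            if p_left is None or p_right is None or not (p_left < left and right < p_right):
                problems.append((tid, "interval not nested inside parent %d" % parent))
    return problems


# Tiny hand-made example (not real NCBI data): 1 is the root, 2 its child,
# 3 a child of 2.
good_rows = [(1, None, 1, 6), (2, 1, 2, 5), (3, 2, 3, 4)]
bad_rows = [(1, None, 1, 6), (2, 1, 2, 7), (3, 2, 3, 4)]
```

An empty list for the real taxon table would suggest the rebuild left the
tree consistent, at least as far as these three rules go. It says nothing
about whether the names or ranks are right.
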
Peter