[Bioperl-l] Starting to use Bioperl
Gordon Haverland
ghaverla at materialisations.com
Sat May 12 23:26:19 UTC 2018
On Fri, 11 May 2018 10:12:04 +0100
Peter Cock <p.j.a.cock at googlemail.com> wrote:
> This year the NCBI started offering this data in a slightly newer
> format:
>
> https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/
>
> Most of these files are plain text tables using the rather
> unusual field separator of "\t|\t" (tab, pipe, tab), but the
> README files are very comprehensive.
I found this and got the tarball version. I thought the README said
it was \t|\n? It doesn't matter; either way it's an unusual separator.
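For anyone parsing these files: as I understand the taxdump README, "\t|\t" separates the fields and "\t|\n" ends each record, so both readings above are partly right. Below is a minimal Python sketch under that assumption. The names parse_dmp_line and iter_dmp are hypothetical and are not part of BioPerl, Biopython or the NCBI scripts.

```python
import io
from typing import Iterable, Iterator, List

FIELD_SEP = "\t|\t"   # separator between fields
RECORD_END = "\t|"    # marker just before the newline that ends each record

def parse_dmp_line(line: str) -> List[str]:
    """Split one taxdump record into a list of field strings."""
    record = line.rstrip("\r\n")
    # Drop the trailing "\t|" so the last field is not polluted by it.
    if record.endswith(RECORD_END):
        record = record[: -len(RECORD_END)]
    return record.split(FIELD_SEP)

def iter_dmp(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield parsed records from an iterable of lines, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield parse_dmp_line(line)

if __name__ == "__main__":
    # Two names.dmp-style sample records; the first has an empty third field.
    sample = io.StringIO(
        "1\t|\troot\t|\t\t|\tscientific name\t|\n"
        "2\t|\tBacteria\t|\tBacteria <bacteria>\t|\tscientific name\t|\n"
    )
    for fields in iter_dmp(sample):
        print(fields)
```

Empty fields are kept as empty strings, so the column positions stay fixed across records.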
There are Perl scripts in the tarball. I think I read there that if
the NCBI dump files are older than 180 days, a script downloads newer
versions? Or maybe I was reading something else.
In any event, the BioSQL repository on GitHub doesn't see many updates.
It looks to me like all the activity is in Biopython, so I downloaded
that for my Devuan machine.
> This is in Python, but my most recent occasion to process
> this data was to make a cut-down version of the NCBI
> taxonomy as part of constructing a small test dataset:
>
> https://github.com/abaizan/kodoja/blob/master/test/taxonomy/filter_taxonomy.py
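One simple way to cut the taxonomy down is sketched below in Python. It rests on my own assumptions and is not necessarily what filter_taxonomy.py does. The idea is to keep the tax_ids you care about plus all of their ancestors, found by following the parent column (second field) of nodes.dmp, and then filter the .dmp files to that set. All function names here are hypothetical.

```python
from typing import Dict, Iterable, List, Set

FIELD_SEP = "\t|\t"

def split_record(line: str) -> List[str]:
    """Split a nodes.dmp/names.dmp line into fields, dropping the trailing '\t|'."""
    record = line.rstrip("\r\n")
    if record.endswith("\t|"):
        record = record[:-2]
    return record.split(FIELD_SEP)

def read_parents(lines: Iterable[str]) -> Dict[str, str]:
    """Map tax_id -> parent tax_id (first two columns of nodes.dmp)."""
    parent_of: Dict[str, str] = {}
    for line in lines:
        fields = split_record(line)
        if len(fields) >= 2:
            parent_of[fields[0]] = fields[1]
    return parent_of

def ancestors_closure(parent_of: Dict[str, str], wanted: Iterable[str]) -> Set[str]:
    """Return the wanted tax_ids plus all of their ancestors."""
    keep: Set[str] = set()
    for taxid in wanted:
        # Climb until we reach a node already kept, an unknown node,
        # or the root (tax_id 1 is its own parent in nodes.dmp).
        while taxid not in keep:
            keep.add(taxid)
            parent = parent_of.get(taxid)
            if parent is None or parent == taxid:
                break
            taxid = parent
    return keep

def filter_records(lines: Iterable[str], keep: Set[str]) -> List[str]:
    """Keep only lines whose first field (tax_id) is in keep."""
    return [line for line in lines if split_record(line)[0] in keep]
```

Because names.dmp also has tax_id as its first column, the same keep set can be passed to filter_records for that file too.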
I saw this via Google; you labelled something as a bug.
While looking for the new_taxdump files (via Google), another Perl
script, something like findingSpeciesFromGenus, popped up. So I have
a few sources to look through.
Thanks.
Gord