[Bioperl-l] Starting to use Bioperl

Sun May 13 22:47:42 UTC 2018

On Sun, May 13, 2018 at 12:26 AM, Gordon Haverland <
ghaverla at materialisations.com> wrote:

> On Fri, 11 May 2018 10:12:04 +0100
> Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> > This year the NCBI started offering this data in a slightly newer
> > format:
> >
> > https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/
> >
> > Most of these files are plain text tables using the rather
> > unusual field separator of "\t|\t" (tab, pipe, tab), but the
> > README files are very comprehensive.
>
> I found this, and got the tarball version.  I thought the README said
> it was \t|\n?  Doesn't matter, it's an unusual separator.
>

From memory, yes, the record separator is tab pipe newline,
but the field separator is tab pipe tab.
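A minimal Python sketch of splitting one such record, assuming (as described above) that each record ends with "\t|\n" and that fields are separated by "\t|\t". The function names parse_dmp_line and read_dmp, the UTF-8 encoding, and the sample row are hypothetical illustrations, not taken from the NCBI README:

```python
def parse_dmp_line(line):
    """Split one record of an NCBI taxonomy .dmp file into its fields.

    Assumes records end with tab-pipe-newline ("\t|\n") and fields are
    separated by tab-pipe-tab ("\t|\t"), as discussed in this thread.
    """
    # Drop the trailing newline first, then the trailing "\t|" terminator.
    record = line.rstrip("\n")
    if record.endswith("\t|"):
        record = record[:-2]
    # Empty fields survive as "" because separators sit back to back.
    return record.split("\t|\t")


def read_dmp(path):
    """Yield the field list for every non-blank record in a .dmp file."""
    # UTF-8 is an assumption here; check the README for the real encoding.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_dmp_line(line)


if __name__ == "__main__":
    # Hypothetical names.dmp-style row, made up for illustration.
    example = "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n"
    print(parse_dmp_line(example))
    # prints ['9606', 'Homo sapiens', '', 'scientific name']
```

Stripping the record terminator before splitting is what keeps the last field from carrying a stray "\t|".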

> There are Perl scripts in the tarball.  I think I read there that, if
> the NCBI dump files are older than 180 days, it downloads newer
> versions?  Or maybe I was reading something else.
>
> In any event, the BioSQL site at GitHub doesn't see much updating.  It
> looks to me like all the activity is in Biopython, so I downloaded that
> for my Devuan machine.
>

As a mature database schema, we'd not expect much change.
The only substantial change in BioSQL in recent years was
extending the schema to work on SQLite.

> > This is in Python, but my most recent occasion to process
> > this data was to make a cut-down version of the NCBI
> > taxonomy as part of constructing a small test dataset:
> >
> > https://github.com/abaizan/kodoja/blob/master/test/taxonomy/filter_taxonomy.py
>
> I saw this via Google; you labelled something a bug.
>

Possibly you meant this recent work, something I had been
meaning to fix; this conversation prompted me to do it:

https://github.com/abaizan/kodoja/pull/24
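One way to make a cut-down taxonomy for a small test dataset is to keep only the chosen taxa plus all of their ancestors from nodes.dmp. Below is a minimal Python sketch of that idea; it is not necessarily how filter_taxonomy.py does it. It assumes nodes.dmp rows are already split into field lists whose first two fields are tax_id and parent tax_id, and that the root is its own parent (as tax_id 1 is in NCBI's dump). The function names lineage_closure and filter_nodes and the toy tree are hypothetical:

```python
def lineage_closure(parents, wanted):
    """Return the wanted tax_ids together with all of their ancestors.

    parents maps tax_id -> parent tax_id (first two nodes.dmp fields).
    The root is assumed to be its own parent, as in NCBI's nodes.dmp.
    """
    keep = set()
    for taxid in wanted:
        # Walk up towards the root, stopping early once we reach a node
        # already kept (its ancestors are then already in the set).
        while taxid not in keep:
            keep.add(taxid)
            parent = parents[taxid]
            if parent == taxid:  # reached the root
                break
            taxid = parent
    return keep


def filter_nodes(rows, wanted):
    """Keep only nodes.dmp rows (lists of fields) on a wanted lineage."""
    rows = list(rows)
    parents = {row[0]: row[1] for row in rows}
    keep = lineage_closure(parents, wanted)
    return [row for row in rows if row[0] in keep]


if __name__ == "__main__":
    # Toy, made-up tree: 1 is the root, 2 and 3 are its children, 4 is under 2.
    toy = [["1", "1", "no rank"], ["2", "1", "genus"],
           ["3", "1", "genus"], ["4", "2", "species"]]
    print(filter_nodes(toy, {"4"}))
    # prints [['1', '1', 'no rank'], ['2', '1', 'genus'], ['4', '2', 'species']]
```

Keeping whole lineages, rather than just the selected taxa, means every parent tax_id in the filtered file still points at a row that exists.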

> While searching for the new_taxdump files (via Google), another Perl script
> about findingSpeciesFromGenus (or something like that) popped up, so
> I have a few sources to look through.
>
> Thanks.
>
> Gord
>
>
Yes, the NCBI taxonomy has existed in this format for over
a decade, I think, so there should be lots of scripts out there
to use or learn from.

Peter