<div dir="ltr">Hi Gordon,<div><br></div><div>A couple of bits of background reading for you.</div><div><br></div><div>First, there is a database schema called BioSQL which might be</div><div>of interest in that it includes taxon tables - based primarily on the</div><div>NCBI taxonomy tree but it could be used for another taxonomy.</div><div>There is an SQLite version of this (in use by Biopython) but that</div><div>has not, as far as I know, been integrated into BioPerl yet.</div><div><br></div><div><a href="http://biosql.org">http://biosql.org</a></div><div><a href="https://github.com/biosql/biosql">https://github.com/biosql/biosql</a><br></div><div><br></div><div>I think, given your taxonomy focus, you can ignore BioSQL, which</div><div>is more suited to working with NCBI/EMBL annotated sequences.</div><div><br></div><div>Now, I mentioned the NCBI taxonomy, which is a de facto world</div><div>standard but will not always reflect the latest expert opinion in</div><div>all branches of life. 
Nevertheless, I would start there.</div><div><br></div><div>You can query the NCBI taxonomy via Entrez (and by hand on</div><div>the website) and walk up the tree, ignoring the boring ranks,</div><div>until you reach the root of the tree.</div><div><br></div><div>Or, you can download the NCBI taxonomy as a set of text files,</div><div>for which you should have no trouble finding example scripts</div><div>to load and work with:</div><div><br></div><div><a href="https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/">https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/</a><br></div><div><br></div><div>This year the NCBI started offering this data in a slightly newer</div><div>format:</div><div><br></div><div><a href="https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/">https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/</a><br></div><div><br></div><div>Most of these files are plain text tables using the rather</div><div>unusual field separator of "\t|\t" (tab, pipe, tab), but the</div><div>README files are very comprehensive.</div><div><br></div><div>This is in Python, but my most recent occasion to process</div><div>this data was to make a cut-down version of the NCBI</div><div>taxonomy as part of constructing a small test dataset:</div><div><br></div><div><a href="https://github.com/abaizan/kodoja/blob/master/test/taxonomy/filter_taxonomy.py">https://github.com/abaizan/kodoja/blob/master/test/taxonomy/filter_taxonomy.py</a><br></div><div><br></div><div>Peter</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, May 11, 2018 at 1:50 AM, Gordon Haverland <span dir="ltr"><<a href="mailto:ghaverla@materialisations.com" target="_blank">ghaverla@materialisations.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Wed, 9 May 2018 09:54:18 -0700<br>

Gordon Haverland <<a href="mailto:ghaverla@materialisations.com">ghaverla@materialisations.com</a><wbr>> wrote:<br>

<br>

</span>>           ... I am researching a deer problem.<br>

<br>

There are BioPerl and Bio-LITE routines which can work with taxonomy<br>

information.  Finding something which can write a SQLite3 dbase took a<br>

little digging, but something does exist.<br>

<br>

I've never played with BioPerl before, and I am still trying to clean<br>

and expand my deer plant data, so I ran my latest effort with a call to<br>

BioPerl to look up a taxonid and then a taxon.  It just happened the<br>

first element in my list was a hybrid species (Abelia x grandiflora).<br>

Anyway, following some BioPerl documentation I connected to -entrez<br>

(excuse any spelling mistakes) and it came up with a hit.  A species<br>

hit, which is what I was hoping for.<br>

<br>

From that returned object, I can get an ancestor object (which is a<br>

genus), and from that I can get an ancestor object which is a family,<br>

and from that I can get an ancestor object which is an order and then<br>

further iterations on ancestor get non_ranked clade stuff which I am<br>

not sure how to handle.  I haven't tried iterating to the limit, I was<br>

hoping that at some point an attempt to return an ancestor would return<br>

undef.  But I really don't know what to do with this non_rank clade<br>

stuff.<br>

<br>

I suspect, I need to iterate this ancestor stuff until I get to kingdom<br>

plantae?  This gives me a "root".  I now have a species (usually) with<br>

N ancestors up to a common root (kingdom plantae).  That constitutes a<br>

tree as I understand things, but it is all one sided.<br>

<br>

If I go to the next entry in my deer resistant plants data, I may have<br>

M ancestors up to kingdom plantae.   And do this for 1000 or so other<br>

entries.<br>

<br>

For each set of ancestor lookups, I need to make a tree.<br>

<br>

All of these trees have the same root (kingdom plantae).  So I should<br>

be able to add all these trees together.  And then I think I found the<br>

utilities to save this mess as SQLite.<br>

<br>

As I understand things, I probably want to be working with NCBI ID<br>

numbers on the species entered?  And what you call annotation, I would<br>

save in one or more separate SQLite3 dbases keyed on the NCBI ID number?<br>

<br>

Let's assume one of the fields of annotation is the USDA growing zone.<br>

A person thinks they want to do a query on USDA Zone 3, so the program<br>

changes this to a query for USDA Zones 2-4, which picks off all the<br>

NCBI ID numbers, and then a person can use BioPerl to make a picture of<br>

all the deer resistant taxonomy known.<br>

<br>

One of the sources of data into this, has colour of the flowers.  So<br>

someone could conceivably be looking for pink flowered, deer resistant<br>

plants.  That's why I suggested there might be more than 1 SQLite dbase<br>

of annotation to go with this stuff.<br>

<br>

I'll stop writing, and go back to reading code.  I downloaded the<br>

Bio-LITE modules (not at Debian/Devuan), and I think there were<br>

suggestions of other code to download.  And read.<br>

<div class="HOEnZb"><div class="h5"><br>

Have a great day!<br>

Gord<br>

<br>

<br>


</div></div></blockquote></div><br></div>
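<div dir="ltr"><div>For what it is worth, a minimal Python sketch of reading the tab-pipe-tab tables and walking from a species up to the root might look like this. It assumes the classic nodes.dmp and names.dmp column order described in the NCBI README (tax_id, parent tax_id, rank; and tax_id, name, unique name, name class). The function names and the tiny inline dataset are invented for illustration, and the tax IDs in it are made up rather than real NCBI identifiers.</div>
<pre>
```python
import io

SEP = "\t|\t"  # field separator used by the taxdump tables
END = "\t|"    # every record also ends with a tab and a pipe


def split_record(line):
    """Split one taxdump line into a list of field strings."""
    line = line.rstrip("\n")
    if line.endswith(END):
        line = line[: -len(END)]
    return line.split(SEP)


def load_nodes(handle):
    """Return {tax_id: (parent_tax_id, rank)} from nodes.dmp-style lines."""
    nodes = {}
    for line in handle:
        fields = split_record(line)
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def load_scientific_names(handle):
    """Return {tax_id: name}, keeping only 'scientific name' records."""
    names = {}
    for line in handle:
        tax_id, name, _unique, name_class = split_record(line)[:4]
        if name_class == "scientific name":
            names[int(tax_id)] = name
    return names


def lineage(tax_id, nodes, skip_ranks=("no rank",)):
    """Follow parent links up to the root, dropping uninteresting ranks.

    The root node is recognised because it lists itself as its parent.
    """
    path = []
    while True:
        parent, rank = nodes[tax_id]
        if rank not in skip_ranks:
            path.append((tax_id, rank))
        if parent == tax_id:
            break
        tax_id = parent
    return path


# Toy data only: hypothetical IDs, truncated to the columns used here.
toy_nodes = [
    ("1", "1", "no rank"), ("10", "1", "kingdom"), ("15", "10", "no rank"),
    ("20", "15", "order"), ("30", "20", "family"), ("40", "30", "genus"),
    ("50", "40", "species"),
]
toy_names = [
    ("1", "root", "", "scientific name"),
    ("10", "Viridiplantae", "", "scientific name"),
    ("15", "example clade", "", "scientific name"),
    ("20", "Dipsacales", "", "scientific name"),
    ("30", "Caprifoliaceae", "", "scientific name"),
    ("40", "Abelia", "", "scientific name"),
    ("50", "Abelia x grandiflora", "", "scientific name"),
    ("50", "glossy abelia", "", "common name"),
]
NODES_DMP = "".join(SEP.join(rec) + END + "\n" for rec in toy_nodes)
NAMES_DMP = "".join(SEP.join(rec) + END + "\n" for rec in toy_names)

nodes = load_nodes(io.StringIO(NODES_DMP))
names = load_scientific_names(io.StringIO(NAMES_DMP))

for tax_id, rank in lineage(50, nodes):
    print(rank, names[tax_id])
```
</pre>
<div>With the real files you would pass open file handles for nodes.dmp and names.dmp instead of the StringIO objects. Because every node simply points at its parent, combining a thousand or so species into one tree should amount to collecting the union of the (child, parent) pairs seen along each lineage.</div></div>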