[Bioperl-l] Starting to use Bioperl

Fri May 11 00:50:55 UTC 2018

On Wed, 9 May 2018 09:54:18 -0700
Gordon Haverland <ghaverla at materialisations.com> wrote:

>           ... I am researching a deer problem.

There are BioPerl and Bio-LITE routines which can work with taxonomy
information.  Finding something which can write a SQLite3 dbase took a
little digging, but something does exist.

I've never played with BioPerl before, and I am still trying to clean
and expand my deer plant data, so I ran my latest effort with a call to
BioPerl to look up a taxonid and then a taxon.  It just happened the
first element in my list was a hybrid species (Abelia x grandiflora).
Anyway, following some BioPerl documentation I connected to -entrez
(excuse any spelling mistakes) and it came up with a hit.  A species
hit, which is what I was hoping for.

From that returned object, I can get an ancestor object (which is a
genus), and from that I can get an ancestor object which is a family,
and from that I can get an ancestor object which is an order, and then
further iterations on ancestor return unranked ("no rank") clade
entries which I am not sure how to handle.  I haven't tried iterating
to the limit; I was hoping that at some point an attempt to return an
ancestor would return undef.  But I really don't know what to do with
this "no rank" clade stuff.
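
The ancestor walk described above can be sketched in a language-neutral
way.  This is a minimal Python illustration, not BioPerl code: the
Taxon class, lineage() and ranked_only() are hypothetical names, the
chain is hand-typed rather than fetched from NCBI, and None stands in
for Perl's undef.  It assumes the unranked entries carry the rank
string "no rank" and can simply be filtered out when only named ranks
matter.

```python
from typing import List, Optional, Tuple


class Taxon:
    """Hypothetical stand-in for a taxonomy node (not the BioPerl class)."""

    def __init__(self, name: str, rank: str, parent: Optional["Taxon"] = None):
        self.name = name
        self.rank = rank
        self.parent = parent  # None plays the role of Perl's undef


def lineage(taxon: Taxon) -> List[Tuple[str, str]]:
    """Walk parent links until there is no ancestor left.

    Returns (rank, name) pairs ordered from the starting taxon upward.
    """
    path = []
    node: Optional[Taxon] = taxon
    while node is not None:
        path.append((node.rank, node.name))
        node = node.parent
    return path


def ranked_only(path):
    """Drop the unranked clade entries, keeping only named ranks."""
    return [(rank, name) for rank, name in path if rank != "no rank"]


# Hand-typed illustrative chain; the values were not fetched from NCBI.
kingdom = Taxon("Plantae", "kingdom")
clade = Taxon("example clade", "no rank", kingdom)
order = Taxon("Dipsacales", "order", clade)
family = Taxon("Caprifoliaceae", "family", order)
genus = Taxon("Abelia", "genus", family)
species = Taxon("Abelia x grandiflora", "species", genus)
```

Calling ranked_only(lineage(species)) gives the species-to-kingdom path
with the unranked clade removed.  Whether dropping those entries is the
right choice depends on what the tree is for.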

I suspect I need to iterate this ancestor stuff until I get to kingdom
Plantae?  This gives me a "root".  I now have a species (usually) with
N ancestors up to a common root (kingdom Plantae).  That constitutes a
tree as I understand things, but it is all one-sided: a single path
with no branches.

If I go to the next entry in my deer resistant plants data, I may have
M ancestors up to kingdom plantae.   And do this for 1000 or so other
entries.

For each set of ancestor lookups, I need to make a tree.

All of these trees have the same root (kingdom plantae).  So I should
be able to add all these trees together.  And then I think I found the
utilities to save this mess as SQLite.
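
Adding single-path trees that share a root amounts to inserting each
path, root first, into one nested structure.  A minimal Python sketch
under that assumption follows; merge_lineages() and count_leaves() are
hypothetical helpers, the taxon names are placeholders, and none of
this is the BioPerl tree API.

```python
def merge_lineages(lineages):
    """Merge species-first name paths into one nested dict keyed by name.

    Each path is a list like [species, genus, family, ..., root].
    Shared ancestors collapse into a single node.
    """
    tree = {}
    for path in lineages:
        node = tree
        for name in reversed(path):  # insert from the root downward
            node = node.setdefault(name, {})
    return tree


def count_leaves(tree):
    """Count nodes with no children (the species at the tips)."""
    return sum(count_leaves(child) if child else 1 for child in tree.values())


# Placeholder paths, species first and root last.
example_paths = [
    ["species A", "genus G", "family F", "Plantae"],
    ["species B", "genus G", "family F", "Plantae"],
    ["species C", "genus H", "family F", "Plantae"],
]
merged = merge_lineages(example_paths)
```

Here merged has a single top-level key ("Plantae"), and the two species
of "genus G" sit under one shared genus node.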

As I understand things, I probably want to be working with NCBI ID
numbers on the species entered?  And what you call annotation, I would
save in one or more separate SQLite3 dbases keyed on the NCBI ID number?

Let's assume one of the fields of annotation is the USDA growing zone.
A person thinks they want to do a query on USDA Zone 3, so the program
changes this to a query for USDA Zones 2-4, which picks off all the
NCBI ID numbers, and then a person can use BioPerl to make a picture of
all the deer resistant taxonomy known.
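
The zone query could look something like the following.  This is only
a sketch using Python's standard sqlite3 module: the table name, the
column names, the ids_for_zone() helper and the ID numbers are all made
up for illustration, and it assumes one zone value per plant.

```python
import sqlite3


def ids_for_zone(conn, zone, slack=1):
    """Return NCBI IDs whose zone lies within `slack` of the requested zone."""
    rows = conn.execute(
        "SELECT ncbi_id FROM zone "
        "WHERE usda_zone BETWEEN ? AND ? ORDER BY ncbi_id",
        (zone - slack, zone + slack),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE zone (ncbi_id INTEGER PRIMARY KEY, usda_zone INTEGER NOT NULL)"
)
# Made-up IDs and zones, purely for illustration.
conn.executemany(
    "INSERT INTO zone VALUES (?, ?)",
    [(1001, 2), (1002, 3), (1003, 4), (1004, 7)],
)
conn.commit()
```

A request for Zone 3 is then ids_for_zone(conn, 3), which searches
Zones 2-4; the resulting IDs are what would be handed on to the
taxonomy lookup.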

One of the data sources feeding into this has the colour of the
flowers.  So someone could conceivably be looking for pink-flowered,
deer resistant
plants.  That's why I suggested there might be more than 1 SQLite dbase
of annotation to go with this stuff.
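
If the annotation does end up in more than one SQLite database, SQLite
can attach a second database to the same connection and join across
both on the NCBI ID.  A rough Python sketch follows; the schema names,
the ids_for() helper and the data are invented for illustration, and
in-memory databases stand in for what would be separate files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second database; in real use this would be a file path.
conn.execute("ATTACH DATABASE ':memory:' AS colours")
conn.execute(
    "CREATE TABLE main.zone (ncbi_id INTEGER PRIMARY KEY, usda_zone INTEGER)"
)
conn.execute("CREATE TABLE colours.flower (ncbi_id INTEGER, colour TEXT)")

# Made-up IDs and values, purely for illustration.
conn.executemany(
    "INSERT INTO main.zone VALUES (?, ?)",
    [(1001, 2), (1002, 3), (1003, 4), (1004, 7)],
)
conn.executemany(
    "INSERT INTO colours.flower VALUES (?, ?)",
    [(1001, "pink"), (1002, "white"), (1003, "pink"), (1004, "pink")],
)
conn.commit()


def ids_for(conn, colour, zone_lo, zone_hi):
    """NCBI IDs matching a flower colour within an inclusive zone range."""
    rows = conn.execute(
        "SELECT z.ncbi_id FROM main.zone AS z "
        "JOIN colours.flower AS f ON f.ncbi_id = z.ncbi_id "
        "WHERE f.colour = ? AND z.usda_zone BETWEEN ? AND ? "
        "ORDER BY z.ncbi_id",
        (colour, zone_lo, zone_hi),
    )
    return [row[0] for row in rows]
```

With this data, ids_for(conn, "pink", 2, 4) picks out the pink-flowered
entries in Zones 2-4.  Separate tables in one database would work the
same way; attaching is only needed if the files really are separate.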

I'll stop writing, and go back to reading code.  I downloaded the
Bio-LITE modules (not packaged in Debian/Devuan), and I think there
were suggestions of other code to download.  And read.

Have a great day!
Gord