[Bioperl-l] Starting to use Bioperl

Thu May 10 00:02:40 UTC 2018

On Wed, 9 May 2018 09:54:18 -0700
Gordon Haverland <ghaverla at materialisations.com> wrote:

> I believe I mentioned in my note about that Native-Plants dbase, that
> I came into this because I am researching a deer problem.
> 
> I have about 1000 entries, ....

I started writing this note, and now I see a reply from Henry Liu.  Thank
you, Henry.

Yes, I agree that having 17k lines of array-of-hashes data in a Perl
source file is not a good plan.  :-)  I agree that SQLite3 is likely a
good option.  I was hoping that BioPerl might be a way to move this in
that direction.
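
As a rough illustration of the SQLite3 route (in Python rather than
Perl, and not tied to BioPerl), here is a minimal sketch of loading a
list of hash-like records into a table.  The table name, column names
and the two sample records are assumptions made up for this example,
not the layout of the actual data.

```python
import sqlite3

# Hypothetical records standing in for the array of hashes in the source
# file; the field names are assumptions chosen for this sketch.
plants = [
    {"common_name": "quaking aspen", "genus": "Populus", "species": "tremuloides"},
    {"common_name": "Bebb willow", "genus": "Salix", "species": "bebbiana"},
]


def load_plants(conn, records):
    """Create a simple plant table (if missing) and insert the records."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS plant (
               id INTEGER PRIMARY KEY,
               common_name TEXT,
               genus TEXT NOT NULL,
               species TEXT NOT NULL
           )"""
    )
    # Named placeholders (:genus etc.) are filled from each dict's keys.
    conn.executemany(
        "INSERT INTO plant (common_name, genus, species) "
        "VALUES (:common_name, :genus, :species)",
        records,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
load_plants(conn, plants)
rows = conn.execute(
    "SELECT genus, species FROM plant ORDER BY genus"
).fetchall()
```

No uniqueness constraint is placed on (genus, species) here, since the
same binomial name may turn up more than once.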

Installing BioPerl on Debian/Devuan takes 135 MB of compressed packages,
which install to 750 MB.

I have some biology knowledge, but it is more like biophysics, medical
physics, health physics and biochemistry.  And a bunch of Perl.  But at
heart, I am a materials science and engineering person who is good at
numerical methods.

To go from lists of common names and usually binomial taxonomic names,
sometimes with other information, to something technically correct has
been fun.  Mistakes in common names, mistakes in Genus, mistakes in
species and sometimes just dumb mistakes (sorry, deer really like
eating aspen and willow).  I've been using Wikipedia to flesh out more
information, but Wikipedia is not that reliable.

As I understand things, it is possible to have the same binomial name
(Genus species) in a different order, family, or tribe.  But in reading
Wikipedia and other places while tracking down some points (like what
the toxic component is in the precursor to carrots), it is apparent
that there are many conventions for describing things.  That is not
like the periodic table and chart of the nuclides I am familiar with.

In Bio::DB::Taxonomy, I can search for a binomial name, and it will give
me a list of hits, which could be empty.  If I get multiple hits, one
source of this is synonyms for the binomial name.  Another source could
be a similarly named Genus species in some other
division/order/family/tribe.
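
To make the "zero, one or many hits" situation concrete, here is a small
sketch in Python over a local list of records.  It does not call
Bio::DB::Taxonomy; the record layout, the function name find_hits and
the placeholder names ("Exemplum album", "Familia-A", and so on) are all
invented for illustration.

```python
# Invented placeholder taxa; none of these names are real.
TAXA = [
    {"id": 1, "accepted": "Exemplum album", "family": "Familia-A",
     "synonyms": ["Exemplum vetus"]},
    {"id": 2, "accepted": "Exemplum novum", "family": "Familia-A",
     "synonyms": ["Exemplum album"]},
    {"id": 3, "accepted": "Exemplum album", "family": "Familia-B",
     "synonyms": []},
]


def find_hits(name, taxa):
    """Return (id, family, matched_as) for every record matching name.

    matched_as is "accepted" when name is the record's accepted name and
    "synonym" when it only appears in the record's synonym list.
    """
    hits = []
    for record in taxa:
        if record["accepted"] == name:
            hits.append((record["id"], record["family"], "accepted"))
        elif name in record["synonyms"]:
            hits.append((record["id"], record["family"], "synonym"))
    return hits


hits = find_hits("Exemplum album", TAXA)
no_hits = find_hits("Nomen nullum", TAXA)  # empty list: no match at all
```

Here the one name gives three hits: two accepted names in different
families and one synonym.  Recording why each hit matched makes it
easier to decide which one is meant.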

There are two different ways to add metadata.  One kind of metadata is
related to genetics and can have a "location or position" in the genome
of that entity.  The other is data without a position, which is where
most (all?) information related to why deer might not want to eat this
stuff would be held.
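
The two kinds of metadata could be modelled roughly as below.  This is a
Python sketch of the idea only, not BioPerl's own classes; the class
names Note, PositionedNote and PlantRecord are hypothetical, and the
coordinates in the positioned note are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Note:
    """Metadata without a position, e.g. a deer-related observation."""
    key: str
    value: str


@dataclass
class PositionedNote(Note):
    """Metadata tied to a start/end location in the genome."""
    start: int
    end: int


@dataclass
class PlantRecord:
    name: str
    notes: List[Note] = field(default_factory=list)

    def positionless(self):
        """Return only the notes that carry no genome position."""
        return [n for n in self.notes if not isinstance(n, PositionedNote)]


rhubarb = PlantRecord(
    name="rhubarb",
    notes=[
        Note("contains", "oxalic acid/oxalates"),
        # Made-up feature and coordinates, purely for illustration.
        PositionedNote("hypothetical_feature", "placeholder", 100, 250),
    ],
)
plain = rhubarb.positionless()
```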

Some of the various plant entities I have run across information on are
well defined.  Other entities have genetics such that there is
significant variance in plant properties when grown in slightly
different microclimates, let alone different soils, growth zones or
whatever.  I guess this is normal in biology, but it is strange to
someone who is more of a physicist.

I think that at some point I am going to need position-dependent
metadata.  Mostly in terms of toxicity.  I'll take spinach and rhubarb
as examples.  Both plants contain a fair amount of oxalic
acid/oxalates.  One has edible leaves, and the other doesn't.  It is
apparently possible to make a DIY insecticide from rhubarb leaves.
Oxalic acid concentrations in the leaves don't seem to explain the
ability to make an insecticide from them, so there must be other
toxic components involved here.

A common toxin "modality" in plants is something attached to a sugar.
Cyanogenic glycosides are an example.  Do I want to try to relate this
to RNA (which involves sugars, as I know from the work I've done with
pharmaceutical chemists), to DNA, or to proteins (those seem to be the
three metrics common to many of these databases)?  Something else?

I will grind on this a while.  Having autism makes me want to classify
things.

Have a great day!
Gord