[Bioperl-l] Bio::Taxonomy confusion

Jason Stajich jason.stajich at duke.edu
Thu May 11 00:53:43 UTC 2006


I would use the implementation that talks to the flatfile db as the
standard here.  Nodes are defined by the data in the taxonomy dump
dbs from NCBI.
The eutils are pretty worthless except for taxid->name lookups or the
reverse; you can't get the full taxonomy (or couldn't when that
implementation was written).

The "name" method refers to the name of the node - each level in the  
taxonomy can have a "name".

The bits of hackiness relate to wrapping the node object as a
Bio::Species and/or being able to read a genbank file, take the
organism taxonomy data as a list, and instantiate from that.  If we
could rely on everything being in a DB, of course, this would be simpler.

Another problem is that the depth of the taxonomy is not constant for
every node, so assuming that a fixed number of slots will be filled in
to generate the taxonomy leads to problems.

Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the
best example of working code, as this is how I really wanted it to
work; the Bio::Species hacks are only there to shoehorn in data
retrieved from genbank files.  With the flatfile implementation
each node only stores data about itself, so to get the kingdom for a
node you have to walk all the way up the db hierarchy and build up the
classification as you go.
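
As a rough illustration of that walk (this is not the actual
Bio::DB::Taxonomy::flatfile code), here is a minimal sketch in plain
Perl.  The %nodes hash, lineage() and name_at_rank() are made-up names
for illustration only, and the table is a trimmed subset of the NCBI
data with most intermediate levels left out:

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the NCBI taxonomy dump. Each node
# stores only its own name, rank and parent taxid. Intermediate levels
# are omitted here, so the parent links are illustrative, not exact.
my %nodes = (
    9606  => { name => 'Homo sapiens', rank => 'species',      parent => 9605  },
    9605  => { name => 'Homo',         rank => 'genus',        parent => 9604  },
    9604  => { name => 'Hominidae',    rank => 'family',       parent => 33208 },
    33208 => { name => 'Metazoa',      rank => 'kingdom',      parent => 2759  },
    2759  => { name => 'Eukaryota',    rank => 'superkingdom', parent => 1     },
    1     => { name => 'root',         rank => 'no rank',      parent => 1     },
);

# Walk from a node up to the root, collecting every node on the way.
# No fixed number of levels is assumed.
sub lineage {
    my ($taxid) = @_;
    my @lineage;
    while (defined(my $node = $nodes{$taxid})) {
        push @lineage, $node;
        last if $node->{parent} == $taxid;    # the root is its own parent
        $taxid = $node->{parent};
    }
    return @lineage;
}

# Find the name of the first ancestor (or the node itself) with a given rank.
sub name_at_rank {
    my ($taxid, $rank) = @_;
    for my $node (lineage($taxid)) {
        return $node->{name} if $node->{rank} eq $rank;
    }
    return;    # no node of that rank on the path to the root
}

print join('; ', map { $_->{name} } lineage(9606)), "\n";
print name_at_rank(9606, 'kingdom'), "\n";
```

Because the classification is built by following parent links until the
root, a lineage can be any depth, and a rank that is missing for a
given node simply comes back undefined instead of leaving an empty slot.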

I'm not exactly sure what you are proposing to do, but I would
definitely enjoy another pair of hands; I don't really have time to
mess with it any time soon.

-jason
On May 10, 2006, at 5:30 AM, Sendu Bala wrote:

> Hi,
> I'm a little confused as to how names are supposed to work in
> Bio::Taxonomy::Node.
>
> In the bioperl versions that I've looked at a Node doesn't seem to
> store the most important information about itself - its scientific
> name - in an obvious place. bioperl 1.5.1 puts it at the start of the
> classification list. I'd have thought sticking it in -name would make
> more sense, but this is used only for the GenBank common name.
>
> The Bio::Taxonomy docs still suggest:
>
> my $node_species_sapiens = Bio::Taxonomy::Node->new(
>    -object_id => 9606, # or -ncbi_taxid. Required tag
>    -names => {
>        'scientific' => ['sapiens'],
>        'common_name' => ['human']
>    },
>    -rank => 'species'  # Required tag
> );
>
> and whilst Bio::Taxonomy::Node does not accept -names, it does have a
> 'name' method which claims to work like:
>
> $obj->name('scientific', 'sapiens');
>
> This kind of thing would be really nice, but afaics
> Bio::Taxonomy::Node->new takes the -name value and makes a common name
> out of it, whilst the name() method passes any 'scientific' name to
> the scientific_name() method which is unable to set any value (and
> warns about this), only get.
>
> It seems like the need to have this classification array work the same
> way as Bio::Species is causing some unnecessary restrictions. Can't
> the more sensible idea of having a dedicated storage spot for the
> ScientificName and other parameters be used, with the classification
> array either being generated just-in-time from the hash-stored
> data, or indeed being generated from the Lineage field?
>
>
> Also, why does a node store the complete hierarchy on itself in the
> classification array? If we're going that far, why don't the
> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
> get_taxonomy() method instead of a get_Taxonomy_Node() method.
> get_taxonomy() could, from a single efetch.fcgi lookup, create a
> complete Bio::Taxonomy with all the nodes. Whilst most nodes would
> only have a minimum of information, if you could simply ask a node
> what its rank and scientific name was you could easily build a
> classification array, or ask what Kingdom your species was in etc.
>
> Are there good reasons for Taxonomy working the way it does in
> 1.5.1, or would I not be wasting my time re-writing things to make
> more sense (to me)?
>
>
> Cheers,
> Sendu.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12