[Bioperl-l] Bio::Taxonomy confusion
Jason Stajich
jason.stajich at duke.edu
Thu May 11 00:53:43 UTC 2006
I would use the implementation that talks to the flatfile db as the
standard here. Nodes are defined by the data in the taxonomy dump
files from NCBI.
The eutils implementation is pretty worthless except for taxid->name
or the reverse; you can't get the full taxonomy (or couldn't when that
implementation was written).
The "name" method refers to the name of the node - each level in the
taxonomy can have a "name".
The bits of hackiness relate to wrapping the node object as a
Bio::Species and/or being able to read the organism taxonomy data
from a genbank file as a list and instantiate a node from it. If we
could rely on everything being in a DB, of course, this would be simpler.
Another problem is that the depth of the taxonomy is not constant for
every node, so assuming that a fixed number of slots will be filled in
to generate the taxonomy leads to problems.
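
To make the variable-depth problem concrete, here is a minimal plain-Perl sketch. The two lineages are abbreviated, illustrative data (not complete NCBI lineages), and rank_of is a hypothetical helper, not part of BioPerl. It only shows why looking a level up by rank is safer than assuming a fixed array position.

```perl
use strict;
use warnings;

# Illustrative, abbreviated lineages as [rank, name] pairs, leaf first.
# Real lineages have more levels, and the number of levels varies per node.
my @human = (
    [ 'species', 'Homo sapiens' ],
    [ 'genus',   'Homo' ],
    [ 'family',  'Hominidae' ],
    [ 'order',   'Primates' ],
    [ 'class',   'Mammalia' ],
    [ 'phylum',  'Chordata' ],
    [ 'kingdom', 'Metazoa' ],
);
my @ecoli = (
    [ 'species',      'Escherichia coli' ],
    [ 'genus',        'Escherichia' ],
    [ 'family',       'Enterobacteriaceae' ],
    [ 'superkingdom', 'Bacteria' ],
);

# Hypothetical helper: find a name by its rank rather than by position.
# Returns undef (in scalar context) when the lineage has no such rank.
sub rank_of {
    my ($rank, @lineage) = @_;
    for my $level (@lineage) {
        return $level->[1] if $level->[0] eq $rank;
    }
    return;
}

# The same index means different ranks in the two lists, so a fixed
# "slot 3 is the order" assumption breaks for the shorter lineage.
for my $lineage (\@human, \@ecoli) {
    my $kingdom = rank_of('kingdom', @$lineage);
    printf "%s: slot 3 holds rank '%s'; kingdom is %s\n",
        $lineage->[0][1], $lineage->[3][0],
        defined $kingdom ? $kingdom : '(none)';
}
```
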
Use the flatfile implementation (Bio::DB::Taxonomy::flatfile) as the
best example of working code, as this is how I really wanted it to
work; the Bio::Species hacks are only there to shoehorn in data
retrieved from genbank files. With the flatfile implementation,
each node only stores data about itself, so to get the kingdom for a
node you have to walk all the way up the db hierarchy, building up
the classification as you go.
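
The walk up the hierarchy can be sketched in plain Perl as follows. This is a minimal illustration under stated assumptions, not the flatfile module's code: %nodes is a hypothetical in-memory stand-in for the taxonomy dump, the parent links are abbreviated (intermediate levels are skipped), and lineage and name_at_rank are made-up helper names.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the taxonomy dump: each node stores only its own
# name, rank, and parent id. Links are abbreviated for illustration.
my %nodes = (
    9606  => { name => 'Homo sapiens', rank => 'species',      parent => 9605 },
    9605  => { name => 'Homo',         rank => 'genus',        parent => 9604 },
    9604  => { name => 'Hominidae',    rank => 'family',       parent => 9443 },
    9443  => { name => 'Primates',     rank => 'order',        parent => 40674 },
    40674 => { name => 'Mammalia',     rank => 'class',        parent => 7711 },
    7711  => { name => 'Chordata',     rank => 'phylum',       parent => 33208 },
    33208 => { name => 'Metazoa',      rank => 'kingdom',      parent => 2759 },
    2759  => { name => 'Eukaryota',    rank => 'superkingdom', parent => 1 },
    1     => { name => 'root',         rank => 'no rank',      parent => 1 },
);

# Collect nodes from the given id up to the root, leaf first.
# The %seen guard stops at the root, which is its own parent here.
sub lineage {
    my ($nodes, $id) = @_;
    my (@lineage, %seen);
    while (defined $id && exists $nodes->{$id} && !$seen{$id}++) {
        my $node = $nodes->{$id};
        push @lineage, $node;
        $id = $node->{parent};
    }
    return @lineage;
}

# Walk up until a node of the requested rank is found; undef if there is none.
sub name_at_rank {
    my ($nodes, $id, $rank) = @_;
    for my $node (lineage($nodes, $id)) {
        return $node->{name} if $node->{rank} eq $rank;
    }
    return;
}

print join('; ', map { $_->{name} } lineage(\%nodes, 9606)), "\n";
print "kingdom: ", name_at_rank(\%nodes, 9606, 'kingdom'), "\n";
```

Because the classification is built from whatever ancestors exist, nothing here depends on the lineage having a fixed number of levels.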
I'm not exactly sure what you are proposing to do, but I would
definitely enjoy another pair of hands; I don't really have time to
mess with it any time soon.
-jason
On May 10, 2006, at 5:30 AM, Sendu Bala wrote:
> Hi,
> I'm a little confused as to how names are supposed to work in
> Bio::Taxonomy::Node.
>
> In the bioperl versions that I've looked at a Node doesn't seem to
> store the most important information about itself - its scientific
> name - in an obvious place. bioperl 1.5.1 puts it at the start of the
> classification list. I'd have thought sticking it in -name would make
> more sense, but this is used only for the GenBank common name.
>
> The Bio::Taxonomy docs still suggest:
>
>   my $node_species_sapiens = Bio::Taxonomy::Node->new(
>       -object_id => 9606, # or -ncbi_taxid. Required tag
>       -names => {
>           'scientific' => ['sapiens'],
>           'common_name' => ['human']
>       },
>       -rank => 'species' # Required tag
>   );
>
> and whilst Bio::Taxonomy::Node does not accept -names, it does have a
> 'name' method which claims to work like:
>
> $obj->name('scientific', 'sapiens');
>
> This kind of thing would be really nice, but afaics
> Bio::Taxonomy::Node->new takes the -name value and makes a common name
> out of it, whilst the name() method passes any 'scientific' name to
> the scientific_name() method which is unable to set any value (and
> warns about this), only get.
>
> It seems like the need to have this classification array work the same
> way as Bio::Species is causing some unnecessary restrictions. Can't
> the more sensible idea of having a dedicated storage spot for the
> ScientificName and other parameters be used, with the classification
> array either being generated just-in-time from the hash-stored
> data, or indeed being generated from the Lineage field?
>
>
> Also, why does a node store the complete hierarchy on itself in the
> classification array? If we're going that far, why don't the
> Bio::DB::Taxonomy modules like Bio::DB::Taxonomy::entrez just have a
> get_taxonomy() method instead of a get_Taxonomy_Node() method?
> get_taxonomy() could, from a single efetch.fcgi lookup, create a
> complete Bio::Taxonomy with all the nodes. Whilst most nodes would
> only have a minimum of information, if you could simply ask a node
> what its rank and scientific name were you could easily build a
> classification array, or ask what Kingdom your species was in etc.
>
> Are there good reasons for Taxonomy working the way it does in
> 1.5.1, or would I not be wasting my time re-writing things to make
> more sense (to me)?
>
>
> Cheers,
> Sendu.
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12