[Bioperl-l] taxonomy and species
Jason Stajich
jason at cgt.duhs.duke.edu
Thu Aug 28 12:45:10 EDT 2003
Glad you are taking this on. I think you can just drop
Bio::Taxonomy::Tree and Bio::Taxonomy::Taxon if they don't provide
anything useful any more. Dan has moved on AFAIK, and I couldn't make them
work in a way that I thought was useful, so I wrote Bio::Taxonomy::Node to
be an entity in the entire taxonomy with the ability to move up or down
classification levels.
Here is how I envisioned this working.
Bio::Species (or its successor) will be the collection of all the
information above a node in the Taxonomy hierarchy. So, as you have
described, we can talk about subspecies, etc. I think this is what Dan
had in mind with Bio::Taxonomy::Taxon, meaning the tip nodes in the
hierarchy.
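
To make "all the information above a node" concrete, here is a minimal
sketch. It is not existing BioPerl code: the %node table and the lineage()
helper are made up for illustration, and the table is a toy that skips the
ranks between genus and class. It only shows the idea of walking parent
links from a tip node up to the root.

```perl
use strict;
use warnings;

# Toy parent table keyed by taxid (illustrative data only; a real
# implementation would ask the taxonomy factory for each node).
my %node = (
    10090 => { name => 'Mus musculus', rank => 'species', parent => 10088 },
    10088 => { name => 'Mus',          rank => 'genus',   parent => 40674 },
    40674 => { name => 'Mammalia',     rank => 'class',   parent => undef },
);

# lineage() is a hypothetical helper: it returns the names from the given
# node up to the top of the toy table, tip node first.
sub lineage {
    my ($taxid) = @_;
    my @names;
    while (defined $taxid && exists $node{$taxid}) {
        push @names, $node{$taxid}{name};
        $taxid = $node{$taxid}{parent};
    }
    return @names;
}

# Print root-first, the way the NCBI taxonomy pages show a path.
print join('; ', reverse lineage(10090)), "\n";
```

A Bio::Species-like object would then just be this list (plus ranks) for
one tip node.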
I think it is fine to fix/replace any code in Bio::Taxonomy as I don't
think it has been used anywhere yet.
Please think about how to keep using the Factory object for access to
the taxonomy, so that underneath, the factory can be backed by a locally
indexed taxdmp from NCBI (as I have implemented), the limited HTTP access
that NCBI provides to this data, or a BioSQL system. I would try to use the
taxon tables that Aaron set up in BioSQL. It was our intention all along
to provide this Factory with access to BioSQL; we just have not had time
to get it working.
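
Roughly what I mean by keeping the factory in front of the data source is
sketched below. The package names (My::Taxonomy::Factory,
My::Taxonomy::Backend::Memory) and the methods fetch_node, get_node and
get_parent are assumptions made up for this example, not the
Bio::DB::Taxonomy API. The in-memory backend stands in for an indexed
taxdmp, the HTTP access, or BioSQL; any object with a fetch_node method
could be swapped in.

```perl
use strict;
use warnings;

# Hypothetical backend: an in-memory table standing in for a real source
# (indexed taxdmp files, NCBI over HTTP, or BioSQL).
package My::Taxonomy::Backend::Memory;

sub new {
    my ($class, %nodes) = @_;
    return bless { nodes => {%nodes} }, $class;
}

sub fetch_node {
    my ($self, $taxid) = @_;
    return $self->{nodes}{$taxid};
}

# Hypothetical factory: callers only use get_node/get_parent and never
# see which backend answers the query.
package My::Taxonomy::Factory;

sub new {
    my ($class, %args) = @_;
    my $backend = $args{backend} or die "backend required";
    die "backend must implement fetch_node" unless $backend->can('fetch_node');
    return bless { backend => $backend }, $class;
}

sub get_node {
    my ($self, $taxid) = @_;
    return $self->{backend}->fetch_node($taxid);
}

sub get_parent {
    my ($self, $taxid) = @_;
    my $node = $self->get_node($taxid) or return;
    return defined $node->{parent} ? $self->get_node($node->{parent}) : undef;
}

package main;

# Toy data only; ranks between genus and class are skipped.
my $factory = My::Taxonomy::Factory->new(
    backend => My::Taxonomy::Backend::Memory->new(
        10090 => { name => 'Mus musculus', rank => 'species', parent => 10088 },
        10088 => { name => 'Mus',          rank => 'genus',   parent => 40674 },
        40674 => { name => 'Mammalia',     rank => 'class',   parent => undef },
    ),
);

my $parent = $factory->get_parent(10090);
print "$parent->{name} ($parent->{rank})\n";
```

A BioSQL-backed class would only need to provide the same fetch_node
method for the rest of the code to stay unchanged.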
Once this is integrated, we can more easily rely on taxonid numbers
for lookups and query constraints, which I think will be a
big plus.
-jason
On Thu, 28 Aug 2003, Juguang Xiao wrote:
> Hi guys,
>
> I tried to write a simple bioperl-db script that works like the
> search on http://www.ncbi.nih.gov/Taxonomy/taxonomyhome.html/ , to
> return a full taxonomy path and all sub taxonomy nodes. Say, if I
> search 'mouse', it will return the full path as
>
> Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus; mouse
>
> All sub taxonomy nodes will also be returned, like 'asian house
> mouse', 'european house mouse', etc.
>
> However, the guru Hilmar told me that the current bioperl-db works on
> Bio::Species, not Bio::Taxonomy, so bioperl-db cannot satisfy
> my requirement above until the code adopts Taxonomy, after Taxonomy
> replaces Species. Hence I investigated the species-related modules,
> found some puzzles, and would like to volunteer the idea and the code.
>
> Bio::Taxonomy was written by Dan Kortschak, and its main and only
> functional method (other than get/set, I mean), 'classify',
> converts a Species object into an array of names. It wastes such a nice
> module name ;-)
>
> Jason wrote Bio::Taxonomy::Node and Bio::DB::Taxonomy, which accesses
> NCBI Entrez over HTTP or reads the NCBI taxonomy dump files.
> Bio::Taxonomy::Node is tied closely to Bio::DB::Taxonomy, hence it
> cannot be adapted to the bioperl-db system so easily.
>
> My plan to reform them is described below.
>
> DATA STRUCTURE
> Taxonomy should be abstracted as a hash with the keys as rank names,
> such as 'class', 'genus', and values as the identifiers, such as NCBI
> taxid, scientific name or Taxonomy::Node object.
>
> $taxonomy = {
>     # copied from the current Taxonomy module
>     '_rank' => ['root', 'superkingdom', ..., 'species', 'subspecies', ...,
>                 'no rank'],
>
>     # Though the keys are unordered in this hash, their order is
>     # defined in '_rank'.
>     '_hierarchy' => {
>         ...
>         'class'   => 40674, # or 'Mammalia', or the Taxonomy::Node
>         'genus'   => 'Mus',
>         'species' => $tax_Node_musculus
>         ....
>     },
>     '_factory' => $factory, # explained later.
> };
>
> NOTE: the new taxonomy can represent more than the species level, e.g. it
> can represent an object at genus level without a species.
>
> $taxNode_mammalia = {
>     # NCBI taxid; called 'object_id' for consistency with
>     # Bio::IdentifiableI
>     'object_id' => 40674,
>     'rank' => 'class',
>     'name' => 'Mammalia', # scientific name
>     # GenBank common name, as the NCBI site uses the term
>     'common_name' => 'mammals',
>     'alias' => { # a hash with name_class as key and variant name as value
>         '' => ''
>     },
>     '_factory' => $factory
> };
>
>
> $taxNode_mouse = {
>     'object_id' => 10090,
>     'rank' => 'species',
>     'names' => { # This is a general solution!!
>         'specific' => ['musculus'],
>         'common' => ['mouse', 'Mickey'],
>         'includes' => ['nude mice']
>     }
> };
>
> OBJECTS
>
> Bio::Taxonomy will override all methods in Bio::Species, for the sake
> of backwards compatibility. If the tax object represents a level higher
> than species, the sub 'binomial' returns undef; otherwise it simply builds
> the result by combining the genus and species. The sub 'classification'
> will look like:
>
> # walk the ranks from root down; unshift leaves the lowest rank first
> foreach my $rank (@{ $taxonomy->{_rank} }) {
>     unshift @classification, $taxonomy->{_hierarchy}{$rank}
>         if defined $taxonomy->{_hierarchy}{$rank};
> }
>
>
> Bio::Taxonomy::Node has NO reference to either the parent node or the
> taxonomy object, so that Node objects can be freely shared among
> Taxonomy objects. Tricky: once a Node object is created, its content
> should not be changed. If a Taxonomy needs one of its Nodes modified, it
> has to make a new Node, in case that Node is shared by another Taxonomy.
>
> Definitely, we need a Taxonomy factory, like Jason's Bio::DB::Taxonomy
> or what we are going to create in bioperl-db. Both Taxonomy objects and
> Node objects hold a reference to this factory, so that a Taxonomy can be
> created automatically and a Node can ask who its parent is
> (e.g. $node->get_parent_node, implemented as
> $node->_factory->find_parent_node($node)).
>
> Comments, please, and I will transform the idea into the code.
>
>
> Thanks.
>
> Juguang
>
>
>
> ------------ATGCCGAGCTTNNNNCT--------------
> Juguang Xiao
> Bioinformatics Engineer
> Temasek Life Sciences Laboratory, National University of Singapore
> 1 Research Link, Singapore 117604
> fax: (+65) 68727007
>
> juguang at tll.org.sg
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu