[Bioperl-l] taxonomic information handler

Jason Stajich jason@cgt.mc.duke.edu
Tue, 1 Oct 2002 20:25:09 -0400 (EDT)


On Wed, 2 Oct 2002, Dan Kortschak wrote:

> Forgot to cc to list...
>
> Jason, I can suggest the kinds of things that I would find useful in this
> hierarchy (and some things that would be fun) - not sure if this is what
> you want, but anyway:
>
>
I was thinking we'd separate this into objects which contained
Taxonomic information (akin to the Bio::Seq objects) and objects
which were db adaptors to fetch from NCBI/local taxonomic dbs/etc (akin to
Bio::DB::GenBank).  I think that Martin has done this rather well with the
Bio::Biblio stuff, which hides the separation somewhat with the factory
constructor objects (Bio::Biblio), but there is still a separation between
the data fetchers and the data holders.  Make sense?  Perhaps this is what
you are modeling things after in your ideas below?
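
A minimal sketch of that split, assuming hypothetical class names
(My::Taxon, My::DB::Taxonomy::InMemory) and toy records; none of this is
an existing BioPerl API.  The holder only stores data, and the adaptor is
the only piece that knows where the data comes from:

```perl
use strict;
use warnings;

# Data holder: knows nothing about where its data came from.
# (Hypothetical class name, not an existing BioPerl module.)
package My::Taxon;

sub new {
    my ($class, %args) = @_;
    my $self = {
        taxid     => $args{-taxid},
        name      => $args{-name},
        rank      => $args{-rank},
        parent_id => $args{-parent_id},
    };
    return bless $self, $class;
}

sub taxid     { $_[0]->{taxid} }
sub name      { $_[0]->{name} }
sub rank      { $_[0]->{rank} }
sub parent_id { $_[0]->{parent_id} }

# Data fetcher: an adaptor that builds My::Taxon objects from some source.
# This one reads from an in-memory hash; an Entrez or SQL adaptor would
# be expected to offer the same get_taxon_by_taxid() method.
package My::DB::Taxonomy::InMemory;

sub new {
    my ($class, %args) = @_;
    return bless { records => $args{-records} || {} }, $class;
}

sub get_taxon_by_taxid {
    my ($self, $taxid) = @_;
    my $rec = $self->{records}{$taxid} or return;   # unknown id: nothing
    return My::Taxon->new(-taxid => $taxid, %$rec);
}

package main;

# Toy records for illustration only.
my $db = My::DB::Taxonomy::InMemory->new(
    -records => {
        9606 => { -name => 'Homo sapiens', -rank => 'species', -parent_id => 9605 },
        9605 => { -name => 'Homo',         -rank => 'genus',   -parent_id => 207598 },
    },
);

my $taxon = $db->get_taxon_by_taxid(9606);
printf "%d\t%s\t%s\n", $taxon->taxid, $taxon->name, $taxon->rank;
```

Swapping the in-memory adaptor for one that talks to a server should not
require any change to My::Taxon or to code that only reads taxon objects.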

Specifically, in bioperl-live we'd have Bio::DB::Taxonomy::Entrez to fetch
from NCBI Entrez via HTTP; this could be instantiated with
Bio::Taxonomy->new(-method => $method,
                   -serverinfo => $info); # DBI dsn or HTTP server info, proxy, etc.
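
One way such a factory constructor might dispatch on -method is sketched
below.  The My::* class names, the lookup table, and the 'entrez' default
are assumptions made for illustration, not a settled design:

```perl
use strict;
use warnings;

# Hypothetical adaptor classes; names and arguments are illustrative only.
package My::DB::Taxonomy::Entrez;
sub new {
    my ($class, %args) = @_;
    return bless { serverinfo => $args{-serverinfo} }, $class;
}

package My::DB::Taxonomy::SQL;
sub new {
    my ($class, %args) = @_;
    return bless { serverinfo => $args{-serverinfo} }, $class;
}

# Factory: callers name a method and get back the matching adaptor object.
package My::Taxonomy;

my %ADAPTOR_FOR = (
    entrez => 'My::DB::Taxonomy::Entrez',
    sql    => 'My::DB::Taxonomy::SQL',
);

sub new {
    my ($class, %args) = @_;
    my $method = lc( $args{-method} // 'entrez' );   # assumed default
    my $impl   = $ADAPTOR_FOR{$method}
        or die "Unknown taxonomy method '$method'\n";
    return $impl->new( -serverinfo => $args{-serverinfo} );
}

package main;

my $db = My::Taxonomy->new(
    -method     => 'sql',
    -serverinfo => 'dbi:mysql:taxonomy',   # placeholder DSN
);
print ref($db), "\n";   # prints "My::DB::Taxonomy::SQL"
```

Keeping the name-to-class table in one place means a new backend only
needs one new entry there plus its own adaptor class.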

If we wrote the SQL adaptor layer we'd implement that in the bioperl-db
CVS module and name it appropriately for those namespaces.

> Bio::Taxonomy
> 	->new ($method)
> 		argument: $method - NCBI/Entrez or local SQL species tree
> 		returns: Bio::Taxonomy object
>
> 	->get_organism_by_taxid ($taxid)
> 		argument: taxon id number
> 		returns: Bio::Species object
>
> 	->get_organism_by_name ($name,$mode)
> 		argument: taxon name, search mode common/Linnaean/any/etc.
> 		returns: Bio::Species object
>
> 	->get_organism_by_acc ($acc)
> 		argument: gene accession number
> 		returns: Bio::Species object
>
> 	->get_organism_by_id ($id)
> 		argument: gene/protein id
> 		returns: Bio::Species object
>
> 	->descendants ($taxid)
> 		argument: taxon id number
> 		returns: array of Bio::Tree::SpeciesTree objects
> 		# I'm not sure that the NCBI taxonomy nodes table actually
> 		# allows this in a reasonable way since links only point
> 		# towards the root node, but once done the ::Tree objects
> 		# point downwards.
>
> 	::Tree
> 	# Inherits from Bio::TreeI pretty much the same as
> 	# Bio::Tree::Tree, but doesn't need branch lengths (maybe, useful
> 	# for divergence times?)
>
> 	::Node
> 	# Inherits from Bio::NodeI pretty much the same as
> 	# Bio::Tree::Node, but doesn't need bootstrap etc. (length for
> 	# divergence time?)
> 		->description ($species)
>                 	argument: Bio::Species object
>                 	returns: Bio::Species object
>
>
> Since a taxid can be at levels above species, the Bio::Species objects may
> be semantically broken (though still functional, since
> Bio::Species->classification can hold the appropriate information and
> ->genus...sub_species can be left undef to reflect this - ->common_name
> still useful at the > genus level).
>
> I'd be keen to write some of this, but at the moment my SQL can be
> quantified as non-existent (hence my use of arrays in my previous post).
> I'd be happy to learn when I have time and do it then though. I guess I'll
> have to learn about writing perl objects too :)
>
No, really - no need to do the SQL stuff; that would go in bioperl-db type
code.  Rather, we can just write the objects which contain the data, so you
just build in-memory objects and worry about how they are populated with
data later.
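
As a rough illustration of how far plain in-memory data can go, the
sketch below takes child-to-parent links (the direction noted in the
quoted descendants comment) and inverts them once so that descendants can
be listed.  The function names and the toy ids are made up for this
example:

```perl
use strict;
use warnings;

# Toy "child taxid => parent taxid" links; the ids are invented.
my %parent_of = (2 => 1, 3 => 1, 4 => 2, 5 => 2, 6 => 4);

# Invert child->parent links into a parent->children index, once.
sub build_children_index {
    my ($parent_of) = @_;
    my %children_of;
    for my $child (keys %$parent_of) {
        push @{ $children_of{ $parent_of->{$child} } }, $child;
    }
    return \%children_of;
}

# Breadth-first walk downwards from $taxid, collecting every descendant.
sub descendants {
    my ($children_of, $taxid) = @_;
    my @found;
    my @queue = @{ $children_of->{$taxid} || [] };
    while (defined(my $id = shift @queue)) {
        push @found, $id;
        push @queue, @{ $children_of->{$id} || [] };
    }
    return @found;
}

my $children_of = build_children_index(\%parent_of);
my @below_2     = sort { $a <=> $b } descendants($children_of, 2);
print join(',', @below_2), "\n";   # prints "4,5,6"
```

The same index could later be filled from a database or a flat file
without changing descendants().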

Happy to help you through the process - feel free to code things locally
and we'll see about accounts at the end of the week when our server
changeover is finished.


> cheers
> Dan
>
> BTW It has just occurred to me that something along the same lines would be
> appropriate for the GeneOntology data hierarchy as well. Would this fit
> into Bio::Taxonomy?
>
> On Tue, 1 Oct 2002, Jason Stajich wrote:
>
> > I think we would like someone to do the Taxonomy namespace for sure.
> >
> > I have written a connector to NCBI to retrieve ncbi_taxaid for an organism
> > name; it's in scripts/taxa.  Would love to be able to populate a whole data
> > object as well if we can design one.
> >
> > Would be great if someone (other than me) would flesh out the
> > Bio::Taxonomy objects.  There was a proposed relational structure from
> > Matthew that may have made its way into a module of the biosql schema.
> >
> > Dan would you be brave enough to propose a set of objects which relate a
> > taxonomic hierarchy?  This might utilize objects such as Bio::Graph but I
> > wouldn't worry about that right now.  I would drive this based on use
> > cases that you have for taxonomic data.
> >
> > -jason
>
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu