[Bioperl-l] taxonomic information handler
Jason Stajich
jason@cgt.mc.duke.edu
Tue, 1 Oct 2002 20:25:09 -0400 (EDT)
On Wed, 2 Oct 2002, Dan Kortschak wrote:
> Forgot to cc to list...
>
> Jason, I can suggest the kinds of things that I would find useful in this
> hierarchy (and some things that would be fun) - not sure if this is what
> you want, but anyway:
>
>
I was thinking we'd separate this into objects which contain
taxonomic information (akin to the Bio::Seq objects) and objects
which are db adaptors to fetch from NCBI/local taxonomic dbs/etc (akin to
Bio::DB::GenBank). I think that Martin has done this rather well with the
Bio::Biblio stuff, which hides the separation somewhat behind the factory
constructor objects (Bio::Biblio), but there is still a separation between
the data fetchers and the data holders. Does that make sense? Perhaps this
is what you are modeling things after in your ideas below?
Specifically, in bioperl-live we'd have Bio::DB::Taxonomy::Entrez to fetch
from NCBI Entrez via HTTP; this could be instantiated with
  Bio::Taxonomy->new(-method     => $method,
                     -serverinfo => $info); # DBI dsn or HTTP server info, proxy, etc.
If we wrote the SQL adaptor layer we'd implement that in the bioperl-db
CVS module and name it appropriately for those namespaces.
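To make the holder/fetcher split concrete, here is a minimal Perl sketch of a factory constructor that dispatches on -method. Everything in it is an assumption made for illustration: the Sketch::* package names, the 'memory' adaptor, and the get_taxon_by_taxid method are hypothetical stand-ins, not existing BioPerl modules. A real Entrez or DBI adaptor would presumably register in the same lookup table.

```perl
use strict;
use warnings;

# Data holder: knows nothing about where its data came from.
package Sketch::Taxon {
    sub new {
        my ($class, %args) = @_;
        return bless { taxid => $args{-taxid}, name => $args{-name} }, $class;
    }
    sub taxid { $_[0]{taxid} }
    sub name  { $_[0]{name} }
}

# Data fetcher: a hypothetical adaptor backed by an in-memory hash,
# standing in for an HTTP or DBI backend.
package Sketch::DB::Taxonomy::Memory {
    sub new {
        my ($class, %args) = @_;
        return bless { names => $args{-serverinfo} || {} }, $class;
    }
    sub get_taxon_by_taxid {
        my ($self, $taxid) = @_;
        my $name = $self->{names}{$taxid};
        return unless defined $name;    # unknown taxid
        return Sketch::Taxon->new(-taxid => $taxid, -name => $name);
    }
}

# Factory: hides the choice of adaptor behind one constructor.
package Sketch::Taxonomy {
    my %ADAPTOR_FOR = ( memory => 'Sketch::DB::Taxonomy::Memory' );
    sub new {
        my ($class, %args) = @_;
        my $method  = lc( $args{-method} // '' );
        my $adaptor = $ADAPTOR_FOR{$method}
            or die "unknown -method '$method'\n";
        return $adaptor->new(%args);
    }
}

package main;

my $db = Sketch::Taxonomy->new(
    -method     => 'memory',
    -serverinfo => { 9606 => 'Homo sapiens' },
);
my $taxon = $db->get_taxon_by_taxid(9606);
print $taxon->name, "\n";
```

The caller only ever sees the factory and the data holder, so swapping the backend means adding one entry to the lookup table rather than touching calling code.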
> Bio::Taxonomy
> ->new ($method)
> argument: $method - NCBI/Entrez or local SQL species tree
> returns: Bio::Taxonomy object
>
> ->get_organism_by_taxid ($taxid)
> argument: taxon id number
> returns: Bio::Species object
>
> ->get_organism_by_name ($name,$mode)
> argument: taxon name, search mode common/Linnaean/any/etc.
> returns: Bio::Species object
>
> ->get_organism_by_acc ($acc)
> argument: gene accession number
> returns: Bio::Species object
>
> ->get_organism_by_id ($id)
> argument: gene/protein id
> returns: Bio::Species object
>
> ->decendants ($taxid)
> argument: taxon id number
> returns: array of Bio::Tree::SpeciesTree objects
> # I'm not sure that the NCBI taxonomy nodes table actually
> # allows this in a reasonable way, since links only point
> # towards the root node, but once done the ::Tree objects
> # point downwards.
>
> ::Tree
> # Inherits from Bio::TreeI pretty much the same as
> # Bio::Tree::Tree, but doesn't need branch lengths (maybe useful
> # for divergence times?)
>
> ::Node
> # Inherits from Bio::NodeI pretty much the same as
> # Bio::Tree::Node, but doesn't need bootstrap etc. (length for
> # divergence time?)
> ->description ($species)
> argument: Bio::Species object
> returns: Bio::Species object
>
>
> Since a taxid can be at levels above species, the Bio::Species objects may
> be semantically broken (though still functional, since
> Bio::Species->classification can hold the appropriate information and
> ->genus...sub_species can be left undef to reflect this - ->common_name
> still useful at the > genus level).
>
> I'd be keen to write some of this, but at the moment my SQL can be
> quantified as non-existent (hence my use of arrays in my previous post).
> I'd be happy to learn when I have time and do it then though. I guess I'll
> have to learn about writing perl objects too :)
>
No, really - no need to do the SQL stuff - that would go in bioperl-db type
code. Rather, we can just write the objects which contain the data, so you
just build in-memory objects which hold the data and worry about how
they are populated with data later.
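As a rough illustration of such an in-memory object, the Perl sketch below loads rows shaped like a parent-pointing nodes table and inverts the links once, so that descendants can be listed without any SQL. The Sketch::TaxonomyTree package and its methods are hypothetical, the root being its own parent is an assumption of the sketch, and the rows are a simplified toy table with intermediate ranks omitted.

```perl
use strict;
use warnings;

package Sketch::TaxonomyTree {
    # Each row is [taxid, parent_taxid, name].
    sub new {
        my ($class, @rows) = @_;
        my $self = bless { name => {}, parent => {}, children => {} }, $class;
        for my $row (@rows) {
            my ($taxid, $parent, $name) = @$row;
            $self->{name}{$taxid}   = $name;
            $self->{parent}{$taxid} = $parent;
            # Invert the child -> parent link; skip the root's self-link.
            push @{ $self->{children}{$parent} }, $taxid if $parent != $taxid;
        }
        return $self;
    }

    sub name { $_[0]{name}{ $_[1] } }

    # Every taxid below $taxid, in depth-first preorder.
    sub descendants {
        my ($self, $taxid) = @_;
        my @found;
        my @stack = @{ $self->{children}{$taxid} || [] };
        while (@stack) {
            my $id = shift @stack;
            push @found, $id;
            unshift @stack, @{ $self->{children}{$id} || [] };
        }
        return @found;
    }
}

package main;

# Toy data only: intermediate ranks are left out for brevity.
my $tree = Sketch::TaxonomyTree->new(
    [ 1,      1,      'root' ],
    [ 9604,   1,      'Hominidae' ],
    [ 207598, 9604,   'Homininae' ],
    [ 9605,   207598, 'Homo' ],
    [ 9606,   9605,   'Homo sapiens' ],
    [ 9596,   207598, 'Pan' ],
    [ 9598,   9596,   'Pan troglodytes' ],
);

my @below = $tree->descendants(207598);
print join( ', ', map { $tree->name($_) } @below ), "\n";
# prints "Homo, Homo sapiens, Pan, Pan troglodytes"
```

How the rows get there (Entrez, a flat file, or a bioperl-db adaptor) is left open on purpose; the object only needs the three columns.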
Happy to help you through the process - feel free to code things locally
and we'll see about accounts at the end of the week when our server
changeover is finished.
> cheers
> Dan
>
> BTW, it has just occurred to me that something along the same lines would be
> appropriate for the GeneOntology data hierarchy as well. Would this fit
> into Bio::Taxonomy?
>
> On Tue, 1 Oct 2002, Jason Stajich wrote:
>
> > I think we would like someone to do the Taxonomy namespace for sure.
> >
> > I have written a connector to NCBI to retrieve the ncbi_taxaid for an
> > organism name; it's in scripts/taxa. Would love to be able to populate a
> > whole data object as well if we can design one.
> >
> > Would be great if someone (other than me) would flesh out the
> > Bio::Taxonomy objects. There was a proposed relational structure from
> > Matthew that may have made its way into a module of the biosql schema.
> >
> > Dan, would you be brave enough to propose a set of objects which relate a
> > taxonomic hierarchy? This might utilize objects such as Bio::Graph but I
> > wouldn't worry about that right now. I would drive this based on use
> > cases that you have for taxonomic data.
> >
> > -jason
>
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu