[Bioperl-l] Bio::*Taxonomy* changes
Sendu Bala
bix at sendu.me.uk
Tue Jul 25 07:05:23 UTC 2006
Chris Fields wrote:
>
> There is one thing I will make perfectly clear here: there should
> never, ever be enforced lookups for SeqIO (even using caches), though
> I have no problem having optional ones. This is something I have
> stated before and what you propose below steers dangerously in that
> direction. Where, for instance, do you store the lineage from a
> GenBank file? Do you want to do a series of Tax lookups to restore
> that data? I think that the number one complaint for sequence
> parsing is speed, which would only get slower with lookups (even
> cached).
I already gave a code example of exactly how Bio::Taxonomy is perfect
for storing the lineage data in a GenBank file with or without a
database lookup. I think perhaps at the time you first read this you
basically ignored it because you had trouble with the idea of adding
nodes to a species. If you have been glossing over my argument, it may
be instructive to go over what I've been saying with a clear eye.
Anyway, here it is again. Remember that in this example, Bio::Species isa
Bio::Taxonomy:
## the fully-manual way
my $species = Bio::Species->new();
my $node = Bio::Taxonomy::Node->new(-name      => 'Saccharomyces cerevisiae',
                                    -rank      => 'species',
                                    -object_id => 1,
                                    -parent_id => 2);
my $n2 = Bio::Taxonomy::Node->new(-name      => 'Saccharomyces',
                                  -object_id => 2,
                                  -parent_id => 3);
# (no assumption that 'Saccharomyces' is the genus, so rank() undefined)
my $n3 = [etc]
$species->add_node($node);
$species->add_node($n2);
[etc]
## Using a factory without db access
# assume that Bio::Taxonomy::GenbankFactory implements
# some modified Bio::Taxonomy::FactoryI
my $factory = Bio::Taxonomy::GenbankFactory->new();
my $species = $factory->generate(-classification =>
    ['Saccharomyces cerevisiae', 'Saccharomyces', 'Saccharomycetaceae' ...]);
# the generate() method above just does the fully-manual way for you
## Using a factory with db access
# assume that Bio::Taxonomy::EntrezFactory implements some
# modified Bio::Taxonomy::FactoryI and uses Bio::DB::Taxonomy::entrez
# to get the nodes
my $factory = Bio::Taxonomy::EntrezFactory->new();
my $species = $factory->fetch(-scientific_name => 'Saccharomyces cerevisiae');
So now do you see how we're able to do the GenBank no-db way and the
db-using way with the same object model? We're able to do it the same,
sane way because a Node is just a node; you can make them yourself
manually, or retrieve them from a database. Once you stick them in a
Taxonomy you can then (potentially) ask all the questions of the data
that you can with existing Bio::Species. No cruft is required anywhere
at all. All the Taxonomy classes can be 'pure', while only Bio::Species
has to have backward-compatibility methods.
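To make the "a Node is just a node" point concrete, here is a minimal,
self-contained sketch that runs without BioPerl installed. ToyNode,
ToyTaxonomy, from_classification() and from_fake_db() are names made up for
this illustration, not the Bio::Taxonomy API, and the ids in %fake_db are
arbitrary. It only shows that nodes built by hand from a classification list
and nodes pulled from a lookup source end up in the same container and answer
the same question.

```perl
use strict;
use warnings;

# ToyNode and ToyTaxonomy are hypothetical stand-ins for illustration only;
# they are not the Bio::Taxonomy API.
package ToyNode;
sub new {
    my ($class, %args) = @_;      # name, id, parent_id, optional rank
    my $self = { %args };
    return bless $self, $class;
}
sub name      { $_[0]{name} }
sub rank      { $_[0]{rank} }     # undef when no rank was supplied
sub id        { $_[0]{id} }
sub parent_id { $_[0]{parent_id} }

package ToyTaxonomy;
sub new {
    my ($class) = @_;
    return bless { nodes => {} }, $class;
}
sub add_node {
    my ($self, $node) = @_;
    $self->{nodes}{ $node->id } = $node;
    return $self;
}
# Follow parent links from the given node id up to the root.
sub lineage {
    my ($self, $id) = @_;
    my @names;
    while (defined $id && exists $self->{nodes}{$id}) {
        my $node = $self->{nodes}{$id};
        push @names, $node->name;
        $id = $node->parent_id;
    }
    return @names;
}

package main;

# No-db route: build nodes from a flat classification list (species first),
# the way a flat-file parser could, without any lookup.
sub from_classification {
    my @names = @_;
    my $tax = ToyTaxonomy->new;
    for my $i (0 .. $#names) {
        $tax->add_node(ToyNode->new(
            name      => $names[$i],
            id        => $i + 1,
            parent_id => ($i < $#names ? $i + 2 : undef),
        ));
    }
    return $tax;
}

# "Db" route: an in-memory hash stands in for a real taxonomy database.
# The ids are arbitrary values chosen for this example.
my %fake_db = (
    101 => { name => 'Saccharomyces cerevisiae', rank => 'species', parent_id => 100 },
    100 => { name => 'Saccharomyces',            rank => 'genus',   parent_id => undef },
);
sub from_fake_db {
    my ($id) = @_;
    my $tax = ToyTaxonomy->new;
    while (defined $id && $fake_db{$id}) {
        $tax->add_node(ToyNode->new(id => $id, %{ $fake_db{$id} }));
        $id = $fake_db{$id}{parent_id};
    }
    return $tax;
}

my $manual = from_classification('Saccharomyces cerevisiae', 'Saccharomyces',
                                 'Saccharomycetaceae');
my $looked_up = from_fake_db(101);

# The same lineage() call works regardless of where the nodes came from.
print join('; ', $manual->lineage(1)), "\n";
print join('; ', $looked_up->lineage(101)), "\n";
```

Note that the manually built nodes carry no rank, matching the comment in the
fully-manual example above, while the looked-up ones do. The container does
not care either way.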