[Bioperl-l] Bio::*Taxonomy* changes

Sendu Bala bix at sendu.me.uk
Mon Jul 24 22:15:31 UTC 2006


Chris Fields wrote:
>
> Also, I'm trying to follow the original idea as proposed by Jason (this is
> from perldoc Bio::Taxonomy::Node):
>
> Which, to me, indicated that this would eventually replace Bio::Species

Well, we don't really know that Jason didn't later change his mind, but 
in any case it no longer makes sense, given that we now have 
Bio::Taxonomy.

In a direct reply to me you pointed out specific passages in the current 
docs that explain why you thought we should delegate Bio::Species to 
Bio::Taxonomy::Node, or replace it outright. With respect, we are not 
obliged to follow the old plans blindly. We decide for ourselves 
whether they make sense and whether there is a better way of doing it, 
and then we do it the best way.

So if you ignore what those old bits of documentation say (just pretend 
you never read them), would my proposals make sense or not? Since 
those old proposals were never implemented, we have no reason to stick 
with them if there is a better proposal.

And for the record, '...Bio::Species which is able to represent only 
species-level' can (correctly) be interpreted as 'Bio::Species is only 
supposed to be used for representing a taxonomy that includes the 
species level'. You can't interpret it literally, because Bio::Species is 
used for levels below species and also represents all the levels above 
species level. Either Jason got it wrong when he wrote that, or 
you have misinterpreted it.

Likewise, let's play the interpretation game again: 'Previously all 
information was managed by a single object called Bio::Species. [the 
Bio::Taxonomy::Node] implementation allows representation of the 
intermediate nodes not just the species nodes'. Note the contrast 
between 'single object' and the implied use of multiple Node objects to 
do the same job. I imagine that when Jason wrote that, there was no 
Bio::Taxonomy, and so no holder for multiple Nodes.


> I had originally wanted to start delegating everything over to
> Taxonomy::Node about a month ago, when I found that it was remarkably easy
> to do so.  However, when Sendu proposed making changes to remove methods in
> Bio::Taxonomy::Node and make sweeping changes to Taxonomy which would
> prevent an easy transition over to Node,

But those changes would allow an equally easy transition to 
Bio::Taxonomy instead. I don't know why you would care about the name of 
the class we switch to; my concern is that when the switch is made, it 
makes sense.


> If we think it would be better to completely toss all this out the window
> and use only a bare-bones Node, then I'm fine with that.   But if we go that
> route we should just get rid of the Bio::Species 'disease' completely and
> have things be much simpler.  Simple is good!
> 
> I think Node can still act as a viable container class for the tax data from
> a GenBank file (its original purpose) as long as it has the very basic
> methods for doing so.  That would require:
> 
> scientific_name() - ORGANISM line data
> common_names() - which could hold common names (in parentheses on the SOURCE
> line) and the abbreviated name (from the SOURCE line)
> ncbi_taxid() - from the 'source' seqfeature (already there).
> 
> The lineage information and organelle information could be stored in Node or
> in SimpleValue objects.  My vote is for the latter as there's no need for a
> classification() container for Node, which you have repeatedly pointed out.

No, this is the whole point. The lineage information can NOT be stored 
in a Node (unless you abuse Node by giving it all those crufty methods 
like genus() and classification()), and why would we store it in 
SimpleValue objects when we have Bio::Taxonomy?

Bio::Taxonomy is perfectly suited to storing the taxonomic 
information from a GenBank file. That's all you need to worry about. Can 
we represent the data correctly? Yes. Do we gain all the benefits of a 
pure Bio::Taxonomy? Yes. Can we still do everything we used to be able 
to do? Yes.


> I think we should just get rid of Bio::Species completely.

There's no need to get rid of Bio::Species. It can be a Bio::Taxonomy 
with backward-compatible methods. No harm done, all good.
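To make that proposal concrete, here is a minimal sketch in Python. It is not BioPerl code, and all class and method names are hypothetical (genus() and classification() are borrowed from this thread purely as illustrations; their behaviour here is an assumption, not a description of the real Bio::Species). It assumes the taxonomy object holds one node per rank, ordered from root to leaf, and that Species is just that taxonomy plus old-style convenience accessors computed from the nodes.

```python
# Hypothetical sketch, NOT BioPerl: a taxonomy that holds one node per rank,
# and a Species subclass that adds backward-compatible convenience accessors.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One taxon at a single rank (hypothetical stand-in for a taxonomy node)."""
    rank: str
    scientific_name: str
    common_names: List[str] = field(default_factory=list)
    ncbi_taxid: Optional[int] = None


class Taxonomy:
    """Holds the whole lineage as separate Node objects, ordered root to leaf."""

    def __init__(self, nodes: List[Node]):
        self.nodes = list(nodes)

    def node_at_rank(self, rank: str) -> Optional[Node]:
        # Return the first node with the requested rank, or None if absent.
        for node in self.nodes:
            if node.rank == rank:
                return node
        return None


class Species(Taxonomy):
    """A Taxonomy with old-style methods layered on top (assumed behaviour)."""

    def genus(self) -> Optional[str]:
        node = self.node_at_rank("genus")
        return node.scientific_name if node else None

    def classification(self) -> List[str]:
        # Flat list of names, leaf first and root last (assumed ordering).
        return [n.scientific_name for n in reversed(self.nodes)]


# Usage: an abbreviated human lineage (intermediate ranks omitted).
human = Species([
    Node("superkingdom", "Eukaryota"),
    Node("genus", "Homo"),
    Node("species", "Homo sapiens", ["human"], 9606),
])
print(human.genus())           # Homo
print(human.classification())  # ['Homo sapiens', 'Homo', 'Eukaryota']
```

The point of the design is that each node remains a single taxon, the lineage lives in the container, and the legacy methods are derived on demand rather than stored on any one node.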


I'll tell you what. This will be easier if I just write the code for my 
proposals, including whatever changes would be needed in 
Bio::SeqIO::genbank et al. You'll see how easy and appropriate it is, 
and hopefully everyone will be happy.

Perhaps you could just hold off on any similar but contradictory work 
until then.


