Bio::Taxon/Bio::Taxonomy (was: Re: [Bioperl-l] Re: Fwd: questions and freeze (fwd))
Dan Kortschak <kortschak@rsbs.anu.edu.au>
Fri, 11 Oct 2002 13:48:41 +1000 (EST)
First, I'll give it a decent subject (sorry about that - I was tired when
I sent it).
On Thu, 10 Oct 2002, Jason Stajich wrote:
> I think Dan was thinking in terms of hooking Species more properly in with
> a taxonomy structure if one had a local database or wanted to rely on
> a connection to an NCBI system (like if I want to grab all the info on
> a specific order, could I build an appropriate bioperl data structure for
> this so I could query my data which shared a species in my structure). I
> think basically what it comes down to is we need a totally
> parallel set of objects to handle taxonomic information rather than try
> and retrofit Bio::Species for this.
I was really just starting from Bio::Tree/Node as a ground point. But yes,
I was basically setting out to build something that would easily contain
the kinds of data that the NCBI taxonomy database provides.
>
> I don't think this is really a problem, if we can obtain an NCBI taxa_id
> for a given species then we can relate Bio::Species objects to some
> Taxonomic structure where needed. This is the route I'd prefer we go
> rather than try and glob onto Bio::Species.
Bio::Species was there, so I (reluctantly) used it. I agree that the
Taxonomy object classes should be separate from the non-interface classes.
Perhaps Bio::Taxon could inherit from Bio::NodeI and Bio::Taxonomy from
Bio::TreeI. If Bio::Taxon has a way of importing from Bio::Species along
the lines of what I specified, but giving `no rank', then Bio::Species needs
no change. This will break the use of recent_common_ancestor, but in a
sensible way, since without knowing the rank two taxa really can't be
compared. Perhaps a relaxation of this requirement could be an option. I
was reluctant to do this because a number of cases exist where different
ranks have the same name even where the species themselves are very
unrelated, but as an option I don't see an issue. This makes Bio::Taxon a
general taxonomic entity (which may be a species), which is essentially, I
think, what I was aiming for with Bio::Taxonomy::Node.
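
To make the rank requirement concrete, here is a minimal sketch in plain
Perl. It is not BioPerl code: lineage_from_names, same_taxon and the
`relaxed' option are names made up for illustration, and lineages are
assumed to be listed root first. Taxa imported without ranks get `no rank',
and the ancestor lookup refuses to match them unless asked to compare by
name alone.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of BioPerl. A taxon is a hash with a
# name and a rank; a lineage is an array ref of taxa, root first.
my $NO_RANK = 'no rank';

# Build a lineage from bare names, as from a source that carries no ranks.
sub lineage_from_names {
    my @names = @_;
    return [ map { +{ name => $_, rank => $NO_RANK } } @names ];
}

# Taxa match only if the names agree and both ranks are known and equal.
# With relaxed => 1, unranked taxa are matched on name alone.
sub same_taxon {
    my ($x, $y, %opt) = @_;
    return 0 unless $x->{name} eq $y->{name};
    if ($x->{rank} eq $NO_RANK || $y->{rank} eq $NO_RANK) {
        return 1 if $opt{relaxed};
        die "cannot compare '$x->{name}' without a rank\n";
    }
    return $x->{rank} eq $y->{rank} ? 1 : 0;
}

# Walk both lineages down from the root and keep the last shared taxon.
sub recent_common_ancestor {
    my ($lin_a, $lin_b, %opt) = @_;
    my $rca;
    for my $i (0 .. $#$lin_a) {
        last if $i > $#$lin_b;
        last unless same_taxon($lin_a->[$i], $lin_b->[$i], %opt);
        $rca = $lin_a->[$i];
    }
    return $rca;    # undef if the lineages share nothing
}

my $human = lineage_from_names(qw(Eukaryota Metazoa Chordata Mammalia Primates Homo));
my $mouse = lineage_from_names(qw(Eukaryota Metazoa Chordata Mammalia Rodentia Mus));

# Strict mode dies on the first unranked pair; eval traps that.
my $strict = eval { recent_common_ancestor($human, $mouse) };
print "strict lookup refused: $@" unless defined $strict;

my $relaxed = recent_common_ancestor($human, $mouse, relaxed => 1);
print "relaxed lookup: $relaxed->{name}\n";
```

The strict default is the "sensible breakage" described above; the relaxed
flag is the opt-in escape hatch, with the caveat that same-named taxa at
different ranks would then be treated as identical.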
>
> In the same way, I don't want to make Bio::Tree objects explicitly
> "species-aware" or even sequence aware so they can be reused for a variety
> of uses. Rather we can build taxon objects as Hilmar alludes to and these
> will hopefully reuse the Bio::Tree basic structure if we've made it
> general enough for this.
>
> Dan we're not trying to be harsh on your proposal, but realistic about the
> current dependencies - do these arguments make sense to you?
>
No worries. These things really just make concrete the worries that I was
having while I was trying to get it into shape. Maybe the suggestions
above make sense and take into account the suggestions made (I hope so).
At the moment this is a low priority (I was writing it while waiting for
clones and big jobs to finish - that prompted the question in the first
place). It's all just one big learning adventure at the moment.
cheers
Dan
> -jason
> On Thu, 10 Oct 2002, Hilmar Lapp wrote:
>
> > Dan,
> >
> > several comments.
> >
> > 1) First off, this should really take place on the list, as many
> > more people may have an opinion on this, which may or may not
> > coincide with what Jason or I think. I'm therefore copying the list
> > on my response; I hope you don't mind.
> >
> > 2) We are careful not to change an API that's been in a major stable
> > release without providing backward compatibility, at least if it's a
> > 'core' module. Changing the way $species->classification() needs to
> > be called is a no-no IMO. You can add optional other ways though,
> > which can be distinguished in code (that's what I did). Another
> > alternative is to write an entire new module if you want a radically
> > different API, and over time we could adopt that in the parsers
> > (backward compatibility still being a problem).
> >
> > 3) Having to pass the ranks as literals makes the whole thing much
> > stricter than it is now, and we're having problems with the code
> > being too strict already. I don't know of any major input source
> > that actually gives you the ranks along with the values (other than
> > NCBI taxon DB itself), and I certainly wouldn't want to rely on them
> > being always in a predefined order in the species section of the
> > databank entry. So, I don't even know where I would take the values
> > from to pass to your variant. How did you envision this value being
> > constructed? Ideally you could have both, but I feel the ranks need
> > to be optional.
> >
> > 4) Performance-wise, classification arrays can be lengthy. If we change
> > something, I'd also pass references instead of arrays or hashes.
> >
> > 5) As for the connection to Bio::Tree, my take on this is that there
> > should eventually be a Bio::TaxonI interface with no connection to
> > Bio::Tree on the interface level. Implementors then may or may not
> > choose to utilize Bio::Tree::* classes for their implementation. I
> > made a similar argument for the Bio::Ontology::* interfaces.
> >
> > You may want to briefly look at my changes. I basically added
> > variant() for strain/isolate/etc information, and added a faster
> > calling alternative to classification() (array ref instead of array)
> > which also potentially bypasses name validation (which is a major
> > problem).
> >
> > -hilmar
> >
> > (The enclosed file is from Dan's original email, it is _not_ my
> > version of Species.pm)
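
Points 2 and 4 in the quoted message (keep the old call style working,
offer an array reference as a faster alternative that can skip validation)
might look roughly like the sketch below. My::Species is a hypothetical
stand-in, not Bio::Species, and the name check is only a placeholder rule;
the actual changes in the repository may well differ.

```perl
use strict;
use warnings;

package My::Species;    # hypothetical stand-in, not Bio::Species

sub new {
    my ($class) = @_;
    return bless { classification => [] }, $class;
}

# Getter/setter distinguishing the two call styles in code:
#   classification(@names)   old style: validate each name, store a copy
#   classification(\@names)  new style: store the reference as given
sub classification {
    my ($self, @args) = @_;
    if (@args) {
        if (ref($args[0]) eq 'ARRAY') {
            $self->{classification} = $args[0];    # no copy, no validation
        }
        else {
            for my $name (@args) {
                # Placeholder validation rule, assumed for illustration.
                die "bad name '$name'\n"
                    unless $name =~ /^[A-Za-z][A-Za-z .-]*$/;
            }
            $self->{classification} = [@args];
        }
    }
    return @{ $self->{classification} };
}

package main;

my $sp = My::Species->new;
$sp->classification(qw(sapiens Homo Hominidae));    # existing callers unchanged
my @names = $sp->classification;

$sp->classification([qw(musculus Mus Muridae)]);    # array-ref alternative
print join(' ', $sp->classification), "\n";
```

Because the new style is detected by checking for a reference, code written
against the old list-based call keeps working without modification.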
--
Dan Kortschak  kortschak@rsbs.anu.spanner.edu.au

Before you criticise a man, try to walk a mile in his
shoes. Then, if he doesn't like what you have to say,
you'll be a mile away, and you'll have his shoes.

The address above will not work, remove the spanner from the works.
By replying to this email you implicitly accept that your response may
be forwarded to other recipients.
Permission is granted for fair use reproduction.