Bio::Taxon/Bio::Taxonomy was:Re: [Bioperl-l] Re: Fwd: questions and freeze (fwd)

Hilmar Lapp hlapp@gnf.org
Fri, 11 Oct 2002 00:04:10 -0700


Dan, this all makes a lot of sense to me. I was confused when you 
wanted to revamp Bio::Species; now I remember your Taxonomy 
suggestions. Go ahead and shape Bio::Taxonomy and/or Bio::Taxon -- 
no one is getting in your way :-)

I'm confident we can work out later how to reconcile this with 
Bio::Species, if we want that, so that shouldn't hamper your creativity :-)

	-hilmar

On Thursday, October 10, 2002, at 08:48 PM, Dan Kortschak wrote:

> First, I'll give it a decent subject (sorry about that - I was tired
> when I sent it).
>
> On Thu, 10 Oct 2002, Jason Stajich wrote:
>
>> I think Dan was thinking in terms of hooking Species more properly in
>> with a taxonomy structure if one had a local database or wanted to rely
>> on a connection to an NCBI system (for example, if I want to grab all the
>> info on a specific order, could I build an appropriate bioperl data
>> structure for this so I could query my data which shared a species in my
>> structure). I think basically what it comes down to is that we need a
>> totally parallel set of objects to handle taxonomic information rather
>> than trying to retrofit Bio::Species for this.
>
> I was really just starting from Bio::Tree/Node as a ground point. But
> yes, I was basically setting out to build something that would easily
> contain the kinds of data that the NCBI taxonomy database provides.
>
>>
>> I don't think this is really a problem: if we can obtain an NCBI taxa_id
>> for a given species then we can relate Bio::Species objects to some
>> Taxonomic structure where needed.  This is the route I'd prefer we go
>> rather than try and glom onto Bio::Species.
>
> Bio::Species was there, so I (reluctantly) used it. I agree that the
> Taxonomy object classes should be separate from the non-interface
> classes, perhaps with Bio::Taxon inheriting from Bio::NodeI and
> Bio::Taxonomy from Bio::TreeI. If Bio::Taxon has a way of importing from
> Bio::Species along the lines of what I specified, but giving `no rank',
> then Bio::Species needs no change. This will break the use of
> recent_common_ancestor, but in a sensible way, since without knowing the
> rank two taxa really can't be compared. (Perhaps a relaxation of this
> requirement could be an option. I was reluctant to do this because a
> number of cases exist where different ranks have the same name even
> where the species themselves are very unrelated, but as an option I
> don't see an issue.) This makes Bio::Taxon a general taxonomic entity
> (which may be a species), which is essentially, I think, what I was
> aiming for with Bio::Taxonomy::Node.
>
>>
>> In the same way, I don't want to make Bio::Tree objects explicitly
>> "species-aware" or even sequence-aware, so they can be reused for a
>> variety of uses.  Rather, we can build taxon objects as Hilmar alludes to,
>> and these will hopefully reuse the Bio::Tree basic structure if we've
>> made it general enough for this.
>>
>> Dan, we're not trying to be harsh on your proposal, but realistic about
>> the current dependencies - do these arguments make sense to you?
>>
>
> No worries. These things really just make concrete the worries that I was
> having while I was trying to get it into shape. Maybe the suggestions
> above make sense and take into account the suggestions made (I hope so).
>
> At the moment this is a low priority (I was writing it while waiting for
> clones and big jobs to finish - that prompted the question in the first
> place). It's all just one big learning adventure at the moment.
>
>
> cheers
> Dan
>
>> -jason
>> On Thu, 10 Oct 2002, Hilmar Lapp wrote:
>>
>>> Dan,
>>>
>>> several comments.
>>>
>>> 1) First off, this should really take place on the list, as many
>>> more people may have an opinion on this, which may or may not
>>> coincide with what Jason or I think. I'm therefore copying the list
>>> on my response; I hope you don't mind.
>>>
>>> 2) We are careful not to change an API that's been in a major stable
>>> release without providing backward compatibility, at least if it's a
>>> 'core' module. Changing the way $species->classification() needs to
>>> be called is a no-no IMO. You can add optional other ways though,
>>> which can be distinguished in code (that's what I did). Another
>>> alternative is to write an entirely new module if you want a radically
>>> different API, and over time we could adopt that in the parsers
>>> (backward compatibility still being a problem).
>>>
>>> 3) Having to pass the ranks as literals makes the whole thing much
>>> stricter than it is now, and we're having problems with the code
>>> being too strict already. I don't know of any major input source
>>> that actually gives you the ranks along with the values (other than
>>> NCBI taxon DB itself), and I certainly wouldn't want to rely on them
>>> being always in a predefined order in the species section of the
>>> databank entry. So, I don't even know where I would take the values
>>> from to pass to your variant. How did you envision this value being
>>> constructed? Ideally you could have both, but I feel the ranks need
>>> to be optional.
>>>
>>> 4) Performance-wise, classification arrays can be lengthy. If we change
>>> something, I'd also pass references instead of arrays or hashes.
>>>
>>> 5) As for the connection to Bio::Tree, my take on this is that there
>>> should eventually be a Bio::TaxonI interface with no connection to
>>> Bio::Tree on the interface level. Implementors then may or may not
>>> choose to utilize Bio::Tree::* classes for their implementation. I
>>> made a similar argument for the Bio::Ontology::* interfaces.
>>>
>>> You may want to briefly look at my changes. I basically added
>>> variant() for strain/isolate/etc information, and added a faster
>>> calling alternative to classification() (array ref instead of array)
>>> which also potentially bypasses name validation (which is a major
>>> problem).
>>>
>>> 	-hilmar
>>>
>>> (The enclosed file is from Dan's original email, it is _not_ my
>>> version of Species.pm)
> --
> _____________________________________________________________   .`.`o
>                                                          o| ,\__ `./`r
>   Dan Kortschak    kortschak@rsbs.anu.spanner.edu.au     <\/    \_O> O
>                                                           "|`...'.\
>   Before you criticise a man, try to walk a mile in his    `      :\
>   shoes. Then, if he doesn't like what you have to say,           : \
>   you'll be a mile away, and you'll have his shoes.               :  \
>
>   The address above will not work, remove the spanner from the works.
>
> By replying to this email you implicitly accept that your response may
> be forwarded to other recipients.
> Permission is granted for fair use reproduction.
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------
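
The design Dan outlines in the quoted text (a Bio::Taxon that is a tree node,
an import path from Bio::Species that assigns `no rank', and a
recent_common_ancestor that refuses to compare unranked taxa unless the
requirement is relaxed) can be illustrated with a minimal, language-neutral
sketch. This is Python, not BioPerl code, and none of it is an existing API:
the names Taxon, taxon_from_classification and the require_rank flag are
hypothetical, and the species-first ordering of the classification list is
an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

NO_RANK = "no rank"  # placeholder rank for taxa imported without rank information


@dataclass
class Taxon:
    """Hypothetical stand-in for the proposed Bio::Taxon: a named, ranked tree node."""
    name: str
    rank: str = NO_RANK
    parent: Optional["Taxon"] = None

    def lineage(self) -> List["Taxon"]:
        """Return the path from the root down to this taxon."""
        path = []
        node = self
        while node is not None:
            path.append(node)
            node = node.parent
        return path[::-1]


def taxon_from_classification(names):
    """Build a chain of unranked taxa from a classification list.

    Assumes the list is ordered from most specific to most general
    (species first) and returns the most specific taxon.
    """
    parent = None
    for name in reversed(names):
        parent = Taxon(name=name, parent=parent)
    return parent


def recent_common_ancestor(a, b, require_rank=True):
    """Return the deepest taxon shared by both lineages, or None.

    Strict mode (require_rank=True) matches on name and rank and raises
    ValueError if any taxon in either lineage is unranked. Relaxed mode
    matches on name alone.
    """
    la, lb = a.lineage(), b.lineage()
    if require_rank and any(t.rank == NO_RANK for t in la + lb):
        raise ValueError("cannot compare taxa without rank information")
    common = None
    for x, y in zip(la, lb):
        same = x.name == y.name and (not require_rank or x.rank == y.rank)
        if not same:
            break
        common = x
    return common
```

In this sketch, taxa imported from a plain classification list can only be
compared with require_rank=False, which mirrors the trade-off discussed in
the thread: name-only matching is convenient but can conflate unrelated
taxa that happen to share a name at different ranks.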