[Bioperl-l] Re: Fwd: questions and freeze (fwd)
Jason Stajich
jason@cgt.mc.duke.edu
Thu, 10 Oct 2002 21:23:04 -0400 (EDT)
I think Dan was thinking in terms of hooking Species more properly into
a taxonomy structure, if one had a local database or wanted to rely on
a connection to an NCBI system (e.g., if I want to grab all the info on
a specific order, could I build an appropriate bioperl data structure
for it, so I could query my data for entries whose species fall within
that structure?). I think basically what it comes down to is that we
need a totally parallel set of objects to handle taxonomic information,
rather than trying to retrofit Bio::Species for this.
I don't think this is really a problem: if we can obtain an NCBI taxa_id
for a given species, then we can relate Bio::Species objects to some
taxonomic structure where needed. This is the route I'd prefer we go,
rather than trying to glom onto Bio::Species.
In the same way, I don't want to make Bio::Tree objects explicitly
"species-aware" or even sequence-aware, so they can be reused for a
variety of uses. Rather, we can build taxon objects as Hilmar alludes to,
and these will hopefully reuse the basic Bio::Tree structure if we've
made it general enough for this.
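
As a rough sketch of that arrangement (the Sketch::* package names, the
record hashes, and the in_clade helper are all hypothetical, and nothing
here is existing bioperl API): a generic node class that knows nothing
about species, a taxon class layered on top of it, and data records that
relate to the taxonomy only through an NCBI taxon id. The lineage is
abbreviated for illustration.

```perl
use strict;
use warnings;

# Hypothetical generic tree node: not species- or sequence-aware.
package Sketch::Node;
use Scalar::Util qw(weaken);

sub new {
    my ($class, %args) = @_;
    return bless { id => $args{id}, parent => undef, children => [] }, $class;
}
sub id     { $_[0]{id} }
sub parent { $_[0]{parent} }

sub add_child {
    my ($self, $child) = @_;
    $child->{parent} = $self;
    weaken($child->{parent});    # avoid a parent<->child reference cycle
    push @{ $self->{children} }, $child;
    return $child;
}

# Ids from this node up to the root.
sub lineage_ids {
    my ($self) = @_;
    my @ids;
    for (my $n = $self; defined $n; $n = $n->parent) { push @ids, $n->id }
    return @ids;
}

# Hypothetical taxon object reusing the generic node; rank is optional.
package Sketch::Taxon;
our @ISA = ('Sketch::Node');

sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(id => $args{taxon_id});
    @{$self}{qw(name rank)} = @args{qw(name rank)};
    return $self;
}
sub name { $_[0]{name} }
sub rank { $_[0]{rank} }

package main;

my %taxon_by_id;    # index keeps every taxon alive and findable by id

sub add_taxon {
    my ($parent_id, %args) = @_;
    my $taxon = Sketch::Taxon->new(%args);
    $taxon_by_id{ $args{taxon_id} } = $taxon;
    $taxon_by_id{$parent_id}->add_child($taxon) if defined $parent_id;
    return $taxon;
}

add_taxon(undef, taxon_id => 9443, name => 'Primates',     rank => 'order');
add_taxon(9443,  taxon_id => 9604, name => 'Hominidae',    rank => 'family');
add_taxon(9604,  taxon_id => 9605, name => 'Homo',         rank => 'genus');
add_taxon(9605,  taxon_id => 9606, name => 'Homo sapiens', rank => 'species');

# True if the taxon id is known and has $ancestor_id in its lineage.
sub in_clade {
    my ($taxon_id, $ancestor_id) = @_;
    my $taxon = $taxon_by_id{$taxon_id} or return 0;
    my $hits = grep { $_ == $ancestor_id } $taxon->lineage_ids;
    return $hits;
}

# Stand-ins for data that carries a species; only the taxon id links them.
my @records = (
    { label => 'seq1', taxon_id => 9606 },
    { label => 'seq2', taxon_id => 10090 },    # not loaded in this sketch
);
my @primate_records = grep { in_clade($_->{taxon_id}, 9443) } @records;
print join(',', map { $_->{label} } @primate_records), "\n";    # prints seq1
```

The point of the split is that the species-bearing records never touch
the tree classes directly, so either side can change without the other.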
Dan, we're not trying to be harsh on your proposal, but realistic about
the current dependencies - do these arguments make sense to you?
-jason
On Thu, 10 Oct 2002, Hilmar Lapp wrote:
> Dan,
>
> several comments.
>
> 1) First off, this should really take place on the list, as many
> more people may have an opinion on this, which may or may not
> coincide with what Jason or I think. I'm therefore copying the list
> on my response; I hope you don't mind.
>
> 2) We are careful not to change an API that's been in a major stable
> release without providing backward compatibility, at least if it's a
> 'core' module. Changing the way $species->classification() needs to
> be called is a no-no IMO. You can add optional other ways though,
> which can be distinguished in code (that's what I did). Another
> alternative is to write an entire new module if you want a radically
> different API, and over time we could adopt that in the parsers
> (backward compatibility still being a problem).
>
> 3) Having to pass the ranks as literals makes the whole thing much
> stricter than it is now, and we're having problems with the code
> being too strict already. I don't know of any major input source
> that actually gives you the ranks along with the values (other than
> NCBI taxon DB itself), and I certainly wouldn't want to rely on them
> being always in a predefined order in the species section of the
> databank entry. So, I don't even know where I would take the values
> from to pass to your variant. How did you envision this value being
> constructed? Ideally you could have both, but I feel the ranks need
> to be optional.
>
> 4) Performance-wise, classification arrays can be lengthy. If we
> change something, I'd also pass references instead of arrays or hashes.
>
> 5) As for the connection to Bio::Tree, my take on this is that there
> should eventually be a Bio::TaxonI interface with no connection to
> Bio::Tree on the interface level. Implementors then may or may not
> choose to utilize Bio::Tree::* classes for their implementation. I
> made a similar argument for the Bio::Ontology::* interfaces.
>
> You may want to briefly look at my changes. I basically added
> variant() for strain/isolate/etc information, and added a faster
> calling alternative to classification() (array ref instead of array)
> which also potentially bypasses name validation (which is a major
> problem).
>
> -hilmar
>
> (The enclosed file is from Dan's original email, it is _not_ my
> version of Species.pm)
>
> Begin forwarded message:
>
> > From: Jason Stajich <jason@cgt.mc.duke.edu>
> > Date: Thu Oct 10, 2002 04:56:54 PM US/Pacific
> > To: Hilmar Lapp <lapp@gnf.org>
> > Cc: <kortschak@rsbs.anu.edu.au>
> > Subject: questions and freeze (fwd)
> >
> >
> > Hilmar - I've not looked at your changes to Bio::Species nor have I had
> > time to pore over Dan's proposal (sorry, Dan, major lack of braincell
> > bandwidth) - Hilmar, does any or all of what Dan is suggesting jibe
> > with your stuff?
> >
> > -j
> >
> > --
> > Jason Stajich
> > Duke University
> > jason at cgt.mc.duke.edu
> >
> > ---------- Forwarded message ----------
> > Date: Fri, 4 Oct 2002 08:59:28 +1000 (EST)
> > From: Dan Kortschak <kortschak@rsbs.anu.edu.au>
> > To: Jason Stajich <jason@cgt.mc.duke.edu>
> > Subject: questions and freeze
> >
> > Jason, I couldn't leave it alone, so the rest of the stuff is added
> > in now (though I did think of some more things... but I really have
> > to concentrate on my real work).
> >
> > I will get a chance to figure out how to use CVS sometime next week
> > when I've finished (or at least started to seriously tackle) the
> > paper I'm working on at the moment - until then I can't test the code.
> >
> > I've made changes to Bio::Species so that the classification method
> > stores both the taxa and ranks in a hash - this will break any
> > previous use of Species, but it makes more sense: since taxonomic
> > classification schemes seem to differ between different lineages,
> > this gets around the variance of levels used.
> >
> > The change to Species requires that a hash is passed at new, but
> > I'm not sure how that will go through the argument handler (it is
> > undoubtedly wrong as it stands).
> >
> > In Node.pm, has_rank and recent_common_ancestor both return a Node
> > object; in C++ I'd return a pointer so the node isn't being
> > duplicated, but I'm not sure whether a perl ref works the same way
> > (I'm much happier with pointers and handles).
> >
> > When you have time, comments and answers would be appreciated.
> >
> > cheers
> > Dan
> >
> >
> > --
> > _____________________________________________________________ .`.`o
> > o| ,\__ `./`r
> > Dan Kortschak kortschak@rsbs.anu.spanner.edu.au <\/ \_O> O
> > "|`...'.\
> > Before you criticise a man, try to walk a mile in his ` :\
> > shoes. Then, if he doesn't like what you have to say, : \
> > you'll be a mile away, and you'll have his shoes. : \
> >
> > The address above will not work, remove the spanner from the works.
> >
> > By replying to this email you implicitly accept that your response may
> > be forwarded to other recipients.
> > Permission is granted for fair use reproduction.
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu