[Bioperl-l] Bio::Species, Bio::Taxonomy::Node overhaul
Chris Fields
cjfields at uiuc.edu
Sun Aug 6 23:44:14 UTC 2006
Sendu, I feel this needs to be posted to the main list for further
responses from anyone interested in making a point, one way or
another. I'm dropping out of this; you can have the last word.
This is in response to Sendu's proposal to have $species->species
return the binomial name for that rank, as documented on Bugzilla.
Any other responses would be appreciated.
(In reply to comment #5)
> (In reply to comment #4)
> See also http://en.wikipedia.org/wiki/Species and
> http://en.wikipedia.org/wiki/Binomial_nomenclature. "The name of the
> species is the whole binomial, not just the second term (which may be
> called specific epithet, for plants, or specific name, for animals)".
>
> We can't have a method for the 'specific name' because we have no way
> of always correctly working out what that is. The NCBI taxonomy
> database doesn't tell us, and neither do the various sequence file
> formats.
Let's say, for instance, that the single definition of 'species,' as
you have shown, was the only correct definition. But in your
response quoting the Wikipedia articles you leave out a plethora of
other definitions, including one used by taxonomists: the second name
in a binomial, aka the species descriptor or what you
have as the 'specific epithet'. This is also explicitly stated in
the second link you provide, for 'binomial nomenclature':
"As the word "binomial" suggests, the scientific name of a species is
formed by the combination of two terms: the genus name and the
species descriptor."
The previous use of species() in Bio::Species fits that definition,
in that the species() method originally gave only the species
descriptor (one name), NOT the binomial name, which is given by
binomial(). Similarly, genus() gave only the genus name. Why have a
genus() or binomial() at all if you get the entire name via species()?
So, is there a correct definition of 'species'? The same Wikipedia
pages you use to bolster your case for using a binomial species name
actually indicate otherwise:
"Since the advent of the theory of evolution, the conception of
species has undergone vast changes in biology; however no consensus
on the definition of the word has yet been reached."
Seems ambiguous to me. Is there another way?
Our proposal (actually Hilmar's) was to let Bio::Species hold the
data as parsed in the SeqIO modules as is, but also have the same
data contained in a Bio::Taxon object for I/O. Then, slowly
deprecate Bio::Species in favor of Bio::Taxon. No confusion as to
the data returned, no redundant methods, and the change is gradual,
not sudden. So, you could get the name ('Homo sapiens') as a
Bio::Taxon object scientific name:
# returns NCBI TaxID scientific name from Bio::Taxon object
$seq->taxon->scientific_name();
which doesn't carry the ambiguity about what is returned by
# returns species name from Bio::Species object
$seq->species->species(); # what is it?
Is it a single name? The binomial? Both definitions could be
correct (but only the first one is used). At least with the first
version (again proposed by Hilmar), you can state that this
explicitly returns the scientific name as defined by NCBI (and have
something from the NCBI server to point to). No tainting of
Bio::Taxon with odd useless methods which can be misconstrued five ways.
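To make the difference concrete, here is a minimal, self-contained
sketch. My::Species and My::Taxon are hypothetical stand-ins written
only for this illustration; they are not the real Bio::Species or
Bio::Taxon classes and make no claim about how those are implemented.
Only the method names (genus(), species(), binomial(),
scientific_name()) come from the discussion above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::Species: holds the names as parsed.
package My::Species;

sub new {
    my ($class, %args) = @_;
    return bless { genus => $args{genus}, species => $args{species} }, $class;
}

sub genus    { $_[0]->{genus} }      # genus name only
sub species  { $_[0]->{species} }    # species descriptor only (the old behaviour)
sub binomial { join ' ', $_[0]->{genus}, $_[0]->{species} }

# Hypothetical stand-in for Bio::Taxon: one unambiguous accessor.
package My::Taxon;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub scientific_name { $_[0]->{scientific_name} }
sub rank            { $_[0]->{rank} }

package main;

my $species = My::Species->new(genus => 'Homo', species => 'sapiens');

# The taxon carries the same data under a name that says what it is.
my $taxon = My::Taxon->new(
    scientific_name => $species->binomial,
    rank            => 'species',
);

print $species->species, "\n";          # sapiens
print $species->binomial, "\n";         # Homo sapiens
print $taxon->scientific_name, "\n";    # Homo sapiens
```

With this split, species() and binomial() never overlap, and a caller
who wants the full name asks for scientific_name() and nothing else.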
I'm not going to get drawn into another long-winded argument about
this. My point is made. It's your baby. I feel that we sometimes
get too impassioned trying to defend our views when coding is the
best course of action. And I feel that not making concise arguments
can be wasteful and, ultimately, pointless.
It's my firm belief, though, that using species() in this way will
generate more confusion than it's worth. I'll leave it to you to
answer the confused emails from bioperl users who don't expect this.
Chris