[Bioperl-l] Fwd: questions and freeze (fwd)
Hilmar Lapp
hlapp@gnf.org
Thu, 10 Oct 2002 18:09:10 -0700
Dan,
several comments.
1) First off, this should really take place on the list, as many
more people may have an opinion on this, which may or may not
coincide with what Jason or I think. I'm therefore copying the list
on my response; I hope you don't mind.
2) We are careful not to change an API that's been in a major stable
release without providing backward compatibility, at least if it's a
'core' module. Changing the way $species->classification() needs to
be called is a no-no IMO. You can add optional other ways though,
which can be distinguished in code (that's what I did). Another
alternative is to write an entire new module if you want a radically
different API, and over time we could adopt that in the parsers
(backward compatibility still being a problem).
3) Having to pass the ranks as literals makes the whole thing much
stricter than it is now, and we're having problems with the code
being too strict already. I don't know of any major input source
that actually gives you the ranks along with the values (other than
NCBI taxon DB itself), and I certainly wouldn't want to rely on them
being always in a predefined order in the species section of the
databank entry. So, I don't even know where I would take the values
from to pass to your variant. How did you envision this value being
constructed? Ideally you could have both, but I feel the ranks need
to be optional.
4) Performance-wise, classification arrays can be lengthy. If we change
something, I'd also pass references instead of arrays or hashes.
5) As for the connection to Bio::Tree, my take on this is that there
should eventually be a Bio::TaxonI interface with no connection to
Bio::Tree on the interface level. Implementors then may or may not
choose to utilize Bio::Tree::* classes for their implementation. I
made a similar argument for the Bio::Ontology::* interfaces.
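To illustrate point 5, here is a minimal sketch of an interface that says nothing about trees, with one possible implementation behind it. All names (My::TaxonI, My::SimpleTaxon, and the rank/name/parent methods) are hypothetical and chosen for illustration only; none of this is existing BioPerl code.

```perl
use strict;
use warnings;

# Hypothetical names throughout; this is not an existing BioPerl interface.
package My::TaxonI;

# Interface: declares what a taxon can do, says nothing about storage.
sub rank   { die ref(shift) . " must implement rank()\n" }
sub name   { die ref(shift) . " must implement name()\n" }
sub parent { die ref(shift) . " must implement parent()\n" }

# One possible implementation, backed by a plain hash.
# Another implementor could wrap a tree node instead.
package My::SimpleTaxon;
our @ISA = ('My::TaxonI');

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}
sub rank   { $_[0]{rank} }
sub name   { $_[0]{name} }
sub parent { $_[0]{parent} }

package main;

my $genus   = My::SimpleTaxon->new(rank => 'genus',   name => 'Homo');
my $species = My::SimpleTaxon->new(rank => 'species', name => 'sapiens',
                                   parent => $genus);
print $species->parent->name, ' ', $species->name, "\n";
```

Callers only ever see the three interface methods, so an implementation built on Bio::Tree::* classes could be swapped in without changing them.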
You may want to briefly look at my changes. I basically added
variant() for strain/isolate/etc information, and added a faster
calling alternative to classification() (array ref instead of array)
which also potentially bypasses name validation (which is a major
problem).
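A rough sketch of what such a backward-compatible accessor can look like. This is not the code I committed: My::Species is a hypothetical stand-alone class, the name check is borrowed from validate_name in the enclosed file, and the assumption here is that the array-ref form always skips validation.

```perl
use strict;
use warnings;

# Hypothetical stand-alone class for illustration; not Bio::Species itself.
package My::Species;

sub new { return bless { names => [] }, shift }

# Two call styles, distinguished by looking at the arguments:
#   classification(@names)    original API: validated, stored as a copy
#   classification(\@names)   added API: stored by reference, not validated
sub classification {
    my ($self, @args) = @_;
    if (@args == 1 && ref $args[0] eq 'ARRAY') {
        # Fast path: keep the caller's array, skip validation.
        $self->{names} = $args[0];
    } elsif (@args) {
        # Original path: validate each name, then store a copy.
        for my $name (@args) {
            die "Invalid name '$name'\n" unless $name =~ /^[\w\s\-,.]+$/;
        }
        $self->{names} = [@args];
    }
    return @{ $self->{names} };
}

package main;

my $sp = My::Species->new;
$sp->classification(qw(sapiens Homo Hominidae));        # old style still works
print join(' ', ($sp->classification)[1, 0]), "\n";

$sp->classification(['sp. (strain X)', 'Unknown']);     # by reference, unchecked
print scalar($sp->classification), " names stored\n";
```

Existing callers that pass a plain list see no change, while parsers that already hold an array can hand it over without the copy and without tripping over the name checks.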
-hilmar
(The enclosed file is from Dan's original email, it is _not_ my
version of Species.pm)
Begin forwarded message:
> From: Jason Stajich <jason@cgt.mc.duke.edu>
> Date: Thu Oct 10, 2002 04:56:54 PM US/Pacific
> To: Hilmar Lapp <lapp@gnf.org>
> Cc: <kortschak@rsbs.anu.edu.au>
> Subject: questions and freeze (fwd)
>
>
> Hilmar - I've not looked at your changes to Bio::Species nor have I had
> time to pore over Dan's proposal (sorry, Dan, major lack of braincell
> bandwidth) - Hilmar, does any or all of what Dan is suggesting jibe with
> your stuff?
>
> -j
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>
> ---------- Forwarded message ----------
> Date: Fri, 4 Oct 2002 08:59:28 +1000 (EST)
> From: Dan Kortschak <kortschak@rsbs.anu.edu.au>
> To: Jason Stajich <jason@cgt.mc.duke.edu>
> Subject: questions and freeze
>
> Jason, I couldn't leave it alone, so the rest of the stuff is added in now
> (though I did think of some more things... but I really have to
> concentrate on my real work).
>
> I will get a chance to figure out how to use CVS sometime next week when
> I've finished (or at least started to seriously tackle) the paper I'm
> working on at the moment - until then I can't test the code.
>
> I've made changes to Bio::Species so that the classification method stores
> both the taxa and ranks in a hash. This will break any previous use of
> Species, but it makes more sense: taxonomic classification schemes seem to
> differ between lineages, and this gets around the variance in the levels
> used.
>
> The change to Species requires that a hash is passed at new, but I'm not
> sure how that will go through the argument handler (it is undoubtedly
> wrong as it stands).
>
> In Node.pm, has_rank and recent_common_ancestor both return a Node object.
> In C++ I'd return a pointer so the node isn't being duplicated, but I'm
> not sure whether a Perl ref works the same way (I'm much happier with
> pointers and handles).
>
> When you have time, comments and answers would be appreciated.
>
> cheers
> Dan
>
>
> --
> _____________________________________________________________ .`.`o
> o| ,\__ `./`r
> Dan Kortschak kortschak@rsbs.anu.spanner.edu.au <\/ \_O> O
> "|`...'.\
> Before you criticise a man, try to walk a mile in his ` :\
> shoes. Then, if he doesn't like what you have to say, : \
> you'll be a mile away, and you'll have his shoes. : \
>
> The address above will not work, remove the spanner from the works.
>
> By replying to this email you implicitly accept that your response may
> be forwarded to other recipients.
> Permission is granted for fair use reproduction.
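On Dan's question above about whether returning a Node duplicates it: a Perl object is a blessed reference, so returning one hands back a reference to the same underlying data rather than a copy. A small sketch with a hypothetical My::Node class (not the Node.pm from Dan's proposal) shows this:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical class for illustration only.
package My::Node;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Returns the stored object reference; nothing is cloned here.
sub ancestor { return $_[0]{ancestor} }

package main;

my $root = My::Node->new(name => 'root');
my $leaf = My::Node->new(name => 'leaf', ancestor => $root);

my $got = $leaf->ancestor;
print refaddr($got) == refaddr($root) ? "same node\n" : "copy\n";

# A change made through the returned reference is visible via $root.
$got->{name} = 'renamed';
print $root->{name}, "\n";
```

A copy is only made when the code asks for one explicitly, for example by building a new hash from the old one.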
# $Id: Species.pm,v 1.21 2002/09/27 02:24:58 jason Exp $
#
# BioPerl module for Bio::Species
#
# Cared for by James Gilbert <jgrg@sanger.ac.uk>
#
# You may distribute this module under the same terms as perl itself
# POD documentation - main docs before the code
=head1 NAME
Bio::Species - Generic species object
=head1 SYNOPSIS
$species = Bio::Species->new(-classification => {%classification});
# Can also pass the classification
# hash to classification() as below
$species->classification(species => 'sapiens', genus => 'Homo',
family => 'Hominidae', order => 'Primates',
class => 'Mammalia', phylum => 'Chordata',
kingdom => 'Metazoa');
$genus = $species->genus();
$bi = $species->binomial(); # $bi is now "Homo sapiens"
# For storing common name
$species->common_name("human");
# For storing subspecies
$species->sub_species("accountant");
=head1 DESCRIPTION
Provides a very simple object for storing phylogenetic
information. The classification is stored in a hash that
maps each rank to the name of the node at that rank in a
phylogenetic tree. Dedicated accessors are provided for
species and genus, but not for any of the other node
types (eg: "phylum", "class", "order", "family"). There's
plenty of scope for making the model more sophisticated,
if this is ever needed.
Methods are also provided for storing common
names and subspecies.
=head1 CONTACT
James Gilbert email B<jgrg@sanger.ac.uk>
=head1 APPENDIX
The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _
=cut
#' Let the code begin...
package Bio::Species;
use vars qw(@ISA);
use strict;
# Object preamble - inherits from Bio::Root::Root
use Bio::Root::Root;
@ISA = qw(Bio::Root::Root);
sub new {
    my($class,%arg) = @_;
    my $self = $class->SUPER::new(%arg);
    # The classification is a hash of rank => name
    $self->{'classification'} = {};
    $self->{'common_name'} = undef;
    my ($classification) = $self->_rearrange([qw(CLASSIFICATION)], %arg);
    if( defined $classification &&
        (ref($classification) eq "HASH") ) {
        # Dereference the hash ref passed as -classification
        $self->classification(%$classification);
    }
    return $self;
}
=head2 classification
Title : classification
Usage : $self->classification(%class_hash);
%classification = $self->classification();
Function: Fills or returns the classification hash in
the object. The hash maps rank names (species,
genus, ...) to taxon names.
Checks are made that species is in lower case,
and all other elements are in title case.
Example : $obj->classification(species => 'sapiens', genus => 'Homo',
family => 'Hominidae', order => 'Primates');
Returns : Classification hash
Args : Classification hash
=cut
sub classification {
    my ($self,%args) = @_;
    if (%args) {
        # Check the names supplied in the classification hash
        {
            # Species should be in lower case
            $self->validate_species_name($args{'species'});
            # All other names must be in title case
            for my $rank (keys %args) {
                $self->validate_name($args{$rank});
            }
        }
        # Store a copy of the classification as a hash reference
        $self->{'classification'} = { %args };
    }
    return %{$self->{'classification'}};
}
=head2 common_name
Title : common_name
Usage : $self->common_name( $common_name );
$common_name = $self->common_name();
Function: Get or set the common name of the species
Example : $self->common_name('human')
Returns : The common name in a string
Args : String, which is the common name
=cut
sub common_name {
my($self, $name) = @_;
if ($name) {
$self->{'common_name'} = $name;
} else {
return $self->{'common_name'}
}
}
=head2 organelle
Title : organelle
Usage : $self->organelle( $organelle );
$organelle = $self->organelle();
Function: Get or set the organelle name
Example : $self->organelle('Chloroplast')
Returns : The organelle name in a string
Args : String, which is the organelle name
=cut
sub organelle {
my($self, $name) = @_;
if ($name) {
$self->{'organelle'} = $name;
} else {
return $self->{'organelle'}
}
}
=head2 species
Title : species
Usage : $self->species( $species );
$species = $self->species();
Function: Get or set the scientific species name. The species
name must be in lower case.
Example : $self->species( 'sapiens' );
Returns : Scientific species name as string
Args : Scientific species name as string
=cut
sub species {
my($self, $species) = @_;
if ($species) {
$self->validate_species_name( $species );
$self->{'classification'}{'species'} = $species;
}
return $self->{'classification'}{'species'};
}
=head2 genus
Title : genus
Usage : $self->genus( $genus );
$genus = $self->genus();
Function: Get or set the scientific genus name. The genus
must be in title case.
Example : $self->genus( 'Homo' );
Returns : Scientific genus name as string
Args : Scientific genus name as string
=cut
sub genus {
my($self, $genus) = @_;
if ($genus) {
$self->validate_name( $genus );
$self->{'classification'}{'genus'} = $genus;
}
return $self->{'classification'}{'genus'};
}
=head2 sub_species
Title : sub_species
Usage : $obj->sub_species($newval)
Function:
Returns : value of sub_species
Args : newvalue (optional)
=cut
sub sub_species {
my($self, $sub) = @_;
if ($sub) {
$self->validate_sub_species_name( $sub );
$self->{'classification'}{'subspecies'} = $sub;
}
return $self->{'classification'}{'subspecies'};
}
=head2 binomial
Title : binomial
Usage : $binomial = $self->binomial();
$binomial = $self->binomial('FULL');
Function: Returns a string "Genus species", or "Genus species subspecies"
if the first argument is 'FULL' (and the species has a subspecies).
Args : Optionally the string 'FULL' to get the full name including
the subspecies.
=cut
sub binomial {
    my( $self, $full ) = @_;
    # classification() returns a hash, so copy it before looking up ranks
    my %class = $self->classification();
    my( $species, $genus ) = @class{'species', 'genus'};
    unless( defined $species) {
        $species = '';
        $self->warn("classification was not set");
    }
    $genus = '' unless( defined $genus);
    my $bi = "$genus $species";
    if (defined($full) && ((uc $full) eq 'FULL')) {
        my $ssp = $class{'subspecies'};
        $bi .= " $ssp" if $ssp;
    }
    return $bi;
}
sub validate_species_name {
my( $self, $string ) = @_;
return 1 if $string =~ /^[a-z][\w\s]+$/i;
$self->throw("Invalid species name '$string'");
}
sub validate_sub_species_name {
my( $self, $string ) = @_;
return 1 if $string =~ /^[a-z][\w\s]+$/i;
$self->throw("Invalid subspecies name '$string'");
}
sub validate_name {
    my( $self, $string ) = @_;
    return 1 if $string =~ /^[\w\s\-,.]+$/;
    $self->throw("Invalid name '$string'");
}
=head2 ncbi_taxid
Title : ncbi_taxid
Usage : $obj->ncbi_taxid($newval)
Function:
Returns : value of ncbi_taxid as string
Args : newvalue (optional)
=cut
sub ncbi_taxid {
my( $self, $sub ) = @_;
if ($sub) {
$self->{'_ncbi_taxid'} = $sub;
}
return $self->{'_ncbi_taxid'};
}
1;
__END__
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------