[Bioperl-l] Fwd: questions and freeze (fwd)

Hilmar Lapp hlapp@gnf.org
Thu, 10 Oct 2002 18:09:10 -0700



Dan,

several comments.

1) First off, this should really take place on the list, as many 
more people may have an opinion on this, which may or may not 
coincide with what Jason or I think. I'm therefore copying the list 
on my response; I hope you don't mind.

2) We are careful not to change an API that's been in a major stable 
release without providing backward compatibility, at least if it's a 
'core' module. Changing the way $species->classification() needs to 
be called is a no-no IMO. You can add optional other ways though, 
which can be distinguished in code (that's what I did). Another 
alternative is to write an entire new module if you want a radically 
different API, and over time we could adopt that in the parsers 
(backward compatibility still being a problem).
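
To illustrate point 2, here is a minimal sketch of adding an optional 
calling style that can be told apart from the old one in code. 
My::Species is a hypothetical stand-in, not the real Bio::Species, and 
the array-ref variant is only an assumption about how such an addition 
might look:

```perl
use strict;
use warnings;

# Hypothetical stand-in class for illustration; not Bio::Species itself.
package My::Species;

sub new {
    my ($class) = @_;
    return bless { classification => [] }, $class;
}

# Accepts the established flat list (species first) or, as an
# optional addition, a single array reference.
sub classification {
    my ($self, @args) = @_;
    if (@args) {
        my $names = (@args == 1 && ref($args[0]) eq 'ARRAY')
            ? $args[0]      # added style: array ref, stored without copying
            : [@args];      # old style: flat list, copied into a new array
        $self->{classification} = $names;
    }
    return @{ $self->{classification} };
}

package main;

my $old = My::Species->new;
$old->classification(qw(sapiens Homo Hominidae));    # existing callers

my $new = My::Species->new;
$new->classification([qw(sapiens Homo Hominidae)]);  # optional new style

print join(' ', $old->classification), "\n";
print join(' ', $new->classification), "\n";
```

Existing callers keep working unchanged, because a flat list of names 
never consists of a single array reference.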

3) Having to pass the ranks as literals makes the whole thing much 
stricter than it is now, and we're having problems with the code 
being too strict already. I don't know of any major input source 
that actually gives you the ranks along with the values (other than 
NCBI taxon DB itself), and I certainly wouldn't want to rely on them 
being always in a predefined order in the species section of the 
databank entry. So, I don't even know where I would take the values 
from to pass to your variant. How did you envision this value being 
constructed? Ideally you could have both, but I feel the ranks need 
to be optional.
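
One way to have both, sketched under assumptions: the names stay an 
ordered list as they are now, and ranks are an optional parallel list 
that may simply be absent. The class My::Lineage, its -names/-ranks 
arguments and name_for_rank() are all hypothetical names made up for 
this example:

```perl
use strict;
use warnings;

# Hypothetical class for illustration only.
package My::Lineage;

sub new {
    my ($class, %arg) = @_;
    my $names = $arg{-names} || [];
    my $ranks = $arg{-ranks};            # optional; may be undef
    if (defined $ranks && @$ranks != @$names) {
        die "ranks and names must have the same length\n";
    }
    return bless {
        names => [@$names],
        ranks => $ranks ? [@$ranks] : undef,
    }, $class;
}

# Ordered list of names, species first, as in the current API.
sub names { @{ $_[0]{names} } }

# Name at the given rank; undef if no ranks were supplied
# or the rank is not present.
sub name_for_rank {
    my ($self, $rank) = @_;
    my $ranks = $self->{ranks} or return;
    for my $i (0 .. $#$ranks) {
        return $self->{names}[$i] if $ranks->[$i] eq $rank;
    }
    return;
}

package main;

# A parser that only sees names can still build an object ...
my $plain = My::Lineage->new(-names => [qw(sapiens Homo Hominidae)]);

# ... while a source that knows the ranks can pass them along.
my $ranked = My::Lineage->new(
    -names => [qw(sapiens Homo Hominidae)],
    -ranks => [qw(species genus family)],
);

print defined $plain->name_for_rank('genus') ? "rank known\n" : "rank unknown\n";
print $ranked->name_for_rank('genus'), "\n";
```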

4) Performance-wise, classification arrays can be lengthy. If we change 
something, I'd also pass references instead of arrays or hashes.

5) As for the connection to Bio::Tree, my take on this is that there 
should eventually be a Bio::TaxonI interface with no connection to 
Bio::Tree on the interface level. Implementors then may or may not 
choose to utilize Bio::Tree::* classes for their implementation. I 
made a similar argument for the Bio::Ontology::* interfaces.

You may want to briefly look at my changes. I basically added 
variant() for strain/isolate/etc information, and added a faster 
calling alternative to classification() (array ref instead of array) 
which also potentially bypasses name validation (which is a major 
problem).

	-hilmar

(The enclosed file is from Dan's original email; it is _not_ my 
version of Species.pm.)

Begin forwarded message:

> From: Jason Stajich <jason@cgt.mc.duke.edu>
> Date: Thu Oct 10, 2002  04:56:54 PM US/Pacific
> To: Hilmar Lapp <lapp@gnf.org>
> Cc: <kortschak@rsbs.anu.edu.au>
> Subject: questions and freeze (fwd)
>
>
> Hilmar - I've not looked at your changes to Bio::Species, nor have I had
> time to pore over Dan's proposal (sorry, Dan, major lack of braincell
> bandwidth) - Hilmar, does any or all of what Dan is suggesting jibe with
> your stuff?
>
> -j
>
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
>
> ---------- Forwarded message ----------
> Date: Fri, 4 Oct 2002 08:59:28 +1000 (EST)
> From: Dan Kortschak <kortschak@rsbs.anu.edu.au>
> To: Jason Stajich <jason@cgt.mc.duke.edu>
> Subject: questions and freeze
>
> Jason, I couldn't leave it alone, so the rest of the stuff is added in now
> (though I did think of some more things... but I really have to
> concentrate on my real work).
>
> I will get a chance to figure out how to use CVS sometime next week when
> I've finished (or at least started to seriously tackle) the paper I'm
> working on at the moment - until then I can't test the code.
>
> I've made changes to Bio::Species so that the classification method stores
> both the taxa and ranks in a hash. This will break any previous use of
> Species, but it makes more sense: taxonomic classification schemes
> seem to differ between lineages, and this gets around the variance in
> the levels used.
>
> The change to Species requires that a hash is passed at new, but I'm not
> sure how that will go through the argument handler (it is undoubtedly
> wrong as it stands).
>
> In Node.pm, has_rank and recent_common_ancestor both return a Node object.
> In C++ I'd return a pointer so the node isn't being duplicated, but I'm
> not sure whether a Perl ref works the same way (I'm much happier with
> pointers and handles).
>
> When you have time, comments and answers would be appreciated.
>
> cheers
> Dan
>
>
> --
> _____________________________________________________________   .`.`o
>                                                          o| ,\__ `./`r
>   Dan Kortschak    kortschak@rsbs.anu.spanner.edu.au     <\/    \_O> O
>                                                           "|`...'.\
>   Before you criticise a man, try to walk a mile in his    `      :\
>   shoes. Then, if he doesn't like what you have to say,           : \
>   you'll be a mile away, and you'll have his shoes.               :  \
>
>   The address above will not work, remove the spanner from the works.
>
> By replying to this email you implicitly accept that your response may
> be forwarded to other recipients.
> Permission is granted for fair use reproduction.
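
On Dan's question above about whether returning a Node duplicates it: 
Perl objects are blessed references, so returning one copies only the 
reference, much like returning a pointer. A minimal sketch, assuming a 
hypothetical My::Node class (not the Node.pm from the original email, 
which is not shown here):

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical node class for illustration only.
package My::Node;

sub new {
    my ($class, %arg) = @_;
    return bless { name => $arg{name}, parent => $arg{parent} }, $class;
}

# Combined getter/setter for the node name.
sub name {
    my $self = shift;
    $self->{name} = shift if @_;
    return $self->{name};
}

# Returns the stored reference to the parent node, not a copy of it.
sub parent { $_[0]{parent} }

package main;

my $root  = My::Node->new(name => 'Primates');
my $child = My::Node->new(name => 'Homo', parent => $root);

my $p = $child->parent;    # copies the reference, not the object
$p->name('Primata');       # the change is visible through $root too

print $root->name, "\n";
print refaddr($p) == refaddr($root) ? "same object\n" : "different objects\n";
```

An actual duplicate would only come from constructing a new object and 
copying the fields explicitly.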


# $Id: Species.pm,v 1.21 2002/09/27 02:24:58 jason Exp $
#
# BioPerl module for Bio::Species
#
# Cared for by James Gilbert <jgrg@sanger.ac.uk>
#
# You may distribute this module under the same terms as perl itself

# POD documentation - main docs before the code

=head1 NAME

Bio::Species - Generic species object

=head1 SYNOPSIS

    $species = Bio::Species->new(-classification => [@classification]);
                                    # Can also pass classification
                                    # array to new as below

    $species->classification(qw( sapiens Homo Hominidae
                                 Catarrhini Primates Eutheria
                                 Mammalia Vertebrata Chordata
                                 Metazoa Eukaryota ));

    $genus = $species->genus();

    $bi = $species->binomial();     # $bi is now "Homo sapiens"

    # For storing common name
    $species->common_name("human");

    # For storing subspecies
    $species->sub_species("accountant");

=head1 DESCRIPTION

Provides a very simple object for storing phylogenetic
information.  The classification is stored in an array,
which is a list of nodes in a phylogenetic tree.  Access to
getting and setting species and genus is provided, but not
to any of the other node types (eg: "phylum", "class",
"order", "family").  There's plenty of scope for making the
model more sophisticated, if this is ever needed.

Methods are also provided for storing common
names and subspecies.

=head1 CONTACT

James Gilbert email B<jgrg@sanger.ac.uk>

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _

=cut


#' Let the code begin...

package Bio::Species;
use vars qw(@ISA);
use strict;

# Object preamble - inherits from Bio::Root::Root

use Bio::Root::Root;


@ISA = qw(Bio::Root::Root);

sub new {
  my($class,%arg) = @_;

  my $self = $class->SUPER::new(%arg);

  # The classification is stored as a hash ref mapping rank => name
  $self->{'classification'} = {};
  $self->{'common_name'} = undef;
  my ($classification) = $self->_rearrange([qw(CLASSIFICATION)], %arg);
  if( defined $classification &&
      (ref($classification) eq "HASH") ) {
      # Dereference the hash ref that was passed as -classification
      $self->classification(%$classification);
  }
  return $self;
}

=head2 classification

 Title   : classification
 Usage   : $self->classification(%class_hash);
           %class_hash = $self->classification();
 Function: Fills or returns the classification hash in
           the object.  The hash maps rank names (keys) to
           taxon names (values), e.g. species => 'sapiens'.
           Checks are made that species is in lower case,
           and all other elements are in title case.
 Example : $obj->classification(species => 'sapiens',
                                genus   => 'Homo',
                                family  => 'Hominidae',
                                order   => 'Primates');
 Returns : Classification hash
 Args    : Classification hash

=cut



sub classification {
    my ($self,%args) = @_;

    if (%args) {

        # Check the names supplied in the classification hash
        {
            # Species should be in lower case
            $self->validate_species_name($args{species});

            # All other names must be in title case
            for my $rank (keys %args) {
                next if $rank eq 'species';
                $self->validate_name($args{$rank});
            }
        }
        # Store a copy of the classification as a hash ref
        $self->{'classification'} = {%args};
    }
    return %{$self->{'classification'}};
}

=head2 common_name

 Title   : common_name
 Usage   : $self->common_name( $common_name );
           $common_name = $self->common_name();
 Function: Get or set the common name of the species
 Example : $self->common_name('human')
 Returns : The common name in a string
 Args    : String, which is the common name

=cut

sub common_name {
    my($self, $name) = @_;

    if ($name) {
        $self->{'common_name'} = $name;
    } else {
        return $self->{'common_name'}
    }
}

=head2 organelle

 Title   : organelle
 Usage   : $self->organelle( $organelle );
           $organelle = $self->organelle();
 Function: Get or set the organelle name
 Example : $self->organelle('Chloroplast')
 Returns : The organelle name in a string
 Args    : String, which is the organelle name

=cut

sub organelle {
    my($self, $name) = @_;

    if ($name) {
        $self->{'organelle'} = $name;
    } else {
        return $self->{'organelle'}
    }
}

=head2 species

 Title   : species
 Usage   : $self->species( $species );
           $species = $self->species();
 Function: Get or set the scientific species name.  The species
           name must be in lower case.
 Example : $self->species( 'sapiens' );
 Returns : Scientific species name as string
 Args    : Scientific species name as string

=cut


sub species {
    my($self, $species) = @_;

    if ($species) {
        $self->validate_species_name( $species );
        $self->{'classification'}{'species'} = $species;
    }
    return $self->{'classification'}{'species'};
}

=head2 genus

 Title   : genus
 Usage   : $self->genus( $genus );
           $genus = $self->genus();
 Function: Get or set the scientific genus name.  The genus
           must be in title case.
 Example : $self->genus( 'Homo' );
 Returns : Scientific genus name as string
 Args    : Scientific genus name as string

=cut


sub genus {
    my($self, $genus) = @_;

    if ($genus) {
        $self->validate_name( $genus );
        $self->{'classification'}{'genus'} = $genus;
    }
    return $self->{'classification'}{'genus'};
}

=head2 sub_species

 Title   : sub_species
 Usage   : $obj->sub_species($newval)
 Function:
 Returns : value of sub_species
 Args    : newvalue (optional)


=cut

sub sub_species {
    my($self, $sub) = @_;

    if ($sub) {
        $self->validate_sub_species_name( $sub );
        $self->{'classification'}{'subspecies'} = $sub;
    }
    return $self->{'classification'}{'subspecies'};
}

=head2 binomial

 Title   : binomial
 Usage   : $binomial = $self->binomial();
           $binomial = $self->binomial('FULL');
 Function: Returns a string "Genus species", or "Genus species subspecies"
           if the first argument is 'FULL' (and the species has a subspecies).
 Args    : Optionally the string 'FULL' to get the full name including
           the subspecies.

=cut


sub binomial {
    my( $self, $full ) = @_;

    # classification() returns a rank => name hash
    my %class = $self->classification;
    my( $species, $genus ) = @class{'species', 'genus'};
    unless( defined $species) {
        $species = '';
        $self->warn("classification was not set");
    }
    $genus = ''   unless( defined $genus);
    my $bi = "$genus $species";
    if (defined($full) && ((uc $full) eq 'FULL')) {
        my $ssp = $class{'subspecies'};
        $bi .= " $ssp" if $ssp;
    }
    return $bi;
}

sub validate_species_name {
    my( $self, $string ) = @_;

    return 1 if $string =~ /^[a-z][\w\s]+$/i;
    $self->throw("Invalid species name '$string'");
}

sub validate_sub_species_name {
    my( $self, $string ) = @_;

    return 1 if $string =~ /^[a-z][\w\s]+$/i;
    $self->throw("Invalid subspecies name '$string'");
}

sub validate_name {
    my( $self, $string ) = @_;

    return 1 if $string =~ /^[\w\s\-\,\.]+$/ or
        $self->throw("Invalid name '$string'");
}

=head2 ncbi_taxid

 Title   : ncbi_taxid
 Usage   : $obj->ncbi_taxid($newval)
 Function:
 Returns : value of ncbi_taxid as string
 Args    : newvalue (optional)


=cut

sub ncbi_taxid {
    my( $self, $sub ) = @_;

    if ($sub) {
        $self->{'_ncbi_taxid'} = $sub;
    }
    return $self->{'_ncbi_taxid'};
}

1;

__END__

--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------
