[Bioperl-l] t/SimpleAlign: not ok 18
Heikki Lehvaslaiho
heikki@ebi.ac.uk
17 Sep 2002 13:37:32 +0100
Allen,
I fixed the function according to your suggestion. I also added you as a
contributor to the module.
-Heikki
On Fri, 2002-09-13 at 04:35, Allen Smith wrote:
> On Sep 12, 8:05am, Jason Stajich wrote:
> > I cannot replicate on either the released tarball or current 1-0-0 branch
> > on IRIX with perl 5.6.1. Very strange. Can it be a 5.8.0 bug? That
> > seems odd but possible.
>
> Well, I just took a look at SimpleAlign's consensus procedure, and I can see
> why there's a difference - and it is a _bioperl_ bug, not a perl bug. Perl
> 5.8.0 uses a different hash algorithm, resulting in having a different
> ordering of letters with "each". The alignment in question has equal numbers
> of 'D's and 'E's at the third position. Previously, the ordering of the hash
> resulted in 'D' coming first; it now results in 'E' coming first. I suggest
>
> sub _consensus_aa {
>     my $self = shift;
>     my $point = shift;
>     my $threshold_percent = shift || -1 ;
>     my ($seq,%hash,$count,$letter,$key);
>
>     foreach $seq ( $self->each_seq() ) {
>         $letter = substr($seq->seq,$point,1);
>         $self->throw("--$point-----------") if $letter eq '';
>         ($letter =~ /\./) && next;
>         # print "Looking at $letter\n";
>         $hash{$letter}++;
>     }
>     my $number_of_sequences = $self->no_sequences();
>     my $threshold = $number_of_sequences * $threshold_percent / 100. ;
>     $count = -1;
>     $letter = '?';
>
>     foreach $key ( keys %hash ) {
>         # print "Now at $key $hash{$key}\n";
>         if( $hash{$key} > $count && $hash{$key} >= $threshold) {
>             $letter = $key;
>             $count = $hash{$key};
>         }
>     }
>     return $letter;
> }
>
> be replaced with
>
> sub _consensus_aa {
>     my $self = shift;
>     my $point = shift;
>     my $threshold_percent = shift || -1 ;
>     my ($seq,%hash,$count,$letter,$key);
>
>     foreach $seq ( $self->each_seq() ) {
>         $letter = substr($seq->seq,$point,1);
>         $self->throw("--$point-----------") if $letter eq '';
>         ($letter =~ /\./) && next;
>         # print "Looking at $letter\n";
>         $hash{$letter}++;
>     }
>     my $number_of_sequences = $self->no_sequences();
>     my $threshold = $number_of_sequences * $threshold_percent / 100. ;
>     $count = -1;
>     $letter = '?';
>
>     foreach $key ( sort(keys %hash) ) {
>         # print "Now at $key $hash{$key}\n";
>         if( $hash{$key} > $count && $hash{$key} >= $threshold) {
>             $letter = $key;
>             $count = $hash{$key};
>         }
>     }
>     return $letter;
> }
>
> and the expected answers be updated in any tests that differ as a
> result. The 'sort' above keeps the consensus sequence from being
> affected by changes in the hash algorithm.
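
To make the effect of the 'sort' concrete, here is a minimal standalone sketch of the tie-breaking step. The helper name most_frequent and the sample column are made up for illustration and are not part of Bio::SimpleAlign; only the loop mirrors the one in _consensus_aa.

```perl
use strict;
use warnings;

# Hypothetical helper (not in Bio::SimpleAlign): return the most frequent
# residue from a hash of counts, or '?' if none reaches the threshold.
sub most_frequent {
    my ($counts, $threshold) = @_;
    my ($best, $best_count) = ('?', -1);

    # Sorting the keys fixes the visiting order, so a tie is always
    # resolved the same way regardless of Perl's internal hash ordering.
    foreach my $key ( sort keys %$counts ) {
        if ( $counts->{$key} > $best_count && $counts->{$key} >= $threshold ) {
            ($best, $best_count) = ($key, $counts->{$key});
        }
    }
    return $best;
}

# A column with equal numbers of 'D' and 'E', as in the failing test.
my %column = ( E => 3, D => 3, K => 1 );
print most_frequent(\%column, 0), "\n";   # prints "D"
```

Because the comparison is a strict '>', the first of the tied residues in sorted order wins, which is why 'D' is chosen over 'E' here.
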
>
> This is, however, not what I would describe as an ideal fix. I suggest that
> taking into account what the other residues are (if doing a protein
> consensus) and which one of the two (or more) tied residues they are most
> similar to would be preferable (using the CONSERVATION_GROUPS rules for
> which is most similar, probably, although allowing user modification of this
> is desirable).
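
One possible reading of this suggestion, as a rough sketch only: among the tied residues, prefer the one that shares a similarity group with the largest share of the rest of the column. The group list below is an illustrative placeholder rather than the actual contents of CONSERVATION_GROUPS, and the helper names similar and break_tie are hypothetical.

```perl
use strict;
use warnings;

# Illustrative residue groups, assumed for this example only; they stand in
# for whatever similarity rules (e.g. CONSERVATION_GROUPS) are really used.
my @groups = qw(NDEQ NEQK MILV FYW STA);

# Hypothetical helper: number of groups that contain both residues.
sub similar {
    my ($x, $y) = @_;
    return scalar( grep { index($_, $x) >= 0 && index($_, $y) >= 0 } @groups );
}

# Hypothetical helper: among the most frequent residues in a column, pick
# the one most similar to the other residues present. Remaining ties fall
# back to alphabetical order, so the result stays deterministic.
sub break_tie {
    my ($counts) = @_;
    return '?' unless %$counts;

    my ($max) = sort { $b <=> $a } values %$counts;
    my @tied = grep { $counts->{$_} == $max } sort keys %$counts;

    my ($best, $best_score) = ($tied[0], -1);
    foreach my $cand (@tied) {
        my $score = 0;
        foreach my $other ( keys %$counts ) {
            next if $other eq $cand;
            # Weight each similar residue by how often it occurs.
            $score += $counts->{$other} if similar($cand, $other);
        }
        ($best, $best_score) = ($cand, $score) if $score > $best_score;
    }
    return $best;
}

print break_tie({ D => 3, E => 3, K => 1 }), "\n";   # prints "E"
```

With these assumed groups, 'E' wins the tie because it is grouped with both 'D' and 'K', while 'D' is grouped only with 'E'. Passing the groups in as an argument would be one way to allow the user modification mentioned above.
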
>
> -Allen
>
> --
> Allen Smith http://cesario.rutgers.edu/easmith/
> September 11, 2001 A Day That Shall Live In Infamy II
> "They that can give up essential liberty to obtain a little temporary
> safety deserve neither liberty nor safety." - Benjamin Franklin
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________