Bioperl: relative-majority consensus, fast code sought

Tue, 2 Mar 1999 11:28:51 -0400 (AST)

> $threshold = 0; # $threshold==0 (or 0.249) implies relative majority
>                 # $threshold==0.33 implies relative majority > one-third
>                 # $threshold==0.5 implies absolute majority,
>                 # $threshold==0.66 a two-thirds majority
> $threshold *= scalar(@chars);  # e.g. with 50 chars and $threshold==0.5,
>                                # 25 is the (exclusive) lower bound for absolute majority
> %temp = ();
> @list = sort { $temp{$b}<=>$temp{$a} } grep ++$temp{$_} > $threshold, @chars;
>   #@list is ordered by number of occurrences, only chars observed enough times
> @list2 = sort {$a cmp $b} grep { $temp{$_} == $temp{$list[0]} } @list;
>   #@list2 is ordered lexicographically, only chars observed most often
> return (defined($list2[0]) ? $list2[0] : "!"); 
>   #"!" -> no consensus
> 
> How can this code be made really fast?

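As far as sorting goes, you don't need to sort the list twice. A single sort
with a compound comparator (count descending, then character ascending) does
both jobs, so $list[0] is already the answer:

@list = sort { $temp{$b} <=> $temp{$a} || $a cmp $b }
             grep ++$temp{$_} > $threshold, @chars;
return defined($list[0]) ? $list[0] : "!";   # replaces @list2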
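A self-contained sketch of the same idea follows. It assumes the characters arrive as an array reference (the quoted code mixes `$#$chars` and `@chars`), and `consensus_char` is a hypothetical sub name. It counts every character once and then applies the compound sort only to the distinct characters, so the sort touches at most the alphabet size rather than every entry in the column. This reorganization is an assumption of the sketch, not code from the thread.

```perl
use strict;
use warnings;

# consensus_char (hypothetical name): returns the consensus character of
# @$chars, or "!" when no character clears the threshold.
# $fraction follows the quoted convention: 0 = relative majority,
# 0.5 = absolute majority, 0.66 = two-thirds majority.
sub consensus_char {
    my ($chars, $fraction) = @_;
    my $n = scalar @$chars;
    return '!' unless $n;                # empty column: no consensus

    # e.g. 0.5 * 50 == 25, so a count of 26 or more is required
    my $threshold = $fraction * $n;

    # Count every character once; this pass is linear in the column length.
    my %count;
    $count{$_}++ for @$chars;

    # Sort only the distinct characters: highest count first, ties broken
    # lexicographically (the single compound sort from the reply).
    my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;

    return $count{$best} > $threshold ? $best : '!';
}

print consensus_char([qw(A C A G A T)], 0),   "\n";  # A (3 of 6)
print consensus_char([qw(A C A G A T)], 0.5), "\n";  # ! (3 is not > 3)
print consensus_char([qw(C A C A G)],   0),   "\n";  # A (2-2 tie, A before C)
```

If even that small sort matters, a single loop over `keys %count` that keeps the highest count (and the alphabetically smaller character on ties) gives the same result without sorting.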

=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================