Bioperl: relative-majority consensus, fast code sought

Georg Fuellen fuellen@alum.mit.edu
Mon, 1 Mar 1999 19:24:07 +0100 (MET)


  [-re- the empty messages recently sent to this list: I've suggested
  to the majordomo administrator to filter these out in the same way as 
  many (un)subscription requests and some spam are filtered, using perl 
  regular expressions.]

Now here's a question that's been bothering me for some time:
does anyone have suggestions to improve the speed of the following code, 
or can point me to a solution that is already available somewhere ?

Goal: To compute the consensus of the letters in @chars,
w/ the option to request the relative-majority winner,
and break ties according to alphabetical order.

$threshold = 0; # $threshold==0 (or 0.249) implies relative majority
                # $threshold==0.33 implies relative majority > one-third
                # $threshold==0.5 implies absolute majority,
                # $threshold==0.66 a two-thirds majority
$threshold *= ($#$chars+1);  #eg if there are 50 chars, $threshold==0.5, 
                             #25 is the lower bound for absolute majority
%temp = ();
@list = sort { $temp{$b}<=>$temp{$a} } grep ++$temp{$_} > $threshold, @chars;
  #@list is ordered by number of occurances, only chars observed enough times
@list2 = sort {$a cmp $b} grep { $temp{$_} == $temp{$list[0]} } @list;
  #@list2 is ordered lexicographically, only chars observed most often
return (defined($list2[0]) ? $list2[0] : "!"); 
  #"!" -> no consensus

How can this code be made really fast ?
Is there a very fast solution if $threshold is made redundant, i.e.
it's always the relative majority, $threshold==0 ?
best wishes,
georg
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================