Bioperl: relative-majority consensus, fast code sought
Georg Fuellen
fuellen@alum.mit.edu
Mon, 1 Mar 1999 19:24:07 +0100 (MET)
[-re- the empty messages recently sent to this list: I've suggested
to the majordomo administrator to filter these out in the same way as
many (un)subscription requests and some spam are filtered, using perl
regular expressions.]
Now here's a question that's been bothering me for some time:
does anyone have suggestions to improve the speed of the following code,
or can point me to a solution that is already available somewhere ?
Goal: To compute the consensus of the letters in @chars,
w/ the option to request the relative-majority winner,
and break ties according to alphabetical order.
$threshold = 0; # $threshold==0 (or 0.249) implies relative majority
# $threshold==0.33 implies relative majority > one-third
# $threshold==0.5 implies absolute majority,
# $threshold==0.66 a two-thirds majority
$threshold *= ($#$chars+1); #eg if there are 50 chars, $threshold==0.5,
#25 is the lower bound for absolute majority
%temp = ();
@list = sort { $temp{$b}<=>$temp{$a} } grep ++$temp{$_} > $threshold, @chars;
#@list is ordered by number of occurances, only chars observed enough times
@list2 = sort {$a cmp $b} grep { $temp{$_} == $temp{$list[0]} } @list;
#@list2 is ordered lexicographically, only chars observed most often
return (defined($list2[0]) ? $list2[0] : "!");
#"!" -> no consensus
How can this code be made really fast ?
Is there a very fast solution if $threshold is made redundant, i.e.
it's always the relative majority, $threshold==0 ?
best wishes,
georg
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================