Bioperl: relative-majority consensus, fast code sought
Paul Gordon
pgordon@cs.dal.ca
Tue, 2 Mar 1999 11:28:51 -0400 (AST)
> $threshold = 0; # $threshold==0 (or 0.249) implies relative majority
> # $threshold==0.33 implies relative majority > one-third
> # $threshold==0.5 implies absolute majority,
> # $threshold==0.66 a two-thirds majority
> $threshold *= ($#$chars+1); #eg if there are 50 chars, $threshold==0.5,
> #25 is the lower bound for absolute majority
> %temp = ();
> @list = sort { $temp{$b}<=>$temp{$a} } grep ++$temp{$_} > $threshold, @chars;
> #@list is ordered by number of occurrences, only chars observed enough times
> @list2 = sort {$a cmp $b} grep { $temp{$_} == $temp{$list[0]} } @list;
> #@list2 is ordered lexicographically, only chars observed most often
> return (defined($list2[0]) ? $list2[0] : "!");
> #"!" -> no consensus
>
> How can this code be made really fast ?
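In terms of sorting, you don't need to sort the list twice; you can put
both comparisons in a single sort block, i.e.

@list = sort { $temp{$b} <=> $temp{$a} || $a cmp $b }
        grep { ++$temp{$_} > $threshold } @chars;

$list[0] is then the most frequent character, with ties broken
lexicographically, so @list2 is no longer needed.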
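
For reference, here is a minimal, self-contained sketch of the single-sort
approach wrapped in a subroutine. The name consensus_char and the argument
order are my own assumptions for illustration, not anything in Bioperl; the
threshold scaling and the "!" return value follow the quoted code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): returns the consensus character
# for a list of characters, or "!" when no character exceeds the threshold.
sub consensus_char {
    my ($threshold, @chars) = @_;          # $threshold is a fraction, e.g. 0.5
    my $min_count = $threshold * @chars;   # scale the fraction to a count
    my %count;
    # grep keeps each occurrence whose running count exceeds $min_count.
    # sort only runs after grep has finished, so %count is complete by then.
    my @list = sort { $count{$b} <=> $count{$a} || $a cmp $b }
               grep { ++$count{$_} > $min_count } @chars;
    return defined $list[0] ? $list[0] : "!";
}

my $relative = consensus_char(0,   qw(A C A G C));
my $absolute = consensus_char(0.5, qw(A C A G C));
print "$relative\n";   # prints "A": A and C tie at 2, A sorts first
print "$absolute\n";   # prints "!": no character exceeds 2.5 occurrences
```

Note that @list still holds one entry per surviving occurrence, so the sort
works on more elements than there are distinct characters. For short
alignment columns that is unlikely to matter much.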
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================