Bioperl: relative-majority consensus, fast code sought
James Gilbert
jgrg@sanger.ac.uk
Mon, 1 Mar 1999 23:04:06 +0000 (GMT)
George,
You might like to try using the tr/// operator to
count the characters in a string, which may be
faster than splitting into an array (but you may
already have it in an array):
$temp{'A'} = ($string =~ tr/A/A/);
$temp{'C'} = ($string =~ tr/C/C/);
$temp{'G'} = ($string =~ tr/G/G/);
$temp{'T'} = ($string =~ tr/T/T/);
$threshold *= (length($string));
etc...
> Many thanks to Andrew Dalke for mailing me a suggestion to
> accelerate the sort-- he has also pointed me to two typos
> which are now corrected in the example appended.
> Any further hints would be greatly appreciated :)
>
> Andrew Dalke wrote,
> > Finally, why do the sorts (n*log(n)) operations when you really
> > want a search (order(n))?
> >
> > @chars = split(//, "AGCATCGATGATAGTGCTATGAATCGTATGATATGCCAA");
> > $threshold = 0.15;
> > $threshold *= ($#chars+1);
> >
> > %temp = (); # count the occurance of each character
> > grep ++$temp{$_}, @chars;
> >
> > $best_key = undef; # flag to tell if something met the threshold
> > $best_val = $threshold; # known minimum value
> > while ( ($k, $v) = each %temp) { # linear search to find the max
> > if ($v>$best_val ||
> > ($v==$best_val && (! $best_key || $best_key lt $k))) {
> > $best_key = $k;
> > $best_val = $v;
> > }
> > }
> > if ($best_key) {
> > print "$best_key ($best_val)\n";
> > } else {
> > print "!\n";
> > }
>
> Thanks-- most of the @char vectors are just 4-10 characters long,
> but since the subroutine is called many many times, it's probably
> a big improvement (it seems to be about 50% on my small test data set)
>
> > Andrew
> > dalke@bioreason.com
> >
James G.R. Gilbert
The Sanger Centre
Wellcome Trust Genome Campus
Hinxton
Cambridge Tel: 01223 494906
CB10 1SA Fax: 01223 494919
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================