Bioperl: relative-majority consensus, fast code sought

James Gilbert jgrg@sanger.ac.uk
Mon, 1 Mar 1999 23:04:06 +0000 (GMT)


George,

You might like to try using the tr/// operator to
count the characters in a string, which may be
faster than splitting into an array (but you may
already have it in an array):

$temp{'A'} = ($string =~ tr/A/A/);
$temp{'C'} = ($string =~ tr/C/C/);
$temp{'G'} = ($string =~ tr/G/G/);
$temp{'T'} = ($string =~ tr/T/T/);

$threshold *= (length($string));

etc...

> Many thanks to Andrew Dalke for mailing me a suggestion to
> accelerate the sort-- he has also pointed me to two typos 
> which are now corrected in the example appended.
> Any further hints would be greatly appreciated :)
> 
> Andrew Dalke wrote,
> > Finally, why do the sorts (n*log(n)) operations when you really
> > want a search (order(n))?
> > 
> > @chars = split(//, "AGCATCGATGATAGTGCTATGAATCGTATGATATGCCAA");
> > $threshold = 0.15;
> > $threshold *= ($#chars+1);
> > 
> > %temp = ();               # count the occurance of each character
> > grep ++$temp{$_}, @chars;
> > 
> > $best_key = undef;        # flag to tell if something met the threshold
> > $best_val = $threshold;   # known minimum value
> > while ( ($k, $v) = each %temp) {   # linear search to find the max
> >   if ($v>$best_val ||
> >      ($v==$best_val && (! $best_key || $best_key lt $k))) {
> >     $best_key = $k;
> >     $best_val = $v;
> >   }
> > }
> > if ($best_key) {
> >   print "$best_key ($best_val)\n";
> > } else {
> >   print "!\n";
> > }
> 
> Thanks-- most of the @char vectors are just 4-10 characters long,
> but since the subroutine is called many many times, it's probably
> a big improvement (it seems to be about 50% on my small test data set)
> 
> > 						Andrew
> > 						dalke@bioreason.com
> > 

James G.R. Gilbert
The Sanger Centre
Wellcome Trust Genome Campus
Hinxton
Cambridge                        Tel: 01223 494906
CB10 1SA                         Fax: 01223 494919

=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================