Bioperl: making a random sequence

Paul Gordon
Tue, 30 May 2000 11:33:28 -0300 (ADT)

> I'm trying to make some 'control' DNA sequences......
> srand();
> for($x=1;$x<=1000000;$x++)
> {
> 	$r = rand(1);
> 	if($r <= 0.25){ print "A"; }
> 	elsif($r <= 0.5){ print "C"; }
> 	elsif($r <= 0.75){ print "G"; }
> 	elsif($r <= 1.0){ print "T"; }
> }
> This spews out nonsense sequence but the proportions of bases are not equal.
> A tends to be overrepresented at around 44% or so.  The others are
> correspondingly reduced (sorry, I left the exact figures at home).  I've
> checked that rand(1) really does generate numbers between 0 and 1, so why
> the skew to larger numbers???

There would be a *slight* skewing because rand() returns a number in the
range [0,1) (i.e. including 0, excluding 1) so your <= operators should be
< operators.  Your script works out fine on my SGI running 5.004_04, but I
know there are some releases that had print buffering problems I ran into,
which caused printed stuff to be reprinted (so 'print "a"; print "b";
print "c"' prints "aababc" sometimes).  You could test this by pushing the
letters onto an array, then printing the array at the end of the loop,
instead of printing the sequence letter by letter.

> Do I need to tweak srand() somehow?
I beleive that the first time rand() is called, srand is called
implicitly, so that's not actually necessary (since calling the program
twice in a row without srand() yields different results).

I seem to remember someone on the mailing list talking about writing a
module to generate random sequences with base composition bias, but I
could be wrong...


Paul Gordon                           
Genomic Technologies
Institute for Marine Biosciences
National Research Council Canada

=========== Bioperl Project Mailing List Message Footer =======
Project URL:
For info about how to (un)subscribe, where messages are archived, etc: