[BioRuby] Biased Bio::Sequence randomize()

Naohisa GOTO ngoto at gen-info.osaka-u.ac.jp
Fri Oct 17 15:10:38 UTC 2008


Hi,

First, I implemented unit tests for Bio::Sequence::Common,
including a chi-square test of equiprobability, and fixed the bug
by using the Fisher-Yates shuffle, as suggested by Anders.

http://github.com/bioruby/bioruby/commit/02de70cbf036b41a50d770954f3b16ba2beca880
(Sorry for the typo in the commit message.)
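
For readers who want to see what a chi-square test of equiprobability can look like, here is a minimal sketch. It is not the test code from the commit above: the helper name position_chi_square, the sample sequence, and the trial count are all made up for illustration. It assumes the tested base occurs exactly once in the sequence, so that its position after shuffling is a single categorical outcome; with repeated bases the statistic is only a rough indicator. The shuffle under test is passed in as a block.

```ruby
# Hypothetical helper (not part of BioRuby): chi-square statistic for how
# often `base` ends up at each position after shuffling `str` many times.
# The block receives (str, rng) and must return a shuffled copy of str.
def position_chi_square(str, base, trials, rng)
  counts = Array.new(str.length, 0)
  trials.times do
    shuffled = yield(str, rng)
    shuffled.each_char.with_index { |c, i| counts[i] += 1 if c == base }
  end
  # Under an unbiased shuffle every position is equally likely.
  expected = trials * str.count(base).to_f / str.length
  counts.sum { |observed| (observed - expected)**2 / expected }
end

rng = Random.new(42)          # fixed seed so the run is repeatable
seq = "a" + "c" * 9           # one rare base, 10 positions, 9 degrees of freedom

stat = position_chi_square(seq, "a", 2000, rng) do |s, r|
  s.chars.shuffle(random: r).join   # Ruby's built-in shuffle as the reference
end

# 27.877 is the chi-square critical value for 9 degrees of freedom at p = 0.001.
puts(stat < 27.877 ? "no bias detected" : "shuffle looks biased")
```

A biased method, such as one that tends to leave the rare base near one end, would pile up counts at a few positions and push the statistic far above the critical value.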

On Tue, 14 Oct 2008 05:55:41 +0900
Toshiaki Katayama <ktym at hgc.jp> wrote:

> Dear Jacobsen,
> 
> > I believe that the current sequence randomization/shuffle method is severely
> > biased, infrequent bases are more likely to occur in the end of the sequence
> > than in the beginning:
> 
> You are right. 
> 
> I had fixed it a while ago, but it seems that I forgot to commit it to the repository, sorry.
> 
> Could you try the following replacement?

It works fine without a block, but when a block is given,
the behavior differs from the original.
In addition, "(0..len).to_a.sort_by{rand}" is too expensive
when the sequence is long. So I changed the code to modify a
duplicate of the sequence directly, and fixed the bugs, based on your code.
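
To illustrate the idea of shuffling a duplicate of the sequence directly, here is a minimal Fisher-Yates sketch on a plain Ruby String. It is not the committed BioRuby code: shuffle_copy is a hypothetical helper name, the optional rng argument is my own addition for repeatability, and block handling is left out.

```ruby
# Hypothetical helper (not BioRuby's actual method): Fisher-Yates shuffle
# applied to a duplicate of the sequence, so the caller's object is unchanged.
def shuffle_copy(seq, rng = Random.new)
  s = seq.dup
  (s.length - 1).downto(1) do |i|
    j = rng.rand(i + 1)            # uniform integer in 0..i
    s[i], s[j] = s[j], s[i]        # swap the two characters in place
  end
  s
end

original = "atgcatgcaaaa"
shuffled = shuffle_copy(original, Random.new(1))
puts shuffled.length == original.length   # same length
puts shuffled.count("a") == original.count("a")   # same composition
```

Compared with building an index array and sorting it by random keys, this needs only one pass and one swap per position, and no extra array proportional to the sequence length.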

Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org


