[BioRuby] Biased Bio::Sequence randomize()
Naohisa GOTO
ngoto at gen-info.osaka-u.ac.jp
Fri Oct 17 15:10:38 UTC 2008
Hi,
First, I implemented unit tests for Bio::Sequence::Common,
including a chi-square test of equiprobability, and fixed the bug
by using the Fisher-Yates shuffle, as suggested by Anders.
http://github.com/bioruby/bioruby/commit/02de70cbf036b41a50d770954f3b16ba2beca880
(Sorry for the typo in the commit message.)
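For readers unfamiliar with the technique, here is a minimal Ruby sketch of a Fisher-Yates shuffle together with a chi-square check of equiprobability. It is not the code from the commit above. `fisher_yates_shuffle` and `rare_base_chi_square` are hypothetical names, the sketch works on a plain String rather than a Bio::Sequence, and it assumes Ruby 2.4 or later (`Random`, `Array#sum`).

```ruby
# Sketch only: Fisher-Yates shuffle on a plain Ruby String, plus a
# chi-square check that a single rare base is equally likely to land at
# every position. Method names are hypothetical, not BioRuby's API.

def fisher_yates_shuffle(str, rng = Random.new)
  s = str.dup                  # shuffle a copy; leave the caller's string alone
  (s.length - 1).downto(1) do |i|
    j = rng.rand(i + 1)        # uniform index in 0..i, inclusive
    s[i], s[j] = s[j], s[i]    # swap the two residues in place
  end
  s
end

# Chi-square statistic for the position of the single "g" in +seq+ over
# +trials+ shuffles. Under equiprobability each position is expected
# trials / seq.length times.
def rare_base_chi_square(seq, trials, rng)
  counts = Array.new(seq.length, 0)
  trials.times do
    counts[fisher_yates_shuffle(seq, rng).index("g")] += 1
  end
  expected = trials.to_f / seq.length
  counts.sum { |obs| (obs - expected)**2 / expected }
end

seq  = "aaaaaaaaag"   # 10 positions -> 9 degrees of freedom
stat = rare_base_chi_square(seq, 10_000, Random.new(2008))
# The 5% critical value of the chi-square distribution with 9 d.f. is about 16.92.
puts format("chi-square = %.2f (reject uniformity at 5%% if > 16.92)", stat)
```

With an unbiased shuffle the statistic should usually stay below the critical value. A shuffle that favours placing rare bases near the end would inflate the counts for the last positions and push the statistic well above it.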
On Tue, 14 Oct 2008 05:55:41 +0900
Toshiaki Katayama <ktym at hgc.jp> wrote:
> Dear Jacobsen,
>
> > I believe that the current sequence randomization/shuffle method is severely
> > biased; infrequent bases are more likely to occur at the end of the sequence
> > than at the beginning:
>
> You are right.
>
> I had fixed this a while ago, but it seems that I forgot to commit it to the repository, sorry.
>
> Could you try the following replacement?
It works fine without blocks, but when a block is given,
the behavior differs from the original one.
In addition, "(0..len).to_a.sort_by{rand}" is too expensive
when the sequence is long. So I changed it to modify a duplicate
of the sequence directly, and fixed the bugs, based on your code.
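A rough Ruby sketch of the two approaches, under stated assumptions. Sorting an index array by random keys allocates an n-element array plus one random key per element and sorts in O(n log n), whereas swapping residues inside a single duplicate of the sequence needs only O(n) swaps. The method names are hypothetical, the code works on a plain String, and the block semantics (each residue is yielded as its final position is fixed) are an assumption for illustration, not a statement of what Bio::Sequence::Common#randomize does.

```ruby
# Sketch: randomize a sequence by shuffling a duplicate in place, optionally
# yielding each residue as its final position is fixed. Method names and
# block semantics are assumptions for illustration, not BioRuby's API.
def randomize_in_place(seq, rng = Random.new)
  s = seq.dup                              # one O(n) copy, no index array
  k = s.length
  while k > 1
    j = rng.rand(k)                        # pick from the k not-yet-fixed slots
    k -= 1
    s[k], s[j] = s[j], s[k]                # move the pick into slot k
    yield s[k] if block_given?             # slot k is now final
  end
  yield s[0] if block_given? && !s.empty?  # the last remaining residue
  s
end

# For comparison: sort an index array by random keys, then rebuild the
# string. Extra arrays and an O(n log n) sort for a long sequence.
def randomize_by_sort(seq)
  (0...seq.length).to_a.sort_by { rand }.map { |i| seq[i] }.join
end

picked = []
result = randomize_in_place("acgtacgt", Random.new(7)) { |c| picked << c }
puts result
```

Because positions are fixed from the end backwards, the residues passed to the block come out in reverse order of the returned string.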
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org