[BioRuby] Ruby speed

Robert Citek robert.citek at gmail.com
Wed Nov 4 01:32:12 UTC 2009


On Tue, Nov 3, 2009 at 9:11 AM, Yannick Wurm <yannick.wurm at unil.ch> wrote:
> this is a more general ruby question, but since my application is
> bioinformatics, I'm posting it here.
>
> Just wanted to prepend a few characters in front of FASTA identifiers.
>
> $ time cat clustering/dirsForAssembly/singlets.fasta | ruby -pe "gsub(/^>/, '>MyPrefix')" > abc
>        real    0m20.379s
>        user    0m0.741s
>        sys     0m0.168s
>
>
> While the perl equivalent is one heck of a lot faster!!!
>
>
> $ time cat clustering/dirsForAssembly/singlets.fasta | perl -p -i -e 's/^>/>MyPrefix/g' > ab
>        real    0m2.165s
>        user    0m0.266s
>        sys     0m0.146s
>
>
> Is there any hope for ruby?

I get a factor of about three on a synthetic 10,000,000-line FASTA file:

$ time -p yes ">foo"$'\n'"bar" | head -10000000 | ruby -pe "gsub(/^>/, '>MyPrefix')" > /dev/null
real 42.99
user 43.39
sys 0.63

$ time -p yes ">foo"$'\n'"bar" | head -10000000 | perl -pe 's/^>/>MyPrefix/g' > /dev/null
real 15.89
user 16.33
sys 0.26

This is with perl 5.8.8 and ruby 1.8.6 on a dual 1.6 GHz CPU with 512 MB RAM.
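
For reference, here is a minimal sketch of the same header-prefixing step as a plain Ruby method, using String#start_with? instead of a regex gsub. The method name prefix_header is hypothetical, it is written for a current Ruby rather than the 1.8.6 above, and I have not timed it against the one-liner.

```ruby
# Hypothetical helper: add `prefix` right after the ">" of a FASTA header line.
# Lines that do not start with ">" (sequence lines) are returned unchanged,
# matching what gsub(/^>/, '>MyPrefix') does to a single input line.
def prefix_header(line, prefix)
  line.start_with?(">") ? ">#{prefix}#{line[1..-1]}" : line
end

# Small in-memory demo instead of reading a real file.
sample = ">foo\nbar\n>baz\nACGT\n"
converted = sample.each_line.map { |l| prefix_header(l, "MyPrefix") }.join
```

To use it as a filter, the method can be combined in a script with `$stdin.each_line { |line| $stdout.write(prefix_header(line, "MyPrefix")) }` and the FASTA file piped in.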

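Notice that in your timings the user times differ by less than a factor of three (0.741 s vs. 0.266 s) and the system times are nearly equal (0.168 s vs. 0.146 s).
Only the real (wall-clock) time differs by roughly 10x, which suggests that ruby spent most of that run waiting on something else, e.g. disk reads, rather than computing.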
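
To make that concrete, this sketch just redoes the arithmetic on the timings quoted from your message (as reported for each pipeline). The constant and method names are my own, and the `key: value` hash syntax needs Ruby 1.9 or later.

```ruby
# Timings in seconds, copied from the quoted message.
RUBY_RUN = { real: 20.379, user: 0.741, sys: 0.168 }
PERL_RUN = { real: 2.165, user: 0.266, sys: 0.146 }

# Fraction of the wall-clock time that was actually spent on the CPU
# (user + sys). A small value means the run was mostly waiting.
def cpu_fraction(run)
  (run[:user] + run[:sys]) / run[:real]
end

user_ratio = RUBY_RUN[:user] / PERL_RUN[:user]
real_ratio = RUBY_RUN[:real] / PERL_RUN[:real]

printf("user ratio %.2f, real ratio %.2f\n", user_ratio, real_ratio)
printf("CPU busy: ruby %.1f%%, perl %.1f%%\n",
       100 * cpu_fraction(RUBY_RUN), 100 * cpu_fraction(PERL_RUN))
```

By these numbers the ruby run kept the CPU busy for under 5% of its wall-clock time, versus about 19% for the perl run.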

Regards,
- Robert

