[BioRuby] Bio::Faster plugin

Francesco Strozzi francesco.strozzi at gmail.com
Wed Jan 4 09:50:14 UTC 2012


Hi guys,

I have created a BioRuby plugin called bio-faster that implements a fast
and simple parser for FastA and FastQ files. It is based on the C library
Kseq written by Heng Li (author of Samtools and BWA). Compared to
Bio::Fastq, it is about 4-5 times faster at parsing large FastQ files.
The code does not create a Bio object for each sequence. Instead, it
returns a simple array with the sequence data and, for FastQ, the quality
values (only the Sanger/Phred format is supported).
Bio::Faster could be a good choice when you just need to get through huge
files easily and quickly, for example to extract information or to store
sequence data in a database, and you don't need an object for each
sequence.
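
To illustrate what "a simple array per record" means, here is a minimal
pure-Ruby sketch. It is not the bio-faster API (see the wiki for the real
usage) and it does not use Kseq. The method name each_fastq_record is made
up for this example, and it assumes strict four-line FastQ records with
Sanger (Phred+33) quality encoding.

```ruby
require "stringio"

# Hypothetical helper, NOT part of bio-faster: yields one plain array
# [header, sequence, qualities] per record instead of building an object.
# Assumes four-line records and Sanger (Phred+33) quality encoding.
def each_fastq_record(io)
  return enum_for(:each_fastq_record, io) unless block_given?

  io.each_line.each_slice(4) do |header, sequence, _separator, quality|
    # A record must start with "@" and have all four lines present.
    unless header.start_with?("@") && quality
      raise ArgumentError, "malformed FASTQ record near: #{header.chomp}"
    end

    # Sanger encoding: Phred score = ASCII code - 33.
    scores = quality.chomp.bytes.map { |byte| byte - 33 }
    yield [header.chomp[1..-1], sequence.chomp, scores]
  end
end

# Usage with an in-memory example; a File object works the same way.
data = StringIO.new("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!5I\n")
each_fastq_record(data) do |header, sequence, qualities|
  puts "#{header}\t#{sequence.length}\t#{qualities.inspect}"
end
```

Each record is just three plain values, so it can go straight into a
database insert or a filter without any per-sequence object.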

Here is the code: https://github.com/fstrozzi/bioruby-faster
Here is the wiki for more details:
https://github.com/fstrozzi/bioruby-faster/wiki
To get the gem: gem install bio-faster

Tested with Ruby 1.9 only.

Any comments or feedback are much appreciated!

Cheers

-- 

Francesco


