[BioRuby] Parsing large Blast xml files - a new bioruby plugin

Rob Syme rob.syme at gmail.com
Wed Jun 1 07:17:30 UTC 2011


I've written a quick bioruby plugin to help parse blast results that
are too large to fit into memory.

Install: gem install bio-lazyblastxml
Code: github.com/robsyme/bioruby-lazyblastxml
Blog post: biolateral.wordpress.com/2011/05/31/parsing-huge-blast-files-with-bioruby/

The plugin uses LibXML::Reader to iterate through nodes, yielding ruby
objects when required.
The interface is as close to Bio::Blast::Report as I could keep it,
but there are a few changes:
  Iteration.hits, hit.hsps, etc. do not return arrays. Instead, Report
is an enumerable that yields iterations, each Iteration is an enumerable
that yields hits, each Hit is an enumerable that yields hsps, etc.

This is my first attempt at real shared code, and all comments and
criticism are very welcome.

-r

Rob Syme
PhD Candidate
Curtin University
Western Australia



