[BioRuby] Parsing large Blast xml files - a new bioruby plugin

Pjotr Prins pjotr.public14 at thebird.nl
Wed Jun 1 07:30:16 UTC 2011


Hi Rob,

Why did you not start from my lazy, fast, big-data XML parser for
BLAST?

  https://github.com/pjotrp/blastxmlparser

I hear it is being used in the NGS plugin. It would be good to do some
performance tests when you introduce something new.

I have a feeling you were simply not aware of it. 

Pj.

On Wed, Jun 01, 2011 at 03:17:30PM +0800, Rob Syme wrote:
> I've written a quick bioruby plugin to help parse blast results that
> are too large to fit into memory.
> 
> Install: gem install bio-lazyblastxml
> Code: github.com/robsyme/bioruby-lazyblastxml
> Blog post: biolateral.wordpress.com/2011/05/31/parsing-huge-blast-files-with-bioruby/
> 
> The plugin uses LibXML::Reader to iterate through nodes, yielding ruby
> objects when required.
> The interface is as close to Bio::Blast::Report as I could keep it,
> but there are a few changes:
>   Iteration.hits, hit.hsps, etc. do not return arrays. Instead, Report
> is an enumerable that yields iterations, Iteration is an enumerable
> that yields hits, Hits are enumerables that yield hsps, etc.
> 
> This is my first attempt at real shared code, and all comments and
> criticism are very welcome.
> 
> -r
> 
> Rob Syme
> PhD Candidate
> Curtin University
> Western Australia
> 