[BioRuby] Parsing large Blast xml files - a new bioruby plugin

Philipp Comans philipp.comans at googlemail.com
Wed Jun 1 08:25:37 UTC 2011


Hi,

I had a similar problem recently. I needed an efficient parser for Blast XML results and I discovered that the default parser in BioRuby was not suitable. So I wrote my own using Nokogiri.
In my opinion, BioPlugins are currently far too hard to discover. People who use the default XML or GFF parser that comes with BioRuby do not expect that another, more efficient version exists. There should be a section on the front page, or even in the corresponding parts of the API documentation, that makes people aware of these efficient parsers.

BTW, thank you all for BioRuby. I used it in a project recently and it made my life tremendously easier.

Cheers,

Philipp

On Wednesday, 1 June 2011 at 10:07, Rob Syme wrote:

> You're right, I hadn't seen your project. My mistake.
> -r
> 
> On Wed, Jun 1, 2011 at 3:30 PM, Pjotr Prins <pjotr.public14 at thebird.nl> wrote:
> > Hi Rob,
> > 
> > Why did you not start from my lazy fast and big-data XML parser for
> > BLAST?
> > 
> > https://github.com/pjotrp/blastxmlparser
> > 
> > I hear it is being used in the NGS plugin. Be good to do some
> > performance tests, when you introduce something new.
> > 
> > I have a feeling you were simply not aware of it.
> > 
> > Pj.
> > 
> > On Wed, Jun 01, 2011 at 03:17:30PM +0800, Rob Syme wrote:
> > > I've written a quick bioruby plugin to help parse blast results that
> > > are too large to fit into memory.
> > > 
> > > Install: gem install bio-lazyblastxml
> > > Code: http://github.com/robsyme/bioruby-lazyblastxml
> > > Blog post: http://biolateral.wordpress.com/2011/05/31/parsing-huge-blast-files-with-bioruby/
> > > 
> > > The plugin uses LibXML::Reader to iterate through nodes, yielding ruby
> > > objects when required.
> > > The interface is as close to Bio::Blast::Report as I could keep it,
> > > but there are a few changes:
> > > Iteration.hits, hit.hsps etc. do not return arrays. Instead, Report
> > > is an enumerable that yields iterations, Iteration is an enumerable
> > > that yields hits, Hits are enumerables that yield hsps, etc.
> > > 
> > > This is my first attempt at real shared code, and all comments and
> > > criticism are very welcome.
> > > 
> > > -r
> > > 
> > > Rob Syme
> > > PhD Candidate
> > > Curtin University
> > > Western Australia
> > > 
> > > _______________________________________________
> > > BioRuby Project - http://www.bioruby.org/
> > > BioRuby mailing list
> > > BioRuby at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioruby
> 
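For readers who want to see the streaming idea from the quoted announcement in code, here is a minimal sketch. It is not the bio-lazyblastxml API, nor blastxmlparser or a Nokogiri-based parser: it assumes REXML's pull parser (shipped with Ruby) in place of LibXML::Reader, and the names LazyReportSketch, Iteration, Hit and SAMPLE_XML are hypothetical. As a simplification, only the report level is streamed; the hits inside one iteration are collected into a plain array, whereas the announcement describes hits and hsps as enumerables too.

```ruby
require 'rexml/parsers/pullparser'
require 'stringio'

# Hypothetical value objects for this sketch (not the plugin's classes).
Hit = Struct.new(:hit_id, :hsp_evalues)
Iteration = Struct.new(:query_def, :hits)

# Streams a BLAST XML report and yields one Iteration at a time, so the
# whole report never has to be held in memory at once.
class LazyReportSketch
  include Enumerable

  def initialize(io)
    @io = io
  end

  def each
    return enum_for(:each) unless block_given?

    parser = REXML::Parsers::PullParser.new(@io)
    iteration = nil
    hit = nil
    text = +''

    while parser.has_next?
      event = parser.pull
      if event.start_element?
        text = +'' # start collecting text for the new element
        case event[0]
        when 'Iteration' then iteration = Iteration.new(nil, [])
        when 'Hit'       then hit = Hit.new(nil, [])
        end
      elsif event.text?
        text << event[0] # raw text; XML entities are left undecoded here
      elsif event.end_element?
        case event[0]
        when 'Iteration_query-def' then iteration.query_def = text.strip
        when 'Hit_id'              then hit.hit_id = text.strip
        when 'Hsp_evalue'          then hit.hsp_evalues << text.strip.to_f
        when 'Hit'                 then iteration.hits << hit
        when 'Iteration'           then yield iteration
        end
      end
    end
  end
end

# A tiny made-up document using a few element names from BLAST XML output.
SAMPLE_XML = <<~XML
  <BlastOutput>
    <BlastOutput_iterations>
      <Iteration>
        <Iteration_query-def>query_1</Iteration_query-def>
        <Iteration_hits>
          <Hit>
            <Hit_id>gi|1</Hit_id>
            <Hit_hsps>
              <Hsp><Hsp_evalue>1e-30</Hsp_evalue></Hsp>
              <Hsp><Hsp_evalue>0.002</Hsp_evalue></Hsp>
            </Hit_hsps>
          </Hit>
          <Hit>
            <Hit_id>gi|2</Hit_id>
            <Hit_hsps>
              <Hsp><Hsp_evalue>0.5</Hsp_evalue></Hsp>
            </Hit_hsps>
          </Hit>
        </Iteration_hits>
      </Iteration>
      <Iteration>
        <Iteration_query-def>query_2</Iteration_query-def>
        <Iteration_hits></Iteration_hits>
      </Iteration>
    </BlastOutput_iterations>
  </BlastOutput>
XML

LazyReportSketch.new(StringIO.new(SAMPLE_XML)).each do |iteration|
  puts "#{iteration.query_def}: #{iteration.hits.size} hit(s)"
end
```

Because the class includes Enumerable, calls such as first or each_slice stop pulling from the input as soon as they have what they need. For a real file, an open File handle can be passed instead of the StringIO.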




