[BioRuby] Parsing large Blast xml files - a new bioruby plugin

Rob Syme rob.syme at gmail.com
Wed Jun 1 08:07:13 UTC 2011


You're right, I hadn't seen your project. My mistake.
-r

On Wed, Jun 1, 2011 at 3:30 PM, Pjotr Prins <pjotr.public14 at thebird.nl> wrote:
> Hi Rob,
>
> Why did you not start from my lazy fast and big-data XML parser for
> BLAST?
>
>  https://github.com/pjotrp/blastxmlparser
>
> I hear it is being used in the NGS plugin. It would be good to run some
> performance tests when you introduce something new.
>
> I have a feeling you were simply not aware of it.
>
> Pj.
>
> On Wed, Jun 01, 2011 at 03:17:30PM +0800, Rob Syme wrote:
>> I've written a quick bioruby plugin to help parse blast results that
>> are too large to fit into memory.
>>
>> Install: gem install bio-lazyblastxml
>> Code: github.com/robsyme/bioruby-lazyblastxml
>> Blog post: biolateral.wordpress.com/2011/05/31/parsing-huge-blast-files-with-bioruby/
>>
>> The plugin uses LibXML::Reader to iterate through nodes, yielding ruby
>> objects when required.
>> The interface is as close to Bio::Blast::Report as I could keep it,
>> but there are a few changes:
>>   Iteration.hits, hit.hsps etc. do not return arrays. Instead, Report
>> is an enumerable that yields iterations, Iteration is an enumerable
>> that yields hits, Hits are enumerables that yield hsps, etc.
>>
>> This is my first attempt at real shared code, and all comments and
>> criticism are very welcome.
>>
>> -r
>>
>> Rob Syme
>> PhD Candidate
>> Curtin University
>> Western Australia
>>
>> _______________________________________________
>> BioRuby Project - http://www.bioruby.org/
>> BioRuby mailing list
>> BioRuby at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioruby
>>
>
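
The nested-enumerable interface described in the quoted announcement
(Report yields iterations, each Iteration yields hits, each Hit yields
hsps) might look roughly like the sketch below. This is only an
illustration of the pattern: the Sketch* class names and the in-memory
records are assumptions made for the example, not the bio-lazyblastxml
classes, and no XML is parsed here. In a streaming implementation each
`each` method would presumably pull the next node from the XML reader
instead of walking an array.

```ruby
# Hypothetical classes for illustration only. They are NOT the
# bio-lazyblastxml implementation and they do not read XML; plain
# in-memory records stand in for the stream of parsed nodes.
SketchHsp = Struct.new(:evalue)

class SketchHit
  include Enumerable
  attr_reader :id

  def initialize(record)
    @id = record[:id]
    @hsps = record[:hsps]
  end

  # Yield one HSP at a time rather than returning an array of HSPs.
  def each
    return enum_for(:each) unless block_given?
    @hsps.each { |evalue| yield SketchHsp.new(evalue) }
  end
end

class SketchIteration
  include Enumerable
  attr_reader :num

  def initialize(record)
    @num = record[:num]
    @hits = record[:hits]
  end

  # Yield one hit at a time; each hit is itself enumerable.
  def each
    return enum_for(:each) unless block_given?
    @hits.each { |hit| yield SketchHit.new(hit) }
  end
end

class SketchReport
  include Enumerable

  def initialize(records)
    @records = records
  end

  # Yield one iteration at a time; each iteration is itself enumerable.
  def each
    return enum_for(:each) unless block_given?
    @records.each { |record| yield SketchIteration.new(record) }
  end
end

# Made-up sample data, used only to exercise the sketch.
records = [
  { num: 1,
    hits: [
      { id: "hit_a", hsps: [1e-30, 1e-5] },
      { id: "hit_b", hsps: [0.5] }
    ] }
]

report = SketchReport.new(records)

report.each do |iteration|
  iteration.each do |hit|
    hit.each { |hsp| puts "#{iteration.num}\t#{hit.id}\t#{hsp.evalue}" }
  end
end
```

Because every level includes Enumerable, methods such as `map`,
`select` and `first` work at each level without the caller ever asking
for a complete array of hits or hsps.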



