[BioRuby] Bringing the fun back to programming! (The first BioRuby IRC conference on Dec 19th)
Chris Fields
cjfields at illinois.edu
Mon Dec 13 21:48:23 UTC 2010
On Dec 13, 2010, at 12:16 PM, Pjotr Prins wrote:
> On Mon, Dec 13, 2010 at 06:57:39PM +0100, Raoul Bonnal wrote:
>>> Part of how we try to handle big data files in Biopython is using
>>> Python iterators, whereby the file is loaded record by record (how
>>> depends on the file format - for BLAST we do this query by query),
>>> not all into memory in one go. I think BioPerl does something very
>>> similar in their parsers, I'm not so familiar with BioJava.
>
> BioJava uses a visitor pattern. In effect an iterator.
>
> With all current implementations IO runs, then code, then IO, etc.
> So although we are IO constrained, we actually do worse than the IO limit.
>
> What I want is an IO thread going at maximum throughput. Every item
> should get parcelled out for further parsing and processing, in
> parallel to the IO thread.
>
> We should do better, and make it a generalization. I think we can do
> it by using Scala and the standard BioJava iterators. With Scala it
> can be turned into a parallelized iterator. That is a fun project.
>
>> From my point of view Python guys are doing a very good job on all fields.
>> Unfortunately I'm in love with ruby :-)
>
> All you need is love :)
>
> Pj.
At some point the choice of language will not matter as much, as long as it is implemented on a VM (something Perl 5 cannot claim at the moment, but Perl 6 can with the Parrot VM).
chris
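
For illustration, the design Pjotr describes in the quoted text (one IO thread reading at full speed, each record handed off for processing in parallel) could look roughly like the Python sketch below. It is not taken from Biopython, BioJava or BioRuby: read_fasta, gc_content and parallel_map are hypothetical names made up for this example, and FASTA is just an assumed input format. Note that in standard CPython, threads only overlap IO with processing; for CPU-bound work a process pool would probably be needed.

```python
import io
import queue
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def read_fasta(handle):
    """Yield (header, sequence) one record at a time, never the whole file."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(record):
    """Hypothetical per-record work: fraction of G/C bases."""
    header, seq = record
    gc = sum(1 for base in seq.upper() if base in "GC")
    return header, (gc / len(seq) if seq else 0.0)


_DONE = object()  # sentinel marking the end of the input


def parallel_map(handle, work, n_workers=4, buffer_size=64):
    """Read records on a dedicated IO thread and process them in a pool.

    Results are yielded in input order; at most buffer_size records are
    queued and at most buffer_size results are in flight at any time.
    """
    records = queue.Queue(maxsize=buffer_size)

    def reader():
        # The IO thread: it only reads and enqueues, so it is never
        # held up by processing unless the bounded queue is full.
        try:
            for record in read_fasta(handle):
                records.put(record)
        finally:
            records.put(_DONE)

    threading.Thread(target=reader, daemon=True).start()

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pending = deque()
        while True:
            record = records.get()
            if record is _DONE:
                break
            pending.append(pool.submit(work, record))
            if len(pending) >= buffer_size:
                yield pending.popleft().result()
        while pending:
            yield pending.popleft().result()
```

Usage would be along the lines of: for header, gc in parallel_map(open("reads.fasta"), gc_content): ... (the file name is made up). The bounded queue is the important design choice: it lets the reader run ahead of the workers without pulling the whole file into memory.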