[BioRuby] Bringing the fun back to programming! (The first BioRuby IRC conference on Dec 19th)

Mon Dec 13 21:48:23 UTC 2010

On Dec 13, 2010, at 12:16 PM, Pjotr Prins wrote:

> On Mon, Dec 13, 2010 at 06:57:39PM +0100, Raoul Bonnal wrote:
>>> Part of how we try to handle big data files in Biopython is using
>>> Python iterators, whereby the file is loaded record by record (how
>>> depends on the file format - for BLAST we do this query by query),
>>> not all into memory in one go. I think BioPerl does something very
>>> similar in their parsers; I'm not so familiar with BioJava.
> 
> BioJava uses a visitor pattern. In effect an iterator.
> 
> With all current implementations IO runs, then code, then IO, etc.
> While we are IO constrained, we are actually doing worse.
> 
> What I want is an IO thread going at maximum throughput. Every item
> should get parcelled out for further parsing and processing, in
> parallel to the IO thread.
> 
> We should do better, and make it a generalization. I think we can do
> it by using Scala and the standard BioJava iterators. With Scala it
> can be turned into a parallelized iterator. That is a fun project.
> 
>> From my point of view Python guys are doing a very good job on all fields.
>> Unfortunately I'm in love with ruby :-)
> 
> All you need is love :)
> 
> Pj.

At some point the choice of a language will not matter as much, as long as it is implemented in a VM (something Perl 5 cannot claim at the moment, but Perl 6 does with the Parrot VM).  

chris
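
For readers following the thread, here is a rough sketch of the two ideas quoted above: a record-by-record iterator, and a dedicated IO thread that hands each record to a pool of workers. It is written in plain Python using only the standard library and is not taken from Biopython, BioJava or BioRuby. The names iter_fasta, gc_fraction and process_in_parallel are hypothetical, the FASTA parsing is deliberately simplistic, and the GC calculation is only a stand-in for whatever per-record work is needed.

```python
import io
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def iter_fasta(handle):
    """Yield (header, sequence) one record at a time (hypothetical helper).

    Only the current record is held in memory, not the whole file.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_fraction(record):
    """Stand-in for the per-record processing step (an assumption)."""
    header, seq = record
    gc = sum(1 for base in seq.upper() if base in "GC")
    return header, (gc / len(seq) if seq else 0.0)


_DONE = object()  # sentinel marking the end of the input


def process_in_parallel(handle, work, workers=4, buffer_size=64):
    """Read records on a dedicated IO thread and process them on a pool."""
    # Bounded queue: the reader blocks when the buffer is full, so reading
    # cannot run arbitrarily far ahead of dispatch.
    records = queue.Queue(maxsize=buffer_size)

    def reader():
        try:
            for record in iter_fasta(handle):
                records.put(record)
        finally:
            records.put(_DONE)  # always signal the end, even on error

    io_thread = threading.Thread(target=reader, daemon=True)
    io_thread.start()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        while True:
            record = records.get()
            if record is _DONE:
                break
            futures.append(pool.submit(work, record))
        # Collect results in submission order, i.e. input order.
        results = [future.result() for future in futures]

    io_thread.join()
    return results


if __name__ == "__main__":
    demo = io.StringIO(">seq1\nGGCC\n>seq2\nAT\nGC\n")
    for name, fraction in process_in_parallel(demo, gc_fraction, workers=2):
        print(name, fraction)
```

One caveat on the design: in standard CPython the threads here overlap reading with processing, but CPU-bound work in the workers is still serialized by the global interpreter lock, so real parallel speed-up for heavy parsing would need a process pool or a runtime without that lock (which is part of the appeal of the JVM route mentioned above).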