[BioRuby] Bringing the fun back to programming! (The first BioRuby IRC conference on Dec 19th)

Pjotr Prins pjotr.public14 at thebird.nl
Mon Dec 13 18:16:59 UTC 2010


On Mon, Dec 13, 2010 at 06:57:39PM +0100, Raoul Bonnal wrote:
> > Part of how we try to handle big data files in Biopython is using
> > Python iterators, whereby the file is loaded record by record (how
> > depends on the file format - for BLAST we do this query by query),
> > not all into memory in one go. I think BioPerl does something very
> > similar in their parsers, I'm not so familiar with BioJava.

BioJava uses a visitor pattern. In effect an iterator.

With all current implementations IO runs, then code, then IO, etc.
So even though we are IO constrained, we actually do worse than the
IO limit, because reading stalls while the code runs.

What I want is an IO thread going at maximum throughput. Every item
should get parcelled out for further parsing and processing, in
parallel to the IO thread.
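
A minimal sketch of that idea, in Python rather than Scala, and only
under stated assumptions: the input is a FASTA-like stream, and the
names read_records, process and parallel_records are hypothetical, not
part of BioRuby, BioJava or Biopython. One dedicated thread does nothing
but read and split records; each record is handed to a worker pool, and
results come back in input order.

```python
import queue
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel marking the end of the input


def read_records(lines, out_queue):
    """IO thread: split a FASTA-like stream into records, queue each one."""
    record = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">") and record:
            out_queue.put(record)  # blocks when the queue is full
            record = []
        if line:
            record.append(line)
    if record:
        out_queue.put(record)
    out_queue.put(_DONE)


def process(record):
    """Hypothetical per-record work: return (identifier, sequence length)."""
    header, *seq_lines = record
    return header[1:].split()[0], sum(len(s) for s in seq_lines)


def parallel_records(lines, workers=4, max_pending=64):
    """Yield process(record) results in input order while reading continues."""
    records = queue.Queue(maxsize=max_pending)  # bounded, so memory stays flat
    reader = threading.Thread(
        target=read_records, args=(lines, records), daemon=True
    )
    reader.start()
    futures = deque()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            record = records.get()
            if record is _DONE:
                break
            futures.append(pool.submit(process, record))
            if len(futures) >= max_pending:
                # Hand back the oldest result before accepting more work.
                yield futures.popleft().result()
        while futures:
            yield futures.popleft().result()
    reader.join()
```

It is used like any other iterator, e.g. "for name, length in
parallel_records(open(path)): ...". Note that in CPython threads
mainly let processing overlap with IO waits; CPU-heavy parsing would
need a process pool instead to run truly in parallel.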

We should do better, and generalize it. I think we can do it by
using Scala and the standard BioJava iterators. With Scala such an
iterator can be turned into a parallelized iterator. That is a fun
project.

> From my point of view Python guys are doing a very good job on all fields.
> Unfortunately I'm in love with ruby :-)

All you need is love :)

Pj.


