[BioRuby] [GSoC] Weekly report No 0.5

Artem Tarasov lomereiter at googlemail.com
Sun May 13 20:10:45 UTC 2012


Hi all,

this is yet another GSoC report.

Last week I concentrated mainly on the D part of the project, adding
functionality to it. I implemented parsing of the whole BAM file :)
Today I wrote a simple utility in D, bam2sam
<https://github.com/lomereiter/BAMread/blob/master/examples/bam2sam.d>,
which uses my library to convert BAM to SAM. It doesn’t work with array
tags yet, and it is not as fast as samtools, but nevertheless… On a couple
of BAM files from the test/data directory (namely, bins.bam and
ex1_header.bam) the output is identical to that of samtools view (I checked
with diff), which suggests that parsing works correctly, at least on these
files. The speed issues are mainly due to the std.variant module, which I
use for storing tags: it uses runtime reflection, which is quite slow.
There may be other reasons as well. Anyway, I’m going to write my own
tagged union type next week; it should improve performance quite a bit and
also fix some design flaws.
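
To illustrate the tagged union idea, here is a minimal sketch in Python
(not the D code from the library). The class name TagValue and its method
to_sam are hypothetical and exist only for this example. Each value carries
a one-character BAM type code, and formatting dispatches on that stored
code instead of inspecting the value's type at run time. In D the same idea
would presumably be a struct holding a type byte next to a union.

```python
from dataclasses import dataclass

# SAM prints every BAM integer width (c, C, s, S, i, I) as type 'i'.
_INTEGER_KINDS = frozenset("cCsSiI")


@dataclass(frozen=True)
class TagValue:
    """Hypothetical tagged value: a type code plus the payload it describes."""
    kind: str      # one-character BAM type code, e.g. 'C', 'f', 'Z'
    value: object  # payload; how to interpret it is decided by `kind`

    def to_sam(self, name):
        """Format as a SAM optional field of the form TAG:TYPE:VALUE."""
        if self.kind in _INTEGER_KINDS:
            return "%s:i:%d" % (name, self.value)
        if self.kind == "f":
            return "%s:f:%g" % (name, self.value)
        if self.kind in ("A", "Z", "H"):
            return "%s:%s:%s" % (name, self.kind, self.value)
        # Array tags (type 'B') are deliberately left out of this sketch.
        raise ValueError("unsupported tag type %r" % self.kind)


print(TagValue("C", 5).to_sam("NM"))
print(TagValue("Z", "grp1").to_sam("RG"))
```

The point of the design is that the set of possible types is closed and
known in advance, so a single stored code is enough to choose the right
branch.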

To test tag parsing, I used the file tags.bam, provided to me by Peter
Cock. It contains test cases for all tag types, and my library passes them
all. Later I’ll experiment with possible speed improvements, and for that,
unit tests covering the full range of possible tag types are a must.
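
For readers unfamiliar with the tag encoding, the following is a rough
Python sketch of how the auxiliary data of a BAM record can be decoded. It
is an illustration based on the SAM/BAM format description, not a port of
the D library, and the function name parse_tags is made up for this
example. Each tag is a two-character name, a one-character type code and a
little-endian value; Z and H strings are NUL-terminated, and B arrays carry
an element type and a 32-bit element count.

```python
import struct

# Fixed-size BAM tag types mapped to little-endian struct formats.
_FIXED = {"A": "<c", "c": "<b", "C": "<B", "s": "<h", "S": "<H",
          "i": "<i", "I": "<I", "f": "<f"}


def parse_tags(data):
    """Return a list of (tag, type_code, value) from raw auxiliary bytes."""
    tags, pos = [], 0
    while pos < len(data):
        tag = data[pos:pos + 2].decode("ascii")
        kind = chr(data[pos + 2])
        pos += 3
        if kind in _FIXED:
            fmt = _FIXED[kind]
            (value,) = struct.unpack_from(fmt, data, pos)
            pos += struct.calcsize(fmt)
            if kind == "A":
                value = value.decode("ascii")  # single printable character
        elif kind in ("Z", "H"):
            end = data.index(b"\x00", pos)     # strings are NUL-terminated
            value = data[pos:end].decode("ascii")
            pos = end + 1
        elif kind == "B":
            sub = chr(data[pos])               # element type of the array
            (count,) = struct.unpack_from("<I", data, pos + 1)
            fmt = "<%d%s" % (count, _FIXED[sub][1])
            value = list(struct.unpack_from(fmt, data, pos + 5))
            pos += 5 + struct.calcsize(fmt)
        else:
            raise ValueError("unknown tag type %r" % kind)
        tags.append((tag, kind, value))
    return tags


# Hand-built sample: one uint8 tag followed by one string tag.
sample = b"NMC\x05" + b"RGZgrp1\x00"
for entry in parse_tags(sample):
    print(entry)
```

A test file like tags.bam is valuable because every branch above (each
integer width, floats, both string types and arrays) needs at least one
case.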

Also, I downloaded and compiled gdc from trunk. It provides decent
performance, at least no worse than dmd. We expect gdc to gain shared
library support in the next two months. Until that happens, we have to use
dmd, although its garbage collector has some issues that cause segfaults.
We discussed this with Marjan and Pjotr and decided that the best option in
these circumstances is to disable the GC during development, since testing
the library on small files won’t consume much memory anyway.

Another thing I downloaded and compiled is Rubinius. I’m going to
investigate why it hangs on the BioRuby unit tests in 1.9 mode. The other
mode, 1.8, seems to work fine, except perhaps for some very minor bugs.

Next week I’m going to learn how to use Cucumber and RSpec, improve the
D library’s performance a little, and start writing the Ruby bindings. So
it will mostly be a ‘Ruby week’ ;)



--
Artem



