[BioRuby] GSoC weekly status report No.3

Marjan Povolni marian.povolny at gmail.com
Mon Jun 11 20:52:05 UTC 2012


http://blog.mpthecoder.com/post/24904798973/gsoc-weekly-status-report-no-3

My first report as a Master of Computer Engineering and Communications :)

Here is a list of what I’ve been working on over the last week:

- more cleanup and refactoring of the validation code, README etc.,
- made a validation utility in D, which simply reports the problems it
  finds to stderr,
- made a benchmark tool with a -v option for measuring parser speed with
  and without validation,
- with a basic benchmark tool in place, found a few places which were very
  bad for performance. After fixing that code, parsing a 233MB GFF3 file on
  a five-year-old PC took 6 seconds, although that was without validation,
  with only a single thread, and with replacing of escaped characters
  turned off,
- made replacing escaped characters optional, because the current
  implementation has to create additional string objects to do it, which
  has a big impact on performance. There is a plan for making it faster,
  but that is scheduled for later,
- added minimal parallelisation, by reading the file in a separate thread.
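
To illustrate why replacing escaped characters costs extra string objects, here is a minimal sketch in Python, not the D code from the project. GFF3 escapes reserved characters as %XX, so decoding a field means building a new string. The names unescape_field, parse_attributes and replace_escapes are made up for this example; the fast path simply returns the original string when the option is off or the field has no '%' in it.

```python
from urllib.parse import unquote


def unescape_field(field: str, replace_escapes: bool = True) -> str:
    """Return the field with %XX escapes decoded, if requested.

    Hypothetical helper for illustration; not the project's API.
    """
    # Fast path: option turned off, or nothing to decode.
    # The original string object is returned, nothing new is created.
    if not replace_escapes or "%" not in field:
        return field
    # Slow path: decoding has to build a new string object.
    return unquote(field)


def parse_attributes(column: str, replace_escapes: bool = True) -> dict:
    """Split a GFF3 attributes column into a dict of key/value pairs."""
    result = {}
    # Split on the separators first, decode afterwards, so that an
    # escaped ';' or '=' inside a value is not mistaken for a separator.
    for pair in column.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        result[unescape_field(key, replace_escapes)] = unescape_field(
            value, replace_escapes
        )
    return result


attrs = parse_attributes("ID=gene1;Note=a%3Bb%2C c")
raw = parse_attributes("ID=gene1;Note=a%3Bb%2C c", replace_escapes=False)
```

With the option on, attrs["Note"] holds the decoded value; with it off, raw["Note"] keeps the %XX sequences and no decoding work is done.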

Two additional days were spent on a segmentation fault in the D garbage
collector, which occurred when parsing a big file with a lot of errors. That
should never happen, as I’m using only the safe part of the D language, that
is, no pointers or anything similar. The worst that should happen is an
exception, so a segmentation fault points to an error in the compiler, the
runtime or the support library.

The minimal reproducible example is still 42 lines long:

https://gist.github.com/2911818

but changing anything in it makes the segmentation fault go away. More info
on this topic can be found in the discussion here:

https://github.com/mamarjan/bioruby-hpc-gff3/issues/31

I’ll probably post a bug report on the Dlang website tomorrow.

For the coming week I would like to add more parallelisation, change the
validation code so that exceptions (and with them the seg fault) almost
never happen, and add support for merging records into features.
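
As a rough idea of what merging records into features could look like, here is a small Python sketch, not the planned D implementation. It assumes the GFF3 convention that lines sharing the same ID attribute describe one feature, and it represents each record as a plain dict; merge_records and the dict layout are invented for this example.

```python
def merge_records(records):
    """Group records into features by their ID attribute.

    Hypothetical helper: each record is a dict, and records that
    share an "ID" value are treated as parts of one feature.
    Records without an ID each become a feature of their own.
    """
    features = {}   # ID -> list of records, in first-seen order
    anonymous = []  # single-record features without an ID
    for record in records:
        record_id = record.get("ID")
        if record_id is None:
            anonymous.append([record])
        else:
            features.setdefault(record_id, []).append(record)
    # Features with an ID come first, followed by the anonymous ones.
    return list(features.values()) + anonymous


records = [
    {"ID": "cds1", "start": 1, "end": 40},
    {"ID": "cds1", "start": 50, "end": 90},
    {"start": 7, "end": 9},
    {"ID": "gene1", "start": 1, "end": 100},
]
merged = merge_records(records)
```

Here the two cds1 records end up in one feature, while gene1 and the record without an ID each form a feature with a single record.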

--
Marjan



