[BioRuby] GSoC weekly status report No.1

Sat May 5 13:07:30 UTC 2012

Hello all,

It might be a little early, but a lot has happened in the 10 days since the
GSoC results were published. The full report is on my blog:

http://blog.mpthecoder.com/post/22380853664/gsoc-weekly-status-report-no-1

A short summary:

It has been 10 days since the GSoC results were published, and a lot has
happened since then. I got to know the other students and mentors in a
long meeting on Google Hangouts. I also had a discussion with my mentor on
IRC in which we disagreed about the parallelization strategy for the parser
(experiments will show who's right), and my inbox is full of emails from my
mentor and other students, in which we have exchanged loads of interesting
ideas. I also fixed a bug in the biogems.info website that was preventing
Pjotr from updating the site with new information about biogems.

There is now a GitHub repository for my project:

https://github.com/mamarjan/bioruby-hpc-gff3

I'm also already halfway through the work planned for the first week of coding.

There seems to be huge interest in a GFF3 parser with more features, such as
indexing, random access and writing output, as well as support for linking
features into trees even when they are not located close to each other in
the file. A fast sequential parser could be used to generate indexes, and
its lower-level parts could be used to reorder the file for faster use
later. Based on that, I think this project is a good start.
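As a rough illustration of the "sequential pass builds an index, index enables random access" idea (this is not the project's code, just a minimal sketch), the Python below makes one pass over a GFF3 file opened in binary mode and records the byte offset of every feature line, keyed by its ID attribute. Later lookups seek straight to those offsets. The function names `build_id_index` and `fetch_lines` are hypothetical. The sketch assumes tab-separated nine-column lines, stops at a `##FASTA` directive, and does not unescape attribute values. Offsets are stored as lists because GFF3 allows one feature (for example a multi-exon CDS) to span several lines sharing the same ID.

```python
import io


def build_id_index(handle):
    """Sequentially scan a binary GFF3 handle and map each ID to its line offsets."""
    index = {}
    offset = 0
    for raw in handle:
        line = raw.decode("utf-8")
        if line.startswith("##FASTA"):
            break  # embedded sequence data follows; no more feature lines
        if line.strip() and not line.startswith("#"):
            cols = line.rstrip("\n").split("\t")
            if len(cols) == 9:
                # Column 9 holds semicolon-separated tag=value attributes.
                for attr in cols[8].split(";"):
                    key, _, value = attr.partition("=")
                    if key == "ID":
                        # One feature may span several lines with the same ID.
                        index.setdefault(value, []).append(offset)
        offset += len(raw)  # byte length, so offsets are valid for seek()
    return index


def fetch_lines(handle, offsets):
    """Random access: seek to each recorded offset and read that single line."""
    lines = []
    for off in offsets:
        handle.seek(off)
        lines.append(handle.readline().decode("utf-8").rstrip("\n"))
    return lines


if __name__ == "__main__":
    data = io.BytesIO(
        b"##gff-version 3\n"
        b"chr1\t.\tgene\t100\t900\t.\t+\t.\tID=gene1\n"
        b"chr1\t.\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1\n"
        b"chr1\t.\tCDS\t150\t300\t.\t+\t0\tID=cds1;Parent=mrna1\n"
        b"chr1\t.\tCDS\t500\t800\t.\t+\t0\tID=cds1;Parent=mrna1\n"
    )
    idx = build_id_index(data)
    print(idx["gene1"])  # [16]: the header line is 16 bytes long
    print(fetch_lines(data, idx["cds1"]))  # both CDS lines
```

The same index also helps with linking features into trees when parents and children are far apart in the file: a `Parent=` value can be resolved by looking it up in the index instead of rescanning the file.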

*If you're using the GFF3/GTF file formats in your research, I would like to
ask you to send me example files and descriptions of how your applications
use the data. This way I'll be able to test the parser against your files
and optimize it for your applications. Currently I have GFF files from
Ensembl and WormBase, and Pjotr pointed me to the genome browser web
application at wormbase.org.*

--
Marjan