[GSoC] GSoC Project Update -- 10

Wibowo Arindrarto w.arindrarto at gmail.com
Wed Jul 18 19:49:37 UTC 2012


Hi everyone,

I've just posted two new updates for my GSoC project, here:
http://bow.web.id/blog/2012/07/parsing-blast-plain-text-files-in-searchio/
and here: http://bow.web.id/blog/2012/07/exonerate-in-searchio/

The first one is about a somewhat unofficial new format to be
supported by SearchIO: the BLAST plain text output. I know that the
current Biopython text parser is obsolete, but I figure it could
still be useful for some to have a similar model in SearchIO.

It is unofficial since it's basically a wrapper around the current
parser, and after discussing things with Peter, it doesn't seem wise
to say that we officially support parsing the format, especially when
NCBI itself does not guarantee a stable style between BLAST
releases. I should note that I've also made a small change to the
current NCBIStandalone code, as there were some problems when I tried
to parse BLAST 2.2.26+ text output with multiple queries.

The second one is about the program I've been spending most of my
time on: Exonerate. We now have three Exonerate formats that SearchIO
can parse and index: `exonerate-text`, for human-readable alignments,
`exonerate-vulgar`, for vulgar lines, and `exonerate-cigar`, for
cigar lines. It's one of the more interesting formats I've been
working on so far :), since it has so much information in it. I've
tried to capture it as sensibly as possible, and I made a small
demonstration using it in my post.
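
To give a rough idea of how much a single vulgar line carries, here
is a minimal standalone sketch that splits one into its header fields
and its (label, query length, target length) triplets. It is not the
SearchIO implementation: it uses only the standard library, the names
`parse_vulgar_line` and `VulgarBlock` are hypothetical, and the sample
line is made up for illustration rather than taken from real
Exonerate output.

```python
from collections import namedtuple

# One alignment block: a label (e.g. M for match, G for gap) plus the
# number of query and target positions it covers.
VulgarBlock = namedtuple("VulgarBlock", "label query_len target_len")

# Hypothetical names for the nine fixed fields that precede the triplets.
HEADER_FIELDS = (
    "query_id", "query_start", "query_end", "query_strand",
    "target_id", "target_start", "target_end", "target_strand",
    "score",
)
INT_FIELDS = {"query_start", "query_end", "target_start", "target_end", "score"}


def parse_vulgar_line(line):
    """Split a 'vulgar: ...' line into a header dict and a list of blocks."""
    prefix, _, rest = line.partition(":")
    if prefix.strip() != "vulgar":
        raise ValueError("not a vulgar line: %r" % line)
    fields = rest.split()
    n_header = len(HEADER_FIELDS)
    if len(fields) < n_header or (len(fields) - n_header) % 3 != 0:
        raise ValueError("unexpected number of fields in vulgar line")
    header = {}
    for name, value in zip(HEADER_FIELDS, fields[:n_header]):
        header[name] = int(value) if name in INT_FIELDS else value
    triplets = fields[n_header:]
    blocks = [
        VulgarBlock(triplets[i], int(triplets[i + 1]), int(triplets[i + 2]))
        for i in range(0, len(triplets), 3)
    ]
    return header, blocks


# Made-up example line: two matches separated by a gap in the query.
example = "vulgar: query1 0 10 + target1 0 15 + 42 M 5 5 G 0 5 M 5 5"
header, blocks = parse_vulgar_line(example)
print(header["query_id"], header["target_id"], header["score"])
for block in blocks:
    print(block.label, block.query_len, block.target_len)
```

In this sample the block lengths add up to the spans given in the
header (10 positions on the query, 15 on the target), which is a
handy sanity check when reading such lines by hand.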

In addition to writing these two formats, I've also written their tests.

Now, having finished almost all of the parsers, I'm planning to devote
more time to writing the documentation over the coming weeks.

regards,
Bow


