[Bioperl-l] Bio::SearchIO additions

Jason Eric Stajich jason@cgt.mc.duke.edu
Thu, 6 Dec 2001 17:19:43 -0500 (EST)


(Steve C, I couldn't wait, so I got started on this.)

I've committed Bio::SearchIO::blast, which parses plain-text BLAST reports
from both NCBI BLAST and WU-BLAST.  Appropriate tests have been added to
t/SearchIO.t.  So far it has only been tested on BLASTP reports, so please
treat it as experimental code.  That said, if you want to help out by
adding and testing support for the other BLAST programs, you are more than
welcome!

This is part of the new event-based parsing framework for reports.  It may
look a little strange, since one object throws events (Bio::SearchIO and
its subclasses) and another processes them
(Bio::SearchIO::SearchResultEventBuilder).
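To make the split concrete, here is a minimal sketch of the same emitter/listener pattern, written in Python rather than Perl and not using the BioPerl API. The toy line format ("QUERY", "SUBJECT", "HSP", "END") and the names emit_events, ResultBuilder, start_element and end_element are all hypothetical. The emitter only fires start/end events for reports, subjects and HSPs; the listener is the only part that knows how to turn those events into objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HSP:
    score: float


@dataclass
class Subject:
    name: str
    hsps: List[HSP] = field(default_factory=list)


@dataclass
class Report:
    query: str
    subjects: List[Subject] = field(default_factory=list)


class ResultBuilder:
    """Hypothetical listener: turns start/end events into Report/Subject/HSP objects."""

    def __init__(self) -> None:
        self.reports: List[Report] = []
        self._report: Optional[Report] = None
        self._subject: Optional[Subject] = None

    def start_element(self, kind: str, data: Dict[str, str]) -> None:
        if kind == "report":
            self._report = Report(query=data["query"])
        elif kind == "subject":
            self._subject = Subject(name=data["name"])
        elif kind == "hsp":
            self._subject.hsps.append(HSP(score=float(data["score"])))

    def end_element(self, kind: str) -> None:
        # Attach finished objects to their parent when their element closes.
        if kind == "subject":
            self._report.subjects.append(self._subject)
            self._subject = None
        elif kind == "report":
            self.reports.append(self._report)
            self._report = None


def emit_events(lines, handler) -> None:
    """Hypothetical emitter: scans a made-up line format and fires events.

    It never builds objects itself; it only tells the handler what it saw.
    """
    in_subject = False
    for line in lines:
        tag, _, value = line.partition(" ")
        if tag == "QUERY":
            handler.start_element("report", {"query": value})
        elif tag == "SUBJECT":
            if in_subject:  # a new subject implicitly closes the previous one
                handler.end_element("subject")
            handler.start_element("subject", {"name": value})
            in_subject = True
        elif tag == "HSP":
            handler.start_element("hsp", {"score": value})
            handler.end_element("hsp")
        elif tag == "END":
            if in_subject:
                handler.end_element("subject")
                in_subject = False
            handler.end_element("report")


if __name__ == "__main__":
    toy = ["QUERY q1", "SUBJECT s1", "HSP 52.3", "HSP 40.1",
           "SUBJECT s2", "HSP 33.0", "END"]
    builder = ResultBuilder()
    emit_events(toy, builder)
    first = builder.reports[0].subjects[0]
    print(first.name, len(first.hsps))  # prints "s1 2"
```

In this sketch, because the emitter never constructs objects, a different listener (for example one that writes XML) could be plugged in without touching the parsing loop.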

My implementation is also a little unusual in that it works at the
granularity of the objects in a database search report: Reports, Subjects,
and HSPs.  It also reads a whole report into memory rather than parsing one
HSP at a time (it does not, however, slurp the entire BLAST file containing
the reports for all query sequences).  This behaves similarly to the
blastxml parser (since I wrote that one first).  If you think it is really
crufty, perhaps someone who loves event-based parsing could look it over
and provide improvements.
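The "one report in memory at a time" behaviour can be sketched as a generator that buffers lines until the next report header appears. This is a hypothetical Python illustration, not the Bio::SearchIO::blast code; the name iter_reports is made up, and it assumes each report in a concatenated plain-text file starts on a line beginning with the program name (BLASTN, BLASTP, BLASTX, TBLASTN or TBLASTX).

```python
import io
import re
from typing import Iterable, Iterator, List

# Assumption: every report in a concatenated plain-text BLAST file begins on a
# line starting with the program name, e.g. "BLASTP 2.2.1 ...".
REPORT_START = re.compile(r"^(?:BLAST[NPX]|TBLAST[NX])\b")


def iter_reports(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one report's lines at a time; only the current report is buffered."""
    buffer: List[str] = []
    for line in lines:
        # A new header closes the report collected so far.
        if REPORT_START.match(line) and buffer:
            yield buffer
            buffer = []
        buffer.append(line)
    if buffer:  # flush the last report in the file
        yield buffer


if __name__ == "__main__":
    text = ("BLASTP 2.2.1\nQuery= seq1\nhits for seq1\n"
            "BLASTP 2.2.1\nQuery= seq2\nhits for seq2\n")
    for report in iter_reports(io.StringIO(text)):
        print(report[1].strip())  # prints "Query= seq1", then "Query= seq2"
```

Each yielded chunk would then be handed to the event emitter, so peak memory in this sketch is bounded by the largest single report rather than by the whole file.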

Future directions include adapting a FASTA parser to this framework and
supporting output in the various formats and flavors, most likely XML
first (the NCBI BLAST DTD, unless there is a better one) and HTML (à la
Steve C's HTML output from Bio::Tools::Blast).

(Yes, Aaron M, this could be the FASTA -> XML converter you've been
waiting for... =)

-jason

-- 
Jason Stajich
Duke University
jason@cgt.mc.duke.edu