[Biojava-l] Blast Parsing Framework re: was biojava Blast parser proposal

Wayne Parrott wayne@workingobjects.com
Fri, 18 Feb 2000 18:54:23 -0600


Biojava-ers:

Sorry about the late entry into this Blast Parsing thread. Terry
Tripplet and Matthew Pocock turned me on to the recent activities of
this list. I've been consulting off and on in the bioinformatics field
for the last 8 years, mostly heavy object-technology, CORBA stuff. Due
to a number of recurring tasks that required manipulation of Blast
results from Java, I developed and maintained the Blast Parsing
Framework and BlastXML from late '98 to mid '99. I recall the SAX-based
parser idea has been around for at least a couple of years (work at
Sequana, now AxyS). While these facilities don't currently support every
feature wished for in Simon's feature list, they do provide the
following:

Blast Parsing Framework (BPF)
1) SAX-like architecture for converting a blast report into an event
stream
2) Blast object model (optional result type)
3) several example event-handlers for processing a blast event stream
into BlastXML document, html, builder for blast object model, and simple
text report
4) programmer and javadoc documentation
5) sample test suite and test script
6) source code and redistribution license
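
To make item 1 concrete, here is a minimal sketch of what a SAX-like
event stream over a Blast report can look like. It is an illustration
only: the BlastHandler interface, its method names, and the toy line
matching are assumptions made up for this example, not the actual BPF
API, and a real report needs far more careful parsing.

```java
import java.util.ArrayList;
import java.util.List;

final class BlastEventSketch {

    /** Hypothetical callback interface (illustrative only, not the BPF API). */
    interface BlastHandler {
        void startReport(String program);
        void startHit(String description);
        void score(String bits);
        void endHit();
        void endReport();
    }

    /**
     * Toy parser: walks the report line by line and fires events.
     * Assumes a simplified layout: a first "BLAST..." banner line,
     * hit definitions starting with ">", and "Score = N bits" lines.
     */
    static void parse(String report, BlastHandler handler) {
        boolean started = false;
        boolean inHit = false;
        for (String line : report.split("\n")) {
            String text = line.trim();
            if (!started) {
                if (text.startsWith("BLAST")) {
                    handler.startReport(text.split("\\s+")[0]);
                    started = true;
                }
            } else if (text.startsWith(">")) {
                if (inHit) handler.endHit();   // close the previous hit
                handler.startHit(text.substring(1).trim());
                inHit = true;
            } else if (inHit && text.startsWith("Score =")) {
                // "Score = 52.3 bits (26)" yields "52.3"
                String rest = text.substring("Score =".length()).trim();
                handler.score(rest.split("\\s+")[0]);
            }
        }
        if (inHit) handler.endHit();
        if (started) handler.endReport();
    }

    /** Example handler: records each event as a string. */
    static List<String> collect(String report) {
        final List<String> events = new ArrayList<>();
        parse(report, new BlastHandler() {
            public void startReport(String p) { events.add("startReport " + p); }
            public void startHit(String d)    { events.add("startHit " + d); }
            public void score(String b)       { events.add("score " + b); }
            public void endHit()              { events.add("endHit"); }
            public void endReport()           { events.add("endReport"); }
        });
        return events;
    }

    public static void main(String[] args) {
        String report = "BLASTN 2.0.8\n\n>gi|1234 SV40 genome\n Score = 52.3 bits (26)\n";
        for (String event : collect(report)) {
            System.out.println(event);
        }
    }
}
```

The point of the design is that the parser knows nothing about the
output: an XML writer, an html writer, or an object-model builder would
each just be another handler plugged into the same parse call.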


BlastXML builds upon BPF
1) BlastXML DTD
2) BPF event-handler adapter architecture to allow reuse of BPF
event-handlers, i.e., a BPF-to-SAX event handler adapter
3) experimental IE5-specific XSL, for example see
http://www.workingobjects.com/blastxml/sv40_short_ncbi.xml
4) programmer and javadoc documentation
5) sample test suite and test script
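
As a rough illustration of the adapter idea in item 2, the sketch below
forwards parser events to a standard SAX handler. The BlastHandler
interface and the element and attribute names ("report", "hit", "id",
"bits") are hypothetical and are not taken from the BlastXML DTD. The
SAX side uses the SAX2 org.xml.sax.ContentHandler interface that ships
with current JDKs, which is an assumption about the target API rather
than a description of how BlastXML does it.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

final class BlastToSaxSketch {

    /** Hypothetical parser-side callback interface (not the BPF API). */
    interface BlastHandler {
        void startReport(String program);
        void hit(String id, String bits);
        void endReport();
    }

    /** Adapter: turns each Blast event into the matching SAX calls. */
    static final class SaxAdapter implements BlastHandler {
        private final ContentHandler out;

        SaxAdapter(ContentHandler out) { this.out = out; }

        public void startReport(String program) {
            try {
                AttributesImpl atts = new AttributesImpl();
                atts.addAttribute("", "program", "program", "CDATA", program);
                out.startDocument();
                out.startElement("", "report", "report", atts);
            } catch (SAXException e) {
                throw new IllegalStateException(e);
            }
        }

        public void hit(String id, String bits) {
            try {
                AttributesImpl atts = new AttributesImpl();
                atts.addAttribute("", "id", "id", "CDATA", id);
                atts.addAttribute("", "bits", "bits", "CDATA", bits);
                out.startElement("", "hit", "hit", atts);
                out.endElement("", "hit", "hit");
            } catch (SAXException e) {
                throw new IllegalStateException(e);
            }
        }

        public void endReport() {
            try {
                out.endElement("", "report", "report");
                out.endDocument();
            } catch (SAXException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** Demo SAX consumer: writes tags to a buffer (no escaping, demo only). */
    static final class Printer extends DefaultHandler {
        final StringBuilder xml = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            xml.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                xml.append(' ').append(atts.getQName(i))
                   .append("=\"").append(atts.getValue(i)).append('"');
            }
            xml.append('>');
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            xml.append("</").append(qName).append('>');
        }
    }

    static String demo() {
        Printer printer = new Printer();
        BlastHandler handler = new SaxAdapter(printer);
        handler.startReport("BLASTN");
        handler.hit("gi|1234", "52.3");
        handler.endReport();
        return printer.xml.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With an adapter of this shape, any existing SAX consumer can sit behind
the parser without knowing anything about the parser's own event
interface.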

When I last checked, the BPF supported NCBI web and email reports from
NCBI Blast versions 1.4 through 2.0.8. I was planning to provide support
for a wider range of formats but I just ran out of time and sweat. It
also does not interface directly to any Blast executable.

It's not a perfect implementation, but I believe it to be a decent start.
As I told Matthew and Terry via email, I'm open to licensing it under the
biojava org's licensing criteria. As Simon points out, there is not a
Blast result parser/reformatter package with all of the desired
features. Developing this feature list from scratch is not a trivial or
weekend effort. I'm open to working with the group, using the two
products as a base for a "cut and paste" redesign/refactoring effort to
build a more robust and comprehensive parser base.

In '97 I worked on a project for Baylor's Genome Center that resulted in
a highly functional prototype architecture by the name of SearchBroker.
This system provided CORBA interfaces to an execution and parsing
service/framework for Blast, FASTA, SW, and Beauty results. If I were
starting over from scratch, I would try to ensure that my architecture
would support inclusion of other parsers.

Wayne

A side note: I did crack open the NCBI Blast code and add some hooks for
generating BlastXML about 18 months ago. Not owning the codebase made
version maintenance a pain; plus, I changed the BlastXML DTD along the
way. The end result was an obsolete codebase.
-- 
-----------------------------------------------------------------------
 Wayne Parrott                   email: wayne@workingobjects.com      |
 WorkingObjects.com              voice: (972)491-3704                 |
                                 web: http://www.workingobjects.com   |
----------------------------------------------------------------------- 
 "The main thing, is to keep the main thing, the main thing" 
   lyrics by Scott Krippayne