[Biojava-l] Blast-xml parser

Simon Brocklehurst simon.brocklehurst@CambridgeAntibody.com
Wed, 06 Mar 2002 18:43:17 +0000


Ewan Birney wrote:

> On Wed, 6 Mar 2002, Jason Stajich wrote:
>
> > #!/usr/bin/perl -w
> > use strict;
> >
> > use Bio::SearchIO;
> > use Bio::SearchIO::Writer::HTMLResultWriter;
> >
> > my $in = new Bio::SearchIO(-format => 'blastxml',
> >                            -file   => shift @ARGV);
> >
> > my $writer = new Bio::SearchIO::Writer::HTMLResultWriter();
> > my $out = new Bio::SearchIO(-writer => $writer);
> > $out->write_result($in->next_result);
> >
>
> <sigh>. That's lovely code. I can't believe you Java guys would really
> want to have to make that an object.

Not sure what you mean by "have to make an object".  That Perl code is
indeed elegant. *If* BioJava had the equivalent functionality (it doesn't
yet: right now we handle Blast pairwise output only, not Blast XML format),
then the BioJava version would be a single command:

java eventbasedparsing.BlastLike2HTML <input file pathname>

You can throw NCBI Blast, WU-Blast, HMMER and other output at this, and
get HTML output. That is, the file format would be auto-detected (both
program and version).
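
To give a flavour of what auto-detecting program and version can mean, here
is a minimal sketch that sniffs the first line of a report. It is not
BioJava's actual detection logic: the class name BlastLikeSniffer, the
assumed header shape ("BLASTP 2.0.11 [date]") and the "WashU" marker for
WU-Blast are my assumptions for illustration, and HMMER is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava.
public class BlastLikeSniffer {
    // Assumed header shape: program name, whitespace, version token,
    // e.g. "BLASTP 2.0.11 [Jan-20-2000]".
    private static final Pattern BLAST_HEADER =
        Pattern.compile("^(T?BLAST[NPX])\\s+(\\S+)");

    /** Returns "flavour program version", or "unknown" if not recognised. */
    public static String detect(String firstLine) {
        Matcher m = BLAST_HEADER.matcher(firstLine.trim());
        if (!m.find()) {
            return "unknown";
        }
        String version = m.group(2);
        // Assumption: WU-Blast version strings carry a "WashU" marker.
        String flavour = version.contains("WashU") ? "wu-blast" : "ncbi-blast";
        return flavour + " " + m.group(1).toLowerCase() + " " + version;
    }
}
```

A real detector would need to look further than one line and cover the
other supported programs.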

The BioJava Blast-like HTML rendering stuff has a number of (uncommon)
benefits:

o Highly configurable/pluggable look and feel (including coloring/markup
of sequence alignments)

o Fast to render with very large output streams on legacy browsers (e.g.
Netscape 4.7)

o Prints reasonably well from a wide variety of browser/hardware/OS
combinations, e.g. IE, Netscape, MacOS 9/X, MS Windows, Solaris

An "out of the box" example HTML output is at:

http://www.biojava.org/tutorials/blastlikeParsingCookBook/blastp.html


> Beautiful... I'm really impressed by the SearchIO stuff (SearchIO is
> Jason's baby inside Bioperl).
>
> <grin>
> Can you beat that BioJava?
> </grin>

Hmmmm... I think the easiest way to handle this scalably (in the sense of
coping with large outputs) would be for someone to write a ContentHandler
that takes Blast XML format as input and outputs SAX2 events (or just XML)
that comply with the BioJava BlastLike DTD.  No-one has done this yet - I've
been meaning to do it for about a year, but it has never got to the top of
the priority list.
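
A rough sketch of the shape such a handler could take, using a standard
SAX2 XMLFilterImpl so that events stream through without building a tree.
The source element names (BlastOutput, Hit_hsps, Hsp, Hsp_qseq, Hsp_hseq)
are from NCBI Blast XML; the target names and the one-to-one rename table
are placeholders I have made up for illustration and have not checked
against the BlastLike DTD. A real converter would also have to restructure
content, not just rename elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical class, not part of BioJava.
public class BlastXmlToBlastLike extends XMLFilterImpl {
    // Placeholder mapping: target names are illustrative assumptions.
    private static final Map<String, String> RENAME = new HashMap<>();
    static {
        RENAME.put("BlastOutput", "BlastLikeDataSet");
        RENAME.put("Hit_hsps", "HSPCollection");
        RENAME.put("Hsp", "HSP");
        RENAME.put("Hsp_qseq", "QuerySequence");
        RENAME.put("Hsp_hseq", "HitSequence");
    }

    public BlastXmlToBlastLike(XMLReader parent) {
        super(parent);
    }

    private static String map(String qName) {
        return RENAME.getOrDefault(qName, qName);
    }

    // Rename each element as its event passes through; nothing is buffered.
    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        String name = map(qName);
        super.startElement(uri, name, name, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        String name = map(qName);
        super.endElement(uri, name, name);
    }

    /** Convenience wrapper: parse a string, return the renamed XML. */
    public static String translate(String blastXml) {
        try {
            XMLReader parser =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            BlastXmlToBlastLike filter = new BlastXmlToBlastLike(parser);
            // Serialise the outgoing SAX2 events back to XML text.
            TransformerHandler serializer = ((SAXTransformerFactory)
                TransformerFactory.newInstance()).newTransformerHandler();
            serializer.getTransformer()
                .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.setResult(new StreamResult(out));
            filter.setContentHandler(serializer);
            filter.parse(new InputSource(new StringReader(blastXml)));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("translation failed", e);
        }
    }
}
```

Because the filter only forwards events, any downstream ContentHandler
(the existing HTML renderer, say) could be plugged in where the serializer
is here.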

Simon
--
Simon M. Brocklehurst, Ph.D.
Head of Bioinformatics & Advanced IS
Cambridge Antibody Technology
The Science Park, Melbourn, Cambridgeshire, UK
http://www.CambridgeAntibody.com/
mailto:simon.brocklehurst@CambridgeAntibody.com