[Biojava-l] Parsing a BLAST file

Keith James kdj@sanger.ac.uk
05 Nov 2001 09:41:45 +0000


>>>>> "Susan" == Susan Glass <SGlass@genetics.com> writes:

[...]

    Susan> This works fine, but I believe I should be able to use
    Susan> objects in the org.biojava.bio.search package to avoid
    Susan> writing my own content handler.  Is this true? If so, could
    Susan> someone please point me to some example code (or other
    Susan> help) that starts with a BLAST output file, and ends with
    Susan> hit objects?  Unfortunately I'm having trouble choosing the
    Susan> correct classes to use with only the javadocs as a guide.
    Susan> I found a mail list posting from August that deals with
    Susan> this(
    Susan> http://biojava.org/pipermail/biojava-l/2001-August/001480.html
    Susan> ), but it uses a package org.biojava.bio.program.ssbind
    Susan> that I don't have and can't find.

Hi Susan,

Here goes,

 org.biojava.bio.search

The package currently contains classes for representing search data
(which are not specific to any search program or algorithm), but it
does not deal with getting/parsing/interpreting the data from the
search program.

 org.biojava.bio.program.sax

Which, as you know, does the SAX parsing for search output (and other files).

(For completeness, org.biojava.bio.program.search is a more primitive
and less flexible way to approach some of the tasks the SAX package
achieves. It did (and still does) use the org.biojava.bio.search
classes to store its results).

There was an obvious gap between the two; you could parse the search
output with the SAX package, but had no way of making the result/hit
objects in org.biojava.bio.search. The reason you can't find the
org.biojava.bio.program.ssbind package which does this is probably
that you have downloaded an older release.

org.biojava.bio.program.ssbind exists in the current CVS head and the
September 20th build on the ftp site (I'd recommend checking out the
CVS version as then you get some tests too).

There is a SAX handler in there called SeqSimilarityAdapter which
converts the SAX events into method calls on an
org.biojava.bio.program.search.SearchContentHandler. Currently two
implementations of the SearchContentHandler interface are provided:

BlastLikeSearchBuilder

which is the one which will build the org.biojava.bio.search.*
results. Also present is

BlastLikeHomologyBuilder

which builds org.biojava.bio.seq.homol.Homology objects
instead.
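
To give a rough idea of how the adapter arrangement fits together,
here is a minimal sketch using only the JDK's SAX classes. It is not
the BioJava code: HitListener, HitCollector and HitAdapter, and the
Hit element with its id and score attributes, are all made up for
illustration. They stand in for SearchContentHandler, a builder and
SeqSimilarityAdapter respectively.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AdapterSketch {

    /** Hypothetical stand-in for a search content handler (not the BioJava interface). */
    interface HitListener {
        void startHit(String id);
        void hitScore(double score);
        void endHit();
    }

    /** Hypothetical builder: collects one "id:score" summary string per hit. */
    static class HitCollector implements HitListener {
        final List<String> hits = new ArrayList<>();
        private String id;
        private double score;

        public void startHit(String id) { this.id = id; this.score = 0.0; }
        public void hitScore(double score) { this.score = score; }
        public void endHit() { hits.add(id + ":" + score); }
    }

    /** Adapter: turns SAX element events into calls on a HitListener. */
    static class HitAdapter extends DefaultHandler {
        private final HitListener listener;

        HitAdapter(HitListener listener) { this.listener = listener; }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            // "Hit", "id" and "score" are assumed names for this sketch only.
            if (qName.equals("Hit")) {
                listener.startHit(atts.getValue("id"));
                listener.hitScore(Double.parseDouble(atts.getValue("score")));
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (qName.equals("Hit")) {
                listener.endHit();
            }
        }
    }

    /** Wires parser -> adapter -> builder and returns what the builder collected. */
    static List<String> collectHits(String xml) throws Exception {
        HitCollector collector = new HitCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), new HitAdapter(collector));
        return collector.hits;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Search>"
            + "<Hit id=\"seqA\" score=\"120.5\"/>"
            + "<Hit id=\"seqB\" score=\"42.0\"/>"
            + "</Search>";
        System.out.println(collectHits(xml));
    }
}
```

The point of the split is that the adapter knows about SAX events
and nothing about what gets built, so a different builder can be
swapped in behind the same listener interface without touching the
parsing side.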

If you do get the CVS version you'll find a whole bunch of working
Blast -> search objects and Fasta -> search objects tests in the
ssbind tests subdirectory.

I hope this is of some help.

cheers,

Keith

-- 

-= Keith James - kdj@sanger.ac.uk - http://www.sanger.ac.uk/Users/kdj =-
Pathogen Sequencing Unit, Wellcome Trust Sanger Institute, Cambridge, UK