[Biojava-l] Parsing a BLAST file

Susan Glass SGlass@genetics.com
Wed, 14 Nov 2001 13:25:11 -0500


     Thanks to David and Keith for help with the BLAST file parsing.  I have now managed to download the ssbind package and run David's demo program.  It's a big help.  Thanks a lot.
     I did have to alter the BlastLikeSearchBuilder class slightly because of a problem with the makeSubHit() method throwing an exception when it encountered the float "e-121" (instead of 1e-121) in David's sample BLAST output file.  Easily altered, but I wonder if anyone else had this problem.
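
     For anyone who hits the same thing: Double.parseDouble rejects "e-121" because the mantissa is missing, so one possible workaround is to prepend the implied "1" before parsing. The sketch below is only an illustration of that idea. The class and method names are made up for this example, they are not part of BioJava, and this is not the actual change made to BlastLikeSearchBuilder.

```java
// Sketch: normalize BLAST E-value text before parsing it as a double.
// EValueText, normalize and parse are illustrative names, not BioJava API.
final class EValueText {
    private EValueText() {}

    /** Prepends the implied mantissa "1" when the text starts with 'e' or 'E'. */
    static String normalize(String raw) {
        String s = raw.trim();
        if (s.startsWith("e") || s.startsWith("E")) {
            return "1" + s;
        }
        return s;
    }

    /** Parses an E-value, accepting the shortened form such as "e-121". */
    static double parse(String raw) {
        return Double.parseDouble(normalize(raw));
    }

    public static void main(String[] args) {
        System.out.println(parse("e-121"));
        System.out.println(parse("2.5e-30"));
    }
}
```

     Values that already carry a mantissa pass through unchanged, so the same call can be used for every E-value in the report.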

Thanks,
Susan

>>> "David Waring" <dwaring@u.washington.edu> 11/02 4:19 PM >>>
I have just been doing this type of thing. I like Keith's ssbind classes. I
have attached a demo. Just run it as 'java BlastTest blast.out
blastInput.fasta'.

This parses the blast file and builds a list of SequenceDBSearchResults.
It is a little more complicated than that, but the extra setup buys a lot
of functionality. The SearchResultBuilder needs two things that you might
not expect: a SequenceDB holding all the query sequences that blast was
called with, and a SequenceDBInstallation containing a SequenceDB with the
same name as the database named in the blast output file (in the demo this
is 'genome'). With these in place you can get both the subject and query
sequences of any hit from the SequenceDBSearchResult. I included a little
sample of how to do this below since it is not in the demo.

But, you say, this is a blast against some foreign database. How can I have
a SequenceDB with all this data? The truth is you do not really need it. You
just need an empty SequenceDB with the correct name inside your
SequenceDBInstallation. But then of course you cannot get the subject
sequences from the search result.
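
The name lookup described above can be pictured with a toy model in plain
Java. Everything in it (ToyLookup, Db, Installation, subjectFor) is a
hypothetical stand-in invented for this illustration, not a BioJava class
or method. It only shows why an empty database with the right name is
enough for the lookup to succeed, and why no subject sequence comes back
from it.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-ins, NOT BioJava classes: a named database of sequences and an
// "installation" that finds databases by the name given in a BLAST report.
final class ToyLookup {
    static final class Db {
        final String name;
        final Map<String, String> sequencesById = new HashMap<>();
        Db(String name) { this.name = name; }
    }

    static final class Installation {
        private final Map<String, Db> dbsByName = new HashMap<>();

        void add(Db db) { dbsByName.put(db.name, db); }

        /** Returns the subject sequence for a hit, or null if the DB holds no such entry. */
        String subjectFor(String dbNameFromReport, String hitId) {
            Db db = dbsByName.get(dbNameFromReport);
            if (db == null) {
                throw new IllegalArgumentException("No database named " + dbNameFromReport);
            }
            return db.sequencesById.get(hitId);
        }
    }

    public static void main(String[] args) {
        Installation installation = new Installation();
        installation.add(new Db("genome")); // empty, but has the right name

        // The lookup by name works; there is just no sequence to return.
        System.out.println(installation.subjectFor("genome", "hit1"));
    }
}
```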

If anyone is interested I have been working on a tool that will run blast
against a local blast database, and give results back as
SequenceDBSearchResults. It makes a system call to your local blast program
to do this. Because of this it requires a little environment specific
configuration. I have it running on Unix, but it will probably run on
Windows too. It is just about finished. Because of the system specific
issue, I have not planned on putting this into biojava, but if anyone is
interested in it, let me know.
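
To give an idea of what the system call involves, here is a rough sketch
using ProcessBuilder. It is not the tool itself. The class name, the method
names and the executable path passed in are assumptions for illustration;
the -p, -d, -i and -o options are those of the NCBI blastall command line.
The output file it writes can then be parsed like any other blast report.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Sketch only: LocalBlastSketch and its methods are illustrative names.
final class LocalBlastSketch {
    private LocalBlastSketch() {}

    /** Builds a blastall command line: program, database, query file, output file. */
    static List<String> command(String exe, String program, String db, File query, File out) {
        return Arrays.asList(exe, "-p", program, "-d", db,
                             "-i", query.getPath(), "-o", out.getPath());
    }

    /** Runs the command and returns its exit code; the child's output goes to our console. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 5) {
            System.err.println("usage: LocalBlastSketch <exe> <program> <db> <query.fasta> <out>");
            return;
        }
        // The executable location is environment specific, so it is passed in.
        int exit = run(command(args[0], args[1], args[2], new File(args[3]), new File(args[4])));
        System.out.println("blast exited with code " + exit);
    }
}
```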

David

Example of getting subject and query sequences from a SequenceDBSearchResult
(this is typed in from scratch so I don't guarantee it will compile :)

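// Needs java.util.List and java.util.Iterator plus the BioJava search and
// sequence classes; checked exceptions are left to the enclosing method.
for (int i = 0; i < resultList.size(); i++) {
    SequenceDBSearchResult r = (SequenceDBSearchResult) resultList.get(i);
    Sequence querySeq = (Sequence) r.getQuerySequence();
    SequenceDB subjectDB = r.getSequenceDB();

    // Walk the hits and fetch each subject sequence by its ID
    List hits = r.getHits();
    for (Iterator hitsI = hits.iterator(); hitsI.hasNext();) {
        SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit) hitsI.next();
        String hid = hit.getSequenceID();
        Sequence subjectSeq = subjectDB.getSequence(hid);
    }
}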
---------------------------------------------------------------

> -----Original Message-----
> From: biojava-l-admin@biojava.org [mailto:biojava-l-admin@biojava.org]On
> Behalf Of Susan Glass
>
> I have just downloaded the Biojava package and made a BLAST
> output file parser according to the tutorial on
> http://biojava.org/tutorials/blastlikeParsingCookBook/index.html 
>
> That means I have written my own XML content handler and put the
> parsing logic into startElement, using the
> BlastLikeDataSetCollection DTD file as a guide to the tag names.
>
> This works fine, but I believe I should be able to use objects in
> the org.biojava.bio.search package to avoid writing my own
> content handler.  Is this true? If so, could someone please point
> me to some example code (or other help) that starts with a BLAST
> output file, and ends with hit objects?  Unfortunately I'm having
> trouble choosing the correct classes to use with only the
> javadocs as a guide.  I found a mail list posting from August
> that deals with
> (
> http://biojava.org/pipermail/biojava-l/2001-August/001480.html ),