[Biojava-l] getting % identity from blast search results
Fri, 07 Jun 2002 08:18:48 +0100
Susan Glass wrote:
> Hello there,
> Last year I wrote a class that parsed a BLAST search file and returned a List of SequenceDBSearchResults, using BlastLikeSAXParser and BlastLikeSearchBuilder. I modelled it on demo code that David Waring had written (thanks!). Once the results were returned, I picked the hits that were below a given evalue cutoff.
> My problem now is that the client has requested that instead of an evalue cutoff, the program should pick all hits that match, say, with 90% identity over 90% of the query sequence length. I'm not sure how to get the identity information, since SequenceDBSearchHit and SequenceDBSearchSubHit don't have fields for this info. In browsing through the demos it seems that blast2html has a handler that creates its own classes (HSP, HSPSummary) that access this info, which might be a good starting point for new code.
> I was wondering if anyone else has already encountered this problem, and could possibly point me in the right direction. Any help would be appreciated.
So, your changing needs are exactly the reason why it's a good idea to
use a SAX approach to parsing, i.e. one object model does not fit all needs.
Your kind of requirement is not unusual at all. So, I would say don't be
afraid to make your own objects, and populate them from SAX events
passed to your own ContentHandler. These objects don't have to be
reusable across loads of use cases, they just have to meet your needs.
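As a rough illustration of "your own objects, populated from SAX events", here is a minimal sketch using only the standard org.xml.sax API. The element name `HSPSummary` and the attribute names `numberOfIdentities` and `alignmentSize` are assumptions for illustration only; check them against the events your BlastLikeSAXParser actually emits before relying on them. `SimpleHsp` and `HspCollector` are hypothetical names, not BioJava classes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical minimal per-HSP record holding only what this use case needs. */
class SimpleHsp {
    final int identities;
    final int alignmentLength;

    SimpleHsp(int identities, int alignmentLength) {
        this.identities = identities;
        this.alignmentLength = alignmentLength;
    }

    /** Percent identity over the aligned region. */
    double percentIdentity() {
        return 100.0 * identities / alignmentLength;
    }
}

/** Hypothetical ContentHandler: builds a SimpleHsp for each HSP summary element. */
class HspCollector extends DefaultHandler {
    // Assumed element/attribute names; verify against your parser's output.
    static final String HSP_ELEMENT = "HSPSummary";

    final List<SimpleHsp> hsps = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Non-namespace-aware parsers may leave localName empty, so fall back to qName.
        String name = localName.isEmpty() ? qName : localName;
        if (HSP_ELEMENT.equals(name)) {
            int ids = Integer.parseInt(atts.getValue("numberOfIdentities"));
            int len = Integer.parseInt(atts.getValue("alignmentSize"));
            hsps.add(new SimpleHsp(ids, len));
        }
    }

    /** Convenience for testing: drive the handler from an XML string with the JDK parser. */
    static List<SimpleHsp> parse(String xml) throws Exception {
        HspCollector handler = new HspCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.hsps;
    }
}
```

With BioJava you would register a handler like this as the ContentHandler of your BlastLikeSAXParser rather than using the JDK parser; the `parse` helper is only there so the sketch can be exercised on its own.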
Certainly, the Blast2HTML stuff is a good place to start, because those
classes extract pretty much everything that the BlastLikeSAXParser throws
out. In fact, one of the reasons we put the HTML rendering stuff in was
to provide an example of how to write a "mother of all ContentHandlers"
for events produced by the BlastLikeSAXParser.
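Once the per-HSP numbers are in your own objects, the client's "90% identity over 90% of the query length" rule is a small calculation. The sketch below is one possible reading of that rule, assuming identity is measured over the alignment length (as in BLAST's "Identities = x/y" line) and coverage is the HSP's span on the query divided by the full query length. `IdentityCoverageFilter` is a hypothetical name, and the sketch judges a single HSP; a hit made of several HSPs would need their query ranges merged first.

```java
/** Hypothetical helper: does one HSP meet an identity-and-coverage cutoff? */
final class IdentityCoverageFilter {
    private IdentityCoverageFilter() {}

    /**
     * @param identities  identical positions in the HSP alignment
     * @param alignLength length of the HSP alignment (including gaps)
     * @param queryStart  1-based start of the HSP on the query
     * @param queryEnd    1-based end of the HSP on the query (inclusive)
     * @param queryLength full length of the query sequence
     * @param minIdentity required identity as a fraction, e.g. 0.9
     * @param minCoverage required query coverage as a fraction, e.g. 0.9
     */
    static boolean passes(int identities, int alignLength,
                          int queryStart, int queryEnd, int queryLength,
                          double minIdentity, double minCoverage) {
        double identity = (double) identities / alignLength;
        // Coordinates can run backwards (e.g. minus strand), so take the absolute span.
        int covered = Math.abs(queryEnd - queryStart) + 1;
        double coverage = (double) covered / queryLength;
        return identity >= minIdentity && coverage >= minCoverage;
    }
}
```

For example, an HSP with 90 identities over a 100-column alignment, covering query positions 1 to 95 of a 100-residue query, passes a 0.9/0.9 cutoff.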
Simon M. Brocklehurst, Ph.D.
Head of Bioinformatics & Advanced IS
Cambridge Antibody Technology
The Science Park, Melbourn, Cambridgeshire, UK