[Biopython-dev] Proposed addition to Standalone BLAST

Brad Chapman chapmanb at arches.uga.edu
Sun Oct 29 11:09:28 EST 2000


[I was having problems with BLAST failing on really low-quality 
sequences from GenBank]

Jeff:
> I'm not sure what's going on, but it looks like BLAST may be masking out
> low-complexity regions and ending up with little or nothing to search
> with.  Unfortunately, there's nothing in the output that clearly tells us
> what's going on.  For example, it'd be nice if there were a message
> explaining why the parameters are missing.

Agreed. The BLAST report itself doesn't show any sign of a problem
(it just looks as if the search found no hits). Error messages do
appear in the xterm when you run BLAST from the command line, but they
aren't very helpful either, since they don't say which sequences are
failing.
 
> Although something's clearly wrong here, I'm hesitant to try and diagnose
> the error within the parser.  I don't know what's a real syntax error and
> what's a BLAST error.

This is a very good point. We don't want to clutter the parser with
code that tries to deal with BLAST errors.

> However, perhaps we can push the error detection higher up.  Possible 
> solutions might be:
> 1) develop a Parser that could catch a SyntaxError, do some diagnostics
> on the Record, and then raise a BlastError

I really like this option and think it is a good way to go. I have
been doing something somewhat similar to find the bad records in my
big BLAST files, which basically involves:

1. Using the iterator (without a parser) to grab records one at a time 
from the file. 

2. Copying the handle so we can parse it and have an extra copy to
work with later.

3. Parsing the record I got. If I get a SyntaxError, figuring out
what is wrong with the record (right now I've just been writing the
record out to a file).
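The three steps above can be sketched in modern Python roughly as follows. This is a minimal, self-contained illustration, not Biopython code: `iterate_records` and `parse_record` are hypothetical stand-ins for the BLAST iterator and parser, splitting on lines that start with "BLAST" is an assumed record boundary, and the "Lambda" check is an assumed marker of a complete report. Instead of copying a file handle, it keeps each record's raw text and wraps it in `io.StringIO` for the parser, so an untouched copy is still on hand if parsing fails.

```python
import io


def iterate_records(handle):
    """Hypothetical stand-in for the iterator used without a parser:
    yields the raw text of each report, assuming every report starts
    with a line beginning 'BLAST'."""
    chunk = []
    for line in handle:
        if line.startswith("BLAST") and chunk:
            yield "".join(chunk)
            chunk = []
        chunk.append(line)
    if chunk:
        yield "".join(chunk)


def parse_record(handle):
    """Hypothetical stand-in for the real parser: raises SyntaxError
    when the report lacks the 'Lambda' search-parameters block,
    otherwise returns the report's first line."""
    text = handle.read()
    if "Lambda" not in text:
        raise SyntaxError("missing search parameters")
    return text.splitlines()[0]


def find_bad_records(handle, bad_handle):
    """Parse each record; write any that fail to bad_handle."""
    good = []
    for raw in iterate_records(handle):
        # Step 2: the raw string is our "extra copy"; the parser only
        # ever consumes a fresh StringIO wrapped around it.
        try:
            good.append(parse_record(io.StringIO(raw)))
        except SyntaxError:
            # Step 3: for now, just save the failing record for inspection.
            bad_handle.write(raw)
    return good
```

In this sketch the cost of the "copy" is holding one record's text in memory while it is parsed, rather than duplicating a file handle.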

I actually wrote about this in the documentation (section 3.1.7) so
that should give you a better idea of what exactly I'm trying to do.

What do you think about generalizing this somehow to get the kind of
functionality you are talking about? I'm not sure whether there is a
better way to do it, and I don't know how much overhead copying the
handle introduces, so I'm very open to suggestions on this...
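One way the generalization of option 1 might look, sketched under the same assumptions: a wrapper that delegates to an existing parser, keeps the raw text, and on a SyntaxError runs some diagnostics before deciding whether to re-raise it or raise a BlastError instead. `BlastError`, `StrictParser`, and `DiagnosingParser` are hypothetical names, and treating "No hits found" in the report as a sign of low-complexity masking is only an assumed heuristic for the case Jeff describes.

```python
import io


class BlastError(Exception):
    """Hypothetical error for failures caused by BLAST itself,
    as opposed to malformed report syntax."""


class StrictParser:
    """Hypothetical stand-in for a BLAST parser: raises SyntaxError
    when a report lacks the 'Lambda' search-parameters block."""

    def parse(self, handle):
        text = handle.read()
        if "Lambda" not in text:
            raise SyntaxError("missing search parameters")
        return text


class DiagnosingParser:
    """Wraps any object with a parse(handle) method. On SyntaxError it
    inspects the raw record and raises BlastError if BLAST looks at fault."""

    def __init__(self, parser):
        self._parser = parser

    def parse(self, handle):
        raw = handle.read()  # keep an untouched copy for diagnostics
        try:
            return self._parser.parse(io.StringIO(raw))
        except SyntaxError as err:
            problem = self._diagnose(raw)
            if problem is None:
                raise  # looks like a genuine syntax error: re-raise it
            raise BlastError(problem) from err

    @staticmethod
    def _diagnose(raw):
        """Assumed heuristic: return a message naming the query if the
        report says no hits were found, otherwise None."""
        query = "<unknown>"
        for line in raw.splitlines():
            if line.startswith("Query="):
                query = line.partition("=")[2].strip()
                break
        if "No hits found" in raw:
            return ("%s: no hits found; the query may have been masked "
                    "as low-complexity" % query)
        return None
```

Keeping the diagnostics in the wrapper leaves the parser itself free of BLAST-error handling, and the error message names the failing query, which is the information the command-line messages lack.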

Thanks!

Brad





