[Biopython-dev] Blast Parser error

Brad Chapman chapmanb at uga.edu
Mon Jan 5 17:34:01 EST 2004


Hello;
Sorry about the delay in responding -- I haven't been touching a
computer for the past couple of weeks; very good for my mental
health.

> I got a problem with the Blast parser
[...]
> I use a sliding window to extract different parts of the n sequences
> and (how stupid of me not to check, but I did not expect it) one sequence was much
> shorter, so I sent a zero-length sequence to BLAST :-(

I ran into a similar problem a while back in my own work, in which I
was BLASTing some bad quality sequences which were completely
screened out -- the results of doing this look very similar to what
you've included below.
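
Separately from the parser, it may be worth never sending an empty
window to BLAST in the first place. Here's a rough sketch of that
check; windows_for and its arguments are names I made up for
illustration (they are not part of Biopython), and I'm assuming your
sequences are plain strings keyed by name:

```python
def windows_for(sequences, start, size):
    """Yield (name, window) pairs, skipping sequences too short for the window.

    `sequences` is assumed to be a dict mapping a name to a plain string.
    """
    for name, seq in sequences.items():
        window = seq[start:start + size]
        if len(window) == 0:
            # This sequence ends before the window starts: nothing to BLAST.
            continue
        yield name, window


if __name__ == "__main__":
    seqs = {"long": "ACGTACGT", "short": "ACG"}
    for name, window in windows_for(seqs, 4, 4):
        print(name, window)
```

You may prefer to log the skipped names rather than silently drop
them, so you know which sequences fell short.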

The solution I devised at the time was the BlastErrorParser, which
you use exactly like the BlastParser.

So instead of:

b_parser = NCBIStandalone.BlastParser()   # create the parser

you would do:

b_parser = NCBIStandalone.BlastErrorParser()

The error parser tries to diagnose errors and, if it sees a
recognizable one, raises NCBIStandalone.LowQualityBlastError,
which at least lets you know that you've reached broken BLAST output
and can toss it away safely.

So your loop would look like:

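while 1:
    try:
        b_record = b_iter.next()
    except NCBIStandalone.LowQualityBlastError:
        # deal with getting this error however you like;
        # here we just skip the broken record and move on
        continue
    # next() returns None once the BLAST output is exhausted
    if b_record is None:
        break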

[...]
> blast output, section concerned
> BLASTN 2.2.2 [Dec-14-2001]
> 
> 
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs",  Nucleic Acids Res. 25:3389-3402.
> 
> Query= lcl|97633|sp=CYB 296
>          (0 letters)
> 
> Database: D:\Bases\Bac16S\BLAST\Chapon_4133-P
>            90 sequences; 14,728 total letters
> 
> 
> 
>  ***** No hits found ******
> 
>   Database: D:\Bases\Bac16S\BLAST\Chapon_4133-P
>     Posted date:  Dec 19, 2003  4:42 PM
>   Number of letters in database: 14,728
>   Number of sequences in database:  90

Having said all that, I'm not sure whether the error parser will pick
up your problem, and I wonder if there isn't a line missing between
the Database and 'No hits found' lines. Normally BLAST should have a
line starting with 'Searching' here -- in the case of my bad
sequences you'd see:

Searchingdone

instead of the normal:

Searching.............done

If this is not the case perhaps we can find some other way to
diagnose these files as having the error. If you'd attach the BLAST
output from just this sequence we could work on that.
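
In the meantime, here's a rough sketch of how such files might be
flagged by looking at the raw text before it goes to the parser. This
is not what BlastErrorParser does internally; diagnose_blast_report
and the labels it returns are made up for illustration, and the
patterns are based only on the output quoted in this thread:

```python
import re

# Hypothetical helper, not part of Biopython: the patterns below come only
# from the symptoms discussed in this thread.
ZERO_LENGTH = re.compile(r"^\s*\(0 letters\)\s*$", re.MULTILINE)
SEARCHING = re.compile(r"^Searching(\.*)done", re.MULTILINE)


def diagnose_blast_report(text):
    """Return a short label for a suspicious plain-text report, or None."""
    if ZERO_LENGTH.search(text):
        return "zero-length query"
    match = SEARCHING.search(text)
    if match is None:
        return "missing Searching line"
    if match.group(1) == "":
        # 'Searchingdone' with no progress dots in between
        return "Searchingdone"
    return None
```

You'd call this on the text of a single report and skip (or log) the
report whenever the result is not None.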

Hopefully this helps!
Brad