[Biopython-dev] RE: blastpgp parsing buglet

Coleman, Michael MKC at Stowers-Institute.org
Fri Jun 6 16:53:46 EDT 2003


Hi,

It looks like a further change is required here.  When the blank lines following 'CONVERGED' (and perhaps in other cases) are not consumed, _scan_alignments sees them and its tests do not work properly.
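To illustrate the failure mode, here is a minimal, self-contained sketch.  It does not use Biopython itself: LineHandle, attempt_read, skip_blank_lines and scan_after_descriptions are hypothetical stand-ins written for this example, and the sample text is made up rather than real blastpgp output.  It only shows why a scanner that peeks at the next line sees a blank line instead of the '>' header unless the blanks after 'CONVERGED' are consumed first.

```python
from collections import deque


class LineHandle:
    """Hypothetical stand-in for a handle that supports peeking at the next line."""

    def __init__(self, text):
        self._lines = deque(text.splitlines(keepends=True))

    def peekline(self):
        # Return the next line without consuming it ("" at end of input).
        return self._lines[0] if self._lines else ""

    def readline(self):
        # Consume and return the next line ("" at end of input).
        return self._lines.popleft() if self._lines else ""


def attempt_read(handle, start):
    """Consume the next line only if it begins with `start`; report whether it did."""
    if handle.peekline().startswith(start):
        handle.readline()
        return True
    return False


def skip_blank_lines(handle):
    """Consume consecutive blank lines and return how many were skipped."""
    count = 0
    while handle.peekline() != "" and handle.peekline().strip() == "":
        handle.readline()
        count += 1
    return count


def scan_after_descriptions(text, consume_trailing_blanks):
    """Return (converged, next_line) as the alignment scanner would find them."""
    handle = LineHandle(text)
    skip_blank_lines(handle)                    # blanks after the descriptions
    converged = attempt_read(handle, "CONVERGED")
    if consume_trailing_blanks:                 # the extra step added by the patch
        skip_blank_lines(handle)
    return converged, handle.peekline()


# Made-up sample: a blank line, the convergence marker, a blank line, a hit header.
SAMPLE = "\nCONVERGED!\n\n>gi|23466947|gb|ZP_00122533.1| hypothetical protein\n"

print(scan_after_descriptions(SAMPLE, consume_trailing_blanks=False))
print(scan_after_descriptions(SAMPLE, consume_trailing_blanks=True))
```

Without the extra skip, the next line handed to the alignment step is the blank line; with it, the next line is the '>' header that the alignment tests expect.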

Mike

--- NCBIStandalone.py~  2003-05-08 13:36:06.000000000 -0500
+++ NCBIStandalone.py   2003-06-06 15:38:31.000000000 -0500
@@ -247,6 +247,7 @@
                 read_and_call_while(uhandle, consumer.noevent, blank=1)

         attempt_read_and_call(uhandle, consumer.converged, start='CONVERGED')
+        read_and_call_while(uhandle, consumer.noevent, blank=1)

         consumer.end_descriptions()

> -----Original Message-----
> From: Coleman, Michael 
> Sent: Thursday, May 08, 2003 1:45 PM
> To: biopython-dev at biopython.org
> Subject: blastpgp parsing buglet
> 
> 
> Parsing by NCBIStandalone.py fails for BLASTP 2.2.5 output.  
> This is the partial output that trips the problem:
> 
> gi|23099742|ref|NP_693208.1| ornithine aminotransferase [Oceanob...   430   e-119
> gi|16081241|ref|NP_393547.1| L-2, 4-diaminobutyrate:2-ketoglutar...   430   e-119
> 
> Sequences not found previously or not previously below threshold:
> 
> >gi|23466947|gb|ZP_00122533.1| hypothetical protein [Haemophilus somnus 129PT]
>           Length = 432
> 
>  Score =  591 bits (1524), Expect = e-167
>  Identities = 191/420 (45%), Positives = 291/420 (69%), Gaps = 7/420 (1%)
> 
> The code expects to see a 'CONVERGED' line, but none is given here.
> One possible fix would be to also look for a line beginning
> with '>', like so:
> 
>             # Read the descriptions and the following blank lines.
>             read_and_call_while(uhandle, consumer.noevent, blank=1)
>             l = safe_peekline(uhandle)
>             if l[:9] != 'CONVERGED' and l[:1] != '>':
>                 read_and_call_until(uhandle, consumer.description, blank=1)
>                 read_and_call_while(uhandle, consumer.noevent, blank=1)
> 
> Mike
> 
> Mike Coleman, Scientific Programmer, +1 816 926 4419
> Stowers Institute for Biomedical Research
> 1000 E. 50th St., Kansas City, MO  64110
> 



