[BioPython] blast parser

Brad Chapman chapmanb@arches.uga.edu
Mon, 25 Nov 2002 12:26:03 -0500


--X1bOJ3K7DJ5YkBrT
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hey Ken;

> I encountered an error during a Blast parse:
[...]
> SyntaxError: Expected blank line, but got:
>            1,455,628 sequences; 7,234,536,489 total letters

I actually got this error myself yesterday when I was playing around
with the examples and put a fix into CVS. See, I promised to only use
this time machine for good :-).

> The following change seems to fix this error.
> 
> Bio.Blast.NCBIWWW.py line 174
> -        read_and_call(uhandle, consumer.noevent, blank=1)
> -        read_and_call(uhandle, consumer.noevent,
> -                      contains='problems or questions')
> +        read_and_call_until(uhandle, consumer.noevent,
> +                      contains='problems or questions')
> +        read_and_call(uhandle, consumer.noevent)

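The only problem with this is that it sends the database lines to
consumer.noevent, so it throws away the database information, which we do
store. The fix I used, in CVS, is attached as a diff: it matches on
'sequences;' (with the semicolon) instead of just 'sequences'. This should
also be in the new release, due out real soon now.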
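
To see why the semicolon matters, here is a rough standalone sketch. It is
not the actual Bio.Blast code: split_database_header is a made-up helper that
only imitates the read-until-marker logic, and the two title lines in HEADER
are invented for illustration. Only the counts line is taken from the error
report above.

```python
def split_database_header(lines, marker):
    """Hypothetical helper imitating the scanner logic.

    Collects the 'Database' line and every following line up to and
    including the first one that contains `marker`, then insists that
    the next line is blank.
    """
    it = iter(lines)
    first = next(it)
    if "Database" not in first:
        raise ValueError("Expected 'Database', but got: %r" % first)
    info = [first]
    for line in it:
        info.append(line)
        if marker in line:
            break
    else:
        raise ValueError("Never saw a line containing %r" % marker)
    following = next(it, None)
    if following is None or following.strip():
        raise ValueError("Expected blank line, but got: %r" % following)
    return info


# Invented title lines; the second one happens to contain 'sequences'.
HEADER = [
    "Database: example database title, first line",
    "           more title text mentioning sequences",
    "           1,455,628 sequences; 7,234,536,489 total letters",
    "",
    "If you have any problems or questions ...",
]

# Matching on 'sequences' stops one line too early, so the counts line
# shows up where a blank line is expected.
try:
    split_database_header(HEADER, "sequences")
except ValueError as err:
    print("old marker fails:", err)

# Matching on 'sequences;' skips the title line and stops on the counts line.
print(len(split_database_header(HEADER, "sequences;")), "database lines kept")
```

With the plain 'sequences' marker the sketch stops on the second title line
and then trips over the counts line, which is the same shape of failure as
the SyntaxError Ken reported.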

Thanks for reporting this!
Brad

--X1bOJ3K7DJ5YkBrT
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="NCBIWWW.diff"

Index: NCBIWWW.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Blast/NCBIWWW.py,v
retrieving revision 1.24
retrieving revision 1.25
diff -c -r1.24 -r1.25
*** NCBIWWW.py	2002/09/22 05:25:29	1.24
--- NCBIWWW.py	2002/11/24 18:52:11	1.25
***************
*** 168,176 ****
          attempt_read_and_call(uhandle, consumer.noevent, start='<p>')
          read_and_call(uhandle, consumer.database_info, contains='Database')
          # Sagar Damle reported that databases can consist of multiple lines.
          read_and_call_until(uhandle, consumer.database_info,
!                             contains='sequences')
!         read_and_call(uhandle, consumer.database_info, contains='sequences')
          read_and_call(uhandle, consumer.noevent, blank=1)
          read_and_call(uhandle, consumer.noevent,
                        contains='problems or questions')
--- 168,178 ----
          attempt_read_and_call(uhandle, consumer.noevent, start='<p>')
          read_and_call(uhandle, consumer.database_info, contains='Database')
          # Sagar Damle reported that databases can consist of multiple lines.
+         # But, trickily enough, sometimes the second line can also have the
+         # word sequences in it. Try to use 'sequences;' (with a semicolon)
          read_and_call_until(uhandle, consumer.database_info,
!                             contains='sequences;')
!         read_and_call(uhandle, consumer.database_info, contains='sequences;')
          read_and_call(uhandle, consumer.noevent, blank=1)
          read_and_call(uhandle, consumer.noevent,
                        contains='problems or questions')

--X1bOJ3K7DJ5YkBrT--