[BioPython] blast parser
Brad Chapman
chapmanb@arches.uga.edu
Mon, 25 Nov 2002 12:26:03 -0500
--X1bOJ3K7DJ5YkBrT
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Hey Ken;
> I encountered an error during a Blast parse:
[...]
> SyntaxError: Expected blank line, but got:
> 1,455,628 sequences; 7,234,536,489 total letters
I actually got this error myself yesterday when I was playing around
with the examples and put a fix into CVS. See, I promised to only use
this time machine for good :-).
> The following change seems to fix this error.
>
> Bio.Blast.NCBIWWW.py line 174
> - read_and_call(uhandle, consumer.noevent, blank=1)
> - read_and_call(uhandle, consumer.noevent,
> - contains='problems or questions')
> + read_and_call_until(uhandle, consumer.noevent,
> + contains='problems or questions')
> + read_and_call(uhandle, consumer.noevent)
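The only problem with this is that it throws away part of the database
information, which we do store: the line with the sequence and letter
counts gets passed to consumer.noevent instead of consumer.database_info.
The fix I used, which is in CVS, is attached as a diff. It should also be
in the new release, due out real soon now.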
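
To see why the semicolon matters, here is a minimal Python sketch. It is not
the actual Bio.ParserSupport code: the Lines class, the simplified
read_and_call and read_and_call_until helpers, parse_database, and the first
two HEADER lines are stand-ins made up for illustration. Only the counts line
is taken from the error message quoted above. The sketch assumes that
read_and_call_until stops at, and pushes back, the first line containing the
marker.

```python
class Lines:
    """Minimal line source with one-line pushback (hypothetical stand-in)."""

    def __init__(self, lines):
        self._lines = list(lines)

    def readline(self):
        return self._lines.pop(0) if self._lines else ""

    def saveline(self, line):
        self._lines.insert(0, line)


def read_and_call(handle, method, contains=None, blank=False):
    """Read one line, check it, then hand it to method (simplified)."""
    line = handle.readline()
    if blank and line.strip():
        raise ValueError("Expected blank line, but got:\n%s" % line)
    if contains is not None and contains not in line:
        raise ValueError("Line does not contain %r:\n%s" % (contains, line))
    method(line)


def read_and_call_until(handle, method, contains):
    """Hand lines to method until one contains the marker (simplified)."""
    while True:
        line = handle.readline()
        if not line:
            break  # end of input
        if contains in line:
            handle.saveline(line)  # leave the matching line for the next read
            break
        method(line)


def parse_database(lines, marker):
    """Mimic the four reads around the database header; return stored lines."""
    handle = Lines(lines)
    stored = []
    read_and_call(handle, stored.append, contains="Database")
    read_and_call_until(handle, stored.append, contains=marker)
    read_and_call(handle, stored.append, contains=marker)
    read_and_call(handle, lambda line: None, blank=True)
    return stored


# Illustrative header: the second line also contains the word "sequences".
HEADER = [
    "<b>Database:</b> All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,\n",
    "GSS, or phase 0, 1 or 2 HTGS sequences)\n",
    "1,455,628 sequences; 7,234,536,489 total letters\n",
    "\n",
]

if __name__ == "__main__":
    try:
        parse_database(HEADER, "sequences")
    except ValueError as err:
        print("old marker fails:", err)
    print("new marker stores", len(parse_database(HEADER, "sequences;")), "lines")
```

With 'sequences' as the marker, the second database line matches too early, so
the counts line turns up where a blank line is expected. With 'sequences;' all
three database lines are stored before the blank line is read.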
Thanks for reporting this!
Brad
--X1bOJ3K7DJ5YkBrT
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="NCBIWWW.diff"
Index: NCBIWWW.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/Blast/NCBIWWW.py,v
retrieving revision 1.24
retrieving revision 1.25
diff -c -r1.24 -r1.25
*** NCBIWWW.py 2002/09/22 05:25:29 1.24
--- NCBIWWW.py 2002/11/24 18:52:11 1.25
***************
*** 168,176 ****
  attempt_read_and_call(uhandle, consumer.noevent, start='<p>')
  read_and_call(uhandle, consumer.database_info, contains='Database')
  # Sagar Damle reported that databases can consist of multiple lines.
  read_and_call_until(uhandle, consumer.database_info,
!                     contains='sequences')
! read_and_call(uhandle, consumer.database_info, contains='sequences')
  read_and_call(uhandle, consumer.noevent, blank=1)
  read_and_call(uhandle, consumer.noevent,
                contains='problems or questions')
--- 168,178 ----
  attempt_read_and_call(uhandle, consumer.noevent, start='<p>')
  read_and_call(uhandle, consumer.database_info, contains='Database')
  # Sagar Damle reported that databases can consist of multiple lines.
+ # But, trickily enough, sometimes the second line can also have the
+ # word sequences in it. Try to use 'sequences;' (with a semicolon)
  read_and_call_until(uhandle, consumer.database_info,
!                     contains='sequences;')
! read_and_call(uhandle, consumer.database_info, contains='sequences;')
  read_and_call(uhandle, consumer.noevent, blank=1)
  read_and_call(uhandle, consumer.noevent,
                contains='problems or questions')
--X1bOJ3K7DJ5YkBrT--