[BioPython] Re: Blast parsing

R.Austin ryan.austin@utoronto.ca
26 Nov 2002 19:02:08 -0500


Hi,
I think I've found a bug in NCBIStandalone under Python 2.2.2.

I have some code that was written on my Mandrake box under Python 2.0 and
runs perfectly, but when I copy it to a RedHat 8 box running Python 2.2.2
with the same version of Biopython, I get an error.

The code is (almost) straight out of the Biopython tutorial and just grabs
the first E-value and FASTA tag (the query name) for every BLAST output
file in a directory.

____________________________________________


from Bio.Blast import NCBIStandalone
import glob

blast_glob = '/home/user_name/blastout/*'
b_parser = NCBIStandalone.BlastParser()

for next_file in glob.glob(blast_glob):
	blast_file = open(next_file, 'r')
	b_iterator = NCBIStandalone.Iterator(blast_file, b_parser)
	b_record = b_iterator.next()
	print b_record.query,
	print '\t E-value: ', b_record.alignments[0].hsps[0].expect

_________________________________________________________

And it gives the error:

Traceback (most recent call last):
  File "./bparse", line 25, in ?
    b_record = b_parser.parse(blast_file)
  File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 515, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 84, in feed
    self._scan_rounds(uhandle, consumer)
  File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 140, in _scan_rounds
    self._scan_alignments(uhandle, consumer)
  File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 261, in _scan_alignments
    self._scan_masterslave_alignment(uhandle, consumer)
  File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 364, in _scan_masterslave_alignment
    consumer.multalign(line)
  File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 769, in multalign
    name = string.rstrip(line[:self._name_length])
TypeError: sequence index must be integer


_______________________________________________________

Any help would be appreciated, as I really need this to run on the
RedHat 8 box under Python 2.2.2.

Thanks in advance
R.Austin
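When a loop over many reports dies on the first bad one, it can help to parse each file separately and record which ones fail, so the file that triggers the `TypeError` can be identified. The sketch below is written for a modern Python 3, unlike the 2.2-era script above. It assumes the parser is passed in as a callable (the name `parse_first_record` is hypothetical) that takes an open handle and returns a record exposing `.query` and `.alignments[i].hsps[j].expect`, as the records in the script do; it also treats a report with no alignments as a hit with no E-value instead of an `IndexError`.

```python
def first_hits(paths, parse_first_record):
    """Return (hits, errors) for a list of BLAST report paths.

    hits   -- list of (query, expect) pairs; expect is None when the
              report has no alignments.
    errors -- list of (path, exception) pairs for files that failed.

    `parse_first_record` is a hypothetical callable: it receives an open
    file handle and returns the first record of the report.
    """
    hits, errors = [], []
    for path in paths:
        try:
            with open(path) as handle:
                record = parse_first_record(handle)
            if record.alignments:
                expect = record.alignments[0].hsps[0].expect
            else:
                expect = None  # no hits: there is no alignments[0] to index
            hits.append((record.query, expect))
        except Exception as exc:
            # Remember the failing file and keep going instead of aborting.
            errors.append((path, exc))
    return hits, errors
```

With the script above, `paths` would be `glob.glob(blast_glob)` and the callable something like `lambda h: NCBIStandalone.Iterator(h, b_parser).next()`; the `errors` list then names the report(s) that trigger the exception.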

On Tue, 2002-11-26 at 12:00, biopython-request@biopython.org wrote:
> 
> Today's Topics:
> 
>    1. blast parser (Ken Sugino)
>    2. Re: blast parser (Brad Chapman)
> 
> --__--__--
> 
> Message: 1
> Date: Mon, 25 Nov 2002 12:12:31 -0500
> From: Ken Sugino <sugino@brandeis.edu>
> To: biopython@biopython.org
> Reply-To: sugino@brandeis.edu
> Subject: [BioPython] blast parser
> 
> Hi all,
> 
> I encountered an error during a Blast parse:
> 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/home/sugino/soft.py.modules/biopython-cvs/biopython/Bio/Blast/NCBIWWW.py", line 47, in parse
>     self._scanner.feed(handle, self._consumer)
>   File "/home/sugino/soft.py.modules/biopython-cvs/biopython/Bio/Blast/NCBIWWW.py", line 98, in feed
>     self._scan_header(uhandle, consumer)
>   File "/home/sugino/soft.py.modules/biopython-cvs/biopython/Bio/Blast/NCBIWWW.py", line 161, in _scan_header
>     self._scan_database_info(uhandle, consumer)
>   File "/home/sugino/soft.py.modules/biopython-cvs/biopython/Bio/Blast/NCBIWWW.py", line 174, in _scan_database_info
>     read_and_call(uhandle, consumer.noevent, blank=1)
>   File "/home/sugino/soft.py.modules/biopython-cvs/biopython/Bio/ParserSupport.py", line 331, in read_and_call
>     raise SyntaxError, errmsg
> SyntaxError: Expected blank line, but got:
>            1,455,628 sequences; 7,234,536,489 total letters
> 
> 
> The following change seems to fix this error.
> 
> Bio.Blast.NCBIWWW.py line 174
> -        read_and_call(uhandle, consumer.noevent, blank=1)
> -        read_and_call(uhandle, consumer.noevent,
> -                      contains='problems or questions')
> +        read_and_call_until(uhandle, consumer.noevent,
> +                      contains='problems or questions')
> +        read_and_call(uhandle, consumer.noevent)
> 
> --__--__--
> 
> Message: 2
> Date: Mon, 25 Nov 2002 12:26:03 -0500
> From: Brad Chapman <chapmanb@arches.uga.edu>
> To: biopython@biopython.org
> Subject: Re: [BioPython] blast parser
> 
> Hey Ken;
> 
> > I encountered an error during a Blast parse:
> [...]
> > SyntaxError: Expected blank line, but got:
> >            1,455,628 sequences; 7,234,536,489 total letters
> 
> I actually got this error myself yesterday when I was playing around
> with the examples and put a fix into CVS. See, I promised to only use
> this time machine for good :-).
> 
> > The following change seems to fix this error.
> > 
> > Bio.Blast.NCBIWWW.py line 174
> > -        read_and_call(uhandle, consumer.noevent, blank=1)
> > -        read_and_call(uhandle, consumer.noevent,
> > -                      contains='problems or questions')
> > +        read_and_call_until(uhandle, consumer.noevent,
> > +                      contains='problems or questions')
> > +        read_and_call(uhandle, consumer.noevent)
> 
> The only problem with this is that it throws away the database
> information, which we do store. The fix I used, in CVS, is attached as a
> diff. This should also be in the new release, due out real soon now.
> 
> Thanks for reporting this!
> Brad
> 
> [Attachment: NCBIWWW.diff]
> 
> Index: NCBIWWW.py
> ===================================================================
> RCS file: /home/repository/biopython/biopython/Bio/Blast/NCBIWWW.py,v
> retrieving revision 1.24
> retrieving revision 1.25
> diff -c -r1.24 -r1.25
> *** NCBIWWW.py	2002/09/22 05:25:29	1.24
> --- NCBIWWW.py	2002/11/24 18:52:11	1.25
> ***************
> *** 168,176 ****
>           attempt_read_and_call(uhandle, consumer.noevent, start='<p>')
>           read_and_call(uhandle, consumer.database_info, contains='Database')
>           # Sagar Damle reported that databases can consist of multiple lines.
>           read_and_call_until(uhandle, consumer.database_info,
> !                             contains='sequences')
> !         read_and_call(uhandle, consumer.database_info, contains='sequences')
>           read_and_call(uhandle, consumer.noevent, blank=1)
>           read_and_call(uhandle, consumer.noevent,
>                         contains='problems or questions')
> --- 168,178 ----
>           attempt_read_and_call(uhandle, consumer.noevent, start='<p>')
>           read_and_call(uhandle, consumer.database_info, contains='Database')
>           # Sagar Damle reported that databases can consist of multiple lines.
> +         # But, trickily enough, sometimes the second line can also have the
> +         # word sequences in it. Try to use 'sequences;' (with a semicolon)
>           read_and_call_until(uhandle, consumer.database_info,
> !                             contains='sequences;')
> !         read_and_call(uhandle, consumer.database_info, contains='sequences;')
>           read_and_call(uhandle, consumer.noevent, blank=1)
>           read_and_call(uhandle, consumer.noevent,
>                         contains='problems or questions')
> 
> 
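Brad's point about the terminator string can be reproduced with a small, self-contained scanner. This is a hypothetical re-implementation of the idea, not Biopython's `read_and_call`/`read_and_call_until`: the function name `scan_database_info` and the sample header are made up for illustration, and the sample's second database-name line is invented to contain the bare word "sequences" (the statistics line is taken from Ken's report). Stopping at 'sequences' ends the block one line early, so the statistics line lands where a blank line is expected, which is the SyntaxError Ken saw; stopping at 'sequences;' does not.

```python
def scan_database_info(lines, terminator="sequences;"):
    """Collect the database description from a list of BLAST header lines.

    Hypothetical sketch of the scanning idea in the patch: lines[0] must
    mention 'Database'; each following line belongs to the description up
    to and including the first one containing `terminator`; the line after
    that must be blank.
    """
    if not lines or "Database" not in lines[0]:
        raise SyntaxError("Expected a 'Database' line")
    info = [lines[0]]
    for i in range(1, len(lines)):
        info.append(lines[i])
        if terminator in lines[i]:
            # The statistics line closes the block; a blank line must follow.
            following = lines[i + 1] if i + 1 < len(lines) else ""
            if following.strip():
                raise SyntaxError("Expected blank line, but got:\n" + following)
            return info
    raise SyntaxError("No line containing %r found" % terminator)


# Made-up header: the second name line contains the bare word "sequences".
HEADER = [
    "Database: All non-redundant GenBank CDS translations",
    "           +PDB+SwissProt+PIR+PRF protein sequences",
    "           1,455,628 sequences; 7,234,536,489 total letters",
    "",
    "If you have any problems or questions with the results of this search",
]
```

Calling `scan_database_info(HEADER, terminator="sequences")` raises "Expected blank line, but got: ... 1,455,628 sequences; ...", while the default terminator returns all three description lines, keeping the database information that Ken's skip-ahead change would discard. Raising `SyntaxError` only mirrors the quoted traceback; new code would more likely define its own exception class.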
> --__--__--
> 
> _______________________________________________
> BioPython mailing list  -  BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython
> 
> 
> End of BioPython Digest