[BioPython] Re: Blast parsing
R.Austin
ryan.austin@utoronto.ca
26 Nov 2002 19:02:08 -0500
Hi,
I think I've found a bug in NCBIStandalone under Python 2.2.2.
I have some code that was written on my Mandrake box in Python 2.0 and
runs perfectly, but when I copy it to a RedHat 8 box running Python 2.2.2
and the same version of Biopython, I get an error.
The code is almost straight out of the Biopython tutorial and just grabs
the first E-value and FASTA tag for every BLAST output file in a
directory.
____________________________________________
from Bio.Blast import NCBIStandalone
import glob

blast_glob = '/home/user_name/blastout/*'
b_parser = NCBIStandalone.BlastParser()
for next_file in glob.glob(blast_glob):
    blast_file = open(next_file, 'r')
    b_iterator = NCBIStandalone.Iterator(blast_file, b_parser)
    # Only the first record in each file is used.
    b_record = b_iterator.next()
    # Query name, then the E-value of the first HSP of the first alignment.
    print b_record.query,
    print '\t E-value: ', b_record.alignments[0].hsps[0].expect
_________________________________________________________
And it gives the error:
Traceback (most recent call last):
  File "./bparse", line 25, in ?
    b_record = b_parser.parse(blast_file)
  File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 515, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 84, in feed
    self._scan_rounds(uhandle, consumer)
  File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 140, in _scan_rounds
    self._scan_alignments(uhandle, consumer)
  File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 261, in _scan_alignments
    self._scan_masterslave_alignment(uhandle, consumer)
  File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 364, in _scan_masterslave_alignment
    consumer.multalign(line)
  File "/usr/local/lib/python2.2/site-packages/Bio/Blast/NCBIStandalone.py", line 769, in multalign
    name = string.rstrip(line[:self._name_length])
TypeError: sequence index must be integer
_______________________________________________________
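Since the traceback above goes through _scan_masterslave_alignment, it may
be that only some of the output files reach this code path. One way to
narrow that down is to parse each file inside its own try/except and record
which ones fail. This is only a sketch in current Python (it uses `with`,
which Python 2.2 does not have); `collect_results` and the `parse_one`
callback are hypothetical names, with `parse_one` standing in for whatever
parsing call is made on each open file.

```python
import traceback


def collect_results(paths, parse_one):
    """Apply parse_one to every file and return (results, failures).

    parse_one is a hypothetical callable that takes an open file handle
    and returns whatever should be reported for that file.
    """
    results = {}
    failures = {}
    for path in paths:
        try:
            with open(path, "r") as handle:
                results[path] = parse_one(handle)
        except Exception:
            # Keep the full traceback text so each failing file can be
            # reported together with the error it produced.
            failures[path] = traceback.format_exc()
    return results, failures
```

Printing the keys of `failures` afterwards would show which files in the
directory trigger the TypeError and which parse cleanly.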
Any help would be appreciated, as I really need this to run on the
RedHat 8 box under Python 2.2.2.
Thanks in advance,
R.Austin
On Tue, 2002-11-26 at 12:00, biopython-request@biopython.org wrote:
> Today's Topics:
>
> 1. blast parser (Ken Sugino)
> 2. Re: blast parser (Brad Chapman)
>
> --__--__--
>
> Message: 1
> Date: Mon, 25 Nov 2002 12:12:31 -0500
> From: Ken Sugino <sugino@brandeis.edu>
> To: biopython@biopython.org
> Reply-To: sugino@brandeis.edu
> Subject: [BioPython] blast parser
>
> Hi all,
>
> I encountered an error during a Blast parse:
>
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> File "/home/sugino/soft.py.modules/biopython-cvs/biopython/Bio/Blast/NCBIWWW.py", line 47, in parse
> self._scanner.feed(handle, self._consumer)
> File "/home/sugino/soft.py.modules/biopython-cvs/biopython/Bio/Blast/NCBIWWW.py", line 98, in feed
> self._scan_header(uhandle, consumer)
> File "/home/sugino/soft.py.modules/biopython-cvs/biopython/Bio/Blast/NCBIWWW.py", line 161, in _scan_header
> self._scan_database_info(uhandle, consumer)
> File "/home/sugino/soft.py.modules/biopython-cvs/biopython/Bio/Blast/NCBIWWW.py", line 174, in _scan_database_info
> read_and_call(uhandle, consumer.noevent, blank=1)
> File "/home/sugino/soft.py.modules/biopython-cvs/biopython/Bio/ParserSupport.py", line 331, in read_and_call
> raise SyntaxError, errmsg
> SyntaxError: Expected blank line, but got:
> 1,455,628 sequences; 7,234,536,489 total letters
>
>
> The following change seems to fix this error.
>
> Bio.Blast.NCBIWWW.py line 174
> - read_and_call(uhandle, consumer.noevent, blank=1)
> - read_and_call(uhandle, consumer.noevent,
> - contains='problems or questions')
> + read_and_call_until(uhandle, consumer.noevent,
> + contains='problems or questions')
> + read_and_call(uhandle, consumer.noevent)
>
> --__--__--
>
> Message: 2
> Date: Mon, 25 Nov 2002 12:26:03 -0500
> From: Brad Chapman <chapmanb@arches.uga.edu>
> To: biopython@biopython.org
> Subject: Re: [BioPython] blast parser
>
>
> Hey Ken;
>
> > I encountered an error during a Blast parse:
> [...]
> > SyntaxError: Expected blank line, but got:
> > 1,455,628 sequences; 7,234,536,489 total letters
>
> I actually got this error myself yesterday when I was playing around
> with the examples and put a fix into CVS. See, I promised to only use
> this time machine for good :-).
>
> > The following change seems to fix this error.
> >
> > Bio.Blast.NCBIWWW.py line 174
> > - read_and_call(uhandle, consumer.noevent, blank=1)
> > - read_and_call(uhandle, consumer.noevent,
> > - contains='problems or questions')
> > + read_and_call_until(uhandle, consumer.noevent,
> > + contains='problems or questions')
> > + read_and_call(uhandle, consumer.noevent)
>
> The only problem with this is that it throws away the database
> information, which we do store. The fix I used, in CVS, is attached as a
> diff. This should also be in the new release, due out real soon now.
>
> Thanks for reporting this!
> Brad
>
> [Attachment: NCBIWWW.diff]
>
> Index: NCBIWWW.py
> ===================================================================
> RCS file: /home/repository/biopython/biopython/Bio/Blast/NCBIWWW.py,v
> retrieving revision 1.24
> retrieving revision 1.25
> diff -c -r1.24 -r1.25
> *** NCBIWWW.py 2002/09/22 05:25:29 1.24
> --- NCBIWWW.py 2002/11/24 18:52:11 1.25
> ***************
> *** 168,176 ****
> attempt_read_and_call(uhandle, consumer.noevent, start='<p>')
> read_and_call(uhandle, consumer.database_info, contains='Database')
> # Sagar Damle reported that databases can consist of multiple lines.
> read_and_call_until(uhandle, consumer.database_info,
> ! contains='sequences')
> ! read_and_call(uhandle, consumer.database_info, contains='sequences')
> read_and_call(uhandle, consumer.noevent, blank=1)
> read_and_call(uhandle, consumer.noevent,
> contains='problems or questions')
> --- 168,178 ----
> attempt_read_and_call(uhandle, consumer.noevent, start='<p>')
> read_and_call(uhandle, consumer.database_info, contains='Database')
> # Sagar Damle reported that databases can consist of multiple lines.
> + # But, trickily enough, sometimes the second line can also have the
> + # word sequences in it. Try to use 'sequences;' (with a semicolon)
> read_and_call_until(uhandle, consumer.database_info,
> ! contains='sequences;')
> ! read_and_call(uhandle, consumer.database_info, contains='sequences;')
> read_and_call(uhandle, consumer.noevent, blank=1)
> read_and_call(uhandle, consumer.noevent,
> contains='problems or questions')
>
>
>
> End of BioPython Digest
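
For reference, the reason the 'sequences;' marker in the quoted diff helps
can be sketched in plain Python. This does not use the real ParserSupport
functions; `index_of_first_match` is a hypothetical helper, and the first two
header lines are made up for illustration (only the count line is taken from
the error message quoted above).

```python
def index_of_first_match(lines, marker):
    """Return the index of the first line containing marker, or -1."""
    for i, line in enumerate(lines):
        if marker in line:
            return i
    return -1


# Hypothetical header: the database description spans two lines, and the
# second one happens to contain the word "sequences".
header = [
    "Database: All non-redundant GenBank CDS translations",
    "or other sequences from hypothetical sources",
    "1,455,628 sequences; 7,234,536,489 total letters",
]

# Matching on the bare word stops one line too early, on the description.
print(index_of_first_match(header, "sequences"))   # 1

# Matching on the word plus semicolon reaches the actual count line.
print(index_of_first_match(header, "sequences;"))  # 2
```

With the bare word, the count line is left unread, which would explain why
the parser then finds it where it expects a blank line.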