[Biopython] Problems parsing with PSIBlastParser
Miguel Ortiz Lombardia
ibdeno at gmail.com
Tue Oct 13 13:57:13 UTC 2009
On 13 Oct 09, at 15:36, Peter wrote:
> On Tue, Oct 13, 2009 at 12:58 PM, Miguel Ortiz Lombardia
> <ibdeno at gmail.com> wrote:
>>>
>>> Hmm - the switch to using subprocess (on Python 2.4 or later) was made
>>> in October 2008, and would have first appeared in Biopython 1.49. Maybe
>>> you were using Biopython 1.48 before - or the issue is something else.
>>>
>>> Peter
>>
>>
>> It may well have been 1.48... Having a closer look at the files from my
>> last successful runs, I discover they actually come from November 2008...
>>
>> I'm now running the tests you suggested.
>
> Let me know what they show. How long do these BLAST runs take?
> Perhaps I was ambitious with the number of suggestions to try ;)
It took a long time, because I wanted to reproduce the same situation.
All three of the suggestions you made worked!
I have at least a work-around now.
>
> Assuming the problem is with how we are calling the BLAST tool via the
> subprocess module, I have two suggested fixes in mind. The first is a
> change to the _invoke_blast() function in Bio/Blast/NCBIStandalone.py,
> essentially replace these lines:
>
> blast_process.stdin.close()
> return blast_process.stdout, blast_process.stderr
>
> With this:
>
> stdout, stderr = blast_process.communicate()
> from StringIO import StringIO
> return StringIO(stdout), StringIO(stderr)
>
> We had to make a similar change to Bio.Clustalw for Bug 2804. This uses
> subprocess to buffer the data in order to avoid any deadlock reading from
> the handles. I hadn't made this change before as it imposes a memory
> overhead (and BLAST output is often *very* large, especially as XML),
> and until now there hadn't been any problems reported. It would be worth
> trying in your situation (even just to confirm the source of the error),
> but I don't think we should make this change for the official distribution.
>
You're right, probably not justified if I'm the only one with this
problem.
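
For reference, the buffering approach described above can be sketched outside Biopython. This is only an illustration, not the actual _invoke_blast() code: it assumes Python 3 (io.StringIO and text-mode pipes, where the quoted snippet uses the Python 2 StringIO module), run_buffered is a hypothetical helper name, and a small Python child process stands in for blastpgp. The child fills stderr before writing stdout, which is the kind of pattern that can stall a parent that reads only one pipe at a time.

```python
import subprocess
import sys
from io import StringIO


def run_buffered(cmd):
    """Run cmd, hold all of its output in memory, return (stdout, stderr) handles."""
    process = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text mode, so communicate() returns str
    )
    # communicate() closes stdin, drains both pipes together and waits for
    # the child to exit, so neither pipe can fill up and block the child.
    out_text, err_text = process.communicate()
    return StringIO(out_text), StringIO(err_text)


# Stand-in for the BLAST tool: about 1 MB on stderr first, then 1 MB on stdout.
child_code = (
    "import sys\n"
    "sys.stderr.write('e' * 1000000)\n"
    "sys.stdout.write('o' * 1000000)\n"
)
out_handle, err_handle = run_buffered([sys.executable, "-c", child_code])
out_len = len(out_handle.read())
err_len = len(err_handle.read())
print(out_len, err_len)
```

The cost is the one Peter mentions: the whole output sits in memory before any parsing starts.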
> The second option (which I mentioned before) is to tell blastpgp to write
> its output directly to a file, and then parse the file. This is how I
> normally run large BLAST jobs. This is possible but not elegant via the
> function Bio.Blast.NCBIStandalone.blastpgp (which always returns
> stdout/stderr handles). Bug 2654 has an example,
> http://bugzilla.open-bio.org/show_bug.cgi?id=2654
>
> However, what I want to recommend instead is to use the more flexible
> Bio.Blast.Applications objects (in this case, the class
> BlastpgpCommandline). I had planned to update the BLAST chapter
> of the Biopython Tutorial to cover this, but it didn't happen in time for
> the Biopython 1.52 release. However, the alignment chapter goes
> through several examples of this style of command line tool wrapper,
> and the BLAST wrappers work in exactly the same way.
>
> Using these "lower level" application wrappers, it is up to you to invoke
> subprocess (or another system call) as you see fit (e.g. with pipes).
> This is more flexible than the old Bio.Blast.NCBIStandalone.blastpgp
> function (and others like it) where the behaviour could not be set.
I will explore this possibility; it definitely seems more elegant than
the other one (as in Bug 2654).
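
A rough sketch of the "write to a file, then read the file" pattern, with subprocess invoked by hand, might look like the following. It does not use the Bio.Blast.Applications wrappers themselves; run_to_file is a hypothetical helper, and a small Python child stands in for blastpgp so the example runs without BLAST installed. The blastpgp arguments in the comment are an assumption based on the legacy NCBI command line (placeholder file and database names) and should be checked against the installed tool.

```python
import os
import subprocess
import sys
import tempfile


def run_to_file(cmd):
    """Run a tool that writes its own output file; return (returncode, stderr text)."""
    process = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # The bulk of the output goes to the file, so the pipes carry only
    # diagnostics; communicate() still drains them safely.
    _, err_text = process.communicate()
    return process.returncode, err_text


out_path = os.path.join(tempfile.mkdtemp(), "result.txt")

# With BLAST installed, cmd might be something like (assumed flags):
#   ["blastpgp", "-i", "query.fasta", "-d", "nr", "-j", "3", "-o", out_path]
# Here a stand-in child writes three lines to the path given as its argument.
child_code = (
    "import sys\n"
    "with open(sys.argv[1], 'w') as handle:\n"
    "    handle.write('hit\\n' * 3)\n"
)
cmd = [sys.executable, "-c", child_code, out_path]

returncode, err_text = run_to_file(cmd)
if returncode != 0:
    raise RuntimeError("tool failed: %s" % err_text)

# Only once the tool has finished is the file opened; a parser would take this handle.
with open(out_path) as handle:
    lines = handle.read().splitlines()
print(lines)
```

Because the results never travel through a pipe, the parent cannot deadlock on them, and the file stays on disk for re-parsing later.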
>
> Feel free to ask for clarification on this - questions now will help for
> rewriting the BLAST chapter later on ;)
I may come back with questions :-)
Thank you very much for your help!
Best,
-- Miguel