[Biopython] Feeding XML stream from BLAST directly into SeqIO.parse()
Wibowo Arindrarto
w.arindrarto at gmail.com
Tue Jan 5 12:23:51 UTC 2016
Hi Martin,
If you want to stay inside Python, you should be able to do something like this:
import subprocess
from Bio import SearchIO
# stdout=subprocess.PIPE must be among the arguments, otherwise .stdout is None
blast_process = subprocess.Popen(...)
for record in SearchIO.parse(blast_process.stdout, 'blast-xml'):
    # process each record
    pass
If you can afford to go outside Python, you can pipe the BLAST output
into your script from the shell and replace `blast_process.stdout`
above with `sys.stdin`. In general, the parse functions accept either
a file name given as a string or a file handle-like object.
(P.S. I'm using SearchIO as an example. Both SeqIO and SearchIO use
the same file-handling function, IIRC.)
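To make the point about handles concrete without needing BLAST or
Biopython installed, here is a minimal standard-library sketch. The
assumptions are mine, not part of your setup: a small child Python
process stands in for the BLAST command line, the made-up `<hit>`
elements stand in for BLAST XML, `xml.etree.ElementTree.iterparse`
stands in for `SearchIO.parse`, and `stream_hit_ids` is a hypothetical
helper name. The idea it tries to show is that the parser pulls from the
pipe as it goes, so the full output should never sit in memory the way
it does with communicate().

```python
import subprocess
import sys
import xml.etree.ElementTree as ET

# Stand-in for the BLAST command line: a child Python process that writes
# a small XML document to its stdout. (Hypothetical; not real BLAST output.)
CHILD_CODE = (
    "import sys\n"
    "sys.stdout.write('<results>')\n"
    "for i in range(3):\n"
    "    sys.stdout.write('<hit id=\"q%d\"/>' % i)\n"
    "sys.stdout.write('</results>')\n"
)


def stream_hit_ids(cmdline):
    """Yield hit ids one at a time, reading the child's stdout lazily."""
    proc = subprocess.Popen(cmdline, stdout=subprocess.PIPE)
    try:
        # iterparse reads the pipe in chunks instead of collecting the
        # whole output first, which is what communicate() would do.
        for _event, elem in ET.iterparse(proc.stdout, events=("end",)):
            if elem.tag == "hit":
                yield elem.get("id")
                elem.clear()  # drop the element's contents once processed
    finally:
        proc.stdout.close()
        proc.wait()


hit_ids = list(stream_hit_ids([sys.executable, "-c", CHILD_CODE]))
print(hit_ids)
```

With Biopython, the loop over `ET.iterparse(...)` would be the place
where `SearchIO.parse(proc.stdout, 'blast-xml')` goes, as in the snippet
above.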
Hope this helps,
Bow
On Sun, Jan 3, 2016 at 8:51 PM, Martin Mokrejs
<mmokrejs at fold.natur.cuni.cz> wrote:
> Hi,
> I want to avoid creation of the huge XML files and feed BLAST results
> directly into SeqIO. I think the following clearly sends all output into
> _stdout.
>
> _stdout, _stderr = subprocess.Popen(_cmdline, shell=False,
> stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()
>
>
> Wrapping _stdout with cStringIO.StringIO would give a file-like object to
> be fed into SeqIO.parse() but that does not help with the primary problem I
> have: how to delay BLASTN output or maybe just limit the buffer of Popen? so
> that it would not consume all of my computer memory.
>
> I know one can prepare the left- and right-side of a UNIX pipe separately
> but I am not running SeqIO.parse() on the right hand-side of a pipe as a
> separate process [an example could be:
> http://seqanswers.com/forums/showpost.php?s=9904bb037c254042dfe282d032f8c07d&p=140253&postcount=6 ].
> I use something like this elsewhere but I cannot think of a way to do it now
> instead of moving the SeqIO.parse() caller into a separate program and
> executing it as the consumer on the right side of a pipe. Sounds not very
> elegant.
> Can I do it directly inside my python? Just passing a handle/iterator to
> SeqIO.parse()?
>
> What do you think of this:
> http://eyalarubas.com/python-subproc-nonblock.html
>
>
> In Biopython's TUTORIAL I only see in section 7.3 result_handle using an
> existing disk file. Chapter 8 dedicated to SearchIO does not go beyond
> file-based examples either. Is it so uncommon? ;-)
>
> Thank you for clues,
> Martin
> _______________________________________________
> Biopython mailing list - Biopython at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython