[Biopython] Feeding XML stream from BLAST directly into SeqIO.parse()
Peter Cock
p.j.a.cock at googlemail.com
Tue Jan 5 12:27:56 UTC 2016
On Mon, Jan 4, 2016 at 4:51 AM, Martin Mokrejs
<mmokrejs at fold.natur.cuni.cz> wrote:
>
> Hi,
> I want to avoid creation of the huge XML files and feed BLAST
> results directly into SeqIO.
I guess you mean SearchIO rather than SeqIO?
> I think the following clearly sends all output into _stdout.
>
> _stdout, _stderr = subprocess.Popen(_cmdline, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()
>
Calling .communicate() waits for the child process to finish and then
returns ALL of the stdout and stderr as large strings in memory. If you
have large BLAST XML output, that would be bad.
> Can I do it directly inside my python? Just passing a handle/iterator
> to SeqIO.parse()?
Yes, see for example the "MUSCLE using stdin and stdout" example
in the AlignIO chapter of the Tutorial. You would pass the child process's
.stdout handle to the Biopython parse function in place of an open file.
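A minimal sketch of that idea, not tested against a real BLAST run. The
helper names stream_from_child and iter_hit_ids are made up for this
illustration, and a tiny Python child process stands in for BLAST so the
example runs without BLAST or Biopython installed. With Biopython, the
parse argument would be something like
lambda handle: SearchIO.parse(handle, "blast-xml"), assuming the parser
accepts the pipe handle as it would an open file.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET


def stream_from_child(cmdline, parse):
    """Start cmdline and lazily yield whatever parse(handle) yields.

    `parse` is any callable taking a file-like handle and returning an
    iterator; the handle it receives is the child's stdout pipe.
    (Hypothetical helper, not part of Biopython.)
    """
    # stderr is deliberately NOT piped here: an unread stderr pipe can fill
    # up and block the child while we are busy reading stdout.
    child = subprocess.Popen(cmdline, shell=False, stdout=subprocess.PIPE)
    try:
        for record in parse(child.stdout):
            yield record
    finally:
        child.stdout.close()
        returncode = child.wait()
    if returncode != 0:
        raise RuntimeError("child exited with status %d" % returncode)


def iter_hit_ids(handle):
    """Stand-in parser: yield the text of each <Hit_id> element as it arrives."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == "Hit_id":
            yield elem.text
        elem.clear()  # release memory held by elements already processed


# Stand-in for a BLAST command line: a child that prints a little XML.
fake_blast = [
    sys.executable,
    "-c",
    "print('<Hits><Hit_id>gi|1</Hit_id><Hit_id>gi|2</Hit_id></Hits>')",
]
hit_ids = list(stream_from_child(fake_blast, iter_hit_ids))
print(hit_ids)
```

The point of the generator is that nothing is held in memory beyond the
record currently being parsed, unlike with .communicate(). The exit status
is only known after the last record has been consumed, so a failed run may
surface late, possibly as an XML parsing error on truncated output.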
> In Biopython's TUTORIAL I only see in section 7.3 result_handle
> using an existing disk file. Chapter 8 dedicated to SearchIO does
> not go beyond file-based examples either. Is it so uncommon? ;-)
It is much harder to debug (especially if there are any errors), so
personally I tend to avoid parsing stdout from subprocess.
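If you do stream stdout, one way to keep the error messages around is to
send stderr to a temporary file and read it back only when the child
fails. This is just a sketch under that assumption; count_stdout_lines is
a made-up name, and a small Python child process stands in for a failing
BLAST run.

```python
import subprocess
import sys
import tempfile


def count_stdout_lines(cmdline):
    """Stream the child's stdout line by line; keep stderr in a temp file."""
    with tempfile.TemporaryFile() as err_file:
        # stderr goes to a real file, so it can never fill a pipe and
        # stall the child while we read stdout.
        child = subprocess.Popen(
            cmdline, shell=False, stdout=subprocess.PIPE, stderr=err_file
        )
        count = sum(1 for _line in child.stdout)
        child.stdout.close()
        returncode = child.wait()
        if returncode != 0:
            err_file.seek(0)
            message = err_file.read().decode(errors="replace").strip()
            raise RuntimeError("exit status %d: %s" % (returncode, message))
    return count


# Stand-in for a failing run: some output, an error message, non-zero exit.
failing = [
    sys.executable,
    "-c",
    "import sys; print('partial output'); "
    "sys.stderr.write('bad database'); sys.exit(2)",
]
try:
    count_stdout_lines(failing)
except RuntimeError as exc:
    failure_message = str(exc)
    print(failure_message)
```

Note that the partial output has already been consumed by the time the
failure is reported, which is part of why this is harder to debug than
writing the XML to disk first.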
Peter