[Biopython] multiprocessing problem with pysam
Brad Chapman
chapmanb at 50mail.com
Sun Apr 10 11:15:10 UTC 2011
Michal;
> I have tried to rewrite the following code from
> http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/api.html
[...]
> with the following multiprocessing code:
[...]
> pool = Pool()
>
> samfile = pysam.Samfile("ex1.bam", "rb")
> references = samfile.references
>
> for reference in samfile.references:
>     print ">", reference
>     pool.apply_async(calc_pileup, [samfile, reference, 100, 120])
[...]
> However, I got the following output:
[...]
> TypeError: _open() takes at least 1 positional argument (0 given)
You are passing the open file handle 'samfile' to your multiprocessing
function. Pool.apply_async pickles every argument so it can send it to a
worker process, so the arguments you pass through need to be picklable;
in practice, stick with basic data structures such as strings, numbers,
lists and dicts. An open Samfile is not one of those. Instead, I would
suggest passing in the filename and then opening the BAM file with pysam
inside the worker function.
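To see the underlying rule without pysam, here is a minimal sketch using only the standard library. `is_picklable` is a hypothetical helper that mimics what multiprocessing does to each argument, and an ordinary open file (a temporary file standing in for ex1.bam, which is an assumption) plays the role of the Samfile handle.

```python
import os
import pickle
import tempfile


def is_picklable(obj):
    """Hypothetical helper: True if obj survives pickle.dumps, which is
    what multiprocessing does to every argument it sends to a worker."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError):
        return False
    return True


# A temporary file stands in for ex1.bam (assumption: no pysam needed here).
fd, fname = tempfile.mkstemp(suffix=".bam")
os.close(fd)

with open(fname, "rb") as handle:
    # An open file handle, like an open Samfile, cannot be pickled...
    print(is_picklable([handle, "chr1", 100, 120]))  # False
# ...but the filename is a plain string and pickles fine.
print(is_picklable([fname, "chr1", 100, 120]))  # True

os.remove(fname)
```

The same check can be run on any argument list before handing it to apply_async, which narrows down which object is the unpicklable one.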
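import os
from multiprocessing import Pool

import pysam

def calc_pileup(fname, reference_name, start_pos, end_pos):
    # Open the BAM file here, inside the worker process, from its name.
    samfile = pysam.Samfile(fname, "rb")
    coverages = []
    print reference_name, os.getpid()
    # [...] pileup loop filling coverages, as in your original code
    samfile.close()
    return coverages

if __name__ == '__main__':
    pool = Pool()
    fname = "ex1.bam"
    # Read the reference names once in the parent, then close the file;
    # only the plain filename string is sent to the workers.
    samfile = pysam.Samfile(fname, "rb")
    references = samfile.references
    samfile.close()
    for reference in references:
        print ">", reference
        pool.apply_async(calc_pileup, [fname, reference, 100, 120])
    pool.close()  # no more jobs will be submitted
    pool.join()   # wait for the workers to finish before exiting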
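The same pattern can be tried end to end without a BAM file. In this sketch, a small tab-separated text file of reference/position pairs stands in for ex1.bam (an assumption), and `write_example`, `calc_counts` and `run_all` are hypothetical names. Each worker receives only the filename and opens its own handle; the parent keeps the AsyncResult objects returned by apply_async and collects them with get() after close() and join().

```python
import os
import tempfile
from multiprocessing import Pool


def write_example(fname):
    """Hypothetical stand-in for ex1.bam: one 'reference<TAB>position' per line."""
    with open(fname, "w") as out:
        out.write("chr1\t105\nchr1\t150\nchr2\t110\nchr2\t119\n")


def calc_counts(fname, reference_name, start_pos, end_pos):
    """Worker: gets only picklable arguments and opens the file itself.
    Counts positions on reference_name in [start_pos, end_pos)."""
    count = 0
    with open(fname) as handle:  # a fresh handle inside the worker process
        for line in handle:
            ref, pos = line.split("\t")
            if ref == reference_name and start_pos <= int(pos) < end_pos:
                count += 1
    return reference_name, count, os.getpid()


def run_all(fname, references, start_pos, end_pos):
    pool = Pool()
    # Keep the AsyncResult objects so the workers' return values are not lost.
    jobs = [pool.apply_async(calc_counts, [fname, ref, start_pos, end_pos])
            for ref in references]
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for every worker to finish
    return [job.get() for job in jobs]


if __name__ == "__main__":
    fd, fname = tempfile.mkstemp(suffix=".tsv")
    os.close(fd)
    write_example(fname)
    for ref, count, pid in run_all(fname, ["chr1", "chr2"], 100, 120):
        print(ref, count, pid)
    os.remove(fname)
```

Calling get() on each AsyncResult also re-raises any exception that happened in the worker; if the results of apply_async are ignored, such errors are silently dropped.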
My more general suggestion with multiprocessing is to start with a
simple workflow and expand it step by step. That way you can see where
your objects are too complex to pickle and need to be simplified.
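One way to follow this advice, sketched under the assumption that each job is a tuple of plain values: run the jobs serially with the built-in map first, then push the same jobs through a Pool. Pickling only happens in the second step, so any failure there points at the arguments rather than the logic. `region_length` and `run_jobs` are hypothetical names; `region_length` stands in for a real worker such as calc_pileup.

```python
from multiprocessing import Pool


def region_length(args):
    """Hypothetical worker standing in for calc_pileup: takes one tuple
    of plain, picklable values and returns a plain result."""
    reference, start_pos, end_pos = args
    return reference, end_pos - start_pos


def run_jobs(jobs, parallel=False):
    if not parallel:
        # Step 1: serial run, no pickling involved; debug the logic here.
        return list(map(region_length, jobs))
    # Step 2: same jobs through a Pool; every job tuple is pickled now.
    pool = Pool()
    try:
        return pool.map(region_length, jobs)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    jobs = [("chr1", 100, 120), ("chr2", 100, 120)]
    serial = run_jobs(jobs)
    assert run_jobs(jobs, parallel=True) == serial
    print(serial)  # [('chr1', 20), ('chr2', 20)]
```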
Hope this helps,
Brad