[Biopython] [Samtools-help] Segmentation fault

Mic mictadlo at gmail.com
Tue Oct 18 10:26:06 UTC 2011


Hello,
Thank you for your email. I updated the code and find out that
    print reads['chr1']     #works fine
but
    print reads['chr1'][0]  #caused Segmentation fault

Please find below the updated code:

from Bio import SeqIO
import pysam
from optparse import OptionParser
import subprocess, os, sys
from multiprocessing import Pool
import functools


def GetReferenceInfo(referenceFastaPath):
  referencenames = []
  referencelengths = []
  referenceFastaFile = open(referenceFastaPath)
  for record in SeqIO.parse(referenceFastaFile, "fasta"):
    referencenames.append(record.name)
    referencelengths.append(len(record.seq))
  referenceFastaFile.close()
  return (referencenames, referencelengths)


def GenerateSubsetBAM(bam_filename, ref_name):
    reads = []
    bam_fh = pysam.Samfile(bam_filename, "rb")

    for read in bam_fh.fetch(ref_name):
        reads.append(read)

    print ref_name + ' Done ' + str(len(reads))
    return (ref_name, reads)


if __name__ == '__main__':
  parser = OptionParser("Usage: %prog -b BAMfile -f new_reference_fasta -o
outputBAM")
  parser.add_option("-b", "--BAM", type="string", dest="inputBAMFilepath",
help="Specify a BAM file")
  parser.add_option("-f", "--fasta", type="string", dest="fastaFilepath",
help="Specify a reference fasta file.")
  parser.add_option("-o", "--output", type="string",
dest="outputBAMFilepath", help="Specify an output BAM file.")

  (opts, args) = parser.parse_args()

  if (opts.inputBAMFilepath is None):
    print ("\nSpecify a BAM file. eg. -b large.bam\n")
    parser.print_help()
  elif not(os.path.exists(opts.inputBAMFilepath)):
    print ("\nReference BAM file does not exists: " + opts.inputBAMFilepath
+"\n")
  elif (opts.fastaFilepath is None):
    print ("\nSpecify a reference fasta file.  eg. -f Subset.fasta\n")
    parser.print_help()
  elif not(os.path.exists(opts.fastaFilepath)):
    print ("\nReference fasta file does not exists: " + opts.fastaFilepath
+"\n")
  elif os.path.exists(opts.outputBAMFilepath) and
not(raw_input(opts.outputBAMFilepath + " exists. Overwrite? (Y/n): ")=='Y'):
    print ("\nOutput BAM exists. Please specify alternative output file.
 eg. -o Subset.bam\n")
  else:
    print "Read fasta ..."
    (ref_names, ref_lengths) = GetReferenceInfo(opts.fastaFilepath)
    print 'Done!'

    print "creating subset...."
    pool = Pool()
    GenerateSubsetBAM_wrapper = functools.partial(GenerateSubsetBAM,
opts.inputBAMFilepath)
    reads = dict(pool.imap_unordered(GenerateSubsetBAM_wrapper, ref_names))
    pool.close()
    print "Done!"

    print reads['chr1']     #works fine
    print "xxxxx"

    print reads['chr1'][0]  #caused Segmentation fault

I run the code with the pysam-0.5 examples (pysam-0.5/tests) in the
following way:

python SubsetBAM-P.py --BAM ex1.bam -f ex1.fa -o new.bam

Read fasta ...
Done!
creating subset....
chr1 Done 1464
chr2 Done 1806
Done!
[<csamtools.AlignedRead object at 0x2b975635d168>, ...,
<csamtools.AlignedRead object at 0x2b35d89b6ca8>]
xxxxx
Segmentation fault

Thank you in advance.


On Tue, Oct 18, 2011 at 7:00 PM, Peter Cock <p.j.a.cock at googlemail.com>wrote:

> On Tue, Oct 18, 2011 at 4:44 AM, Mic <mictadlo at gmail.com> wrote:
> > Hello,
> > I have tried to generate a subset BAM, but I get a 'Segmentation fault'
> with
> > the following code:
> > from Bio import SeqIO
> > import pysam
> > from optparse import OptionParser
> > import subprocess, os, sys
> > from multiprocessing import Pool
> > import functools
> > ...
>
> I tried this and it seemed to get stuck much earlier. Could you
> cut down the example a bit by removing the multiprocessing?
>
> Peter
>
> P.S. Also you can remove the unused "import argparse" line.
>



More information about the Biopython mailing list