[Biopython] Python getting stuck reading fastq file

Philipp Schiffer philipp.schiffer at gmail.com
Sat Dec 21 15:46:40 UTC 2013


Hi!

I am experiencing a problem when reading from a FASTQ file (qualities in Sanger scoring). Whatever I do, at some point partway through the file the reading (or the writing or comparing, which come later in my script and are excluded here) gets stuck. The following output is from an IPython session (Python 2.7.5).
Biopython was installed through pip on a Scientific Linux 6.2 system.
Is this an error with the SeqIO parser? Or am I doing something wrong?  

Any help with this would be highly appreciated.

Kind regards

Philipp

import string
from pprint import pprint
import os
from Bio import SeqIO
from subprocess import call
import sys
import re

fqoutfile = open('/data2/PS1159/reads_Feb_12/fqoutfile.fq', 'w')
my_abundantheads=set()
my_onetwo = re.compile(r'/[1-2]')

abundant = open('/data2/PS1159/reads_Feb_12/120126_0281_AD088PACXX_5_SA-PE-023_shuf.clean.fq.gz.keep.gz.abundfilt', 'rU')
for record in SeqIO.parse(abundant, "fastq"):
        # Split off the /1 or /2 mate suffix and keep the shared read name
        ids = my_onetwo.split(record.id)
        my_abundantheads.update([ids[0]])
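
To make clear what the loop is meant to collect, here is a minimal sketch of the ID handling on its own, without Biopython or the real file. The read IDs are made up for illustration, and the names mate_suffix, example_ids and heads are hypothetical, not taken from the script.

```python
import re

# Same idea as the pattern in the script: a slash followed by 1 or 2.
mate_suffix = re.compile(r'/[12]')

# Hypothetical read IDs, invented for this example only.
example_ids = [
    'HWI-EXAMPLE:5:1101:1234:5678/1',
    'HWI-EXAMPLE:5:1101:1234:5678/2',
    'HWI-EXAMPLE:5:1101:9999:0001/1',
]

heads = set()
for read_id in example_ids:
    # split() returns the text before and after the match;
    # element 0 is the read name without the mate suffix.
    heads.add(mate_suffix.split(read_id)[0])

# Both mates of a pair collapse to a single entry in the set.
print(sorted(heads))
```

Note that the pattern is not anchored to the end of the ID, so it would also split on a /1 or /2 appearing earlier in a read name.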



….

^C---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-14-d87560a5b18b> in <module>()
----> 1 for record in SeqIO.parse(abundant, "fastq"):
      2         ids = my_onetwo.split(record.id)
      3         my_abundantheads.update([ids[0]])
      4

/usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/__init__.pyc in parse(handle, format, alphabet)
    539             raise ValueError("Unknown format '%s'" % format)
    540         #This imposes some overhead... wait until we drop Python 2.4 to fix it
--> 541         for r in i:
    542             yield r
    543

/usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/QualityIO.pyc in FastqPhredIterator(handle, alphabet, title2ids)
   1034     for letter in range(0, 255):
   1035         q_mapping[chr(letter)] = letter - SANGER_SCORE_OFFSET
-> 1036     for title_line, seq_string, quality_string in FastqGeneralIterator(handle):
   1037         if title2ids:
   1038             id, name, descr = title2ids(title_line)

/usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/QualityIO.pyc in FastqGeneralIterator(handle)
    934         #There may now be more quality data, or another sequence, or EOF
    935         while True:
--> 936             line = handle_readline()
    937             if not line:
    938                 break  # end of file

KeyboardInterrupt:




--  
Philipp Schiffer




