[Biopython] Error parsing EMBL file
Nick Semenkovich
semenko at alum.mit.edu
Mon Sep 17 17:01:00 UTC 2012
I'm trying to extract the peptide sequences from a large collection of
EMBL-formatted files (all phage & virus data from EBI).
EBI provides these as large, concatenated EMBL files, so I've been
using SeqIO.parse to read the records and then write out the
'translation' key from seq_feature.qualifiers.
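For reference, the approach above boils down to something like the following minimal sketch (not the gbk_to_faa.py from the traceback). It assumes Biopython's usual SeqRecord/SeqFeature layout, where each qualifier value is a list of strings. The function names `translations_to_fasta` and `write_faa` and the fallback header scheme are made up for illustration.

```python
import sys


def translations_to_fasta(records):
    """Yield one FASTA entry per CDS feature that carries a /translation.

    ``records`` is any iterable of SeqRecord-like objects (e.g. from
    Bio.SeqIO.parse): each has an ``id`` and a ``features`` list whose items
    have a ``type`` string and a ``qualifiers`` dict of lists of strings.
    """
    for record in records:
        for index, feature in enumerate(record.features):
            if feature.type != "CDS":
                continue
            translation = feature.qualifiers.get("translation")
            if not translation:
                continue  # e.g. pseudogenes usually have no /translation
            # Hypothetical header scheme: /protein_id if present, else
            # "<record id>_cds<feature index>".
            fallback = "%s_cds%d" % (record.id, index)
            header = feature.qualifiers.get("protein_id", [fallback])[0]
            yield ">%s\n%s\n" % (header, translation[0])


def write_faa(embl_path, out_handle=None):
    """Parse a (possibly concatenated) EMBL file and write proteins as FASTA."""
    # Imported only when write_faa is called, so the helper above
    # does not need Biopython installed.
    from Bio import SeqIO

    out_handle = out_handle or sys.stdout
    with open(embl_path) as handle:
        for entry in translations_to_fasta(SeqIO.parse(handle, "embl")):
            out_handle.write(entry)
```

Calling `write_faa("phage_virus.embl")` (a hypothetical file name) would stream every CDS translation to stdout. On the entry linked below it would still stop with the same ValueError, because the failure happens inside SeqIO.parse itself.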
Unfortunately, it looks like the parser dies on one input file:
http://www.ebi.ac.uk/ena/data/view/BK000583&display=txt&expanded=true
Traceback (most recent call last):
  File "gbk_to_faa.py", line 7, in <module>
    for seq_record in SeqIO.parse(input_handle, "embl") :
  File "/usr/lib/pymodules/python2.7/Bio/SeqIO/__init__.py", line 541, in parse
    for r in i:
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/Scanner.py", line 440, in parse_records
    record = self.parse(handle, do_features)
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/Scanner.py", line 423, in parse
    if self.feed(handle, consumer, do_features):
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/Scanner.py", line 391, in feed
    self._feed_header_lines(consumer, self.parse_header())
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/Scanner.py", line 692, in _feed_header_lines
    consumer.reference_bases("(bases %s)" % "; ".join(parts))
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/__init__.py", line 740, in reference_bases
    locations = self._split_reference_locations(ref_base_info)
  File "/usr/lib/pymodules/python2.7/Bio/GenBank/__init__.py", line 777, in _split_reference_locations
    start, end = base_info.split('to')
ValueError: need more than 1 value to unpack
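The last frame shows why this blows up: `base_info.split('to')` returns a one-element list whenever a piece of the reference range contains no "to", and unpacking that into `start, end` raises exactly this ValueError. Below is a simplified, hypothetical stand-in for that step (not Biopython's code; positions stay 1-based) that tolerates empty and single-position pieces. Whether the offending entry's reference line actually looks like that is an assumption.

```python
def split_reference_locations(location_string):
    """Split e.g. "1 to 3000; 4000 to 5000" into [(1, 3000), (4000, 5000)].

    Simplified, hypothetical stand-in for the step that fails in the
    traceback: the original unpacks ``start, end = base_info.split('to')``,
    which raises ValueError for any piece without exactly one "to".
    Positions are kept 1-based here for simplicity.
    """
    locations = []
    for base_info in location_string.split(";"):
        if not base_info.strip():
            continue  # assumption: skip empty pieces instead of failing
        parts = base_info.split("to")
        if len(parts) == 1:
            # Assumption: a bare position such as "42" means a 1-base range.
            start = end = parts[0]
        elif len(parts) == 2:
            start, end = parts
        else:
            raise ValueError("cannot parse reference range %r" % base_info)
        # int() ignores the surrounding whitespace left by split().
        locations.append((int(start), int(end)))
    return locations
```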
* I might dig into this a bit more to patch it, but does anyone more
familiar with EMBL files know what's going on?
* Also, is there a more straightforward (or even a non-Biopython) way
to go from EMBL to FAA?
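On the non-Biopython question: /translation values live entirely in the feature-table (FT) lines, so a rough line-based scanner can pull them out without parsing the header at all, and whatever trips up the header parsing in the traceback would not matter. The sketch below is hypothetical (the names, regexes and "unknown" fallback ID are illustrative). It assumes the standard EMBL layout, with feature keys starting in column 6 and qualifiers in column 22, and it ignores corner cases such as a wrapped qualifier value whose continuation line begins with "/".

```python
import re

# Rough line-based scanner for EMBL flat files (a sketch, not a full parser).
# Only feature-table ("FT") lines are inspected, so header lines such as RP
# are never interpreted.
KEY_LINE = re.compile(r"^FT   (\S+)")                 # "FT   CDS             1..300"
QUAL_LINE = re.compile(r"^FT {19}/(\w+)(?:=(.*))?$")  # qualifier starting in column 22
CONT_LINE = re.compile(r"^FT {19}([^/].*)$")          # continuation of a wrapped value


def _finish(feature):
    """Return (protein_id, sequence) for a completed CDS with /translation, else None."""
    if feature is None or feature["key"] != "CDS":
        return None
    chunks = feature["quals"].get("translation")
    if not chunks:
        return None
    sequence = "".join(chunks).replace('"', "").replace(" ", "")
    protein_id = "".join(feature["quals"].get("protein_id", ["unknown"])).replace('"', "")
    return protein_id, sequence


def embl_cds_translations(lines):
    """Yield (protein_id, translation) pairs from the lines of an EMBL file."""
    feature, last_qual = None, None
    for line in lines:
        line = line.rstrip("\r\n")
        key_match = KEY_LINE.match(line)
        if key_match or not line.startswith("FT"):
            # A new feature key, or any non-FT line (FH, SQ, "//", or the
            # next entry's ID line), closes the feature being collected.
            done = _finish(feature)
            if done:
                yield done
            feature, last_qual = None, None
            if key_match:
                feature = {"key": key_match.group(1), "quals": {}}
            continue
        if feature is None:
            continue
        qual_match = QUAL_LINE.match(line)
        cont_match = CONT_LINE.match(line)
        if qual_match:
            last_qual = qual_match.group(1)
            feature["quals"].setdefault(last_qual, []).append(qual_match.group(2) or "")
        elif cont_match and last_qual:
            feature["quals"][last_qual].append(cont_match.group(1))
    done = _finish(feature)
    if done:
        yield done
```

Because it accepts any iterable of lines, an open file handle can be passed straight in, and each pair printed as `">%s\n%s" % (protein_id, sequence)` gives FAA output.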
Best,
Nick
--
Nick Semenkovich
Laboratory of Dr. Jeffrey I. Gordon
Medical Scientist Training Program
School of Medicine
Washington University in St. Louis
314.362.3963 (Lab)
http://web.mit.edu/semenko/