[Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format

Thu Dec 5 17:12:04 UTC 2013

On Thu, Dec 5, 2013 at 4:46 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Not to worry - the site did respond when I retried a bit later, and
> I can reproduce the parser error:
>
> >>> from Bio import SeqIO
> >>> r = SeqIO.read("1MRR_A.gp", "genbank")
> BiopythonParserWarning: Couldn't parse feature location:
> 'join(bond(84),bond(115),bond(118),bond(238))'
> BiopythonParserWarning: Couldn't parse feature location:
> 'join(bond(115),bond(204),bond(238),bond(241))'
> BiopythonParserWarning: Couldn't parse feature location:
> 'join(bond(194),bond(272))'
> ...
> ValueError: CompoundLocation should have at least 2 parts

The problem is the bond locations. The parser gave up on the
multi-part ones with a warning, but it fell over on the
single bond entry, bond(196).

This is partly due to a change in the use of the bond term,
which used to be a compound entry like bond(194,272).
Also, the GenBank parser was and is primarily used on
nucleotide sequences rather than GenPept files, which are
occasionally stranger (like here!).

A short-term hack would be to strip out the bond term
(with a warning) and parse the remainder as a simple
join or single residue accordingly.
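
As a rough sketch of that idea (not the actual parser change), something
like the following could pre-process the location string before it goes
to the normal location parsing. The helper name strip_bond and the regular
expression are hypothetical and not part of Biopython; it assumes the
bond(...) wrappers are never nested:

```python
import re
import warnings

# Hypothetical helper, not part of Biopython.
# Matches bond(...) where the contents contain no nested parentheses.
_BOND_RE = re.compile(r"bond\(([^()]*)\)")


def strip_bond(location):
    """Return the location string with any bond(...) wrappers removed.

    Issues a warning whenever something was stripped, since the
    bond information is lost.
    """
    stripped = _BOND_RE.sub(r"\1", location)
    if stripped != location:
        warnings.warn("Dropping bond() from feature location: %r" % location)
    return stripped


print(strip_bond("join(bond(194),bond(272))"))
print(strip_bond("bond(196)"))
```

The first call should leave a plain join, join(194,272), and the second a
single residue, 196. An old-style entry like bond(194,272) would come out
as 194,272, so it would still need wrapping in join(...) before parsing.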

Would that work for you - do you need the bond bit?

Peter