[Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format

Peter Cock p.j.a.cock at googlemail.com
Thu Dec 5 16:46:46 UTC 2013


On Thu, Dec 5, 2013 at 4:43 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Thu, Dec 5, 2013 at 4:29 PM, Brynjar Smári Bjarnason
> <binni at binnisb.com> wrote:
>>
>> Hello.
>>
>> I see CompoundLocation is quite new. I am currently using anaconda
>> (Python 2.7.6 :: Anaconda 1.8.0 (64-bit)) and BioPython 1.62.
>>
>> I am fetching gi values and using SeqIO to parse them. So far most of
>> them work, but I found one that fails.
>>
>> Code:
>>
>> from Bio import Entrez, SeqIO
>> p = Entrez.efetch(db="protein", rettype="gp", retmode="text", id="494379")
>> seq = SeqIO.read(p, "gb")
>>
>> Gives error:
>> ValueError: CompoundLocation should have at least 2 parts
>>
>> With quite a long stack trace, the last frame being:
>>
>> /Bio/SeqFeature.pyc:
>>       996         if len(self.parts) < 2:
>> --> 997             raise ValueError("CompoundLocation should have at least 2 parts")
>>
>> Any suggestions on how to fix this, and maybe what is different with
>> this gi from the rest of them (one gi that works: 10342)?
>>
>> Brynjar
>
> Hi Brynjar,
>
> Hmm. Right now the website is very slow & won't load
> http://www.ncbi.nlm.nih.gov/protein/494379
> and via Entrez I am getting a network error:
> urllib2.HTTPError: HTTP Error 502: Bad Gateway
>
> Were you able to save the file, and could you post it online
> (e.g. at http://gist.github.com)?
>
> Regards,
>
> Peter
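
For anyone wanting to capture the raw record for a report like this, here is a
minimal sketch that writes the efetch output to disk before any parsing is
attempted. The helper names (save_raw_record, fetch_and_save) and the email
argument are illustrative assumptions, not part of Biopython; only
Entrez.efetch and Entrez.email are real Biopython names.

```python
import shutil


def save_raw_record(handle, path):
    """Copy everything from an open text handle to `path`, unparsed."""
    with open(path, "w") as out:
        shutil.copyfileobj(handle, out)
    return path


def fetch_and_save(gi, path, email):
    """Hypothetical wrapper: fetch one GenPept record and save it as-is."""
    # Imported lazily so save_raw_record can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(db="protein", rettype="gp", retmode="text", id=gi)
    try:
        return save_raw_record(handle, path)
    finally:
        handle.close()
```

Something like fetch_and_save("494379", "1MRR_A.gp", "you@example.org") would
then leave a file that can be posted and re-parsed offline with SeqIO.read.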

Not to worry - the site did respond when I retried a bit later, and
I can reproduce the parser error:

>>> from Bio import SeqIO
>>> r = SeqIO.read("1MRR_A.gp", "genbank")
/Library/Python/2.7/site-packages/Bio/GenBank/__init__.py:1096: BiopythonParserWarning: Couldn't parse feature location: 'join(bond(84),bond(115),bond(118),bond(238))'
  % (location_line)))
/Library/Python/2.7/site-packages/Bio/GenBank/__init__.py:1096: BiopythonParserWarning: Couldn't parse feature location: 'join(bond(115),bond(204),bond(238),bond(241))'
  % (location_line)))
/Library/Python/2.7/site-packages/Bio/GenBank/__init__.py:1096: BiopythonParserWarning: Couldn't parse feature location: 'join(bond(194),bond(272))'
  % (location_line)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/Bio/SeqIO/__init__.py", line 646, in read
    first = next(iterator)
  File "/Library/Python/2.7/site-packages/Bio/SeqIO/__init__.py", line 582, in parse
    for r in i:
  File "/Library/Python/2.7/site-packages/Bio/GenBank/Scanner.py", line 467, in parse_records
    record = self.parse(handle, do_features)
  File "/Library/Python/2.7/site-packages/Bio/GenBank/Scanner.py", line 451, in parse
    if self.feed(handle, consumer, do_features):
  File "/Library/Python/2.7/site-packages/Bio/GenBank/Scanner.py", line 423, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/Library/Python/2.7/site-packages/Bio/GenBank/Scanner.py", line 374, in _feed_feature_table
    consumer.location(location_string)
  File "/Library/Python/2.7/site-packages/Bio/GenBank/__init__.py", line 1083, in location
    operator=location_line[:i])
  File "/Library/Python/2.7/site-packages/Bio/SeqFeature.py", line 1003, in __init__
    raise ValueError("CompoundLocation should have at least 2 parts")
ValueError: CompoundLocation should have at least 2 parts
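
To make the location strings in the warnings above easier to read, here is a
rough stopgap that pulls the residue numbers out of each bond(...) part with a
regular expression. This is only a sketch and not how Biopython parses
locations; the function name bond_positions is made up for illustration, and
it assumes each bond(...) holds one or more comma-separated integers.

```python
import re

# Matches bond(84) or bond(56,92); assumes plain integers only.
_BOND = re.compile(r"bond\((\d+(?:,\d+)*)\)")


def bond_positions(location):
    """Return the residue numbers listed inside each bond(...) part."""
    return [
        [int(n) for n in group.split(",")]
        for group in _BOND.findall(location)
    ]


print(bond_positions("join(bond(84),bond(115),bond(118),bond(238))"))
# [[84], [115], [118], [238]]
```

The numbers are returned exactly as written in the record (GenBank-style
1-based positions), with no conversion to Python's 0-based indexing.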

Peter



