[Biopython] Is this a valid Genbank feature description or a Biopython bug?
Marc Saric
marc.saric at gmx.de
Wed Apr 18 20:58:18 UTC 2012
Hi all,
Sorry for cross-posting (this has also been posted on Stack Overflow
<http://stackoverflow.com/questions/10195198/is-this-a-valid-genbank-feature-description-or-a-biopython-bug>):
I stumbled upon a GenBank-formatted file (shown here as a minimal
dummy example) that contains a feature with a nested location like this:
FEATURES             Location/Qualifiers
     xxxx_domain     complement(complement(1..145))
Such a feature crashes the current Biopython GenBank parser (release
1.59), but it apparently did not in earlier releases (e.g. 1.55).
The crash seems to be present already in 1.57.
From the Biopython bug tracker, it seems that the old location parser
code was removed in 1.56.
From what I could deduce from the format descriptions at
ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt and
http://www.insdc.org/documents/feature_table.html#3.4.2, a nested
complement like this is most likely invalid.
Can someone comment on this? That is, is this a glitch in Biopython or
in the format of the GenBank file?
A full demo file:
LOCUS       XXXXXXXXXXXXXX           240 bp    DNA     circular      17-JAN-2012
DEFINITION  xxxxxx.
KEYWORDS    xx.
SOURCE
  ORGANISM
FEATURES             Location/Qualifiers
     xxxx_domain     complement(complement(1..145))
                     /vntifkey="1"
                     /label=A label
                     /note="A note"
BASE COUNT           75 a           57 c           42 g           66 t
ORIGIN
        1 tttacaaaac gcattttcaa accttgggta ctaccccctt ttaaatatcc gaatacacta
       61 ataaacgctc tttcctttta ggtaaacccg ccaatatata ctgatacaca ctgatagttt
      121 aaactagatg cagtggccga ccatcagatc tagtaggaaa cagctatgac catgattacg
      181 cattacttat ttaagatcaa ccgtaccagt ataccctgcc agcatgatgg aaacctccct
//
A minimal demo program to show the error (assumes Biopython 1.59 and
Python 2.7 are installed and the above-mentioned file is available as
"test.gb"):
#!/usr/bin/env python
from Bio import SeqIO
s = SeqIO.read(open("test.gb", "r"), "genbank")
This crashes with
raise LocationParserError(location_line)
Bio.GenBank.LocationParserError: complement(1..145)
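One possible workaround, as a minimal sketch only: clean up the file
text before handing it to the parser. This assumes that a doubly
complemented location is meant to be the plain forward-strand location,
which is my reading and is not confirmed by the format descriptions.
The helper name collapse_double_complement is made up for this example
and is not part of Biopython. It only handles inner locations without
further parentheses (so no join(...) inside), and it is written for
Python 3 using the standard library only.

```python
import re

# Matches a directly nested complement(complement(X)) where X itself
# contains no parentheses, e.g. complement(complement(1..145)).
_DOUBLE_COMPLEMENT = re.compile(
    r"complement\(\s*complement\(([^()]*)\)\s*\)"
)


def collapse_double_complement(text):
    """Replace complement(complement(X)) with X until nothing changes.

    Hypothetical helper: assumes a double complement is equivalent to
    the forward-strand location X.
    """
    previous = None
    while previous != text:
        previous = text
        text = _DOUBLE_COMPLEMENT.sub(r"\1", text)
    return text


# Example on the feature line from the demo file.
feature_line = "     xxxx_domain     complement(complement(1..145))"
print(collapse_double_complement(feature_line))
```

The cleaned text of the whole file could then be wrapped in an
io.StringIO object and passed to SeqIO.read(handle, "genbank") in place
of the open file handle.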
--
Bye,
Marc Saric http://www.marcsaric.de