[BioPython] parsing error with GenBank.RecordParser
Peter
biopython at maubp.freeserve.co.uk
Fri Jan 6 17:44:34 EST 2006
Hans Meier wrote:
> Hi,
>
> parsing of NC_000913.gbk does not work.
>
> Greets, Harald
Sorry I didn't reply earlier, I was away for the New Year...
From the traceback you provided, I would guess that the old GenBank
parser (included with BioPython 1.41) didn't like the double quotes in
that note:
/note="2'-(5"-phosphoribosyl)-3'-dephospho-CoA...
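If you want to see whether a file has other lines like this before parsing it, here is a rough sketch in plain Python. It is not part of BioPython, and the function name find_stray_quotes is made up for illustration. It assumes the usual flat-file convention that a double quote inside a qualifier value should be written doubled (""), and it only looks at the first line of each qualifier, so continuation lines are not checked:

```python
def find_stray_quotes(lines):
    """Return (line_number, text) pairs for qualifier lines whose value
    contains a lone double quote (hypothetical helper, heuristic only)."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        # Only look at lines that open a quoted qualifier, e.g. /note="...
        if not text.startswith("/") or '="' not in text:
            continue
        value = text.split('="', 1)[1]
        # Drop the closing quote if the value ends on this line.
        if value.endswith('"'):
            value = value[:-1]
        # Properly escaped quotes come in pairs, so ignore those.
        value = value.replace('""', "")
        if '"' in value:
            hits.append((number, text))
    return hits


# Usage (file name taken from the thread):
# with open("NC_000913.gbk") as handle:
#     for number, text in find_stray_quotes(handle):
#         print(number, text)
```

Anything it reports is only a candidate for a closer look, not proof that the parser will fail there.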
Interestingly enough, in the most recent version of NC_000913.gbk dated
Dec 2005 (check the first line, starting LOCUS), the NCBI have replaced
the stray double quote with a single quote in the note (gene citX):
/note="2'-(5'-phosphoribosyl)-3'-dephospho-CoA...
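To check which revision of the file you have, the date is the last field on the LOCUS line. A minimal sketch, assuming the LOCUS line is the first line of the file and that fields are separated by whitespace (locus_date is a made-up name, not a BioPython function):

```python
def locus_date(first_line):
    """Return the last whitespace-separated field of a LOCUS line,
    which is where the GenBank flat-file format puts the date."""
    fields = first_line.split()
    if not fields or fields[0] != "LOCUS":
        raise ValueError("not a LOCUS line: %r" % first_line)
    return fields[-1]


# Usage (file name taken from the thread):
# with open("NC_000913.gbk") as handle:
#     print(locus_date(handle.readline()))
```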
If you download this revised NC_000913.gbk the problem should go away
(but note that, as the Escherichia coli GenBank file is 11 MB, you might
be better off updating the GenBank parser).
The new GenBank parser (available in CVS now) should cope with either
version of the file (and should use less memory, and be a lot faster too).
To try this, you just need to replace the file
/usr/lib/python2.3/site-packages/Bio/GenBank/__init__.py with the latest
version (but make a backup of the old one just in case).
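The backup-then-replace step could be scripted along these lines. This is only a sketch: backup_and_replace and the ".bak" suffix are my own choices, the location of the downloaded file is whatever you saved it as, and you will probably need root permissions to write under /usr/lib:

```python
import os
import shutil


def backup_and_replace(installed_path, new_path, suffix=".bak"):
    """Copy installed_path to installed_path + suffix, then overwrite
    installed_path with new_path. Returns the backup file name."""
    backup_path = installed_path + suffix
    # Refuse to clobber an earlier backup of the original file.
    if os.path.exists(backup_path):
        raise FileExistsError("backup already exists: " + backup_path)
    shutil.copy2(installed_path, backup_path)
    shutil.copy2(new_path, installed_path)
    return backup_path


# Usage (installed path from the message; the second argument is a
# hypothetical name for the file fetched from CVS):
# backup_and_replace(
#     "/usr/lib/python2.3/site-packages/Bio/GenBank/__init__.py",
#     "new_GenBank_init.py",
# )
```

To go back to the old parser, copy the .bak file over __init__.py again.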
Peter