[Biopython-dev] GenBank bug, oriT feature missing

Brad Chapman chapmanb at uga.edu
Sun Feb 29 17:17:58 EST 2004


Hey guys;

[Mark reports yet another new feature tag added to GenBank files]
> Martel.Parser.ParserPositionException: error parsing at or beyond
> character 1981
> 
> After digging into the GenBank code (__init__.py) and then into Martel's
> code. I found I could turn on debugging:
> 
> GenBank.FeatureParser(debug_level=2)
> 
> I finally see where things die (and what character 1981 means).
> 
> for AE000070 there is a  feature tag "oriT", which seems to be missing
> from genbank_record.py and __init__.py

[And makes a useful suggestion that others second (and third...)]
> This really isn't a pretty way of dealing with unknown features. Is
> there a way to get this to just pass unknown features?

Yes, I completely agree that this is a pain. The problem is an
unfortunate design decision where the format used to parse the files
uses a hard-coded list of tags. This made sense when it was
originally designed since there are supposed to be a restricted set
of feature and qualifier key names that can be used. Unfortunately,
it's turned into a headache for everyone since NCBI keeps adding
tags.

I've decided to get rid of the hard-coded list and just checked in a
series of changes to CVS that update the GenBank format. The new
format matches tag names with a general regular expression (basically
\w, plus some additional characters that get used, like ' and -), so
new tags should no longer break the parser.
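
To make the difference concrete, here is a minimal sketch of the two
approaches. It uses Python's standard re module rather than Martel, and
the pattern, the KNOWN_KEYS set, and both function names are assumptions
made up for illustration; the actual expression in the format file may
well differ.

```python
import re

# Assumed pattern: word characters plus ' and -, as described above.
# This is an illustration, not the expression used in the format file.
FEATURE_KEY = re.compile(r"[\w'-]+")

# Hypothetical, deliberately short stand-in for a hard-coded tag list.
KNOWN_KEYS = {"source", "gene", "CDS"}


def accepts_hardcoded(key):
    """Old style: only tags in the fixed list are accepted."""
    return key in KNOWN_KEYS


def accepts_general(key):
    """New style: any tag built from the allowed characters is accepted."""
    return FEATURE_KEY.fullmatch(key) is not None


if __name__ == "__main__":
    for key in ("CDS", "oriT", "3'UTR", "-10_signal"):
        print(key, accepts_hardcoded(key), accepts_general(key))
```

With the fixed list, a tag like oriT is rejected until someone adds it
by hand; with the general pattern it passes without any code change.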

In the process of making these changes I've also done a general
cleanup of the format file and merged it with the older format in
Bio.expressions.genbank, which still had plenty of useful bits of
code. I've moved Bio/GenBank/genbank_format.py to
Bio/expressions/genbank.py, so those of you who look at it or
change it (thanks Peter!) now need to look there.

So, long story short -- I hope I fixed this problem for the future.
Please do give the new version in CVS a go and let me know if it has
any problems on your files. Sorry about the pain and thanks for 
the report!

Brad


