[Biopython-dev] Parsing fuzzy features in GenBank annotation (attn. Brad)

Brad Chapman chapmanb at arches.uga.edu
Wed Sep 26 22:04:59 EDT 2001


Hi Peter;

> In GenBank records the following join format will pop up:
> 
>     "join(10000,10200..10450)"

Thanks for the heads up on this. I tried this location in Andrew's
parser and it seems to handle it just fine, so I'm pretty sure the
GenBank stuff should be able to handle it. If you run across this
case in a record and the parser fails or produces erroneous results,
send the accession number along and I can fix things.
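
For anyone who wants to see what that location should decompose into,
here is a minimal sketch in plain Python. It is not Andrew's parser or
the Biopython GenBank code; parse_simple_location is a hypothetical
helper that only understands "N", "N..M" and a join() of those, and it
assumes no fuzzy positions (no < or >) or other operators.

```python
import re

# Hypothetical helper, not part of Biopython: handles only plain
# "N", "N..M", and "join(...)" of those; no fuzzy (<, >) positions.
_SPAN = re.compile(r"^(\d+)(?:\.\.(\d+))?$")


def parse_simple_location(text):
    """Return a list of (start, end) pairs, 1-based and inclusive."""
    text = text.strip()
    if text.startswith("join(") and text.endswith(")"):
        # Strip the "join(" prefix and ")" suffix, then split the parts.
        parts = text[len("join("):-1].split(",")
    else:
        parts = [text]

    spans = []
    for part in parts:
        match = _SPAN.match(part.strip())
        if match is None:
            raise ValueError(f"unsupported location part: {part!r}")
        start = int(match.group(1))
        # A lone number is a single-base span, so end equals start.
        end = int(match.group(2)) if match.group(2) else start
        spans.append((start, end))
    return spans


print(parse_simple_location("join(10000,10200..10450)"))
# [(10000, 10000), (10200, 10450)]
```

The first pair is the single base at 10000 and the second is the range
10200..10450, which is the reading I'd expect a parser to give.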

> The numbers used here represent a one base join with a second exon. 
> Can this happen in biology, 

Hmm, I can't think of a biological case off the top of my head where
this makes good sense. It certainly doesn't make sense for an exon
(i.e. a 1 base pair exon), but it might make sense if the location
described something like a protein binding site or something
similar.

> P.S. pretty unbelievable, is it not?

:-). GenBank has lots of surprises.

BTW, since I haven't heard any negative comments about my proposed
SeqFeature/GenBank parser changes, I committed 'em to CVS. If anyone
gets problems on account of this, please let me know!

Brad
-- 
PGP public key available from http://pgp.mit.edu/


