[BioPython] GenBank parser
Leighton Pritchard
lpritc at scri.sari.ac.uk
Thu Apr 29 10:03:14 EDT 2004
Hi,
I've noticed an oddity in the GenBank FeatureParser (CVS installation
19/4). While parsing the Salmonella typhi file NC_003198.gbk, my way of
dealing with 'gene' tags fell over. This turned out to be because the
GenBank file contains entries with valueless tags such as /partial and
/pseudo. The current parser concatenates each of these tags with the tag
that follows it, e.g. for:
     CDS             1449249..1450391
                     /partial
                     /gene="fdnG"
                     /note="Similar to part of Escherichia coli formate
                     dehydrogenase, nitrate-inducible, major subunit fdnG
                     SW:FDNG_ECOLI (P24183; P78261) (1015 aa) fasta scores:
                     E(): 0, 94.4% id in 376 aa"
                     /pseudo
                     /codon_start=1
                     /transl_table=11
it returns a set of qualifiers which include the tags "partial gene" and
"pseudo codon_start". This probably isn't what was intended by the
authors ;)
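To make the expected behaviour concrete, here is a minimal sketch of how I'd expect valueless tags to come out. It is not the BioPython parser and not a patch for it: parse_qualifier_lines is a made-up helper that works on a plain list of qualifier lines, and storing an empty string for a valueless tag is my own assumption. It would also be fooled by a continuation line that happens to start with '/'.

```python
def parse_qualifier_lines(lines):
    """Toy qualifier parser (hypothetical helper, not BioPython code).

    Each line starting with '/' opens a new qualifier; any other line is
    treated as a continuation of the previous qualifier's value.
    """
    qualifiers = {}
    current = None
    for line in lines:
        text = line.strip()
        if text.startswith('/'):
            # partition() gives an empty value when there is no '=',
            # so /partial and /pseudo become keys of their own.
            key, _, value = text[1:].partition('=')
            current = key
            qualifiers[current] = value
        elif current is not None and text:
            qualifiers[current] += ' ' + text
    return qualifiers


example = [
    '/partial',
    '/gene="fdnG"',
    '/pseudo',
    '/codon_start=1',
    '/transl_table=11',
]
parsed = parse_qualifier_lines(example)
```

With the lines above, 'partial' and 'pseudo' end up as separate keys with empty values, and 'gene' and 'codon_start' keep their own names.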
I haven't got a fix for the parser, but my workaround in the code was:
##################
qualifiers = cds.qualifiers # Shorthand for qualifiers
# We need to account for use of qualifiers, e.g. in
# NC_003198.gbk, the /partial and /pseudo tags often have no
# associated value - the BioPython GenBank feature parser lumps the
# two together into a single tag, e.g. 'partial gene' and
# 'pseudo codon_start'. This buggers up our processing below,
# so the solution is to split tags by the ' ' space character,
# and add a qualifier comprising only the last item in the
# resulting list
for key in list(qualifiers.keys()):  # copy the keys: we add entries as we go
    if ' ' in key:
        qualifiers[key.split(' ')[-1]] = qualifiers[key]
###################
...I wasn't bothered about the partial or pseudo tags for my script
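If you do care about the valueless tags, a variant of the same workaround could keep them as well. This is only a sketch on a plain dict: split_merged_qualifiers is a hypothetical name, and recording each valueless tag with an empty-string value is an assumption on my part, not something the parser does.

```python
def split_merged_qualifiers(qualifiers):
    """Return a new dict with merged keys such as 'partial gene' split up.

    The last word of a merged key keeps the original value; every word
    before it is recorded as a valueless tag (empty string, by assumption).
    """
    fixed = {}
    for key, value in qualifiers.items():
        parts = key.split() or [key]
        for flag in parts[:-1]:
            # Do not clobber a real value if the same tag also occurs alone.
            fixed.setdefault(flag, '')
        fixed[parts[-1]] = value
    return fixed


merged = {
    'partial gene': '"fdnG"',
    'pseudo codon_start': '1',
    'transl_table': '11',
}
fixed = split_merged_qualifiers(merged)
```

Building a new dict rather than editing in place also drops the bogus 'partial gene' style keys, which the in-place version above leaves behind.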
--
Dr Leighton Pritchard AMRSC
D104, PPI, Scottish Crop Research Institute
Invergowrie, Dundee, DD2 5DA, Scotland, UK
E: lpritc at scri.sari.ac.uk W: http://bioinf.scri.sari.ac.uk/index.shtml
T: +44 (0)1382 568579 F: +44 (0)1382 568578
PGP key FEFC205C: GPG key E58BA41B: http://www.keyserver.net