[BioPython] Parsing and Creating Dictionaries of GenBank files
Peter (BioPython)
biopython at maubp.freeserve.co.uk
Thu Apr 20 12:42:34 UTC 2006
Pepe Barbe wrote:
> Hello,
>
> Following the simple steps in the BioPython cookbook, I wanted to
> create a dictionary with the following GenBank file:
>
> ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K12/NC_000913.gbk
>
> Below you can find what I tried executing and the error I got. I would
> appreciate any insight into solving the error and correctly producing
> the dictionary.
The cookbook tutorial is a little misleading in that regard. Indexing a
GenBank file only makes sense for files that contain multiple GenBank
records (i.e. multiple LOCUS lines).
For example, you can get multi-record GenBank files with records for
different genes. These tend to be small records, and the Martel-based
indexing code copes fine with them. It doesn't cope very well with large
records like genomes.
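To illustrate what "indexing" means for a multi-record file, here is a
rough standard-library sketch. It is not the Martel-based Biopython code,
just an assumption-laden stand-in: the function name
index_genbank_records and the two-record sample are made up for this
example. It maps each LOCUS name to the offset where that record starts.

```python
import io


def index_genbank_records(handle):
    """Map each record's LOCUS name to the offset where the record starts.

    Hypothetical helper for illustration only; not part of Biopython.
    """
    offsets = {}
    while True:
        pos = handle.tell()        # position before reading the line
        line = handle.readline()   # readline() keeps tell() usable
        if not line:
            break
        if line.startswith("LOCUS"):
            fields = line.split()
            if len(fields) > 1:
                offsets[fields[1]] = pos
    return offsets


# Made-up, heavily abbreviated two-record sample.
sample = (
    "LOCUS       GENE_A   10 bp DNA\n"
    "ORIGIN\n"
    "        1 acgtacgtac\n"
    "//\n"
    "LOCUS       GENE_B   10 bp DNA\n"
    "ORIGIN\n"
    "        1 ttttaaaacc\n"
    "//\n"
)

handle = io.StringIO(sample)
index = index_genbank_records(handle)
print(len(index), "record(s):", sorted(index))

# Jump straight to one record using its stored offset.
handle.seek(index["GENE_B"])
print(handle.readline().rstrip())
```

If the index ends up with a single entry, as it would for a one-record
genome file, there is nothing to gain from looking records up by name.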
Your example (and in my experience every bacterial genome) has just a
single very large record (which will contain many features).
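To get a feel for how many features such a single record holds, a quick
standard-library sketch like the one below can tally the feature keys.
It assumes the usual GenBank flat-file layout (feature keys indented by
five spaces, qualifier lines indented further); count_feature_keys and
the sample text are invented for illustration, and a proper parser is
the better tool for real work.

```python
import io
from collections import Counter


def count_feature_keys(handle):
    """Tally feature keys (gene, CDS, ...) in a GenBank feature table.

    Hypothetical helper; assumes keys start in column 6 of the line.
    """
    counts = Counter()
    in_features = False
    for line in handle:
        if line.startswith("FEATURES"):
            in_features = True
        elif in_features and not line.startswith(" "):
            # An unindented line (e.g. ORIGIN) ends the feature table.
            in_features = False
        elif in_features and line[:5] == "     " and line[5:6].strip():
            # Five spaces then a non-space character: a feature key line.
            counts[line.split()[0]] += 1
    return counts


# Made-up, heavily abbreviated single-record sample.
sample = (
    "LOCUS       EXAMPLE   30 bp DNA\n"
    "FEATURES             Location/Qualifiers\n"
    "     source          1..30\n"
    "     gene            1..12\n"
    '                     /gene="a"\n'
    "     CDS             1..12\n"
    "     gene            15..30\n"
    "ORIGIN\n"
    "        1 acgtacgtac gtacgtacgt acgtacgtac\n"
    "//\n"
)

feature_counts = count_feature_keys(io.StringIO(sample))
for key, n in sorted(feature_counts.items()):
    print(key, n)
```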
Does this page help?
http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/genbank/
I did suggest a change to the documentation but it looks like no one has
made the change...
http://biopython.org/pipermail/biopython-dev/2005-November/002193.html
I had forgotten to chase this up.
Peter