[Biopython] Lineage from GenBank files Question

Peter biopython at maubp.freeserve.co.uk
Wed Oct 27 08:56:59 UTC 2010


On Tue, Oct 26, 2010 at 9:37 PM, Ara Kooser <akooser at unm.edu> wrote:
> Peter,
>
>   Thank you for your reply. I was able to figure out the code with your
> help. I had another question since I've been looking through the
> documentation on the GenPept files. I want to get the accession number.
>
> ...
>
> I am guessing that this is also read into the records file. Is this a header
> so something like header.annotations?

Well... something like that, I suppose. Have you read the chapter in
the tutorial on the SeqRecord object?

Each sequence record in the GenBank file (i.e. everything from a LOCUS
line to the next // line) becomes a SeqRecord object. Most of the header
ends up in the SeqRecord's annotations dictionary, but a few special
fields are used for the SeqRecord's name, id, description and dbxrefs
(database cross-references) attributes. The feature table becomes a list
of SeqFeature objects (record.features).
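If you want to see that mapping for yourself without downloading anything, here is a rough sketch. It assumes Python 3 and a recent Biopython (recent releases refuse to write GenBank output unless the record has a "molecule_type" annotation). It builds a tiny made-up record, writes it to an in-memory handle as GenBank text, parses it back, and prints where each piece ended up. TEST0001 and the other values are hypothetical placeholders, not real database entries.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# A tiny, made-up record; every identifier here is hypothetical.
original = SeqRecord(
    Seq("ACGTACGTACGT"),
    id="TEST0001.1",
    name="TEST0001",
    description="Hypothetical example record",
    annotations={
        # Recent Biopython GenBank writers raise an error without this.
        "molecule_type": "DNA",
        "organism": "Example organism",
        "taxonomy": ["Eukaryota", "Viridiplantae"],
    },
)

# Write it out as GenBank text (LOCUS line through to the // line)...
handle = StringIO()
SeqIO.write([original], handle, "genbank")
handle.seek(0)

# ...then parse it back, just as you would a real file.
record = SeqIO.read(handle, "genbank")

print(record.name)  # taken from the LOCUS line
print(record.id)  # taken from the VERSION line (accession.version)
print(record.description)  # taken from the DEFINITION line
print(record.annotations["accessions"])  # taken from the ACCESSION line
print(record.annotations["taxonomy"])  # lineage from the ORGANISM block
print(len(record.features))  # feature table, a list of SeqFeature objects
```

With a real file you would skip the writing step and pass the filename straight to SeqIO.read (or SeqIO.parse for a file holding several records).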

Did you look at the annotations dictionary?

>>> from Bio import SeqIO
>>> record = SeqIO.read("NC_000932.gb", "genbank")
>>> print record.annotations.keys()
['comment', 'sequence_version', 'source', 'taxonomy', 'keywords',
'references', 'accessions', 'data_file_division', 'date', 'organism',
'gi']
>>> print record.annotations
{'comment': ..., 'gi': '7525012'}
>>> print record.annotations['gi']
7525012
>>> print record.annotations['accessions']
['NC_000932']

Also,

>>> record.name
'NC_000932'
>>> record.id
'NC_000932.1'
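Since your question was about getting the accession (and the thread started with the lineage), a minimal sketch of pulling both out of every record in a multi-record GenPept file might look like this. SeqIO uses the same "genbank" format name for GenPept files. It assumes Python 3 and a recent Biopython; the helper name accessions_and_lineage and the TESTP records are hypothetical, and the demo records are written to an in-memory handle only so the example runs on its own.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def accessions_and_lineage(handle):
    """Yield (id, accessions, taxonomy) for each record (hypothetical helper).

    "genbank" is the SeqIO format name for both GenBank and GenPept files.
    """
    for record in SeqIO.parse(handle, "genbank"):
        yield (
            record.id,
            record.annotations.get("accessions", []),
            record.annotations.get("taxonomy", []),
        )


# Self-contained demo: two made-up protein records (hypothetical identifiers)
# written as GenPept-style text to an in-memory handle.
demo = [
    SeqRecord(
        Seq("MKVL"),
        id=f"TESTP{n}.1",
        name=f"TESTP{n}",
        description="Hypothetical protein",
        annotations={
            "molecule_type": "protein",
            "organism": "Example organism",
            "taxonomy": ["Eukaryota", "Viridiplantae"],
        },
    )
    for n in (1, 2)
]
handle = StringIO()
SeqIO.write(demo, handle, "genbank")
handle.seek(0)

results = list(accessions_and_lineage(handle))
for rec_id, accessions, lineage in results:
    print(rec_id, accessions, "; ".join(lineage))
```

For a real file you can hand accessions_and_lineage an open file handle, or just the filename, since SeqIO.parse accepts either.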

Peter



