[BioPython] Genbank dictionary question
Brad Chapman
chapmanb at uga.edu
Sun May 11 16:34:44 EDT 2003
Hi Ashleigh;
> My main problem is that I'm unable to actually retrieve the Genbank
> record, instead I get <Bio.SeqRecord.SeqRecord instance at 0x83b01f4>
> (is that where those data are actually stored on the hard drive?)
This is a SeqRecord object with all of the information for the
GenBank file already parsed. The SeqRecord is a generic sort of
representation for a sequence with features. Section 3.7.1 of the
Tutorial describes what a SeqRecord is made up of.
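As a rough illustration of what that `<... instance at 0x83b01f4>` text means: it is just the default way Python displays an object that defines no custom representation, and in CPython the hex number is the object's address in memory, not a location on disk. The sketch below uses a hypothetical stand-in class, FakeRecord, rather than the real Bio.SeqRecord.SeqRecord, so that it runs without Biopython. The sequence is made up, and the attribute names id and seq are assumed to mirror those of a SeqRecord.

```python
class FakeRecord(object):
    """Hypothetical stand-in for a parsed record; not Biopython's SeqRecord."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


rec = FakeRecord("AJ299393", "ATGCATGC")  # made-up sequence for illustration

# With no __repr__ defined, Python falls back to a generic display that
# names the class and (in CPython) the object's memory address.
default_display = repr(rec)

# The parsed data is reached through attributes, not through the display.
summary = "%s %s" % (rec.id, rec.seq)

print(default_display)
print(summary)
```

The exact display text differs between Python versions and between runs, so it is not worth relying on. The attributes are the stable way in.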
> Why doesn't gb_dict['my_key'] give me the record corresponding to that
> accession number?
> >>> dict_file='genbank_file'
> >>> index_file='genbank_file.idx'
> >>> GenBank.index_file(dict_file, index_file)
> >>> gb_dict=GenBank.Dictionary(index_file, GenBank.FeatureParser())
[...]
> >>> gb_dict['AJ299393']
> <Bio.SeqRecord.SeqRecord instance at 0x83b01f4>
When you create the gb_dict, you do so with the FeatureParser(),
which is why you get SeqRecord objects. You have your choice of
what kind of info you get back:
gb_dict = GenBank.Dictionary(index_file)
will give you the raw unparsed text records, while
gb_dict = GenBank.Dictionary(index_file, GenBank.RecordParser())
gives you GenBank-specific Record objects. Section 3.4.2 of the
Tutorial describes a bit more about the different parsers and a
couple of the differences between RecordParser()s and
FeatureParser()s.
The choice of what kind of output you want to deal with is really up
to you.
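The choices above differ only in what happens to the raw record text when a key is looked up. The sketch below is a toy model of that idea and assumes nothing about how GenBank.Dictionary is actually implemented: ToyDictionary, ToyRecordParser and the two-line record format are all hypothetical stand-ins invented for illustration.

```python
class ToyRecordParser(object):
    """Hypothetical parser: turns 'ID <id>' / 'SEQ <seq>' lines into a dict."""

    def parse(self, text):
        fields = {}
        for line in text.splitlines():
            key, value = line.split(None, 1)
            fields[key.lower()] = value
        return fields


class ToyDictionary(object):
    """Hypothetical lookup table; not Biopython's GenBank.Dictionary."""

    def __init__(self, raw_records, parser=None):
        self._raw = raw_records
        self._parser = parser

    def __getitem__(self, key):
        text = self._raw[key]
        # No parser given: hand back the raw text unchanged.
        if self._parser is None:
            return text
        # Otherwise the parser decides what kind of object comes back.
        return self._parser.parse(text)


raw = {"AJ299393": "ID AJ299393\nSEQ ATGCATGC"}  # made-up record text

as_text = ToyDictionary(raw)["AJ299393"]
as_parsed = ToyDictionary(raw, ToyRecordParser())["AJ299393"]
```

The same key returns a string in the first case and a parsed object in the second, which is the same pattern as choosing no parser, a RecordParser() or a FeatureParser().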
> Also, I'm unclear on how to work with the SeqRecord objects in the
> context of my dictionary.
Whichever parser you choose, you work directly with the object the
dictionary hands back and pull out what you want from it. For instance,
with the FeatureParser() you are using now, if you wanted to store the
id and sequence of the SeqRecord object, you could do:
rec = gb_dict['AJ299393']
print rec.id, rec.seq
Hopefully this helps!
Brad