[Biopython] Creating GenBank files
Peter Saffrey
pzs at dcs.gla.ac.uk
Wed Sep 16 16:14:20 UTC 2009
Peter wrote:
> Yes, you must create a SeqRecord object with suitable SeqFeature objects,
> and then write it out with SeqIO in GenBank format. If all your features have
> trivial locations, this is pretty easy.
>
Thanks for this. I've managed to get this to work, but encountered a few
minor issues.
I already have GenBank files created by CLC Genomics Workbench 3, but I
want to generate them from a script instead. The CLC-generated GenBank
files look like this:
    LOCUS       Setd2-tagged    11750 bp    DNA    linear    UNA
    FEATURES             Location/Qualifiers
         misc_feature    1..50
                         /label="Subcloning HA Upstream"
    ...(snip other features)
    ORIGIN
            1 TTGGTGTGAG CTCTTTGTGT CTTGCCTAAG TATGTGCATC TGTCTTGTCT
    ...(snip sequence)
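To do this in Biopython, I need to create my feature like this:
    from Bio import SeqFeature
    # FeatureLocation(0, 50) is zero-based and end-exclusive, so it is written out as 1..50
    sf = SeqFeature.SeqFeature(SeqFeature.FeatureLocation(0, 50),
        type="misc_feature", qualifiers={"label": ["Subcloning HA Upstream"]})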
The issues I had were:
- In the docstring for SeqFeature, it says the attribute is "qualifier"
but it should be "qualifiers".
- My first stab at the qualifiers argument was
      qualifiers = { "label" : "mylabel" }
  but with a bare string as the value, the writer iterates over "mylabel"
  and gives me one "label" qualifier for each character! Maybe the
  qualifier printer should check that it's being given a list and not a
  string?
- I'd like to remove some of the extraneous header lines from the
  generated GenBank file:
      DEFINITION  .
      ACCESSION   <unknown id>
      VERSION     <unknown id>
      KEYWORDS    .
      SOURCE      .
        ORGANISM  .
                  .
  Is this possible?
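
To illustrate the second issue (a string given where a list is expected), here is a minimal sketch in plain Python. The helper name normalize_qualifiers is hypothetical and is not part of Biopython; it only shows the kind of check I have in mind, assuming qualifier values should always be lists of strings.

```python
def normalize_qualifiers(qualifiers):
    """Return a copy in which every qualifier value is a list.

    Hypothetical helper, not a Biopython function.
    """
    normalized = {}
    for key, value in qualifiers.items():
        if isinstance(value, str):
            # A bare string would be iterated character by character,
            # so wrap it in a one-element list.
            normalized[key] = [value]
        else:
            normalized[key] = list(value)
    return normalized


# Iterating over a bare string yields one item per character,
# which is why "mylabel" turned into one qualifier per letter.
per_character = [ch for ch in "mylabel"]

fixed = normalize_qualifiers({"label": "mylabel"})
print(per_character)
print(fixed)
```

The dictionary returned here could then be passed as the qualifiers argument when building the feature, so that each value is written out as a single qualifier.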
Sorry for the long message,
Peter