[Biopython-dev] About a new GenBankWriter class with SeqIO interface

Howard Salis salish at picasso.ucsf.edu
Wed May 16 02:34:08 UTC 2007


On 5/15/07, Peter <biopython-dev at maubp.freeserve.co.uk> wrote:

> Sounds nice - it's something I've been thinking about doing myself, but I
> wanted to do both GenBank and EMBL, sharing the feature table
> writing code.

Yep, since EMBL and GenBank share the same feature format, I've
separated the "foreword", feature table, and sequence write functions.
So if someone wants to write the EMBL writer, they just need to write
the appropriate foreword. I think the sequence data is written the same
way too; is that correct?
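
A rough sketch of how that split could look, in case it helps the
discussion. This is not the code from my patch: the class names, the
dict-based record, and the one-line forewords are all hypothetical and
heavily simplified. Only the feature table layout (a five-character
prefix, the key padded to 16 columns, then the location) is meant to be
realistic. The sequence block here is a bare placeholder, since real
files wrap and number the sequence.

```python
# Hypothetical sketch only: names and record layout are illustrative.

class FlatFileWriterSketch:
    """Shared driver; subclasses supply the format-specific foreword."""

    feature_prefix = "     "  # GenBank feature lines start with five spaces

    def foreword(self, record):
        raise NotImplementedError  # each format writes its own header

    def feature_table(self, record):
        # Shared by both formats: key padded to 16 columns, then location,
        # with qualifiers indented to the same column as the location.
        lines = []
        for key, location, qualifiers in record["features"]:
            lines.append("%s%-16s%s" % (self.feature_prefix, key, location))
            for name, value in qualifiers:
                lines.append('%s%s/%s="%s"'
                             % (self.feature_prefix, " " * 16, name, value))
        return lines

    def sequence(self, record):
        # Placeholder: raw sequence plus the "//" record terminator.
        return [record["seq"], "//"]

    def write(self, record):
        parts = (self.foreword(record)
                 + self.feature_table(record)
                 + self.sequence(record))
        return "\n".join(parts) + "\n"


class GenBankSketch(FlatFileWriterSketch):
    def foreword(self, record):
        return ["LOCUS       %s" % record["name"]]  # grossly simplified


class EMBLSketch(FlatFileWriterSketch):
    feature_prefix = "FT   "  # EMBL feature lines carry an FT line code

    def foreword(self, record):
        return ["ID   %s" % record["name"]]  # grossly simplified


record = {
    "name": "demo",
    "seq": "atgc",
    "features": [("gene", "1..4", [("gene", "abc")])],
}
genbank_text = GenBankSketch().write(record)
embl_text = EMBLSketch().write(record)
```

With this layout an EMBL writer only overrides the foreword and the
line prefix; the feature table code is untouched.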

> Something else to keep in mind is writing any SeqRecord to a GenBank (or
> EMBL) file, even if it did not get created from a GenBank or EMBL file
> and is therefore lacking lots of annotation.

Very true. The GenBankWriter.py will either leave these fields blank,
leave out their keywords entirely if they are optional, or add
something like <no_locus> or <unknown_description> when it's necessary
to have something there.
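
Roughly, the fallback rules look like the sketch below. The field
names and the grouping into "required", "may be blank", and "optional"
are hypothetical and only illustrate the three cases; they are not the
actual lists in GenBankWriter.py.

```python
# Hypothetical sketch of the fallback rules; field names are illustrative.

REQUIRED_PLACEHOLDERS = {
    "locus": "<no_locus>",
    "description": "<unknown_description>",
}
BLANK_ALLOWED = ("accession",)           # keyword is written, value may be empty
OPTIONAL_FIELDS = ("keywords", "source")  # keyword is dropped when missing


def header_fields(annotations):
    """Return the header values to write for a possibly sparse record."""
    fields = {}
    for name, placeholder in REQUIRED_PLACEHOLDERS.items():
        # Something must be there, so fall back to a visible placeholder.
        fields[name] = annotations.get(name) or placeholder
    for name in BLANK_ALLOWED:
        # The field is always present but may be left blank.
        fields[name] = annotations.get(name, "")
    for name in OPTIONAL_FIELDS:
        # Optional keywords are left out entirely when there is no value.
        if annotations.get(name):
            fields[name] = annotations[name]
    return fields


sparse = header_fields({})
partial = header_fields({"locus": "AB000001", "keywords": "demo"})
```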

> > I also add/change a couple
> > of lines in __init__.py to store whether a sequence was linear or
> > circular and to store the string that encodes its molecule type
> > (ss-RNA, etc).
>
> I thought we already stored this information - but I'm not sure offhand.

Well, there's the alphabet of the sequence (e.g. IUPACUnambiguousDNA())
that says whether it's DNA, RNA, peptide, etc., but even if I matched
these up with strings, the "ss-", "ds-", etc. part would be
missing. I just saved the exact wording of the sequence type (e.g.
"ds-DNA", "ss-RNA", etc.) to a dictionary key named
self.data.annotations["sequence_type"] in the _FeatureConsumer class
under GenBank. This is in addition to the alphabet of the sequence, so
it shouldn't conflict.
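
In outline, the idea is something like the following. The classes here
are stand-ins written for this email, not the real _FeatureConsumer,
and the method name, the helper function, and the "DNA" fallback are
assumptions made for illustration.

```python
# Hypothetical stand-ins; the real parser classes are not reproduced here.

class RecordStub:
    """Minimal record with an annotations dictionary."""

    def __init__(self):
        self.annotations = {}


class ConsumerSketch:
    """Keeps the exact molecule-type wording seen while parsing."""

    def __init__(self):
        self.data = RecordStub()

    def residue_type(self, text):
        # Store the wording verbatim (e.g. "ss-RNA") next to, not instead
        # of, whatever alphabet the parser assigns to the sequence.
        self.data.annotations["sequence_type"] = text.strip()


def molecule_type_for_output(annotations, default="DNA"):
    # Assumed fallback for records that never came from a GenBank file.
    return annotations.get("sequence_type", default)


consumer = ConsumerSketch()
consumer.residue_type(" ss-RNA ")
parsed_type = molecule_type_for_output(consumer.data.annotations)
missing_type = molecule_type_for_output({})
```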

> You could email it directly to me, but it would be better to create a
> bug (an "enhancement") and then attach the changes to the bug. Edited
> versions of files will do, but patch files are best.

Ok, done! It's at http://bugzilla.open-bio.org/show_bug.cgi?id=2294

> I look forward to seeing your code Howard.
>
> Peter

Thank you! And I hope to continue to contribute to Biopython.

-Howard


