[Biopython-dev] Output sequence files

Iddo Friedberg idoerg at cc.huji.ac.il
Sat May 26 08:52:25 EDT 2001


Well, I'm happy that I hit on something other people find necessary.
Always good to know one is not alone :)

On Fri, 25 May 2001, Brad Chapman wrote:

: I think Sarah is right on this. Seq/MutableSeq classes do not store
: any useful annotations on the sequence (except the alphabet/type of
: the sequence). Things should focus on SeqRecord, which has all of the
: annotation stuff.

I concur. It's just that, as you said, SeqRecord should include a lot
more stuff for good GenBank/SwissProt records. As it is, it seems to be
good enough for FASTA format.

But the big formats (anything not FASTA) are not really interconvertible,
except maybe GenBank <--> EMBL. So maybe what we need is just:

1) {big formats} --> fasta converter
2) A writer for each of the formats ( e.g. SProt.Record.write(handle) )
3) EMBL <--> GenBank, but that's pretty superfluous

The problem arises from annotation. Do you think it's feasible to write
a good GenPept (that's the GenBank translation database) <--> SwissProt
converter that will preserve everything? Or a PIR <--> SwissProt
converter? I think that anyone seeking to preserve annotation, beyond the
bare bones (organism, accession, maybe references, etc) would not want to
use a converter anyhow.

So the problem is basically downsized to having a writer for each record
type, plus one for SeqRecord, which will be a generic record but could
only be written out in FASTA. This way we don't get caught up in trying to
create a monster data type which integrates all the information which the
various formats like to preserve. (And I haven't even mentioned PDB
annotation yet!)

So maybe we just need a writer for each {database}.Record type, and a
to_fasta converter and writer in Tools.
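
A rough sketch of that split, to make the idea concrete. Everything here
is an assumption for illustration: SProtLikeRecord, SimpleSeqRecord,
write_fasta and this to_fasta signature are hypothetical names, not
existing Biopython classes or functions, and the flat-file output only
borrows a few SwissProt line codes rather than following the real layout.

```python
import io
from dataclasses import dataclass, field


@dataclass
class SimpleSeqRecord:
    """Hypothetical generic record carrying only bare-bones annotation."""
    id: str
    seq: str
    description: str = ""

    def write_fasta(self, handle, width=60):
        # Header line, then the sequence wrapped at `width` characters.
        header = f">{self.id} {self.description}".rstrip()
        handle.write(header + "\n")
        for start in range(0, len(self.seq), width):
            handle.write(self.seq[start:start + width] + "\n")


@dataclass
class SProtLikeRecord:
    """Hypothetical stand-in for a database-specific record type."""
    entry_name: str
    accession: str
    organism: str
    sequence: str
    keywords: list = field(default_factory=list)

    def write(self, handle):
        # The record writes itself in its own format, keeping its own
        # annotation. Simplified lines, NOT the real SwissProt layout.
        handle.write(f"ID   {self.entry_name}\n")
        handle.write(f"AC   {self.accession};\n")
        handle.write(f"OS   {self.organism}.\n")
        if self.keywords:
            handle.write("KW   " + "; ".join(self.keywords) + ".\n")
        handle.write(f"SQ   SEQUENCE {len(self.sequence)} AA;\n")
        handle.write("     " + self.sequence + "\n//\n")


def to_fasta(record):
    """Lossy conversion: keep accession, organism and sequence only."""
    return SimpleSeqRecord(
        id=record.accession,
        seq=record.sequence,
        description=record.organism,
    )


if __name__ == "__main__":
    rec = SProtLikeRecord("TEST_HUMAN", "P00000", "Homo sapiens",
                          "MKT" * 30, ["Kinase"])
    native = io.StringIO()
    rec.write(native)                   # full record, own format
    fasta = io.StringIO()
    to_fasta(rec).write_fasta(fasta)    # bare bones, FASTA only
    print(native.getvalue())
    print(fasta.getvalue())
```

The keywords survive in the record's own writer but are dropped by
to_fasta, which is the trade-off described above: rich annotation stays
with the format that defines it, and only the bare bones cross over.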

Of course, we can beef up SeqRecord to have a bit more than bare-bones
annotation capability, for functional reasons, not only for flat-file
writing capabilities, but that's a different topic.



Iddo Friedberg                                  | Tel: +972-2-6758647
Dept. of Molecular Genetics and Biotechnology   | Fax: +972-2-6757308
The Hebrew University - Hadassah Medical School | email: idoerg at cc.huji.ac.il
POB 12272, Jerusalem 91120                      |
Israel                                          |
