[Biopython-dev] Output sequence files

Jeffrey Chang jchang at SMI.Stanford.EDU
Sat May 26 20:04:21 EDT 2001


> From: Iddo Friedberg <idoerg at cc.huji.ac.il>
> 
[...]

> The problem arises from annotation. Do you think it's feasible to write a
> good GenPept (that's the GenBank translation database) <--> SwissProt
> converter that will preserve everything?
> 
The gold standard for preserving information is being able to convert A to B
and back to A and have it come out exactly the same.  That'll probably be
possible for a lot of records, but many will not round-trip.  For example,
GenBank locations are much richer than SwissProt ones, so complex location
semantics that SwissProt doesn't handle will be lost.
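
To make the round-trip test concrete, here is a minimal sketch.  It does not
use real GenBank or SwissProt parsers: the two "formats" are toy stand-ins,
and rich_to_simple, simple_to_rich and round_trips are hypothetical names made
up for illustration.  The rich side allows a location made of several ranges;
the simple side can only hold one range, so a joined location cannot survive
A -> B -> A.

```python
# Toy stand-ins for a rich and a simple location model (assumptions, not
# real GenBank/SwissProt code). A rich location is a list of (start, end)
# ranges; a simple location is a single (start, end) range.

def rich_to_simple(rich_location):
    """Collapse a list of ranges into the one range that encloses them."""
    starts = [start for start, _ in rich_location]
    ends = [end for _, end in rich_location]
    return (min(starts), max(ends))


def simple_to_rich(simple_location):
    """Lift a single range back into the rich (list of ranges) form."""
    return [simple_location]


def round_trips(rich_location):
    """Return True if rich -> simple -> rich reproduces the input exactly."""
    return simple_to_rich(rich_to_simple(rich_location)) == rich_location


single = [(10, 50)]             # one plain range
joined = [(10, 50), (80, 120)]  # a join of two ranges

print(round_trips(single))  # True: nothing to lose
print(round_trips(joined))  # False: the gap between the ranges is lost
```

Running a check like this over a whole database would show which records
convert cleanly and which ones lose information.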

> I think that anyone seeking to preserve annotation, beyond the bare bones
> (organism, accession, maybe references, etc) would not want to use a converter
> anyhow.
> 
Yes, this is a reasonable assumption.


> So the problem is basically downsized to having a writer for each record
> type, plus one for SeqRecord, which will be a generic record but could only
> be written out in Fasta. This way we don't get caught up in trying to create a
> monster data type which integrates all the information which the various
> formats like to preserve. (And I haven't even mentioned PDB annotation yet!)

Yep, I agree.


> So maybe we just need a writer for each {database}.Record type, and a
> to_fasta converter and writer in Tools.

Isn't this what the SeqIO directory is for?  I had always hoped to get SeqIO
functionality similar to bioperl's.
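
As a rough sketch of the "one writer per format" idea discussed above: a
generic record that only knows its id, description and sequence, a Fasta
writer for it, and a small table that picks the writer by format name.  All
names here (SimpleSeqRecord, write_fasta, WRITERS, write) are hypothetical and
are not the actual SeqIO or SeqRecord interfaces; this only illustrates the
shape such a module could take.

```python
import io


class SimpleSeqRecord:
    """Hypothetical generic record: an id, a description and a sequence."""

    def __init__(self, id, description, seq):
        self.id = id
        self.description = description
        self.seq = seq


def write_fasta(record, handle, width=60):
    """Write one record in Fasta format, wrapping the sequence at `width`."""
    title = ("%s %s" % (record.id, record.description)).strip()
    handle.write(">%s\n" % title)
    for i in range(0, len(record.seq), width):
        handle.write(record.seq[i:i + width] + "\n")


# One writer per output format; richer formats would register their own.
WRITERS = {"fasta": write_fasta}


def write(record, handle, format):
    """Look up the writer for `format` and use it to write `record`."""
    try:
        writer = WRITERS[format]
    except KeyError:
        raise ValueError("no writer registered for format %r" % format)
    writer(record, handle)


out = io.StringIO()
record = SimpleSeqRecord("demo1", "made-up example", "MKT" * 30)
write(record, out, "fasta")
print(out.getvalue())
```

The point of the table is that format-specific record types can add their own
writers without the generic record having to carry every format's annotation.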


Jeff



