[Biopython-dev] [Bug 2860] Writing GenBank files should output features in position order

Fri Jun 19 13:11:17 UTC 2009

http://bugzilla.open-bio.org/show_bug.cgi?id=2860

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2009-06-19 09:11 EST -------
Hi Nick,

I understand your request, but I am not sure if it is a bug.

Do you know if the GenBank file format says anything about the order of the
features? And does this actually matter?

I know from experience that the NCBI GenBank files do seem to be sorted by
position (and then it seems to be gene before CDS, plus some other tie-break
rules which I have not explored).

Arguably this should be left to the user - a slightly different version of
your script could avoid the issue, something like this (untested):

import sys
from copy import copy

from Bio import SeqIO

for rec in SeqIO.parse(sys.stdin, "genbank"):
    new_features = []
    for feature in rec.features:
        if feature.type == "CDS":
            # Insert a matching gene feature just before each CDS
            gene_feature = copy(feature)
            gene_feature.type = "gene"
            new_features.append(gene_feature)
        new_features.append(feature)
    rec.features = new_features
    SeqIO.write([rec], sys.stdout, "genbank")

Peter

P.S. Your example may produce odd features, as gene features don't
normally include a protein id or a translation while a CDS feature may.
Again, Biopython doesn't currently try to limit this - or indeed limit
the feature types to a white list.
