[Biopython-dev] [Bug 2860] New: Writing GenBank files should output features in position order

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Fri Jun 19 12:48:39 UTC 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2860

           Summary: Writing GenBank files should output features in position
                    order
           Product: Biopython
           Version: 1.50b
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: n.j.loman at bham.ac.uk


Adding features to a SeqRecord object does not automatically sort them by
position. Therefore if you do something like this:

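import sys
from copy import copy

from Bio import SeqIO

for rec in SeqIO.parse(sys.stdin, "genbank"):
    new_features = []
    for feature in rec.features:
        if feature.type == 'CDS':
            # Duplicate each CDS as a gene feature at the same location
            gene_feature = copy(feature)
            gene_feature.type = 'gene'
            new_features.append(gene_feature)
    # extend() appends the gene features after all existing features
    rec.features.extend(new_features)
    SeqIO.write([rec], sys.stdout, "genbank")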

You will end up with an incorrectly sorted file with CDS features first, then
gene features.

You can sort rec.features in place to correct this (this needs
"from operator import attrgetter"):
        rec.features.sort(key=attrgetter('location'))
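
One caveat with the sort above: it relies on location objects being orderable,
and because Python's sort is stable, a position-only key would leave each
copied gene feature after the CDS it shares a location with. A minimal sketch
of an explicit key follows. It uses hypothetical stand-in classes (Loc, Feat)
rather than Biopython's own, and both the helper name feature_sort_key and the
rule "gene before CDS at the same position" are assumptions for illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins for illustration only; not Biopython classes.
Loc = namedtuple("Loc", ["start", "end"])
Feat = namedtuple("Feat", ["type", "location"])

# Assumed convention: at an identical location, 'gene' precedes 'CDS'.
TYPE_RANK = {"gene": 0, "CDS": 1}


def feature_sort_key(feature):
    """Order by start, then end, then an assumed type rank."""
    loc = feature.location
    return (int(loc.start), int(loc.end), TYPE_RANK.get(feature.type, 2))


features = [
    Feat("CDS", Loc(100, 400)),
    Feat("CDS", Loc(500, 900)),
    Feat("gene", Loc(100, 400)),  # appended copies, as in the loop above
    Feat("gene", Loc(500, 900)),
]
features.sort(key=feature_sort_key)
print([(f.type, f.location.start) for f in features])
```

With real SeqFeature objects the same key would need the location's start and
end to convert to integers, which is an assumption to check against the
Biopython version in use.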

I am not sure what the correct fix is for Biopython: whether it should change
the behaviour of SeqRecord.features, or the GenBank output code (which I am
aware is a work in progress).

I guess the question is whether Biopython should guarantee that
SeqRecord.features is sorted.
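
If the sorting were done in the output code instead, the writer could order a
copy of the feature list and leave rec.features as the caller built it. A rough
sketch of that option, again with hypothetical stand-ins (Feat, Record,
features_in_output_order) rather than any actual Biopython API:

```python
from collections import namedtuple

# Hypothetical stand-ins; the real SeqRecord and writer are not shown here.
Feat = namedtuple("Feat", ["type", "start", "end"])
Record = namedtuple("Record", ["id", "features"])


def features_in_output_order(record):
    """Return a new list sorted by position; record.features is not modified."""
    return sorted(record.features, key=lambda f: (f.start, f.end))


rec = Record("example", [Feat("CDS", 500, 900), Feat("CDS", 100, 400)])
ordered = features_in_output_order(rec)
print([f.start for f in ordered])       # positions in output order
print([f.start for f in rec.features])  # caller's order is preserved
```

The trade-off is that in-memory order and file order can then differ.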

