[Biopython-dev] [Bug 2860] Writing GenBank files should output features in position order

bugzilla-daemon at portal.open-bio.org
Fri Jun 19 13:31:10 UTC 2009


http://bugzilla.open-bio.org/show_bug.cgi?id=2860





------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-06-19 09:31 EST -------
(In reply to comment #2)
> Well yes, I realise that this is only a standard by convention. My assumption
> when dealing with such matters is that if NCBI/GenBank does it, it is probably
> right.

On the other hand, the NCBI are generally very good at defining their file
formats, so *if* they don't specify an order, presumably anything is OK?

> My impression is that "source" is always the first qualifier, then it is
> sorted by location with "gene" features followed by "CDS" features by
> convention.

Since the "source" feature starts from the first base, it will always be
one of the first by location.

> I guess it is acceptable for the user to deal with the order in the absence
> of a published standard for GenBank files. But I think it would be equally
> acceptable to code the GenBank outputter to enforce those rules.

We would need to know the rules though. For example, which of these locations
is first: "10..20" or "<10..20" or ">10..20" or "one-of(10,12)..20" or are
they all tied? We would also need to know the tie break rules for the feature
type, not just "source" before "gene" before "CDS". What about "tRNA" etc.?
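
To make the ambiguity concrete, here is a rough sketch of one possible
best-guess rule. Everything in it is an assumption for illustration, not
anything the NCBI has published or Biopython implements: the helper names
guessed_start and sort_key, the choice to ignore "<" and ">", the choice to
take the smallest candidate of a "one-of", and the TYPE_RANK table are all
hypothetical.

```python
import re

# Assumed ranking taken from the convention discussed above; any feature
# type not listed here (e.g. "tRNA") is simply placed after the known ones.
TYPE_RANK = {"source": 0, "gene": 1, "CDS": 2}


def guessed_start(location):
    """Best-guess start: ignore fuzzy markers, take the smallest candidate."""
    start_part = location.split("..")[0]
    numbers = [int(n) for n in re.findall(r"\d+", start_part)]
    return min(numbers)


def sort_key(feature):
    """Sort by guessed start, then break ties using the assumed type rank."""
    feature_type, location = feature
    return (guessed_start(location), TYPE_RANK.get(feature_type, len(TYPE_RANK)))


locations = ["10..20", "<10..20", ">10..20", "one-of(10,12)..20"]
starts = [guessed_start(loc) for loc in locations]
```

Under this particular guess all four example locations tie on their start
position, so the outcome rests entirely on the tie-break table, which is
exactly the part we have no rules for.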

Given we don't currently know the rules, we could only implement a best guess.
If the order we write out is simply the order of the SeqFeature objects in
the list (as now), then the behaviour is clearly defined. This is my
preference as it gives the user full control (and full responsibility).
If we did sort things, there would be no easy way to override this sorting.
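
As a minimal sketch of what "full control" could look like on the user side,
the example below sorts a feature list before it would be written out. The
Feature class is a hypothetical stand-in carrying only a type and a start
position; it is not the real SeqFeature API, and no Biopython calls are used.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical stand-in for a SeqFeature: only a type and a start."""
    type: str
    start: int


# The order the user happened to build the list in.
features = [Feature("CDS", 10), Feature("source", 0), Feature("gene", 10)]

# sorted() is stable, so features that tie on start keep their list order.
# The original list is left untouched; the user decides which one to write.
by_position = sorted(features, key=lambda f: f.start)
```

Because the sort is stable, "CDS" stays ahead of "gene" here unless the user
also supplies a tie-break of their own choosing.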

> (I know that script fragment would give weird output, it was just
> illustrative of the position issue.)

OK

Peter

