[Biopython-dev] Bio.GFF and Brad's code

Peter biopython at maubp.freeserve.co.uk
Tue Dec 8 22:30:20 UTC 2009


On Tue, Dec 8, 2009 at 2:15 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>
> I agree that using SeqFeature sub-features for parent/child
> relationships makes a lot of sense. BUT, we have a lot of
> existing code which follows the GenBank/EMBL parser
> route of using this for joins (and a few other corner cases).
>
> There are other annoyances with the current SeqFeature
> and FeatureLocation model - the strand and location operator
> are part of the SeqFeature not the FeatureLocation. It would
> make more sense to me to move them to the FeatureLocation
> (and have that handle joins itself). Or, move everything to
> the SeqFeature (and get rid of the FeatureLocation object).
>
> I think the best route forward is to plan a transition of the
> SeqFeature object to allow nice handling of real nested
> relationships, and a reworking of complex location handling.
> Then (hopefully) we can have the GenBank/EMBL/GFF3
> parsers all using the SeqFeature in a consistent way.
>
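The quoted alternative puts the strand and the join operator on the location rather than on the feature. The plain-Python sketch below shows what that could look like. The `SimpleLocation` and `CompoundLocation` classes are hypothetical illustrations, not the Biopython API. Coordinates are assumed to be 0-based and half-open, and the parts of a join are assumed to be stored in the order they should be concatenated.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Translation table for complementing DNA letters (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass(frozen=True)
class SimpleLocation:
    """Hypothetical single span that carries its own strand (+1, -1 or None)."""
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    strand: Optional[int] = None

    def __len__(self):
        return self.end - self.start

    def extract(self, seq: str) -> str:
        piece = seq[self.start:self.end]
        if self.strand == -1:
            # Minus strand: reverse complement the slice.
            piece = piece.translate(_COMPLEMENT)[::-1]
        return piece


@dataclass(frozen=True)
class CompoundLocation:
    """Hypothetical join of several spans. The location, not the feature,
    holds the operator. Parts are assumed to be in concatenation order."""
    parts: Tuple[SimpleLocation, ...]
    operator: str = "join"

    def __len__(self):
        return sum(len(p) for p in self.parts)

    def extract(self, seq: str) -> str:
        # Each part applies its own strand before the pieces are joined.
        return "".join(p.extract(seq) for p in self.parts)


seq = "AAACCCGGGTTT"
cds = CompoundLocation((SimpleLocation(0, 3, 1), SimpleLocation(6, 9, 1)))
print(len(cds), cds.extract(seq))           # 6 AAAGGG
print(SimpleLocation(0, 4, -1).extract(seq))  # GTTT
```

Because each part carries its own strand, a minus-strand join such as complement(join(a,b)) would be stored here as the minus-strand parts of b followed by a, under the ordering assumption above.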

Just to add some ideas to this thread for discussion,
on possible ways forward without breaking backwards
compatibility. Hopefully this is clear; I did have a glass
of wine with dinner ;)

Given the way the existing SeqFeature list property
subfeatures is used (by the GenBank/EMBL parser,
etc.), would it make sense, for the GFF needs, to add
a new list for child features (say a property "children"),
and perhaps another property (maybe "parent") which
can point back at the parent SeqFeature? That is, a sort
of tree, allowing us to represent genes, exons, etc.
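Here is a minimal sketch of the proposed tree. It is written in plain Python rather than against the real SeqFeature class. The `TreeFeature` class, its `add_child` method and the `walk` helper are all hypothetical. The idea is that the new `children`/`parent` properties would sit alongside the existing subfeatures list used for joins, not replace it.

```python
class TreeFeature:
    """Hypothetical feature node with parent/children links (not SeqFeature)."""

    def __init__(self, feature_type, start, end):
        self.type = feature_type
        self.start = start
        self.end = end
        self.children = []  # child features, e.g. the exons of an mRNA
        self.parent = None  # back-reference to the enclosing feature

    def add_child(self, child):
        """Attach child below this feature and set its parent link."""
        child.parent = self
        self.children.append(child)
        return child


def walk(feature, depth=0):
    """Yield (depth, feature) pairs in depth-first pre-order."""
    yield depth, feature
    for child in feature.children:
        yield from walk(child, depth + 1)


gene = TreeFeature("gene", 0, 1000)
mrna = gene.add_child(TreeFeature("mRNA", 0, 1000))
mrna.add_child(TreeFeature("exon", 0, 200))
mrna.add_child(TreeFeature("exon", 500, 1000))

# Print the gene -> mRNA -> exon hierarchy, indented by depth.
for depth, f in walk(gene):
    print("  " * depth + f"{f.type} {f.start}..{f.end}")
```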

Note we may want to use weak references for the above
children/parent references to assist the Python GC, since a
parent listing its children, each pointing back at it, forms
a reference cycle.
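The sketch below illustrates the weak reference idea using a hypothetical `TreeFeature` class. The parent keeps ordinary (strong) references to its children. Each child holds only a `weakref.ref` back to its parent, so no reference cycle is created. Under CPython's reference counting, the parent can then be freed as soon as the last outside reference to it goes, without waiting for the cyclic garbage collector.

```python
import weakref


class TreeFeature:
    """Hypothetical feature node whose parent link is a weak reference."""

    def __init__(self, feature_type):
        self.type = feature_type
        self.children = []       # strong references: the parent owns its children
        self._parent_ref = None  # weakref.ref to the parent, or None

    @property
    def parent(self):
        # Calling a dead weakref returns None, so this is None once the parent is freed.
        return self._parent_ref() if self._parent_ref is not None else None

    @parent.setter
    def parent(self, feature):
        self._parent_ref = weakref.ref(feature) if feature is not None else None

    def add_child(self, child):
        """Attach child below this feature; only the child -> parent link is weak."""
        child.parent = self
        self.children.append(child)
        return child


gene = TreeFeature("gene")
exon = gene.add_child(TreeFeature("exon"))
print(exon.parent.type)  # gene
del gene                 # no cycle: on CPython the gene is freed immediately
print(exon.parent)       # None (on CPython)
```

One trade-off of this design is that a child on its own does not keep its parent alive. Code holding just an exon would see `parent` become `None` once the gene is discarded.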

Given the above, the GenBank/EMBL parser could
potentially be enhanced to use these new properties
(e.g. for linking gene and CDS features in bacteria,
or CDS and mat_peptide features in viruses, etc.).
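As an illustration only, one plausible linking heuristic for bacterial records is to match features that share a /locus_tag qualifier. The `Feature` class and `link_by_locus_tag` function below are hypothetical. The qualifier values are assumed to be lists of strings, which is how Biopython's GenBank parser stores them. The locus tags in the example are made-up placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Feature:
    """Hypothetical stand-in for a parsed GenBank feature."""
    type: str
    qualifiers: Dict[str, List[str]]
    # repr=False avoids infinite recursion through the parent/child links.
    children: List["Feature"] = field(default_factory=list, repr=False)
    parent: Optional["Feature"] = field(default=None, repr=False)


def link_by_locus_tag(features, parent_type="gene", child_types=("CDS",)):
    """Attach each feature of a child type to the parent-type feature with
    the same /locus_tag value (an assumed heuristic, not a parser rule)."""
    by_tag = {}
    for f in features:
        if f.type == parent_type:
            for tag in f.qualifiers.get("locus_tag", []):
                by_tag[tag] = f
    for f in features:
        if f.type in child_types:
            for tag in f.qualifiers.get("locus_tag", []):
                parent = by_tag.get(tag)
                if parent is not None:
                    f.parent = parent
                    parent.children.append(f)
                    break  # link each child to at most one parent
    return features


features = [
    Feature("gene", {"locus_tag": ["tag_1"]}),
    Feature("CDS", {"locus_tag": ["tag_1"]}),
    Feature("gene", {"locus_tag": ["tag_2"]}),
    Feature("CDS", {"locus_tag": ["tag_2"]}),
]
link_by_locus_tag(features)
for gene in (f for f in features if f.type == "gene"):
    print(gene.qualifiers["locus_tag"][0], [c.type for c in gene.children])
```

Viral mat_peptide features may not always carry a /locus_tag. Linking them to their CDS would then probably need a different rule, such as location containment.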

[This still leaves the ontology issues, e.g. mapping GFF3's
Sequence Ontology terms to GenBank feature keys, which might
be best dealt with by the GenBank output code.]
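One way to read the ontology problem is as a vocabulary mismatch. GFF3's type column uses Sequence Ontology (SO) terms, while GenBank uses INSDC feature keys, so the output code would need a translation step. The sketch below shows a hypothetical lookup table covering a few common terms. The fallback to misc_feature for anything unmapped is an assumed policy.

```python
# Hypothetical SO term -> INSDC feature key table (a few common terms only).
SO_TO_INSDC = {
    "gene": "gene",
    "mRNA": "mRNA",
    "exon": "exon",
    "CDS": "CDS",
    "five_prime_UTR": "5'UTR",
    "three_prime_UTR": "3'UTR",
    "tRNA": "tRNA",
    "rRNA": "rRNA",
}


def genbank_key(so_term, default="misc_feature"):
    """Return the INSDC feature key for an SO term, or the default
    (assumed here to be misc_feature) when no mapping is known."""
    return SO_TO_INSDC.get(so_term, default)


print(genbank_key("five_prime_UTR"))  # 5'UTR
print(genbank_key("not_in_table"))    # misc_feature
```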

Peter