[Bioperl-l] trunc and Features?

Ewan Birney birney@ebi.ac.uk
Tue, 28 Aug 2001 07:26:43 +0100 (BST)

Charles - welcome to one of the more frustrating parts of bioperl.

Bioperl doesn't do this, not because it would be a bad thing (it would be
great) but because it is a real implementation challenge.

In Bioperl'ese, we only know the (read-only) interface definition of
SeqFeatureI for the features coming off a Bio::SeqI. Therefore, in theory,
seq->trunc can't really mess around with the SeqFeatureI's.

If we relaxed this and let Bio::Seq (implementation specific) override
trunc to sort out features, then (a) do we assume that every feature
inherits from SeqFeature::Generic or a similar implementation class, and
(b) do we do a deep clone of SeqFeature::Generic, setting the
start/end/strand to the new values? This could trigger some *massive* deep
cloning procedures, so we'd want to make the user aware of it. So maybe
Bio::Seq should really have a separate trunc_with_features() method for
this.
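
To make the deep clone option concrete, here is a minimal sketch. It is in Python rather than Perl purely to keep it short, and nothing in it is Bioperl API: the Feature class, shift_clone() and this trunc_with_features() are hypothetical stand-ins. It also assumes a policy the discussion above does not settle, namely that features only partly inside the window are dropped.

```python
import copy


class Feature:
    """Hypothetical stand-in for a SeqFeature::Generic-like object."""

    def __init__(self, start, end, strand, tags=None, subfeatures=None):
        self.start = start              # 1-based, inclusive
        self.end = end
        self.strand = strand
        self.tags = tags or {}
        self.subfeatures = subfeatures or []


def shift_clone(feat, offset):
    """Deep-copy feat, then move the copy and all its subfeatures by -offset."""
    clone = copy.deepcopy(feat)         # the potentially *massive* step
    stack = [clone]
    while stack:
        f = stack.pop()
        f.start -= offset
        f.end -= offset
        stack.extend(f.subfeatures)
    return clone


def trunc_with_features(seq, features, start, end):
    """Return seq[start..end] (1-based, inclusive) plus re-coordinated copies
    of the features lying entirely inside that window.

    Assumption: partially overlapping features are simply dropped.
    """
    sub = seq[start - 1:end]
    kept = [shift_clone(f, start - 1)
            for f in features
            if f.start >= start and f.end <= end]
    return sub, kept
```

The original features are never modified, which is the point of cloning; the cost is that every tag and every subfeature gets copied on each call.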

Another option, which I marginally favour, is to have
Bio::SeqFeature::Proxy objects, each of which has-a parent feature but has
its own start/end/strand. All other methods are AUTOLOAD'd to delegate to
the parent feature, meaning we don't have to duplicate all the tag/value
data and methods. Sneakily, we could mess with @ISA to make the Proxy look
like its parent for ->isa() calls (or override isa?), which makes this
very "frame"-like in 1970's AI terms (only just - that's pushing it a
little).
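
The Proxy idea can be sketched roughly as follows. Again this is Python, not Perl, and only an illustration: __getattr__ plays the role AUTOLOAD would play, the __class__ property stands in for the @ISA trick, and the Feature and FeatureProxy classes are made up for the example.

```python
class Feature:
    """Hypothetical stand-in for a concrete feature class."""

    def __init__(self, start, end, strand, primary_tag, tags=None):
        self.start, self.end, self.strand = start, end, strand
        self.primary_tag = primary_tag
        self.tags = tags or {}

    def each_tag_value(self, tag):
        return list(self.tags.get(tag, []))


class FeatureProxy:
    """Owns start/end/strand; everything else is looked up on the parent."""

    def __init__(self, parent, start, end, strand):
        self._parent = parent
        self.start, self.end, self.strand = start, end, strand

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for anything the
        # proxy does not define itself. Fetching _parent through
        # object.__getattribute__ avoids infinite recursion if it is unset.
        parent = object.__getattribute__(self, "_parent")
        return getattr(parent, name)

    @property
    def __class__(self):
        # The "sneaky" part: isinstance(proxy, type(parent)) becomes true,
        # although type(proxy) is still FeatureProxy.
        return type(self._parent)
```

With gene = Feature(1000, 2000, 1, "gene") and p = FeatureProxy(gene, 1, 1001, 1), p.start is the proxy's own value while p.primary_tag and p.each_tag_value(...) come from gene, with no tag data copied.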

The problem here is that calling sub_SeqFeatures on the Proxy means we
have to attach new proxies to the sub seqfeatures. Maybe we do this when
we make the Proxy (making a deep parse of the seqfeature tree). *Or* we
have a coordinate transformation object that allows late binding of
proxies to seqfeature objects.
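
A rough sketch of the late-binding variant, under the same caveats (Python for brevity, all class names hypothetical, and only a plain offset as the transform): the proxy stores no coordinates of its own, asks a transform object for them on demand, and wraps sub seqfeatures only when sub_SeqFeatures is actually called.

```python
class Feature:
    """Hypothetical concrete feature with nested sub features."""

    def __init__(self, start, end, strand, subs=()):
        self.start, self.end, self.strand = start, end, strand
        self._subs = list(subs)

    def sub_SeqFeatures(self):
        return list(self._subs)


class Shift:
    """Hypothetical transform: parent coordinates -> truncated coordinates."""

    def __init__(self, offset):
        self.offset = offset

    def map(self, pos):
        return pos - self.offset


class LazyProxy:
    """Computes coordinates on demand instead of storing them."""

    def __init__(self, parent, transform):
        self._parent = parent
        self._transform = transform

    @property
    def start(self):
        return self._transform.map(self._parent.start)

    @property
    def end(self):
        return self._transform.map(self._parent.end)

    @property
    def strand(self):
        # A plain truncation never flips the strand.
        return self._parent.strand

    def sub_SeqFeatures(self):
        # Children are wrapped only when somebody asks for them, so making
        # the top-level proxy does not walk the whole feature tree.
        return [LazyProxy(s, self._transform)
                for s in self._parent.sub_SeqFeatures()]
```

The trade-off against building all proxies up front is that each call to sub_SeqFeatures creates fresh proxy objects, so callers cannot rely on object identity between calls.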

Ensembl has this problem by the truckload. Currently Ensembl uses a third
method: making sure that each call to Bio::Seq->top_SeqFeatures is
guaranteed to always make new in-memory objects. The top object (the
trunc'd or otherwise coordinate-mangled object, VirtualContig) can then
overwrite the start/end/strand of these objects at will. I consider this a
pretty big hack (I wrote it) and it will simply not work with Bioperl ;)

The new API discussed at Ensembl is playing around in this area and I
think we will move beyond the Proxy idea to perhaps the idea that each
SeqFeature can have multiple locations in different contexts, and there is
a context management system that can get a new location in a new context
from an old location. All very interesting. Arne Stabenau and Craig Melsopp
are doing the hard thinking here.

Re: your immediate problem -

I would love to see an attempt at solving this with either (a) adding a
deep clone method on SeqFeature::Generic and working from there or
(b) trying out the Proxy idea.

Apologies that we haven't solved this already.