[Biojava-l] subsequences

Thomas Down td2@sanger.ac.uk
Fri, 18 Jan 2002 14:13:11 +0000


On Fri, Jan 18, 2002 at 10:00:39AM +0000, Keith James wrote:
> 
>     Matthew> This issue of overlapping features on sub-sequences was
>     Matthew> one of my gripes, so it's nice to see it resolved. I too
>     Matthew> would be interested to know if anybody relies on the old
>     Matthew> behavior in any programs. As for child features, my gut
>     Matthew> says you should project them if they are wholly
>     Matthew> contained, project them as RemoteFeatures if they overlap
>     Matthew> the boundary and, of course, discard them if they fall
>     Matthew> outside the sub-sequence. This should be reasonably easy
>     Matthew> to implement as the rules for what to do work
>     Matthew> recursively. Also, it preserves the maximum amount of
>     Matthew> 'reasonable' information. If there are no objections, I'd
>     Matthew> really like to see this before 1.2 is released.
> 
>     Matthew> Who else has opinions on this?
> 
> I'll put my hand up to supporting the "discarding child features
> except where contained/overlapping" heresy. However, I think the main
> thing is that the expected behaviour is clearly stated somewhere.

Is that heretical?  As I see it, logic for doing this would be:

  - Fully contained top-level features get projected in
    exactly the normal way.

  - Partially contained features get turned into RemoteFeatures.

  - Any non-overlapping children of partially-contained parents
    are ignored.

  - Fully-contained children are projected then re-parented
    onto the RemoteFeature.

  - Partially-contained children are themselves turned into
    RemoteFeatures.

  - ...and so on down the tree...
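The recursive rules above can be sketched roughly as follows. This is a minimal, self-contained sketch under stated assumptions: `Feat` is a hypothetical feature-tree type with 1-based inclusive coordinates, and its `remote` flag merely stands in for "turned into a RemoteFeature". It is not BioJava's `Feature`/`RemoteFeature` API, nor the actual SubSequence2.java code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified feature tree; NOT BioJava's Feature/RemoteFeature API.
final class Feat {
    final String name;
    final int start;             // 1-based, inclusive
    final int end;               // 1-based, inclusive
    final boolean remote;        // stands in for "turned into a RemoteFeature"
    final List<Feat> children = new ArrayList<>();

    Feat(String name, int start, int end) {
        this(name, start, end, false);
    }

    Feat(String name, int start, int end, boolean remote) {
        this.name = name;
        this.start = start;
        this.end = end;
        this.remote = remote;
    }
}

final class SubSeqProjection {
    /**
     * Projects f onto the sub-sequence window [winStart, winEnd] of its parent.
     * Returns null when f does not overlap the window at all.
     */
    static Feat project(Feat f, int winStart, int winEnd) {
        // Non-overlapping: discard this feature and, with it, all of its children.
        if (f.end < winStart || f.start > winEnd) {
            return null;
        }
        boolean contained = f.start >= winStart && f.end <= winEnd;
        // Shift into sub-sequence coordinates (parent position winStart becomes 1).
        // For a partially contained feature the shifted range runs past the window
        // edges; a real RemoteFeature would instead record which regions lie on
        // the parent sequence.
        int offset = winStart - 1;
        Feat out = new Feat(f.name, f.start - offset, f.end - offset, !contained);
        // Apply the same rules to every child and re-parent the survivors
        // onto the projected (possibly remote) feature.
        for (Feat child : f.children) {
            Feat projected = project(child, winStart, winEnd);
            if (projected != null) {
                out.children.add(projected);
            }
        }
        return out;
    }

    /** Projects a list of top-level features, dropping those outside the window. */
    static List<Feat> projectAll(List<Feat> topLevel, int winStart, int winEnd) {
        List<Feat> result = new ArrayList<>();
        for (Feat f : topLevel) {
            Feat projected = project(f, winStart, winEnd);
            if (projected != null) {
                result.add(projected);
            }
        }
        return result;
    }
}
```

For a window of 100-200 and a gene at 50-150 with exons at 50-80, 95-120 and 130-150, the gene and the middle exon come back flagged as remote, the last exon is projected normally, and the first exon is dropped. Because one function handles every level, children get exactly the same treatment as top-level features.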

This is nice in that the treatment of child features is consistent
with that of top-levels.  It's also reasonably easy to implement,
especially now that the feature projection code has been nicely
modularized.  I'll update SubSequence2.java to do things this way.


I've not seen many views on updating this before 1.2 -- does
this mean that SubSequence isn't widely used? (possibly because
its behaviour is a bit iffy... ;-).  In the absence of objections,
I'm inclined to make the update.

> (Off topic: I'm putting together a list of core topics on which we
> could use more documentation - and I think this is one. The intention
> is to slap it all together in DocBook with plenty of pictures.)

Sounds good!

   Thomas.