[Biojava-l] subsequences

Matthew Pocock matthew_pocock@yahoo.co.uk
Thu, 17 Jan 2002 21:49:09 +0000


Hi.

Good work Thomas.

This issue of overlapping features on sub-sequences was one of my
gripes, so it's nice to see it resolved. I too would be interested to
know if anybody relies on the old behavior in any programs. As for child
features, my gut says you should project them if they are wholly
contained, project them as RemoteFeatures if they overlap the boundary
and, of course, discard them if they fall outside the sub-sequence. This
should be reasonably easy to implement, as the rules for what to do work
recursively. It also preserves the maximum amount of 'reasonable'
information. If there are no objections, I'd really like to see this
go in before 1.2 is released.
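
A rough sketch of those three rules, just to make the recursion concrete.
Feat and project are made-up names for illustration, not the BioJava
Feature or RemoteFeature interfaces, and clipping an overhanging
feature's coordinates to the sub-sequence is an assumption of the sketch
rather than anything decided here.

```java
import java.util.ArrayList;
import java.util.List;

public class SubSequenceProjection {
    /** Hypothetical stand-in for a feature; NOT the BioJava Feature interface. */
    public static final class Feat {
        public final String name;
        public final int min, max;     // inclusive, 1-based coordinates
        public final boolean remote;   // true if the real feature overhangs the boundary
        public final List<Feat> children = new ArrayList<>();

        public Feat(String name, int min, int max, boolean remote) {
            this.name = name;
            this.min = min;
            this.max = max;
            this.remote = remote;
        }

        public Feat add(Feat child) {
            children.add(child);
            return this;
        }
    }

    /**
     * Project features onto the sub-sequence [start, end] of the parent.
     * Outside: discarded. Wholly contained: projected as-is.
     * Overlapping the boundary: projected as a "remote" feature
     * (coordinates clipped here, which is an assumption of this sketch).
     * The same rules are applied to the children of every kept feature.
     */
    public static List<Feat> project(List<Feat> features, int start, int end) {
        List<Feat> out = new ArrayList<>();
        for (Feat f : features) {
            if (f.max < start || f.min > end) {
                continue; // falls outside the sub-sequence
            }
            boolean contained = f.min >= start && f.max <= end;
            // Translate into sub-sequence coordinates, clipping any overhang.
            int min = Math.max(f.min, start) - start + 1;
            int max = Math.min(f.max, end) - start + 1;
            Feat projected = new Feat(f.name, min, max, !contained);
            projected.children.addAll(project(f.children, start, end));
            out.add(projected);
        }
        return out;
    }

    private static void print(List<Feat> features, String indent) {
        for (Feat f : features) {
            System.out.println(indent + f.name + " " + f.min + ".." + f.max
                    + (f.remote ? " (remote)" : ""));
            print(f.children, indent + "  ");
        }
    }

    public static void main(String[] args) {
        Feat gene = new Feat("gene", 50, 250, false)
                .add(new Feat("exon1", 60, 90, false))    // outside 100..200
                .add(new Feat("exon2", 120, 150, false))  // wholly contained
                .add(new Feat("exon3", 190, 230, false)); // overlaps the end
        print(project(List.of(gene), 100, 200), "");
    }
}
```

In this toy case the gene overhangs both ends of 100..200, so it comes
back as a remote feature that still carries the one contained exon and
one remote exon as children; the exon that lies outside is dropped.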

Who else has opinions on this?

Matthew

ps EMBL entries could be construed as isomorphic with remote features
with children. That is the case when an entry builds a gene structure
out of remote features, assuming the biojava parser were building the
full gene structure using the genomic feature interfaces.

pps Does anybody use the genomic feature interfaces?

Thomas Down wrote:

 > Following on from this discussion, there is one remaining
 > issue with the SubSequence code.  It's never been terribly
 > clear what to do when features overlap the boundary of
 > a SubSequence.  Currently, they're still projected onto the
 > subsequence -- and thus end up with coordinates outside the
 > SubSequence to which they're attached.
 >
 > This has been discussed in the past, and the conclusion
 > seemed to be that partial features should be presented
 > as an alternative feature type (RemoteFeature), which can
 > (where possible) be resolved back to the underlying,
 > complete feature.  The RemoteFeature interface, plus a
 > general-purpose implementation, have been included in the
 > tree for some time, but aren't being widely used.
 >
 > I've now written a (currently experimental) replacement for
 > SubSequence which transforms overhanging features into
 > RemoteFeatures.  This seems to work okay, and it's a good
 > demonstration of RemoteFeatures in action.
 >
 > I'm now wondering if it's worth committing this code before
 > the 1.2 branch (with very careful testing, obviously).  Does
 > anyone have strong feelings either way?  And is there anyone
 > who wants to speak out in favour of the old-style (overhanging
 > features) way of doing things?
 >
 > Anyway, code's attached for anyone who wants to try it, plus
 > a test suite.  One caveat at the moment:
 >
 >   - If any overhanging feature has child features, these
 >     are no longer projected onto the subsequence, even if
 >     some of the child features are fully contained within
 >     the subsequence.  Should those child features which
 >     overlap the subsequence be projected as children of
 >     the RemoteFeature?  I'm tending towards the view that
 >     they should (it's not too hard to implement this :-),
 >     but it's worth discussing -- I don't think I've ever
 >     seen anyone implement RemoteFeature with child features...
 >
 > Thomas.




