[Biojava-l] subsequences

Keith James kdj@sanger.ac.uk
18 Jan 2002 10:00:39 +0000


>>>>> "Matthew" == Matthew Pocock <matthew_pocock@yahoo.co.uk> writes:

    Matthew> Hi.  Good work Thomas.

    Matthew> This issue of overlapping features on sub-sequences was
    Matthew> one of my gripes, so it's nice to see it resolved. I too
    Matthew> would be interested to know if anybody relies on the old
    Matthew> behavior in any programs. As for child features, my gut
    Matthew> says you should project them if they are wholly
    Matthew> contained, project them as RemoteFeatures if they overlap
    Matthew> the boundary and, of course, discard them if they fall
    Matthew> outside the sub-sequence. This should be reasonably easy
    Matthew> to implement as the rules for what to do work
    Matthew> recursively. Also, it preserves the maximum amount of
    Matthew> 'reasonable' information. If there are no objections, I'd
    Matthew> really like to see this before 1.2 is released.

    Matthew> Who else has opinions on this?

I'll put my hand up to supporting the "discarding child features
except where contained/overlapping" heresy. However, I think the main
thing is that the expected behaviour is clearly stated somewhere.
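
As a starting point for stating it, the rule Matthew describes
(project if wholly contained, mark as remote if overlapping the
boundary, discard if outside, applied recursively to children) could
be sketched roughly as follows. This is only an illustration under
assumptions: Feature, Projected, Fate, classify and project are
hypothetical stand-ins, not BioJava classes or methods, coordinates
are taken to be 1-based and inclusive, and a "remote" feature is
modelled simply as a clipped feature carrying a flag rather than as a
real RemoteFeature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; none of these are BioJava types.
class SubSequenceProjection {

    enum Fate { PROJECT, REMOTE, DISCARD }

    /** A feature with inclusive coordinates on the parent sequence. */
    static final class Feature {
        final String name;
        final int min;
        final int max;
        final List<Feature> children = new ArrayList<>();

        Feature(String name, int min, int max, Feature... kids) {
            this.name = name;
            this.min = min;
            this.max = max;
            this.children.addAll(Arrays.asList(kids));
        }
    }

    /** A feature in sub-sequence coordinates; remote marks a boundary overlap. */
    static final class Projected {
        final String name;
        final int min;
        final int max;
        final boolean remote;
        final List<Projected> children = new ArrayList<>();

        Projected(String name, int min, int max, boolean remote) {
            this.name = name;
            this.min = min;
            this.max = max;
            this.remote = remote;
        }
    }

    /** Decide what happens to a feature min..max for the window start..end. */
    static Fate classify(int min, int max, int start, int end) {
        if (max < start || min > end) {
            return Fate.DISCARD;            // entirely outside the window
        }
        if (min >= start && max <= end) {
            return Fate.PROJECT;            // wholly contained
        }
        return Fate.REMOTE;                 // straddles a window boundary
    }

    /** Apply the same rule to each feature and, recursively, to its children. */
    static List<Projected> project(List<Feature> features, int start, int end) {
        List<Projected> out = new ArrayList<>();
        for (Feature f : features) {
            Fate fate = classify(f.min, f.max, start, end);
            if (fate == Fate.DISCARD) {
                continue;                   // children are dropped with the parent
            }
            // Clip to the window, then shift into sub-sequence coordinates.
            int min = Math.max(f.min, start) - start + 1;
            int max = Math.min(f.max, end) - start + 1;
            Projected p = new Projected(f.name, min, max, fate == Fate.REMOTE);
            p.children.addAll(project(f.children, start, end));
            out.add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        Feature gene = new Feature("gene", 10, 70,
                new Feature("exon1", 10, 20),
                new Feature("exon2", 25, 30),
                new Feature("exon3", 60, 70));
        for (Projected p : project(Arrays.asList(gene), 15, 45)) {
            System.out.println(p.name + " " + p.min + ".." + p.max
                    + " remote=" + p.remote + " children=" + p.children.size());
        }
    }
}
```

The sketch assumes children lie within their parent, so a discarded
parent takes its children with it. Whether an overlapping parent
should keep its wholly contained children as ordinary features, as
done here, is exactly the sort of detail the documentation would
need to pin down.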

(Off topic: I'm putting together a list of core topics on which we
could use more documentation - and I think this is one. The intention
is to slap it all together in DocBook with plenty of pictures.)

    Matthew> Matthew

    Matthew> ps Embl entries could be construed as being isomorphic
    Matthew> with remote features with children when an entry builds a
    Matthew> gene structure out of remote features, if the biojava
    Matthew> parser was building the full gene structure using the
    Matthew> genomic feature interfaces.

    Matthew> pps does anybody use the genomic feature interfaces?

Almost, but not quite yet... is that question indicating that you are
thinking of making changes there?

Keith

-- 

-= Keith James - kdj@sanger.ac.uk - http://www.sanger.ac.uk/Users/kdj =-
Pathogen Sequencing Unit, Wellcome Trust Sanger Institute, Cambridge, UK