[Biojava-l] subsequences
Keith James
kdj@sanger.ac.uk
18 Jan 2002 10:00:39 +0000
>>>>> "Matthew" == Matthew Pocock <matthew_pocock@yahoo.co.uk> writes:
Matthew> Hi. Good work Thomas.
Matthew> This issue of overlapping features on sub-sequences was
Matthew> one of my gripes, so it's nice to see it resolved. I too
Matthew> would be interested to know if anybody relies on the old
Matthew> behavior in any programs. As for child features, my gut
Matthew> says you should project them if they are wholly
Matthew> contained, project them as RemoteFeatures if they overlap
Matthew> the boundary and, of course, discard them if they fall
Matthew> outside the sub-sequence. This should be reasonably easy
Matthew> to implement as the rules for what to do work
Matthew> recursively. Also, it preserves the maximum amount of
Matthew> 'reasonable' information. If there are no objections, I'd
Matthew> really like to see this before 1.2 is released.
Matthew> Who else has opinions on this?
I'll put my hand up to supporting the "discarding child features
except where contained/overlapping" heresy. However, I think the main
thing is that the expected behaviour is clearly stated somewhere.
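The projection rules proposed above (project wholly contained features, mark boundary-overlapping ones as remote, discard the rest, and apply the same rules recursively to children) can be sketched roughly as follows. This is a minimal, hypothetical illustration in plain Java 16+, not BioJava code: `SimpleFeature`, `Projected`, `Kind` and `project()` are invented names, coordinates are assumed to be 1-based and inclusive, and the `REMOTE` tag merely stands in for building a real RemoteFeature.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the sub-sequence projection rules; NOT the BioJava API. */
public class SubsequenceProjection {

    /** LOCAL = wholly inside the window; REMOTE = overlaps a window boundary. */
    public enum Kind { LOCAL, REMOTE }

    /** Stand-in feature: 1-based inclusive coordinates plus child features. */
    public record SimpleFeature(String name, int start, int end, List<SimpleFeature> children) {}

    /** A feature after projection, in window coordinates (window position 1 == winStart). */
    public record Projected(String name, int start, int end, Kind kind, List<Projected> children) {}

    /** Convenience factory so feature trees are easy to build. */
    public static SimpleFeature feature(String name, int start, int end, SimpleFeature... children) {
        return new SimpleFeature(name, start, end, List.of(children));
    }

    /**
     * Projects f onto the window [winStart, winEnd].
     * Returns null when f lies entirely outside the window (the "discard" rule).
     */
    public static Projected project(SimpleFeature f, int winStart, int winEnd) {
        if (f.end() < winStart || f.start() > winEnd) {
            return null; // no overlap at all: discard the feature and, with it, its children
        }
        boolean contained = f.start() >= winStart && f.end() <= winEnd;

        // The same three rules apply recursively to every child.
        List<Projected> kids = new ArrayList<>();
        for (SimpleFeature child : f.children()) {
            Projected p = project(child, winStart, winEnd);
            if (p != null) {
                kids.add(p);
            }
        }

        // Shift into window coordinates. For REMOTE features the result may fall
        // below 1 or past the window length; the REMOTE tag records that the
        // feature reaches beyond the sub-sequence.
        int offset = winStart - 1;
        return new Projected(f.name(), f.start() - offset, f.end() - offset,
                contained ? Kind.LOCAL : Kind.REMOTE, kids);
    }
}
```

For a gene spanning 100-900 with exons at 100-200, 400-500 and 800-900, projecting onto the window 350-850 returns the gene and its third exon tagged REMOTE, the second exon tagged LOCAL at 51-151, and drops the first exon entirely.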
(Off topic: I'm putting together a list of core topics on which we
could use more documentation - and I think this is one. The intention
is to slap it all together in DocBook with plenty of pictures.)
Matthew> Matthew
Matthew> ps EMBL entries could be construed as isomorphic with
Matthew> remote features that have children, in cases where an entry
Matthew> builds a gene structure out of remote features, provided the
Matthew> BioJava parser built the full gene structure using the
Matthew> genomic feature interfaces.
Matthew> pps does anybody use the genomic feature interfaces?
Almost, but not quite yet... is that question indicating that you are
thinking of making changes there?
Keith
--
-= Keith James - kdj@sanger.ac.uk - http://www.sanger.ac.uk/Users/kdj =-
Pathogen Sequencing Unit, Wellcome Trust Sanger Institute, Cambridge, UK