[Biojava-l] FW: Location Problems

Keith James kdj@sanger.ac.uk
25 Oct 2001 10:31:02 +0100


>>>>> "Ewan" == Ewan Birney <birney@ebi.ac.uk> writes:

    Ewan> On Wed, 24 Oct 2001, Forsch, Dan wrote:
    >> I'm pretty sure this doesn't qualify as 'brilliant thought',
    >> but...
    >> 
    >> I'd like to see the solution to this problem move BioJava away
    >> from having StrandedFeature as a sub-interface of Feature,
    >> thereby eliminating those annoying (to me anyway) 'if
    >> instanceof StrandedFeature' checks in the code.  A Strand could
    >> become an attribute of (and inner class within) something else,
    >> possibly of Locations.  If each Location has an associated
    >> Strand then the components of a CompoundLocation could differ.
    >> I'm not sure if this fixes the issue with RemoteFeatures but I
    >> think the same principle would apply.

    Ewan> This is like the Bioperl approach. (Bioperl takes locations
    Ewan> into a whole tree system to allow representation of
    Ewan> FuzzyLocations.)


    Ewan> I know that Thomas likes strandedness in BioJava being a
    Ewan> property of the feature, not the location, which I think is
    Ewan> quite a good principled stand: it just causes havoc wrt
    Ewan> EMBL/GenBank.



    Ewan> I think I made a similar stand against "complex" locations
    Ewan> in Bioperl for a while before I was overruled by people
    Ewan> wanting, understandably, to parse the *whole* of GenBank,
    Ewan> and then round-trip it properly.



    Ewan> It is going to be interesting to see BioJava's approach to
    Ewan> this.


    Ewan> But - just to say - I don't think there is a 100% clean
    Ewan> solution here. Just different compromises.

I've had too much coffee this morning... it seems to me that

 BioJava Locations represent nothing more than "this stretch of
 sequence", with no biological interpretation

 EMBL/GenBank locations add some biological interpretation to the same
 data

 BioJava Features often offer biological interpretation over their
 Location (I think they do this implicitly wherever they have a
 Strand)
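To make the second point concrete: an EMBL/GenBank location string such as join(complement(1..10),20..30) attaches a strand to each segment, which a plain "stretch of sequence" does not record. The sketch below is a hypothetical, deliberately tiny parser (EmblLocationSketch and Block are illustrative names, not BioJava or EMBL library code). It handles only ranges, complement() and join(). It does not handle fuzzy ends, remote entries, order() or single-base positions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; handles only "a..b", "complement(...)" and
// "join(...,...)". Fuzzy ends (<1..>10), remote entries (AC123:1..10),
// order(...) and single-base positions are not handled.
enum Strand { POSITIVE, NEGATIVE }

// One segment: a plain stretch of sequence plus the strand the text gave it.
record Block(int start, int end, Strand strand) {}

final class EmblLocationSketch {
    private final String text;
    private int pos;

    private EmblLocationSketch(String text) {
        this.text = text.replace(" ", "");
    }

    // Returns the segments in textual order; the reading order implied by
    // complement(join(...)) is not modelled.
    static List<Block> parse(String location) {
        EmblLocationSketch p = new EmblLocationSketch(location);
        List<Block> blocks = p.expr(false);
        if (p.pos != p.text.length()) {
            throw new IllegalArgumentException("unexpected text at " + p.pos);
        }
        return blocks;
    }

    // minus == true inside an odd number of complement(...) wrappers.
    private List<Block> expr(boolean minus) {
        if (text.startsWith("complement(", pos)) {
            pos += "complement(".length();
            List<Block> inner = expr(!minus);
            expect(')');
            return inner;
        }
        if (text.startsWith("join(", pos)) {
            pos += "join(".length();
            List<Block> parts = new ArrayList<>();
            do {
                parts.addAll(expr(minus));
            } while (accept(','));
            expect(')');
            return parts;
        }
        int start = number();
        if (!text.startsWith("..", pos)) {
            throw new IllegalArgumentException("expected '..' at " + pos);
        }
        pos += 2;
        int end = number();
        return List.of(new Block(start, end, minus ? Strand.NEGATIVE : Strand.POSITIVE));
    }

    private int number() {
        int from = pos;
        while (pos < text.length() && Character.isDigit(text.charAt(pos))) {
            pos++;
        }
        if (from == pos) {
            throw new IllegalArgumentException("expected a number at " + from);
        }
        return Integer.parseInt(text.substring(from, pos));
    }

    private boolean accept(char c) {
        if (pos < text.length() && text.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void expect(char c) {
        if (!accept(c)) {
            throw new IllegalArgumentException("expected '" + c + "' at " + pos);
        }
    }
}
```

Parsing join(complement(1..10),20..30) yields one NEGATIVE block and one POSITIVE block. That per-segment strand is exactly the extra information that has to live somewhere once the text is gone.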

What about formalising the implicit biological interpretation in
Feature and using it to store the extra info from the EMBL/GenBank
location?

Feature would need a way of saying how the raw information in its
Location is to be interpreted. Currently this is done by having one
Strand attribute per StrandedFeature.

Instead, any Feature (rather than just StrandedFeature, addressing
Dan's point) could have a rule for, in this case, adding Strand
information to its component pieces.

Feature f;

Strand s = f.getStrand();

would return POSITIVE, NEGATIVE, UNKNOWN or MIXED
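One way the whole-feature getStrand() could be derived from the per-block strands is shown in the minimal sketch below. StrandSummary is a hypothetical helper. POSITIVE, NEGATIVE and UNKNOWN mirror the strand values StrandedFeature already has, and MIXED is the new value proposed here. The combining policy itself is an assumption.

```java
import java.util.List;

// Hypothetical: MIXED is the proposed new value; the others mirror the
// strand values a StrandedFeature can already report.
enum Strand { POSITIVE, NEGATIVE, UNKNOWN, MIXED }

final class StrandSummary {
    private StrandSummary() {}

    // One possible policy (an assumption): if every block reports the same
    // strand, return it; if any two blocks disagree, return MIXED; a
    // feature with no blocks is UNKNOWN.
    static Strand overall(List<Strand> blockStrands) {
        Strand result = null;
        for (Strand s : blockStrands) {
            if (result == null) {
                result = s;
            } else if (result != s) {
                return Strand.MIXED;
            }
        }
        return result == null ? Strand.UNKNOWN : result;
    }
}
```

Under this policy a feature with one UNKNOWN block and one POSITIVE block reports MIXED. A rule could equally choose UNKNOWN there, which is the kind of decision the rule would have to spell out.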


for (Iterator li = f.getLocation().blockIterator(); li.hasNext();)
{
    Strand s = f.getStrand((Location) li.next());
}

would return the strand of each block Location, or barf "This location
isn't one of mine".

The rule would have to define what to do with recursive locations (and
could disallow them).
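The per-block query and the recursion rule could fit together as in the minimal sketch below. Loc, Range, Compound and RuleFeature are hypothetical names, not BioJava classes. Flattening nested compound locations is just one of the two options mentioned (the other being to disallow them).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration; not BioJava classes.
enum Strand { POSITIVE, NEGATIVE, UNKNOWN }

// A location is a contiguous range or a compound of sub-locations, which
// may themselves be compound (the recursive case).
interface Loc {}
record Range(int min, int max) implements Loc {}
record Compound(List<Loc> parts) implements Loc {}

final class RuleFeature {
    // The rule: one strand per contiguous block, in block order.
    private final Map<Range, Strand> strandOfBlock = new LinkedHashMap<>();

    // This rule flattens nested compounds; a stricter rule could reject them.
    RuleFeature(Loc location, List<Strand> strands) {
        List<Range> blocks = new ArrayList<>();
        flatten(location, blocks);
        if (blocks.size() != strands.size()) {
            throw new IllegalArgumentException("need exactly one strand per block");
        }
        for (int i = 0; i < blocks.size(); i++) {
            if (strandOfBlock.put(blocks.get(i), strands.get(i)) != null) {
                throw new IllegalArgumentException("duplicate block " + blocks.get(i));
            }
        }
    }

    private static void flatten(Loc loc, List<Range> out) {
        if (loc instanceof Range r) {
            out.add(r);
        } else if (loc instanceof Compound c) {
            for (Loc part : c.parts()) {
                flatten(part, out);
            }
        } else {
            throw new IllegalArgumentException("unknown location type " + loc);
        }
    }

    // The contiguous blocks, in the order the rule was given them.
    List<Range> blocks() {
        return List.copyOf(strandOfBlock.keySet());
    }

    // Strand of one of this feature's blocks; anything else is rejected.
    Strand getStrand(Range block) {
        Strand s = strandOfBlock.get(block);
        if (s == null) {
            throw new IllegalArgumentException("This location isn't one of mine: " + block);
        }
        return s;
    }
}
```

Iterating `for (Range block : f.blocks())` and calling `f.getStrand(block)` plays the role of the blockIterator() loop. Asking about a range that is not one of the feature's own blocks throws.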


My 0.022455 Euros

-- 

-= Keith James - kdj@sanger.ac.uk - http://www.sanger.ac.uk/Users/kdj =-
Pathogen Sequencing Unit, Wellcome Trust Sanger Institute, Cambridge, UK