[Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split

Ewan Birney birney@ebi.ac.uk
Thu, 25 Jan 2001 16:38:21 +0000 (GMT)


On Thu, 25 Jan 2001, Jason Stajich wrote:

> 
> On Thu, 25 Jan 2001, Ewan Birney wrote:
> 
> > On Thu, 25 Jan 2001, Jason Stajich wrote:
> > 
> > > On Wed, 24 Jan 2001, Hilmar Lapp wrote:
> > > 
> > > > Jason Stajich wrote:
> > > > > 
> > > > > I'd just like to reiterate - beware bioperl-live is development code.
> > > > > 
> > > > > I added these handlers for Fuzzy and Split features.  I decided to create
> > > > > methods start_fuzzy,end_fuzzy for Bio::Location::Fuzzy to handle whether
> > > > > or not we saw the <, > descriptors.  I probably need some more test cases
> > > > 
> > > > I may have missed the obvious solution, but how are we going to
> > > > distinguish 'unknown start/end' and 'somewhere in between'? That is,
> > > > '<150' meaning 'before position 150', making it non-obvious how to
> > > > return a minimal start, and '120.130' meaning it's between two known
> > > > positions. Will I have to test fuzzy_start() before I'm allowed to
> > > > safely call min_start()? (no, I don't want to suggest exceptions ...
> > > > :O)
> > > 
> > > Hmm, perhaps I was confused.  I thought Split Location would deal with
> > > min_start/max_end.  I believe fuzzy can have 3 qualities: a fuzzy start
> > > (<150..100), a fuzzy end (90..>100), and a fuzzy 'range' (1.12) [for lack
> > > of a better word, suggestions welcome]. All 3 can be present in the same
> > > location so they have to be independent operators.   When you call
> > > start, it will return what it thinks is the start but you'll have to
> > > test to see if the range or the start is fuzzy ($loc->range_fuzzy ||
> > > $loc->start_fuzzy).  Perhaps that is too tedious?  I'd rather not throw an
> > > exception here, but can be persuaded.    
> > 
> > In my experience it is crucial to treat
> > join((<10..100),(200..300),(400..>500)) as a class of SplitLocation, not
> > as a class of FuzzyFeature.
> > 
> > The above syntax is the most used "fuzziness" and nearly everyone discards
> > the leading and trailing '<' '>' as it means "partial gene" with the
> > coordinates interpreted in a hard way.
> 
> Okay I was interpreting this as a 
> SplitLocation with 
> 3 LocationI objects
> 2 of which are Fuzzy Locations...

OK. This is a good solution here, but the trouble with this recursion is
that, of course, it allows

  SplitLocationI 

    has-a SplitLocationI

etc, which now becomes 

      (a) a nightmare to do anything with

      (b) impossible to represent in EMBL/GenBank

      (c) generally lots of rope to hang ourselves with


Two options - punt on these cases in the code... or pop in another
inheritance layer in the interfaces:



             LocationI 
               ^
               |
      ------------------------
  SingleLocationI        SplitLocationI
      |                      sub_Locations defined to return SingleLocationI array
      |
      -----------------
  SimpleLocationI   FuzzyLocationI
    

(does the above crappy ascii art make sense to you?)



I guess this says that every fuzzy case can be built as a single
SplitLocation holding a set of FuzzyLocations (and SimpleLocations), with
no SplitLocation ever nested inside another.
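
To make the extra inheritance layer concrete, here is a rough sketch of the
hierarchy in the diagram. It is written in Python purely for illustration and
is not bioperl code: the class names mirror the interface names in the
diagram, while start_pos, end_pos and the constructor check are assumptions of
mine. The only point it shows is that a split location accepts single
locations and refuses another split location.

```python
from dataclasses import dataclass


class Location:
    """Root interface (LocationI in the diagram)."""

    def start(self) -> int:
        raise NotImplementedError

    def end(self) -> int:
        raise NotImplementedError


class SingleLocation(Location):
    """One contiguous span (SingleLocationI); never contains other locations."""


@dataclass
class SimpleLocation(SingleLocation):
    """Hard coordinates, e.g. 200..300."""
    start_pos: int
    end_pos: int

    def start(self) -> int:
        return self.start_pos

    def end(self) -> int:
        return self.end_pos


@dataclass
class FuzzyLocation(SingleLocation):
    """Coordinates plus flags recording whether '<' or '>' was seen."""
    start_pos: int
    end_pos: int
    start_fuzzy: bool = False  # e.g. <10..100
    end_fuzzy: bool = False    # e.g. 400..>500

    def start(self) -> int:
        # Returns the hard coordinate; callers check start_fuzzy if they care.
        return self.start_pos

    def end(self) -> int:
        return self.end_pos


class SplitLocation(Location):
    """A join of single locations (SplitLocationI); nesting is rejected."""

    def __init__(self, sub_locations):
        subs = list(sub_locations)
        if not subs:
            raise ValueError("a split location needs at least one sub-location")
        for loc in subs:
            if not isinstance(loc, SingleLocation):
                raise TypeError("sub-locations must be single locations")
        self._subs = subs

    def sub_locations(self):
        # Return a copy so callers cannot append a nested split location.
        return list(self._subs)

    def start(self) -> int:
        return min(loc.start() for loc in self._subs)

    def end(self) -> int:
        return max(loc.end() for loc in self._subs)


# The "partial gene" case from earlier in the thread:
partial_gene = SplitLocation([
    FuzzyLocation(10, 100, start_fuzzy=True),
    SimpleLocation(200, 300),
    FuzzyLocation(400, 500, end_fuzzy=True),
])
```

With this shape, code that ignores the '<' and '>' markers can simply call
start() and end() on the split location, and an attempt to build a split
location inside another one fails at construction time instead of surfacing
later when writing EMBL/GenBank.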





???? (ewan sighs again about fuzziness. It is just a can of worms that
no one needs and no one should use)







-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------