[Bioperl-l] Bio::Location::Fuzzy, Bio::Location::Split
Ewan Birney
birney@ebi.ac.uk
Thu, 25 Jan 2001 16:38:21 +0000 (GMT)
On Thu, 25 Jan 2001, Jason Stajich wrote:
>
> On Thu, 25 Jan 2001, Ewan Birney wrote:
>
> > On Thu, 25 Jan 2001, Jason Stajich wrote:
> >
> > > On Wed, 24 Jan 2001, Hilmar Lapp wrote:
> > >
> > > > Jason Stajich wrote:
> > > > >
> > > > > I'd just like to reiterate - beware bioperl-live is development code.
> > > > >
> > > > > I added these handlers for Fuzzy and Split features. I decided to create
> > > > > methods start_fuzzy, end_fuzzy for Bio::Location::Fuzzy to handle whether
> > > > > or not we saw the <, > descriptors. I probably need some more test cases
> > > >
> > > > I may have missed the obvious solution, but how are we going to
> > > > distinguish 'unknown start/end' from 'somewhere in between'? That is,
> > > > '<150' means 'before position 150', so it is not obvious what minimal
> > > > start to return, whereas '120.130' means it's between two known
> > > > positions. Will I have to test fuzzy_start() before I'm allowed to
> > > > safely call min_start()? (no, I don't want to suggest exceptions ...
> > > > :O)
> > >
> > > Hmm, perhaps I was confused. I thought Split Location would deal with
> > > min_start/max_end. I believe fuzzy can have 3 qualities, a fuzzy start
> > > (<150..100) a fuzzy end (90..<100) and fuzzy 'range' (1.12) [for lack of a
> > > better word, suggestions welcome]. All 3 can be present in the same
> > > location so they have to be independent operators. When you call
> > > start, it will return what it thinks is the start but you'll have to
> > > test to see if the range or the start is fuzzy ($loc->range_fuzzy ||
> > > $loc->start_fuzzy). Perhaps that is too tedious? I'd rather not throw an
> > > exception here, but can be persuaded.
> >
> > In my experience it is crucial to treat
> > join((<10..100),(200..300),(400..500>)) as a class of SplitLocation, not
> > as a class of FuzzyFeature.
> >
> > The above syntax is the most commonly used form of "fuzziness", and nearly
> > everyone discards the leading '<' and trailing '>', since they just mean
> > "partial gene", and interprets the coordinates as hard (exact) positions.
>
> Okay, I was interpreting this as a SplitLocation with 3 LocationI objects,
> 2 of which are FuzzyLocations...
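A minimal sketch of the model discussed in this thread, written in Python only for brevity (it is not the BioPerl API). It assumes a single location carries three independent flags, start_fuzzy, end_fuzzy and range_fuzzy, as Jason proposes, and that a join(...) is parsed into a split location whose children are single locations. The '<' and '>' are stripped from the coordinates (the "hard" interpretation Ewan describes) but remembered as flags. The class names and the parse_single/parse_location helpers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class SingleLocation:
    """One contiguous span; each fuzzy quality is an independent flag."""
    start: int
    end: int
    start_fuzzy: bool = False  # e.g. '<10..100': true start lies before 10
    end_fuzzy: bool = False    # e.g. '400..500>': true end lies after 500
    range_fuzzy: bool = False  # e.g. '120.130': somewhere between 120 and 130

    def is_fuzzy(self) -> bool:
        # Callers test this (or a single flag) before trusting start/end.
        return self.start_fuzzy or self.end_fuzzy or self.range_fuzzy


@dataclass
class SplitLocation:
    """join(...): a flat list of single locations, no nesting."""
    sub_locations: List[SingleLocation]


def _pos(text: str) -> Tuple[int, bool]:
    """Return (position, had '<' or '>') for tokens like '10', '<10', '500>'."""
    text = text.strip()
    bare = text.strip("<>")
    return int(bare), bare != text


def parse_single(token: str) -> SingleLocation:
    # Accept both '(<10..100)' as written above and plain '<10..100'.
    token = token.strip().strip("()")
    if ".." in token:
        left, right = token.split("..", 1)
        start, start_fuzzy = _pos(left)
        end, end_fuzzy = _pos(right)
        # Hard interpretation: keep the numbers, remember '<'/'>' as flags.
        return SingleLocation(start, end, start_fuzzy=start_fuzzy, end_fuzzy=end_fuzzy)
    if "." in token:
        # '120.130': one position somewhere between two known positions.
        low, high = token.split(".", 1)
        return SingleLocation(int(low), int(high), range_fuzzy=True)
    pos, _ = _pos(token)
    return SingleLocation(pos, pos, start_fuzzy=token.startswith("<"),
                          end_fuzzy=token.endswith(">"))


def parse_location(text: str) -> Union[SingleLocation, SplitLocation]:
    text = text.strip()
    if text.startswith("join(") and text.endswith(")"):
        parts = text[len("join("):-1].split(",")
        return SplitLocation([parse_single(p) for p in parts])
    return parse_single(text)


loc = parse_location("join((<10..100),(200..300),(400..500>))")
print([(s.start, s.end, s.is_fuzzy()) for s in loc.sub_locations])
# [(10, 100, True), (200, 300, False), (400, 500, True)]
```

Because the flags are independent, a caller that only wants hard coordinates can ignore them, while a caller that cares can check is_fuzzy() (or one specific flag) before trusting start and end.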
Ok. This is a good solution here, but the trouble with this recursion is
that of course it allows a SplitLocationI that has-a SplitLocationI, and
so on, which then becomes
(a) a nightmare to do anything with
(b) impossible to represent in EMBL/GenBank
(c) generally lots of rope to hang ourselves with
Two options: punt on these cases in the code, or pop in another
inheritance layer in the interfaces:
                 LocationI
                     ^
                     |
        ------------------------
        |                      |
 SingleLocationI         SplitLocationI
        |                (sub_Locations defined to return
        |                 SingleLocationI array)
  -----------------
  |               |
SimpleLocationI  FuzzyLocationI
(does the above crappy ascii art make sense to you?)
I guess this says that all the fuzzy cases can be made as a combination
of a single SplitLocation with a set of FuzzyLocations.
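The extra inheritance layer can be sketched as below, again in Python and purely as an illustration. The concrete classes SimpleLocation, FuzzyLocation and SplitLocation and the sub_locations method name are hypothetical stand-ins for the interfaces in the diagram, not BioPerl code. The point it shows is that once a split only accepts SingleLocationI children, "SplitLocationI has-a SplitLocationI" is rejected at construction time instead of having to be handled everywhere downstream.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class LocationI(ABC):
    """Root interface: every location can report its outer coordinates."""

    @abstractmethod
    def start(self) -> int: ...

    @abstractmethod
    def end(self) -> int: ...


class SingleLocationI(LocationI):
    """One contiguous span. Never contains other locations."""


class SplitLocationI(LocationI):
    """A join of spans; sub_locations() returns only SingleLocationI objects."""

    @abstractmethod
    def sub_locations(self) -> List[SingleLocationI]: ...


class SimpleLocation(SingleLocationI):
    def __init__(self, start: int, end: int):
        self._start, self._end = start, end

    def start(self) -> int:
        return self._start

    def end(self) -> int:
        return self._end


class FuzzyLocation(SingleLocationI):
    def __init__(self, start: int, end: int,
                 start_fuzzy: bool = False, end_fuzzy: bool = False):
        self._start, self._end = start, end
        self.start_fuzzy, self.end_fuzzy = start_fuzzy, end_fuzzy

    def start(self) -> int:
        return self._start

    def end(self) -> int:
        return self._end


class SplitLocation(SplitLocationI):
    def __init__(self, parts: Sequence[LocationI]):
        if not parts:
            raise ValueError("a split location needs at least one part")
        for part in parts:
            # The extra interface layer turns split-within-split into an error.
            if not isinstance(part, SingleLocationI):
                raise TypeError(
                    f"sub-location must be a SingleLocationI, got {type(part).__name__}")
        self._parts = list(parts)

    def sub_locations(self) -> List[SingleLocationI]:
        return list(self._parts)

    def start(self) -> int:
        return min(p.start() for p in self._parts)

    def end(self) -> int:
        return max(p.end() for p in self._parts)


# join(<10..100,200..300,400..500>) as one flat split of single locations.
gene = SplitLocation([FuzzyLocation(10, 100, start_fuzzy=True),
                      SimpleLocation(200, 300),
                      FuzzyLocation(400, 500, end_fuzzy=True)])
print(gene.start(), gene.end())  # 10 500

try:
    SplitLocation([gene])  # nesting a split inside a split
except TypeError as err:
    print("rejected:", err)
```

Checking in the constructor is only one way to enforce the restriction; a static type checker reading the annotations, or a step that flattens nested joins on input, would be alternatives.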
???? (Ewan sighs again about fuzziness. It is just a can of worms that
no one needs and no one should use.)
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------