[Bioperl-l] Refactoring Locations...

Ewan Birney birney@ebi.ac.uk
Sat, 29 Jun 2002 12:25:45 +0100 (BST)


On Fri, 28 Jun 2002, Chris Mungall wrote:

> 
> I second this. gadfly works in space-oriented coordinates. you have to be
> super-rigorous in import/export but otherwise it's a much better system,
> it's ridiculous having to import an awkward fuzzy system for representing
> insertions/splice sites etc.
> 
> is it really too late to have us switch to this system? I can't see how it
> would be done without extreme pain but I think it'd be worth it in the
> end. bioperl2.0?

I say no. Really. 


We have 20 years of legacy in inclusive coordinates. As much as I would
love to work in half-open coordinates, the number of bugs,
misunderstandings and idiocies that a switch would cause is too much.


In tight projects (e.g. Gadfly, my own Wise2 package) where everyone is 100%
mind-synced, I think one can make the change, and it is much nicer to
program in. But in Bioperl, with this loose distribution of people, we just
can't do it.


I vote STRONG no. We stick to what has been published/stored/used for the
last 20 years. +1 is not that hard to put in.
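
To make the "+1" concrete, here is a minimal sketch of the conversion between
1-based inclusive coordinates (feature table style) and the zero-based
half-open intervals discussed below. It is written in Python purely for
illustration; the function names are hypothetical and are not part of
Bioperl or any other library.

```python
def inclusive_to_half_open(start, end):
    """Convert a 1-based inclusive range (e.g. 34..55) to a 0-based half-open interval."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Only the start moves: this is the "-1" on import.
    return start - 1, end


def half_open_to_inclusive(start, end):
    """Convert a non-empty 0-based half-open interval back to 1-based inclusive."""
    if start < 0 or end <= start:
        raise ValueError("empty intervals have no inclusive form")
    # Only the start moves: this is the "+1" on export.
    return start + 1, end


def insertion_site(left_base):
    """Map a feature table site such as 46^47 (pass 46) to a zero-length interval."""
    # The space after the 46th base is coordinate 46 in space-oriented terms.
    return left_base, left_base


def half_open_length(start, end):
    """Length in half-open coordinates is simply end minus start."""
    return end - start
```

With these helpers, 34..55 becomes (33, 55) with length 22, a single base 23
becomes (22, 23), and the insertion site 46^47 becomes (46, 46) with length 0.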

> 
> On Fri, 28 Jun 2002, Lincoln Stein wrote:
> 
> > The suggested refactoring sounds correct.  I prefer IN-BETWEEN to TWEEN or
> > TWIXT.
> >
> > As a meta comment, life would be much easier if positions were described
> > (perhaps internally) as zero-based half open intervals, which is the way that
> > all sensible graphics code does it (I first learned the concepts working with
> > Apple's QuickDraw).  In half-open intervals, the coordinates refer to the
> > spaces between the nucleotides, rather than to the nucleotides themselves.
> > For the dinucleotide AG, the following mappings hold:
> >
> > 	coordinate		sequence
> >
> > 	(0,1)			A
> > 	(0,2)			AG
> > 	(1,1)			space between A & G
> >
> > Note that in half-open intervals, the length of the sequence is always end
> > minus start, and that you can do coordinate arithmetic without adding and
> > subtracting 1's.
> >
> > Lincoln
> >
> >
> > On Thursday 27 June 2002 12:34 pm, Heikki Lehvaslaiho wrote:
> > > I ran into a small problem with Bio::Locations and would like to slightly
> > > refactor them.
> > >
> > > From my point of view there are three types of exact sequence locations
> > > which in feature table notation are: 23, 34..55 and 46^47. The first two
> > > are handled by Bio::Location::Simple and have location_type('EXACT'). The
> > > last one is lumped into location_type('BETWEEN') together with locations
> > > like 46^78 and handled by Bio::Location::Fuzzy. The source for the
> > > confusion is that the feature table definition allows for locations like
> > > 46^78 which I do not think are used anywhere. To stress, notation 46^47 is
> > > essential when you have clean insertions between residues.
> > >
> > >
> > > Currently we have Bio::LocationI which defines the interface,
> > > Bio::Location::Simple and two subclasses of Simple: Bio::Location::Fuzzy
> > > and Bio::Location::Split.
> > >
> > > What I'd like to have is to rename the current Simple into Atomic to be a
> > > common superclass and recreate Bio::Location::Simple so that it can have
> > > two values for the method location_type(): 'EXACT' and  'IN-BETWEEN'
> > > ('TWEEN', 'TWIXT' ?). The object will throw an error if location_type() is
> > > 'TWEEN' and start() and end() are both defined and not adjacent. The length
> > > of 'TWIXT' location is always zero. The default value of location_type()
> > > will be 'EXACT'.
> > >
> > >
> > > In practice the code changes seem to be easy to make and there might even
> > > be a slight speed increase: the current Simple does some things in a
> > > slightly convoluted way because its methods are inherited by Fuzzy and
> > > Split. Using Bio::Location::Simple in scripts and other modules is made
> > > more complicated only if you are concerned about insertions (you should
> > > be!). You can then test either location_type() or length().
> > >
> > >
> > > The only other place in bioperl core outside Bio::Location that I have
> > > found to be affected is FTHelper.pm where one more condition needs to be
> > > added.
> > >
> > >
> > > I have almost all the code changes ready for committing.
> > >
> > > Any comments?
> > >
> > > 	-Heikki
> >
> >
> 
> 
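
For reference, the rules in Heikki's proposal quoted above (location_type
defaults to 'EXACT', an 'IN-BETWEEN' location must have adjacent start and end
when both are defined, and its length is always zero) can be sketched roughly
as follows. This is a Python illustration only: the class name, constructor
and the extra checks on 'EXACT' locations are assumptions, not the actual
Bio::Location::Simple code.

```python
class SimpleLocation:
    """Hypothetical sketch of the proposed Simple location (1-based inclusive)."""

    VALID_TYPES = ("EXACT", "IN-BETWEEN")

    def __init__(self, start=None, end=None, location_type="EXACT"):
        if location_type not in self.VALID_TYPES:
            raise ValueError("unknown location_type: %r" % (location_type,))
        if location_type == "EXACT":
            # Assumption: an exact location needs a start; end defaults to start.
            if start is None:
                raise ValueError("EXACT location needs a start")
            if end is None:
                end = start
            if end < start:
                raise ValueError("end must not be smaller than start")
        elif start is not None and end is not None and end != start + 1:
            # From the proposal: reject sites like 46^78, accept 46^47.
            raise ValueError("IN-BETWEEN location needs adjacent start and end")
        self.start = start
        self.end = end
        self.location_type = location_type

    def length(self):
        # From the proposal: an IN-BETWEEN location always has length zero.
        if self.location_type == "IN-BETWEEN":
            return 0
        return self.end - self.start + 1
```

Under these assumptions SimpleLocation(34, 55).length() is 22,
SimpleLocation(46, 47, "IN-BETWEEN").length() is 0, and
SimpleLocation(46, 78, "IN-BETWEEN") raises an error.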

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------