[Bioperl-l] Refactoring Locations...

Heikki Lehvaslaiho heikki@ebi.ac.uk
Mon, 01 Jul 2002 16:19:40 +0100


'IN-BETWEEN' it will be.

	-Heikki

Lincoln Stein wrote:
> The suggested refactoring sounds correct.  I prefer IN-BETWEEN to TWEEN or 
> TWIXT.
> 
> As a meta comment, life would be much easier if positions were described 
> (perhaps internally) as zero-based half-open intervals, which is the way that 
> all sensible graphics code does it (I first learned the concepts working with 
> Apple's QuickDraw).  In half-open intervals, the coordinates refer to the 
> spaces between the nucleotides, rather than to the nucleotides themselves.  
> For the dinucleotide AG, the following mappings hold:
> 
> 	coordinate		sequence
> 
> 	(0,1)			A
> 	(0,2)			AG
> 	(1,1)			space between A & G
> 
> Note that in half-open intervals, the length of the sequence is always end 
> minus start, and that you can do coordinate arithmetic without adding and 
> subtracting 1's.
> 
> Lincoln
> 
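
A minimal sketch of the half-open arithmetic described above, in Python
rather than Perl purely for illustration. The function names are
hypothetical and are not BioPerl API; the conversion to feature-table style
assumes a zero-length interval should map to an in-between location flanked
by the two neighbouring residues.

```python
# Hypothetical helpers for zero-based half-open coordinates; not BioPerl API.

def length(start, end):
    """Length of the half-open interval (start, end): always end - start."""
    return end - start


def subseq(seq, start, end):
    """Residues covered by the interval; Python slices are already half-open."""
    return seq[start:end]


def to_feature_table(start, end):
    """Convert to 1-based inclusive coordinates plus a location type.

    Assumption: a zero-length interval becomes an 'IN-BETWEEN' location
    written with its two flanking residues, e.g. (1, 1) -> 1^2.
    No bounds checking is done against the sequence.
    """
    if start == end:
        return (start, start + 1, "IN-BETWEEN")
    return (start + 1, end, "EXACT")


seq = "AG"
print(subseq(seq, 0, 1), length(0, 1))   # first residue and its length
print(subseq(seq, 0, 2), length(0, 2))   # both residues
print(to_feature_table(1, 1))            # the space between A and G
```

Only the conversion at the boundary needs a +1; lengths and slices inside
the half-open system need none.
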
> 
> On Thursday 27 June 2002 12:34 pm, Heikki Lehvaslaiho wrote:
> 
>>I ran into a small problem with Bio::Locations and would like to slightly
>>refactor them.
>>
>> From my point of view there are three types of exact sequence locations
>>which in feature table notation are: 23, 34..55 and 46^47. The first two
>>are handled by Bio::Location::Simple and have location_type('EXACT'). The
>>last one is lumped into location_type('BETWEEN') together with locations
>>like 46^78 and handled by Bio::Location::Fuzzy. The source of the
>>confusion is that the feature table definition allows for locations like
>>46^78, which I do not think are used anywhere. To stress: the notation
>>46^47 is essential when you have clean insertions between residues.
>>
>>
>>Currently we have Bio::LocationI which defines the interface,
>>Bio::Location::Simple and two subclasses of Simple: Bio::Location::Fuzzy
>>and Bio::Location::Split.
>>
>>What I'd like to do is rename the current Simple to Atomic, to be a
>>common superclass, and recreate Bio::Location::Simple so that it can have
>>two values for the method location_type(): 'EXACT' and 'IN-BETWEEN'
>>('TWEEN', 'TWIXT' ?). The object will throw an error if location_type() is
>>'IN-BETWEEN' and start() and end() are both defined and not adjacent. The
>>length of an 'IN-BETWEEN' location is always zero. The default value of
>>location_type() will be 'EXACT'.
>>
>>
>>In practice the code changes seem to be easy to make and there might even
>>be a slight speed increase: the current Simple does some things in a
>>slightly convoluted way because its methods are inherited by Fuzzy and
>>Split. Using Bio::Location::Simple in scripts and other modules is made
>>more complicated only if you are concerned about insertions (you should
>>be!). You can then test either location_type() or length().
>>
>>
>>The only other place in bioperl core outside Bio::Location that I have
>>found to be affected is FTHelper.pm where one more condition needs to be
>>added.
>>
>>
>>I have almost all the code changes ready for committing.
>>
>>Any comments?
>>
>>	-Heikki
> 
> 
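
The validation rule proposed in the quoted message could look roughly like
the sketch below. It is written in Python for illustration only; the class
name and constructor are hypothetical stand-ins, not the actual
Bio::Location::Simple code, and the sketch assumes start and end are always
both given as 1-based inclusive coordinates.

```python
class SimpleLocation:
    """Hypothetical stand-in for the proposed Simple location behaviour."""

    VALID_TYPES = ("EXACT", "IN-BETWEEN")

    def __init__(self, start, end, location_type="EXACT"):
        if location_type not in self.VALID_TYPES:
            raise ValueError("unknown location_type: %r" % (location_type,))
        # An in-between location such as 46^47 must sit between
        # two adjacent residues; 46^78 is rejected.
        if location_type == "IN-BETWEEN" and end - start != 1:
            raise ValueError("IN-BETWEEN needs adjacent start and end")
        self.start = start
        self.end = end
        self.location_type = location_type

    def length(self):
        # An insertion point covers no residues.
        if self.location_type == "IN-BETWEEN":
            return 0
        # 1-based inclusive coordinates, hence the +1.
        return self.end - self.start + 1


print(SimpleLocation(23, 23).length())                 # single residue: 23
print(SimpleLocation(34, 55).length())                 # range: 34..55
print(SimpleLocation(46, 47, "IN-BETWEEN").length())   # insertion site: 46^47
```

A caller concerned about insertions can then check either the location type
or a zero length, as suggested in the quoted message.
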


-- 
______ _/      _/_____________________________________________________
       _/      _/                      http://www.ebi.ac.uk/mutations/
      _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
     _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
    _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
   _/  _/  _/  Cambs. CB10 1SD, United Kingdom
      _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________