[Bioperl-l] Refactoring Locations...

Heikki Lehvaslaiho heikki@ebi.ac.uk
Thu, 27 Jun 2002 17:34:23 +0100


I ran into a small problem with Bio::Locations and would like to slightly 
refactor them.

From my point of view there are three types of exact sequence locations, 
which in feature table notation are 23, 34..55 and 46^47. The first two are 
handled by Bio::Location::Simple and have location_type('EXACT'). The last 
one is lumped into location_type('BETWEEN') together with locations like 
46^78 and is handled by Bio::Location::Fuzzy. The source of the confusion is 
that the feature table definition allows locations like 46^78, which I do 
not think are used anywhere. To stress: the notation 46^47 is essential when 
you have clean insertions between residues.


Currently we have Bio::LocationI, which defines the interface, 
Bio::Location::Simple, and two subclasses of Simple: Bio::Location::Fuzzy 
and Bio::Location::Split.

What I'd like to do is rename the current Simple to Atomic, to serve as a 
common superclass, and recreate Bio::Location::Simple so that it can have 
two values for the method location_type(): 'EXACT' and 'IN-BETWEEN' 
(alternative names: 'TWEEN', 'TWIXT'?). The object will throw an error if 
location_type() is 'IN-BETWEEN' and start() and end() are both defined and 
not adjacent. The length of an 'IN-BETWEEN' location is always zero. The 
default value of location_type() will be 'EXACT'.
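
To make the proposed rules concrete, here is a rough model of them. It is 
written in Python purely as an illustration and is not the Bioperl code; the 
class name SimpleLocation, the 'IN-BETWEEN' spelling, the required start 
argument and the use of ValueError are all assumptions made for the sketch.

```python
class SimpleLocation:
    """Hypothetical model of the proposed Bio::Location::Simple rules."""

    VALID_TYPES = ("EXACT", "IN-BETWEEN")

    def __init__(self, start, end=None, location_type="EXACT"):
        if location_type not in self.VALID_TYPES:
            raise ValueError("unknown location_type: %r" % (location_type,))
        # A single-residue EXACT location such as 23 has end == start.
        if location_type == "EXACT" and end is None:
            end = start
        # An in-between location such as 46^47 must sit between
        # two adjacent residues when both coordinates are given.
        if location_type == "IN-BETWEEN" and end is not None:
            if end - start != 1:
                raise ValueError(
                    "IN-BETWEEN location needs adjacent start and end, "
                    "got %d^%d" % (start, end)
                )
        self.start = start
        self.end = end
        self.location_type = location_type

    def length(self):
        # An in-between location covers no residues at all.
        if self.location_type == "IN-BETWEEN":
            return 0
        return self.end - self.start + 1
```

Under these assumptions, 34..55 has length 22, 46^47 has length 0, and 
46^78 is rejected at construction time.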


In practice the code changes seem easy to make, and there might even be a 
slight speed increase: the current Simple does some things in a slightly 
convoluted way because its methods are inherited by Fuzzy and Split.
Using Bio::Location::Simple in scripts and other modules becomes more 
complicated only if you are concerned about insertions (you should be!). 
You can then test either location_type() or length().
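
The two tests mentioned above could look roughly like this in calling code. 
Again this is only a Python illustration, not Bioperl: the helper names and 
the stand-in objects are hypothetical, and 'IN-BETWEEN' is just one of the 
candidate names for the new location type.

```python
from types import SimpleNamespace


def is_insertion_site_by_type(loc):
    """Check for an insertion site by asking for the location type."""
    return loc.location_type() == "IN-BETWEEN"


def is_insertion_site_by_length(loc):
    """Check for an insertion site by relying on its zero length."""
    return loc.length() == 0


# Stand-in objects exposing only the two methods used above.
exact = SimpleNamespace(location_type=lambda: "EXACT", length=lambda: 22)
between = SimpleNamespace(location_type=lambda: "IN-BETWEEN", length=lambda: 0)

for loc in (exact, between):
    print(is_insertion_site_by_type(loc), is_insertion_site_by_length(loc))
```

Both checks agree as long as the zero-length rule holds only for 
in-between locations.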


The only other place in bioperl core outside Bio::Location that I have found 
to be affected is FTHelper.pm where one more condition needs to be added.


I have almost all the code changes ready for committing.

Any comments?

	-Heikki

-- 
______ _/      _/_____________________________________________________
       _/      _/                      http://www.ebi.ac.uk/mutations/
      _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
     _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
    _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
   _/  _/  _/  Cambs. CB10 1SD, United Kingdom
      _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________