[Bioperl-l] Refactoring Locations...

Heikki Lehvaslaiho heikki@ebi.ac.uk
Tue, 02 Jul 2002 09:14:08 +0100


It's done.

If you see any errors generated by the location code in the CVS HEAD, I'd 
be happy to have a look at them.

Bio::Location::Fuzzy now complains if a location like 23^24 is assigned to it.
You should use Bio::Location::Simple with location_type('IN-BETWEEN').
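
As a rough illustration of the rule described in the quoted proposal below, 
here is a small language-neutral sketch in Python. It is not the BioPerl 
code: the class name SimpleLocation and its methods are made up for this 
example, and it only models what the proposal states (default type 'EXACT', 
an error for a non-adjacent 'IN-BETWEEN' location, and zero length for 
'IN-BETWEEN').

```python
class SimpleLocation:
    """Toy model of the proposed rule; hypothetical, not the BioPerl class."""

    VALID_TYPES = ("EXACT", "IN-BETWEEN")

    def __init__(self, start, end=None, location_type="EXACT"):
        if location_type not in self.VALID_TYPES:
            raise ValueError(f"unknown location_type: {location_type!r}")
        if location_type == "EXACT" and end is None:
            # A single position such as 23 starts and ends on the same residue.
            end = start
        if (location_type == "IN-BETWEEN"
                and start is not None and end is not None
                and abs(end - start) != 1):
            # 46^47 is fine; 46^78 is rejected because the ends are not adjacent.
            raise ValueError(
                f"IN-BETWEEN location {start}^{end} must have adjacent start and end"
            )
        self.start = start
        self.end = end
        self.location_type = location_type

    def length(self):
        # A site between two residues covers no residues at all.
        if self.location_type == "IN-BETWEEN":
            return 0
        return self.end - self.start + 1


point = SimpleLocation(23)                         # feature table: 23
span = SimpleLocation(34, 55)                      # feature table: 34..55
insertion = SimpleLocation(46, 47, "IN-BETWEEN")   # feature table: 46^47

print(point.length(), span.length(), insertion.length())  # prints: 1 22 0
```

Code that cares about insertions can then check either the location type or 
a length of zero, as suggested in the proposal.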

The Location.t tests failed overnight because I forgot to add and commit 
Bio::Location::Atomic. Fixed.


There are really quite a lot of errors and warnings when running the tests 
in the HEAD. It is difficult to see which are important and which are 
caused by missing binaries.

	-heikki

Heikki Lehvaslaiho wrote:
> 
> I ran into a small problem with Bio::Locations and would like to 
> slightly refactor them.
> 
> From my point of view there are three types of exact sequence locations, 
> which in feature table notation are: 23, 34..55 and 46^47. The first two 
> are handled by Bio::Location::Simple and have location_type('EXACT'). 
> The last one is lumped into location_type('BETWEEN') together with 
> locations like 46^78 and handled by Bio::Location::Fuzzy. The source of 
> the confusion is that the feature table definition allows for locations 
> like 46^78, which I do not think are used anywhere. To stress, the 
> notation 46^47 is essential when you have clean insertions between 
> residues.
> 
> 
> Currently we have Bio::LocationI, which defines the interface, 
> Bio::Location::Simple and two subclasses of Simple: Bio::Location::Fuzzy 
> and Bio::Location::Split.
> 
> What I'd like to do is rename the current Simple to Atomic, to be 
> a common superclass, and recreate Bio::Location::Simple so that it can 
> have two values for the method location_type(): 'EXACT' and 
> 'IN-BETWEEN' ('TWEEN', 'TWIXT' ?). The object will throw an error if 
> location_type() is 'TWEEN' and start() and end() are both defined and 
> not adjacent. The length of a 'TWIXT' location is always zero. The 
> default value of location_type() will be 'EXACT'.
> 
> 
> In practice the code changes seem to be easy to make and there might 
> even be a slight speed increase: the current Simple does some things in 
> a slightly convoluted way because its methods are inherited by Fuzzy and 
> Split. Using Bio::Location::Simple in scripts and other modules becomes 
> more complicated only if you are concerned about insertions (you should 
> be!). You can then test either location_type() or length().
> 
> 
> The only other place in bioperl core outside Bio::Location that I have 
> found to be affected is FTHelper.pm where one more condition needs to be 
> added.
> 
> 
> I have almost all the code changes ready for committing.
> 
> Any comments?
> 
>     -Heikki
> 


-- 
______ _/      _/_____________________________________________________
       _/      _/                      http://www.ebi.ac.uk/mutations/
      _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
     _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
    _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
   _/  _/  _/  Cambs. CB10 1SD, United Kingdom
      _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________