[Bioperl-l] SimpleAlign changes

Heikki Lehvaslaiho heikki@ebi.ac.uk
Thu, 10 May 2001 10:37:40 +0100


Dear All,

Last week I decided to teach myself SimpleAlign. I found myself making
corrections and changes into documentation as well as into code and
writing dozens of tests. Before I commit anything, I'd like to list
what I found.

1. While Bio::AlignIO is declared to be a prefered way of reading in 
   alignments, Bio::SimpleAlign contains a number of methods doing the 
   same. Should't these be removed? Is all the funcionality of these 
   methods in Bio::AlignIO? [not done]

2. The method names in SimpleAlign should follow bioperl conventions
   (all lower case with words separated by undescores).
   The old names could be left but documetation should point to new 
   names. [done] Should depreciated names print a warning? [not done]

    	addSeq		->	add_seq
	eachSeq		->	each_seq
	eachSeqWithId	->	each_seq_with_id
        length_aln	->	length
	removeSeq	->	remove_seq


3. There seems to be two methods: maxname_length() and
maxnse_lenght(), 
   which are not used at all. There is a method
maxdisplayname_lenght() 
   which seems to have all the functionality needed. Can they be 
   removed or pointed to maxdisplayname_length()[done]?

4. Methods get_displayname() and set_displayname() should be merged 
   into displayname() and a test should be added to prevent using 
   nonexistant sequence names. (The method manipulates hashes directly 
   and new keys are automatically created when used: autovivication!)
   [done]

    Adding the test brough into light a problem: $seq->get_nse() is of 
    form "name/start-end" and $aln->addSeq/add_seq was using form
    "name-start-end". Changed '-' -> '/' in add_seq(). [done]

5. Bio::LocatableSeq has this key method get_nse() which has a 
   counterintuitive name. Couldn't we use name() while leaving the old 
   name for backward compatibility. [done] 

   Also, get_nse() accepts arguments 
   for separator characters (/ and '). This seems to be a relic  
   from pre-bioperl times.  Use of other characters needs be 
   discouraged/ability removed? [not done]

   Method name/get_nse() needs internal tests to make sure that all
   three values are set. [done]


6. Methods affecting all sequences in the alignment 
   (set_displayname_count(), set_displayname_flat(), 
   set_displayname_normal(), uppercase() ) need to return true (1) so 
   that they can be tested. [done]

7. Number of methods were calling depreciated PrimarySeqI methods. 
   [fixed]

8. Blast output parser Bio::Toold::BPbl2seq was leaving one space 
   character to the end of the sequence id resulting a nonsatandard 
   LocatableSeq name. [fixed]



After my changes all the AlignIO test pass and SimpleAlign test count
is up from 6 to 32. The only (non-IO) method that I did not test was
purge().

	-Heikki 

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________