[Biopython-dev] tiny Align.AlignInfo patch

Ivan Rossi ivan at biodec.com
Tue Dec 13 15:35:05 EST 2005


Dear BioPythoneers,
   I am submitting a tiny patch to the pos_specific_score_matrix method of 
Bio.Align.AlignInfo.

It allows for the generation of PSSMs composed of the "alphabet+gap" symbols. 
I use it all the time to generate 21-symbol PSSMs for proteins, which we use 
as inputs for neural networks and HMMs.

The patch is not invasive at all, and it preserves the default behavior of 
AlignInfo.pos_specific_score_matrix().
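
To illustrate the idea without depending on Biopython, here is a minimal 
standalone sketch of per-column counting with and without the gap symbol. 
The function column_counts, the constants, and the toy alignment are all 
hypothetical names made up for this example; only the drop_gap_char flag 
mirrors the patch, and the real method returns a PSSM object rather than a 
list of dicts.

```python
from collections import Counter

# Assumption for this sketch: the 20 standard amino acids and "-" as the gap.
PROTEIN_LETTERS = "ACDEFGHIKLMNPQRSTVWY"
GAP_CHAR = "-"


def column_counts(sequences, letters=PROTEIN_LETTERS, gap_char=GAP_CHAR,
                  drop_gap_char=True):
    """Count symbols per alignment column (a toy stand-in for a PSSM).

    With drop_gap_char=True (the default) the gap symbol is left out,
    so each row has one entry per letter. With drop_gap_char=False the
    gap symbol gets its own entry ("alphabet+gap").
    """
    symbols = letters if drop_gap_char else letters + gap_char
    matrix = []
    for column in zip(*sequences):  # iterate over alignment columns
        seen = Counter(column)
        matrix.append({symbol: seen.get(symbol, 0) for symbol in symbols})
    return matrix


# Hypothetical three-sequence alignment, four columns wide.
alignment = ["MK-V", "MKAV", "M-AV"]

default_rows = column_counts(alignment)                      # gap dropped
gapped_rows = column_counts(alignment, drop_gap_char=False)  # gap kept

print(len(default_rows[0]), len(gapped_rows[0]))
print(gapped_rows[2]["A"], gapped_rows[2][GAP_CHAR])
```

With the default, each row has 20 entries; with drop_gap_char=False, each 
row has 21, and the gap count of a column becomes available as a feature.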

I hope it will be considered for inclusion in CVS.

Ivan

--
  Ivan Rossi, Ph.D. - ivan AT biodec dot com OR ivan dot rossi3 AT unibo dot it
  BioDec s.r.l., Via Fanin 48, I-40127 Bologna (Italy)
  Phone: +39-051-4200321 - fax: +39-051-4200317 - web: www.biodec.com
-------------- next part --------------
*** AlignInfo.py.orig	Tue Dec 13 18:09:22 2005
--- AlignInfo.py	Tue Dec 13 18:18:40 2005
***************
*** 335,341 ****
  
  
      def pos_specific_score_matrix(self, axis_seq = None,
!                                   chars_to_ignore = []):
          """Create a position specific score matrix object for the alignment.
  
          This creates a position specific score matrix (pssm) which is an
--- 335,342 ----
  
  
      def pos_specific_score_matrix(self, axis_seq = None,
!                                   chars_to_ignore = [],
!                                   drop_gap_char = True):
          """Create a position specific score matrix object for the alignment.
  
          This creates a position specific score matrix (pssm) which is an
***************
*** 348,353 ****
--- 349,357 ----
          put on the axis of the PSSM. This should be a Seq object. If nothing
          is specified, the consensus sequence, calculated with default
          parameters, will be used.
+         o drop_gap_char - An optional boolean (default True). If True, the gap
+         symbol is left out of the PSSM; set it to False to generate the
+         "alphabet+gap" PSSMs used by some remote-homology detection codes.
  
          Returns:
          o A PSSM (position specific score matrix) object.
***************
*** 355,363 ****
          # determine all of the letters we have to deal with
          all_letters = self.alignment._alphabet.letters
  
!         # if we have a gap char, add it to stuff to ignore
!         if isinstance(self.alignment._alphabet, Alphabet.Gapped):
!             chars_to_ignore.append(self.alignment._alphabet.gap_char)
          
          for char in chars_to_ignore:
              all_letters = string.replace(all_letters, char, '')
--- 359,368 ----
          # determine all of the letters we have to deal with
          all_letters = self.alignment._alphabet.letters
  
!         if drop_gap_char:
!             # if we have a gap char, add it to stuff to ignore
!             if isinstance(self.alignment._alphabet, Alphabet.Gapped):
!                 chars_to_ignore.append(self.alignment._alphabet.gap_char)
          
          for char in chars_to_ignore:
              all_letters = string.replace(all_letters, char, '')

