[BioRuby] Consensus sequence
    James Keener 
    jimktrainslists at gmail.com
       
    Wed Aug  4 19:32:02 UTC 2010
    
    
  
At alignment.rb:118 there is this function:
      # Returns consensus character of the site.
      # If consensus is found, returns a single-letter string.
      # If not, returns nil.
      def consensus_string(threshold = 1.0)
        return nil if self.size <= 0
        return self[0] if self.sort.uniq.size == 1
        h = Hash.new(0)
        self.each { |x| h[x] += 1 }
        total = self.size
        b = h.to_a.sort do |x,y|
          z = (y[1] <=> x[1])
          z = (self.index(x[0]) <=> self.index(y[0])) if z == 0
          z
        end
        if total * threshold <= b[0][1] then
          b[0][0]
        else
          nil
        end
      end
Now, I have 2 questions about it.
1) Why is it sorting? Shouldn't it use a linear search?
2) How can the count of the greatest residue (b[0][1]) be larger than or equal to the total number of residues?
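To make question 1 concrete, here is a rough sketch of what a linear scan might look like. It is not BioRuby code: consensus_linear is a hypothetical standalone helper that takes the site as a plain Array, and it assumes the tie-break should stay the same as in the sort above (the element that appears first in the site wins).

```ruby
# Hypothetical helper, not part of BioRuby: picks the most frequent
# element of a site with two linear passes instead of a sort.
def consensus_linear(site, threshold = 1.0)
  return nil if site.empty?

  # Pass 1: count occurrences of each element.
  counts = Hash.new(0)
  site.each { |x| counts[x] += 1 }

  # Pass 2: walk the site in order; the strict ">" keeps the element
  # that occurs first when two elements have the same count.
  best = nil
  best_count = 0
  site.each do |x|
    if counts[x] > best_count
      best = x
      best_count = counts[x]
    end
  end

  # Same threshold test as the original function.
  site.size * threshold <= best_count ? best : nil
end

consensus_linear(%w[a a a b], 0.75)  # 4 * 0.75 = 3.0 <= 3
consensus_linear(%w[a a a b])        # 4 * 1.0 = 4.0 > 3
```

On question 2, in this sketch the largest count can never exceed the site size, so with the default threshold of 1.0 the test only passes when every element is identical. The threshold test only does real work when a value below 1.0 is passed in.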
Also, I am adding a set of functions (group entropy, plus some bookkeeping and housekeeping changes) and would like to contribute them back. What is the best way to commit them back?
Jim
    
    