[BioRuby] Consensus sequence

James Keener jimktrainslists at gmail.com
Wed Aug 4 19:32:02 UTC 2010


At alignment.rb:118 there is this function:

      # Returns consensus character of the site.
      # If consensus is found, returns a single-letter string.
      # If not, returns nil.
      def consensus_string(threshold = 1.0)
        return nil if self.size <= 0
        return self[0] if self.sort.uniq.size == 1
        h = Hash.new(0)
        self.each { |x| h[x] += 1 }
        total = self.size
        b = h.to_a.sort do |x,y|
          z = (y[1] <=> x[1])
          z = (self.index(x[0]) <=> self.index(y[0])) if z == 0
          z
        end
        if total * threshold <= b[0][1] then
          b[0][0]
        else
          nil
        end
      end
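To see what the sort actually produces, the snippet below copies the method's counting step and comparator onto a plain array. The example site `%w[G A C A C]` and the variable names are made up for illustration. Pairs come out ordered by descending count, and ties go to the residue that appears first in the site. Ruby's `Array#sort` is not guaranteed to be stable, so that second comparison is what makes the order deterministic.

```ruby
# Example site (made up): one column of an alignment.
site = %w[G A C A C]

# Count each residue, as consensus_string does.
counts = Hash.new(0)
site.each { |x| counts[x] += 1 }

# Comparator copied from consensus_string:
# descending count first, then earliest position in the site.
ordered = counts.to_a.sort do |x, y|
  z = (y[1] <=> x[1])
  z = (site.index(x[0]) <=> site.index(y[0])) if z == 0
  z
end

p ordered  # [["A", 2], ["C", 2], ["G", 1]]
```

The sort only runs over distinct residues, so it is cheap in practice. Still, only the first pair is ever used afterwards.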

Now, I have two questions about it.
1) Why does it sort? Wouldn't a single linear search for the most frequent residue be enough?
2) With the default threshold of 1.0, how can the count of the most frequent residue (b[0][1]) ever be greater than or equal to the total number of residues?
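For comparison, here is a minimal sketch of the linear-search alternative. It is written as a hypothetical standalone method `linear_consensus(site, threshold)`, not BioRuby's actual code. It counts residues once and keeps the first residue to reach the highest count, which mirrors the `index` tie-break. It then applies the same `total * threshold` test. It assumes Ruby 1.9 or later, where a Hash iterates in insertion order. As the examples show, with the default threshold of 1.0 the test only passes for a uniform site, so a threshold below 1.0 is needed for a majority-style consensus.

```ruby
# Hypothetical linear-time alternative to consensus_string.
# Returns the most frequent residue if its count reaches
# total * threshold; ties go to the residue seen first.
# Assumes Ruby 1.9+ (Hash preserves insertion order).
def linear_consensus(site, threshold = 1.0)
  return nil if site.empty?

  counts = Hash.new(0)
  site.each { |x| counts[x] += 1 }

  best = nil
  best_count = 0
  counts.each do |residue, n|
    # Strict > keeps the earlier residue when counts tie.
    if n > best_count
      best = residue
      best_count = n
    end
  end

  best_count >= site.size * threshold ? best : nil
end

p linear_consensus(%w[A A A A])         # "A"  (uniform site)
p linear_consensus(%w[A A A C])         # nil  (3 < 4 * 1.0)
p linear_consensus(%w[A A A C], 0.75)   # "A"  (3 >= 4 * 0.75)
p linear_consensus(%w[G A C A C], 0.4)  # "A"  (A and C tie; A seen first)
```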


Also, I am adding a whole set of functions (group entropy and some bookkeeping/housekeeping methods) and would like to contribute them back. What is the best way to submit them?

Jim


