[Bioperl-l] Getting IC & Consensus with Bio::Matrix::PSM::SiteMatrix

James Thompson tex at biocompute.net
Sun Mar 13 12:35:23 EST 2005


Edward,

1. There is no code in SiteMatrix (or in any of the other Bio::Matrix::PSM modules,
as far as I know) that calculates information content for you. It's assumed to
be provided as a parameter to the constructor rather than calculated by the
SiteMatrix object itself.
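
If you need IC values in the meantime, one option is to compute them yourself
from the counts. The sketch below is only an illustration, not code from the
module: it uses one common definition of per-position information content for
DNA (2 + sum of p * log2(p), assuming a uniform background and no small-sample
correction), and position_ic is a hypothetical helper name. Whether this is the
definition SiteMatrix expects, and the exact constructor parameter to pass it
under, are things to check against the module's documentation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: information content (in bits) of one position,
# given the counts for the four bases at that position.
# Assumes a uniform background: IC = 2 + sum(p * log2(p)).
sub position_ic {
    my @counts = @_;
    my $total = 0;
    $total += $_ for @counts;
    return 0 if $total == 0;        # no observations, no information
    my $ic = 2;                     # log2(4), the maximum for four bases
    for my $n (@counts) {
        next if $n == 0;            # 0 * log(0) is taken as 0
        my $p = $n / $total;
        $ic += $p * log($p) / log(2);   # log() is the natural log in Perl
    }
    return $ic;
}

# Frequency matrix from the script quoted below
my @pA = (2,19,3,6,8,10);
my @pT = (7,3,6,2,20,5);
my @pC = (1,2,2,1,1,1);
my @pG = (3,1,1,9,8,7);

# One IC value per column, plus the sum over all columns
my @ic = map { position_ic($pA[$_], $pC[$_], $pG[$_], $pT[$_]) } 0 .. $#pA;
my $total_ic = 0;
$total_ic += $_ for @ic;

printf "position %d: %.3f bits\n", $_ + 1, $ic[$_] for 0 .. $#ic;
printf "total: %.3f bits\n", $total_ic;
```

A position with only one base observed scores 2 bits, and a position with all
four bases equally frequent scores 0.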

2. I don't know the exact reasoning behind this implementation for calculating
ambiguity, but here's the algorithm to calculate the consensus for an individual
position:

   - Take the frequencies for a given position, multiply them all by ten and divide
   by the total number of characters at that position. In your example for the third
   position, the total is 12, so we would transform these numbers:
   { A => 3, T => 6, C => 2, G => 1 }

   into this set of numbers:
   { A => 2.5, T => 5, C => 1.667, G => 0.833 }

   - If none of these numbers are above the threshold (which defaults to 5),
   then return an N for this position. Otherwise, return the letter with the
   highest number.

This algorithm is in the _to_cons method of the Bio::Matrix::PSM::SiteMatrix module
if you'd like to take a peek.
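
To make the steps above concrete, here is a minimal sketch of them in plain
Perl. It is my reading of the description, not a copy of _to_cons:
position_consensus is a hypothetical name, and treating "above the threshold"
as strictly greater than is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: consensus letter for one position.
# $counts is a hash ref such as { A => 3, T => 6, C => 2, G => 1 }.
sub position_consensus {
    my ($counts, $threshold) = @_;
    $threshold = 5 unless defined $threshold;   # default from the description

    my $total = 0;
    $total += $_ for values %$counts;
    return 'N' if $total == 0;                  # nothing observed

    # Scale each count to 10 * count / total and remember the largest.
    my ($best, $best_score) = ('N', 0);
    for my $base (sort keys %$counts) {
        my $score = 10 * $counts->{$base} / $total;
        ($best, $best_score) = ($base, $score) if $score > $best_score;
    }

    # Assumption: the score must be strictly above the threshold.
    return $best_score > $threshold ? $best : 'N';
}

# Third position of the example: T scores exactly 5, which is not above 5.
print position_consensus({ A => 3, T => 6, C => 2, G => 1 }), "\n";    # N

# Second position of the example: A scores 190/25 = 7.6.
print position_consensus({ A => 19, T => 3, C => 2, G => 1 }), "\n";   # A
```

Under this reading, the third position comes out as N because its best score
only equals the threshold. The real method may handle ties and edge cases
differently.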

I'll defer your other questions to Stefan and the rest of the list. :)

Cheers,

James Thompson

On Mon, 14 Mar 2005, Edward Wijaya wrote:

> Hi,
> 
> Why does my code below fail to return the IC values?
> I thought the method was able to do that.
> Is there anything I'm missing here?
> 
> My second question is about the "consensus" method.
> The consensus is generated by choosing the highest probability OR *N if
> prob is too low*
> 
> 1. How do you define when the probability is *too low*?
> 2. What is the reasoning behind this implementation?
>     e.g. Why does my code below give 'TANGTA' instead of 'TATGTA'?
> 
> I find this particular module very useful.
> I really wish I could make the best use of it.
> 
> Thanks so much for your time.
> Hope to hear from you again.
> 
> ---
> Regards,
> Edward WIJAYA
> SINGAPORE
> 
> 
> __BEGIN__
> 
> #!/usr/bin/perl -w
> use strict;
> use Data::Dumper;
> use Bio::Matrix::PSM::SiteMatrix;
> 
> # Frequency matrix
> my @pA = (2,19,3,6,8,10);
> my @pT = (7,3,6,2,20,5);
> my @pC = (1,2,2,1,1,1);
> my @pG = (3,1,1,9,8,7);
> 
> my %param = ( -pA => \@pA, -pC => \@pC, -pG => \@pG, -pT => \@pT );
> my $site  = Bio::Matrix::PSM::SiteMatrix->new(%param);
> 
> my $consensus = $site->consensus;
> my $ic = $site->IC; # Why does it fail here?
> 
> 
> print Dumper $ic;
> print Dumper $consensus;
