[Bioperl-l] Getting IC & Consensus with
Bio::Matrix::PSM::SiteMatrix
Stefan Kirov
skirov at utk.edu
Mon Mar 14 08:15:43 EST 2005
Edward,
The rules for too low are:
single base probability>0.7; combination of two>0.8 and three>0.9 for
IUPAC consensus and >0.5 for simple consensus. Actually you can
recalculate the consensus by doing:
$matrix->_calculate_consensus(0.45) (naturally will set the consensus at
0.45). Probably I should document this, though generally speaking this
method is internal use only. However if you do this, you will have
A=>0.46,C=>0.01,G=>0.48,T=>0.05) and then you will get A in the
consensus (which is obviously incorrect, first base to surpass the
thresh). I can fix this, but do you really want to get in your consensus
a position with proba less than 0.5? If you use IUPAC you will get H
(A+T+C). We can easily add IC calculating method if you really need it.
Please let me know if you have further questions.
Stefan
James Thompson wrote:
>Edward,
>
>1. There is no code in SiteMatrix (or any of other other Bio::Matrix::PSM modules
>as far as I know) that calculates information content for you. It's assumed to
>provided as a parameter to the constructor rather than calculated by the
>SiteMatrix object itself.
>
>2. I don't know the exact reasoning behind this implementation for calculating
>ambiguity, but here's the algorithm to calculate the consensus for an individual
>position:
>
> - Take the frequencies for a given position, multiply them all by ten and divide
> by the total number of characters at that position. In your example for the third
> position, we would transform these numbers:
> { A => 3, T => 6, C => 2, G => 1 }
>
> into this set of numbers:
> { A => 2.5, T => 3, C => 1.667, G => 0.833 }
>
> - If none of these numbers are above the threshold (which defaults to 5),
> then return an N for this position.
>
>This algorithm is in the _to_cons method of the Bio::Matrix::PSM::SiteMatrix module
>if you'd like to take a peek.
>
>I'll defer your other questions to Stefan and the rest of the list. :)
>
>Cheers,
>
>James Thompson
>
>On Mon, 14 Mar 2005, Edward Wijaya wrote:
>
>
>
>>Hi,
>>
>>Why my code below fails to return the IC values?
>>I thought the method is able to do that.
>>Is there anything I miss here?
>>
>>My second question is about"consensus" method.
>>The consensus is generated by choosing the highest probability OR *N if
>>prob is too low*
>>
>>1. How do you define when the probability is *too low*?
>>2. What is the reasoning behind this implementation?
>> e.g. Why my code below gives 'TANGTA' instead of "TATGTA"?
>>
>>I find this particular module is very very useful.
>>I really wish I can make best use of it.
>>
>>Thanks so much for your time.
>>Hope to hear from you again.
>>
>>---
>>Regards,
>>Edward WIJAYA
>>SINGAPORE
>>
>>
>>__BEGIN__
>>
>>#!/usr/bin/perl -w
>>use strict;
>>use Data::Dumper;
>>use Bio::Matrix::PSM::SiteMatrix;
>>
>> #Frequency matrix
>> my @pA = (2,19,3,6,8,10);
>> my @pT = (7,3,6,2,20,5);
>> my @pC = (1,2,2,1,1,1);
>> my @pG = (3,1,1,9,8,7);
>>
>>
>>my %param =( -pA=>\@pA,-pC=>\@pC,-pG=>\@pG,-pT=>\@pT);
>>my $site=new Bio::Matrix::PSM::SiteMatrix(%param);
>>
>>my $consensus = $site->consensus;
>>my $ic = $site->IC; #Why it fails here?
>>
>>
>>print Dumper $ic;
>>print Dumper $consensus;
>>
>>
>
>
>
--
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov at utk.edu
sao at ornl.gov
"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"
More information about the Bioperl-l
mailing list