[Bioperl-l] Information content of alignment
Malay
mbasu at mail.nih.gov
Thu May 13 13:59:46 EDT 2004
Shawn Hoon wrote:
>
> On May 13, 2004, at 6:32 AM, martin wrote:
>
>> Hi Malay,
>>
>> Not quite sure what you mean by 'information content'. You can access a
>> single column of an alignment using the slice() function:
>>
>> $aln2 = $aln->slice(20, 30)
>>
>> which returns another AlignI object. So something like;
>>
>> foreach (0..$aln->length){
>> my $column=$aln->slice($_, $_);
>> # $column is now an AlignI object
>> # do something with it....
>> }
>>
>
>
> I had written something similar for Bio::Graphics::Pictogram, but there
> is nothing explicit
> right now that I can think of. Maybe it would be useful to add to
> SimpleAlign.
> Something I would do, continuing from the code above, once you get the
> slice you can start counting the frequencies:
> my $pos = 1;
> foreach (0..$aln->length){
> my $column = $aln->slice($_,$_);
> my @seq = $column->each_seq;
> my $total = 0;
> foreach my $letter(@seq){
> $hash{$pos}{$letter->seq}++;
> $total++;
> }
> $hash{$pos}{'total'} = $total;
> $pos++;
> }
> #calculate entropy
> foreach my $pos(sort{$a<=>$b} keys %hash){
> my $ent;
> foreach my $base(keys %{$hash{$pos}}){
> my $freq = $hash{$pos}{$base}/$hash{$pos}{'total'};
> $ent += -1 * $freq*log2($freq);
> }
> print "Position $pos, entropy: $ent bits \n";
> }
>
> sub log2{
> my ($x) = @_;
> return 0 if $x==0;
> return log($x)/log(2);
> }
>
Good. This is more or less what I wanted. Thanks a lot. But ofcourse it
will be nice to get it implemented in SimpleAlign class.
Malay
More information about the Bioperl-l
mailing list