[Bioperl-l] Information content of alignment

Malay mbasu at mail.nih.gov
Thu May 13 13:59:46 EDT 2004


Shawn Hoon wrote:

> 
> On May 13, 2004, at 6:32 AM, martin wrote:
> 
>> Hi Malay,
>>
>> Not quite sure what you mean by 'information content'.  You can access a
>> single column of an alignment using the slice() function:
>>
>> $aln2 = $aln->slice(20, 30)
>>
>> which returns another AlignI object.  So something like;
>>
>> foreach (0..$aln->length){
>>     my $column=$aln->slice($_, $_);
>>     # $column is now an AlignI object
>>     # do something with it....
>> }
>>
> 
> 
> I had written something similar for Bio::Graphics::Pictogram, but there 
> is nothing explicit
> right now that I can think of. Maybe it would be useful to add to 
> SimpleAlign.
> Something I would do, continuing from the code above, once you get the 
> slice you can start counting the frequencies:
> my $pos = 1;
> foreach (0..$aln->length){
>     my $column = $aln->slice($_,$_);
>     my @seq = $column->each_seq;
>     my $total = 0;
>     foreach my $letter(@seq){
>              $hash{$pos}{$letter->seq}++;
>         $total++;
>     }
>     $hash{$pos}{'total'} = $total;
>     $pos++;
> }
> #calculate entropy
> foreach my $pos(sort{$a<=>$b} keys %hash){
>   my $ent;
>   foreach my $base(keys %{$hash{$pos}}){
>     my $freq = $hash{$pos}{$base}/$hash{$pos}{'total'};
>     $ent += -1 * $freq*log2($freq);
>   }
>   print "Position $pos, entropy: $ent bits \n";
> }
> 
> sub log2{
>   my ($x) = @_;
>   return 0 if $x==0;
>   return log($x)/log(2);
> }
> 

Good. This is more or less what I wanted. Thanks a lot. But ofcourse it 
will be nice to get it implemented in SimpleAlign class.

Malay


More information about the Bioperl-l mailing list