[Bioperl-l] Per-column conservation of multiple alignment in Perl

Chad Davis chad.a.davis at gmail.com
Wed Jun 15 11:21:02 UTC 2011


I asked this on BioStar, but then started thinking that a patch to
Bio::SimpleAlign would be easy, depending on what people here think:

http://biostar.stackexchange.com/questions/9196/per-column-conservation-of-multiple-alignment-in-perl

Given a Bio::SimpleAlign, what is the best way to get per-column
conservation scores, e.g. as an array of values in [0, 1] whose
length equals $align->length? I can't find anything like this in
Bio::SimpleAlign. I'm looking for a method that allows:

use Bio::AlignIO;

my $io    = Bio::AlignIO->new(-file => $file);
my $align = $io->next_aln;
# Hypothetical method: does something like this exist?
my @cons  = $align->percentage_identity_by_column();
print "@cons\n";
# 0.75 1.0 1.0 1.0 0.64 ....

Or should I just take the gapped sequences, use substr() to extract
the characters in each column, count them with a hash, and return the
frequency of the most frequent character per column?
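
For concreteness, the substr()-and-hash approach could be sketched
roughly as below. This is only an illustration in core Perl, not
existing BioPerl code: column_conservation() is a made-up helper name,
and the choices to upper-case residues and to count gaps ('-' or '.')
in the denominator but never as the most frequent character are
assumptions that a real patch might handle differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SimpleAlign): takes gapped,
# equal-length sequence strings and returns, for each column, the
# fraction of rows that hold the most frequent non-gap character.
sub column_conservation {
    my @rows = map { uc $_ } @_;    # assumption: compare case-insensitively
    return () unless @rows;

    my $length = length $rows[0];
    for my $row (@rows) {
        die "sequences differ in length\n" unless length($row) == $length;
    }

    my @scores;
    for my $col (0 .. $length - 1) {
        my %count;
        for my $row (@rows) {
            my $char = substr $row, $col, 1;
            # assumption: gaps never count as the conserved character
            next if $char eq '-' or $char eq '.';
            $count{$char}++;
        }
        my $max = 0;
        for my $n (values %count) {
            $max = $n if $n > $max;
        }
        # gaps still count in the denominator (number of rows)
        push @scores, $max / @rows;
    }
    return @scores;
}

my @cons = column_conservation(qw(ACGT ACGA AC-A TCGA));
print join(' ', map { sprintf '%.2f', $_ } @cons), "\n";
# prints: 0.75 1.00 0.75 0.75
```

With a Bio::SimpleAlign object, the input rows could presumably be
collected with something like map { $_->seq } $align->each_seq.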

It looks like the private method Bio::SimpleAlign::_consensus_aa()
already does most of this, but it returns the character rather than
the fraction, which is what I was looking for. Short of submitting a
patch for that, is there a better approach?

Would there be general interest in such a patch to get per-column
conservation of multiple alignments?

Chad


