[Bioperl-l] calculate the frequency of occurrence of the most commonly observed amino acid at each position of multiple sequence alignment
Mark A. Jensen
maj at fortinbras.us
Sat Feb 7 16:56:30 UTC 2009
Dylan- It's worth mentioning that the BioPerl method is very overhead-heavy; all
the objects make it easy to get by with just a few lines, but it probably won't be
the absolute fastest way to do what you want. Another path to follow would be:
    # your seqs are plain strings in the array @seqs, and are aligned and same length
    my $len = length($seqs[0]);
    my @residue_counts;    # one hash ref per alignment column
    foreach my $col (0..$len-1) {
        my %h = ();
        foreach my $seq (@seqs) {
            # tally the residue this sequence has at column $col
            $h{ substr($seq, $col, 1) }++;
        }
        push @residue_counts, \%h;
    }
Now, for each element in @residue_counts (each element is a reference to a hash), look
for the key that has the maximum hash value. That key is the most common residue in
the column, and its value divided by the number of sequences is its frequency. The
snippet above is also worth working through for the educational value, esp. with
respect to using hashes, which (IMHO) are one of the absolutely coolest things about Perl.
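
A minimal sketch of that last step, in plain Perl with no BioPerl objects. The example sequences and the helper name most_common() are made up for illustration; ties are broken alphabetically here, which is just one possible choice, and gap characters are counted like any other residue.

```perl
use strict;
use warnings;

# Hypothetical example data: four aligned sequences of equal length.
my @seqs = qw(MKVLA MKILA MRVLG MKVLA);

# Build the per-column residue counts (one hash ref per column).
my @residue_counts;
foreach my $col (0 .. length($seqs[0]) - 1) {
    my %h;
    $h{ substr($_, $col, 1) }++ foreach @seqs;
    push @residue_counts, \%h;
}

# most_common() is a hypothetical helper, not a BioPerl function.
# Given a hash ref of residue => count, it returns the residue with the
# highest count and that count divided by the column total.
sub most_common {
    my ($counts) = @_;
    my ($top, $total) = (undef, 0);
    foreach my $res (sort keys %$counts) {    # sorted, so ties go to the first letter
        $total += $counts->{$res};
        $top = $res if !defined $top || $counts->{$res} > $counts->{$top};
    }
    return ($top, $counts->{$top} / $total);
}

# Report column number (1-based), most common residue, and its frequency.
my $col = 1;
foreach my $counts (@residue_counts) {
    my ($res, $freq) = most_common($counts);
    printf "%d\t%s\t%.2f\n", $col++, $res, $freq;
}
```

With the example data this reports M at frequency 1.00 for column 1 and K at 0.75 for column 2. If gaps should not count toward the frequency, one option is to skip '-' when filling the hash.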
cheers- MAJ
----- Original Message -----
From: Dylan Krishnan
To: Mark A. Jensen
Cc: bioperl-l at lists.open-bio.org
Sent: Saturday, February 07, 2009 11:43 AM
Subject: Re: [Bioperl-l] calculate the frequency of occurrence of the most commonly observed amino acid at each position of multiple sequence alignment
thanks mark!
the author's other approach is to load the alignment into an MS Excel worksheet and use the "autofilter" procedure to count the occurrences of any residue at a position of the alignment. the claim is "that excel is useful for this purpose." sounds reasonable for 10 alignments but not 2000!
again, many thanks.
-dylan
On Sat, Feb 7, 2009 at 10:25 AM, Mark A. Jensen <maj at fortinbras.us> wrote:
Dylan,
This is an extremely good exercise for anyone learning Perl to do bioinformatics.
When you have done many exercises like this, you will see what people mean
when they say it is very straightforward.
Here are some hints:
Use the "entropy" scrap at http://www.bioperl.org/wiki/Site_entropy_in_an_alignment .
You will convert the function entropy_by_column() into the function you need.
Replace the line
$ent{$col} = entropy(values %res);
with a line you will write using the "hash key at max value" scrap, found
here: http://www.bioperl.org/wiki/Hash_key_at_the_max_value .
Happy coding!
Mark
----- Original Message -----
From: "Dylan Krishnan" <dylankrishnan at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Saturday, February 07, 2009 11:10 AM
Subject: [Bioperl-l] calculate the frequency of occurrence of the most commonly observed amino acid at each position of multiple sequence alignment
I am new to Perl but this is something I am seeking to do either through a
bioperl module or just perl. Apparently, this is quite "straightforward
using PERL," but I beg to differ.
Any assistance regarding this matter would be greatly appreciated.
Thanks!
-dylan