[Bioperl-l] calculate the frequency of occurrence of the most commonly observed amino acid at each position of multiple sequence alignment
Dylan Krishnan
dylankrishnan at gmail.com
Sat Feb 7 20:51:39 UTC 2009
Thanks Mark! I'm still working on this. As a newbie, I'm still digesting
your suggestions. Here is what I think I want to compute for a multiple
sequence alignment:
1. the total number of residues, n, in the alignment
2. the total number of occurrences of a specific residue, x, in the alignment
3. the total number of times a residue, x, appears at a specific site
4. the total number of sequences in the alignment
I initially thought about writing a single script to generate all these
parameters but now think four separate (read: unsophisticated and utterly
reductionist) scripts will do...
I think your suggestions will clearly help me on this quest!
-dylan
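
The four tallies listed above can also come out of a single pass over the sequences. The following is a minimal sketch, not code from the thread: the sub name alignment_stats, the toy sequences, and the choice to count only letters as residues (so gap characters are excluded) are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the four tallies for an alignment given as
# equal-length strings. $residue is the residue x; $site is a zero-based
# column index.
sub alignment_stats {
    my ($seqs, $residue, $site) = @_;
    my %stats = (
        num_seqs       => scalar @$seqs,  # 4. sequences in the alignment
        total_residues => 0,              # 1. letters only (assumption: gaps excluded)
        total_x        => 0,              # 2. occurrences of x anywhere
        x_at_site      => 0,              # 3. occurrences of x at $site
    );
    foreach my $seq (@$seqs) {
        # tr/// with an empty replacement counts matches without editing $seq
        $stats{total_residues} += ($seq =~ tr/A-Za-z//);
        # a global match in list context yields one element per occurrence
        my $n = () = $seq =~ /\Q$residue\E/g;
        $stats{total_x} += $n;
        $stats{x_at_site}++ if substr($seq, $site, 1) eq $residue;
    }
    return \%stats;
}

# Hypothetical toy alignment
my @seqs  = ('MKVLA', 'MKILA', 'MRVLG', 'MKVLA');
my $stats = alignment_stats(\@seqs, 'K', 1);
print "$_: $stats->{$_}\n" for sort keys %$stats;
```

Keeping the counts in one hash means four separate scripts would simply print different keys of the same result.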
On Sat, Feb 7, 2009 at 2:36 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
> Oops, bugs in that. Try:
>
> my $len = length($seqs[0]);
> my @residue_counts;
> my %h;
> foreach (0..$len-1) {
>     %h = ();                          # reset the counts for this column
>     foreach my $seq (@seqs) {
>         $h{ substr($seq, $_, 1) }++;  # tally the residue at column $_
>     }
>     push @residue_counts, {%h};       # store a copy of the column's counts
> }
>
>
>
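
In the corrected version %h is declared once, outside the loop, so pushing {%h} (a fresh anonymous copy) rather than \%h matters. The message does not say which bugs were meant, so the following is only a small illustration of the difference between the two pushes; the array names @aliased and @copied and the two toy columns are made up for the example.

```perl
use strict;
use warnings;

my @cols = ('AA', 'AG');   # hypothetical columns of a two-sequence alignment
my (@aliased, @copied);
my %h;                     # declared once, outside the loop
foreach my $col (@cols) {
    %h = ();
    $h{$_}++ for split //, $col;
    push @aliased, \%h;    # every element points at the same %h
    push @copied,  {%h};   # each element is an independent snapshot
}

# After the loop, %h holds the counts for the last column only,
# so $aliased[0] shows the last column while $copied[0] shows the first.
print join(',', map { "$_=$aliased[0]{$_}" } sort keys %{ $aliased[0] }), "\n";
print join(',', map { "$_=$copied[0]{$_}" }  sort keys %{ $copied[0] }),  "\n";
```

Declaring %h with my inside the loop body, as in the earlier snippet, gives a fresh hash on each iteration, so \%h would be safe there.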
> ----- Original Message ----- From: "Mark A. Jensen" <maj at fortinbras.us>
> To: "Dylan Krishnan" <dylankrishnan at gmail.com>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Saturday, February 07, 2009 11:56 AM
> Subject: Re: [Bioperl-l] calculate the frequency of occurrence of
> the most commonly observed amino acid at each position of multiple sequence
> alignment
>
>
>
>> Dylan- It's worth mentioning that the BioPerl method is very
>> overhead-heavy; all the objects make it easy to just write a few lines,
>> but it probably won't be the absolute fastest way to do what you want.
>> Another path to follow would be
>>
>> # your seqs are plain strings in the array @seqs, and are aligned and same length
>> my $len = length($seqs[0]);
>> my @residue_counts;
>> foreach (0..$len-1) {
>>     my %h = ();                       # fresh hash for each column
>>     foreach my $seq (@seqs) {
>>         $h{ substr($seq, $_, 1) }++;  # tally the residue at column $_
>>     }
>>     push @residue_counts, \%h;
>> }
>>
>> Now, for each elt in @residue_counts (each elt is a reference to a hash),
>> look for the key that has the maximum hash value. The snippet above is also
>> worth working through for the educational value, esp. w/r to using hashes,
>> which (IMHO) are one of the absolutely coolest things about Perl.
>>
>> cheers- MAJ
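
The last step described above, finding the key with the maximum value in each hash, might look like the following. This is a sketch under assumptions rather than the wiki scrap the thread refers to: the toy sequences, the @most_common array, and the alphabetical tie-break are all choices made here for illustration.

```perl
use strict;
use warnings;

# Hypothetical toy alignment; in practice @seqs holds your aligned strings.
my @seqs = ('MKVLA', 'MKILA', 'MRVLG', 'MKVLA');

# Build the per-column residue counts as in the snippet above.
my @residue_counts;
foreach my $i (0 .. length($seqs[0]) - 1) {
    my %h;
    $h{ substr($_, $i, 1) }++ for @seqs;
    push @residue_counts, \%h;
}

# For each column, pick the key with the largest count.
# Ties are broken alphabetically so the result is deterministic.
my @most_common;
foreach my $counts (@residue_counts) {
    my ($top) = sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b }
                keys %$counts;
    # frequency = count of the top residue / number of sequences
    push @most_common, [ $top, $counts->{$top} / scalar(@seqs) ];
}

foreach my $i (0 .. $#most_common) {
    printf "column %d: %s %.2f\n", $i + 1, @{ $most_common[$i] };
}
```

Sorting every column's keys is wasteful for large alphabets, but with at most a couple of dozen residues per column it keeps the code short. Gap characters are counted like any other symbol here.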
>> ----- Original Message ----- From: Dylan Krishnan
>> To: Mark A. Jensen
>> Cc: bioperl-l at lists.open-bio.org
>> Sent: Saturday, February 07, 2009 11:43 AM
>> Subject: Re: [Bioperl-l] calculate the frequency of occurrence of the
>> most commonly observed amino acid at each position of multiple sequence
>> alignment
>>
>>
>> thanks mark!
>>
>> the authors' other approach is to load the alignment into an MS Excel
>> worksheet and use the "autofilter" procedure to count the occurrences of any
>> residue at a position of the alignment. the claim is "that excel is useful for
>> this purpose." sounds reasonable for 10 alignments but not 2000!
>>
>> again, many thanks.
>>
>>
>> -dylan
>>
>> On Sat, Feb 7, 2009 at 10:25 AM, Mark A. Jensen <maj at fortinbras.us>
>> wrote:
>>
>> Dylan,
>>
>> This is an extremely good exercise for anyone learning Perl to do
>> bioinformatics. When you have done many exercises like this, you will
>> see what people mean when they say it is very straightforward.
>>
>> Here are some hints:
>>
>> Use the "entropy" scrap at
>> http://www.bioperl.org/wiki/Site_entropy_in_an_alignment .
>> You will convert the function entropy_by_column() into the function you need.
>> Replace the line
>>
>> $ent{$col} = entropy(values %res);
>>
>> with a line you will write using the "hash key at max value" scrap, found
>> here: http://www.bioperl.org/wiki/Hash_key_at_the_max_value .
>>
>> Happy coding!
>> Mark
>>
>> ----- Original Message ----- From: "Dylan Krishnan" <dylankrishnan at gmail.com>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Saturday, February 07, 2009 11:10 AM
>> Subject: [Bioperl-l] calculate the frequency of occurrence of the
>> most commonly observed amino acid at each position of multiple sequence
>> alignment
>>
>>
>>
>> I am new to Perl, but this is something I am seeking to do either through
>> a BioPerl module or just Perl. Apparently, this is quite "straightforward
>> using PERL," but I beg to differ.
>>
>> Any assistance regarding this matter would be greatly appreciated.
>>
>> Thanks!
>>
>> -dylan
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l