[Bioperl-l] statistics of sequences
Heikki Lehvaslaiho
heikki at ebi.ac.uk
Wed Apr 21 10:53:50 EDT 2004
Sujoy,
You are on the right track using OddCodes. The OddCodes methods give you back
a reference to a plain string that can be manipulated using standard Perl
functions. (BTW, you can also store that string in a specialized sequence
object of type Bio::Seq::Meta.)
Here are two possibilities:
# 1. works without knowing the characters
my %hash;
for (split / */, $new_coding5) {
$hash{$_}++;
}
for (keys %hash) {
print $_, ": ", $hash{$_}, "\n";
}
# 2. you have to know what you are looking for
# (tr/// with an empty replacement list only counts; the string is unchanged)
my $O = ($new_coding5 =~ tr/O//);
print "O: $O\n";
my $I = ($new_coding5 =~ tr/I//);
print "I: $I\n";
###
There are more ways of doing the same thing; it all depends on what you want
to do with the data.
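For instance, if what you want is proportions rather than raw counts, the
first approach extends naturally. The following is only a minimal sketch: the
helper name count_codes and the sample string are made up for illustration,
and the sample string merely stands in for the string you get by
dereferencing the result of $oddcode_obj->hydrophobic().

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each character occurs in a coded string.
# Returns a reference to a hash of character => count.
# (count_codes is a hypothetical helper, not part of BioPerl.)
sub count_codes {
    my ($coded) = @_;
    my %count;
    $count{$_}++ for split //, $coded;
    return \%count;
}

# Assumption: a made-up sample standing in for $$output5 from the
# original script; a real run would use the OddCodes output instead.
my $new_coding5 = 'OIIOOIOIIO';

my $count = count_codes($new_coding5);
my $total = length $new_coding5;

# Print each code with its count and its share of the whole sequence.
for my $code (sort keys %$count) {
    printf "%s: %d (%.1f%%)\n",
        $code, $count->{$code}, 100 * $count->{$code} / $total;
}
```

Because the helper does not need to know the alphabet in advance, the same
code should work on the strings from the other OddCodes methods as well.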
Yours,
-Heikki
On Tuesday 20 Apr 2004 23:36, S.Paul wrote:
> Hi Everybody:
>
> I am pretty new to bioperl and am trying to find the statistics of the
> polarity of amino acids in the protein sequence eg. how many are polar,
> hydrophobic etc. I tried using the SeqStats to calculate the mol wt and
> the number of A and C but cannot calculate the number of hydrophobic acids
> present. I am enclosing the portion of the code. I would appreciate if
> anybody can offer any suggestions in this regard.
>
> ***************************************************************************
> my $seq_stats = Bio::Tools::SeqStats->new($seq);
> my $weight = $seq_stats->get_mol_wt();
> #note $weight is an array
> print " the weight is ", $$weight[0], "\n";
> my $monomer_ref = $seq_stats->count_monomers();
> print "Number of A\'s in sequence is $$monomer_ref{'A'} \n";
> print "Number of C\'s in sequence is $$monomer_ref{'C'} \n";
> print "Number of T\'s in sequence is $$monomer_ref{'T'} \n";
> print "Number of G\'s in sequence is $$monomer_ref{'G'} \n";
>
>
> print "\-----------------------------------------------\n";
> my $oddcode_obj = Bio::Tools::OddCodes->new(-seq =>$seq);
> #returns the reference
>
> my $output1 = $oddcode_obj->charge();
> my $output2 = $oddcode_obj->structural();
> my $output3 = $oddcode_obj->chemical();
> my $output4 = $oddcode_obj->functional();
>
> my $output5= $oddcode_obj->hydrophobic();
>
> #displays
> my $new_coding1 =$$output1;
> print "\nthe charge of the sequence is $new_coding1";
>
> print "\-----------------------------------------------\n";
>
> my $new_coding2 =$$output2;
> print "\nthe structural sequence $new_coding2";
> print "\-----------------------------------------------\n";
> my $new_coding3 =$$output3;
> print "\n the chemical structure is : $new_coding3";
> print "\-----------------------------------------------\n";
> my $new_coding4 =$$output4;
> print "\n the functional nature of the protein: $new_coding4";
> print "\-----------------------------------------------\n";
>
> my $new_coding5 =$$output5;
> print "\n the hydrophobic nature of the protein: $new_coding5";
>
> ***************************************************************************
>
> Thanks
>
> Sujoy Paul
> Sujoy Paul, PRISE Centre, UniS, s.paul at surrey.ac.uk
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki_at_ebi ac uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________