[Bioperl-l] statistics of sequences

Heikki Lehvaslaiho heikki at ebi.ac.uk
Wed Apr 21 10:53:50 EDT 2004


Sujoy,

You are on the right track with OddCodes. The OddCodes methods give you back
a reference to a plain string that can be manipulated using standard Perl
functions. (BTW, you can also store that string in a specialized sequence
object of type Bio::Seq::Meta.)
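As a minimal sketch of what "a reference to a plain string" means in practice: the snippet below does not call OddCodes at all. It uses a made-up stand-in string (the sample value is an assumption for illustration only), dereferences it the same way your code does, and then applies ordinary Perl string functions.

```perl
use strict;
use warnings;

# Stand-in for the scalar reference an OddCodes method returns;
# the sample string is made up for illustration.
my $recoded = 'OOIOIIOOOI';
my $output5 = \$recoded;

# Dereference once to get an ordinary Perl string.
my $new_coding5 = $$output5;

my $length  = length $new_coding5;         # number of coded residues
my $first_I = index $new_coding5, 'I';     # 0-based position of the first I
my $prefix  = substr $new_coding5, 0, 3;   # first three codes

print "length: $length\n";
print "first I at: $first_I\n";
print "prefix: $prefix\n";
```

Once the string is dereferenced, nothing about it is BioPerl-specific any more.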

Here are two possibilities:

# 1. works without knowing the characters

my %hash;
for (split / */, $new_coding5) {
    $hash{$_}++;
}
for (keys %hash) {
    print $_, ": ", $hash{$_}, "\n";
}

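# 2. you have to know what you are looking for

# tr/// with an empty replacement list only counts the matching
# characters; it does not modify the string
my $O = ($new_coding5 =~ tr/O//);
print "O: $O\n";
my $I = ($new_coding5 =~ tr/I//);
print "I: $I\n";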

###

There are more ways of doing the same thing; it all depends on what you want
to do with the data.
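For instance, if what you want is a summary rather than raw counts, something along these lines might do. This is only a sketch: count_codes is a hypothetical helper (not part of BioPerl), and the sample string is made up, assuming the O/I coding used for $new_coding5 above.

```perl
use strict;
use warnings;

# Count every distinct character in a recoded string.
# count_codes is a hypothetical helper, not part of BioPerl.
sub count_codes {
    my ($string) = @_;
    my %count;
    $count{$_}++ for split //, $string;
    return \%count;
}

# Made-up sample string in the O/I coding.
my $new_coding5 = 'OOIOIIOOOI';
my $counts      = count_codes($new_coding5);
my $total       = length $new_coding5;

# Report each code with its count and its share of the sequence.
for my $code (sort keys %$counts) {
    printf "%s: %d (%.1f%%)\n", $code, $counts->{$code},
        100 * $counts->{$code} / $total;
}
```

The same helper works unchanged on the output of the other OddCodes methods, since it makes no assumption about which characters appear.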

Yours,
	-Heikki

On Tuesday 20 Apr 2004 23:36, S.Paul wrote:
> Hi Everybody:
>
> I am pretty new to bioperl and am trying to find the statistics of the
> polarity of amino acids in the protein sequence eg. how many are polar,
> hydrophobic etc.  I tried using the SeqStats to calculate the mol wt  and
> the number of A and C but cannot calculate the number of hydrophobic acids
> present.  I am enclosing the portion of the code.  I would appreciate if
> anybody can offer any suggestions in this regard.
>
> ***************************************************************************
> my $seq_stats = Bio::Tools::SeqStats->new($seq);
> my $weight = $seq_stats->get_mol_wt();
> #note $weight is an array
> print " the weight is ", $$weight[0], "\n";
> my $monomer_ref = $seq_stats->count_monomers();
> print "Number of A\'s in sequence is $$monomer_ref{'A'} \n";
> print "Number of C\'s in sequence is $$monomer_ref{'C'} \n";
> print "Number of T\'s in sequence is $$monomer_ref{'T'} \n";
> print "Number of G\'s in sequence is $$monomer_ref{'G'} \n";
>
>
> print "\-----------------------------------------------\n";
> my $oddcode_obj = Bio::Tools::OddCodes->new(-seq =>$seq);
> #returns the reference
>
> my $output1 = $oddcode_obj->charge();
> my $output2 = $oddcode_obj->structural();
> my $output3 = $oddcode_obj->chemical();
> my $output4 = $oddcode_obj->functional();
>
> my $output5= $oddcode_obj->hydrophobic();
>
> #displays
> my $new_coding1 =$$output1;
> print "\nthe charge of the sequence is $new_coding1";
>
> print "\-----------------------------------------------\n";
>
> my $new_coding2 =$$output2;
> print "\nthe structural sequence $new_coding2";
> print "\-----------------------------------------------\n";
> my $new_coding3 =$$output3;
> print "\n the chemical structure is : $new_coding3";
> print "\-----------------------------------------------\n";
> my $new_coding4 =$$output4;
> print "\n the functional nature of the protein: $new_coding4";
> print "\-----------------------------------------------\n";
>
>   my $new_coding5 =$$output5;
> print "\n the hydrophobic nature of the protein: $new_coding5";
>
> ***************************************************************************
>
> Thanks
>
> Sujoy Paul
> Sujoy Paul, PRISE Centre, UniS, s.paul at surrey.ac.uk

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki_at_ebi ac uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________

