[Bioperl-l] Bio::Tools::SeqStats count_codons

Thu Sep 29 03:31:32 EDT 2005

With further feedback from Govind, the count_codons() method is now fixed. The 
main problem was that codon counting was case sensitive. All codons are now 
canonised to upper case. Also, you can now set the verbosity level to 
sub-zero to suppress the warning about ambiguous characters.

 -Heikki

On Wednesday 28 September 2005 16:22, Heikki Lehvaslaiho wrote:
> Govind,
>
> Not having your data files, I truncated your code down to essentials:
>
> #=============================================
> use Bio::Seq;
> use Bio::Tools::SeqStats;
>
> $tseq = "atcgatgctagcwgcatgcatgctagwctag";
> #$tseq = "ATCGATGCTAGCWGCATGCATGCTAGWCTAG";
>
> $codobj=Bio::Seq->new('-seq' => $tseq,
>         '-alphabet' => 'dna'
>         );
> $ss=Bio::Tools::SeqStats->new(-seq => $codobj);
> $refcount = $ss->count_codons();
>
> foreach $codon (sort keys(%$refcount)) {
>     print("$codon   $refcount->{$codon}\n");
> }
> #================================================
>
>
> I understand that you get different results depending case. Try this with
> both $tseq values. I get a warning and same number of codons in both cases
> (ignoring the case). Do you?
>
>  -Heikki
>
> On Tuesday 27 September 2005 16:15, Govind Chandra wrote:
> > The count_codons method in Bio::Tools::SeqStats seems to be case
> > sensitive. See script below. Is this intended? Or am I doing something
> > silly.
> >
> > ### script begin ###
> > use Bio::SeqIO;
> > use Bio::Seq;
> > use Bio::Tools::SeqStats;
> > $seqio=Bio::SeqIO->new('-file' => 'Ssc.fas',
> >                        '-format' => 'fasta');
> >
> > $seqobj=$seqio->next_seq();
> >
> > open(ORFS, "<ordered_all_orfs.out");
> >
> > while(<ORFS>) {
> > chomp();
> > ($strand, $start, $end, $nt_len, $aa_len)=split(/\s+/, $_);
> > #print("$strand   $start   $end   $nt_len   $aa_len\n");
> >
> > if($strand==1) {
> > $cdsobj=$seqobj->trunc($start, $end);
> > }
> > elsif($strand==-1) {
> > $cdsobj=$seqobj->trunc($start, $end)->revcom();
> > }
> >
> > $tseq=uc($cdsobj->seq());  # works
> > #$tseq=$cdsobj->seq(); # does not work. given and ambiguous count only
> >
> > $codobj=Bio::Seq->new('-seq' => $tseq,
> >         '-alphabet' => 'dna'
> >         );
> >
> > print(">$strand $aa_len\n",$cdsobj->seq(),"\n");
> >
> > $ss=Bio::Tools::SeqStats->new(-seq => $codobj);
> >
> > $refcount = $ss->count_codons();
> >
> > foreach $codon (keys(%$refcount)) {
> > print("$codon   $refcount->{$codon}\n");
> > }
> > }
> >
> > ### script end ###
> >
> > Cheers
> >
> > Govind
> > John Innes Centre
> > Norwich UK.
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_ebi _ac _uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambridge, CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________