[Bioperl-l] Bio::Tools::SeqStats count_codons
Heikki Lehvaslaiho
heikki.lehvaslaiho at gmail.com
Thu Sep 29 03:31:32 EDT 2005
With further feedback from Govind, the count_codons() method is now fixed. The
main problem was that codon counting was case sensitive. All codons are now
canonised to upper case. Also, you can now set the verbosity level to
sub-zero to suppress the warning about ambiguous characters.
-Heikki
On Wednesday 28 September 2005 16:22, Heikki Lehvaslaiho wrote:
> Govind,
>
> Not having your data files, I truncated your code down to essentials:
>
> #=============================================
> use Bio::Seq;
> use Bio::Tools::SeqStats;
>
> $tseq = "atcgatgctagcwgcatgcatgctagwctag";
> #$tseq = "ATCGATGCTAGCWGCATGCATGCTAGWCTAG";
>
> $codobj=Bio::Seq->new('-seq' => $tseq,
> '-alphabet' => 'dna'
> );
> $ss=Bio::Tools::SeqStats->new(-seq => $codobj);
> $refcount = $ss->count_codons();
>
> foreach $codon (sort keys(%$refcount)) {
> print("$codon $refcount->{$codon}\n");
> }
> #================================================
>
>
> I understand that you get different results depending case. Try this with
> both $tseq values. I get a warning and same number of codons in both cases
> (ignoring the case). Do you?
>
> -Heikki
>
> On Tuesday 27 September 2005 16:15, Govind Chandra wrote:
> > The count_codons method in Bio::Tools::SeqStats seems to be case
> > sensitive. See script below. Is this intended? Or am I doing something
> > silly.
> >
> > ### script begin ###
> > use Bio::SeqIO;
> > use Bio::Seq;
> > use Bio::Tools::SeqStats;
> > $seqio=Bio::SeqIO->new('-file' => 'Ssc.fas',
> > '-format' => 'fasta');
> >
> > $seqobj=$seqio->next_seq();
> >
> > open(ORFS, "<ordered_all_orfs.out");
> >
> > while(<ORFS>) {
> > chomp();
> > ($strand, $start, $end, $nt_len, $aa_len)=split(/\s+/, $_);
> > #print("$strand $start $end $nt_len $aa_len\n");
> >
> > if($strand==1) {
> > $cdsobj=$seqobj->trunc($start, $end);
> > }
> > elsif($strand==-1) {
> > $cdsobj=$seqobj->trunc($start, $end)->revcom();
> > }
> >
> > $tseq=uc($cdsobj->seq()); # works
> > #$tseq=$cdsobj->seq(); # does not work. given and ambiguous count only
> >
> > $codobj=Bio::Seq->new('-seq' => $tseq,
> > '-alphabet' => 'dna'
> > );
> >
> > print(">$strand $aa_len\n",$cdsobj->seq(),"\n");
> >
> > $ss=Bio::Tools::SeqStats->new(-seq => $codobj);
> >
> > $refcount = $ss->count_codons();
> >
> > foreach $codon (keys(%$refcount)) {
> > print("$codon $refcount->{$codon}\n");
> > }
> > }
> >
> > ### script end ###
> >
> > Cheers
> >
> > Govind
> > John Innes Centre
> > Norwich UK.
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at portal.open-bio.org
> > http://portal.open-bio.org/mailman/listinfo/bioperl-l
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki at_ebi _ac _uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambridge, CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________
More information about the Bioperl-l
mailing list