[Bioperl-l] Help w/ AlignIO and consensus_iupac

Jason Stajich jason at cgt.duhs.duke.edu
Tue Aug 12 23:47:56 EDT 2003


From the consensus_iupac docs:
 Note that if your alignment sequences contain a lot of
 IUPAC ambiguity codes you often have to manually set
 alphabet.  Bio::PrimarySeq::_guess_type thinks they
 indicate a protein sequence.

Do this in your code once you have an alignment object called $aln:

for my $seq ( $aln->each_seq ) {
  $seq->alphabet('dna'); # or 'rna' if that is what you have
}

Then try calling the consensus method again.
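
Put together with the script from the original question, the whole thing might look roughly like the sketch below. It is only a sketch under a few assumptions: the helper name force_alphabet is made up for illustration and is not part of BioPerl, the input is assumed to be nucleotide data in ClustalW format, and Bio::AlignIO is loaded with require inside the branch only so that the helper itself has no BioPerl dependency.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): set the alphabet on every
# sequence in an alignment so that consensus_iupac stops treating
# nucleotide sequences as protein.
sub force_alphabet {
    my ($aln, $alphabet) = @_;
    for my $seq ($aln->each_seq) {
        $seq->alphabet($alphabet);
    }
    return $aln;
}

my $usage   = "Usage: test.pl <in_file>\n";
my $in_file = shift @ARGV;

if (defined $in_file) {
    # Loaded here so the helper above does not itself depend on BioPerl.
    require Bio::AlignIO;

    my $alignio = Bio::AlignIO->new(-format => 'clustalw',
                                    -file   => $in_file);
    my $aln = $alignio->next_aln
        or die "No alignment found in $in_file\n";

    force_alphabet($aln, 'dna');    # use 'rna' for RNA alignments

    print $aln->consensus_iupac, "\n";
}
else {
    warn $usage;
}
```

Setting the alphabet explicitly overrides whatever was guessed when the sequences were read, so this is only appropriate when you already know the file holds nucleotide data.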

-jason
On Tue, 12 Aug 2003 gadbermd at earthlink.net wrote:

> Hi everyone,
>
> I am a BioPerl newbie and I was wondering if someone could help me figure out how to generate a consensus from a Clustalw .aln file? I tried to write this sample code:
>
> #!/usr/bin/perl
> use warnings;
> use Bio::AlignIO;
>
> my $usage = "Usage: test.pl <in_file> \n";
> my $in_file = shift or die $usage;
>
> $alignio = new Bio::AlignIO(-format => 'clustalw', -file => "$in_file");
> $aln = $alignio->next_aln();
> $str = $aln->consensus_iupac();
>
> print $str;
>
>
> But it always generates an error message saying the sequence is a protein:
>
>
> % ./test.pl new.aln
> ------------- EXCEPTION  -------------
> MSG: Seq [gi|18397816|ref|NM_102852.1|/1-648] is a protein
> STACK Bio::SimpleAlign::consensus_iupac /usr/lib/perl5/site_perl/5.8.0/Bio/SimpleAlign.pm:1325
> STACK toplevel ./test.pl:9
>
> --------------------------------------
>
>
> This occurs in spite of the fact that the Clustal output is all nucleotides, not proteins...
>
>
>
> % cat new.aln
> CLUSTAL W (1.82) multiple sequence alignment
>
>
> gi|18397816|ref|NM_102852.1|      --------------------------------------------------
> gi|33242920|gb|AY332478.1|        GGTTAATTTTGGTTGGAGGTAGAGAGAGAGAGAGAGGGAGGGAGGGAGGA
> gi|18424168|ref|NM_125279.1|      --------------------------------------------------
>
>
>
> gi|18397816|ref|NM_102852.1|      -----------------ATGAGG--AAAGGTAAGAGAGTGATA-------
> gi|33242920|gb|AY332478.1|        GGAGGAGGAGGAGGAGGAGGAGG--AAGAACAGGAGGAAGATGGGGCGGG
> gi|18424168|ref|NM_125279.1|      -----------------ATGGTTCCGAAAGTGGTCGACCTACA-------
>                                                    * *      *        *    *
>
> etc. etc. etc.
>
>
> I have tried it with a number of different Clustal output files but it always complains about them containing proteins.  I figure I have to be doing something wrong here.
>
> Thanks so much,
> Mike G.

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

