[Bioperl-l] CVS AND PAML

Luba Pardo lubapardo at gmail.com
Tue Apr 17 14:01:57 UTC 2007


Hi,
Sorry. Bellow is the code. The part of the code that does not work is when
using the codeml module.
Thanks
Luba
# for projecting alignments from protein to R/DNA space
use Bio::Align::Utilities qw(aa_to_dna_aln);
# for input of the sequence data
use Bio::SeqIO;
use Bio::AlignIO;

my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new;
my $seqdata = shift || 'cds.fa';

my $seqio = new Bio::SeqIO(-file   => $seqdata,
                           -format => 'fasta');
my %seqs;
my @prots;
# process each sequence
while ( my $seq = $seqio->next_seq ) {
    $seqs{$seq->display_id} = $seq;
    # translate them into protein
    my $protein = $seq->translate();
    my $pseq = $protein->seq();
    if( $pseq =~ /\*/ &&
        $pseq !~ /\*$/ ) {
          warn("provided a CDS sequence with a stop codon, PAML will
choke!");
          exit(0);
    }
    # Tcoffee can't handle '*' even if it is trailing
    $pseq =~ s/\*//g;
    $protein->seq($pseq);
    push @prots, $protein;
}

if( @prots < 2 ) {
    warn("Need at least 2 CDS sequences to proceed");
    exit(0);
}

open(OUT, ">align_output.txt") ||  die("cannot open output align_output for
writing");
# Align the sequences with clustalw
my $aa_aln = $aln_factory->align(\@prots);
# project the protein alignment back to CDS coordinates
my $dna_aln = aa_to_dna_aln($aa_aln, \%seqs);

my @each = $dna_aln->each_seq();

my $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
                   ( -params => { 'runmode' => -2,
                                  'seqtype' => 1,
                                } );

# set the alignment object
$kaks_factory->alignment($dna_aln);

# run the KaKs analysis
my ($rc,$parser) = $kaks_factory->run();
my $result = $parser->next_result;
my $MLmatrix = $result->get_MLmatrix();

my @otus = $result->get_seqs();
# this gives us a mapping from the PAML order of sequences back to
# the input order (since names get truncated)
my @pos = map {
    my $c= 1;
    foreach my $s ( @each ) {
      last if( $s->display_id eq $_->display_id );
      $c++;
    }
    $c;
   } @otus;

print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID
CDNA_PERCENTID)),"\n";
for( my $i = 0; $i < (scalar @otus -1) ; $i++) {
  for( my $j = $i+1; $j < (scalar @otus); $j++ ) {
    my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
    my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
    print OUT join("\t", $otus[$i]->display_id,

$otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
                         $MLmatrix->[$i]->[$j]->{'dS'},
                         $MLmatrix->[$i]->[$j]->{'omega'},
                         sprintf("%.2f",$sub_aa_aln->percentage_identity),
                         sprintf("%.2f",$sub_dna_aln->percentage_identity),
                         ), "\n";
  }
}

On 17/04/07, Albert Vilella <avilella at gmail.com> wrote:
>
> hmmm, there are some perldoc links around your code snippet. can you post
> the code again? what is the input data you are trying this with?
>
> On 4/17/07, Luba Pardo <lubapardo at gmail.com> wrote:
>
> > Dear all,
> > I have two questions.
> > 1.) I am trying to download some modules from Bioperl-run via CVS but I
> > can
> > not login.
> >
> > $ cvs -d :pserver:cvs at code.open-bio.org:/home/repository/bioperl login.
> >
> > The error I get is: time out, failed to connect to the server. I have
> > no trouble to download other files and I installed bioperl modules via
> > CPAN and it works.
> >
> > 2) The second question I have is that I am using the PAML:CODEML
> > module to do phylogenetic analysis.
> >
> > I have used the example provided in the HOWTO:PAML (also given as
> > example: pairwise_ka_ks.PL). The program does not crash but it returns
> > and empty object. I think the problem is in the last part of the
> > script because I manage to get sequences and also the alignment, but I
> > can not get any ka, ks value. I am not sure whether there is a bug in
> > the last part of the script.
> >
> > Does anyone have an idea?
> >
> > Thank you very much
> >
> > Luba Pardo
> >
> > $kaks_factory->alignment($dna_aln);
> >
> > # run the KaKs analysis
> > my ($rc,$parser) = $kaks_factory->run();
> > my $result = $parser->next_result;
> > my $MLmatrix = $result->get_MLmatrix();
> >
> > my @otus = $result->get_seqs();
> > # this gives us a mapping from the PAML order of sequences back to
> > # the input order (since names get truncated)
> > my @pos = map <http://www.perldoc.com/perl5.6/pod/func/map.html> {
> >     my $c= 1;
> >     foreach my $s ( @each ) {
> >       last if( $s->display_id eq $_->display_id );
> >       $c++;
> >     }
> >     $c;
> >    } @otus;
> >
> > print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT join
> > < http://www.perldoc.com/perl5.6/pod/func/join.html>("\t", qw
> > <http://www.perldoc.com/perl5.6/pod/func/qw.html>(SEQ1 SEQ2 Ka Ks
> > Ka/Ks PROT_PERCENTID CDNA_PERCENTID)),"\n";
> > for( my $i = 0; $i < (scalar
> > <http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus -1) ;
> > $i++) {
> >   for( my $j = $i+1; $j < (scalar
> > <http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus); $j++ ) {
> >     my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
> >     my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
> >     print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT
> > join < http://www.perldoc.com/perl5.6/pod/func/join.html>("\t",
> > $otus[$i]->display_id,
> >
> > $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
> >                          $MLmatrix->[$i]->[$j]->{'dS'},
> >                          $MLmatrix->[$i]->[$j]->{'omega'},
> >                          sprintf
> > < http://www.perldoc.com/perl5.6/pod/func/sprintf.html
> > >("%.2f",$sub_aa_aln->percentage_identity),
> >                          sprintf
> > < http://www.perldoc.com/perl5.6/pod/func/sprintf.html
> > >("%.2f",$sub_dna_aln->percentage_identity),
> >                          ), "\n";
> >   }
> > }
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>



More information about the Bioperl-l mailing list