[Bioperl-l] results of clustalw->seqboot->protdist->neighbor->consense

june tantoolvesm june at ics.es.osaka-u.ac.jp
Sun Jun 13 03:58:07 EDT 2004


Hi,

Without bootstrapping, everything works great--thank you.

But I'm not sure if anyone has ever had this experience: when I run
clustalw->seqboot->protdist->neighbor->consense manually and when I
automate the same process using bioperl, the resulting final trees are
quite different, to the point where it makes a difference. Is there a way
to find out where this discrepancy comes from, or is this normal?

One difference between the two methods is that manually, I can get all
the distance matrices in one go and then input all the matrices into
neighbor; using bioperl, however, it is done one matrix, one tree at a
time. Should this have an effect? It seems it shouldn't.

Or does this have anything to do with the program SeqBoot itself, with
the chosen random seed? But then again, when I tried it manually using
different starting seeds, I still ended up with the same tree.
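
To make the seed question concrete, here is a minimal sketch in plain Perl
of the kind of column resampling a bootstrap performs. It is not SeqBoot's
actual code: bootstrap_replicate and the toy alignment are made up for
illustration. It only shows that a fixed seed reproduces the same
replicate, while a different seed draws a different sample of columns.

```perl
use strict;
use warnings;

# Hypothetical helper (not SeqBoot's actual code): draw one bootstrap
# replicate by sampling alignment columns with replacement.
# $seed fixes the random number stream so the draw is repeatable.
sub bootstrap_replicate {
    my ($seed, @rows) = @_;
    srand($seed);
    my $ncols = length $rows[0];
    my @cols  = map { int(rand($ncols)) } 1 .. $ncols;
    # Rebuild every row from the same sampled column indexes.
    return map {
        my $row = $_;
        join '', map { substr($row, $_, 1) } @cols;
    } @rows;
}

# Toy alignment, made up for illustration.
my @toy = ('MKTAYIAKQR', 'MKSAYLAKQR', 'MRTAYIGKQK');

my @rep_a = bootstrap_replicate(1, @toy);
my @rep_b = bootstrap_replicate(1, @toy);   # same seed as @rep_a
my @rep_c = bootstrap_replicate(3, @toy);   # different seed

print "seed 1: @rep_a\n";
print "seed 1: @rep_b\n";
print "seed 3: @rep_c\n";
```

If the manual run and the bioperl run use different seeds, their
replicates differ, so the two sets of 100 trees can only be compared
replicate by replicate when the seed is the same in both.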

Has anyone come across this?
Here is my code; it does run without errors.

use Bio::Tools::Run::Alignment::Clustalw;
use Bio::Tools::Run::Phylo::Phylip::ProtDist;
use Bio::Tools::Run::Phylo::Phylip::Neighbor;
use Bio::Tools::Run::Phylo::Phylip::SeqBoot;
use Bio::Tools::Run::Phylo::Phylip::Consense;

use Bio::TreeIO;
use Bio::AlignIO;
use Bio::SimpleAlign;
use strict;

$ENV{PHYLIPDIR} = '/home/pippin/june/phylip';
$ENV{CLUSTALDIR} = '/home/pippin/june/clustalx';

for (my $i=1; $i<2; $i++) {

   my $inputfilename = "fasta/homologene$i.fa";

   if (-e($inputfilename)) { # file exists

      # create a SimpleAlign object
      my $clustalw_factory = Bio::Tools::Run::Alignment::Clustalw->new();
      my $aln = $clustalw_factory->align($inputfilename); # $aln is a SimpleAlign object

      # use seqboot to generate bootstrap alignments
      my @params_seqboot = ('datatype'=>'SEQUENCE', 'replicates'=>100);
      my $seqboot_factory = Bio::Tools::Run::Phylo::Phylip::SeqBoot->new(@params_seqboot);
      my $aln_ref = $seqboot_factory->run($aln);

      # create distance matrices and construct trees using neighbor
      my $protdist_factory = Bio::Tools::Run::Phylo::Phylip::ProtDist->new();
      my $neighbor_factory = Bio::Tools::Run::Phylo::Phylip::Neighbor->new();
      my @tree;
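      # one distance matrix and one tree per bootstrap alignment
      # ($a is reserved for sort, so use a different loop variable)
      foreach my $boot_aln (@{$aln_ref}) {
         my $matrix = $protdist_factory->run($boot_aln);
         push @tree, $neighbor_factory->run($matrix);
      }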

      # use consense to get a final tree
      my $consense_factory = Bio::Tools::Run::Phylo::Phylip::Consense->new();
      my $tree = $consense_factory->run(\@tree);

      # output treefile
      my $outfilename = "outtrees_ur/homologene$i.nh"; # unrooted tree
      my $outtree = Bio::TreeIO->new('-format' => 'newick',
                                     '-file'   => ">$outfilename");
      $outtree->write_tree($tree);
   }
} 


Thank you for any help. No help is fine too, I think I've been helped
enough.

June


