[Bioperl-l] questions on CLustalW.pm

Wed May 18 22:32:31 UTC 2011

Hi all,
I have a few questions regarding the package
Bio::Tools::Run::Alignment::Clustalw.  The following script:

#!/usr/local/bin/perl -w
use 5.010;
use strict;

use lib "/Library/Perl/";
use Bio::Perl;
use Bio::Seq;
use Bio::SeqIO;
# definition of the environmental variable CLUSTALDIR
BEGIN {$ENV{CLUSTALDIR} = '/Applications/Bioinformatics/clustalw-2.0.10-macosx/ '}
use Bio::Tools::Run::Alignment::Clustalw;

my $sequencesfilename = "/Users/Lorenzo/Documents/SequencesDatabase/plaza_public_02_Apr27/plaza_public_02/BLAST_Parsed_results/PerSpecies/test_vs_test.besth.pep1.fas";
my $format = 'fasta';
#my $inseq = Bio::SeqIO->new(-file => "<$sequencesfilename",
#                            -format => $format );

my $factory = Bio::Tools::Run::Alignment::Clustalw->new (); #use default parameters
#my @seq_object_array = read_all_sequences(-file => "<$sequencesfilename",
#                                          -format => $format );
#my $seq_array_ref = \@seq_object_array;
#my $aln = $factory->align($seq_array_ref);
my $aln = $factory->align($sequencesfilename);
my $avgpercentid = $aln->percentage_identity;
my $alnlength = $aln->length();
my $numberalnresidues = $aln->no_residues;
print "$avgpercentid and $alnlength and $numberalnresidues\n";

is returning the following error message:

Use of uninitialized value in concatenation (.) or string at /Library/Perl//5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm line 753.
Use of uninitialized value in concatenation (.) or string at /Library/Perl//5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm line 754.
sh: align: command not found

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: ClustalW call ( align -infile="/Users/Lorenzo/Desktop/test_vs_test.besth.pep1.fas" -output=gcg -outfile="/var/folders/rA/rApd7cXoFyWK-Yhn66cxZk+++TI/-Tmp-/O3Was62L0X/exicCvJnrF" 2>&1) crashed: 32512
STACK: Error::throw
STACK: Bio::Root::Root::throw /Library/Perl//5.10.0/Bio/Root/Root.pm:368
STACK: Bio::Tools::Run::Alignment::Clustalw::_run /Library/Perl//5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm:768
STACK: Bio::Tools::Run::Alignment::Clustalw::align /Library/Perl//5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm:515
STACK: /Users/Lorenzo/Documents/workspace/PlantEvolGen/test.pl:22
-----------------------------------------------------------

Also, which would be more efficient in terms of memory usage:
i. performing the alignment directly on a FASTA sequence file, or
ii. performing the alignment on a reference to an array of Seq objects:

my @seq_object_array = read_all_sequences(-file => "<$sequencesfilename",
                                          -format => $format );
my $seq_array_ref = \@seq_object_array;
my $aln = $factory->align($seq_array_ref);

Unfortunately, my script does not run in this form either. I checked, and
clustalw is properly installed in the given directory. It appears that the
script is not reading my file properly (see attached document). Should I move
the sequence files to the clustalw directory?

Finally, is there an implemented way of getting the number of amino acids in
the aligned region of, e.g., the longer or the shorter sequence, or should I
loop over the sequences in the $aln Bio::SimpleAlign object?
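
To make the last question concrete, here is a rough sketch of the manual loop I have in mind. It is only an illustration under a few assumptions: it works on plain gapped strings rather than on a real alignment (with a Bio::SimpleAlign object I would expect to get them from $aln->each_seq and $seq->seq), it treats '-' and '.' as the only gap characters, and the sequence names, the strings, and the helper count_residues are all made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical aligned rows (gapped strings), made up for illustration.
my %aligned = (
    seqA => 'MK-TAYIAKQR--QISF',
    seqB => 'MKLTA-IAKQRLEQ-SF',
);

# Count the residues in one aligned row by deleting the gap characters.
# Assumption: '-' and '.' are the only gap symbols in the alignment.
sub count_residues {
    my ($gapped) = @_;
    (my $ungapped = $gapped) =~ s/[-.]//g;
    return length $ungapped;
}

# Residue count per sequence name.
my %residues = map { $_ => count_residues($aligned{$_}) } keys %aligned;

# Sort names by residue count (ties broken by name) to pick out the
# shorter and the longer sequence.
my @by_length = sort { $residues{$a} <=> $residues{$b} || $a cmp $b }
                keys %residues;
my ($shortest, $longest) = @by_length[0, -1];

print "shortest: $shortest ($residues{$shortest} residues)\n";
print "longest: $longest ($residues{$longest} residues)\n";
```

If something equivalent already exists as a method, I would rather use that than maintain a loop like this.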

Greetings from Spain,
Lorenzo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_vs_test.besth.pep1.fas
Type: application/octet-stream
Size: 1323 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20110519/4536c603/attachment-0004.obj>