[Bioperl-l] questions on Bio::Tools::Run::Alignment::Clustalw

Thu May 19 11:42:34 UTC 2011

On 5/19/11 11:54 AM, Dave Messina wrote:
> Hi Lorenzo,
>
> Your code and data work for me with both clustalw v1.83 and 2.1. 
> However, I did have to change the name of the clustalw 2.1 executable 
> from clustalw2 to clustalw.
>
> $ perl lorenzo.pl test_vs_test.besth.pep1.fas
>
>  CLUSTAL 2.1 Multiple Sequence Alignments
>
> Sequence format is Pearson
> Sequence 1: gnl|Alyrata|AL6G05070   602 aa
> Sequence 2: gnl|Alyrata|AL3G15690   611 aa
> Start of Pairwise alignments
> Aligning...
>
> Sequences (1:2) Aligned. Score:  33
> Guide tree file created:   [test_vs_test.besth.pep1.dnd]
>
> There are 1 groups
> Start of Multiple Alignment
>
> Aligning...
> Group 1: Sequences:   2      Score:6856
> Alignment Score 1214
>
> GCG-Alignment file created     
>  [/var/folders/Na/NagaNXNhHHm1GDx6seD-ME+++TI/-Tmp-/sniIE2msWJ/fGoixJVoUf]
>
>
> --------------------- WARNING ---------------------
> MSG: Use of method no_residues() is deprecated, use num_residues() instead
> To be removed in 1.0075
> ---------------------------------------------------
> 34.8639455782313 and 625 and 1213
>
>
>
>
>
>      What would be more efficient in terms of memory usage:
>     i.-performing the alignment directly over a fasta sequences file or
>     ii.-performing the alignment over a ref to an array of seq objects:
>
>
> Option i. But unless you're doing a ton, you probably won't notice 
> either way, so I would do whichever is more convenient.
>
>
>     Should I move the seqs files to the clustalw dir?
>
>
> No, this isn't the problem. In the error message:
> MSG: ClustalW call ( align 
> -infile="/Users/Lorenzo/Desktop/test_vs_test.besth.pep1.fas"
>   -output=gcg     
> -outfile="/var/folders/rA/rApd7cXoFyWK-Yhn66cxZk+++TI/-Tmp-/O3Was62L0X/exicCvJnrF"
>   2>&1) crashed: 32512
>
> I notice that the path to your input file in that error is different 
> than the path in your code — perhaps this is the issue?
>
>
>     Finally, is there any way of getting the number of amino acids in
>     the aligned region in e.g. the longer or the shorter sequence
>     implemented, or should I loop over the sequences in the $aln
>     Bio::SimpleAlign object, etc.?
>
>
> I'm not sure I understand your question: do you want something 
> different than $aln->length() ?
>
>
>
> Dave
>
Thanks for the answer (and sorry for the multiple messages). I'll take 
another look, but my script still doesn't run, even after changing the 
name of the executable to clustalw. I ran the program loading files from 
different locations, and the version of the script I posted used a fasta 
file from a different location than the attached one.
Anyway, what does this error message mean?

    Use of uninitialized value in concatenation (.) or string at /Library/Perl//5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm line 753.
    Use of uninitialized value in concatenation (.) or string at /Library/Perl//5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm line 754

I checked lines 753 and 754 of Clustalw.pm and found:

    $self->debug( "Program ".$self->executable."\n");
    my $commandstring = $self->executable." $command"." $instring"." -output=$output". " $param_string";

Similarly, I found

    STACK: Bio::Tools::Run::Alignment::Clustalw::_run /Library/Perl//5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm:768
    STACK: Bio::Tools::Run::Alignment::Clustalw::align /Library/Perl//5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm:515

and at those lines I found:

    my $aln = $self->_run('align', $infilename, $param_string);
    close($pipe) || ($self->throw("ClustalW call ($commandstring) crashed: $?"));

so I guess the problem is related to $self->executable, which should 
have been solved by changing the executable name to clustalw (is that 
right?). However, I don't understand the rest of the error message:

    sh: align: command not found
    ...
    STACK: Error::throw
    STACK: Bio::Root::Root::throw /Library/Perl//5.10.0/Bio/Root/Root.pm:368
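
A minimal sketch of how the messages above might fit together, assuming the "uninitialized value" warnings mean that $self->executable returned undef. In that case the command string would start with a space followed by "align", so the shell would try to run a program called "align". The status 32512 is a raw wait status; shifting it right by 8 bits gives the exit code 127, which a shell returns when it cannot find a command. The helper names decode_status and build_command are hypothetical and are not part of Clustalw.pm.

```perl
use strict;
use warnings;

# Split a raw wait status (the value of $? after close on a pipe)
# into the child's exit code and the signal number, if any.
sub decode_status {
    my ($status) = @_;
    return {
        exit_code => $status >> 8,
        signal    => $status & 127,
    };
}

# Hypothetical stand-in for the concatenation at Clustalw.pm line 754:
# an undefined executable contributes nothing to the command string.
sub build_command {
    my ($exe, $command, $instring) = @_;
    $exe = '' unless defined $exe;
    return "$exe $command $instring";
}

my $decoded = decode_status(32512);
print "exit code: $decoded->{exit_code}, signal: $decoded->{signal}\n";

my $cmd = build_command(undef, 'align', '-infile="seqs.fas"');
print "command string: '$cmd'\n";
```

If this reading is right, the thing to check would be whether the module can locate the clustalw binary at all (for example, whether it is on the PATH seen by the script), rather than where the sequence files live.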

Regarding my last question, what I want is to align the sequences, using 
clustalw preferably, to get the total number of aligned aas for both the 
longest and the shortest sequence in the alignment. I need these data to 
apply the following formula:

    I' = I * Min(n1/L1, n2/L2)

    where I is the percentage of identical aas in the aligned region,
    Li is the length of sequence i and ni is the number of aas in the
    aligned region in sequence i
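
The formula could be sketched in plain Perl as follows, working directly on the two gapped rows of a pairwise alignment. Several things here are assumptions, not something the thread defines: the "aligned region" is taken to be the span from the first to the last column in which both sequences have a residue, I is computed over the gap-free columns in that span, and '-' is the only gap character. The function name adjusted_identity is hypothetical. With BioPerl, the gapped rows might be taken from the seq() strings of the sequences returned by $aln->each_seq.

```perl
use strict;
use warnings;
use List::Util qw(min);

# $row1, $row2: gapped alignment rows of equal length.
# $len1, $len2: full lengths L1 and L2 of the unaligned sequences.
# Returns I' = I * min(n1/L1, n2/L2), with I in percent.
sub adjusted_identity {
    my ($row1, $row2, $len1, $len2) = @_;
    die "alignment rows differ in length\n"
        unless length($row1) == length($row2);

    my @r1 = split //, $row1;
    my @r2 = split //, $row2;

    # Columns in which both sequences have a residue (assumed gap char: '-').
    my @paired = grep { $r1[$_] ne '-' && $r2[$_] ne '-' } 0 .. $#r1;
    return 0 unless @paired;

    # Assumed aligned region: first to last gap-free column.
    my ($start, $end) = ($paired[0], $paired[-1]);

    # I: percentage of identical residues over the gap-free columns.
    my $identical = grep { uc($r1[$_]) eq uc($r2[$_]) } @paired;
    my $identity  = 100 * $identical / scalar(@paired);

    # n1, n2: residues of each sequence inside the aligned region.
    my $n1 = grep { $_ ne '-' } @r1[$start .. $end];
    my $n2 = grep { $_ ne '-' } @r2[$start .. $end];

    return $identity * min($n1 / $len1, $n2 / $len2);
}

# Toy example with made-up sequences of lengths 7 and 9.
printf "I' = %.2f\n", adjusted_identity('MKT-AYLL--', '-KTGAYIVQR', 7, 9);
```

A different definition of the aligned region (for instance, the whole alignment, or a local alignment reported by another program) would change n1, n2 and I, so the choice above should be checked against the source of the formula.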

Lorenzo

-- 
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
Lorenzo Carretero Paulet
Institute for Plant Molecular and Cell Biology - IBMCP (CSIC-UPV)
Integrative Systems Biology Group
C/ Ingeniero Fausto Elio s/n.
46022 Valencia, Spain

Phone:  +34 963879934
Fax:    +34 963877859
e-mail: locarpau at upvnet.upv.es
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*