[Bioperl-l] questions on Bio::Tools::Run::Alignment::Clustalw
Lorenzo Carretero
locarpau at upvnet.upv.es
Thu May 19 11:42:34 UTC 2011
On 5/19/11 11:54 AM, Dave Messina wrote:
> Hi Lorenzo,
>
> Your code and data works for me with both clustalw v1.83 and 2.1.
> However, I did have to change the name of the clustalw 2.1 executable
> from clustalw2 to clustalw.
>
> $ perl lorenzo.pl test_vs_test.besth.pep1.fas
>
> CLUSTAL 2.1 Multiple Sequence Alignments
>
> Sequence format is Pearson
> Sequence 1: gnl|Alyrata|AL6G05070 602 aa
> Sequence 2: gnl|Alyrata|AL3G15690 611 aa
> Start of Pairwise alignments
> Aligning...
>
> Sequences (1:2) Aligned. Score: 33
> Guide tree file created: [test_vs_test.besth.pep1.dnd]
>
> There are 1 groups
> Start of Multiple Alignment
>
> Aligning...
> Group 1: Sequences: 2 Score:6856
> Alignment Score 1214
>
> GCG-Alignment file created
> [/var/folders/Na/NagaNXNhHHm1GDx6seD-ME+++TI/-Tmp-/sniIE2msWJ/fGoixJVoUf]
>
>
> --------------------- WARNING ---------------------
> MSG: Use of method no_residues() is deprecated, use num_residues() instead
> To be removed in 1.0075
> ---------------------------------------------------
> 34.8639455782313 and 625 and 1213
>
>
>
>
>
> What would be more efficient in terms of memory usage:
> i.-performing the alignment directly over a fasta sequences file or
> ii.-performing the alignment over a ref to an array of seq objects:
>
>
> Option i. But unless you're doing a ton, you probably won't notice
> either way, so I would do whichever is more convenient.
>
>
> Should I move the seqs files to the clustalw dir?
>
>
> No, this isn't the problem. In the error message:
> MSG: ClustalW call ( align
> -infile="/Users/Lorenzo/Desktop/test_vs_test.besth.pep1.fas"
> -output=gcg
> -outfile="/var/folders/rA/rApd7cXoFyWK-Yhn66cxZk+++TI/-Tmp-/O3Was62L0X/exicCvJnrF"
> 2>&1) crashed: 32512
>
> I notice that the path to your input file in that error is different
> than the path in your code — perhaps this is the issue?
>
>
> Finally, is there any way of getting the number of amino acids in
> the aligned region of, e.g., the longer or the shorter sequence
> already implemented, or should I loop over the sequences in the $aln
> Bio::SimpleAlign object?
>
>
> I'm not sure I understand your question: do you want something
> different than $aln->length() ?
>
>
>
> Dave
>
Thanks for the answer (and sorry for the multiple messages). I'll take
another look, but my script still doesn't run, even after renaming the
executable to clustalw. I ran the program loading files from
different locations, and the version of the script I posted had the
attached fasta file in a different location.
Anyway, what does this error message mean?
Use of uninitialized value in concatenation (.) or string at /Library/Perl//5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm line 753.
Use of uninitialized value in concatenation (.) or string at /Library/Perl//5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm line 754
I checked lines 753 and 754 of Clustalw.pm and found:
$self->debug( "Program ".$self->executable."\n");
my $commandstring = $self->executable." $command"." $instring"." -output=$output". " $param_string";
Similarly, I found
STACK: Bio::Tools::Run::Alignment::Clustalw::_run /Library/Perl//5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm:768
STACK: Bio::Tools::Run::Alignment::Clustalw::align /Library/Perl//5.10.0/Bio/Tools/Run/Alignment/Clustalw.pm:515
and I found again:
my $aln = $self->_run('align', $infilename, $param_string);
close($pipe) || ($self->throw("ClustalW call ($commandstring) crashed: $?"));
so I guess the problem lies in $self->executable returning nothing, which
should have been solved by renaming the executable to clustalw (is that
right?). However, I don't understand the rest of the error message:
sh: align: command not found
.
.
.
STACK: Error::throw
STACK: Bio::Root::Root::throw /Library/Perl//5.10.0/Bio/Root/Root.pm:368
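
One possible reading of these messages, as a minimal sketch in plain Perl: if executable() returns undef, the string built on line 754 starts with a space followed by the command, so the shell ends up trying to run "align" itself. The helpers build_command and first_word and the argument values are hypothetical, made up for illustration; they are not part of Clustalw.pm.

```perl
use strict;
use warnings;

# Mimic how line 754 assembles the command string.
# The argument values used below are made up for illustration.
sub build_command {
    my ($executable, $command, $instring, $output) = @_;
    # An undefined executable contributes an empty string. In the real
    # module this is where the "uninitialized value" warning appears;
    # here it is silenced with the // operator.
    return ($executable // '') . " $command" . " $instring" . " -output=$output";
}

# The program the shell tries to run is the first token of the string.
sub first_word {
    my ($commandstring) = @_;
    my ($word) = $commandstring =~ /(\S+)/;
    return $word;
}

my $broken  = build_command(undef,      'align', '-infile="in.fas"', 'gcg');
my $working = build_command('clustalw', 'align', '-infile="in.fas"', 'gcg');

print first_word($broken),  "\n";   # align
print first_word($working), "\n";   # clustalw

# $? keeps the child's exit code in its high byte.
print 32512 >> 8, "\n";             # 127
```

Exit code 127 is what sh reports when a command is not found, which would match both "sh: align: command not found" and "crashed: 32512" if the executable really is undefined at that point.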
Regarding my last question, what I want is to align the sequences,
preferably with clustalw, and get the total number of aligned amino acids
for both the longest and the shortest sequence in the alignment. I need
these numbers to apply the following formula:
I' = I * Min(n1/L1, n2/L2)
where I is the percentage of identical amino acids in the aligned region,
Li is the length of sequence i, and ni is the number of amino acids in the
aligned region of sequence i.
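
To make the calculation concrete, here is a rough sketch in plain Perl (no BioPerl calls), assuming the formula is read as I' = I * min(n1/L1, n2/L2). It also assumes that the "aligned region" is the span between the first and last alignment column in which both sequences have a residue, which is only one possible definition. The two gapped rows are toy data, and adjusted_identity and aligned_counts are hypothetical helper names.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Adjusted identity: I' = I * min(n1/L1, n2/L2).
sub adjusted_identity {
    my ($identity, $n1, $len1, $n2, $len2) = @_;
    return $identity * min($n1 / $len1, $n2 / $len2);
}

# Count the residues of each sequence inside the aligned region.
# Assumption: the region runs from the first to the last column in
# which both rows have a residue. Gap characters are '-' and '.'.
sub aligned_counts {
    my ($row1, $row2) = @_;
    die "rows differ in length\n" unless length($row1) == length($row2);
    my @a = split //, $row1;
    my @b = split //, $row2;
    my @both = grep { $a[$_] !~ /[-.]/ && $b[$_] !~ /[-.]/ } 0 .. $#a;
    return (0, 0) unless @both;
    my ($first, $last) = ($both[0], $both[-1]);
    my $n1 = grep { $_ !~ /[-.]/ } @a[$first .. $last];
    my $n2 = grep { $_ !~ /[-.]/ } @b[$first .. $last];
    return ($n1, $n2);
}

# Toy gapped rows, as they might be read from an alignment.
my ($row1, $row2) = ('MKT-AYIAK--', '--TQAY-AKRR');
my ($n1, $n2) = aligned_counts($row1, $row2);

# Ungapped lengths L1 and L2.
(my $seq1 = $row1) =~ tr/-.//d;
(my $seq2 = $row2) =~ tr/-.//d;

# 50 is a made-up identity percentage for the aligned region.
printf "n1=%d n2=%d I'=%.2f\n", $n1, $n2,
    adjusted_identity(50, $n1, length $seq1, $n2, length $seq2);
```

With a Bio::SimpleAlign object, the gapped rows would come from looping over the sequences in $aln, which is the loop I was asking about.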
Lorenzo
--
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
Lorenzo Carretero Paulet
Institute for Plant Molecular and Cell Biology - IBMCP (CSIC-UPV)
Integrative Systems Biology Group
C/ Ingeniero Fausto Elio s/n.
46022 Valencia, Spain
Phone: +34 963879934
Fax: +34 963877859
e-mail: locarpau at upvnet.upv.es
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*