[Bioperl-l] Muscle Alignment and Memory Allocation

Jason Stajich jason at bioperl.org
Tue Jul 13 22:44:32 UTC 2010


Veronica -

I think whole-genome alignment is better done with a program other than 
MUSCLE, or more generally with something other than a typical MSA approach.

See the extensive literature on this type of approach, such as LAGAN, 
PECAN, MAVID, MAUVE, and MERCATOR (which builds a scaffold, then aligns 
with MAVID or other tools), to name a few.

If you insist on a traditional multiple-sequence-alignment-only approach, 
you may also want to try MAFFT, but it is better suited to many sequences 
than to a few long whole-genome sequences.
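As a rough illustration of why sequence length, not sequence count, is the bottleneck here, the sketch below estimates the size of a full dynamic-programming matrix for two sequences of the maximum length reported in the MUSCLE log quoted below (165101). It assumes, as is typical for progressive aligners, that joining two sequences or profiles fills a full L1 x L2 matrix; the bytes-per-cell values are hypothetical lower bounds chosen for illustration, not MUSCLE's actual internal storage, and the helper names `dp_cells` and `dp_mib` are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: number of cells in a full L1 x L2 DP matrix.
sub dp_cells {
    my ($len1, $len2) = @_;
    return $len1 * $len2;
}

# Hypothetical helper: matrix size in MiB, ASSUMING $bytes_per_cell bytes
# per cell (an illustrative lower bound, not MUSCLE's real layout).
sub dp_mib {
    my ($len1, $len2, $bytes_per_cell) = @_;
    return dp_cells($len1, $len2) * $bytes_per_cell / (1024 * 1024);
}

# Max sequence length taken from the MUSCLE log in this thread.
my $max_len = 165_101;

printf "cells for one %d x %d alignment: %.0f\n",
    $max_len, $max_len, dp_cells($max_len, $max_len);
for my $bytes (1, 4) {
    printf "approx. MiB at %d byte(s) per cell: %.0f\n",
        $bytes, dp_mib($max_len, $max_len, $bytes);
}
```

Even at one byte per cell this comes to roughly 26,000 MiB for a single pairwise step, about eight times the 3211 MB MUSCLE had allocated when it stopped. Under the same assumption, aligning the genomes two at a time and then merging profiles does not avoid the quadratic cost, since each individual step still spans the full sequence length.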

-jason

armendarez77 at hotmail.com wrote, On 7/13/10 11:57 AM:
> That would be nice, but not possible right now :)
>
>
>
>    
>> Date: Tue, 13 Jul 2010 11:43:35 -0700
>> From: randalls at bioinfo.wsu.edu
>> To: armendarez77 at hotmail.com
>> CC: bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] Muscle Alignment and Memory Allocation
>>
>> One suggestion is to use a computer with a lot more memory......
>>
>> Randall Svancara
>> Systems Administrator/DBA/Developer
>> Main Bioinformatics Laboratory
>>
>>
>>
>> ----- Original Message -----
>> From: armendarez77 at hotmail.com
>> To: bioperl-l at lists.open-bio.org
>> Sent: Tuesday, July 13, 2010 11:00:27 AM
>> Subject: [Bioperl-l] Muscle Alignment and Memory Allocation
>>
>> Hello,
>>
>> I need to align 20-30 large full genome sequences (150,000+ bp each),
>> but I run out of memory. I've tried using -maxmb at the command line and
>> as an argument for Bio::Tools::Run::Alignment::Muscle, but I'm either
>> using it wrong or it's not working.
>>
>> I've also tried aligning 2 sequences at a time and then aligning those
>> alignments using the -profile option, but it's still too much.
>>
>> Do you have any advice on how to do such alignments? My attempts are
>> below.
>>
>> Thank you,
>>
>> Veronica
>>
>>
>> MUSCLE v3.6 by Robert C. Edgar
>>
>> http://www.drive5.com/muscle This software is donated to the public
>> domain. Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.
>>
>> 07-13-2010_fullGenomes 17 seqs, max length 165101, avg length 152670
>> 00:00:00 26 MB(-9%) Iter 1 100.00% K-mer dist pass 1
>> 00:00:00 26 MB(-9%) Iter 1 100.00% K-mer dist pass 2
>> 00:00:01 105 MB(-37%) Iter 1 6.25% Align node
>> *** OUT OF MEMORY ***
>> Memory allocated so far 3211.48 MB
>>
>> Alignment not completed, cannot save.
>>
>>
>>
>> Using -maxmb at the command line:
>>
>> $ muscle -in 07-13-2010_fullGenomes.fasta -clwout 07-13-2010_fullGenomes.clw \
>>     -maxiters 1 -diags1 -sv -maxmb 4000
>>
>> MUSCLE v3.6 by Robert C. Edgar
>>
>> http://www.drive5.com/muscle This software is donated to the public
>> domain. Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.
>>
>> 07-13-2010_fullGenomes 17 seqs, max length 165101, avg length 152670
>> 00:00:00 26 MB(-9%) Iter 1 100.00% K-mer dist pass 1
>> 00:00:00 26 MB(-9%) Iter 1 100.00% K-mer dist pass 2
>> 00:00:01 105 MB(-37%) Iter 1 6.25% Align node
>> *** OUT OF MEMORY ***
>> Memory allocated so far 3210.74 MB
>>
>> Alignment not completed, cannot save.
>>
>>
>> Using Bio::Tools::Run::Alignment::Muscle and -maxmb
>>
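>> SCRIPT:
>> use strict;
>> use warnings;
>> use Bio::AlignIO;
>> use Bio::Tools::Run::Alignment::Muscle;
>>
>> my $inputFile = $ARGV[0];
>> die "usage: $0 input.fasta\n" unless defined $inputFile;
>> my $factory = Bio::Tools::Run::Alignment::Muscle->new(-maxmb => 4000);
>> my $alnObj = $factory->align($inputFile);
>> # $output already ends in .clw, so do not append the extension again
>> my $output = "output.clw";
>> my $clwOut = Bio::AlignIO->new(-format => 'clustalw', -file => ">$output");
>> $clwOut->write_aln($alnObj);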
>>
>> OUTPUT:
>>
>> MUSCLE v3.6 by Robert C. Edgar
>>
>> http://www.drive5.com/muscle This software is donated to the public
>> domain. Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.
>>
>> 07-13-2010_fullGenomes 17 seqs, max length 165101, avg length 152670
>> 00:00:01 26 MB(-9%) Iter 1 100.00% K-mer dist pass 1
>> 00:00:01 26 MB(-9%) Iter 1 100.00% K-mer dist pass 2
>> 00:00:01 105 MB(-37%) Iter 1 6.25% Align node
>> *** OUT OF MEMORY ***
>> Memory allocated so far 3210.9 MB
>>
>> Alignment not completed, cannot save.
>>
>> --------------------- WARNING ---------------------
>> MSG: Muscle call crashed: 512 [command /usr/bin/muscle -in
>> 07-13-2010_fullGenomes.fasta -out /tmp/ubyNWLmbV8/GggmsmA0vA]
>>
>> ---------------------------------------------------
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>      


