[Biopython-dev] Bio.AlignIO, Bio.Nexus, MrBayes, polymorphic sites, maximum line length

Nick Loman n.j.loman at bham.ac.uk
Thu Dec 2 15:25:06 UTC 2010


Peter wrote:
>> Is this the best way of doing it? Would a method call in AlignIO to
>> do the same thing be useful to others?
>>     
> I've got some code somewhere for iterating over the columns of
> the alignment, and think I filed an enhancement bug for this.
> Would that do what you want?
>   
Hi Peter,

Yes, that would definitely make the code more readable. I'm not sure
whether you think a function that returns an alignment containing just
the polymorphic sites would also be useful to others.
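Here is a minimal sketch of what such a function might do. It is plain Python working on equal-length strings, for example `[str(rec.seq) for rec in align]` for a MultipleSeqAlignment. It is not Peter's column-iteration code. The names `polymorphic_columns` and `polymorphic_sites` are hypothetical, and it assumes every character, including gaps and ambiguity codes, counts as a distinct state:

```python
def polymorphic_columns(seqs):
    """Return indices of alignment columns where the sequences differ.

    Hypothetical helper: seqs is a list of equal-length strings. Gaps ("-")
    and ambiguity codes such as "N" are treated as ordinary characters.
    """
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences must all have the same length")
    # zip(*seqs) yields one tuple per alignment column
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]


def polymorphic_sites(seqs):
    """Return new strings containing only the polymorphic columns."""
    cols = polymorphic_columns(seqs)
    return ["".join(s[i] for i in cols) for s in seqs]


seqs = ["ACGTA", "ACTTA", "GCTTA"]
print(polymorphic_columns(seqs))  # prints [0, 2]
print(polymorphic_sites(seqs))    # prints ['AG', 'AT', 'GT']
```

The filtered strings could then be wrapped back into SeqRecord objects to build a new MultipleSeqAlignment.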

>> 2) When outputting long alignments in Nexus format, MrBayes refuses to read
>> the resulting files saying that the maximum line length is 19900 characters.
>> I'm assuming that is not the maximum input to MrBayes and that it can handle
>> longer alignments if they are split in some way. Would it be possible for
>> Bio.Nexus to split alignments in the appropriate format?
>>     
>
> Are you outputting the large alignment using Bio.AlignIO or using
> Bio.Nexus directly?
>   
In this case I was using Bio.Nexus, but the result would be the same
with Bio.AlignIO.
> The file format details are not fresh in my mind, but I think that long
> sequences can be split over multiple lines - so if the problem is
> just with how MrBayes parses the file, that might be fixable. Can
> you give me a test case for this (maybe generate a simple but
> large alignment in code) with the MrBayes call that fails?
>   
Sure thing:

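from Bio import AlignIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Align import MultipleSeqAlignment
import subprocess

# Two 20,000 bp sequences; each is written on a single line of the
# NEXUS matrix, which exceeds MrBayes' maximum line length.
# Nexus output needs the molecule type set on each record.
align1 = MultipleSeqAlignment([
    SeqRecord(Seq("A" * 20000), id="Alpha", annotations={"molecule_type": "DNA"}),
    SeqRecord(Seq("A" * 20000), id="Beta", annotations={"molecule_type": "DNA"}),
])

AlignIO.write([align1], "out.nex", "nexus")

# Send commands to MrBayes ("mb") on stdin; each command ends with a newline.
subprocess.run(["mb"], input="execute out.nex\nquit\n", text=True)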


This gives the error:

MrBayes > execute out.nex

   Executing file "out.nex"
   UNIX line termination
   Longest line length = 20006
   A maximum of 19900 characters is allowed on a single line
   in a file. The longest line of the file out.nex
   contains at least one line with 20056 characters.
   Error in command "Execute"
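One possible workaround, assuming MrBayes accepts the standard NEXUS `interleave` option in the FORMAT command, is to split the matrix into blocks so that no line comes near 19900 characters. Below is a minimal plain-Python sketch. `write_interleaved_nexus` and its `block_width` parameter are hypothetical and not part of Bio.Nexus. It also assumes taxon names contain no spaces or punctuation that would need quoting:

```python
import io


def write_interleaved_nexus(records, handle, block_width=1000, datatype="dna"):
    """Write (name, sequence) pairs as an interleaved NEXUS DATA block.

    Hypothetical helper: each matrix line carries at most block_width
    sequence characters, plus the padded taxon name.
    """
    records = list(records)
    if not records:
        raise ValueError("need at least one sequence")
    nchar = len(records[0][1])
    if any(len(seq) != nchar for _, seq in records):
        raise ValueError("all sequences must have the same length")
    pad = max(len(name) for name, _ in records) + 2
    handle.write("#NEXUS\n\nbegin data;\n")
    handle.write(f"  dimensions ntax={len(records)} nchar={nchar};\n")
    handle.write(f"  format datatype={datatype} missing=? gap=- interleave=yes;\n")
    handle.write("  matrix\n")
    for start in range(0, nchar, block_width):
        # One line per taxon for this slice of columns
        for name, seq in records:
            handle.write(f"  {name.ljust(pad)}{seq[start:start + block_width]}\n")
        handle.write("\n")  # blank line between interleaved blocks
    handle.write("  ;\nend;\n")


buf = io.StringIO()
write_interleaved_nexus([("Alpha", "A" * 20000), ("Beta", "A" * 20000)], buf)
text = buf.getvalue()
print(max(len(line) for line in text.splitlines()))  # prints 1009
```

With `block_width=1000`, the 20000-column test alignment becomes 20 blocks of 1000 columns each.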

Cheers

Nick
