[Bioperl-l] Speed issues with making IUPAC consensus from alignment

Thu May 23 13:56:32 UTC 2013

(keep the list cc'd)

On May 22, 2013, at 6:31 PM, Senanu <senanu.junk at gmail.com> wrote:

> On May 22, 2013, at 4:17 PM, Fields, Christopher J wrote:
> 
>>> Hi all,
>>> 
>>> I am wondering if the consensus_iupac method of Bio::Align is known to be extremely slow, or if I'm doing something wrong. 
>> 
>> Probably the former, but...
>> 
>>> I have bacterial whole-genome alignments (~7 Mbases) that I made in progressiveMauve and wish to get an IUPAC consensus. (I know that progressiveMauve uses a non-standard XMFA format, but Bio::AlignIO seems to read them just fine.) The code below takes more than all night to make a consensus. It works fine on tiny test alignments. 
>> 
>> It shouldn't take that long; 7 Mb isn't that large.  Or is that 7 Mb for one genome?
> 
> It is 7 Mb per genome, but there are only 2 genomes in the alignment, and the sequences are very similar to one another.
> 
>> 
>>> Is this a known problem? Is there another way to generate such a consensus?
>> 
>> The code isn't really optimized for this, but again this isn't terribly large.  Is the bottleneck reading the alignment in, or is it the consensus_iupac() step?  Hard to say without seeing the alignment data itself.
> 
> The bottleneck is definitely with the consensus_iupac step. Reading the alignment in takes a few seconds. 

That's interesting, but again not surprising.  One would have to look at the code, but I wouldn't be surprised if the method is terribly inefficient.
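
As a point of comparison, an IUPAC consensus only needs one pass over the alignment columns, so it can be done in time linear in the alignment length. Below is a minimal sketch of that idea in Python. It is not BioPerl code and not how consensus_iupac() is implemented; the function name iupac_consensus is hypothetical, and ignoring gaps and any non-ACGT characters when picking the code is an assumption made for illustration.

```python
# Hypothetical helper, NOT part of BioPerl: column-wise IUPAC consensus.
# Maps each non-empty set of unambiguous bases to its IUPAC nucleotide code.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_consensus(seqs):
    """Return one IUPAC code per alignment column.

    Assumption: gaps and any character other than A, C, G, T are
    ignored; a column with no A/C/G/T at all is reported as '-'.
    """
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must have equal length")
    out = []
    for column in zip(*seqs):
        # Collect the distinct unambiguous bases seen in this column.
        bases = frozenset(c.upper() for c in column if c.upper() in "ACGT")
        out.append(IUPAC[bases] if bases else "-")
    return "".join(out)


if __name__ == "__main__":
    print(iupac_consensus(["ACGT-A", "ATGC-G"]))
```

Each column is visited once and resolved with a single dictionary lookup, so the work grows with alignment length times the number of sequences.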

chris