[Bioperl-l] Speed issues with making IUPAC consensus from alignment

Wed May 22 23:17:50 UTC 2013

On May 22, 2013, at 3:15 PM, Senanu <senanu.pearson at gmail.com> wrote:

> Hi all,
> 
> I am wondering if the consensus_iupac method of Bio::SimpleAlign is known to be extremely slow, or if I'm doing something wrong.

Probably the former, but...

> I have bacterial whole-genome alignments (~7 Mbases) that I made in progressiveMauve and wish to get an IUPAC consensus. (I know that progressiveMauve uses a non-standard XMFA format, but Bio::AlignIO seems to read them just fine.) The code below takes more than all night to make a consensus. It works fine on tiny test alignments. 

It shouldn't take that long; 7 Mb isn't that large. Or is that 7 Mb per genome?

> Is this a known problem? Is there another way to generate such a consensus?

The code isn't really optimized for this, but again, an alignment of this size isn't terribly large. Is the bottleneck reading the alignment in, or is it the consensus_iupac() step? It's hard to say without seeing the alignment data itself.
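
One way to narrow this down, as a rough sketch rather than a tested recipe: time each step separately, and compare against a plain column-wise consensus built from the raw aligned strings. The timed() and iupac_consensus() helpers below are hypothetical names, not part of BioPerl. The gap handling (gaps ignored, columns with no A/C/G/T reported as N) is an assumption and may not match what consensus_iupac() does.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Sorted set of observed bases -> IUPAC ambiguity code.
my %IUPAC = (
    A   => 'A', C   => 'C', G   => 'G', T   => 'T',
    AC  => 'M', AG  => 'R', AT  => 'W',
    CG  => 'S', CT  => 'Y', GT  => 'K',
    ACG => 'V', ACT => 'H', AGT => 'D', CGT => 'B',
    ACGT => 'N',
);

# Hypothetical helper: run a code reference and report wall-clock time on STDERR.
sub timed {
    my ($label, $code) = @_;
    my $t0     = [gettimeofday];
    my @result = $code->();
    printf STDERR "%s: %.3f s\n", $label, tv_interval($t0);
    return wantarray ? @result : $result[0];
}

# Hypothetical helper: column-wise IUPAC consensus of equal-length aligned strings.
# Assumption: gaps and any character other than A/C/G/T are ignored, and a
# column with no A/C/G/T at all is reported as 'N'.
sub iupac_consensus {
    my @rows = map { uc $_ } @_;
    return '' unless @rows;
    my $len = length $rows[0];
    for my $row (@rows) {
        die "aligned rows differ in length\n" unless length($row) == $len;
    }
    my $consensus = '';
    for my $i (0 .. $len - 1) {
        my %seen;
        for my $row (@rows) {
            my $base = substr($row, $i, 1);
            $seen{$base} = 1 if $base =~ /^[ACGT]$/;
        }
        my $key = join '', sort keys %seen;
        $consensus .= $key eq '' ? 'N' : $IUPAC{$key};
    }
    return $consensus;
}

# Toy alignment standing in for the real data.
my @rows = ('ACGT-A', 'ACGTTA', 'ATGC-A');
my $con  = timed('column-wise consensus', sub { iupac_consensus(@rows) });
print "$con\n";    # prints AYGYTA
```

With BioPerl loaded, the rows could be taken from map { $_->seq } $aln->each_seq, and wrapping $in->next_aln and $aln->consensus_iupac in timed(...) the same way would show which of the two steps dominates.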

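> use Bio::AlignIO;
> 
> # @files is assumed to be populated elsewhere; $files[0] is the XMFA file.
> my $in = Bio::AlignIO->new(-file   => $files[0],
>                            -format => 'XMFA');
> while (my $aln = $in->next_aln()) {
>     # Set the alphabet explicitly so every sequence is treated as DNA.
>     foreach my $seq ($aln->each_seq) {
>         $seq->alphabet('dna');
>     }
>     my $con = $aln->consensus_iupac();
> }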
> 
> 
> Thanks in advance.
> Ngwenyama

chris